Molitor, John
2012-03-01
Bayesian methods have seen an increase in popularity in a wide variety of scientific fields, including epidemiology. One of the main reasons for their widespread application is the power of the Markov chain Monte Carlo (MCMC) techniques generally used to fit these models. As a result, researchers often implicitly associate Bayesian models with MCMC estimation procedures. However, Bayesian models do not always require Markov-chain-based methods for parameter estimation. This is important, as MCMC estimation methods, while generally quite powerful, are complex and computationally expensive and suffer from convergence problems related to the manner in which they generate correlated samples used to estimate probability distributions for parameters of interest. In this issue of the Journal, Cole et al. (Am J Epidemiol. 2012;175(5):368-375) present an interesting paper that discusses non-Markov-chain-based approaches to fitting Bayesian models. These methods, though limited, can overcome some of the problems associated with MCMC techniques and promise to provide simpler approaches to fitting Bayesian models. Applied researchers will find these estimation approaches intuitively appealing and will gain a deeper understanding of Bayesian models through their use. However, readers should be aware that other non-Markov-chain-based methods are currently in active development and have been widely published in other fields.
Bayesian posterior distributions without Markov chains.
Cole, Stephen R; Chu, Haitao; Greenland, Sander; Hamra, Ghassan; Richardson, David B
2012-03-01
Bayesian posterior parameter distributions are often simulated using Markov chain Monte Carlo (MCMC) methods. However, MCMC methods are not always necessary and do not help the uninitiated understand Bayesian inference. As a bridge to understanding Bayesian inference, the authors illustrate a transparent rejection sampling method. In example 1, they illustrate rejection sampling using 36 cases and 198 controls from a case-control study (1976-1983) assessing the relation between residential exposure to magnetic fields and the development of childhood cancer. Results from rejection sampling (odds ratio (OR) = 1.69, 95% posterior interval (PI): 0.57, 5.00) were similar to MCMC results (OR = 1.69, 95% PI: 0.58, 4.95) and approximations from data-augmentation priors (OR = 1.74, 95% PI: 0.60, 5.06). In example 2, the authors apply rejection sampling to a cohort study of 315 human immunodeficiency virus seroconverters (1984-1998) to assess the relation between viral load after infection and 5-year incidence of acquired immunodeficiency syndrome, adjusting for (continuous) age at seroconversion and race. In this more complex example, rejection sampling required a notably longer run time than MCMC sampling but remained feasible and again yielded similar results. The transparency of the proposed approach comes at a price of being less broadly applicable than MCMC.
Bayesian-MCMC-based parameter estimation of stealth aircraft RCS models
NASA Astrophysics Data System (ADS)
Xia, Wei; Dai, Xiao-Xia; Feng, Yuan
2015-12-01
When modeling a stealth aircraft with low RCS (Radar Cross Section), conventional parameter estimation methods may cause a deviation from the actual distribution, owing to the fact that the characteristic parameters are estimated via directly calculating the statistics of RCS. The Bayesian-Markov Chain Monte Carlo (Bayesian-MCMC) method is introduced herein to estimate the parameters so as to improve the fitting accuracies of fluctuation models. The parameter estimations of the lognormal and the Legendre polynomial models are reformulated in the Bayesian framework. The MCMC algorithm is then adopted to calculate the parameter estimates. Numerical results show that the distribution curves obtained by the proposed method exhibit improved consistence with the actual ones, compared with those fitted by the conventional method. The fitting accuracy could be improved by no less than 25% for both fluctuation models, which implies that the Bayesian-MCMC method might be a good candidate among the optimal parameter estimation methods for stealth aircraft RCS models. Project supported by the National Natural Science Foundation of China (Grant No. 61101173), the National Basic Research Program of China (Grant No. 613206), the National High Technology Research and Development Program of China (Grant No. 2012AA01A308), the State Scholarship Fund by the China Scholarship Council (CSC), and the Oversea Academic Training Funds, and University of Electronic Science and Technology of China (UESTC).
Using SAS PROC MCMC for Item Response Theory Models
Samonte, Kelli
2014-01-01
Interest in using Bayesian methods for estimating item response theory models has grown at a remarkable rate in recent years. This attentiveness to Bayesian estimation has also inspired a growth in available software such as WinBUGS, R packages, BMIRT, MPLUS, and SAS PROC MCMC. This article intends to provide an accessible overview of Bayesian methods in the context of item response theory to serve as a useful guide for practitioners in estimating and interpreting item response theory (IRT) models. Included is a description of the estimation procedure used by SAS PROC MCMC. Syntax is provided for estimation of both dichotomous and polytomous IRT models, as well as a discussion on how to extend the syntax to accommodate more complex IRT models. PMID:29795834
Using SAS PROC MCMC for Item Response Theory Models
ERIC Educational Resources Information Center
Ames, Allison J.; Samonte, Kelli
2015-01-01
Interest in using Bayesian methods for estimating item response theory models has grown at a remarkable rate in recent years. This attentiveness to Bayesian estimation has also inspired a growth in available software such as WinBUGS, R packages, BMIRT, MPLUS, and SAS PROC MCMC. This article intends to provide an accessible overview of Bayesian…
A computer program for uncertainty analysis integrating regression and Bayesian methods
Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary
2014-01-01
This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.
Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T
2016-12-20
Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Model-based Bayesian inference for ROC data analysis
NASA Astrophysics Data System (ADS)
Lei, Tianhu; Bae, K. Ty
2013-03-01
This paper presents a study of model-based Bayesian inference to Receiver Operating Characteristics (ROC) data. The model is a simple version of general non-linear regression model. Different from Dorfman model, it uses a probit link function with a covariate variable having zero-one two values to express binormal distributions in a single formula. Model also includes a scale parameter. Bayesian inference is implemented by Markov Chain Monte Carlo (MCMC) method carried out by Bayesian analysis Using Gibbs Sampling (BUGS). Contrast to the classical statistical theory, Bayesian approach considers model parameters as random variables characterized by prior distributions. With substantial amount of simulated samples generated by sampling algorithm, posterior distributions of parameters as well as parameters themselves can be accurately estimated. MCMC-based BUGS adopts Adaptive Rejection Sampling (ARS) protocol which requires the probability density function (pdf) which samples are drawing from be log concave with respect to the targeted parameters. Our study corrects a common misconception and proves that pdf of this regression model is log concave with respect to its scale parameter. Therefore, ARS's requirement is satisfied and a Gaussian prior which is conjugate and possesses many analytic and computational advantages is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for MCMC method are assessed by CODA package. Models and methods by using continuous Gaussian prior and discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using MCMC method.
Of bugs and birds: Markov Chain Monte Carlo for hierarchical modeling in wildlife research
Link, W.A.; Cam, E.; Nichols, J.D.; Cooch, E.G.
2002-01-01
Markov chain Monte Carlo (MCMC) is a statistical innovation that allows researchers to fit far more complex models to data than is feasible using conventional methods. Despite its widespread use in a variety of scientific fields, MCMC appears to be underutilized in wildlife applications. This may be due to a misconception that MCMC requires the adoption of a subjective Bayesian analysis, or perhaps simply to its lack of familiarity among wildlife researchers. We introduce the basic ideas of MCMC and software BUGS (Bayesian inference using Gibbs sampling), stressing that a simple and satisfactory intuition for MCMC does not require extraordinary mathematical sophistication. We illustrate the use of MCMC with an analysis of the association between latent factors governing individual heterogeneity in breeding and survival rates of kittiwakes (Rissa tridactyla). We conclude with a discussion of the importance of individual heterogeneity for understanding population dynamics and designing management plans.
Modeling and Bayesian parameter estimation for shape memory alloy bending actuators
NASA Astrophysics Data System (ADS)
Crews, John H.; Smith, Ralph C.
2012-04-01
In this paper, we employ a homogenized energy model (HEM) for shape memory alloy (SMA) bending actuators. Additionally, we utilize a Bayesian method for quantifying parameter uncertainty. The system consists of a SMA wire attached to a flexible beam. As the actuator is heated, the beam bends, providing endoscopic motion. The model parameters are fit to experimental data using an ordinary least-squares approach. The uncertainty in the fit model parameters is then quantified using Markov Chain Monte Carlo (MCMC) methods. The MCMC algorithm provides bounds on the parameters, which will ultimately be used in robust control algorithms. One purpose of the paper is to test the feasibility of the Random Walk Metropolis algorithm, the MCMC method used here.
NASA Astrophysics Data System (ADS)
Arnst, M.; Abello Álvarez, B.; Ponthot, J.-P.; Boman, R.
2017-11-01
This paper is concerned with the characterization and the propagation of errors associated with data limitations in polynomial-chaos-based stochastic methods for uncertainty quantification. Such an issue can arise in uncertainty quantification when only a limited amount of data is available. When the available information does not suffice to accurately determine the probability distributions that must be assigned to the uncertain variables, the Bayesian method for assigning these probability distributions becomes attractive because it allows the stochastic model to account explicitly for insufficiency of the available information. In previous work, such applications of the Bayesian method had already been implemented by using the Metropolis-Hastings and Gibbs Markov Chain Monte Carlo (MCMC) methods. In this paper, we present an alternative implementation, which uses an alternative MCMC method built around an Itô stochastic differential equation (SDE) that is ergodic for the Bayesian posterior. We draw together from the mathematics literature a number of formal properties of this Itô SDE that lend support to its use in the implementation of the Bayesian method, and we describe its discretization, including the choice of the free parameters, by using the implicit Euler method. We demonstrate the proposed methodology on a problem of uncertainty quantification in a complex nonlinear engineering application relevant to metal forming.
Markov Chain Monte Carlo: an introduction for epidemiologists
Hamra, Ghassan; MacLehose, Richard; Richardson, David
2013-01-01
Markov Chain Monte Carlo (MCMC) methods are increasingly popular among epidemiologists. The reason for this may in part be that MCMC offers an appealing approach to handling some difficult types of analyses. Additionally, MCMC methods are those most commonly used for Bayesian analysis. However, epidemiologists are still largely unfamiliar with MCMC. They may lack familiarity either with he implementation of MCMC or with interpretation of the resultant output. As with tutorials outlining the calculus behind maximum likelihood in previous decades, a simple description of the machinery of MCMC is needed. We provide an introduction to conducting analyses with MCMC, and show that, given the same data and under certain model specifications, the results of an MCMC simulation match those of methods based on standard maximum-likelihood estimation (MLE). In addition, we highlight examples of instances in which MCMC approaches to data analysis provide a clear advantage over MLE. We hope that this brief tutorial will encourage epidemiologists to consider MCMC approaches as part of their analytic tool-kit. PMID:23569196
Estimating the Effective Sample Size of Tree Topologies from Bayesian Phylogenetic Analyses
Lanfear, Robert; Hua, Xia; Warren, Dan L.
2016-01-01
Bayesian phylogenetic analyses estimate posterior distributions of phylogenetic tree topologies and other parameters using Markov chain Monte Carlo (MCMC) methods. Before making inferences from these distributions, it is important to assess their adequacy. To this end, the effective sample size (ESS) estimates how many truly independent samples of a given parameter the output of the MCMC represents. The ESS of a parameter is frequently much lower than the number of samples taken from the MCMC because sequential samples from the chain can be non-independent due to autocorrelation. Typically, phylogeneticists use a rule of thumb that the ESS of all parameters should be greater than 200. However, we have no method to calculate an ESS of tree topology samples, despite the fact that the tree topology is often the parameter of primary interest and is almost always central to the estimation of other parameters. That is, we lack a method to determine whether we have adequately sampled one of the most important parameters in our analyses. In this study, we address this problem by developing methods to estimate the ESS for tree topologies. We combine these methods with two new diagnostic plots for assessing posterior samples of tree topologies, and compare their performance on simulated and empirical data sets. Combined, the methods we present provide new ways to assess the mixing and convergence of phylogenetic tree topologies in Bayesian MCMC analyses. PMID:27435794
NASA Astrophysics Data System (ADS)
Hu, Zixi; Yao, Zhewei; Li, Jinglai
2017-03-01
Many scientific and engineering problems require to perform Bayesian inference for unknowns of infinite dimension. In such problems, many standard Markov Chain Monte Carlo (MCMC) algorithms become arbitrary slow under the mesh refinement, which is referred to as being dimension dependent. To this end, a family of dimensional independent MCMC algorithms, known as the preconditioned Crank-Nicolson (pCN) methods, were proposed to sample the infinite dimensional parameters. In this work we develop an adaptive version of the pCN algorithm, where the covariance operator of the proposal distribution is adjusted based on sampling history to improve the simulation efficiency. We show that the proposed algorithm satisfies an important ergodicity condition under some mild assumptions. Finally we provide numerical examples to demonstrate the performance of the proposed method.
Approximate Bayesian computation for spatial SEIR(S) epidemic models.
Brown, Grant D; Porter, Aaron T; Oleson, Jacob J; Hinman, Jessica A
2018-02-01
Approximate Bayesia n Computation (ABC) provides an attractive approach to estimation in complex Bayesian inferential problems for which evaluation of the kernel of the posterior distribution is impossible or computationally expensive. These highly parallelizable techniques have been successfully applied to many fields, particularly in cases where more traditional approaches such as Markov chain Monte Carlo (MCMC) are impractical. In this work, we demonstrate the application of approximate Bayesian inference to spatially heterogeneous Susceptible-Exposed-Infectious-Removed (SEIR) stochastic epidemic models. These models have a tractable posterior distribution, however MCMC techniques nevertheless become computationally infeasible for moderately sized problems. We discuss the practical implementation of these techniques via the open source ABSEIR package for R. The performance of ABC relative to traditional MCMC methods in a small problem is explored under simulation, as well as in the spatially heterogeneous context of the 2014 epidemic of Chikungunya in the Americas. Copyright © 2017 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jiangjiang; Li, Weixuan; Zeng, Lingzao
Surrogate models are commonly used in Bayesian approaches such as Markov Chain Monte Carlo (MCMC) to avoid repetitive CPU-demanding model evaluations. However, the approximation error of a surrogate may lead to biased estimations of the posterior distribution. This bias can be corrected by constructing a very accurate surrogate or implementing MCMC in a two-stage manner. Since the two-stage MCMC requires extra original model evaluations, the computational cost is still high. If the information of measurement is incorporated, a locally accurate approximation of the original model can be adaptively constructed with low computational cost. Based on this idea, we propose amore » Gaussian process (GP) surrogate-based Bayesian experimental design and parameter estimation approach for groundwater contaminant source identification problems. A major advantage of the GP surrogate is that it provides a convenient estimation of the approximation error, which can be incorporated in the Bayesian formula to avoid over-confident estimation of the posterior distribution. The proposed approach is tested with a numerical case study. Without sacrificing the estimation accuracy, the new approach achieves about 200 times of speed-up compared to our previous work using two-stage MCMC.« less
Profile-Based LC-MS Data Alignment—A Bayesian Approach
Tsai, Tsung-Heng; Tadesse, Mahlet G.; Wang, Yue; Ressom, Habtom W.
2014-01-01
A Bayesian alignment model (BAM) is proposed for alignment of liquid chromatography-mass spectrometry (LC-MS) data. BAM belongs to the category of profile-based approaches, which are composed of two major components: a prototype function and a set of mapping functions. Appropriate estimation of these functions is crucial for good alignment results. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler and 2) an adaptive selection of knots. A block Metropolis-Hastings algorithm that mitigates the problem of the MCMC sampler getting stuck at local modes of the posterior distribution is used for the update of the mapping function coefficients. In addition, a stochastic search variable selection (SSVS) methodology is used to determine the number and positions of knots. We applied BAM to a simulated data set, an LC-MS proteomic data set, and two LC-MS metabolomic data sets, and compared its performance with the Bayesian hierarchical curve registration (BHCR) model, the dynamic time-warping (DTW) model, and the continuous profile model (CPM). The advantage of applying appropriate profile-based retention time correction prior to performing a feature-based approach is also demonstrated through the metabolomic data sets. PMID:23929872
ERIC Educational Resources Information Center
Kim, Jee-Seon; Bolt, Daniel M.
2007-01-01
The purpose of this ITEMS module is to provide an introduction to Markov chain Monte Carlo (MCMC) estimation for item response models. A brief description of Bayesian inference is followed by an overview of the various facets of MCMC algorithms, including discussion of prior specification, sampling procedures, and methods for evaluating chain…
Gradient-free MCMC methods for dynamic causal modelling
Sengupta, Biswa; Friston, Karl J.; Penny, Will D.
2015-03-14
Here, we compare the performance of four gradient-free MCMC samplers (random walk Metropolis sampling, slice-sampling, adaptive MCMC sampling and population-based MCMC sampling with tempering) in terms of the number of independent samples they can produce per unit computational time. For the Bayesian inversion of a single-node neural mass model, both adaptive and population-based samplers are more efficient compared with random walk Metropolis sampler or slice-sampling; yet adaptive MCMC sampling is more promising in terms of compute time. Slice-sampling yields the highest number of independent samples from the target density -- albeit at almost 1000% increase in computational time, in comparisonmore » to the most efficient algorithm (i.e., the adaptive MCMC sampler).« less
Population forecasts for Bangladesh, using a Bayesian methodology.
Mahsin, Md; Hossain, Syed Shahadat
2012-12-01
Population projection for many developing countries could be quite a challenging task for the demographers mostly due to lack of availability of enough reliable data. The objective of this paper is to present an overview of the existing methods for population forecasting and to propose an alternative based on the Bayesian statistics, combining the formality of inference. The analysis has been made using Markov Chain Monte Carlo (MCMC) technique for Bayesian methodology available with the software WinBUGS. Convergence diagnostic techniques available with the WinBUGS software have been applied to ensure the convergence of the chains necessary for the implementation of MCMC. The Bayesian approach allows for the use of observed data and expert judgements by means of appropriate priors, and a more realistic population forecasts, along with associated uncertainty, has been possible.
Bayesian Analysis of Evolutionary Divergence with Genomic Data under Diverse Demographic Models.
Chung, Yujin; Hey, Jody
2017-06-01
We present a new Bayesian method for estimating demographic and phylogenetic history using population genomic data. Several key innovations are introduced that allow the study of diverse models within an Isolation-with-Migration framework. The new method implements a 2-step analysis, with an initial Markov chain Monte Carlo (MCMC) phase that samples simple coalescent trees, followed by the calculation of the joint posterior density for the parameters of a demographic model. In step 1, the MCMC sampling phase, the method uses a reduced state space, consisting of coalescent trees without migration paths, and a simple importance sampling distribution without the demography of interest. Once obtained, a single sample of trees can be used in step 2 to calculate the joint posterior density for model parameters under multiple diverse demographic models, without having to repeat MCMC runs. Because migration paths are not included in the state space of the MCMC phase, but rather are handled by analytic integration in step 2 of the analysis, the method is scalable to a large number of loci with excellent MCMC mixing properties. With an implementation of the new method in the computer program MIST, we demonstrate the method's accuracy, scalability, and other advantages using simulated data and DNA sequences of two common chimpanzee subspecies: Pan troglodytes (P. t.) troglodytes and P. t. verus. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Fitting Residual Error Structures for Growth Models in SAS PROC MCMC
ERIC Educational Resources Information Center
McNeish, Daniel
2017-01-01
In behavioral sciences broadly, estimating growth models with Bayesian methods is becoming increasingly common, especially to combat small samples common with longitudinal data. Although Mplus is becoming an increasingly common program for applied research employing Bayesian methods, the limited selection of prior distributions for the elements of…
ERIC Educational Resources Information Center
Kieftenbeld, Vincent; Natesan, Prathiba
2012-01-01
Markov chain Monte Carlo (MCMC) methods enable a fully Bayesian approach to parameter estimation of item response models. In this simulation study, the authors compared the recovery of graded response model parameters using marginal maximum likelihood (MML) and Gibbs sampling (MCMC) under various latent trait distributions, test lengths, and…
Gradient-free MCMC methods for dynamic causal modelling.
Sengupta, Biswa; Friston, Karl J; Penny, Will D
2015-05-15
In this technical note we compare the performance of four gradient-free MCMC samplers (random walk Metropolis sampling, slice-sampling, adaptive MCMC sampling and population-based MCMC sampling with tempering) in terms of the number of independent samples they can produce per unit computational time. For the Bayesian inversion of a single-node neural mass model, both adaptive and population-based samplers are more efficient compared with random walk Metropolis sampler or slice-sampling; yet adaptive MCMC sampling is more promising in terms of compute time. Slice-sampling yields the highest number of independent samples from the target density - albeit at almost 1000% increase in computational time, in comparison to the most efficient algorithm (i.e., the adaptive MCMC sampler). Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Modelling maximum river flow by using Bayesian Markov Chain Monte Carlo
NASA Astrophysics Data System (ADS)
Cheong, R. Y.; Gabda, D.
2017-09-01
Analysis of flood trends is vital since flooding threatens human living in terms of financial, environment and security. The data of annual maximum river flows in Sabah were fitted into generalized extreme value (GEV) distribution. Maximum likelihood estimator (MLE) raised naturally when working with GEV distribution. However, previous researches showed that MLE provide unstable results especially in small sample size. In this study, we used different Bayesian Markov Chain Monte Carlo (MCMC) based on Metropolis-Hastings algorithm to estimate GEV parameters. Bayesian MCMC method is a statistical inference which studies the parameter estimation by using posterior distribution based on Bayes’ theorem. Metropolis-Hastings algorithm is used to overcome the high dimensional state space faced in Monte Carlo method. This approach also considers more uncertainty in parameter estimation which then presents a better prediction on maximum river flow in Sabah.
Bayesian Inference on Malignant Breast Cancer in Nigeria: A Diagnosis of MCMC Convergence
Ogunsakin, Ropo Ebenezer; Siaka, Lougue
2017-01-01
Background: There has been no previous study to classify malignant breast tumor in details based on Markov Chain Monte Carlo (MCMC) convergence in Western, Nigeria. This study therefore aims to profile patients living with benign and malignant breast tumor in two different hospitals among women of Western Nigeria, with a focus on prognostic factors and MCMC convergence. Materials and Methods: A hospital-based record was used to identify prognostic factors for malignant breast cancer among women of Western Nigeria. This paper describes Bayesian inference and demonstrates its usage to estimation of parameters of the logistic regression via Markov Chain Monte Carlo (MCMC) algorithm. The result of the Bayesian approach is compared with the classical statistics. Results: The mean age of the respondents was 42.2 ±16.6 years with 52% of the women aged between 35-49 years. The results of both techniques suggest that age and women with at least high school education have a significantly higher risk of being diagnosed with malignant breast tumors than benign breast tumors. The results also indicate a reduction of standard errors is associated with the coefficients obtained from the Bayesian approach. In addition, simulation result reveal that women with at least high school are 1.3 times more at risk of having malignant breast lesion in western Nigeria compared to benign breast lesion. Conclusion: We concluded that more efforts are required towards creating awareness and advocacy campaigns on how the prevalence of malignant breast lesions can be reduced, especially among women. The application of Bayesian produces precise estimates for modeling malignant breast cancer. PMID:29072396
Bayesian analysis of the flutter margin method in aeroelasticity
Khalil, Mohammad; Poirel, Dominique; Sarkar, Abhijit
2016-08-27
A Bayesian statistical framework is presented for Zimmerman and Weissenburger flutter margin method which considers the uncertainties in aeroelastic modal parameters. The proposed methodology overcomes the limitations of the previously developed least-square based estimation technique which relies on the Gaussian approximation of the flutter margin probability density function (pdf). Using the measured free-decay responses at subcritical (preflutter) airspeeds, the joint non-Gaussain posterior pdf of the modal parameters is sampled using the Metropolis–Hastings (MH) Markov chain Monte Carlo (MCMC) algorithm. The posterior MCMC samples of the modal parameters are then used to obtain the flutter margin pdfs and finally the fluttermore » speed pdf. The usefulness of the Bayesian flutter margin method is demonstrated using synthetic data generated from a two-degree-of-freedom pitch-plunge aeroelastic model. The robustness of the statistical framework is demonstrated using different sets of measurement data. In conclusion, it will be shown that the probabilistic (Bayesian) approach reduces the number of test points required in providing a flutter speed estimate for a given accuracy and precision.« less
Wang, Tingting; Chen, Yi-Ping Phoebe; Bowman, Phil J; Goddard, Michael E; Hayes, Ben J
2016-09-21
Bayesian mixture models in which the effects of SNP are assumed to come from normal distributions with different variances are attractive for simultaneous genomic prediction and QTL mapping. These models are usually implemented with Monte Carlo Markov Chain (MCMC) sampling, which requires long compute times with large genomic data sets. Here, we present an efficient approach (termed HyB_BR), which is a hybrid of an Expectation-Maximisation algorithm, followed by a limited number of MCMC without the requirement for burn-in. To test prediction accuracy from HyB_BR, dairy cattle and human disease trait data were used. In the dairy cattle data, there were four quantitative traits (milk volume, protein kg, fat% in milk and fertility) measured in 16,214 cattle from two breeds genotyped for 632,002 SNPs. Validation of genomic predictions was in a subset of cattle either from the reference set or in animals from a third breeds that were not in the reference set. In all cases, HyB_BR gave almost identical accuracies to Bayesian mixture models implemented with full MCMC, however computational time was reduced by up to 1/17 of that required by full MCMC. The SNPs with high posterior probability of a non-zero effect were also very similar between full MCMC and HyB_BR, with several known genes affecting milk production in this category, as well as some novel genes. HyB_BR was also applied to seven human diseases with 4890 individuals genotyped for around 300 K SNPs in a case/control design, from the Welcome Trust Case Control Consortium (WTCCC). In this data set, the results demonstrated again that HyB_BR performed as well as Bayesian mixture models with full MCMC for genomic predictions and genetic architecture inference while reducing the computational time from 45 h with full MCMC to 3 h with HyB_BR. The results for quantitative traits in cattle and disease in humans demonstrate that HyB_BR can perform equally well as Bayesian mixture models implemented with full MCMC in terms of prediction accuracy, but with up to 17 times faster than the full MCMC implementations. The HyB_BR algorithm makes simultaneous genomic prediction, QTL mapping and inference of genetic architecture feasible in large genomic data sets.
NASA Astrophysics Data System (ADS)
Sheng, Zheng
2013-02-01
The estimation of lower atmospheric refractivity from radar sea clutter (RFC) is a complicated nonlinear optimization problem. This paper deals with the RFC problem in a Bayesian framework. It uses the unbiased Markov Chain Monte Carlo (MCMC) sampling technique, which can provide accurate posterior probability distributions of the estimated refractivity parameters by using an electromagnetic split-step fast Fourier transform terrain parabolic equation propagation model within a Bayesian inversion framework. In contrast to the global optimization algorithm, the Bayesian—MCMC can obtain not only the approximate solutions, but also the probability distributions of the solutions, that is, uncertainty analyses of solutions. The Bayesian—MCMC algorithm is implemented on the simulation radar sea-clutter data and the real radar sea-clutter data. Reference data are assumed to be simulation data and refractivity profiles are obtained using a helicopter. The inversion algorithm is assessed (i) by comparing the estimated refractivity profiles from the assumed simulation and the helicopter sounding data; (ii) the one-dimensional (1D) and two-dimensional (2D) posterior probability distribution of solutions.
NASA Astrophysics Data System (ADS)
Chen, Mingjie; Izady, Azizallah; Abdalla, Osman A.; Amerjeed, Mansoor
2018-02-01
Bayesian inference using Markov Chain Monte Carlo (MCMC) provides an explicit framework for stochastic calibration of hydrogeologic models accounting for uncertainties; however, the MCMC sampling entails a large number of model calls, and could easily become computationally unwieldy if the high-fidelity hydrogeologic model simulation is time consuming. This study proposes a surrogate-based Bayesian framework to address this notorious issue, and illustrates the methodology by inverse modeling a regional MODFLOW model. The high-fidelity groundwater model is approximated by a fast statistical model using Bagging Multivariate Adaptive Regression Spline (BMARS) algorithm, and hence the MCMC sampling can be efficiently performed. In this study, the MODFLOW model is developed to simulate the groundwater flow in an arid region of Oman consisting of mountain-coast aquifers, and used to run representative simulations to generate training dataset for BMARS model construction. A BMARS-based Sobol' method is also employed to efficiently calculate input parameter sensitivities, which are used to evaluate and rank their importance for the groundwater flow model system. According to sensitivity analysis, insensitive parameters are screened out of Bayesian inversion of the MODFLOW model, further saving computing efforts. The posterior probability distribution of input parameters is efficiently inferred from the prescribed prior distribution using observed head data, demonstrating that the presented BMARS-based Bayesian framework is an efficient tool to reduce parameter uncertainties of a groundwater system.
Using Bayesian neural networks to classify forest scenes
NASA Astrophysics Data System (ADS)
Vehtari, Aki; Heikkonen, Jukka; Lampinen, Jouko; Juujarvi, Jouni
1998-10-01
We present results that compare the performance of Bayesian learning methods for neural networks on the task of classifying forest scenes into trees and background. Classification task is demanding due to the texture richness of the trees, occlusions of the forest scene objects and diverse lighting conditions under operation. This makes it difficult to determine which are optimal image features for the classification. A natural way to proceed is to extract many different types of potentially suitable features, and to evaluate their usefulness in later processing stages. One approach to cope with large number of features is to use Bayesian methods to control the model complexity. Bayesian learning uses a prior on model parameters, combines this with evidence from a training data, and the integrates over the resulting posterior to make predictions. With this method, we can use large networks and many features without fear of overfitting. For this classification task we compare two Bayesian learning methods for multi-layer perceptron (MLP) neural networks: (1) The evidence framework of MacKay uses a Gaussian approximation to the posterior weight distribution and maximizes with respect to hyperparameters. (2) In a Markov Chain Monte Carlo (MCMC) method due to Neal, the posterior distribution of the network parameters is numerically integrated using the MCMC method. As baseline classifiers for comparison we use (3) MLP early stop committee, (4) K-nearest-neighbor and (5) Classification And Regression Tree.
Bayesian inference based on dual generalized order statistics from the exponentiated Weibull model
NASA Astrophysics Data System (ADS)
Al Sobhi, Mashail M.
2015-02-01
Bayesian estimation for the two parameters and the reliability function of the exponentiated Weibull model are obtained based on dual generalized order statistics (DGOS). Also, Bayesian prediction bounds for future DGOS from exponentiated Weibull model are obtained. The symmetric and asymmetric loss functions are considered for Bayesian computations. The Markov chain Monte Carlo (MCMC) methods are used for computing the Bayes estimates and prediction bounds. The results have been specialized to the lower record values. Comparisons are made between Bayesian and maximum likelihood estimators via Monte Carlo simulation.
Incorporating approximation error in surrogate based Bayesian inversion
NASA Astrophysics Data System (ADS)
Zhang, J.; Zeng, L.; Li, W.; Wu, L.
2015-12-01
There are increasing interests in applying surrogates for inverse Bayesian modeling to reduce repetitive evaluations of original model. In this way, the computational cost is expected to be saved. However, the approximation error of surrogate model is usually overlooked. This is partly because that it is difficult to evaluate the approximation error for many surrogates. Previous studies have shown that, the direct combination of surrogates and Bayesian methods (e.g., Markov Chain Monte Carlo, MCMC) may lead to biased estimations when the surrogate cannot emulate the highly nonlinear original system. This problem can be alleviated by implementing MCMC in a two-stage manner. However, the computational cost is still high since a relatively large number of original model simulations are required. In this study, we illustrate the importance of incorporating approximation error in inverse Bayesian modeling. Gaussian process (GP) is chosen to construct the surrogate for its convenience in approximation error evaluation. Numerical cases of Bayesian experimental design and parameter estimation for contaminant source identification are used to illustrate this idea. It is shown that, once the surrogate approximation error is well incorporated into Bayesian framework, promising results can be obtained even when the surrogate is directly used, and no further original model simulations are required.
Hydrologic Model Selection using Markov chain Monte Carlo methods
NASA Astrophysics Data System (ADS)
Marshall, L.; Sharma, A.; Nott, D.
2002-12-01
Estimation of parameter uncertainty (and in turn model uncertainty) allows assessment of the risk in likely applications of hydrological models. Bayesian statistical inference provides an ideal means of assessing parameter uncertainty whereby prior knowledge about the parameter is combined with information from the available data to produce a probability distribution (the posterior distribution) that describes uncertainty about the parameter and serves as a basis for selecting appropriate values for use in modelling applications. Widespread use of Bayesian techniques in hydrology has been hindered by difficulties in summarizing and exploring the posterior distribution. These difficulties have been largely overcome by recent advances in Markov chain Monte Carlo (MCMC) methods that involve random sampling of the posterior distribution. This study presents an adaptive MCMC sampling algorithm which has characteristics that are well suited to model parameters with a high degree of correlation and interdependence, as is often evident in hydrological models. The MCMC sampling technique is used to compare six alternative configurations of a commonly used conceptual rainfall-runoff model, the Australian Water Balance Model (AWBM), using 11 years of daily rainfall runoff data from the Bass river catchment in Australia. The alternative configurations considered fall into two classes - those that consider model errors to be independent of prior values, and those that model the errors as an autoregressive process. Each such class consists of three formulations that represent increasing levels of complexity (and parameterisation) of the original model structure. The results from this study point both to the importance of using Bayesian approaches in evaluating model performance, as well as the simplicity of the MCMC sampling framework that has the ability to bring such approaches within the reach of the applied hydrological community.
NASA Astrophysics Data System (ADS)
Fer, I.; Kelly, R.; Andrews, T.; Dietze, M.; Richardson, A. D.
2016-12-01
Our ability to forecast ecosystems is limited by how well we parameterize ecosystem models. Direct measurements for all model parameters are not always possible and inverse estimation of these parameters through Bayesian methods is computationally costly. A solution to computational challenges of Bayesian calibration is to approximate the posterior probability surface using a Gaussian Process that emulates the complex process-based model. Here we report the integration of this method within an ecoinformatics toolbox, Predictive Ecosystem Analyzer (PEcAn), and its application with two ecosystem models: SIPNET and ED2.1. SIPNET is a simple model, allowing application of MCMC methods both to the model itself and to its emulator. We used both approaches to assimilate flux (CO2 and latent heat), soil respiration, and soil carbon data from Bartlett Experimental Forest. This comparison showed that emulator is reliable in terms of convergence to the posterior distribution. A 10000-iteration MCMC analysis with SIPNET itself required more than two orders of magnitude greater computation time than an MCMC run of same length with its emulator. This difference would be greater for a more computationally demanding model. Validation of the emulator-calibrated SIPNET against both the assimilated data and out-of-sample data showed improved fit and reduced uncertainty around model predictions. We next applied the validated emulator method to the ED2, whose complexity precludes standard Bayesian data assimilation. We used the ED2 emulator to assimilate demographic data from a network of inventory plots. For validation of the calibrated ED2, we compared the model to results from Empirical Succession Mapping (ESM), a novel synthesis of successional patterns in Forest Inventory and Analysis data. Our results revealed that while the pre-assimilation ED2 formulation cannot capture the emergent demographic patterns from ESM analysis, constrained model parameters controlling demographic processes increased their agreement considerably.
NASA Astrophysics Data System (ADS)
Zhang, Guannan; Lu, Dan; Ye, Ming; Gunzburger, Max; Webster, Clayton
2013-10-01
Bayesian analysis has become vital to uncertainty quantification in groundwater modeling, but its application has been hindered by the computational cost associated with numerous model executions required by exploring the posterior probability density function (PPDF) of model parameters. This is particularly the case when the PPDF is estimated using Markov Chain Monte Carlo (MCMC) sampling. In this study, a new approach is developed to improve the computational efficiency of Bayesian inference by constructing a surrogate of the PPDF, using an adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method. Unlike previous works using first-order hierarchical basis, this paper utilizes a compactly supported higher-order hierarchical basis to construct the surrogate system, resulting in a significant reduction in the number of required model executions. In addition, using the hierarchical surplus as an error indicator allows locally adaptive refinement of sparse grids in the parameter space, which further improves computational efficiency. To efficiently build the surrogate system for the PPDF with multiple significant modes, optimization techniques are used to identify the modes, for which high-probability regions are defined and components of the aSG-hSC approximation are constructed. After the surrogate is determined, the PPDF can be evaluated by sampling the surrogate system directly without model execution, resulting in improved efficiency of the surrogate-based MCMC compared with conventional MCMC. The developed method is evaluated using two synthetic groundwater reactive transport models. The first example involves coupled linear reactions and demonstrates the accuracy of our high-order hierarchical basis approach in approximating high-dimensional posteriori distribution. The second example is highly nonlinear because of the reactions of uranium surface complexation, and demonstrates how the iterative aSG-hSC method is able to capture multimodal and non-Gaussian features of PPDF caused by model nonlinearity. Both experiments show that aSG-hSC is an effective and efficient tool for Bayesian inference.
Evaluating impacts using a BACI design, ratios, and a Bayesian approach with a focus on restoration.
Conner, Mary M; Saunders, W Carl; Bouwes, Nicolaas; Jordan, Chris
2015-10-01
Before-after-control-impact (BACI) designs are an effective method to evaluate natural and human-induced perturbations on ecological variables when treatment sites cannot be randomly chosen. While effect sizes of interest can be tested with frequentist methods, using Bayesian Markov chain Monte Carlo (MCMC) sampling methods, probabilities of effect sizes, such as a ≥20 % increase in density after restoration, can be directly estimated. Although BACI and Bayesian methods are used widely for assessing natural and human-induced impacts for field experiments, the application of hierarchal Bayesian modeling with MCMC sampling to BACI designs is less common. Here, we combine these approaches and extend the typical presentation of results with an easy to interpret ratio, which provides an answer to the main study question-"How much impact did a management action or natural perturbation have?" As an example of this approach, we evaluate the impact of a restoration project, which implemented beaver dam analogs, on survival and density of juvenile steelhead. Results indicated the probabilities of a ≥30 % increase were high for survival and density after the dams were installed, 0.88 and 0.99, respectively, while probabilities for a higher increase of ≥50 % were variable, 0.17 and 0.82, respectively. This approach demonstrates a useful extension of Bayesian methods that can easily be generalized to other study designs from simple (e.g., single factor ANOVA, paired t test) to more complicated block designs (e.g., crossover, split-plot). This approach is valuable for estimating the probabilities of restoration impacts or other management actions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Jinsong; Kemna, Andreas; Hubbard, Susan S.
2008-05-15
We develop a Bayesian model to invert spectral induced polarization (SIP) data for Cole-Cole parameters using Markov chain Monte Carlo (MCMC) sampling methods. We compare the performance of the MCMC based stochastic method with an iterative Gauss-Newton based deterministic method for Cole-Cole parameter estimation through inversion of synthetic and laboratory SIP data. The Gauss-Newton based method can provide an optimal solution for given objective functions under constraints, but the obtained optimal solution generally depends on the choice of initial values and the estimated uncertainty information is often inaccurate or insufficient. In contrast, the MCMC based inversion method provides extensive globalmore » information on unknown parameters, such as the marginal probability distribution functions, from which we can obtain better estimates and tighter uncertainty bounds of the parameters than with the deterministic method. Additionally, the results obtained with the MCMC method are independent of the choice of initial values. Because the MCMC based method does not explicitly offer single optimal solution for given objective functions, the deterministic and stochastic methods can complement each other. For example, the stochastic method can first be used to obtain the means of the unknown parameters by starting from an arbitrary set of initial values and the deterministic method can then be initiated using the means as starting values to obtain the optimal estimates of the Cole-Cole parameters.« less
Geometric MCMC for infinite-dimensional inverse problems
NASA Astrophysics Data System (ADS)
Beskos, Alexandros; Girolami, Mark; Lan, Shiwei; Farrell, Patrick E.; Stuart, Andrew M.
2017-04-01
Bayesian inverse problems often involve sampling posterior distributions on infinite-dimensional function spaces. Traditional Markov chain Monte Carlo (MCMC) algorithms are characterized by deteriorating mixing times upon mesh-refinement, when the finite-dimensional approximations become more accurate. Such methods are typically forced to reduce step-sizes as the discretization gets finer, and thus are expensive as a function of dimension. Recently, a new class of MCMC methods with mesh-independent convergence times has emerged. However, few of them take into account the geometry of the posterior informed by the data. At the same time, recently developed geometric MCMC algorithms have been found to be powerful in exploring complicated distributions that deviate significantly from elliptic Gaussian laws, but are in general computationally intractable for models defined in infinite dimensions. In this work, we combine geometric methods on a finite-dimensional subspace with mesh-independent infinite-dimensional approaches. Our objective is to speed up MCMC mixing times, without significantly increasing the computational cost per step (for instance, in comparison with the vanilla preconditioned Crank-Nicolson (pCN) method). This is achieved by using ideas from geometric MCMC to probe the complex structure of an intrinsic finite-dimensional subspace where most data information concentrates, while retaining robust mixing times as the dimension grows by using pCN-like methods in the complementary subspace. The resulting algorithms are demonstrated in the context of three challenging inverse problems arising in subsurface flow, heat conduction and incompressible flow control. The algorithms exhibit up to two orders of magnitude improvement in sampling efficiency when compared with the pCN method.
Bayesian linkage and segregation analysis: factoring the problem.
Matthysse, S
2000-01-01
Complex segregation analysis and linkage methods are mathematical techniques for the genetic dissection of complex diseases. They are used to delineate complex modes of familial transmission and to localize putative disease susceptibility loci to specific chromosomal locations. The computational problem of Bayesian linkage and segregation analysis is one of integration in high-dimensional spaces. In this paper, three available techniques for Bayesian linkage and segregation analysis are discussed: Markov Chain Monte Carlo (MCMC), importance sampling, and exact calculation. The contribution of each to the overall integration will be explicitly discussed.
Bayesian Orbit Computation Tools for Objects on Geocentric Orbits
NASA Astrophysics Data System (ADS)
Virtanen, J.; Granvik, M.; Muinonen, K.; Oszkiewicz, D.
2013-08-01
We consider the space-debris orbital inversion problem via the concept of Bayesian inference. The methodology has been put forward for the orbital analysis of solar system small bodies in early 1990's [7] and results in a full solution of the statistical inverse problem given in terms of a posteriori probability density function (PDF) for the orbital parameters. We demonstrate the applicability of our statistical orbital analysis software to Earth orbiting objects, both using well-established Monte Carlo (MC) techniques (for a review, see e.g. [13] as well as recently developed Markov-chain MC (MCMC) techniques (e.g., [9]). In particular, we exploit the novel virtual observation MCMC method [8], which is based on the characterization of the phase-space volume of orbital solutions before the actual MCMC sampling. Our statistical methods and the resulting PDFs immediately enable probabilistic impact predictions to be carried out. Furthermore, this can be readily done also for very sparse data sets and data sets of poor quality - providing that some a priori information on the observational uncertainty is available. For asteroids, impact probabilities with the Earth from the discovery night onwards have been provided, e.g., by [11] and [10], the latter study includes the sampling of the observational-error standard deviation as a random variable.
Probabilistic Damage Characterization Using the Computationally-Efficient Bayesian Approach
NASA Technical Reports Server (NTRS)
Warner, James E.; Hochhalter, Jacob D.
2016-01-01
This work presents a computationally-ecient approach for damage determination that quanti es uncertainty in the provided diagnosis. Given strain sensor data that are polluted with measurement errors, Bayesian inference is used to estimate the location, size, and orientation of damage. This approach uses Bayes' Theorem to combine any prior knowledge an analyst may have about the nature of the damage with information provided implicitly by the strain sensor data to form a posterior probability distribution over possible damage states. The unknown damage parameters are then estimated based on samples drawn numerically from this distribution using a Markov Chain Monte Carlo (MCMC) sampling algorithm. Several modi cations are made to the traditional Bayesian inference approach to provide signi cant computational speedup. First, an ecient surrogate model is constructed using sparse grid interpolation to replace a costly nite element model that must otherwise be evaluated for each sample drawn with MCMC. Next, the standard Bayesian posterior distribution is modi ed using a weighted likelihood formulation, which is shown to improve the convergence of the sampling process. Finally, a robust MCMC algorithm, Delayed Rejection Adaptive Metropolis (DRAM), is adopted to sample the probability distribution more eciently. Numerical examples demonstrate that the proposed framework e ectively provides damage estimates with uncertainty quanti cation and can yield orders of magnitude speedup over standard Bayesian approaches.
NASA Astrophysics Data System (ADS)
Shafii, M.; Tolson, B.; Matott, L. S.
2012-04-01
Hydrologic modeling has benefited from significant developments over the past two decades. This has resulted in building of higher levels of complexity into hydrologic models, which eventually makes the model evaluation process (parameter estimation via calibration and uncertainty analysis) more challenging. In order to avoid unreasonable parameter estimates, many researchers have suggested implementation of multi-criteria calibration schemes. Furthermore, for predictive hydrologic models to be useful, proper consideration of uncertainty is essential. Consequently, recent research has emphasized comprehensive model assessment procedures in which multi-criteria parameter estimation is combined with statistically-based uncertainty analysis routines such as Bayesian inference using Markov Chain Monte Carlo (MCMC) sampling. Such a procedure relies on the use of formal likelihood functions based on statistical assumptions, and moreover, the Bayesian inference structured on MCMC samplers requires a considerably large number of simulations. Due to these issues, especially in complex non-linear hydrological models, a variety of alternative informal approaches have been proposed for uncertainty analysis in the multi-criteria context. This study aims at exploring a number of such informal uncertainty analysis techniques in multi-criteria calibration of hydrological models. The informal methods addressed in this study are (i) Pareto optimality which quantifies the parameter uncertainty using the Pareto solutions, (ii) DDS-AU which uses the weighted sum of objective functions to derive the prediction limits, and (iii) GLUE which describes the total uncertainty through identification of behavioral solutions. The main objective is to compare such methods with MCMC-based Bayesian inference with respect to factors such as computational burden, and predictive capacity, which are evaluated based on multiple comparative measures. The measures for comparison are calculated both for calibration and evaluation periods. The uncertainty analysis methodologies are applied to a simple 5-parameter rainfall-runoff model, called HYMOD.
NASA Astrophysics Data System (ADS)
Wang, Hongrui; Wang, Cheng; Wang, Ying; Gao, Xiong; Yu, Chen
2017-06-01
This paper presents a Bayesian approach using Metropolis-Hastings Markov Chain Monte Carlo algorithm and applies this method for daily river flow rate forecast and uncertainty quantification for Zhujiachuan River using data collected from Qiaotoubao Gage Station and other 13 gage stations in Zhujiachuan watershed in China. The proposed method is also compared with the conventional maximum likelihood estimation (MLE) for parameter estimation and quantification of associated uncertainties. While the Bayesian method performs similarly in estimating the mean value of daily flow rate, it performs over the conventional MLE method on uncertainty quantification, providing relatively narrower reliable interval than the MLE confidence interval and thus more precise estimation by using the related information from regional gage stations. The Bayesian MCMC method might be more favorable in the uncertainty analysis and risk management.
Ma, Jianzhong; Amos, Christopher I; Warwick Daw, E
2007-09-01
Although extended pedigrees are often sampled through probands with extreme levels of a quantitative trait, Markov chain Monte Carlo (MCMC) methods for segregation and linkage analysis have not been able to perform ascertainment corrections. Further, the extent to which ascertainment of pedigrees leads to biases in the estimation of segregation and linkage parameters has not been previously studied for MCMC procedures. In this paper, we studied these issues with a Bayesian MCMC approach for joint segregation and linkage analysis, as implemented in the package Loki. We first simulated pedigrees ascertained through individuals with extreme values of a quantitative trait in spirit of the sequential sampling theory of Cannings and Thompson [Cannings and Thompson [1977] Clin. Genet. 12:208-212]. Using our simulated data, we detected no bias in estimates of the trait locus location. However, in addition to allele frequencies, when the ascertainment threshold was higher than or close to the true value of the highest genotypic mean, bias was also found in the estimation of this parameter. When there were multiple trait loci, this bias destroyed the additivity of the effects of the trait loci, and caused biases in the estimation all genotypic means when a purely additive model was used for analyzing the data. To account for pedigree ascertainment with sequential sampling, we developed a Bayesian ascertainment approach and implemented Metropolis-Hastings updates in the MCMC samplers used in Loki. Ascertainment correction greatly reduced biases in parameter estimates. Our method is designed for multiple, but a fixed number of trait loci. Copyright (c) 2007 Wiley-Liss, Inc.
Item Response Theory Equating Using Bayesian Informative Priors.
ERIC Educational Resources Information Center
de la Torre, Jimmy; Patz, Richard J.
This paper seeks to extend the application of Markov chain Monte Carlo (MCMC) methods in item response theory (IRT) to include the estimation of equating relationships along with the estimation of test item parameters. A method is proposed that incorporates estimation of the equating relationship in the item calibration phase. Item parameters from…
Comparing hierarchical models via the marginalized deviance information criterion.
Quintero, Adrian; Lesaffre, Emmanuel
2018-07-20
Hierarchical models are extensively used in pharmacokinetics and longitudinal studies. When the estimation is performed from a Bayesian approach, model comparison is often based on the deviance information criterion (DIC). In hierarchical models with latent variables, there are several versions of this statistic: the conditional DIC (cDIC) that incorporates the latent variables in the focus of the analysis and the marginalized DIC (mDIC) that integrates them out. Regardless of the asymptotic and coherency difficulties of cDIC, this alternative is usually used in Markov chain Monte Carlo (MCMC) methods for hierarchical models because of practical convenience. The mDIC criterion is more appropriate in most cases but requires integration of the likelihood, which is computationally demanding and not implemented in Bayesian software. Therefore, we consider a method to compute mDIC by generating replicate samples of the latent variables that need to be integrated out. This alternative can be easily conducted from the MCMC output of Bayesian packages and is widely applicable to hierarchical models in general. Additionally, we propose some approximations in order to reduce the computational complexity for large-sample situations. The method is illustrated with simulated data sets and 2 medical studies, evidencing that cDIC may be misleading whilst mDIC appears pertinent. Copyright © 2018 John Wiley & Sons, Ltd.
A Bayesian approach to modeling diffraction profiles and application to ferroelectric materials
Iamsasri, Thanakorn; Guerrier, Jonathon; Esteves, Giovanni; ...
2017-02-01
A new statistical approach for modeling diffraction profiles is introduced, using Bayesian inference and a Markov chain Monte Carlo (MCMC) algorithm. This method is demonstrated by modeling the degenerate reflections during application of an electric field to two different ferroelectric materials: thin-film lead zirconate titanate (PZT) of composition PbZr 0.3Ti 0.7O 3and a bulk commercial PZT polycrystalline ferroelectric. Here, the new method offers a unique uncertainty quantification of the model parameters that can be readily propagated into new calculated parameters.
NASA Astrophysics Data System (ADS)
Bakoban, Rana A.
2017-08-01
The coefficient of variation [CV] has several applications in applied statistics. So in this paper, we adopt Bayesian and non-Bayesian approaches for the estimation of CV under type-II censored data from extension exponential distribution [EED]. The point and interval estimate of the CV are obtained for each of the maximum likelihood and parametric bootstrap techniques. Also the Bayesian approach with the help of MCMC method is presented. A real data set is presented and analyzed, hence the obtained results are used to assess the obtained theoretical results.
Wang, Hongrui; Wang, Cheng; Wang, Ying; ...
2017-04-05
This paper presents a Bayesian approach using Metropolis-Hastings Markov Chain Monte Carlo algorithm and applies this method for daily river flow rate forecast and uncertainty quantification for Zhujiachuan River using data collected from Qiaotoubao Gage Station and other 13 gage stations in Zhujiachuan watershed in China. The proposed method is also compared with the conventional maximum likelihood estimation (MLE) for parameter estimation and quantification of associated uncertainties. While the Bayesian method performs similarly in estimating the mean value of daily flow rate, it performs over the conventional MLE method on uncertainty quantification, providing relatively narrower reliable interval than the MLEmore » confidence interval and thus more precise estimation by using the related information from regional gage stations. As a result, the Bayesian MCMC method might be more favorable in the uncertainty analysis and risk management.« less
Bayesian seismic tomography by parallel interacting Markov chains
NASA Astrophysics Data System (ADS)
Gesret, Alexandrine; Bottero, Alexis; Romary, Thomas; Noble, Mark; Desassis, Nicolas
2014-05-01
The velocity field estimated by first arrival traveltime tomography is commonly used as a starting point for further seismological, mineralogical, tectonic or similar analysis. In order to interpret quantitatively the results, the tomography uncertainty values as well as their spatial distribution are required. The estimated velocity model is obtained through inverse modeling by minimizing an objective function that compares observed and computed traveltimes. This step is often performed by gradient-based optimization algorithms. The major drawback of such local optimization schemes, beyond the possibility of being trapped in a local minimum, is that they do not account for the multiple possible solutions of the inverse problem. They are therefore unable to assess the uncertainties linked to the solution. Within a Bayesian (probabilistic) framework, solving the tomography inverse problem aims at estimating the posterior probability density function of velocity model using a global sampling algorithm. Markov chains Monte-Carlo (MCMC) methods are known to produce samples of virtually any distribution. In such a Bayesian inversion, the total number of simulations we can afford is highly related to the computational cost of the forward model. Although fast algorithms have been recently developed for computing first arrival traveltimes of seismic waves, the complete browsing of the posterior distribution of velocity model is hardly performed, especially when it is high dimensional and/or multimodal. In the latter case, the chain may even stay stuck in one of the modes. In order to improve the mixing properties of classical single MCMC, we propose to make interact several Markov chains at different temperatures. This method can make efficient use of large CPU clusters, without increasing the global computational cost with respect to classical MCMC and is therefore particularly suited for Bayesian inversion. The exchanges between the chains allow a precise sampling of the high probability zones of the model space while avoiding the chains to end stuck in a probability maximum. This approach supplies thus a robust way to analyze the tomography imaging uncertainties. The interacting MCMC approach is illustrated on two synthetic examples of tomography of calibration shots such as encountered in induced microseismic studies. On the second application, a wavelet based model parameterization is presented that allows to significantly reduce the dimension of the problem, making thus the algorithm efficient even for a complex velocity model.
Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times.
dos Reis, Mario; Yang, Ziheng
2011-07-01
The molecular clock provides a powerful way to estimate species divergence times. If information on some species divergence times is available from the fossil or geological record, it can be used to calibrate a phylogeny and estimate divergence times for all nodes in the tree. The Bayesian method provides a natural framework to incorporate different sources of information concerning divergence times, such as information in the fossil and molecular data. Current models of sequence evolution are intractable in a Bayesian setting, and Markov chain Monte Carlo (MCMC) is used to generate the posterior distribution of divergence times and evolutionary rates. This method is computationally expensive, as it involves the repeated calculation of the likelihood function. Here, we explore the use of Taylor expansion to approximate the likelihood during MCMC iteration. The approximation is much faster than conventional likelihood calculation. However, the approximation is expected to be poor when the proposed parameters are far from the likelihood peak. We explore the use of parameter transforms (square root, logarithm, and arcsine) to improve the approximation to the likelihood curve. We found that the new methods, particularly the arcsine-based transform, provided very good approximations under relaxed clock models and also under the global clock model when the global clock is not seriously violated. The approximation is poorer for analysis under the global clock when the global clock is seriously wrong and should thus not be used. The results suggest that the approximate method may be useful for Bayesian dating analysis using large data sets.
Dong, Linsong; Wang, Zhiyong
2018-06-11
Genomic prediction is feasible for estimating genomic breeding values because of dense genome-wide markers and credible statistical methods, such as Genomic Best Linear Unbiased Prediction (GBLUP) and various Bayesian methods. Compared with GBLUP, Bayesian methods propose more flexible assumptions for the distributions of SNP effects. However, most Bayesian methods are performed based on Markov chain Monte Carlo (MCMC) algorithms, leading to computational efficiency challenges. Hence, some fast Bayesian approaches, such as fast BayesB (fBayesB), were proposed to speed up the calculation. This study proposed another fast Bayesian method termed fast BayesC (fBayesC). The prior distribution of fBayesC assumes that a SNP with probability γ has a non-zero effect which comes from a normal density with a common variance. The simulated data from QTLMAS XII workshop and actual data on large yellow croaker were used to compare the predictive results of fBayesB, fBayesC and (MCMC-based) BayesC. The results showed that when γ was set as a small value, such as 0.01 in the simulated data or 0.001 in the actual data, fBayesB and fBayesC yielded lower prediction accuracies (abilities) than BayesC. In the actual data, fBayesC could yield very similar predictive abilities as BayesC when γ ≥ 0.01. When γ = 0.01, fBayesB could also yield similar results as fBayesC and BayesC. However, fBayesB could not yield an explicit result when γ ≥ 0.1, but a similar situation was not observed for fBayesC. Moreover, the computational speed of fBayesC was significantly faster than that of BayesC, making fBayesC a promising method for genomic prediction.
NASA Astrophysics Data System (ADS)
Zhang, G.; Lu, D.; Ye, M.; Gunzburger, M.
2011-12-01
Markov Chain Monte Carlo (MCMC) methods have been widely used in many fields of uncertainty analysis to estimate the posterior distributions of parameters and credible intervals of predictions in the Bayesian framework. However, in practice, MCMC may be computationally unaffordable due to slow convergence and the excessive number of forward model executions required, especially when the forward model is expensive to compute. Both disadvantages arise from the curse of dimensionality, i.e., the posterior distribution is usually a multivariate function of parameters. Recently, sparse grid method has been demonstrated to be an effective technique for coping with high-dimensional interpolation or integration problems. Thus, in order to accelerate the forward model and avoid the slow convergence of MCMC, we propose a new method for uncertainty analysis based on sparse grid interpolation and quasi-Monte Carlo sampling. First, we construct a polynomial approximation of the forward model in the parameter space by using the sparse grid interpolation. This approximation then defines an accurate surrogate posterior distribution that can be evaluated repeatedly at minimal computational cost. Second, instead of using MCMC, a quasi-Monte Carlo method is applied to draw samples in the parameter space. Then, the desired probability density function of each prediction is approximated by accumulating the posterior density values of all the samples according to the prediction values. Our method has the following advantages: (1) the polynomial approximation of the forward model on the sparse grid provides a very efficient evaluation of the surrogate posterior distribution; (2) the quasi-Monte Carlo method retains the same accuracy in approximating the PDF of predictions but avoids all disadvantages of MCMC. The proposed method is applied to a controlled numerical experiment of groundwater flow modeling. The results show that our method attains the same accuracy much more efficiently than traditional MCMC.
Bayesian Group Bridge for Bi-level Variable Selection.
Mallick, Himel; Yi, Nengjun
2017-06-01
A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior. Posterior computation proceeds via an efficient MCMC algorithm. In addition to the usual ease-of-interpretation of hierarchical linear models, the Bayesian formulation produces valid standard errors, a feature that is notably absent in the frequentist framework. Empirical evidence of the attractiveness of the method is illustrated by extensive Monte Carlo simulations and real data analysis. Finally, several extensions of this new approach are presented, providing a unified framework for bi-level variable selection in general models with flexible penalties.
ERIC Educational Resources Information Center
Martin-Fernandez, Manuel; Revuelta, Javier
2017-01-01
This study compares the performance of two estimation algorithms of new usage, the Metropolis-Hastings Robins-Monro (MHRM) and the Hamiltonian MCMC (HMC), with two consolidated algorithms in the psychometric literature, the marginal likelihood via EM algorithm (MML-EM) and the Markov chain Monte Carlo (MCMC), in the estimation of multidimensional…
NASA Astrophysics Data System (ADS)
Ruggeri, Paolo; Irving, James; Holliger, Klaus
2015-08-01
We critically examine the performance of sequential geostatistical resampling (SGR) as a model proposal mechanism for Bayesian Markov-chain-Monte-Carlo (MCMC) solutions to near-surface geophysical inverse problems. Focusing on a series of simple yet realistic synthetic crosshole georadar tomographic examples characterized by different numbers of data, levels of data error and degrees of model parameter spatial correlation, we investigate the efficiency of three different resampling strategies with regard to their ability to generate statistically independent realizations from the Bayesian posterior distribution. Quite importantly, our results show that, no matter what resampling strategy is employed, many of the examined test cases require an unreasonably high number of forward model runs to produce independent posterior samples, meaning that the SGR approach as currently implemented will not be computationally feasible for a wide range of problems. Although use of a novel gradual-deformation-based proposal method can help to alleviate these issues, it does not offer a full solution. Further, we find that the nature of the SGR is found to strongly influence MCMC performance; however no clear rule exists as to what set of inversion parameters and/or overall proposal acceptance rate will allow for the most efficient implementation. We conclude that although the SGR methodology is highly attractive as it allows for the consideration of complex geostatistical priors as well as conditioning to hard and soft data, further developments are necessary in the context of novel or hybrid MCMC approaches for it to be considered generally suitable for near-surface geophysical inversions.
Mathew, Boby; Holand, Anna Marie; Koistinen, Petri; Léon, Jens; Sillanpää, Mikko J
2016-02-01
A novel reparametrization-based INLA approach as a fast alternative to MCMC for the Bayesian estimation of genetic parameters in multivariate animal model is presented. Multi-trait genetic parameter estimation is a relevant topic in animal and plant breeding programs because multi-trait analysis can take into account the genetic correlation between different traits and that significantly improves the accuracy of the genetic parameter estimates. Generally, multi-trait analysis is computationally demanding and requires initial estimates of genetic and residual correlations among the traits, while those are difficult to obtain. In this study, we illustrate how to reparametrize covariance matrices of a multivariate animal model/animal models using modified Cholesky decompositions. This reparametrization-based approach is used in the Integrated Nested Laplace Approximation (INLA) methodology to estimate genetic parameters of multivariate animal model. Immediate benefits are: (1) to avoid difficulties of finding good starting values for analysis which can be a problem, for example in Restricted Maximum Likelihood (REML); (2) Bayesian estimation of (co)variance components using INLA is faster to execute than using Markov Chain Monte Carlo (MCMC) especially when realized relationship matrices are dense. The slight drawback is that priors for covariance matrices are assigned for elements of the Cholesky factor but not directly to the covariance matrix elements as in MCMC. Additionally, we illustrate the concordance of the INLA results with the traditional methods like MCMC and REML approaches. We also present results obtained from simulated data sets with replicates and field data in rice.
Quantifying MCMC exploration of phylogenetic tree space.
Whidden, Chris; Matsen, Frederick A
2015-05-01
In order to gain an understanding of the effectiveness of phylogenetic Markov chain Monte Carlo (MCMC), it is important to understand how quickly the empirical distribution of the MCMC converges to the posterior distribution. In this article, we investigate this problem on phylogenetic tree topologies with a metric that is especially well suited to the task: the subtree prune-and-regraft (SPR) metric. This metric directly corresponds to the minimum number of MCMC rearrangements required to move between trees in common phylogenetic MCMC implementations. We develop a novel graph-based approach to analyze tree posteriors and find that the SPR metric is much more informative than simpler metrics that are unrelated to MCMC moves. In doing so, we show conclusively that topological peaks do occur in Bayesian phylogenetic posteriors from real data sets as sampled with standard MCMC approaches, investigate the efficiency of Metropolis-coupled MCMC (MCMCMC) in traversing the valleys between peaks, and show that conditional clade distribution (CCD) can have systematic problems when there are multiple peaks. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks.
Zhou, Xiaobo; Wang, Xiaodong; Pal, Ranadip; Ivanov, Ivan; Bittner, Michael; Dougherty, Edward R
2004-11-22
We have hypothesized that the construction of transcriptional regulatory networks using a method that optimizes connectivity would lead to regulation consistent with biological expectations. A key expectation is that the hypothetical networks should produce a few, very strong attractors, highly similar to the original observations, mimicking biological state stability and determinism. Another central expectation is that, since it is expected that the biological control is distributed and mutually reinforcing, interpretation of the observations should lead to a very small number of connection schemes. We propose a fully Bayesian approach to constructing probabilistic gene regulatory networks (PGRNs) that emphasizes network topology. The method computes the possible parent sets of each gene, the corresponding predictors and the associated probabilities based on a nonlinear perceptron model, using a reversible jump Markov chain Monte Carlo (MCMC) technique, and an MCMC method is employed to search the network configurations to find those with the highest Bayesian scores to construct the PGRN. The Bayesian method has been used to construct a PGRN based on the observed behavior of a set of genes whose expression patterns vary across a set of melanoma samples exhibiting two very different phenotypes with respect to cell motility and invasiveness. Key biological features have been faithfully reflected in the model. Its steady-state distribution contains attractors that are either identical or very similar to the states observed in the data, and many of the attractors are singletons, which mimics the biological propensity to stably occupy a given state. Most interestingly, the connectivity rules for the most optimal generated networks constituting the PGRN are remarkably similar, as would be expected for a network operating on a distributed basis, with strong interactions between the components.
NASA Astrophysics Data System (ADS)
van Rossum, Anne C.; Lin, Hai Xiang; Dubbeldam, Johan; van der Herik, H. Jaap
2018-04-01
In machine vision typical heuristic methods to extract parameterized objects out of raw data points are the Hough transform and RANSAC. Bayesian models carry the promise to optimally extract such parameterized objects given a correct definition of the model and the type of noise at hand. A category of solvers for Bayesian models are Markov chain Monte Carlo methods. Naive implementations of MCMC methods suffer from slow convergence in machine vision due to the complexity of the parameter space. Towards this blocked Gibbs and split-merge samplers have been developed that assign multiple data points to clusters at once. In this paper we introduce a new split-merge sampler, the triadic split-merge sampler, that perform steps between two and three randomly chosen clusters. This has two advantages. First, it reduces the asymmetry between the split and merge steps. Second, it is able to propose a new cluster that is composed out of data points from two different clusters. Both advantages speed up convergence which we demonstrate on a line extraction problem. We show that the triadic split-merge sampler outperforms the conventional split-merge sampler. Although this new MCMC sampler is demonstrated in this machine vision context, its application extend to the very general domain of statistical inference.
NASA Astrophysics Data System (ADS)
Siripatana, Adil; Mayo, Talea; Sraj, Ihab; Knio, Omar; Dawson, Clint; Le Maitre, Olivier; Hoteit, Ibrahim
2017-08-01
Bayesian estimation/inversion is commonly used to quantify and reduce modeling uncertainties in coastal ocean model, especially in the framework of parameter estimation. Based on Bayes rule, the posterior probability distribution function (pdf) of the estimated quantities is obtained conditioned on available data. It can be computed either directly, using a Markov chain Monte Carlo (MCMC) approach, or by sequentially processing the data following a data assimilation approach, which is heavily exploited in large dimensional state estimation problems. The advantage of data assimilation schemes over MCMC-type methods arises from the ability to algorithmically accommodate a large number of uncertain quantities without significant increase in the computational requirements. However, only approximate estimates are generally obtained by this approach due to the restricted Gaussian prior and noise assumptions that are generally imposed in these methods. This contribution aims at evaluating the effectiveness of utilizing an ensemble Kalman-based data assimilation method for parameter estimation of a coastal ocean model against an MCMC polynomial chaos (PC)-based scheme. We focus on quantifying the uncertainties of a coastal ocean ADvanced CIRCulation (ADCIRC) model with respect to the Manning's n coefficients. Based on a realistic framework of observation system simulation experiments (OSSEs), we apply an ensemble Kalman filter and the MCMC method employing a surrogate of ADCIRC constructed by a non-intrusive PC expansion for evaluating the likelihood, and test both approaches under identical scenarios. We study the sensitivity of the estimated posteriors with respect to the parameters of the inference methods, including ensemble size, inflation factor, and PC order. A full analysis of both methods, in the context of coastal ocean model, suggests that an ensemble Kalman filter with appropriate ensemble size and well-tuned inflation provides reliable mean estimates and uncertainties of Manning's n coefficients compared to the full posterior distributions inferred by MCMC.
Geostatistical models are appropriate for spatially distributed data measured at irregularly spaced locations. We propose an efficient Markov chain Monte Carlo (MCMC) algorithm for fitting Bayesian geostatistical models with substantial numbers of unknown parameters to sizable...
Norris, Peter M; da Silva, Arlindo M
2016-07-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.
NASA Technical Reports Server (NTRS)
Norris, Peter M.; Da Silva, Arlindo M.
2016-01-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.
Norris, Peter M.; da Silva, Arlindo M.
2018-01-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC. PMID:29618847
Bayesian inference for OPC modeling
NASA Astrophysics Data System (ADS)
Burbine, Andrew; Sturtevant, John; Fryer, David; Smith, Bruce W.
2016-03-01
The use of optical proximity correction (OPC) demands increasingly accurate models of the photolithographic process. Model building and inference techniques in the data science community have seen great strides in the past two decades which make better use of available information. This paper aims to demonstrate the predictive power of Bayesian inference as a method for parameter selection in lithographic models by quantifying the uncertainty associated with model inputs and wafer data. Specifically, the method combines the model builder's prior information about each modelling assumption with the maximization of each observation's likelihood as a Student's t-distributed random variable. Through the use of a Markov chain Monte Carlo (MCMC) algorithm, a model's parameter space is explored to find the most credible parameter values. During parameter exploration, the parameters' posterior distributions are generated by applying Bayes' rule, using a likelihood function and the a priori knowledge supplied. The MCMC algorithm used, an affine invariant ensemble sampler (AIES), is implemented by initializing many walkers which semiindependently explore the space. The convergence of these walkers to global maxima of the likelihood volume determine the parameter values' highest density intervals (HDI) to reveal champion models. We show that this method of parameter selection provides insights into the data that traditional methods do not and outline continued experiments to vet the method.
Trans-dimensional MCMC methods for fully automatic motion analysis in tagged MRI.
Smal, Ihor; Carranza-Herrezuelo, Noemí; Klein, Stefan; Niessen, Wiro; Meijering, Erik
2011-01-01
Tagged magnetic resonance imaging (tMRI) is a well-known noninvasive method allowing quantitative analysis of regional heart dynamics. Its clinical use has so far been limited, in part due to the lack of robustness and accuracy of existing tag tracking algorithms in dealing with low (and intrinsically time-varying) image quality. In this paper, we propose a novel probabilistic method for tag tracking, implemented by means of Bayesian particle filtering and a trans-dimensional Markov chain Monte Carlo (MCMC) approach, which efficiently combines information about the imaging process and tag appearance with prior knowledge about the heart dynamics obtained by means of non-rigid image registration. Experiments using synthetic image data (with ground truth) and real data (with expert manual annotation) from preclinical (small animal) and clinical (human) studies confirm that the proposed method yields higher consistency, accuracy, and intrinsic tag reliability assessment in comparison with other frequently used tag tracking methods.
NASA Astrophysics Data System (ADS)
Lu, Dan; Ricciuto, Daniel; Walker, Anthony; Safta, Cosmin; Munger, William
2017-09-01
Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The calibration of DREAM results in a better model fit and predictive performance compared to the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identifies one mode. The application suggests that DREAM is very suitable to calibrate complex terrestrial ecosystem models, where the uncertain parameter size is usually large and existence of local optima is always a concern. In addition, this effort justifies the assumptions of the error model used in Bayesian calibration according to the residual analysis. The result indicates that a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the consequent constructed likelihood function can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.
Searching for efficient Markov chain Monte Carlo proposal kernels
Yang, Ziheng; Rodríguez, Carlos E.
2013-01-01
Markov chain Monte Carlo (MCMC) or the Metropolis–Hastings algorithm is a simulation algorithm that has made modern Bayesian statistical inference possible. Nevertheless, the efficiency of different Metropolis–Hastings proposal kernels has rarely been studied except for the Gaussian proposal. Here we propose a unique class of Bactrian kernels, which avoid proposing values that are very close to the current value, and compare their efficiency with a number of proposals for simulating different target distributions, with efficiency measured by the asymptotic variance of a parameter estimate. The uniform kernel is found to be more efficient than the Gaussian kernel, whereas the Bactrian kernel is even better. When optimal scales are used for both, the Bactrian kernel is at least 50% more efficient than the Gaussian. Implementation in a Bayesian program for molecular clock dating confirms the general applicability of our results to generic MCMC algorithms. Our results refute a previous claim that all proposals had nearly identical performance and will prompt further research into efficient MCMC proposals. PMID:24218600
Annealed Importance Sampling Reversible Jump MCMC algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karagiannis, Georgios; Andrieu, Christophe
2013-03-20
It will soon be 20 years since reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithms have been proposed. They have significantly extended the scope of Markov chain Monte Carlo simulation methods, offering the promise to be able to routinely tackle transdimensional sampling problems, as encountered in Bayesian model selection problems for example, in a principled and flexible fashion. Their practical efficient implementation, however, still remains a challenge. A particular difficulty encountered in practice is in the choice of the dimension matching variables (both their nature and their distribution) and the reversible transformations which allow one to define the one-to-one mappingsmore » underpinning the design of these algorithms. Indeed, even seemingly sensible choices can lead to algorithms with very poor performance. The focus of this paper is the development and performance evaluation of a method, annealed importance sampling RJ-MCMC (aisRJ), which addresses this problem by mitigating the sensitivity of RJ-MCMC algorithms to the aforementioned poor design. As we shall see the algorithm can be understood as being an “exact approximation” of an idealized MCMC algorithm that would sample from the model probabilities directly in a model selection set-up. Such an idealized algorithm may have good theoretical convergence properties, but typically cannot be implemented, and our algorithms can approximate the performance of such idealized algorithms to an arbitrary degree while not introducing any bias for any degree of approximation. Our approach combines the dimension matching ideas of RJ-MCMC with annealed importance sampling and its Markov chain Monte Carlo implementation. We illustrate the performance of the algorithm with numerical simulations which indicate that, although the approach may at first appear computationally involved, it is in fact competitive.« less
Bayesian inference of nonlinear unsteady aerodynamics from aeroelastic limit cycle oscillations
NASA Astrophysics Data System (ADS)
Sandhu, Rimple; Poirel, Dominique; Pettit, Chris; Khalil, Mohammad; Sarkar, Abhijit
2016-07-01
A Bayesian model selection and parameter estimation algorithm is applied to investigate the influence of nonlinear and unsteady aerodynamic loads on the limit cycle oscillation (LCO) of a pitching airfoil in the transitional Reynolds number regime. At small angles of attack, laminar boundary layer trailing edge separation causes negative aerodynamic damping leading to the LCO. The fluid-structure interaction of the rigid, but elastically mounted, airfoil and nonlinear unsteady aerodynamics is represented by two coupled nonlinear stochastic ordinary differential equations containing uncertain parameters and model approximation errors. Several plausible aerodynamic models with increasing complexity are proposed to describe the aeroelastic system leading to LCO. The likelihood in the posterior parameter probability density function (pdf) is available semi-analytically using the extended Kalman filter for the state estimation of the coupled nonlinear structural and unsteady aerodynamic model. The posterior parameter pdf is sampled using a parallel and adaptive Markov Chain Monte Carlo (MCMC) algorithm. The posterior probability of each model is estimated using the Chib-Jeliazkov method that directly uses the posterior MCMC samples for evidence (marginal likelihood) computation. The Bayesian algorithm is validated through a numerical study and then applied to model the nonlinear unsteady aerodynamic loads using wind-tunnel test data at various Reynolds numbers.
Bayesian inference of nonlinear unsteady aerodynamics from aeroelastic limit cycle oscillations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sandhu, Rimple; Poirel, Dominique; Pettit, Chris
2016-07-01
A Bayesian model selection and parameter estimation algorithm is applied to investigate the influence of nonlinear and unsteady aerodynamic loads on the limit cycle oscillation (LCO) of a pitching airfoil in the transitional Reynolds number regime. At small angles of attack, laminar boundary layer trailing edge separation causes negative aerodynamic damping leading to the LCO. The fluid–structure interaction of the rigid, but elastically mounted, airfoil and nonlinear unsteady aerodynamics is represented by two coupled nonlinear stochastic ordinary differential equations containing uncertain parameters and model approximation errors. Several plausible aerodynamic models with increasing complexity are proposed to describe the aeroelastic systemmore » leading to LCO. The likelihood in the posterior parameter probability density function (pdf) is available semi-analytically using the extended Kalman filter for the state estimation of the coupled nonlinear structural and unsteady aerodynamic model. The posterior parameter pdf is sampled using a parallel and adaptive Markov Chain Monte Carlo (MCMC) algorithm. The posterior probability of each model is estimated using the Chib–Jeliazkov method that directly uses the posterior MCMC samples for evidence (marginal likelihood) computation. The Bayesian algorithm is validated through a numerical study and then applied to model the nonlinear unsteady aerodynamic loads using wind-tunnel test data at various Reynolds numbers.« less
Exact Bayesian Inference for Phylogenetic Birth-Death Models.
Parag, K V; Pybus, O G
2018-04-26
Inferring the rates of change of a population from a reconstructed phylogeny of genetic sequences is a central problem in macro-evolutionary biology, epidemiology, and many other disciplines. A popular solution involves estimating the parameters of a birth-death process (BDP), which links the shape of the phylogeny to its birth and death rates. Modern BDP estimators rely on random Markov chain Monte Carlo (MCMC) sampling to infer these rates. Such methods, while powerful and scalable, cannot be guaranteed to converge, leading to results that may be hard to replicate or difficult to validate. We present a conceptually and computationally different parametric BDP inference approach using flexible and easy to implement Snyder filter (SF) algorithms. This method is deterministic so its results are provable, guaranteed, and reproducible. We validate the SF on constant rate BDPs and find that it solves BDP likelihoods known to produce robust estimates. We then examine more complex BDPs with time-varying rates. Our estimates compare well with a recently developed parametric MCMC inference method. Lastly, we performmodel selection on an empirical Agamid species phylogeny, obtaining results consistent with the literature. The SF makes no approximations, beyond those required for parameter quantisation and numerical integration, and directly computes the posterior distribution of model parameters. It is a promising alternative inference algorithm that may serve either as a standalone Bayesian estimator or as a useful diagnostic reference for validating more involved MCMC strategies. The Snyder filter is implemented in Matlab and the time-varying BDP models are simulated in R. The source code and data are freely available at https://github.com/kpzoo/snyder-birth-death-code. kris.parag@zoo.ox.ac.uk. Supplementary material is available at Bioinformatics online.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Ray -Bing; Wang, Weichung; Jeff Wu, C. F.
A numerical method, called OBSM, was recently proposed which employs overcomplete basis functions to achieve sparse representations. While the method can handle non-stationary response without the need of inverting large covariance matrices, it lacks the capability to quantify uncertainty in predictions. We address this issue by proposing a Bayesian approach which first imposes a normal prior on the large space of linear coefficients, then applies the MCMC algorithm to generate posterior samples for predictions. From these samples, Bayesian credible intervals can then be obtained to assess prediction uncertainty. A key application for the proposed method is the efficient construction ofmore » sequential designs. Several sequential design procedures with different infill criteria are proposed based on the generated posterior samples. As a result, numerical studies show that the proposed schemes are capable of solving problems of positive point identification, optimization, and surrogate fitting.« less
Chen, Ray -Bing; Wang, Weichung; Jeff Wu, C. F.
2017-04-12
A numerical method, called OBSM, was recently proposed which employs overcomplete basis functions to achieve sparse representations. While the method can handle non-stationary response without the need of inverting large covariance matrices, it lacks the capability to quantify uncertainty in predictions. We address this issue by proposing a Bayesian approach which first imposes a normal prior on the large space of linear coefficients, then applies the MCMC algorithm to generate posterior samples for predictions. From these samples, Bayesian credible intervals can then be obtained to assess prediction uncertainty. A key application for the proposed method is the efficient construction ofmore » sequential designs. Several sequential design procedures with different infill criteria are proposed based on the generated posterior samples. As a result, numerical studies show that the proposed schemes are capable of solving problems of positive point identification, optimization, and surrogate fitting.« less
Bayesian Peptide Peak Detection for High Resolution TOF Mass Spectrometry.
Zhang, Jianqiu; Zhou, Xiaobo; Wang, Honghui; Suffredini, Anthony; Zhang, Lin; Huang, Yufei; Wong, Stephen
2010-11-01
In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with resolution of 10 000-15 000 full width at half-maximum (FWHW). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with expert's visual inspection. Moreover, better detection consistency is achieved across MS datasets from patients with identical pathological condition.
Bayesian Peptide Peak Detection for High Resolution TOF Mass Spectrometry
Zhang, Jianqiu; Zhou, Xiaobo; Wang, Honghui; Suffredini, Anthony; Zhang, Lin; Huang, Yufei; Wong, Stephen
2011-01-01
In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with resolution of 10 000–15 000 full width at half-maximum (FWHW). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with expert’s visual inspection. Moreover, better detection consistency is achieved across MS datasets from patients with identical pathological condition. PMID:21544266
Hybrid Gibbs Sampling and MCMC for CMB Analysis at Small Angular Scales
NASA Technical Reports Server (NTRS)
Jewell, Jeffrey B.; Eriksen, H. K.; Wandelt, B. D.; Gorski, K. M.; Huey, G.; O'Dwyer, I. J.; Dickinson, C.; Banday, A. J.; Lawrence, C. R.
2008-01-01
A) Gibbs Sampling has now been validated as an efficient, statistically exact, and practically useful method for "low-L" (as demonstrated on WMAP temperature polarization data). B) We are extending Gibbs sampling to directly propagate uncertainties in both foreground and instrument models to total uncertainty in cosmological parameters for the entire range of angular scales relevant for Planck. C) Made possible by inclusion of foreground model parameters in Gibbs sampling and hybrid MCMC and Gibbs sampling for the low signal to noise (high-L) regime. D) Future items to be included in the Bayesian framework include: 1) Integration with Hybrid Likelihood (or posterior) code for cosmological parameters; 2) Include other uncertainties in instrumental systematics? (I.e. beam uncertainties, noise estimation, calibration errors, other).
Hamilton, B H
1999-08-01
The fraction of US Medicare recipients enrolled in health maintenance organizations (HMOs) has increased substantially over the past 10 years. However, the impact of HMOs on health care costs is still hotly debated. In particular, it is argued that HMOs achieve cost reduction through 'cream-skimming' and enrolling relatively healthy patients. This paper develops a Bayesian panel data tobit model of HMO selection and Medicare expenditures for recent US retirees that accounts for mortality over the course of the panel. The model is estimated using Markov Chain Monte Carlo (MCMC) simulation methods, and is novel in that a multivariate t-link is used in place of normality to allow for the heavy-tailed distributions often found in health care expenditure data. The findings indicate that HMOs select individuals who are less likely to have positive health care expenditures prior to enrollment. However, there is no evidence that HMOs disenrol high cost patients. The results also indicate the importance of accounting for survival over the panel, since high mortality probabilities are associated with higher health care expenditures in the last year of life.
A Bayesian approach for parameter estimation and prediction using a computationally intensive model
Higdon, Dave; McDonnell, Jordan D.; Schunck, Nicolas; ...
2015-02-05
Bayesian methods have been successful in quantifying uncertainty in physics-based problems in parameter estimation and prediction. In these cases, physical measurements y are modeled as the best fit of a physics-based modelmore » $$\\eta (\\theta )$$, where θ denotes the uncertain, best input setting. Hence the statistical model is of the form $$y=\\eta (\\theta )+\\epsilon ,$$ where $$\\epsilon $$ accounts for measurement, and possibly other, error sources. When nonlinearity is present in $$\\eta (\\cdot )$$, the resulting posterior distribution for the unknown parameters in the Bayesian formulation is typically complex and nonstandard, requiring computationally demanding computational approaches such as Markov chain Monte Carlo (MCMC) to produce multivariate draws from the posterior. Although generally applicable, MCMC requires thousands (or even millions) of evaluations of the physics model $$\\eta (\\cdot )$$. This requirement is problematic if the model takes hours or days to evaluate. To overcome this computational bottleneck, we present an approach adapted from Bayesian model calibration. This approach combines output from an ensemble of computational model runs with physical measurements, within a statistical formulation, to carry out inference. A key component of this approach is a statistical response surface, or emulator, estimated from the ensemble of model runs. We demonstrate this approach with a case study in estimating parameters for a density functional theory model, using experimental mass/binding energy measurements from a collection of atomic nuclei. Lastly, we also demonstrate how this approach produces uncertainties in predictions for recent mass measurements obtained at Argonne National Laboratory.« less
Bayesian Estimation of the Logistic Positive Exponent IRT Model
ERIC Educational Resources Information Center
Bolfarine, Heleno; Bazan, Jorge Luis
2010-01-01
A Bayesian inference approach using Markov Chain Monte Carlo (MCMC) is developed for the logistic positive exponent (LPE) model proposed by Samejima and for a new skewed Logistic Item Response Theory (IRT) model, named Reflection LPE model. Both models lead to asymmetric item characteristic curves (ICC) and can be appropriate because a symmetric…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, Dan; Ricciuto, Daniel M.; Walker, Anthony P.
Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The calibration of DREAM results inmore » a better model fit and predictive performance compared to the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identifies one mode. The application suggests that DREAM is very suitable to calibrate complex terrestrial ecosystem models, where the uncertain parameter size is usually large and existence of local optima is always a concern. In addition, this effort justifies the assumptions of the error model used in Bayesian calibration according to the residual analysis. Here, the result indicates that a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the consequent constructed likelihood function can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.« less
Lu, Dan; Ricciuto, Daniel M.; Walker, Anthony P.; ...
2017-09-27
Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this work, a differential evolution adaptive Metropolis (DREAM) algorithm is used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The calibration of DREAM results inmore » a better model fit and predictive performance compared to the popular adaptive Metropolis (AM) scheme. Moreover, DREAM indicates that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identifies one mode. The application suggests that DREAM is very suitable to calibrate complex terrestrial ecosystem models, where the uncertain parameter size is usually large and existence of local optima is always a concern. In addition, this effort justifies the assumptions of the error model used in Bayesian calibration according to the residual analysis. Here, the result indicates that a heteroscedastic, correlated, Gaussian error model is appropriate for the problem, and the consequent constructed likelihood function can alleviate the underestimation of parameter uncertainty that is usually caused by using uncorrelated error models.« less
Atmospheric Tracer Inverse Modeling Using Markov Chain Monte Carlo (MCMC)
NASA Astrophysics Data System (ADS)
Kasibhatla, P.
2004-12-01
In recent years, there has been an increasing emphasis on the use of Bayesian statistical estimation techniques to characterize the temporal and spatial variability of atmospheric trace gas sources and sinks. The applications have been varied in terms of the particular species of interest, as well as in terms of the spatial and temporal resolution of the estimated fluxes. However, one common characteristic has been the use of relatively simple statistical models for describing the measurement and chemical transport model error statistics and prior source statistics. For example, multivariate normal probability distribution functions (pdfs) are commonly used to model these quantities and inverse source estimates are derived for fixed values of pdf paramaters. While the advantage of this approach is that closed form analytical solutions for the a posteriori pdfs of interest are available, it is worth exploring Bayesian analysis approaches which allow for a more general treatment of error and prior source statistics. Here, we present an application of the Markov Chain Monte Carlo (MCMC) methodology to an atmospheric tracer inversion problem to demonstrate how more gereral statistical models for errors can be incorporated into the analysis in a relatively straightforward manner. The MCMC approach to Bayesian analysis, which has found wide application in a variety of fields, is a statistical simulation approach that involves computing moments of interest of the a posteriori pdf by efficiently sampling this pdf. The specific inverse problem that we focus on is the annual mean CO2 source/sink estimation problem considered by the TransCom3 project. TransCom3 was a collaborative effort involving various modeling groups and followed a common modeling and analysis protocoal. As such, this problem provides a convenient case study to demonstrate the applicability of the MCMC methodology to atmospheric tracer source/sink estimation problems.
A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data.
Liang, Faming; Kim, Jinsu; Song, Qifan
2016-01-01
Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively.
A Bootstrap Metropolis–Hastings Algorithm for Bayesian Analysis of Big Data
Kim, Jinsu; Song, Qifan
2016-01-01
Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively. PMID:29033469
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
A Bayesian approach to the modelling of α Cen A
NASA Astrophysics Data System (ADS)
Bazot, M.; Bourguignon, S.; Christensen-Dalsgaard, J.
2012-12-01
Determining the physical characteristics of a star is an inverse problem consisting of estimating the parameters of models for the stellar structure and evolution, and knowing certain observable quantities. We use a Bayesian approach to solve this problem for α Cen A, which allows us to incorporate prior information on the parameters to be estimated, in order to better constrain the problem. Our strategy is based on the use of a Markov chain Monte Carlo (MCMC) algorithm to estimate the posterior probability densities of the stellar parameters: mass, age, initial chemical composition, etc. We use the stellar evolutionary code ASTEC to model the star. To constrain this model both seismic and non-seismic observations were considered. Several different strategies were tested to fit these values, using either two free parameters or five free parameters in ASTEC. We are thus able to show evidence that MCMC methods become efficient with respect to more classical grid-based strategies when the number of parameters increases. The results of our MCMC algorithm allow us to derive estimates for the stellar parameters and robust uncertainties thanks to the statistical analysis of the posterior probability densities. We are also able to compute odds for the presence of a convective core in α Cen A. When using core-sensitive seismic observational constraints, these can rise above ˜40 per cent. The comparison of results to previous studies also indicates that these seismic constraints are of critical importance for our knowledge of the structure of this star.
Comparison of sampling techniques for Bayesian parameter estimation
NASA Astrophysics Data System (ADS)
Allison, Rupert; Dunkley, Joanna
2014-02-01
The posterior probability distribution for a set of model parameters encodes all that the data have to tell us in the context of a given model; it is the fundamental quantity for Bayesian parameter estimation. In order to infer the posterior probability distribution we have to decide how to explore parameter space. Here we compare three prescriptions for how parameter space is navigated, discussing their relative merits. We consider Metropolis-Hasting sampling, nested sampling and affine-invariant ensemble Markov chain Monte Carlo (MCMC) sampling. We focus on their performance on toy-model Gaussian likelihoods and on a real-world cosmological data set. We outline the sampling algorithms themselves and elaborate on performance diagnostics such as convergence time, scope for parallelization, dimensional scaling, requisite tunings and suitability for non-Gaussian distributions. We find that nested sampling delivers high-fidelity estimates for posterior statistics at low computational cost, and should be adopted in favour of Metropolis-Hastings in many cases. Affine-invariant MCMC is competitive when computing clusters can be utilized for massive parallelization. Affine-invariant MCMC and existing extensions to nested sampling naturally probe multimodal and curving distributions.
An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit
Wong, Rowena Syn Yin; Ismail, Noor Azina
2016-01-01
Background and Objectives There are not many studies that attempt to model intensive care unit (ICU) risk of death in developing countries, especially in South East Asia. The aim of this study was to propose and describe application of a Bayesian approach in modeling in-ICU deaths in a Malaysian ICU. Methods This was a prospective study in a mixed medical-surgery ICU in a multidisciplinary tertiary referral hospital in Malaysia. Data collection included variables that were defined in Acute Physiology and Chronic Health Evaluation IV (APACHE IV) model. Bayesian Markov Chain Monte Carlo (MCMC) simulation approach was applied in the development of four multivariate logistic regression predictive models for the ICU, where the main outcome measure was in-ICU mortality risk. The performance of the models were assessed through overall model fit, discrimination and calibration measures. Results from the Bayesian models were also compared against results obtained using frequentist maximum likelihood method. Results The study involved 1,286 consecutive ICU admissions between January 1, 2009 and June 30, 2010, of which 1,111 met the inclusion criteria. Patients who were admitted to the ICU were generally younger, predominantly male, with low co-morbidity load and mostly under mechanical ventilation. The overall in-ICU mortality rate was 18.5% and the overall mean Acute Physiology Score (APS) was 68.5. All four models exhibited good discrimination, with area under receiver operating characteristic curve (AUC) values approximately 0.8. Calibration was acceptable (Hosmer-Lemeshow p-values > 0.05) for all models, except for model M3. Model M1 was identified as the model with the best overall performance in this study. Conclusion Four prediction models were proposed, where the best model was chosen based on its overall performance in this study. This study has also demonstrated the promising potential of the Bayesian MCMC approach as an alternative in the analysis and modeling of in-ICU mortality outcomes. PMID:27007413
pyblocxs: Bayesian Low-Counts X-ray Spectral Analysis in Sherpa
NASA Astrophysics Data System (ADS)
Siemiginowska, A.; Kashyap, V.; Refsdal, B.; van Dyk, D.; Connors, A.; Park, T.
2011-07-01
Typical X-ray spectra have low counts and should be modeled using the Poisson distribution. However, χ2 statistic is often applied as an alternative and the data are assumed to follow the Gaussian distribution. A variety of weights to the statistic or a binning of the data is performed to overcome the low counts issues. However, such modifications introduce biases or/and a loss of information. Standard modeling packages such as XSPEC and Sherpa provide the Poisson likelihood and allow computation of rudimentary MCMC chains, but so far do not allow for setting a full Bayesian model. We have implemented a sophisticated Bayesian MCMC-based algorithm to carry out spectral fitting of low counts sources in the Sherpa environment. The code is a Python extension to Sherpa and allows to fit a predefined Sherpa model to high-energy X-ray spectral data and other generic data. We present the algorithm and discuss several issues related to the implementation, including flexible definition of priors and allowing for variations in the calibration information.
Resolution analysis of marine seismic full waveform data by Bayesian inversion
NASA Astrophysics Data System (ADS)
Ray, A.; Sekar, A.; Hoversten, G. M.; Albertin, U.
2015-12-01
The Bayesian posterior density function (PDF) of earth models that fit full waveform seismic data convey information on the uncertainty with which the elastic model parameters are resolved. In this work, we apply the trans-dimensional reversible jump Markov Chain Monte Carlo method (RJ-MCMC) for the 1D inversion of noisy synthetic full-waveform seismic data in the frequency-wavenumber domain. While seismic full waveform inversion (FWI) is a powerful method for characterizing subsurface elastic parameters, the uncertainty in the inverted models has remained poorly known, if at all and is highly initial model dependent. The Bayesian method we use is trans-dimensional in that the number of model layers is not fixed, and flexible such that the layer boundaries are free to move around. The resulting parameterization does not require regularization to stabilize the inversion. Depth resolution is traded off with the number of layers, providing an estimate of uncertainty in elastic parameters (compressional and shear velocities Vp and Vs as well as density) with depth. We find that in the absence of additional constraints, Bayesian inversion can result in a wide range of posterior PDFs on Vp, Vs and density. These PDFs range from being clustered around the true model, to those that contain little resolution of any particular features other than those in the near surface, depending on the particular data and target geometry. We present results for a suite of different frequencies and offset ranges, examining the differences in the posterior model densities thus derived. Though these results are for a 1D earth, they are applicable to areas with simple, layered geology and provide valuable insight into the resolving capabilities of FWI, as well as highlight the challenges in solving a highly non-linear problem. The RJ-MCMC method also presents a tantalizing possibility for extension to 2D and 3D Bayesian inversion of full waveform seismic data in the future, as it objectively tackles the problem of model selection (i.e., the number of layers or cells for parameterization), which could ease the computational burden of evaluating forward models with many parameters.
NASA Astrophysics Data System (ADS)
Abhinav, S.; Manohar, C. S.
2018-03-01
The problem of combined state and parameter estimation in nonlinear state space models, based on Bayesian filtering methods, is considered. A novel approach, which combines Rao-Blackwellized particle filters for state estimation with Markov chain Monte Carlo (MCMC) simulations for parameter identification, is proposed. In order to ensure successful performance of the MCMC samplers, in situations involving large amount of dynamic measurement data and (or) low measurement noise, the study employs a modified measurement model combined with an importance sampling based correction. The parameters of the process noise covariance matrix are also included as quantities to be identified. The study employs the Rao-Blackwellization step at two stages: one, associated with the state estimation problem in the particle filtering step, and, secondly, in the evaluation of the ratio of likelihoods in the MCMC run. The satisfactory performance of the proposed method is illustrated on three dynamical systems: (a) a computational model of a nonlinear beam-moving oscillator system, (b) a laboratory scale beam traversed by a loaded trolley, and (c) an earthquake shake table study on a bending-torsion coupled nonlinear frame subjected to uniaxial support motion.
Bayesian median regression for temporal gene expression data
NASA Astrophysics Data System (ADS)
Yu, Keming; Vinciotti, Veronica; Liu, Xiaohui; 't Hoen, Peter A. C.
2007-09-01
Most of the existing methods for the identification of biologically interesting genes in a temporal expression profiling dataset do not fully exploit the temporal ordering in the dataset and are based on normality assumptions for the gene expression. In this paper, we introduce a Bayesian median regression model to detect genes whose temporal profile is significantly different across a number of biological conditions. The regression model is defined by a polynomial function where both time and condition effects as well as interactions between the two are included. MCMC-based inference returns the posterior distribution of the polynomial coefficients. From this a simple Bayes factor test is proposed to test for significance. The estimation of the median rather than the mean, and within a Bayesian framework, increases the robustness of the method compared to a Hotelling T2-test previously suggested. This is shown on simulated data and on muscular dystrophy gene expression data.
Dimension-independent likelihood-informed MCMC
Cui, Tiangang; Law, Kody J. H.; Marzouk, Youssef M.
2015-10-08
Many Bayesian inference problems require exploring the posterior distribution of highdimensional parameters that represent the discretization of an underlying function. Our work introduces a family of Markov chain Monte Carlo (MCMC) samplers that can adapt to the particular structure of a posterior distribution over functions. There are two distinct lines of research that intersect in the methods we develop here. First, we introduce a general class of operator-weighted proposal distributions that are well defined on function space, such that the performance of the resulting MCMC samplers is independent of the discretization of the function. Second, by exploiting local Hessian informationmore » and any associated lowdimensional structure in the change from prior to posterior distributions, we develop an inhomogeneous discretization scheme for the Langevin stochastic differential equation that yields operator-weighted proposals adapted to the non-Gaussian structure of the posterior. The resulting dimension-independent and likelihood-informed (DILI) MCMC samplers may be useful for a large class of high-dimensional problems where the target probability measure has a density with respect to a Gaussian reference measure. Finally, we use two nonlinear inverse problems in order to demonstrate the efficiency of these DILI samplers: an elliptic PDE coefficient inverse problem and path reconstruction in a conditioned diffusion.« less
A Bayesian approach to modeling 2D gravity data using polygon states
NASA Astrophysics Data System (ADS)
Titus, W. J.; Titus, S.; Davis, J. R.
2015-12-01
We present a Bayesian Markov chain Monte Carlo (MCMC) method for the 2D gravity inversion of a localized subsurface object with constant density contrast. Our models have four parameters: the density contrast, the number of vertices in a polygonal approximation of the object, an upper bound on the ratio of the perimeter squared to the area, and the vertices of a polygon container that bounds the object. Reasonable parameter values can be estimated prior to inversion using a forward model and geologic information. In addition, we assume that the field data have a common random uncertainty that lies between two bounds but that it has no systematic uncertainty. Finally, we assume that there is no uncertainty in the spatial locations of the measurement stations. For any set of model parameters, we use MCMC methods to generate an approximate probability distribution of polygons for the object. We then compute various probability distributions for the object, including the variance between the observed and predicted fields (an important quantity in the MCMC method), the area, the center of area, and the occupancy probability (the probability that a spatial point lies within the object). In addition, we compare probabilities of different models using parallel tempering, a technique which also mitigates trapping in local optima that can occur in certain model geometries. We apply our method to several synthetic data sets generated from objects of varying shape and location. We also analyze a natural data set collected across the Rio Grande Gorge Bridge in New Mexico, where the object (i.e. the air below the bridge) is known and the canyon is approximately 2D. Although there are many ways to view results, the occupancy probability proves quite powerful. We also find that the choice of the container is important. In particular, large containers should be avoided, because the more closely a container confines the object, the better the predictions match properties of object.
Yen, A M-F; Liou, H-H; Lin, H-L; Chen, T H-H
2006-01-01
The study aimed to develop a predictive model to deal with data fraught with heterogeneity that cannot be explained by sampling variation or measured covariates. The random-effect Poisson regression model was first proposed to deal with over-dispersion for data fraught with heterogeneity after making allowance for measured covariates. Bayesian acyclic graphic model in conjunction with Markov Chain Monte Carlo (MCMC) technique was then applied to estimate the parameters of both relevant covariates and random effect. Predictive distribution was then generated to compare the predicted with the observed for the Bayesian model with and without random effect. Data from repeated measurement of episodes among 44 patients with intractable epilepsy were used as an illustration. The application of Poisson regression without taking heterogeneity into account to epilepsy data yielded a large value of heterogeneity (heterogeneity factor = 17.90, deviance = 1485, degree of freedom (df) = 83). After taking the random effect into account, the value of heterogeneity factor was greatly reduced (heterogeneity factor = 0.52, deviance = 42.5, df = 81). The Pearson chi2 for the comparison between the expected seizure frequencies and the observed ones at two and three months of the model with and without random effect were 34.27 (p = 1.00) and 1799.90 (p < 0.0001), respectively. The Bayesian acyclic model using the MCMC method was demonstrated to have great potential for disease prediction while data show over-dispersion attributed either to correlated property or to subject-to-subject variability.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, Dan; Ricciuto, Daniel; Walker, Anthony
Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this study, a Differential Evolution Adaptive Metropolis (DREAM) algorithm was used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The DREAM is a multi-chainmore » method and uses differential evolution technique for chain movement, allowing it to be efficiently applied to high-dimensional problems, and can reliably estimate heavy-tailed and multimodal distributions that are difficult for single-chain schemes using a Gaussian proposal distribution. The results were evaluated against the popular Adaptive Metropolis (AM) scheme. DREAM indicated that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identified one mode. The calibration of DREAM resulted in a better model fit and predictive performance compared to the AM. DREAM provides means for a good exploration of the posterior distributions of model parameters. Lastly, it reduces the risk of false convergence to a local optimum and potentially improves the predictive performance of the calibrated model.« less
Lu, Dan; Ricciuto, Daniel; Walker, Anthony; ...
2017-02-22
Calibration of terrestrial ecosystem models is important but challenging. Bayesian inference implemented by Markov chain Monte Carlo (MCMC) sampling provides a comprehensive framework to estimate model parameters and associated uncertainties using their posterior distributions. The effectiveness and efficiency of the method strongly depend on the MCMC algorithm used. In this study, a Differential Evolution Adaptive Metropolis (DREAM) algorithm was used to estimate posterior distributions of 21 parameters for the data assimilation linked ecosystem carbon (DALEC) model using 14 years of daily net ecosystem exchange data collected at the Harvard Forest Environmental Measurement Site eddy-flux tower. The DREAM is a multi-chainmore » method and uses differential evolution technique for chain movement, allowing it to be efficiently applied to high-dimensional problems, and can reliably estimate heavy-tailed and multimodal distributions that are difficult for single-chain schemes using a Gaussian proposal distribution. The results were evaluated against the popular Adaptive Metropolis (AM) scheme. DREAM indicated that two parameters controlling autumn phenology have multiple modes in their posterior distributions while AM only identified one mode. The calibration of DREAM resulted in a better model fit and predictive performance compared to the AM. DREAM provides means for a good exploration of the posterior distributions of model parameters. Lastly, it reduces the risk of false convergence to a local optimum and potentially improves the predictive performance of the calibrated model.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vrugt, Jasper A; Robinson, Bruce A; Ter Braak, Cajo J F
In recent years, a strong debate has emerged in the hydrologic literature regarding what constitutes an appropriate framework for uncertainty estimation. Particularly, there is strong disagreement whether an uncertainty framework should have its roots within a proper statistical (Bayesian) context, or whether such a framework should be based on a different philosophy and implement informal measures and weaker inference to summarize parameter and predictive distributions. In this paper, we compare a formal Bayesian approach using Markov Chain Monte Carlo (MCMC) with generalized likelihood uncertainty estimation (GLUE) for assessing uncertainty in conceptual watershed modeling. Our formal Bayesian approach is implemented usingmore » the recently developed differential evolution adaptive metropolis (DREAM) MCMC scheme with a likelihood function that explicitly considers model structural, input and parameter uncertainty. Our results demonstrate that DREAM and GLUE can generate very similar estimates of total streamflow uncertainty. This suggests that formal and informal Bayesian approaches have more common ground than the hydrologic literature and ongoing debate might suggest. The main advantage of formal approaches is, however, that they attempt to disentangle the effect of forcing, parameter and model structural error on total predictive uncertainty. This is key to improving hydrologic theory and to better understand and predict the flow of water through catchments.« less
NASA Astrophysics Data System (ADS)
Pascoe, D. J.; Anfinogentov, S.; Nisticò, G.; Goddard, C. R.; Nakariakov, V. M.
2017-04-01
Context. The strong damping of kink oscillations of coronal loops can be explained by mode coupling. The damping envelope depends on the transverse density profile of the loop. Observational measurements of the damping envelope have been used to determine the transverse loop structure which is important for understanding other physical processes such as heating. Aims: The general damping envelope describing the mode coupling of kink waves consists of a Gaussian damping regime followed by an exponential damping regime. Recent observational detection of these damping regimes has been employed as a seismological tool. We extend the description of the damping behaviour to account for additional physical effects, namely a time-dependent period of oscillation, the presence of additional longitudinal harmonics, and the decayless regime of standing kink oscillations. Methods: We examine four examples of standing kink oscillations observed by the Atmospheric Imaging Assembly (AIA) onboard the Solar Dynamics Observatory (SDO). We use forward modelling of the loop position and investigate the dependence on the model parameters using Bayesian inference and Markov chain Monte Carlo (MCMC) sampling. Results: Our improvements to the physical model combined with the use of Bayesian inference and MCMC produce improved estimates of model parameters and their uncertainties. Calculation of the Bayes factor also allows us to compare the suitability of different physical models. We also use a new method based on spline interpolation of the zeroes of the oscillation to accurately describe the background trend of the oscillating loop. Conclusions: This powerful and robust method allows for accurate seismology of coronal loops, in particular the transverse density profile, and potentially reveals additional physical effects.
Bayesian inference on EMRI signals using low frequency approximations
NASA Astrophysics Data System (ADS)
Ali, Asad; Christensen, Nelson; Meyer, Renate; Röver, Christian
2012-07-01
Extreme mass ratio inspirals (EMRIs) are thought to be one of the most exciting gravitational wave sources to be detected with LISA. Due to their complicated nature and weak amplitudes the detection and parameter estimation of such sources is a challenging task. In this paper we present a statistical methodology based on Bayesian inference in which the estimation of parameters is carried out by advanced Markov chain Monte Carlo (MCMC) algorithms such as parallel tempering MCMC. We analysed high and medium mass EMRI systems that fall well inside the low frequency range of LISA. In the context of the Mock LISA Data Challenges, our investigation and results are also the first instance in which a fully Markovian algorithm is applied for EMRI searches. Results show that our algorithm worked well in recovering EMRI signals from different (simulated) LISA data sets having single and multiple EMRI sources and holds great promise for posterior computation under more realistic conditions. The search and estimation methods presented in this paper are general in their nature, and can be applied in any other scenario such as AdLIGO, AdVIRGO and Einstein Telescope with their respective response functions.
The Bayesian group lasso for confounded spatial data
Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.
2017-01-01
Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can have higher and less variable predictive accuracy based on out-of-sample data when compared to the standard SGLMM.
NASA Astrophysics Data System (ADS)
Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.
2018-05-01
Bayesian method is a method that can be used to estimate the parameters of multivariate multiple regression model. Bayesian method has two distributions, there are prior and posterior distributions. Posterior distribution is influenced by the selection of prior distribution. Jeffreys’ prior distribution is a kind of Non-informative prior distribution. This prior is used when the information about parameter not available. Non-informative Jeffreys’ prior distribution is combined with the sample information resulting the posterior distribution. Posterior distribution is used to estimate the parameter. The purposes of this research is to estimate the parameters of multivariate regression model using Bayesian method with Non-informative Jeffreys’ prior distribution. Based on the results and discussion, parameter estimation of β and Σ which were obtained from expected value of random variable of marginal posterior distribution function. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart. However, in calculation of the expected value involving integral of a function which difficult to determine the value. Therefore, approach is needed by generating of random samples according to the posterior distribution characteristics of each parameter using Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.
NASA Astrophysics Data System (ADS)
Vrugt, J. A.
2012-12-01
In the past decade much progress has been made in the treatment of uncertainty in earth systems modeling. Whereas initial approaches has focused mostly on quantification of parameter and predictive uncertainty, recent methods attempt to disentangle the effects of parameter, forcing (input) data, model structural and calibration data errors. In this talk I will highlight some of our recent work involving theory, concepts and applications of Bayesian parameter and/or state estimation. In particular, new methods for sequential Monte Carlo (SMC) and Markov Chain Monte Carlo (MCMC) simulation will be presented with emphasis on massively parallel distributed computing and quantification of model structural errors. The theoretical and numerical developments will be illustrated using model-data synthesis problems in hydrology, hydrogeology and geophysics.
A Bayesian trans-dimensional approach for the fusion of multiple geophysical datasets
NASA Astrophysics Data System (ADS)
JafarGandomi, Arash; Binley, Andrew
2013-09-01
We propose a Bayesian fusion approach to integrate multiple geophysical datasets with different coverage and sensitivity. The fusion strategy is based on the capability of various geophysical methods to provide enough resolution to identify either subsurface material parameters or subsurface structure, or both. We focus on electrical resistivity as the target material parameter and electrical resistivity tomography (ERT), electromagnetic induction (EMI), and ground penetrating radar (GPR) as the set of geophysical methods. However, extending the approach to different sets of geophysical parameters and methods is straightforward. Different geophysical datasets are entered into a trans-dimensional Markov chain Monte Carlo (McMC) search-based joint inversion algorithm. The trans-dimensional property of the McMC algorithm allows dynamic parameterisation of the model space, which in turn helps to avoid bias of the post-inversion results towards a particular model. Given that we are attempting to develop an approach that has practical potential, we discretize the subsurface into an array of one-dimensional earth-models. Accordingly, the ERT data that are collected by using two-dimensional acquisition geometry are re-casted to a set of equivalent vertical electric soundings. Different data are inverted either individually or jointly to estimate one-dimensional subsurface models at discrete locations. We use Shannon's information measure to quantify the information obtained from the inversion of different combinations of geophysical datasets. Information from multiple methods is brought together via introducing joint likelihood function and/or constraining the prior information. A Bayesian maximum entropy approach is used for spatial fusion of spatially dispersed estimated one-dimensional models and mapping of the target parameter. We illustrate the approach with a synthetic dataset and then apply it to a field dataset. We show that the proposed fusion strategy is successful not only in enhancing the subsurface information but also as a survey design tool to identify the appropriate combination of the geophysical tools and show whether application of an individual method for further investigation of a specific site is beneficial.
NASA Astrophysics Data System (ADS)
Sadegh, Mojtaba; Ragno, Elisa; AghaKouchak, Amir
2017-06-01
We present a newly developed Multivariate Copula Analysis Toolbox (MvCAT) which includes a wide range of copula families with different levels of complexity. MvCAT employs a Bayesian framework with a residual-based Gaussian likelihood function for inferring copula parameters and estimating the underlying uncertainties. The contribution of this paper is threefold: (a) providing a Bayesian framework to approximate the predictive uncertainties of fitted copulas, (b) introducing a hybrid-evolution Markov Chain Monte Carlo (MCMC) approach designed for numerical estimation of the posterior distribution of copula parameters, and (c) enabling the community to explore a wide range of copulas and evaluate them relative to the fitting uncertainties. We show that the commonly used local optimization methods for copula parameter estimation often get trapped in local minima. The proposed method, however, addresses this limitation and improves describing the dependence structure. MvCAT also enables evaluation of uncertainties relative to the length of record, which is fundamental to a wide range of applications such as multivariate frequency analysis.
Abanto-Valle, C. A.; Bandyopadhyay, D.; Lachos, V. H.; Enriquez, I.
2009-01-01
A Bayesian analysis of stochastic volatility (SV) models using the class of symmetric scale mixtures of normal (SMN) distributions is considered. In the face of non-normality, this provides an appealing robust alternative to the routine use of the normal distribution. Specific distributions examined include the normal, student-t, slash and the variance gamma distributions. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced for parameter estimation. Moreover, the mixing parameters obtained as a by-product of the scale mixture representation can be used to identify outliers. The methods developed are applied to analyze daily stock returns data on S&P500 index. Bayesian model selection criteria as well as out-of- sample forecasting results reveal that the SV models based on heavy-tailed SMN distributions provide significant improvement in model fit as well as prediction to the S&P500 index data over the usual normal model. PMID:20730043
Finite element model updating using the shadow hybrid Monte Carlo technique
NASA Astrophysics Data System (ADS)
Boulkaibet, I.; Mthembu, L.; Marwala, T.; Friswell, M. I.; Adhikari, S.
2015-02-01
Recent research in the field of finite element model updating (FEM) advocates the adoption of Bayesian analysis techniques to dealing with the uncertainties associated with these models. However, Bayesian formulations require the evaluation of the Posterior Distribution Function which may not be available in analytical form. This is the case in FEM updating. In such cases sampling methods can provide good approximations of the Posterior distribution when implemented in the Bayesian context. Markov Chain Monte Carlo (MCMC) algorithms are the most popular sampling tools used to sample probability distributions. However, the efficiency of these algorithms is affected by the complexity of the systems (the size of the parameter space). The Hybrid Monte Carlo (HMC) offers a very important MCMC approach to dealing with higher-dimensional complex problems. The HMC uses the molecular dynamics (MD) steps as the global Monte Carlo (MC) moves to reach areas of high probability where the gradient of the log-density of the Posterior acts as a guide during the search process. However, the acceptance rate of HMC is sensitive to the system size as well as the time step used to evaluate the MD trajectory. To overcome this limitation we propose the use of the Shadow Hybrid Monte Carlo (SHMC) algorithm. The SHMC algorithm is a modified version of the Hybrid Monte Carlo (HMC) and designed to improve sampling for large-system sizes and time steps. This is done by sampling from a modified Hamiltonian function instead of the normal Hamiltonian function. In this paper, the efficiency and accuracy of the SHMC method is tested on the updating of two real structures; an unsymmetrical H-shaped beam structure and a GARTEUR SM-AG19 structure and is compared to the application of the HMC algorithm on the same structures.
Bayesian inference for dynamic transcriptional regulation; the Hes1 system as a case study.
Heron, Elizabeth A; Finkenstädt, Bärbel; Rand, David A
2007-10-01
In this study, we address the problem of estimating the parameters of regulatory networks and provide the first application of Markov chain Monte Carlo (MCMC) methods to experimental data. As a case study, we consider a stochastic model of the Hes1 system expressed in terms of stochastic differential equations (SDEs) to which rigorous likelihood methods of inference can be applied. When fitting continuous-time stochastic models to discretely observed time series the lengths of the sampling intervals are important, and much of our study addresses the problem when the data are sparse. We estimate the parameters of an autoregulatory network providing results both for simulated and real experimental data from the Hes1 system. We develop an estimation algorithm using MCMC techniques which are flexible enough to allow for the imputation of latent data on a finer time scale and the presence of prior information about parameters which may be informed from other experiments as well as additional measurement error.
NASA Astrophysics Data System (ADS)
Gaál, Ladislav; Szolgay, Ján.; Bacigál, Tomáå.¡; Kohnová, Silvia
2010-05-01
Copula-based estimation methods of hydro-climatological extremes have increasingly been gaining attention of researchers and practitioners in the last couple of years. Unlike the traditional estimation methods which are based on bivariate cumulative distribution functions (CDFs), copulas are a relatively flexible tool of statistics that allow for modelling dependencies between two or more variables such as flood peaks and flood volumes without making strict assumptions on the marginal distributions. The dependence structure and the reliability of the joint estimates of hydro-climatological extremes, mainly in the right tail of the joint CDF not only depends on the particular copula adopted but also on the data available for the estimation of the marginal distributions of the individual variables. Generally, data samples for frequency modelling have limited temporal extent, which is a considerable drawback of frequency analyses in practice. Therefore, it is advised to deal with statistical methods that improve any part of the process of copula construction and result in more reliable design values of hydrological variables. The scarcity of the data sample mostly in the extreme tail of the joint CDF can be bypassed, e.g., by using a considerably larger amount of simulated data by rainfall-runoff analysis or by including historical information on the variables under study. The latter approach of data extension is used here to make the quantile estimates of the individual marginals of the copula more reliable. In the presented paper it is proposed to use historical information in the frequency analysis of the marginal distributions in the framework of Bayesian Monte Carlo Markov Chain (MCMC) simulations. Generally, a Bayesian approach allows for a straightforward combination of different sources of information on floods (e.g. flood data from systematic measurements and historical flood records, respectively) in terms of a product of the corresponding likelihood functions. On the other hand, the MCMC algorithm is a numerical approach for sampling from the likelihood distributions. The Bayesian MCMC methods therefore provide an attractive way to estimate the uncertainty in parameters and quantile metrics of frequency distributions. The applicability of the method is demonstrated in a case study of the hydroelectric power station Orlík on the Vltava River. This site has a key role in the flood prevention of Prague, the capital city of the Czech Republic. The record length of the available flood data is 126 years from the period 1877-2002, while the flood event observed in 2002 that caused extensive damages and numerous casualties is treated as a historic one. To estimate the joint probabilities of flood peaks and volumes, different copulas are fitted and their goodness-of-fit are evaluated by bootstrap simulations. Finally, selected quantiles of flood volumes conditioned on given flood peaks are derived and compared with those obtained by the traditional method used in the practice of water management specialists of the Vltava River.
Kassian, Alexei
2015-01-01
A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies.
Kassian, Alexei
2015-01-01
A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies. PMID:25719456
An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit.
Wong, Rowena Syn Yin; Ismail, Noor Azina
2016-01-01
There are not many studies that attempt to model intensive care unit (ICU) risk of death in developing countries, especially in South East Asia. The aim of this study was to propose and describe application of a Bayesian approach in modeling in-ICU deaths in a Malaysian ICU. This was a prospective study in a mixed medical-surgery ICU in a multidisciplinary tertiary referral hospital in Malaysia. Data collection included variables that were defined in Acute Physiology and Chronic Health Evaluation IV (APACHE IV) model. Bayesian Markov Chain Monte Carlo (MCMC) simulation approach was applied in the development of four multivariate logistic regression predictive models for the ICU, where the main outcome measure was in-ICU mortality risk. The performance of the models were assessed through overall model fit, discrimination and calibration measures. Results from the Bayesian models were also compared against results obtained using frequentist maximum likelihood method. The study involved 1,286 consecutive ICU admissions between January 1, 2009 and June 30, 2010, of which 1,111 met the inclusion criteria. Patients who were admitted to the ICU were generally younger, predominantly male, with low co-morbidity load and mostly under mechanical ventilation. The overall in-ICU mortality rate was 18.5% and the overall mean Acute Physiology Score (APS) was 68.5. All four models exhibited good discrimination, with area under receiver operating characteristic curve (AUC) values approximately 0.8. Calibration was acceptable (Hosmer-Lemeshow p-values > 0.05) for all models, except for model M3. Model M1 was identified as the model with the best overall performance in this study. Four prediction models were proposed, where the best model was chosen based on its overall performance in this study. This study has also demonstrated the promising potential of the Bayesian MCMC approach as an alternative in the analysis and modeling of in-ICU mortality outcomes.
Cognitive diagnosis modelling incorporating item response times.
Zhan, Peida; Jiao, Hong; Liao, Dandan
2018-05-01
To provide more refined diagnostic feedback with collateral information in item response times (RTs), this study proposed joint modelling of attributes and response speed using item responses and RTs simultaneously for cognitive diagnosis. For illustration, an extended deterministic input, noisy 'and' gate (DINA) model was proposed for joint modelling of responses and RTs. Model parameter estimation was explored using the Bayesian Markov chain Monte Carlo (MCMC) method. The PISA 2012 computer-based mathematics data were analysed first. These real data estimates were treated as true values in a subsequent simulation study. A follow-up simulation study with ideal testing conditions was conducted as well to further evaluate model parameter recovery. The results indicated that model parameters could be well recovered using the MCMC approach. Further, incorporating RTs into the DINA model would improve attribute and profile correct classification rates and result in more accurate and precise estimation of the model parameters. © 2017 The British Psychological Society.
Efficient inference for genetic association studies with multiple outcomes.
Ruffieux, Helene; Davison, Anthony C; Hager, Jorg; Irincheeva, Irina
2017-10-01
Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modeling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson and others (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Modeling two strains of disease via aggregate-level infectivity curves.
Romanescu, Razvan; Deardon, Rob
2016-04-01
Well formulated models of disease spread, and efficient methods to fit them to observed data, are powerful tools for aiding the surveillance and control of infectious diseases. Our project considers the problem of the simultaneous spread of two related strains of disease in a context where spatial location is the key driver of disease spread. We start our modeling work with the individual level models (ILMs) of disease transmission, and extend these models to accommodate the competing spread of the pathogens in a two-tier hierarchical population (whose levels we refer to as 'farm' and 'animal'). The postulated interference mechanism between the two strains is a period of cross-immunity following infection. We also present a framework for speeding up the computationally intensive process of fitting the ILM to data, typically done using Markov chain Monte Carlo (MCMC) in a Bayesian framework, by turning the inference into a two-stage process. First, we approximate the number of animals infected on a farm over time by infectivity curves. These curves are fit to data sampled from farms, using maximum likelihood estimation, then, conditional on the fitted curves, Bayesian MCMC inference proceeds for the remaining parameters. Finally, we use posterior predictive distributions of salient epidemic summary statistics, in order to assess the model fitted.
Assessing Mediational Models: Testing and Interval Estimation for Indirect Effects.
Biesanz, Jeremy C; Falk, Carl F; Savalei, Victoria
2010-08-06
Theoretical models specifying indirect or mediated effects are common in the social sciences. An indirect effect exists when an independent variable's influence on the dependent variable is mediated through an intervening variable. Classic approaches to assessing such mediational hypotheses ( Baron & Kenny, 1986 ; Sobel, 1982 ) have in recent years been supplemented by computationally intensive methods such as bootstrapping, the distribution of the product methods, and hierarchical Bayesian Markov chain Monte Carlo (MCMC) methods. These different approaches for assessing mediation are illustrated using data from Dunn, Biesanz, Human, and Finn (2007). However, little is known about how these methods perform relative to each other, particularly in more challenging situations, such as with data that are incomplete and/or nonnormal. This article presents an extensive Monte Carlo simulation evaluating a host of approaches for assessing mediation. We examine Type I error rates, power, and coverage. We study normal and nonnormal data as well as complete and incomplete data. In addition, we adapt a method, recently proposed in statistical literature, that does not rely on confidence intervals (CIs) to test the null hypothesis of no indirect effect. The results suggest that the new inferential method-the partial posterior p value-slightly outperforms existing ones in terms of maintaining Type I error rates while maximizing power, especially with incomplete data. Among confidence interval approaches, the bias-corrected accelerated (BC a ) bootstrapping approach often has inflated Type I error rates and inconsistent coverage and is not recommended; In contrast, the bootstrapped percentile confidence interval and the hierarchical Bayesian MCMC method perform best overall, maintaining Type I error rates, exhibiting reasonable power, and producing stable and accurate coverage rates.
On the Bayesian Treed Multivariate Gaussian Process with Linear Model of Coregionalization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Konomi, Bledar A.; Karagiannis, Georgios; Lin, Guang
2015-02-01
The Bayesian treed Gaussian process (BTGP) has gained popularity in recent years because it provides a straightforward mechanism for modeling non-stationary data and can alleviate computational demands by fitting models to less data. The extension of BTGP to the multivariate setting requires us to model the cross-covariance and to propose efficient algorithms that can deal with trans-dimensional MCMC moves. In this paper we extend the cross-covariance of the Bayesian treed multivariate Gaussian process (BTMGP) to that of linear model of Coregionalization (LMC) cross-covariances. Different strategies have been developed to improve the MCMC mixing and invert smaller matrices in the Bayesianmore » inference. Moreover, we compare the proposed BTMGP with existing multiple BTGP and BTMGP in test cases and multiphase flow computer experiment in a full scale regenerator of a carbon capture unit. The use of the BTMGP with LMC cross-covariance helped to predict the computer experiments relatively better than existing competitors. The proposed model has a wide variety of applications, such as computer experiments and environmental data. In the case of computer experiments we also develop an adaptive sampling strategy for the BTMGP with LMC cross-covariance function.« less
BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC
Satija, Rahul; Novák, Ádám; Miklós, István; Lyngsø, Rune; Hein, Jotun
2009-01-01
Background We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden markov model algorithms to analyze up to four sequences. Results We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the α-globin gene in vertebrates. The results exhibit how adding additional sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate how BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. Conclusion BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from PMID:19715598
BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC.
Satija, Rahul; Novák, Adám; Miklós, István; Lyngsø, Rune; Hein, Jotun
2009-08-28
We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden markov model algorithms to analyze up to four sequences. We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the alpha-globin gene in vertebrates. The results exhibit how adding additional sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate how BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from http://www.stats.ox.ac.uk/~satija/BigFoot/
NASA Astrophysics Data System (ADS)
Laloy, Eric; Beerten, Koen; Vanacker, Veerle; Christl, Marcus; Rogiers, Bart; Wouters, Laurent
2017-07-01
The rate at which low-lying sandy areas in temperate regions, such as the Campine Plateau (NE Belgium), have been eroding during the Quaternary is a matter of debate. Current knowledge on the average pace of landscape evolution in the Campine area is largely based on geological inferences and modern analogies. We performed a Bayesian inversion of an in situ-produced 10Be concentration depth profile to infer the average long-term erosion rate together with two other parameters: the surface exposure age and the inherited 10Be concentration. Compared to the latest advances in probabilistic inversion of cosmogenic radionuclide (CRN) data, our approach has the following two innovative components: it (1) uses Markov chain Monte Carlo (MCMC) sampling and (2) accounts (under certain assumptions) for the contribution of model errors to posterior uncertainty. To investigate to what extent our approach differs from the state of the art in practice, a comparison against the Bayesian inversion method implemented in the CRONUScalc program is made. Both approaches identify similar maximum a posteriori (MAP) parameter values, but posterior parameter and predictive uncertainty derived using the method taken in CRONUScalc is moderately underestimated. A simple way for producing more consistent uncertainty estimates with the CRONUScalc-like method in the presence of model errors is therefore suggested. Our inferred erosion rate of 39 ± 8. 9 mm kyr-1 (1σ) is relatively large in comparison with landforms that erode under comparable (paleo-)climates elsewhere in the world. We evaluate this value in the light of the erodibility of the substrate and sudden base level lowering during the Middle Pleistocene. A denser sampling scheme of a two-nuclide concentration depth profile would allow for better inferred erosion rate resolution, and including more uncertain parameters in the MCMC inversion.
Assessment of parametric uncertainty for groundwater reactive transport modeling,
Shi, Xiaoqing; Ye, Ming; Curtis, Gary P.; Miller, Geoffery L.; Meyer, Philip D.; Kohler, Matthias; Yabusaki, Steve; Wu, Jichun
2014-01-01
The validity of using Gaussian assumptions for model residuals in uncertainty quantification of a groundwater reactive transport model was evaluated in this study. Least squares regression methods explicitly assume Gaussian residuals, and the assumption leads to Gaussian likelihood functions, model parameters, and model predictions. While the Bayesian methods do not explicitly require the Gaussian assumption, Gaussian residuals are widely used. This paper shows that the residuals of the reactive transport model are non-Gaussian, heteroscedastic, and correlated in time; characterizing them requires using a generalized likelihood function such as the formal generalized likelihood function developed by Schoups and Vrugt (2010). For the surface complexation model considered in this study for simulating uranium reactive transport in groundwater, parametric uncertainty is quantified using the least squares regression methods and Bayesian methods with both Gaussian and formal generalized likelihood functions. While the least squares methods and Bayesian methods with Gaussian likelihood function produce similar Gaussian parameter distributions, the parameter distributions of Bayesian uncertainty quantification using the formal generalized likelihood function are non-Gaussian. In addition, predictive performance of formal generalized likelihood function is superior to that of least squares regression and Bayesian methods with Gaussian likelihood function. The Bayesian uncertainty quantification is conducted using the differential evolution adaptive metropolis (DREAM(zs)) algorithm; as a Markov chain Monte Carlo (MCMC) method, it is a robust tool for quantifying uncertainty in groundwater reactive transport models. For the surface complexation model, the regression-based local sensitivity analysis and Morris- and DREAM(ZS)-based global sensitivity analysis yield almost identical ranking of parameter importance. The uncertainty analysis may help select appropriate likelihood functions, improve model calibration, and reduce predictive uncertainty in other groundwater reactive transport and environmental modeling.
NASA Astrophysics Data System (ADS)
Duan, Y.; Durand, M. T.; Jezek, K. C.; Yardim, C.; Bringer, A.; Aksoy, M.; Johnson, J. T.
2017-12-01
The ultra-wideband software-defined microwave radiometer (UWBRAD) is designed to provide ice sheet internal temperature product via measuring low frequency microwave emission. Twelve channels ranging from 0.5 to 2.0 GHz are covered by the instrument. A Greenland air-borne demonstration was demonstrated in September 2016, provided first demonstration of Ultra-wideband radiometer observations of geophysical scenes, including ice sheets. Another flight is planned for September 2017 for acquiring measurements in central ice sheet. A Bayesian framework is designed to retrieve the ice sheet internal temperature from simulated UWBRAD brightness temperature (Tb) measurements over Greenland flight path with limited prior information of the ground. A 1-D heat-flow model, the Robin Model, was used to model the ice sheet internal temperature profile with ground information. Synthetic UWBRAD Tb observations was generated via the partially coherent radiation transfer model, which utilizes the Robin model temperature profile and an exponential fit of ice density from Borehole measurement as input, and corrupted with noise. The effective surface temperature, geothermal heat flux, the variance of upper layer ice density, and the variance of fine scale density variation at deeper ice sheet were treated as unknown variables within the retrieval framework. Each parameter is defined with its possible range and set to be uniformly distributed. The Markov Chain Monte Carlo (MCMC) approach is applied to make the unknown parameters randomly walk in the parameter space. We investigate whether the variables can be improved over priors using the MCMC approach and contribute to the temperature retrieval theoretically. UWBRAD measurements near camp century from 2016 was also treated with the MCMC to examine the framework with scattering effect. The fine scale density fluctuation is an important parameter. It is the most sensitive yet highly unknown parameter in the estimation framework. Including the fine scale density fluctuation greatly improved the retrieval results. The ice sheet vertical temperature profile, especially the 10m temperature, can be well retrieved via the MCMC process. Future retrieval work will apply the Bayesian approach to UWBRAD airborne measurements.
No control genes required: Bayesian analysis of qRT-PCR data.
Matz, Mikhail V; Wright, Rachel M; Scott, James G
2013-01-01
Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process. In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the "classic" analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests. Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R.
Receptive Field Inference with Localized Priors
Park, Mijung; Pillow, Jonathan W.
2011-01-01
The linear receptive field describes a mapping from sensory stimuli to a one-dimensional variable governing a neuron's spike response. However, traditional receptive field estimators such as the spike-triggered average converge slowly and often require large amounts of data. Bayesian methods seek to overcome this problem by biasing estimates towards solutions that are more likely a priori, typically those with small, smooth, or sparse coefficients. Here we introduce a novel Bayesian receptive field estimator designed to incorporate locality, a powerful form of prior information about receptive field structure. The key to our approach is a hierarchical receptive field model that flexibly adapts to localized structure in both spacetime and spatiotemporal frequency, using an inference method known as empirical Bayes. We refer to our method as automatic locality determination (ALD), and show that it can accurately recover various types of smooth, sparse, and localized receptive fields. We apply ALD to neural data from retinal ganglion cells and V1 simple cells, and find it achieves error rates several times lower than standard estimators. Thus, estimates of comparable accuracy can be achieved with substantially less data. Finally, we introduce a computationally efficient Markov Chain Monte Carlo (MCMC) algorithm for fully Bayesian inference under the ALD prior, yielding accurate Bayesian confidence intervals for small or noisy datasets. PMID:22046110
Li, Baoyue; Lingsma, Hester F; Steyerberg, Ewout W; Lesaffre, Emmanuel
2011-05-23
Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR, Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC.Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may be also critical for the convergence of the Markov chain.
Modeling error distributions of growth curve models through Bayesian methods.
Zhang, Zhiyong
2016-06-01
Growth curve models are widely used in social and behavioral sciences. However, typical growth curve models often assume that the errors are normally distributed although non-normal data may be even more common than normal data. In order to avoid possible statistical inference problems in blindly assuming normality, a general Bayesian framework is proposed to flexibly model normal and non-normal data through the explicit specification of the error distributions. A simulation study shows when the distribution of the error is correctly specified, one can avoid the loss in the efficiency of standard error estimates. A real example on the analysis of mathematical ability growth data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 is used to show the application of the proposed methods. Instructions and code on how to conduct growth curve analysis with both normal and non-normal error distributions using the the MCMC procedure of SAS are provided.
Using Stan for Item Response Theory Models
ERIC Educational Resources Information Center
Ames, Allison J.; Au, Chi Hang
2018-01-01
Stan is a flexible probabilistic programming language providing full Bayesian inference through Hamiltonian Monte Carlo algorithms. The benefits of Hamiltonian Monte Carlo include improved efficiency and faster inference, when compared to other MCMC software implementations. Users can interface with Stan through a variety of computing…
Henschel, Volkmar; Engel, Jutta; Hölzel, Dieter; Mansmann, Ulrich
2009-02-10
Multivariate analysis of interval censored event data based on classical likelihood methods is notoriously cumbersome. Likelihood inference for models which additionally include random effects are not available at all. Developed algorithms bear problems for practical users like: matrix inversion, slow convergence, no assessment of statistical uncertainty. MCMC procedures combined with imputation are used to implement hierarchical models for interval censored data within a Bayesian framework. Two examples from clinical practice demonstrate the handling of clustered interval censored event times as well as multilayer random effects for inter-institutional quality assessment. The software developed is called survBayes and is freely available at CRAN. The proposed software supports the solution of complex analyses in many fields of clinical epidemiology as well as health services research.
Bayesian Analysis for Exponential Random Graph Models Using the Adaptive Exchange Sampler.
Jin, Ick Hoon; Yuan, Ying; Liang, Faming
2013-10-01
Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the intractable normalizing constant and model degeneracy. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the intractable normalizing constant and model degeneracy issues encountered in Markov chain Monte Carlo (MCMC) simulations. The adaptive exchange sampler can be viewed as a MCMC extension of the exchange algorithm, and it generates auxiliary networks via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of this algorithm is established under mild conditions. The adaptive exchange sampler is illustrated using a few social networks, including the Florentine business network, molecule synthetic network, and dolphins network. The results indicate that the adaptive exchange algorithm can produce more accurate estimates than approximate exchange algorithms, while maintaining the same computational efficiency.
Optimal Bayesian Adaptive Design for Test-Item Calibration.
van der Linden, Wim J; Ren, Hao
2015-06-01
An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Chunyuan; Stevens, Andrew J.; Chen, Changyou
2016-08-10
Learning the representation of shape cues in 2D & 3D objects for recognition is a fundamental task in computer vision. Deep neural networks (DNNs) have shown promising performance on this task. Due to the large variability of shapes, accurate recognition relies on good estimates of model uncertainty, ignored in traditional training of DNNs, typically learned via stochastic optimization. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (SG-MCMC) to learn weight uncertainty in DNNs. It yields principled Bayesian interpretations for the commonly used Dropout/DropConnect techniques and incorporates them into the SG-MCMC framework. Extensive experiments on 2D &more » 3D shape datasets and various DNN models demonstrate the superiority of the proposed approach over stochastic optimization. Our approach yields higher recognition accuracy when used in conjunction with Dropout and Batch-Normalization.« less
Efficient Bayesian experimental design for contaminant source identification
NASA Astrophysics Data System (ADS)
Zhang, Jiangjiang; Zeng, Lingzao; Chen, Cheng; Chen, Dingjiang; Wu, Laosheng
2015-01-01
In this study, an efficient full Bayesian approach is developed for the optimal sampling well location design and source parameters identification of groundwater contaminants. An information measure, i.e., the relative entropy, is employed to quantify the information gain from concentration measurements in identifying unknown parameters. In this approach, the sampling locations that give the maximum expected relative entropy are selected as the optimal design. After the sampling locations are determined, a Bayesian approach based on Markov Chain Monte Carlo (MCMC) is used to estimate unknown parameters. In both the design and estimation, the contaminant transport equation is required to be solved many times to evaluate the likelihood. To reduce the computational burden, an interpolation method based on the adaptive sparse grid is utilized to construct a surrogate for the contaminant transport equation. The approximated likelihood can be evaluated directly from the surrogate, which greatly accelerates the design and estimation process. The accuracy and efficiency of our approach are demonstrated through numerical case studies. It is shown that the methods can be used to assist in both single sampling location and monitoring network design for contaminant source identifications in groundwater.
2011-01-01
Background Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. Methods We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR, Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. Results The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. Conclusions On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may be also critical for the convergence of the Markov chain. PMID:21605357
Bayesian estimation of differential transcript usage from RNA-seq data.
Papastamoulis, Panagiotis; Rattray, Magnus
2017-11-27
Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU) and targets changes in the relative within gene expression of a transcript. The contribution of this paper is to: (a) extend the use of cjBitSeq to the DTU context, a previously introduced Bayesian model which is originally designed for identifying changes in overall expression levels and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read based model and performs fully Bayesian inference by MCMC sampling on the space of latent state of each transcript per gene. BayesDRIMSeq is a count based model and estimates the Bayes Factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance with DRIMSeq in terms of precision/recall but offer better calibration of False Discovery Rate.
Real-time individual predictions of prostate cancer recurrence using joint models
Taylor, Jeremy M. G.; Park, Yongseok; Ankerst, Donna P.; Proust-Lima, Cecile; Williams, Scott; Kestin, Larry; Bae, Kyoungwha; Pickles, Tom; Sandler, Howard
2012-01-01
Summary Patients who were previously treated for prostate cancer with radiation therapy are monitored at regular intervals using a laboratory test called Prostate Specific Antigen (PSA). If the value of the PSA test starts to rise, this is an indication that the prostate cancer is more likely to recur, and the patient may wish to initiate new treatments. Such patients could be helped in making medical decisions by an accurate estimate of the probability of recurrence of the cancer in the next few years. In this paper, we describe the methodology for giving the probability of recurrence for a new patient, as implemented on a web-based calculator. The methods use a joint longitudinal survival model. The model is developed on a training dataset of 2,386 patients and tested on a dataset of 846 patients. Bayesian estimation methods are used with one Markov chain Monte Carlo (MCMC) algorithm developed for estimation of the parameters from the training dataset and a second quick MCMC developed for prediction of the risk of recurrence that uses the longitudinal PSA measures from a new patient. PMID:23379600
NASA Astrophysics Data System (ADS)
Liu, Y.; Pau, G. S. H.; Finsterle, S.
2015-12-01
Parameter inversion involves inferring the model parameter values based on sparse observations of some observables. To infer the posterior probability distributions of the parameters, Markov chain Monte Carlo (MCMC) methods are typically used. However, the large number of forward simulations needed and limited computational resources limit the complexity of the hydrological model we can use in these methods. In view of this, we studied the implicit sampling (IS) method, an efficient importance sampling technique that generates samples in the high-probability region of the posterior distribution and thus reduces the number of forward simulations that we need to run. For a pilot-point inversion of a heterogeneous permeability field based on a synthetic ponded infiltration experiment simulated with TOUGH2 (a subsurface modeling code), we showed that IS with linear map provides an accurate Bayesian description of the parameterized permeability field at the pilot points with just approximately 500 forward simulations. We further studied the use of surrogate models to improve the computational efficiency of parameter inversion. We implemented two reduced-order models (ROMs) for the TOUGH2 forward model. One is based on polynomial chaos expansion (PCE), of which the coefficients are obtained using the sparse Bayesian learning technique to mitigate the "curse of dimensionality" of the PCE terms. The other model is Gaussian process regression (GPR) for which different covariance, likelihood and inference models are considered. Preliminary results indicate that ROMs constructed based on the prior parameter space perform poorly. It is thus impractical to replace this hydrological model by a ROM directly in a MCMC method. However, the IS method can work with a ROM constructed for parameters in the close vicinity of the maximum a posteriori probability (MAP) estimate. We will discuss the accuracy and computational efficiency of using ROMs in the implicit sampling procedure for the hydrological problem considered. This work was supported, in part, by the U.S. Dept. of Energy under Contract No. DE-AC02-05CH11231
NASA Astrophysics Data System (ADS)
Köpke, Corinna; Irving, James; Elsheikh, Ahmed H.
2018-06-01
Bayesian solutions to geophysical and hydrological inverse problems are dependent upon a forward model linking subsurface physical properties to measured data, which is typically assumed to be perfectly known in the inversion procedure. However, to make the stochastic solution of the inverse problem computationally tractable using methods such as Markov-chain-Monte-Carlo (MCMC), fast approximations of the forward model are commonly employed. This gives rise to model error, which has the potential to significantly bias posterior statistics if not properly accounted for. Here, we present a new methodology for dealing with the model error arising from the use of approximate forward solvers in Bayesian solutions to hydrogeophysical inverse problems. Our approach is geared towards the common case where this error cannot be (i) effectively characterized through some parametric statistical distribution; or (ii) estimated by interpolating between a small number of computed model-error realizations. To this end, we focus on identification and removal of the model-error component of the residual during MCMC using a projection-based approach, whereby the orthogonal basis employed for the projection is derived in each iteration from the K-nearest-neighboring entries in a model-error dictionary. The latter is constructed during the inversion and grows at a specified rate as the iterations proceed. We demonstrate the performance of our technique on the inversion of synthetic crosshole ground-penetrating radar travel-time data considering three different subsurface parameterizations of varying complexity. Synthetic data are generated using the eikonal equation, whereas a straight-ray forward model is assumed for their inversion. In each case, our developed approach enables us to remove posterior bias and obtain a more realistic characterization of uncertainty.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ren, Huiying; Ray, Jaideep; Hou, Zhangshuan
In this study we developed an efficient Bayesian inversion framework for interpreting marine seismic amplitude versus angle (AVA) and controlled source electromagnetic (CSEM) data for marine reservoir characterization. The framework uses a multi-chain Markov-chain Monte Carlo (MCMC) sampler, which is a hybrid of DiffeRential Evolution Adaptive Metropolis (DREAM) and Adaptive Metropolis (AM) samplers. The inversion framework is tested by estimating reservoir-fluid saturations and porosity based on marine seismic and CSEM data. The multi-chain MCMC is scalable in terms of the number of chains, and is useful for computationally demanding Bayesian model calibration in scientific and engineering problems. As a demonstration,more » the approach is used to efficiently and accurately estimate the porosity and saturations in a representative layered synthetic reservoir. The results indicate that the seismic AVA and CSEM joint inversion provides better estimation of reservoir saturations than the seismic AVA-only inversion, especially for the parameters in deep layers. The performance of the inversion approach for various levels of noise in observational data was evaluated – reasonable estimates can be obtained with noise levels up to 25%. Sampling efficiency due to the use of multiple chains was also checked and was found to have almost linear scalability.« less
Bayesian Monte Carlo and Maximum Likelihood Approach for ...
Model uncertainty estimation and risk assessment is essential to environmental management and informed decision making on pollution mitigation strategies. In this study, we apply a probabilistic methodology, which combines Bayesian Monte Carlo simulation and Maximum Likelihood estimation (BMCML) to calibrate a lake oxygen recovery model. We first derive an analytical solution of the differential equation governing lake-averaged oxygen dynamics as a function of time-variable wind speed. Statistical inferences on model parameters and predictive uncertainty are then drawn by Bayesian conditioning of the analytical solution on observed daily wind speed and oxygen concentration data obtained from an earlier study during two recovery periods on a eutrophic lake in upper state New York. The model is calibrated using oxygen recovery data for one year and statistical inferences were validated using recovery data for another year. Compared with essentially two-step, regression and optimization approach, the BMCML results are more comprehensive and performed relatively better in predicting the observed temporal dissolved oxygen levels (DO) in the lake. BMCML also produced comparable calibration and validation results with those obtained using popular Markov Chain Monte Carlo technique (MCMC) and is computationally simpler and easier to implement than the MCMC. Next, using the calibrated model, we derive an optimal relationship between liquid film-transfer coefficien
Ray, Jaideep; Lefantzi, Sophia; Arunajatesan, Srinivasan; ...
2017-09-07
In this paper, we demonstrate a statistical procedure for learning a high-order eddy viscosity model (EVM) from experimental data and using it to improve the predictive skill of a Reynolds-averaged Navier–Stokes (RANS) simulator. The method is tested in a three-dimensional (3D), transonic jet-in-crossflow (JIC) configuration. The process starts with a cubic eddy viscosity model (CEVM) developed for incompressible flows. It is fitted to limited experimental JIC data using shrinkage regression. The shrinkage process removes all the terms from the model, except an intercept, a linear term, and a quadratic one involving the square of the vorticity. The shrunk eddy viscositymore » model is implemented in an RANS simulator and calibrated, using vorticity measurements, to infer three parameters. The calibration is Bayesian and is solved using a Markov chain Monte Carlo (MCMC) method. A 3D probability density distribution for the inferred parameters is constructed, thus quantifying the uncertainty in the estimate. The phenomenal cost of using a 3D flow simulator inside an MCMC loop is mitigated by using surrogate models (“curve-fits”). A support vector machine classifier (SVMC) is used to impose our prior belief regarding parameter values, specifically to exclude nonphysical parameter combinations. The calibrated model is compared, in terms of its predictive skill, to simulations using uncalibrated linear and CEVMs. Finally, we find that the calibrated model, with one quadratic term, is more accurate than the uncalibrated simulator. The model is also checked at a flow condition at which the model was not calibrated.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ray, Jaideep; Lefantzi, Sophia; Arunajatesan, Srinivasan
In this paper, we demonstrate a statistical procedure for learning a high-order eddy viscosity model (EVM) from experimental data and using it to improve the predictive skill of a Reynolds-averaged Navier–Stokes (RANS) simulator. The method is tested in a three-dimensional (3D), transonic jet-in-crossflow (JIC) configuration. The process starts with a cubic eddy viscosity model (CEVM) developed for incompressible flows. It is fitted to limited experimental JIC data using shrinkage regression. The shrinkage process removes all the terms from the model, except an intercept, a linear term, and a quadratic one involving the square of the vorticity. The shrunk eddy viscositymore » model is implemented in an RANS simulator and calibrated, using vorticity measurements, to infer three parameters. The calibration is Bayesian and is solved using a Markov chain Monte Carlo (MCMC) method. A 3D probability density distribution for the inferred parameters is constructed, thus quantifying the uncertainty in the estimate. The phenomenal cost of using a 3D flow simulator inside an MCMC loop is mitigated by using surrogate models (“curve-fits”). A support vector machine classifier (SVMC) is used to impose our prior belief regarding parameter values, specifically to exclude nonphysical parameter combinations. The calibrated model is compared, in terms of its predictive skill, to simulations using uncalibrated linear and CEVMs. Finally, we find that the calibrated model, with one quadratic term, is more accurate than the uncalibrated simulator. The model is also checked at a flow condition at which the model was not calibrated.« less
Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module
NASA Astrophysics Data System (ADS)
Martinez, Gregory D.; McKay, James; Farmer, Ben; Scott, Pat; Roebber, Elinore; Putze, Antje; Conrad, Jan
2017-11-01
We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics.
NASA Astrophysics Data System (ADS)
Smith, T.; Marshall, L.
2007-12-01
In many mountainous regions, the single most important parameter in forecasting the controls on regional water resources is snowpack (Williams et al., 1999). In an effort to bridge the gap between theoretical understanding and functional modeling of snow-driven watersheds, a flexible hydrologic modeling framework is being developed. The aim is to create a suite of models that move from parsimonious structures, concentrated on aggregated watershed response, to those focused on representing finer scale processes and distributed response. This framework will operate as a tool to investigate the link between hydrologic model predictive performance, uncertainty, model complexity, and observable hydrologic processes. Bayesian methods, and particularly Markov chain Monte Carlo (MCMC) techniques, are extremely useful in uncertainty assessment and parameter estimation of hydrologic models. However, these methods have some difficulties in implementation. In a traditional Bayesian setting, it can be difficult to reconcile multiple data types, particularly those offering different spatial and temporal coverage, depending on the model type. These difficulties are also exacerbated by sensitivity of MCMC algorithms to model initialization and complex parameter interdependencies. As a way of circumnavigating some of the computational complications, adaptive MCMC algorithms have been developed to take advantage of the information gained from each successive iteration. Two adaptive algorithms are compared is this study, the Adaptive Metropolis (AM) algorithm, developed by Haario et al (2001), and the Delayed Rejection Adaptive Metropolis (DRAM) algorithm, developed by Haario et al (2006). While neither algorithm is truly Markovian, it has been proven that each satisfies the desired ergodicity and stationarity properties of Markov chains. Both algorithms were implemented as the uncertainty and parameter estimation framework for a conceptual rainfall-runoff model based on the Probability Distributed Model (PDM), developed by Moore (1985). We implement the modeling framework in Stringer Creek watershed in the Tenderfoot Creek Experimental Forest (TCEF), Montana. The snowmelt-driven watershed offers that additional challenge of modeling snow accumulation and melt and current efforts are aimed at developing a temperature- and radiation-index snowmelt model. Auxiliary data available from within TCEF's watersheds are used to support in the understanding of information value as it relates to predictive performance. Because the model is based on lumped parameters, auxiliary data are hard to incorporate directly. However, these additional data offer benefits through the ability to inform prior distributions of the lumped, model parameters. By incorporating data offering different information into the uncertainty assessment process, a cross-validation technique is engaged to better ensure that modeled results reflect real process complexity.
Vehtari, Aki; Mäkinen, Ville-Petteri; Soininen, Pasi; Ingman, Petri; Mäkelä, Sanna M; Savolainen, Markku J; Hannuksela, Minna L; Kaski, Kimmo; Ala-Korpela, Mika
2007-01-01
Background A key challenge in metabonomics is to uncover quantitative associations between multidimensional spectroscopic data and biochemical measures used for disease risk assessment and diagnostics. Here we focus on clinically relevant estimation of lipoprotein lipids by 1H NMR spectroscopy of serum. Results A Bayesian methodology, with a biochemical motivation, is presented for a real 1H NMR metabonomics data set of 75 serum samples. Lipoprotein lipid concentrations were independently obtained for these samples via ultracentrifugation and specific biochemical assays. The Bayesian models were constructed by Markov chain Monte Carlo (MCMC) and they showed remarkably good quantitative performance, the predictive R-values being 0.985 for the very low density lipoprotein triglycerides (VLDL-TG), 0.787 for the intermediate, 0.943 for the low, and 0.933 for the high density lipoprotein cholesterol (IDL-C, LDL-C and HDL-C, respectively). The modelling produced a kernel-based reformulation of the data, the parameters of which coincided with the well-known biochemical characteristics of the 1H NMR spectra; particularly for VLDL-TG and HDL-C the Bayesian methodology was able to clearly identify the most characteristic resonances within the heavily overlapping information in the spectra. For IDL-C and LDL-C the resulting model kernels were more complex than those for VLDL-TG and HDL-C, probably reflecting the severe overlap of the IDL and LDL resonances in the 1H NMR spectra. Conclusion The systematic use of Bayesian MCMC analysis is computationally demanding. Nevertheless, the combination of high-quality quantification and the biochemical rationale of the resulting models is expected to be useful in the field of metabonomics. PMID:17493257
Stochastic static fault slip inversion from geodetic data with non-negativity and bound constraints
NASA Astrophysics Data System (ADS)
Nocquet, J.-M.
2018-07-01
Despite surface displacements observed by geodesy are linear combinations of slip at faults in an elastic medium, determining the spatial distribution of fault slip remains a ill-posed inverse problem. A widely used approach to circumvent the illness of the inversion is to add regularization constraints in terms of smoothing and/or damping so that the linear system becomes invertible. However, the choice of regularization parameters is often arbitrary, and sometimes leads to significantly different results. Furthermore, the resolution analysis is usually empirical and cannot be made independently of the regularization. The stochastic approach of inverse problems provides a rigorous framework where the a priori information about the searched parameters is combined with the observations in order to derive posterior probabilities of the unkown parameters. Here, I investigate an approach where the prior probability density function (pdf) is a multivariate Gaussian function, with single truncation to impose positivity of slip or double truncation to impose positivity and upper bounds on slip for interseismic modelling. I show that the joint posterior pdf is similar to the linear untruncated Gaussian case and can be expressed as a truncated multivariate normal (TMVN) distribution. The TMVN form can then be used to obtain semi-analytical formulae for the single, 2-D or n-D marginal pdf. The semi-analytical formula involves the product of a Gaussian by an integral term that can be evaluated using recent developments in TMVN probabilities calculations. Posterior mean and covariance can also be efficiently derived. I show that the maximum posterior (MAP) can be obtained using a non-negative least-squares algorithm for the single truncated case or using the bounded-variable least-squares algorithm for the double truncated case. I show that the case of independent uniform priors can be approximated using TMVN. The numerical equivalence to Bayesian inversions using Monte Carlo Markov chain (MCMC) sampling is shown for a synthetic example and a real case for interseismic modelling in Central Peru. The TMVN method overcomes several limitations of the Bayesian approach using MCMC sampling. First, the need of computer power is largely reduced. Second, unlike Bayesian MCMC-based approach, marginal pdf, mean, variance or covariance are obtained independently one from each other. Third, the probability and cumulative density functions can be obtained with any density of points. Finally, determining the MAP is extremely fast.
Teaching Markov Chain Monte Carlo: Revealing the Basic Ideas behind the Algorithm
ERIC Educational Resources Information Center
Stewart, Wayne; Stewart, Sepideh
2014-01-01
For many scientists, researchers and students Markov chain Monte Carlo (MCMC) simulation is an important and necessary tool to perform Bayesian analyses. The simulation is often presented as a mathematical algorithm and then translated into an appropriate computer program. However, this can result in overlooking the fundamental and deeper…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Xuesong; Liang, Faming; Yu, Beibei
2011-11-09
Estimating uncertainty of hydrologic forecasting is valuable to water resources and other relevant decision making processes. Recently, Bayesian Neural Networks (BNNs) have been proved powerful tools for quantifying uncertainty of streamflow forecasting. In this study, we propose a Markov Chain Monte Carlo (MCMC) framework to incorporate the uncertainties associated with input, model structure, and parameter into BNNs. This framework allows the structure of the neural networks to change by removing or adding connections between neurons and enables scaling of input data by using rainfall multipliers. The results show that the new BNNs outperform the BNNs that only consider uncertainties associatedmore » with parameter and model structure. Critical evaluation of posterior distribution of neural network weights, number of effective connections, rainfall multipliers, and hyper-parameters show that the assumptions held in our BNNs are not well supported. Further understanding of characteristics of different uncertainty sources and including output error into the MCMC framework are expected to enhance the application of neural networks for uncertainty analysis of hydrologic forecasting.« less
Occupancy in community-level studies
MacKenzie, Darryl I.; Nichols, James; Royle, Andy; Pollock, Kenneth H.; Bailey, Larissa L.; Hines, James
2018-01-01
Another type of multi-species studies, are those focused on community-level metrics such as species richness. In this chapter we detail how some of the single-species occupancy models described in earlier chapters have been applied, or extended, for use in such studies, while accounting for imperfect detection. We highlight how Bayesian methods using MCMC are particularly useful in such settings to easily calculate relevant community-level summaries based on presence/absence data. These modeling approaches can be used to assess richness at a single point in time, or to investigate changes in the species pool over time.
MC3: Multi-core Markov-chain Monte Carlo code
NASA Astrophysics Data System (ADS)
Cubillos, Patricio; Harrington, Joseph; Lust, Nate; Foster, AJ; Stemm, Madison; Loredo, Tom; Stevenson, Kevin; Campo, Chris; Hardin, Matt; Hardy, Ryan
2016-10-01
MC3 (Multi-core Markov-chain Monte Carlo) is a Bayesian statistics tool that can be executed from the shell prompt or interactively through the Python interpreter with single- or multiple-CPU parallel computing. It offers Markov-chain Monte Carlo (MCMC) posterior-distribution sampling for several algorithms, Levenberg-Marquardt least-squares optimization, and uniform non-informative, Jeffreys non-informative, or Gaussian-informative priors. MC3 can share the same value among multiple parameters and fix the value of parameters to constant values, and offers Gelman-Rubin convergence testing and correlated-noise estimation with time-averaging or wavelet-based likelihood estimation methods.
Buckland, Stephen T.; King, Ruth; Toms, Mike P.
2015-01-01
The development of methods for dealing with continuous data with a spike at zero has lagged behind those for overdispersed or zero‐inflated count data. We consider longitudinal ecological data corresponding to an annual average of 26 weekly maximum counts of birds, and are hence effectively continuous, bounded below by zero but also with a discrete mass at zero. We develop a Bayesian hierarchical Tweedie regression model that can directly accommodate the excess number of zeros common to this type of data, whilst accounting for both spatial and temporal correlation. Implementation of the model is conducted in a Markov chain Monte Carlo (MCMC) framework, using reversible jump MCMC to explore uncertainty across both parameter and model spaces. This regression modelling framework is very flexible and removes the need to make strong assumptions about mean‐variance relationships a priori. It can also directly account for the spike at zero, whilst being easily applicable to other types of data and other model formulations. Whilst a correlative study such as this cannot prove causation, our results suggest that an increase in an avian predator may have led to an overall decrease in the number of one of its prey species visiting garden feeding stations in the United Kingdom. This may reflect a change in behaviour of house sparrows to avoid feeding stations frequented by sparrowhawks, or a reduction in house sparrow population size as a result of sparrowhawk increase. PMID:25737026
Exact posterior computation in non-conjugate Gaussian location-scale parameters models
NASA Astrophysics Data System (ADS)
Andrade, J. A. A.; Rathie, P. N.
2017-12-01
In Bayesian analysis the class of conjugate models allows to obtain exact posterior distributions, however this class quite restrictive in the sense that it involves only a few distributions. In fact, most of the practical applications involves non-conjugate models, thus approximate methods, such as the MCMC algorithms, are required. Although these methods can deal with quite complex structures, some practical problems can make their applications quite time demanding, for example, when we use heavy-tailed distributions, convergence may be difficult, also the Metropolis-Hastings algorithm can become very slow, in addition to the extra work inevitably required on choosing efficient candidate generator distributions. In this work, we draw attention to the special functions as a tools for Bayesian computation, we propose an alternative method for obtaining the posterior distribution in Gaussian non-conjugate models in an exact form. We use complex integration methods based on the H-function in order to obtain the posterior distribution and some of its posterior quantities in an explicit computable form. Two examples are provided in order to illustrate the theory.
Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen
2017-10-11
Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. Bayesian approach is utilized to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fitting. Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings comparing to existing methods. We also demonstrate an application of our method in a real clinical trial.
NASA Astrophysics Data System (ADS)
Han, Feng; Zheng, Yi
2018-06-01
Significant Input uncertainty is a major source of error in watershed water quality (WWQ) modeling. It remains challenging to address the input uncertainty in a rigorous Bayesian framework. This study develops the Bayesian Analysis of Input and Parametric Uncertainties (BAIPU), an approach for the joint analysis of input and parametric uncertainties through a tight coupling of Markov Chain Monte Carlo (MCMC) analysis and Bayesian Model Averaging (BMA). The formal likelihood function for this approach is derived considering a lag-1 autocorrelated, heteroscedastic, and Skew Exponential Power (SEP) distributed error model. A series of numerical experiments were performed based on a synthetic nitrate pollution case and on a real study case in the Newport Bay Watershed, California. The Soil and Water Assessment Tool (SWAT) and Differential Evolution Adaptive Metropolis (DREAM(ZS)) were used as the representative WWQ model and MCMC algorithm, respectively. The major findings include the following: (1) the BAIPU can be implemented and used to appropriately identify the uncertain parameters and characterize the predictive uncertainty; (2) the compensation effect between the input and parametric uncertainties can seriously mislead the modeling based management decisions, if the input uncertainty is not explicitly accounted for; (3) the BAIPU accounts for the interaction between the input and parametric uncertainties and therefore provides more accurate calibration and uncertainty results than a sequential analysis of the uncertainties; and (4) the BAIPU quantifies the credibility of different input assumptions on a statistical basis and can be implemented as an effective inverse modeling approach to the joint inference of parameters and inputs.
A Robust Deconvolution Method based on Transdimensional Hierarchical Bayesian Inference
NASA Astrophysics Data System (ADS)
Kolb, J.; Lekic, V.
2012-12-01
Analysis of P-S and S-P conversions allows us to map receiver side crustal and lithospheric structure. This analysis often involves deconvolution of the parent wave field from the scattered wave field as a means of suppressing source-side complexity. A variety of deconvolution techniques exist including damped spectral division, Wiener filtering, iterative time-domain deconvolution, and the multitaper method. All of these techniques require estimates of noise characteristics as input parameters. We present a deconvolution method based on transdimensional Hierarchical Bayesian inference in which both noise magnitude and noise correlation are used as parameters in calculating the likelihood probability distribution. Because the noise for P-S and S-P conversion analysis in terms of receiver functions is a combination of both background noise - which is relatively easy to characterize - and signal-generated noise - which is much more difficult to quantify - we treat measurement errors as an known quantity, characterized by a probability density function whose mean and variance are model parameters. This transdimensional Hierarchical Bayesian approach has been successfully used previously in the inversion of receiver functions in terms of shear and compressional wave speeds of an unknown number of layers [1]. In our method we used a Markov chain Monte Carlo (MCMC) algorithm to find the receiver function that best fits the data while accurately assessing the noise parameters. In order to parameterize the receiver function we model the receiver function as an unknown number of Gaussians of unknown amplitude and width. The algorithm takes multiple steps before calculating the acceptance probability of a new model, in order to avoid getting trapped in local misfit minima. Using both observed and synthetic data, we show that the MCMC deconvolution method can accurately obtain a receiver function as well as an estimate of the noise parameters given the parent and daughter components. Furthermore, we demonstrate that this new approach is far less susceptible to generating spurious features even at high noise levels. Finally, the method yields not only the most-likely receiver function, but also quantifies its full uncertainty. [1] Bodin, T., M. Sambridge, H. Tkalčić, P. Arroucau, K. Gallagher, and N. Rawlinson (2012), Transdimensional inversion of receiver functions and surface wave dispersion, J. Geophys. Res., 117, B02301
Leão, William L.; Chen, Ming-Hui
2017-01-01
A stochastic volatility-in-mean model with correlated errors using the generalized hyperbolic skew Student-t (GHST) distribution provides a robust alternative to the parameter estimation for daily stock returns in the absence of normality. An efficient Markov chain Monte Carlo (MCMC) sampling algorithm is developed for parameter estimation. The deviance information, the Bayesian predictive information and the log-predictive score criterion are used to assess the fit of the proposed model. The proposed method is applied to an analysis of the daily stock return data from the Standard & Poor’s 500 index (S&P 500). The empirical results reveal that the stochastic volatility-in-mean model with correlated errors and GH-ST distribution leads to a significant improvement in the goodness-of-fit for the S&P 500 index returns dataset over the usual normal model. PMID:29333210
Bayesian comparison of protein structures using partial Procrustes distance.
Ejlali, Nasim; Faghihi, Mohammad Reza; Sadeghi, Mehdi
2017-09-26
An important topic in bioinformatics is the protein structure alignment. Some statistical methods have been proposed for this problem, but most of them align two protein structures based on the global geometric information without considering the effect of neighbourhood in the structures. In this paper, we provide a Bayesian model to align protein structures, by considering the effect of both local and global geometric information of protein structures. Local geometric information is incorporated to the model through the partial Procrustes distance of small substructures. These substructures are composed of β-carbon atoms from the side chains. Parameters are estimated using a Markov chain Monte Carlo (MCMC) approach. We evaluate the performance of our model through some simulation studies. Furthermore, we apply our model to a real dataset and assess the accuracy and convergence rate. Results show that our model is much more efficient than previous approaches.
Furtado-Junior, I; Abrunhosa, F A; Holanda, F C A F; Tavares, M C S
2016-06-01
Fishing selectivity of the mangrove crab Ucides cordatus in the north coast of Brazil can be defined as the fisherman's ability to capture and select individuals from a certain size or sex (or a combination of these factors) which suggests an empirical selectivity. Considering this hypothesis, we calculated the selectivity curves for males and females crabs using the logit function of the logistic model in the formulation. The Bayesian inference consisted of obtaining the posterior distribution by applying the Markov chain Monte Carlo (MCMC) method to software R using the OpenBUGS, BRugs, and R2WinBUGS libraries. The estimated results of width average carapace selection for males and females compared with previous studies reporting the average width of the carapace of sexual maturity allow us to confirm the hypothesis that most mature individuals do not suffer from fishing pressure; thus, ensuring their sustainability.
Leão, William L; Abanto-Valle, Carlos A; Chen, Ming-Hui
2017-01-01
A stochastic volatility-in-mean model with correlated errors using the generalized hyperbolic skew Student-t (GHST) distribution provides a robust alternative to the parameter estimation for daily stock returns in the absence of normality. An efficient Markov chain Monte Carlo (MCMC) sampling algorithm is developed for parameter estimation. The deviance information, the Bayesian predictive information and the log-predictive score criterion are used to assess the fit of the proposed model. The proposed method is applied to an analysis of the daily stock return data from the Standard & Poor's 500 index (S&P 500). The empirical results reveal that the stochastic volatility-in-mean model with correlated errors and GH-ST distribution leads to a significant improvement in the goodness-of-fit for the S&P 500 index returns dataset over the usual normal model.
NASA Astrophysics Data System (ADS)
Rosas-Carbajal, Marina; Linde, Niklas; Kalscheuer, Thomas; Vrugt, Jasper A.
2014-03-01
Probabilistic inversion methods based on Markov chain Monte Carlo (MCMC) simulation are well suited to quantify parameter and model uncertainty of nonlinear inverse problems. Yet, application of such methods to CPU-intensive forward models can be a daunting task, particularly if the parameter space is high dimensional. Here, we present a 2-D pixel-based MCMC inversion of plane-wave electromagnetic (EM) data. Using synthetic data, we investigate how model parameter uncertainty depends on model structure constraints using different norms of the likelihood function and the model constraints, and study the added benefits of joint inversion of EM and electrical resistivity tomography (ERT) data. Our results demonstrate that model structure constraints are necessary to stabilize the MCMC inversion results of a highly discretized model. These constraints decrease model parameter uncertainty and facilitate model interpretation. A drawback is that these constraints may lead to posterior distributions that do not fully include the true underlying model, because some of its features exhibit a low sensitivity to the EM data, and hence are difficult to resolve. This problem can be partly mitigated if the plane-wave EM data is augmented with ERT observations. The hierarchical Bayesian inverse formulation introduced and used herein is able to successfully recover the probabilistic properties of the measurement data errors and a model regularization weight. Application of the proposed inversion methodology to field data from an aquifer demonstrates that the posterior mean model realization is very similar to that derived from a deterministic inversion with similar model constraints.
No Control Genes Required: Bayesian Analysis of qRT-PCR Data
Matz, Mikhail V.; Wright, Rachel M.; Scott, James G.
2013-01-01
Background Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process. Results In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the “classic” analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests. Conclusions Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R. PMID:23977043
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jennings, Elise; Wolf, Rachel; Sako, Masao
2016-11-09
Cosmological parameter estimation techniques that robustly account for systematic measurement uncertainties will be crucial for the next generation of cosmological surveys. We present a new analysis method, superABC, for obtaining cosmological constraints from Type Ia supernova (SN Ia) light curves using Approximate Bayesian Computation (ABC) without any likelihood assumptions. The ABC method works by using a forward model simulation of the data where systematic uncertainties can be simulated and marginalized over. A key feature of the method presented here is the use of two distinct metrics, the `Tripp' and `Light Curve' metrics, which allow us to compare the simulated data to the observed data set. The Tripp metric takes as input the parameters of models fit to each light curve with the SALT-II method, whereas the Light Curve metric uses the measured fluxes directly without model fitting. We apply the superABC sampler to a simulated data set ofmore » $$\\sim$$1000 SNe corresponding to the first season of the Dark Energy Survey Supernova Program. Varying $$\\Omega_m, w_0, \\alpha$$ and $$\\beta$$ and a magnitude offset parameter, with no systematics we obtain $$\\Delta(w_0) = w_0^{\\rm true} - w_0^{\\rm best \\, fit} = -0.036\\pm0.109$$ (a $$\\sim11$$% 1$$\\sigma$$ uncertainty) using the Tripp metric and $$\\Delta(w_0) = -0.055\\pm0.068$$ (a $$\\sim7$$% 1$$\\sigma$$ uncertainty) using the Light Curve metric. Including 1% calibration uncertainties in four passbands, adding 4 more parameters, we obtain $$\\Delta(w_0) = -0.062\\pm0.132$$ (a $$\\sim14$$% 1$$\\sigma$$ uncertainty) using the Tripp metric. Overall we find a $17$% increase in the uncertainty on $$w_0$$ with systematics compared to without. We contrast this with a MCMC approach where systematic effects are approximately included. We find that the MCMC method slightly underestimates the impact of calibration uncertainties for this simulated data set.« less
Parameter Estimation of Partial Differential Equation Models.
Xun, Xiaolei; Cao, Jiguo; Mallick, Bani; Carroll, Raymond J; Maity, Arnab
2013-01-01
Partial differential equation (PDE) models are commonly used to model complex dynamic systems in applied sciences such as biology and finance. The forms of these PDE models are usually proposed by experts based on their prior knowledge and understanding of the dynamic system. Parameters in PDE models often have interesting scientific interpretations, but their values are often unknown, and need to be estimated from the measurements of the dynamic system in the present of measurement errors. Most PDEs used in practice have no analytic solutions, and can only be solved with numerical methods. Currently, methods for estimating PDE parameters require repeatedly solving PDEs numerically under thousands of candidate parameter values, and thus the computational load is high. In this article, we propose two methods to estimate parameters in PDE models: a parameter cascading method and a Bayesian approach. In both methods, the underlying dynamic process modeled with the PDE model is represented via basis function expansion. For the parameter cascading method, we develop two nested levels of optimization to estimate the PDE parameters. For the Bayesian method, we develop a joint model for data and the PDE, and develop a novel hierarchical model allowing us to employ Markov chain Monte Carlo (MCMC) techniques to make posterior inference. Simulation studies show that the Bayesian method and parameter cascading method are comparable, and both outperform other available methods in terms of estimation accuracy. The two methods are demonstrated by estimating parameters in a PDE model from LIDAR data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hou Fengji; Hogg, David W.; Goodman, Jonathan
Markov chain Monte Carlo (MCMC) proves to be powerful for Bayesian inference and in particular for exoplanet radial velocity fitting because MCMC provides more statistical information and makes better use of data than common approaches like chi-square fitting. However, the nonlinear density functions encountered in these problems can make MCMC time-consuming. In this paper, we apply an ensemble sampler respecting affine invariance to orbital parameter extraction from radial velocity data. This new sampler has only one free parameter, and does not require much tuning for good performance, which is important for automatization. The autocorrelation time of this sampler is approximatelymore » the same for all parameters and far smaller than Metropolis-Hastings, which means it requires many fewer function calls to produce the same number of independent samples. The affine-invariant sampler speeds up MCMC by hundreds of times compared with Metropolis-Hastings in the same computing situation. This novel sampler would be ideal for projects involving large data sets such as statistical investigations of planet distribution. The biggest obstacle to ensemble samplers is the existence of multiple local optima; we present a clustering technique to deal with local optima by clustering based on the likelihood of the walkers in the ensemble. We demonstrate the effectiveness of the sampler on real radial velocity data.« less
NASA Astrophysics Data System (ADS)
Cubillos, Patricio; Harrington, Joseph; Blecic, Jasmina; Stemm, Madison M.; Lust, Nate B.; Foster, Andrew S.; Rojo, Patricio M.; Loredo, Thomas J.
2014-11-01
Multi-wavelength secondary-eclipse and transit depths probe the thermo-chemical properties of exoplanets. In recent years, several research groups have developed retrieval codes to analyze the existing data and study the prospects of future facilities. However, the scientific community has limited access to these packages. Here we premiere the open-source Bayesian Atmospheric Radiative Transfer (BART) code. We discuss the key aspects of the radiative-transfer algorithm and the statistical package. The radiation code includes line databases for all HITRAN molecules, high-temperature H2O, TiO, and VO, and includes a preprocessor for adding additional line databases without recompiling the radiation code. Collision-induced absorption lines are available for H2-H2 and H2-He. The parameterized thermal and molecular abundance profiles can be modified arbitrarily without recompilation. The generated spectra are integrated over arbitrary bandpasses for comparison to data. BART's statistical package, Multi-core Markov-chain Monte Carlo (MC3), is a general-purpose MCMC module. MC3 implements the Differental-evolution Markov-chain Monte Carlo algorithm (ter Braak 2006, 2009). MC3 converges 20-400 times faster than the usual Metropolis-Hastings MCMC algorithm, and in addition uses the Message Passing Interface (MPI) to parallelize the MCMC chains. We apply the BART retrieval code to the HD 209458b data set to estimate the planet's temperature profile and molecular abundances. This work was supported by NASA Planetary Atmospheres grant NNX12AI69G and NASA Astrophysics Data Analysis Program grant NNX13AF38G. JB holds a NASA Earth and Space Science Fellowship.
Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST.
Baele, Guy; Lemey, Philippe; Rambaut, Andrew; Suchard, Marc A
2017-06-15
Advances in sequencing technology continue to deliver increasingly large molecular sequence datasets that are often heavily partitioned in order to accurately model the underlying evolutionary processes. In phylogenetic analyses, partitioning strategies involve estimating conditionally independent models of molecular evolution for different genes and different positions within those genes, requiring a large number of evolutionary parameters that have to be estimated, leading to an increased computational burden for such analyses. The past two decades have also seen the rise of multi-core processors, both in the central processing unit (CPU) and Graphics processing unit processor markets, enabling massively parallel computations that are not yet fully exploited by many software packages for multipartite analyses. We here propose a Markov chain Monte Carlo (MCMC) approach using an adaptive multivariate transition kernel to estimate in parallel a large number of parameters, split across partitioned data, by exploiting multi-core processing. Across several real-world examples, we demonstrate that our approach enables the estimation of these multipartite parameters more efficiently than standard approaches that typically use a mixture of univariate transition kernels. In one case, when estimating the relative rate parameter of the non-coding partition in a heterochronous dataset, MCMC integration efficiency improves by > 14-fold. Our implementation is part of the BEAST code base, a widely used open source software package to perform Bayesian phylogenetic inference. guy.baele@kuleuven.be. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
NASA Astrophysics Data System (ADS)
Miftahurrohmah, Brina; Iriawan, Nur; Fithriasari, Kartika
2017-06-01
Stocks are known as the financial instruments traded in the capital market which have a high level of risk. Their risks are indicated by their uncertainty of their return which have to be accepted by investors in the future. The higher the risk to be faced, the higher the return would be gained. Therefore, the measurements need to be made against the risk. Value at Risk (VaR) as the most popular risk measurement method, is frequently ignore when the pattern of return is not uni-modal Normal. The calculation of the risks using VaR method with the Normal Mixture Autoregressive (MNAR) approach has been considered. This paper proposes VaR method couple with the Mixture Laplace Autoregressive (MLAR) that would be implemented for analysing the first three biggest capitalization Islamic stock return in JII, namely PT. Astra International Tbk (ASII), PT. Telekomunikasi Indonesia Tbk (TLMK), and PT. Unilever Indonesia Tbk (UNVR). Parameter estimation is performed by employing Bayesian Markov Chain Monte Carlo (MCMC) approaches.
Convergence analysis of surrogate-based methods for Bayesian inverse problems
NASA Astrophysics Data System (ADS)
Yan, Liang; Zhang, Yuan-Xiang
2017-12-01
The major challenges in the Bayesian inverse problems arise from the need for repeated evaluations of the forward model, as required by Markov chain Monte Carlo (MCMC) methods for posterior sampling. Many attempts at accelerating Bayesian inference have relied on surrogates for the forward model, typically constructed through repeated forward simulations that are performed in an offline phase. Although such approaches can be quite effective at reducing computation cost, there has been little analysis of the approximation on posterior inference. In this work, we prove error bounds on the Kullback-Leibler (KL) distance between the true posterior distribution and the approximation based on surrogate models. Our rigorous error analysis show that if the forward model approximation converges at certain rate in the prior-weighted L 2 norm, then the posterior distribution generated by the approximation converges to the true posterior at least two times faster in the KL sense. The error bound on the Hellinger distance is also provided. To provide concrete examples focusing on the use of the surrogate model based methods, we present an efficient technique for constructing stochastic surrogate models to accelerate the Bayesian inference approach. The Christoffel least squares algorithms, based on generalized polynomial chaos, are used to construct a polynomial approximation of the forward solution over the support of the prior distribution. The numerical strategy and the predicted convergence rates are then demonstrated on the nonlinear inverse problems, involving the inference of parameters appearing in partial differential equations.
Emulation: A fast stochastic Bayesian method to eliminate model space
NASA Astrophysics Data System (ADS)
Roberts, Alan; Hobbs, Richard; Goldstein, Michael
2010-05-01
Joint inversion of large 3D datasets has been the goal of geophysicists ever since the datasets first started to be produced. There are two broad approaches to this kind of problem, traditional deterministic inversion schemes and more recently developed Bayesian search methods, such as MCMC (Markov Chain Monte Carlo). However, using both these kinds of schemes has proved prohibitively expensive, both in computing power and time cost, due to the normally very large model space which needs to be searched using forward model simulators which take considerable time to run. At the heart of strategies aimed at accomplishing this kind of inversion is the question of how to reliably and practicably reduce the size of the model space in which the inversion is to be carried out. Here we present a practical Bayesian method, known as emulation, which can address this issue. Emulation is a Bayesian technique used with considerable success in a number of technical fields, such as in astronomy, where the evolution of the universe has been modelled using this technique, and in the petroleum industry where history matching is carried out of hydrocarbon reservoirs. The method of emulation involves building a fast-to-compute uncertainty-calibrated approximation to a forward model simulator. We do this by modelling the output data from a number of forward simulator runs by a computationally cheap function, and then fitting the coefficients defining this function to the model parameters. By calibrating the error of the emulator output with respect to the full simulator output, we can use this to screen out large areas of model space which contain only implausible models. For example, starting with what may be considered a geologically reasonable prior model space of 10000 models, using the emulator we can quickly show that only models which lie within 10% of that model space actually produce output data which is plausibly similar in character to an observed dataset. We can thus much more tightly constrain the input model space for a deterministic inversion or MCMC method. By using this technique jointly on several datasets (specifically seismic, gravity, and magnetotelluric (MT) describing the same region), we can include in our modelling uncertainties in the data measurements, the relationships between the various physical parameters involved, as well as the model representation uncertainty, and at the same time further reduce the range of plausible models to several percent of the original model space. Being stochastic in nature, the output posterior parameter distributions also allow our understanding of/beliefs about a geological region can be objectively updated, with full assessment of uncertainties, and so the emulator is also an inversion-type tool in it's own right, with the advantage (as with any Bayesian method) that our uncertainties from all sources (both data and model) can be fully evaluated.
NASA Astrophysics Data System (ADS)
Ravenna, Matteo; Lebedev, Sergei
2018-04-01
Seismic anisotropy provides important information on the deformation history of the Earth's interior. Rayleigh and Love surface-waves are sensitive to and can be used to determine both radial and azimuthal shear-wave anisotropies at depth, but parameter trade-offs give rise to substantial model non-uniqueness. Here, we explore the trade-offs between isotropic and anisotropic structure parameters and present a suite of methods for the inversion of surface-wave, phase-velocity curves for radial and azimuthal anisotropies. One Markov chain Monte Carlo (McMC) implementation inverts Rayleigh and Love dispersion curves for a radially anisotropic shear velocity profile of the crust and upper mantle. Another McMC implementation inverts Rayleigh phase velocities and their azimuthal anisotropy for profiles of vertically polarized shear velocity and its depth-dependent azimuthal anisotropy. The azimuthal anisotropy inversion is fully non-linear, with the forward problem solved numerically at different azimuths for every model realization, which ensures that any linearization biases are avoided. The computations are performed in parallel, in order to reduce the computing time. The often challenging issue of data noise estimation is addressed by means of a Hierarchical Bayesian approach, with the variance of the noise treated as an unknown during the radial anisotropy inversion. In addition to the McMC inversions, we also present faster, non-linear gradient-search inversions for the same anisotropic structure. The results of the two approaches are mutually consistent; the advantage of the McMC inversions is that they provide a measure of uncertainty of the models. Applying the method to broad-band data from the Baikal-central Mongolia region, we determine radial anisotropy from the crust down to the transition-zone depths. Robust negative anisotropy (Vsh < Vsv) in the asthenosphere, at 100-300 km depths, presents strong new evidence for a vertical component of asthenospheric flow. This is consistent with an upward flow from below the thick lithosphere of the Siberian Craton to below the thinner lithosphere of central Mongolia, likely to give rise to decompression melting and the scattered, sporadic volcanism observed in the Baikal Rift area, as proposed previously. Inversion of phase-velocity data from west-central Italy for azimuthal anisotropy reveals a clear change in the shear-wave fast-propagation direction at 70-100 km depths, near the lithosphere-asthenosphere boundary. The orientation of the fabric in the lithosphere is roughly E-W, parallel to the direction of stretching over the last 10 m.y. The orientation of the fabric in the asthenosphere is NW-SE, matching the fast directions inferred from shear-wave splitting and probably indicating the direction of the asthenospheric flow.
NASA Technical Reports Server (NTRS)
DeLannoy, Gabrielle J. M.; Reichle, Rolf H.; Vrugt, Jasper A.
2013-01-01
Uncertainties in L-band (1.4 GHz) radiative transfer modeling (RTM) affect the simulation of brightness temperatures (Tb) over land and the inversion of satellite-observed Tb into soil moisture retrievals. In particular, accurate estimates of the microwave soil roughness, vegetation opacity and scattering albedo for large-scale applications are difficult to obtain from field studies and often lack an uncertainty estimate. Here, a Markov Chain Monte Carlo (MCMC) simulation method is used to determine satellite-scale estimates of RTM parameters and their posterior uncertainty by minimizing the misfit between long-term averages and standard deviations of simulated and observed Tb at a range of incidence angles, at horizontal and vertical polarization, and for morning and evening overpasses. Tb simulations are generated with the Goddard Earth Observing System (GEOS-5) and confronted with Tb observations from the Soil Moisture Ocean Salinity (SMOS) mission. The MCMC algorithm suggests that the relative uncertainty of the RTM parameter estimates is typically less than 25 of the maximum a posteriori density (MAP) parameter value. Furthermore, the actual root-mean-square-differences in long-term Tb averages and standard deviations are found consistent with the respective estimated total simulation and observation error standard deviations of m3.1K and s2.4K. It is also shown that the MAP parameter values estimated through MCMC simulation are in close agreement with those obtained with Particle Swarm Optimization (PSO).
NASA Astrophysics Data System (ADS)
Irving, J.; Koepke, C.; Elsheikh, A. H.
2017-12-01
Bayesian solutions to geophysical and hydrological inverse problems are dependent upon a forward process model linking subsurface parameters to measured data, which is typically assumed to be known perfectly in the inversion procedure. However, in order to make the stochastic solution of the inverse problem computationally tractable using, for example, Markov-chain-Monte-Carlo (MCMC) methods, fast approximations of the forward model are commonly employed. This introduces model error into the problem, which has the potential to significantly bias posterior statistics and hamper data integration efforts if not properly accounted for. Here, we present a new methodology for addressing the issue of model error in Bayesian solutions to hydrogeophysical inverse problems that is geared towards the common case where these errors cannot be effectively characterized globally through some parametric statistical distribution or locally based on interpolation between a small number of computed realizations. Rather than focusing on the construction of a global or local error model, we instead work towards identification of the model-error component of the residual through a projection-based approach. In this regard, pairs of approximate and detailed model runs are stored in a dictionary that grows at a specified rate during the MCMC inversion procedure. At each iteration, a local model-error basis is constructed for the current test set of model parameters using the K-nearest neighbour entries in the dictionary, which is then used to separate the model error from the other error sources before computing the likelihood of the proposed set of model parameters. We demonstrate the performance of our technique on the inversion of synthetic crosshole ground-penetrating radar traveltime data for three different subsurface parameterizations of varying complexity. The synthetic data are generated using the eikonal equation, whereas a straight-ray forward model is assumed in the inversion procedure. In each case, the developed model-error approach enables to remove posterior bias and obtain a more realistic characterization of uncertainty.
Ali, Syed Shujait; Yu, Yan; Pfosser, Martin; Wetschnig, Wolfgang
2012-01-01
Background and Aims Subfamily Hyacinthoideae (Hyacinthaceae) comprises more than 400 species. Members are distributed in sub-Saharan Africa, Madagascar, India, eastern Asia, the Mediterranean region and Eurasia. Hyacinthoideae, like many other plant lineages, show disjunct distribution patterns. The aim of this study was to reconstruct the biogeographical history of Hyacinthoideae based on phylogenetic analyses, to find the possible ancestral range of Hyacinthoideae and to identify factors responsible for the current disjunct distribution pattern. Methods Parsimony and Bayesian approaches were applied to obtain phylogenetic trees, based on sequences of the trnL-F region. Biogeographical inferences were obtained by applying statistical dispersal-vicariance analysis (S-DIVA) and Bayesian binary MCMC (BBM) analysis implemented in RASP (Reconstruct Ancestral State in Phylogenies). Key Results S-DIVA and BBM analyses suggest that the Hyacinthoideae clade seem to have originated in sub-Saharan Africa. Dispersal and vicariance played vital roles in creating the disjunct distribution pattern. Results also suggest an early dispersal to the Mediterranean region, and thus the northward route (from sub-Saharan Africa to Mediterranean) of dispersal is plausible for members of subfamily Hyacinthoideae. Conclusions Biogeographical analyses reveal that subfamily Hyacinthoideae has originated in sub-Saharan Africa. S-DIVA indicates an early dispersal event to the Mediterranean region followed by a vicariance event, which resulted in Hyacintheae and Massonieae tribes. By contrast, BBM analysis favours dispersal to the Mediterranean region, eastern Asia and Europe. Biogeographical analysis suggests that sub-Saharan Africa and the Mediterranean region have played vital roles as centres of diversification and radiation within subfamily Hyacinthoideae. In this bimodal distribution pattern, sub-Saharan Africa is the primary centre of diversity and the Mediterranean region is the secondary centre of diversity. Sub-Saharan Africa was the source area for radiation toward Madagascar, the Mediterranean region and India. Radiations occurred from the Mediterranean region to eastern Asia, Europe, western Asia and India. PMID:22039008
Bayesian inversion using a geologically realistic and discrete model space
NASA Astrophysics Data System (ADS)
Jaeggli, C.; Julien, S.; Renard, P.
2017-12-01
Since the early days of groundwater modeling, inverse methods play a crucial role. Many research and engineering groups aim to infer extensive knowledge of aquifer parameters from a sparse set of observations. Despite decades of dedicated research on this topic, there are still several major issues to be solved. In the hydrogeological framework, one is often confronted with underground structures that present very sharp contrasts of geophysical properties. In particular, subsoil structures such as karst conduits, channels, faults, or lenses, strongly influence groundwater flow and transport behavior of the underground. For this reason it can be essential to identify their location and shape very precisely. Unfortunately, when inverse methods are specially trained to consider such complex features, their computation effort often becomes unaffordably high. The following work is an attempt to solve this dilemma. We present a new method that is, in some sense, a compromise between the ergodicity of Markov chain Monte Carlo (McMC) methods and the efficient handling of data by the ensemble based Kalmann filters. The realistic and complex random fields are generated by a Multiple-Point Statistics (MPS) tool. Nonetheless, it is applicable with any conditional geostatistical simulation tool. Furthermore, the algorithm is independent of any parametrization what becomes most important when two parametric systems are equivalent (permeability and resistivity, speed and slowness, etc.). When compared to two existing McMC schemes, the computational effort was divided by a factor of 12.
Accurate Phylogenetic Tree Reconstruction from Quartets: A Heuristic Approach
Reaz, Rezwana; Bayzid, Md. Shamsuzzoha; Rahman, M. Sohel
2014-01-01
Supertree methods construct trees on a set of taxa (species) combining many smaller trees on the overlapping subsets of the entire set of taxa. A ‘quartet’ is an unrooted tree over taxa, hence the quartet-based supertree methods combine many -taxon unrooted trees into a single and coherent tree over the complete set of taxa. Quartet-based phylogeny reconstruction methods have been receiving considerable attentions in the recent years. An accurate and efficient quartet-based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum likelihood or Bayesian MCMC analyses), without being as computationally intensive. In this paper, we present a novel and highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to evaluate the accuracy and scalability of our approach on both simulated and biological datasets. PMID:25117474
Swallow, Ben; Buckland, Stephen T; King, Ruth; Toms, Mike P
2016-03-01
The development of methods for dealing with continuous data with a spike at zero has lagged behind those for overdispersed or zero-inflated count data. We consider longitudinal ecological data corresponding to an annual average of 26 weekly maximum counts of birds, and are hence effectively continuous, bounded below by zero but also with a discrete mass at zero. We develop a Bayesian hierarchical Tweedie regression model that can directly accommodate the excess number of zeros common to this type of data, whilst accounting for both spatial and temporal correlation. Implementation of the model is conducted in a Markov chain Monte Carlo (MCMC) framework, using reversible jump MCMC to explore uncertainty across both parameter and model spaces. This regression modelling framework is very flexible and removes the need to make strong assumptions about mean-variance relationships a priori. It can also directly account for the spike at zero, whilst being easily applicable to other types of data and other model formulations. Whilst a correlative study such as this cannot prove causation, our results suggest that an increase in an avian predator may have led to an overall decrease in the number of one of its prey species visiting garden feeding stations in the United Kingdom. This may reflect a change in behaviour of house sparrows to avoid feeding stations frequented by sparrowhawks, or a reduction in house sparrow population size as a result of sparrowhawk increase. © 2015 The Author. Biometrical Journal published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Stochastic static fault slip inversion from geodetic data with non-negativity and bounds constraints
NASA Astrophysics Data System (ADS)
Nocquet, J.-M.
2018-04-01
Despite surface displacements observed by geodesy are linear combinations of slip at faults in an elastic medium, determining the spatial distribution of fault slip remains a ill-posed inverse problem. A widely used approach to circumvent the illness of the inversion is to add regularization constraints in terms of smoothing and/or damping so that the linear system becomes invertible. However, the choice of regularization parameters is often arbitrary, and sometimes leads to significantly different results. Furthermore, the resolution analysis is usually empirical and cannot be made independently of the regularization. The stochastic approach of inverse problems (Tarantola & Valette 1982; Tarantola 2005) provides a rigorous framework where the a priori information about the searched parameters is combined with the observations in order to derive posterior probabilities of the unkown parameters. Here, I investigate an approach where the prior probability density function (pdf) is a multivariate Gaussian function, with single truncation to impose positivity of slip or double truncation to impose positivity and upper bounds on slip for interseismic modeling. I show that the joint posterior pdf is similar to the linear untruncated Gaussian case and can be expressed as a Truncated Multi-Variate Normal (TMVN) distribution. The TMVN form can then be used to obtain semi-analytical formulas for the single, two-dimensional or n-dimensional marginal pdf. The semi-analytical formula involves the product of a Gaussian by an integral term that can be evaluated using recent developments in TMVN probabilities calculations (e.g. Genz & Bretz 2009). Posterior mean and covariance can also be efficiently derived. I show that the Maximum Posterior (MAP) can be obtained using a Non-Negative Least-Squares algorithm (Lawson & Hanson 1974) for the single truncated case or using the Bounded-Variable Least-Squares algorithm (Stark & Parker 1995) for the double truncated case. I show that the case of independent uniform priors can be approximated using TMVN. The numerical equivalence to Bayesian inversions using Monte Carlo Markov Chain (MCMC) sampling is shown for a synthetic example and a real case for interseismic modeling in Central Peru. The TMVN method overcomes several limitations of the Bayesian approach using MCMC sampling. First, the need of computer power is largely reduced. Second, unlike Bayesian MCMC based approach, marginal pdf, mean, variance or covariance are obtained independently one from each other. Third, the probability and cumulative density functions can be obtained with any density of points. Finally, determining the Maximum Posterior (MAP) is extremely fast.
RadVel: The Radial Velocity Modeling Toolkit
NASA Astrophysics Data System (ADS)
Fulton, Benjamin J.; Petigura, Erik A.; Blunt, Sarah; Sinukoff, Evan
2018-04-01
RadVel is an open-source Python package for modeling Keplerian orbits in radial velocity (RV) timeseries. RadVel provides a convenient framework to fit RVs using maximum a posteriori optimization and to compute robust confidence intervals by sampling the posterior probability density via Markov Chain Monte Carlo (MCMC). RadVel allows users to float or fix parameters, impose priors, and perform Bayesian model comparison. We have implemented real-time MCMC convergence tests to ensure adequate sampling of the posterior. RadVel can output a number of publication-quality plots and tables. Users may interface with RadVel through a convenient command-line interface or directly from Python. The code is object-oriented and thus naturally extensible. We encourage contributions from the community. Documentation is available at http://radvel.readthedocs.io.
A Bayesian method for detecting pairwise associations in compositional data
Ventz, Steffen; Huttenhower, Curtis
2017-01-01
Compositional data consist of vectors of proportions normalized to a constant sum from a basis of unobserved counts. The sum constraint makes inference on correlations between unconstrained features challenging due to the information loss from normalization. However, such correlations are of long-standing interest in fields including ecology. We propose a novel Bayesian framework (BAnOCC: Bayesian Analysis of Compositional Covariance) to estimate a sparse precision matrix through a LASSO prior. The resulting posterior, generated by MCMC sampling, allows uncertainty quantification of any function of the precision matrix, including the correlation matrix. We also use a first-order Taylor expansion to approximate the transformation from the unobserved counts to the composition in order to investigate what characteristics of the unobserved counts can make the correlations more or less difficult to infer. On simulated datasets, we show that BAnOCC infers the true network as well as previous methods while offering the advantage of posterior inference. Larger and more realistic simulated datasets further showed that BAnOCC performs well as measured by type I and type II error rates. Finally, we apply BAnOCC to a microbial ecology dataset from the Human Microbiome Project, which in addition to reproducing established ecological results revealed unique, competition-based roles for Proteobacteria in multiple distinct habitats. PMID:29140991
Honest Importance Sampling with Multiple Markov Chains
Tan, Aixin; Doss, Hani; Hobert, James P.
2017-01-01
Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π1, is used to estimate an expectation with respect to another, π. The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π1 is replaced by a Harris ergodic Markov chain with invariant density π1, then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π1, …, πk, are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection. PMID:28701855
Honest Importance Sampling with Multiple Markov Chains.
Tan, Aixin; Doss, Hani; Hobert, James P
2015-01-01
Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π 1 , is used to estimate an expectation with respect to another, π . The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π 1 is replaced by a Harris ergodic Markov chain with invariant density π 1 , then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π 1 , …, π k , are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection.
Park, Subok; Clarkson, Eric
2010-01-01
The Bayesian ideal observer is optimal among all observers and sets an absolute upper bound for the performance of any observer in classification tasks [Van Trees, Detection, Estimation, and Modulation Theory, Part I (Academic, 1968).]. Therefore, the ideal observer should be used for objective image quality assessment whenever possible. However, computation of ideal-observer performance is difficult in practice because this observer requires the full description of unknown, statistical properties of high-dimensional, complex data arising in real life problems. Previously, Markov-chain Monte Carlo (MCMC) methods were developed by Kupinski et al. [J. Opt. Soc. Am. A 20, 430(2003) ] and by Park et al. [J. Opt. Soc. Am. A 24, B136 (2007) and IEEE Trans. Med. Imaging 28, 657 (2009) ] to estimate the performance of the ideal observer and the channelized ideal observer (CIO), respectively, in classification tasks involving non-Gaussian random backgrounds. However, both algorithms had the disadvantage of long computation times. We propose a fast MCMC for real-time estimation of the likelihood ratio for the CIO. Our simulation results show that our method has the potential to speed up ideal-observer performance in tasks involving complex data when efficient channels are used for the CIO. PMID:19884916
Fortunato, Laura; Holden, Clare; Mace, Ruth
2006-12-01
Significant amounts of wealth have been exchanged as part of marriage settlements throughout history. Although various models have been proposed for interpreting these practices, their development over time has not been investigated systematically. In this paper we use a Bayesian MCMC phylogenetic comparative approach to reconstruct the evolution of two forms of wealth transfers at marriage, dowry and bridewealth, for 51 Indo-European cultural groups. Results indicate that dowry is more likely to have been the ancestral practice, and that a minimum of four changes to bridewealth is necessary to explain the observed distribution of the two states across the cultural groups.
Bayesian tomography by interacting Markov chains
NASA Astrophysics Data System (ADS)
Romary, T.
2017-12-01
In seismic tomography, we seek to determine the velocity of the undergound from noisy first arrival travel time observations. In most situations, this is an ill posed inverse problem that admits several unperfect solutions. Given an a priori distribution over the parameters of the velocity model, the Bayesian formulation allows to state this problem as a probabilistic one, with a solution under the form of a posterior distribution. The posterior distribution is generally high dimensional and may exhibit multimodality. Moreover, as it is known only up to a constant, the only sensible way to addressthis problem is to try to generate simulations from the posterior. The natural tools to perform these simulations are Monte Carlo Markov chains (MCMC). Classical implementations of MCMC algorithms generally suffer from slow mixing: the generated states are slow to enter the stationary regime, that is to fit the observations, and when one mode of the posterior is eventually identified, it may become difficult to visit others. Using a varying temperature parameter relaxing the constraint on the data may help to enter the stationary regime. Besides, the sequential nature of MCMC makes them ill fitted toparallel implementation. Running a large number of chains in parallel may be suboptimal as the information gathered by each chain is not mutualized. Parallel tempering (PT) can be seen as a first attempt to make parallel chains at different temperatures communicate but only exchange information between current states. In this talk, I will show that PT actually belongs to a general class of interacting Markov chains algorithm. I will also show that this class enables to design interacting schemes that can take advantage of the whole history of the chain, by authorizing exchanges toward already visited states. The algorithms will be illustrated with toy examples and an application to first arrival traveltime tomography.
Estimating the Properties of Hard X-Ray Solar Flares by Constraining Model Parameters
NASA Technical Reports Server (NTRS)
Ireland, J.; Tolbert, A. K.; Schwartz, R. A.; Holman, G. D.; Dennis, B. R.
2013-01-01
We wish to better constrain the properties of solar flares by exploring how parameterized models of solar flares interact with uncertainty estimation methods. We compare four different methods of calculating uncertainty estimates in fitting parameterized models to Ramaty High Energy Solar Spectroscopic Imager X-ray spectra, considering only statistical sources of error. Three of the four methods are based on estimating the scale-size of the minimum in a hypersurface formed by the weighted sum of the squares of the differences between the model fit and the data as a function of the fit parameters, and are implemented as commonly practiced. The fourth method is also based on the difference between the data and the model, but instead uses Bayesian data analysis and Markov chain Monte Carlo (MCMC) techniques to calculate an uncertainty estimate. Two flare spectra are modeled: one from the Geostationary Operational Environmental Satellite X1.3 class flare of 2005 January 19, and the other from the X4.8 flare of 2002 July 23.We find that the four methods give approximately the same uncertainty estimates for the 2005 January 19 spectral fit parameters, but lead to very different uncertainty estimates for the 2002 July 23 spectral fit. This is because each method implements different analyses of the hypersurface, yielding method-dependent results that can differ greatly depending on the shape of the hypersurface. The hypersurface arising from the 2005 January 19 analysis is consistent with a normal distribution; therefore, the assumptions behind the three non- Bayesian uncertainty estimation methods are satisfied and similar estimates are found. The 2002 July 23 analysis shows that the hypersurface is not consistent with a normal distribution, indicating that the assumptions behind the three non-Bayesian uncertainty estimation methods are not satisfied, leading to differing estimates of the uncertainty. We find that the shape of the hypersurface is crucial in understanding the output from each uncertainty estimation technique, and that a crucial factor determining the shape of hypersurface is the location of the low-energy cutoff relative to energies where the thermal emission dominates. The Bayesian/MCMC approach also allows us to provide detailed information on probable values of the low-energy cutoff, Ec, a crucial parameter in defining the energy content of the flare-accelerated electrons. We show that for the 2002 July 23 flare data, there is a 95% probability that Ec lies below approximately 40 keV, and a 68% probability that it lies in the range 7-36 keV. Further, the low-energy cutoff is more likely to be in the range 25-35 keV than in any other 10 keV wide energy range. The low-energy cutoff for the 2005 January 19 flare is more tightly constrained to 107 +/- 4 keV with 68% probability.
NASA Astrophysics Data System (ADS)
Gaál, Ladislav; Szolgay, Ján; Kohnová, Silvia; Hlavčová, Kamila; Viglione, Alberto
2010-01-01
The paper deals with at-site flood frequency estimation in the case when also information on hydrological events from the past with extraordinary magnitude are available. For the joint frequency analysis of systematic observations and historical data, respectively, the Bayesian framework is chosen, which, through adequately defined likelihood functions, allows for incorporation of different sources of hydrological information, e.g., maximum annual flood peaks, historical events as well as measurement errors. The distribution of the parameters of the fitted distribution function and the confidence intervals of the flood quantiles are derived by means of the Markov chain Monte Carlo simulation (MCMC) technique. The paper presents a sensitivity analysis related to the choice of the most influential parameters of the statistical model, which are the length of the historical period
Bayesian Analysis of Biogeography when the Number of Areas is Large
Landis, Michael J.; Matzke, Nicholas J.; Moore, Brian R.; Huelsenbeck, John P.
2013-01-01
Historical biogeography is increasingly studied from an explicitly statistical perspective, using stochastic models to describe the evolution of species range as a continuous-time Markov process of dispersal between and extinction within a set of discrete geographic areas. The main constraint of these methods is the computational limit on the number of areas that can be specified. We propose a Bayesian approach for inferring biogeographic history that extends the application of biogeographic models to the analysis of more realistic problems that involve a large number of areas. Our solution is based on a “data-augmentation” approach, in which we first populate the tree with a history of biogeographic events that is consistent with the observed species ranges at the tips of the tree. We then calculate the likelihood of a given history by adopting a mechanistic interpretation of the instantaneous-rate matrix, which specifies both the exponential waiting times between biogeographic events and the relative probabilities of each biogeographic change. We develop this approach in a Bayesian framework, marginalizing over all possible biogeographic histories using Markov chain Monte Carlo (MCMC). Besides dramatically increasing the number of areas that can be accommodated in a biogeographic analysis, our method allows the parameters of a given biogeographic model to be estimated and different biogeographic models to be objectively compared. Our approach is implemented in the program, BayArea. [ancestral area analysis; Bayesian biogeographic inference; data augmentation; historical biogeography; Markov chain Monte Carlo.] PMID:23736102
An efficient Bayesian meta-analysis approach for studying cross-phenotype genetic associations
Majumdar, Arunabha; Haldar, Tanushree; Bhattacharya, Sourabh; Witte, John S.
2018-01-01
Simultaneous analysis of genetic associations with multiple phenotypes may reveal shared genetic susceptibility across traits (pleiotropy). For a locus exhibiting overall pleiotropy, it is important to identify which specific traits underlie this association. We propose a Bayesian meta-analysis approach (termed CPBayes) that uses summary-level data across multiple phenotypes to simultaneously measure the evidence of aggregate-level pleiotropic association and estimate an optimal subset of traits associated with the risk locus. This method uses a unified Bayesian statistical framework based on a spike and slab prior. CPBayes performs a fully Bayesian analysis by employing the Markov Chain Monte Carlo (MCMC) technique Gibbs sampling. It takes into account heterogeneity in the size and direction of the genetic effects across traits. It can be applied to both cohort data and separate studies of multiple traits having overlapping or non-overlapping subjects. Simulations show that CPBayes can produce higher accuracy in the selection of associated traits underlying a pleiotropic signal than the subset-based meta-analysis ASSET. We used CPBayes to undertake a genome-wide pleiotropic association study of 22 traits in the large Kaiser GERA cohort and detected six independent pleiotropic loci associated with at least two phenotypes. This includes a locus at chromosomal region 1q24.2 which exhibits an association simultaneously with the risk of five different diseases: Dermatophytosis, Hemorrhoids, Iron Deficiency, Osteoporosis and Peripheral Vascular Disease. We provide an R-package ‘CPBayes’ implementing the proposed method. PMID:29432419
NASA Astrophysics Data System (ADS)
Ojha, Maheswar; Maiti, Saumen
2016-03-01
A novel approach based on the concept of Bayesian neural network (BNN) has been implemented for classifying sediment boundaries using downhole log data obtained during Integrated Ocean Drilling Program (IODP) Expedition 323 in the Bering Sea slope region. The Bayesian framework in conjunction with Markov Chain Monte Carlo (MCMC)/hybrid Monte Carlo (HMC) learning paradigm has been applied to constrain the lithology boundaries using density, density porosity, gamma ray, sonic P-wave velocity and electrical resistivity at the Hole U1344A. We have demonstrated the effectiveness of our supervised classification methodology by comparing our findings with a conventional neural network and a Bayesian neural network optimized by scaled conjugate gradient method (SCG), and tested the robustness of the algorithm in the presence of red noise in the data. The Bayesian results based on the HMC algorithm (BNN.HMC) resolve detailed finer structures at certain depths in addition to main lithology such as silty clay, diatom clayey silt and sandy silt. Our method also recovers the lithology information from a depth ranging between 615 and 655 m Wireline log Matched depth below Sea Floor of no core recovery zone. Our analyses demonstrate that the BNN based approach renders robust means for the classification of complex lithology successions at the Hole U1344A, which could be very useful for other studies and understanding the oceanic crustal inhomogeneity and structural discontinuities.
Two new methods to fit models for network meta-analysis with random inconsistency effects.
Law, Martin; Jackson, Dan; Turner, Rebecca; Rhodes, Kirsty; Viechtbauer, Wolfgang
2016-07-28
Meta-analysis is a valuable tool for combining evidence from multiple studies. Network meta-analysis is becoming more widely used as a means to compare multiple treatments in the same analysis. However, a network meta-analysis may exhibit inconsistency, whereby the treatment effect estimates do not agree across all trial designs, even after taking between-study heterogeneity into account. We propose two new estimation methods for network meta-analysis models with random inconsistency effects. The model we consider is an extension of the conventional random-effects model for meta-analysis to the network meta-analysis setting and allows for potential inconsistency using random inconsistency effects. Our first new estimation method uses a Bayesian framework with empirically-based prior distributions for both the heterogeneity and the inconsistency variances. We fit the model using importance sampling and thereby avoid some of the difficulties that might be associated with using Markov Chain Monte Carlo (MCMC). However, we confirm the accuracy of our importance sampling method by comparing the results to those obtained using MCMC as the gold standard. The second new estimation method we describe uses a likelihood-based approach, implemented in the metafor package, which can be used to obtain (restricted) maximum-likelihood estimates of the model parameters and profile likelihood confidence intervals of the variance components. We illustrate the application of the methods using two contrasting examples. The first uses all-cause mortality as an outcome, and shows little evidence of between-study heterogeneity or inconsistency. The second uses "ear discharge" as an outcome, and exhibits substantial between-study heterogeneity and inconsistency. Both new estimation methods give results similar to those obtained using MCMC. The extent of heterogeneity and inconsistency should be assessed and reported in any network meta-analysis. Our two new methods can be used to fit models for network meta-analysis with random inconsistency effects. They are easily implemented using the accompanying R code in the Additional file 1. Using these estimation methods, the extent of inconsistency can be assessed and reported.
Bayesian analysis of physiologically based toxicokinetic and toxicodynamic models.
Hack, C Eric
2006-04-17
Physiologically based toxicokinetic (PBTK) and toxicodynamic (TD) models of bromate in animals and humans would improve our ability to accurately estimate the toxic doses in humans based on available animal studies. These mathematical models are often highly parameterized and must be calibrated in order for the model predictions of internal dose to adequately fit the experimentally measured doses. Highly parameterized models are difficult to calibrate and it is difficult to obtain accurate estimates of uncertainty or variability in model parameters with commonly used frequentist calibration methods, such as maximum likelihood estimation (MLE) or least squared error approaches. The Bayesian approach called Markov chain Monte Carlo (MCMC) analysis can be used to successfully calibrate these complex models. Prior knowledge about the biological system and associated model parameters is easily incorporated in this approach in the form of prior parameter distributions, and the distributions are refined or updated using experimental data to generate posterior distributions of parameter estimates. The goal of this paper is to give the non-mathematician a brief description of the Bayesian approach and Markov chain Monte Carlo analysis, how this technique is used in risk assessment, and the issues associated with this approach.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Konomi, Bledar A.; Karagiannis, Georgios; Sarkar, Avik
2014-05-16
Computer experiments (numerical simulations) are widely used in scientific research to study and predict the behavior of complex systems, which usually have responses consisting of a set of distinct outputs. The computational cost of the simulations at high resolution are often expensive and become impractical for parametric studies at different input values. To overcome these difficulties we develop a Bayesian treed multivariate Gaussian process (BTMGP) as an extension of the Bayesian treed Gaussian process (BTGP) in order to model and evaluate a multivariate process. A suitable choice of covariance function and the prior distributions facilitates the different Markov chain Montemore » Carlo (MCMC) movements. We utilize this model to sequentially sample the input space for the most informative values, taking into account model uncertainty and expertise gained. A simulation study demonstrates the use of the proposed method and compares it with alternative approaches. We apply the sequential sampling technique and BTMGP to model the multiphase flow in a full scale regenerator of a carbon capture unit. The application presented in this paper is an important tool for research into carbon dioxide emissions from thermal power plants.« less
2011-01-01
Background Molecular marker information is a common source to draw inferences about the relationship between genetic and phenotypic variation. Genetic effects are often modelled as additively acting marker allele effects. The true mode of biological action can, of course, be different from this plain assumption. One possibility to better understand the genetic architecture of complex traits is to include intra-locus (dominance) and inter-locus (epistasis) interaction of alleles as well as the additive genetic effects when fitting a model to a trait. Several Bayesian MCMC approaches exist for the genome-wide estimation of genetic effects with high accuracy of genetic value prediction. Including pairwise interaction for thousands of loci would probably go beyond the scope of such a sampling algorithm because then millions of effects are to be estimated simultaneously leading to months of computation time. Alternative solving strategies are required when epistasis is studied. Methods We extended a fast Bayesian method (fBayesB), which was previously proposed for a purely additive model, to include non-additive effects. The fBayesB approach was used to estimate genetic effects on the basis of simulated datasets. Different scenarios were simulated to study the loss of accuracy of prediction, if epistatic effects were not simulated but modelled and vice versa. Results If 23 QTL were simulated to cause additive and dominance effects, both fBayesB and a conventional MCMC sampler BayesB yielded similar results in terms of accuracy of genetic value prediction and bias of variance component estimation based on a model including additive and dominance effects. Applying fBayesB to data with epistasis, accuracy could be improved by 5% when all pairwise interactions were modelled as well. The accuracy decreased more than 20% if genetic variation was spread over 230 QTL. In this scenario, accuracy based on modelling only additive and dominance effects was generally superior to that of the complex model including epistatic effects. Conclusions This simulation study showed that the fBayesB approach is convenient for genetic value prediction. Jointly estimating additive and non-additive effects (especially dominance) has reasonable impact on the accuracy of prediction and the proportion of genetic variation assigned to the additive genetic source. PMID:21867519
Estimating Model Probabilities using Thermodynamic Markov Chain Monte Carlo Methods
NASA Astrophysics Data System (ADS)
Ye, M.; Liu, P.; Beerli, P.; Lu, D.; Hill, M. C.
2014-12-01
Markov chain Monte Carlo (MCMC) methods are widely used to evaluate model probability for quantifying model uncertainty. In a general procedure, MCMC simulations are first conducted for each individual model, and MCMC parameter samples are then used to approximate marginal likelihood of the model by calculating the geometric mean of the joint likelihood of the model and its parameters. It has been found the method of evaluating geometric mean suffers from the numerical problem of low convergence rate. A simple test case shows that even millions of MCMC samples are insufficient to yield accurate estimation of the marginal likelihood. To resolve this problem, a thermodynamic method is used to have multiple MCMC runs with different values of a heating coefficient between zero and one. When the heating coefficient is zero, the MCMC run is equivalent to a random walk MC in the prior parameter space; when the heating coefficient is one, the MCMC run is the conventional one. For a simple case with analytical form of the marginal likelihood, the thermodynamic method yields more accurate estimate than the method of using geometric mean. This is also demonstrated for a case of groundwater modeling with consideration of four alternative models postulated based on different conceptualization of a confining layer. This groundwater example shows that model probabilities estimated using the thermodynamic method are more reasonable than those obtained using the geometric method. The thermodynamic method is general, and can be used for a wide range of environmental problem for model uncertainty quantification.
Estimation of under-reporting in epidemics using approximations.
Gamado, Kokouvi; Streftaris, George; Zachary, Stan
2017-06-01
Under-reporting in epidemics, when it is ignored, leads to under-estimation of the infection rate and therefore of the reproduction number. In the case of stochastic models with temporal data, a usual approach for dealing with such issues is to apply data augmentation techniques through Bayesian methodology. Departing from earlier literature approaches implemented using reversible jump Markov chain Monte Carlo (RJMCMC) techniques, we make use of approximations to obtain faster estimation with simple MCMC. Comparisons among the methods developed here, and with the RJMCMC approach, are carried out and highlight that approximation-based methodology offers useful alternative inference tools for large epidemics, with a good trade-off between time cost and accuracy.
EXOFIT: orbital parameters of extrasolar planets from radial velocities
NASA Astrophysics Data System (ADS)
Balan, Sreekumar T.; Lahav, Ofer
2009-04-01
Retrieval of orbital parameters of extrasolar planets poses considerable statistical challenges. Due to sparse sampling, measurement errors, parameters degeneracy and modelling limitations, there are no unique values of basic parameters, such as period and eccentricity. Here, we estimate the orbital parameters from radial velocity data in a Bayesian framework by utilizing Markov Chain Monte Carlo (MCMC) simulations with the Metropolis-Hastings algorithm. We follow a methodology recently proposed by Gregory and Ford. Our implementation of MCMC is based on the object-oriented approach outlined by Graves. We make our resulting code, EXOFIT, publicly available with this paper. It can search for either one or two planets as illustrated on mock data. As an example we re-analysed the orbital solution of companions to HD 187085 and HD 159868 from the published radial velocity data. We confirm the degeneracy reported for orbital parameters of the companion to HD 187085, and show that a low-eccentricity orbit is more probable for this planet. For HD 159868, we obtained slightly different orbital solution and a relatively high `noise' factor indicating the presence of an unaccounted signal in the radial velocity data. EXOFIT is designed in such a way that it can be extended for a variety of probability models, including different Bayesian priors.
NASA Astrophysics Data System (ADS)
Zaib Jadoon, Khan; Umer Altaf, Muhammad; McCabe, Matthew Francis; Hoteit, Ibrahim; Muhammad, Nisar; Moghadas, Davood; Weihermüller, Lutz
2017-10-01
A substantial interpretation of electromagnetic induction (EMI) measurements requires quantifying optimal model parameters and uncertainty of a nonlinear inverse problem. For this purpose, an adaptive Bayesian Markov chain Monte Carlo (MCMC) algorithm is used to assess multi-orientation and multi-offset EMI measurements in an agriculture field with non-saline and saline soil. In MCMC the posterior distribution is computed using Bayes' rule. The electromagnetic forward model based on the full solution of Maxwell's equations was used to simulate the apparent electrical conductivity measured with the configurations of EMI instrument, the CMD Mini-Explorer. Uncertainty in the parameters for the three-layered earth model are investigated by using synthetic data. Our results show that in the scenario of non-saline soil, the parameters of layer thickness as compared to layers electrical conductivity are not very informative and are therefore difficult to resolve. Application of the proposed MCMC-based inversion to field measurements in a drip irrigation system demonstrates that the parameters of the model can be well estimated for the saline soil as compared to the non-saline soil, and provides useful insight about parameter uncertainty for the assessment of the model outputs.
Shao, Kan; Small, Mitchell J
2011-10-01
A methodology is presented for assessing the information value of an additional dosage experiment in existing bioassay studies. The analysis demonstrates the potential reduction in the uncertainty of toxicity metrics derived from expanded studies, providing insights for future studies. Bayesian methods are used to fit alternative dose-response models using Markov chain Monte Carlo (MCMC) simulation for parameter estimation and Bayesian model averaging (BMA) is used to compare and combine the alternative models. BMA predictions for benchmark dose (BMD) are developed, with uncertainty in these predictions used to derive the lower bound BMDL. The MCMC and BMA results provide a basis for a subsequent Monte Carlo analysis that backcasts the dosage where an additional test group would have been most beneficial in reducing the uncertainty in the BMD prediction, along with the magnitude of the expected uncertainty reduction. Uncertainty reductions are measured in terms of reduced interval widths of predicted BMD values and increases in BMDL values that occur as a result of this reduced uncertainty. The methodology is illustrated using two existing data sets for TCDD carcinogenicity, fitted with two alternative dose-response models (logistic and quantal-linear). The example shows that an additional dose at a relatively high value would have been most effective for reducing the uncertainty in BMA BMD estimates, with predicted reductions in the widths of uncertainty intervals of approximately 30%, and expected increases in BMDL values of 5-10%. The results demonstrate that dose selection for studies that subsequently inform dose-response models can benefit from consideration of how these models will be fit, combined, and interpreted. © 2011 Society for Risk Analysis.
Impact of Federal drug law enforcement on the supply of heroin in Australia.
Smithson, Michael; McFadden, Michael; Mwesigye, Sue-Ellen
2005-08-01
To conduct an empirical investigation of the efficacy of law enforcement in reducing heroin supply in Australia. Specifically, this paper addresses the question of whether heroin purity levels in the Australian Capital Territory (ACT) could be predicted by heroin seizures at the national level by the Australian Federal Police (AFP) in the preceding year. We considered two forms of evidence. First, a Bayesian Markov Chain Monte Carlo (MCMC) change-point model was used to discover (a) if there was a substantial increase in heroin seizures by the AFP, (b) when the increase began and (c) whether it occurred after increased funding to the Australian Federal Police for the purpose of drug law enforcement. Second, standard time-series methods were used to ascertain whether fluctuations in heroin seizure weights or the frequency of large-scale seizures after the aforementioned changes in seizure levels predicted fluctuations in heroin purity levels in the ACT after autocorrelation had been removed from the purity series. A Bayesian MCMC change-point model supported the hypothesis that heroin seizures rapidly increased about a year before the estimated decline in heroin purity and after the increased funding of AFP. The autoregression models suggested that 10-20% of the variance in the residuals of the heroin purity series was predicted by appropriately lagged residuals of the seizure-number and log-weight series, after autocorrelation had been removed. The overall results are consistent with the hypothesis that large-scale heroin seizures by the AFP reduce street-level heroin supply a year or so later, although the short-term dynamics suggest an 'opponent' response to residual fluctuations in seizures. To our knowledge, this is first time a connection has been identified between large-scale heroin seizures and street-level supply.
Time-varying nonstationary multivariate risk analysis using a dynamic Bayesian copula
NASA Astrophysics Data System (ADS)
Sarhadi, Ali; Burn, Donald H.; Concepción Ausín, María.; Wiper, Michael P.
2016-03-01
A time-varying risk analysis is proposed for an adaptive design framework in nonstationary conditions arising from climate change. A Bayesian, dynamic conditional copula is developed for modeling the time-varying dependence structure between mixed continuous and discrete multiattributes of multidimensional hydrometeorological phenomena. Joint Bayesian inference is carried out to fit the marginals and copula in an illustrative example using an adaptive, Gibbs Markov Chain Monte Carlo (MCMC) sampler. Posterior mean estimates and credible intervals are provided for the model parameters and the Deviance Information Criterion (DIC) is used to select the model that best captures different forms of nonstationarity over time. This study also introduces a fully Bayesian, time-varying joint return period for multivariate time-dependent risk analysis in nonstationary environments. The results demonstrate that the nature and the risk of extreme-climate multidimensional processes are changed over time under the impact of climate change, and accordingly the long-term decision making strategies should be updated based on the anomalies of the nonstationary environment.
Bayesian Treed Calibration: An Application to Carbon Capture With AX Sorbent
DOE Office of Scientific and Technical Information (OSTI.GOV)
Konomi, Bledar A.; Karagiannis, Georgios; Lai, Kevin
2017-01-02
In cases where field or experimental measurements are not available, computer models can model real physical or engineering systems to reproduce their outcomes. They are usually calibrated in light of experimental data to create a better representation of the real system. Statistical methods, based on Gaussian processes, for calibration and prediction have been especially important when the computer models are expensive and experimental data limited. In this paper, we develop the Bayesian treed calibration (BTC) as an extension of standard Gaussian process calibration methods to deal with non-stationarity computer models and/or their discrepancy from the field (or experimental) data. Ourmore » proposed method partitions both the calibration and observable input space, based on a binary tree partitioning, into sub-regions where existing model calibration methods can be applied to connect a computer model with the real system. The estimation of the parameters in the proposed model is carried out using Markov chain Monte Carlo (MCMC) computational techniques. Different strategies have been applied to improve mixing. We illustrate our method in two artificial examples and a real application that concerns the capture of carbon dioxide with AX amine based sorbents. The source code and the examples analyzed in this paper are available as part of the supplementary materials.« less
NASA Astrophysics Data System (ADS)
Elshall, A. S.; Ye, M.; Niu, G. Y.; Barron-Gafford, G.
2016-12-01
Bayesian multimodel inference is increasingly being used in hydrology. Estimating Bayesian model evidence (BME) is of central importance in many Bayesian multimodel analysis such as Bayesian model averaging and model selection. BME is the overall probability of the model in reproducing the data, accounting for the trade-off between the goodness-of-fit and the model complexity. Yet estimating BME is challenging, especially for high dimensional problems with complex sampling space. Estimating BME using the Monte Carlo numerical methods is preferred, as the methods yield higher accuracy than semi-analytical solutions (e.g. Laplace approximations, BIC, KIC, etc.). However, numerical methods are prone the numerical demons arising from underflow of round off errors. Although few studies alluded to this issue, to our knowledge this is the first study that illustrates these numerical demons. We show that the precision arithmetic can become a threshold on likelihood values and Metropolis acceptance ratio, which results in trimming parameter regions (when likelihood function is less than the smallest floating point number that a computer can represent) and corrupting of the empirical measures of the random states of the MCMC sampler (when using log-likelihood function). We consider two of the most powerful numerical estimators of BME that are the path sampling method of thermodynamic integration (TI) and the importance sampling method of steppingstone sampling (SS). We also consider the two most widely used numerical estimators, which are the prior sampling arithmetic mean (AS) and posterior sampling harmonic mean (HM). We investigate the vulnerability of these four estimators to the numerical demons. Interesting, the most biased estimator, namely the HM, turned out to be the least vulnerable. While it is generally assumed that AM is a bias-free estimator that will always approximate the true BME by investing in computational effort, we show that arithmetic underflow can hamper AM resulting in severe underestimation of BME. TI turned out to be the most vulnerable, resulting in BME overestimation. Finally, we show how SS can be largely invariant to rounding errors, yielding the most accurate and computational efficient results. These research results are useful for MC simulations to estimate Bayesian model evidence.
Bayesian analysis of biogeography when the number of areas is large.
Landis, Michael J; Matzke, Nicholas J; Moore, Brian R; Huelsenbeck, John P
2013-11-01
Historical biogeography is increasingly studied from an explicitly statistical perspective, using stochastic models to describe the evolution of species range as a continuous-time Markov process of dispersal between and extinction within a set of discrete geographic areas. The main constraint of these methods is the computational limit on the number of areas that can be specified. We propose a Bayesian approach for inferring biogeographic history that extends the application of biogeographic models to the analysis of more realistic problems that involve a large number of areas. Our solution is based on a "data-augmentation" approach, in which we first populate the tree with a history of biogeographic events that is consistent with the observed species ranges at the tips of the tree. We then calculate the likelihood of a given history by adopting a mechanistic interpretation of the instantaneous-rate matrix, which specifies both the exponential waiting times between biogeographic events and the relative probabilities of each biogeographic change. We develop this approach in a Bayesian framework, marginalizing over all possible biogeographic histories using Markov chain Monte Carlo (MCMC). Besides dramatically increasing the number of areas that can be accommodated in a biogeographic analysis, our method allows the parameters of a given biogeographic model to be estimated and different biogeographic models to be objectively compared. Our approach is implemented in the program, BayArea.
NASA Astrophysics Data System (ADS)
Marie, S.; Irving, J. D.; Looms, M. C.; Nielsen, L.; Holliger, K.
2011-12-01
Geophysical methods such as ground-penetrating radar (GPR) can provide valuable information on the hydrological properties of the vadose zone. In particular, there is evidence to suggest that the stochastic inversion of such data may allow for significant reductions in uncertainty regarding subsurface van-Genuchten-Mualem (VGM) parameters, which characterize unsaturated hydrodynamic behaviour as defined by the combination of the water retention and hydraulic conductivity functions. A significant challenge associated with the use of geophysical methods in a hydrological context is that they generally exhibit an indirect and/or weak sensitivity to the hydraulic parameters of interest. A novel and increasingly popular means of addressing this issue involves the acquisition of geophysical data in a time-lapse fashion while changes occur in the hydrological condition of the probed subsurface region. Another significant challenge when attempting to use geophysical data for the estimation of subsurface hydrological properties is the inherent non-linearity and non-uniqueness of the corresponding inverse problems. Stochastic inversion approaches have the advantage of providing a comprehensive exploration of the model space, which makes them ideally suited for addressing such issues. In this work, we present the stochastic inversion of time-lapse zero-offset-profile (ZOP) crosshole GPR traveltime data, collected during a forced infiltration experiment at the Arreneas field site in Denmark, in order to estimate subsurface VGM parameters and their corresponding uncertainties. We do this using a Bayesian Markov-chain-Monte-Carlo (MCMC) inversion approach. We find that the Bayesian-MCMC methodology indeed allows for a substantial refinement in the inferred posterior parameter distributions of the VGM parameters as compared to the corresponding priors. To further understand the potential impact on capturing the underlying hydrological behaviour, we also explore how the posterior VGM parameter distributions affect the hydrodynamic characteristics. In doing so, we find clear evidence that the approach pursued in this study allows for effective characterization of the hydrological behaviour of the probed subsurface region.
Cosmological parameter estimation using Particle Swarm Optimization
NASA Astrophysics Data System (ADS)
Prasad, J.; Souradeep, T.
2014-03-01
Constraining parameters of a theoretical model from observational data is an important exercise in cosmology. There are many theoretically motivated models, which demand greater number of cosmological parameters than the standard model of cosmology uses, and make the problem of parameter estimation challenging. It is a common practice to employ Bayesian formalism for parameter estimation for which, in general, likelihood surface is probed. For the standard cosmological model with six parameters, likelihood surface is quite smooth and does not have local maxima, and sampling based methods like Markov Chain Monte Carlo (MCMC) method are quite successful. However, when there are a large number of parameters or the likelihood surface is not smooth, other methods may be more effective. In this paper, we have demonstrated application of another method inspired from artificial intelligence, called Particle Swarm Optimization (PSO) for estimating cosmological parameters from Cosmic Microwave Background (CMB) data taken from the WMAP satellite.
RadVel: General toolkit for modeling Radial Velocities
NASA Astrophysics Data System (ADS)
Fulton, Benjamin J.; Petigura, Erik A.; Blunt, Sarah; Sinukoff, Evan
2018-01-01
RadVel models Keplerian orbits in radial velocity (RV) time series. The code is written in Python with a fast Kepler's equation solver written in C. It provides a framework for fitting RVs using maximum a posteriori optimization and computing robust confidence intervals by sampling the posterior probability density via Markov Chain Monte Carlo (MCMC). RadVel can perform Bayesian model comparison and produces publication quality plots and LaTeX tables.
Grummer, Jared A; Bryson, Robert W; Reeder, Tod W
2014-03-01
Current molecular methods of species delimitation are limited by the types of species delimitation models and scenarios that can be tested. Bayes factors allow for more flexibility in testing non-nested species delimitation models and hypotheses of individual assignment to alternative lineages. Here, we examined the efficacy of Bayes factors in delimiting species through simulations and empirical data from the Sceloporus scalaris species group. Marginal-likelihood scores of competing species delimitation models, from which Bayes factor values were compared, were estimated with four different methods: harmonic mean estimation (HME), smoothed harmonic mean estimation (sHME), path-sampling/thermodynamic integration (PS), and stepping-stone (SS) analysis. We also performed model selection using a posterior simulation-based analog of the Akaike information criterion through Markov chain Monte Carlo analysis (AICM). Bayes factor species delimitation results from the empirical data were then compared with results from the reversible-jump MCMC (rjMCMC) coalescent-based species delimitation method Bayesian Phylogenetics and Phylogeography (BP&P). Simulation results show that HME and sHME perform poorly compared with PS and SS marginal-likelihood estimators when identifying the true species delimitation model. Furthermore, Bayes factor delimitation (BFD) of species showed improved performance when species limits are tested by reassigning individuals between species, as opposed to either lumping or splitting lineages. In the empirical data, BFD through PS and SS analyses, as well as the rjMCMC method, each provide support for the recognition of all scalaris group taxa as independent evolutionary lineages. Bayes factor species delimitation and BP&P also support the recognition of three previously undescribed lineages. In both simulated and empirical data sets, harmonic and smoothed harmonic mean marginal-likelihood estimators provided much higher marginal-likelihood estimates than PS and SS estimators. The AICM displayed poor repeatability in both simulated and empirical data sets, and produced inconsistent model rankings across replicate runs with the empirical data. Our results suggest that species delimitation through the use of Bayes factors with marginal-likelihood estimates via PS or SS analyses provide a useful and complementary alternative to existing species delimitation methods.
NASA Astrophysics Data System (ADS)
Berradja, Khadidja; Boughanmi, Nabil
2016-09-01
In dynamic cardiac PET FDG studies the assessment of myocardial metabolic rate of glucose (MMRG) requires the knowledge of the blood input function (IF). IF can be obtained by manual or automatic blood sampling and cross calibrated with PET. These procedures are cumbersome, invasive and generate uncertainties. The IF is contaminated by spillover of radioactivity from the adjacent myocardium and this could cause important error in the estimated MMRG. In this study, we show that the IF can be extracted from the images in a rat heart study with 18F-fluorodeoxyglucose (18F-FDG) by means of Independent Component Analysis (ICA) based on Bayesian theory and Markov Chain Monte Carlo (MCMC) sampling method (BICA). Images of the heart from rats were acquired with the Sherbrooke small animal PET scanner. A region of interest (ROI) was drawn around the rat image and decomposed into blood and tissue using BICA. The Statistical study showed that there is a significant difference (p < 0.05) between MMRG obtained with IF extracted by BICA with respect to IF extracted from measured images corrupted with spillover.
Model-based Clustering of Categorical Time Series with Multinomial Logit Classification
NASA Astrophysics Data System (ADS)
Frühwirth-Schnatter, Sylvia; Pamminger, Christoph; Winter-Ebmer, Rudolf; Weber, Andrea
2010-09-01
A common problem in many areas of applied statistics is to identify groups of similar time series in a panel of time series. However, distance-based clustering methods cannot easily be extended to time series data, where an appropriate distance-measure is rather difficult to define, particularly for discrete-valued time series. Markov chain clustering, proposed by Pamminger and Frühwirth-Schnatter [6], is an approach for clustering discrete-valued time series obtained by observing a categorical variable with several states. This model-based clustering method is based on finite mixtures of first-order time-homogeneous Markov chain models. In order to further explain group membership we present an extension to the approach of Pamminger and Frühwirth-Schnatter [6] by formulating a probabilistic model for the latent group indicators within the Bayesian classification rule by using a multinomial logit model. The parameters are estimated for a fixed number of clusters within a Bayesian framework using an Markov chain Monte Carlo (MCMC) sampling scheme representing a (full) Gibbs-type sampler which involves only draws from standard distributions. Finally, an application to a panel of Austrian wage mobility data is presented which leads to an interesting segmentation of the Austrian labour market.
NASA Astrophysics Data System (ADS)
Liu, Boda; Liang, Yan
2017-04-01
Markov chain Monte Carlo (MCMC) simulation is a powerful statistical method in solving inverse problems that arise from a wide range of applications. In Earth sciences applications of MCMC simulations are primarily in the field of geophysics. The purpose of this study is to introduce MCMC methods to geochemical inverse problems related to trace element fractionation during mantle melting. MCMC methods have several advantages over least squares methods in deciphering melting processes from trace element abundances in basalts and mantle rocks. Here we use an MCMC method to invert for extent of melting, fraction of melt present during melting, and extent of chemical disequilibrium between the melt and residual solid from REE abundances in clinopyroxene in abyssal peridotites from Mid-Atlantic Ridge, Central Indian Ridge, Southwest Indian Ridge, Lena Trough, and American-Antarctic Ridge. We consider two melting models: one with exact analytical solution and the other without. We solve the latter numerically in a chain of melting models according to the Metropolis-Hastings algorithm. The probability distribution of inverted melting parameters depends on assumptions of the physical model, knowledge of mantle source composition, and constraints from the REE data. Results from MCMC inversion are consistent with and provide more reliable uncertainty estimates than results based on nonlinear least squares inversion. We show that chemical disequilibrium is likely to play an important role in fractionating LREE in residual peridotites during partial melting beneath mid-ocean ridge spreading centers. MCMC simulation is well suited for more complicated but physically more realistic melting problems that do not have analytical solutions.
What are hierarchical models and how do we analyze them?
Royle, Andy
2016-01-01
In this chapter we provide a basic definition of hierarchical models and introduce the two canonical hierarchical models in this book: site occupancy and N-mixture models. The former is a hierarchical extension of logistic regression and the latter is a hierarchical extension of Poisson regression. We introduce basic concepts of probability modeling and statistical inference including likelihood and Bayesian perspectives. We go through the mechanics of maximizing the likelihood and characterizing the posterior distribution by Markov chain Monte Carlo (MCMC) methods. We give a general perspective on topics such as model selection and assessment of model fit, although we demonstrate these topics in practice in later chapters (especially Chapters 5, 6, 7, and 10 Chapter 5 Chapter 6 Chapter 7 Chapter 10)
INFERENCE FOR INDIVIDUAL-LEVEL MODELS OF INFECTIOUS DISEASES IN LARGE POPULATIONS.
Deardon, Rob; Brooks, Stephen P; Grenfell, Bryan T; Keeling, Matthew J; Tildesley, Michael J; Savill, Nicholas J; Shaw, Darren J; Woolhouse, Mark E J
2010-01-01
Individual Level Models (ILMs), a new class of models, are being applied to infectious epidemic data to aid in the understanding of the spatio-temporal dynamics of infectious diseases. These models are highly flexible and intuitive, and can be parameterised under a Bayesian framework via Markov chain Monte Carlo (MCMC) methods. Unfortunately, this parameterisation can be difficult to implement due to intense computational requirements when calculating the full posterior for large, or even moderately large, susceptible populations, or when missing data are present. Here we detail a methodology that can be used to estimate parameters for such large, and/or incomplete, data sets. This is done in the context of a study of the UK 2001 foot-and-mouth disease (FMD) epidemic.
NASA Astrophysics Data System (ADS)
Rajaona, Harizo; Septier, François; Armand, Patrick; Delignon, Yves; Olry, Christophe; Albergel, Armand; Moussafir, Jacques
2015-12-01
In the eventuality of an accidental or intentional atmospheric release, the reconstruction of the source term using measurements from a set of sensors is an important and challenging inverse problem. A rapid and accurate estimation of the source allows faster and more efficient action for first-response teams, in addition to providing better damage assessment. This paper presents a Bayesian probabilistic approach to estimate the location and the temporal emission profile of a pointwise source. The release rate is evaluated analytically by using a Gaussian assumption on its prior distribution, and is enhanced with a positivity constraint to improve the estimation. The source location is obtained by the means of an advanced iterative Monte-Carlo technique called Adaptive Multiple Importance Sampling (AMIS), which uses a recycling process at each iteration to accelerate its convergence. The proposed methodology is tested using synthetic and real concentration data in the framework of the Fusion Field Trials 2007 (FFT-07) experiment. The quality of the obtained results is comparable to those coming from the Markov Chain Monte Carlo (MCMC) algorithm, a popular Bayesian method used for source estimation. Moreover, the adaptive processing of the AMIS provides a better sampling efficiency by reusing all the generated samples.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sun, Yu; Hou, Zhangshuan; Huang, Maoyi
2013-12-10
This study demonstrates the possibility of inverting hydrologic parameters using surface flux and runoff observations in version 4 of the Community Land Model (CLM4). Previous studies showed that surface flux and runoff calculations are sensitive to major hydrologic parameters in CLM4 over different watersheds, and illustrated the necessity and possibility of parameter calibration. Two inversion strategies, the deterministic least-square fitting and stochastic Markov-Chain Monte-Carlo (MCMC) - Bayesian inversion approaches, are evaluated by applying them to CLM4 at selected sites. The unknowns to be estimated include surface and subsurface runoff generation parameters and vadose zone soil water parameters. We find thatmore » using model parameters calibrated by the least-square fitting provides little improvements in the model simulations but the sampling-based stochastic inversion approaches are consistent - as more information comes in, the predictive intervals of the calibrated parameters become narrower and the misfits between the calculated and observed responses decrease. In general, parameters that are identified to be significant through sensitivity analyses and statistical tests are better calibrated than those with weak or nonlinear impacts on flux or runoff observations. Temporal resolution of observations has larger impacts on the results of inverse modeling using heat flux data than runoff data. Soil and vegetation cover have important impacts on parameter sensitivities, leading to the different patterns of posterior distributions of parameters at different sites. Overall, the MCMC-Bayesian inversion approach effectively and reliably improves the simulation of CLM under different climates and environmental conditions. Bayesian model averaging of the posterior estimates with different reference acceptance probabilities can smooth the posterior distribution and provide more reliable parameter estimates, but at the expense of wider uncertainty bounds.« less
NASA Astrophysics Data System (ADS)
Sun, Y.; Hou, Z.; Huang, M.; Tian, F.; Leung, L. Ruby
2013-12-01
This study demonstrates the possibility of inverting hydrologic parameters using surface flux and runoff observations in version 4 of the Community Land Model (CLM4). Previous studies showed that surface flux and runoff calculations are sensitive to major hydrologic parameters in CLM4 over different watersheds, and illustrated the necessity and possibility of parameter calibration. Both deterministic least-square fitting and stochastic Markov-chain Monte Carlo (MCMC)-Bayesian inversion approaches are evaluated by applying them to CLM4 at selected sites with different climate and soil conditions. The unknowns to be estimated include surface and subsurface runoff generation parameters and vadose zone soil water parameters. We find that using model parameters calibrated by the sampling-based stochastic inversion approaches provides significant improvements in the model simulations compared to using default CLM4 parameter values, and that as more information comes in, the predictive intervals (ranges of posterior distributions) of the calibrated parameters become narrower. In general, parameters that are identified to be significant through sensitivity analyses and statistical tests are better calibrated than those with weak or nonlinear impacts on flux or runoff observations. Temporal resolution of observations has larger impacts on the results of inverse modeling using heat flux data than runoff data. Soil and vegetation cover have important impacts on parameter sensitivities, leading to different patterns of posterior distributions of parameters at different sites. Overall, the MCMC-Bayesian inversion approach effectively and reliably improves the simulation of CLM under different climates and environmental conditions. Bayesian model averaging of the posterior estimates with different reference acceptance probabilities can smooth the posterior distribution and provide more reliable parameter estimates, but at the expense of wider uncertainty bounds.
An example of complex modelling in dentistry using Markov chain Monte Carlo (MCMC) simulation.
Helfenstein, Ulrich; Menghini, Giorgio; Steiner, Marcel; Murati, Francesca
2002-09-01
In the usual regression setting one regression line is computed for a whole data set. In a more complex situation, each person may be observed for example at several points in time and thus a regression line might be calculated for each person. Additional complexities, such as various forms of errors in covariables may make a straightforward statistical evaluation difficult or even impossible. During recent years methods have been developed allowing convenient analysis of problems where the data and the corresponding models show these and many other forms of complexity. The methodology makes use of a Bayesian approach and Markov chain Monte Carlo (MCMC) simulations. The methods allow the construction of increasingly elaborate models by building them up from local sub-models. The essential structure of the models can be represented visually by directed acyclic graphs (DAG). This attractive property allows communication and discussion of the essential structure and the substantial meaning of a complex model without needing algebra. After presentation of the statistical methods an example from dentistry is presented in order to demonstrate their application and use. The dataset of the example had a complex structure; each of a set of children was followed up over several years. The number of new fillings in permanent teeth had been recorded at several ages. The dependent variables were markedly different from the normal distribution and could not be transformed to normality. In addition, explanatory variables were assumed to be measured with different forms of error. Illustration of how the corresponding models can be estimated conveniently via MCMC simulation, in particular, 'Gibbs sampling', using the freely available software BUGS is presented. In addition, how the measurement error may influence the estimates of the corresponding coefficients is explored. It is demonstrated that the effect of the independent variable on the dependent variable may be markedly underestimated if the measurement error is not taken into account ('regression dilution bias'). Markov chain Monte Carlo methods may be of great value to dentists in allowing analysis of data sets which exhibit a wide range of different forms of complexity.
Simultaneous fitting of genomic-BLUP and Bayes-C components in a genomic prediction model.
Iheshiulor, Oscar O M; Woolliams, John A; Svendsen, Morten; Solberg, Trygve; Meuwissen, Theo H E
2017-08-24
The rapid adoption of genomic selection is due to two key factors: availability of both high-throughput dense genotyping and statistical methods to estimate and predict breeding values. The development of such methods is still ongoing and, so far, there is no consensus on the best approach. Currently, the linear and non-linear methods for genomic prediction (GP) are treated as distinct approaches. The aim of this study was to evaluate the implementation of an iterative method (called GBC) that incorporates aspects of both linear [genomic-best linear unbiased prediction (G-BLUP)] and non-linear (Bayes-C) methods for GP. The iterative nature of GBC makes it less computationally demanding similar to other non-Markov chain Monte Carlo (MCMC) approaches. However, as a Bayesian method, GBC differs from both MCMC- and non-MCMC-based methods by combining some aspects of G-BLUP and Bayes-C methods for GP. Its relative performance was compared to those of G-BLUP and Bayes-C. We used an imputed 50 K single-nucleotide polymorphism (SNP) dataset based on the Illumina Bovine50K BeadChip, which included 48,249 SNPs and 3244 records. Daughter yield deviations for somatic cell count, fat yield, milk yield, and protein yield were used as response variables. GBC was frequently (marginally) superior to G-BLUP and Bayes-C in terms of prediction accuracy and was significantly better than G-BLUP only for fat yield. On average across the four traits, GBC yielded a 0.009 and 0.006 increase in prediction accuracy over G-BLUP and Bayes-C, respectively. Computationally, GBC was very much faster than Bayes-C and similar to G-BLUP. Our results show that incorporating some aspects of G-BLUP and Bayes-C in a single model can improve accuracy of GP over the commonly used method: G-BLUP. Generally, GBC did not statistically perform better than G-BLUP and Bayes-C, probably due to the close relationships between reference and validation individuals. Nevertheless, it is a flexible tool, in the sense, that it simultaneously incorporates some aspects of linear and non-linear models for GP, thereby exploiting family relationships while also accounting for linkage disequilibrium between SNPs and genes with large effects. The application of GBC in GP merits further exploration.
Anderson, Eric C; Ng, Thomas C
2016-02-01
We develop a computational framework for addressing pedigree inference problems using small numbers (80-400) of single nucleotide polymorphisms (SNPs). Our approach relaxes the assumptions, which are commonly made, that sampling is complete with respect to the pedigree and that there is no genotyping error. It relies on representing the inferred pedigree as a factor graph and invoking the Sum-Product algorithm to compute and store quantities that allow the joint probability of the data to be rapidly computed under a large class of rearrangements of the pedigree structure. This allows efficient MCMC sampling over the space of pedigrees, and, hence, Bayesian inference of pedigree structure. In this paper we restrict ourselves to inference of pedigrees without loops using SNPs assumed to be unlinked. We present the methodology in general for multigenerational inference, and we illustrate the method by applying it to the inference of full sibling groups in a large sample (n=1157) of Chinook salmon typed at 95 SNPs. The results show that our method provides a better point estimate and estimate of uncertainty than the currently best-available maximum-likelihood sibling reconstruction method. Extensions of this work to more complex scenarios are briefly discussed. Published by Elsevier Inc.
Assessment of Agricultural Water Management in Punjab, India using Bayesian Methods
NASA Astrophysics Data System (ADS)
Russo, T. A.; Devineni, N.; Lall, U.; Sidhu, R.
2013-12-01
The success of the Green Revolution in Punjab, India is threatened by the declining water table (approx. 1 m/yr). Punjab, a major agricultural supplier for the rest of India, supports irrigation with a canal system and groundwater, which is vastly over-exploited. Groundwater development in many districts is greater than 200% the annual recharge rate. The hydrologic data required to complete a mass-balance model are not available for this region, therefore we use Bayesian methods to estimate hydrologic properties and irrigation requirements. Using the known values of precipitation, total canal water delivery, crop yield, and water table elevation, we solve for each unknown parameter (often a coefficient) using a Markov chain Monte Carlo (MCMC) algorithm. Results provide regional estimates of irrigation requirements and groundwater recharge rates under observed climate conditions (1972 to 2002). Model results are used to estimate future water availability and demand to help inform agriculture management decisions under projected climate conditions. We find that changing cropping patterns for the region can maintain food production while balancing groundwater pumping with natural recharge. This computational method can be applied in data-scarce regions across the world, where agricultural water management is required to resolve competition between food security and changing resource availability.
A Computationally-Efficient Inverse Approach to Probabilistic Strain-Based Damage Diagnosis
NASA Technical Reports Server (NTRS)
Warner, James E.; Hochhalter, Jacob D.; Leser, William P.; Leser, Patrick E.; Newman, John A
2016-01-01
This work presents a computationally-efficient inverse approach to probabilistic damage diagnosis. Given strain data at a limited number of measurement locations, Bayesian inference and Markov Chain Monte Carlo (MCMC) sampling are used to estimate probability distributions of the unknown location, size, and orientation of damage. Substantial computational speedup is obtained by replacing a three-dimensional finite element (FE) model with an efficient surrogate model. The approach is experimentally validated on cracked test specimens where full field strains are determined using digital image correlation (DIC). Access to full field DIC data allows for testing of different hypothetical sensor arrangements, facilitating the study of strain-based diagnosis effectiveness as the distance between damage and measurement locations increases. The ability of the framework to effectively perform both probabilistic damage localization and characterization in cracked plates is demonstrated and the impact of measurement location on uncertainty in the predictions is shown. Furthermore, the analysis time to produce these predictions is orders of magnitude less than a baseline Bayesian approach with the FE method by utilizing surrogate modeling and effective numerical sampling approaches.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karagiannis, Georgios, E-mail: georgios.karagiannis@pnnl.gov; Lin, Guang, E-mail: guang.lin@pnnl.gov
2014-02-15
Generalized polynomial chaos (gPC) expansions allow us to represent the solution of a stochastic system using a series of polynomial chaos basis functions. The number of gPC terms increases dramatically as the dimension of the random input variables increases. When the number of the gPC terms is larger than that of the available samples, a scenario that often occurs when the corresponding deterministic solver is computationally expensive, evaluation of the gPC expansion can be inaccurate due to over-fitting. We propose a fully Bayesian approach that allows for global recovery of the stochastic solutions, in both spatial and random domains, bymore » coupling Bayesian model uncertainty and regularization regression methods. It allows the evaluation of the PC coefficients on a grid of spatial points, via (1) the Bayesian model average (BMA) or (2) the median probability model, and their construction as spatial functions on the spatial domain via spline interpolation. The former accounts for the model uncertainty and provides Bayes-optimal predictions; while the latter provides a sparse representation of the stochastic solutions by evaluating the expansion on a subset of dominating gPC bases. Moreover, the proposed methods quantify the importance of the gPC bases in the probabilistic sense through inclusion probabilities. We design a Markov chain Monte Carlo (MCMC) sampler that evaluates all the unknown quantities without the need of ad-hoc techniques. The proposed methods are suitable for, but not restricted to, problems whose stochastic solutions are sparse in the stochastic space with respect to the gPC bases while the deterministic solver involved is expensive. We demonstrate the accuracy and performance of the proposed methods and make comparisons with other approaches on solving elliptic SPDEs with 1-, 14- and 40-random dimensions.« less
Data Analysis Recipes: Using Markov Chain Monte Carlo
NASA Astrophysics Data System (ADS)
Hogg, David W.; Foreman-Mackey, Daniel
2018-05-01
Markov Chain Monte Carlo (MCMC) methods for sampling probability density functions (combined with abundant computational resources) have transformed the sciences, especially in performing probabilistic inferences, or fitting models to data. In this primarily pedagogical contribution, we give a brief overview of the most basic MCMC method and some practical advice for the use of MCMC in real inference problems. We give advice on method choice, tuning for performance, methods for initialization, tests of convergence, troubleshooting, and use of the chain output to produce or report parameter estimates with associated uncertainties. We argue that autocorrelation time is the most important test for convergence, as it directly connects to the uncertainty on the sampling estimate of any quantity of interest. We emphasize that sampling is a method for doing integrals; this guides our thinking about how MCMC output is best used. .
A hierarchical Bayesian GEV model for improving local and regional flood quantile estimates
NASA Astrophysics Data System (ADS)
Lima, Carlos H. R.; Lall, Upmanu; Troy, Tara; Devineni, Naresh
2016-10-01
We estimate local and regional Generalized Extreme Value (GEV) distribution parameters for flood frequency analysis in a multilevel, hierarchical Bayesian framework, to explicitly model and reduce uncertainties. As prior information for the model, we assume that the GEV location and scale parameters for each site come from independent log-normal distributions, whose mean parameter scales with the drainage area. From empirical and theoretical arguments, the shape parameter for each site is shrunk towards a common mean. Non-informative prior distributions are assumed for the hyperparameters and the MCMC method is used to sample from the joint posterior distribution. The model is tested using annual maximum series from 20 streamflow gauges located in an 83,000 km2 flood prone basin in Southeast Brazil. The results show a significant reduction of uncertainty estimates of flood quantile estimates over the traditional GEV model, particularly for sites with shorter records. For return periods within the range of the data (around 50 years), the Bayesian credible intervals for the flood quantiles tend to be narrower than the classical confidence limits based on the delta method. As the return period increases beyond the range of the data, the confidence limits from the delta method become unreliable and the Bayesian credible intervals provide a way to estimate satisfactory confidence bands for the flood quantiles considering parameter uncertainties and regional information. In order to evaluate the applicability of the proposed hierarchical Bayesian model for regional flood frequency analysis, we estimate flood quantiles for three randomly chosen out-of-sample sites and compare with classical estimates using the index flood method. The posterior distributions of the scaling law coefficients are used to define the predictive distributions of the GEV location and scale parameters for the out-of-sample sites given only their drainage areas and the posterior distribution of the average shape parameter is taken as the regional predictive distribution for this parameter. While the index flood method does not provide a straightforward way to consider the uncertainties in the index flood and in the regional parameters, the results obtained here show that the proposed Bayesian method is able to produce adequate credible intervals for flood quantiles that are in accordance with empirical estimates.
Time-varying Concurrent Risk of Extreme Droughts and Heatwaves in California
NASA Astrophysics Data System (ADS)
Sarhadi, A.; Diffenbaugh, N. S.; Ausin, M. C.
2016-12-01
Anthropogenic global warming has changed the nature and the risk of extreme climate phenomena such as droughts and heatwaves. The concurrent of these nature-changing climatic extremes may result in intensifying undesirable consequences in terms of human health and destructive effects in water resources. The present study assesses the risk of concurrent extreme droughts and heatwaves under dynamic nonstationary conditions arising from climate change in California. For doing so, a generalized fully Bayesian time-varying multivariate risk framework is proposed evolving through time under dynamic human-induced environment. In this methodology, an extreme, Bayesian, dynamic copula (Gumbel) is developed to model the time-varying dependence structure between the two different climate extremes. The time-varying extreme marginals are previously modeled using a Generalized Extreme Value (GEV) distribution. Bayesian Markov Chain Monte Carlo (MCMC) inference is integrated to estimate parameters of the nonstationary marginals and copula using a Gibbs sampling method. Modelled marginals and copula are then used to develop a fully Bayesian, time-varying joint return period concept for the estimation of concurrent risk. Here we argue that climate change has increased the chance of concurrent droughts and heatwaves over decades in California. It is also demonstrated that a time-varying multivariate perspective should be incorporated to assess realistic concurrent risk of the extremes for water resources planning and management in a changing climate in this area. The proposed generalized methodology can be applied for other stochastic nature-changing compound climate extremes that are under the influence of climate change.
Bayesian structured additive regression modeling of epidemic data: application to cholera
2012-01-01
Background A significant interest in spatial epidemiology lies in identifying associated risk factors which enhances the risk of infection. Most studies, however, make no, or limited use of the spatial structure of the data, as well as possible nonlinear effects of the risk factors. Methods We develop a Bayesian Structured Additive Regression model for cholera epidemic data. Model estimation and inference is based on fully Bayesian approach via Markov Chain Monte Carlo (MCMC) simulations. The model is applied to cholera epidemic data in the Kumasi Metropolis, Ghana. Proximity to refuse dumps, density of refuse dumps, and proximity to potential cholera reservoirs were modeled as continuous functions; presence of slum settlers and population density were modeled as fixed effects, whereas spatial references to the communities were modeled as structured and unstructured spatial effects. Results We observe that the risk of cholera is associated with slum settlements and high population density. The risk of cholera is equal and lower for communities with fewer refuse dumps, but variable and higher for communities with more refuse dumps. The risk is also lower for communities distant from refuse dumps and potential cholera reservoirs. The results also indicate distinct spatial variation in the risk of cholera infection. Conclusion The study highlights the usefulness of Bayesian semi-parametric regression model analyzing public health data. These findings could serve as novel information to help health planners and policy makers in making effective decisions to control or prevent cholera epidemics. PMID:22866662
NASA Astrophysics Data System (ADS)
Kim, Jin-Young; Kwon, Hyun-Han; Kim, Hung-Soo
2015-04-01
The existing regional frequency analysis has disadvantages in that it is difficult to consider geographical characteristics in estimating areal rainfall. In this regard, this study aims to develop a hierarchical Bayesian model based nonstationary regional frequency analysis in that spatial patterns of the design rainfall with geographical information (e.g. latitude, longitude and altitude) are explicitly incorporated. This study assumes that the parameters of Gumbel (or GEV distribution) are a function of geographical characteristics within a general linear regression framework. Posterior distribution of the regression parameters are estimated by Bayesian Markov Chain Monte Carlo (MCMC) method, and the identified functional relationship is used to spatially interpolate the parameters of the distributions by using digital elevation models (DEM) as inputs. The proposed model is applied to derive design rainfalls over the entire Han-river watershed. It was found that the proposed Bayesian regional frequency analysis model showed similar results compared to L-moment based regional frequency analysis. In addition, the model showed an advantage in terms of quantifying uncertainty of the design rainfall and estimating the area rainfall considering geographical information. Finally, comprehensive discussion on design rainfall in the context of nonstationary will be presented. KEYWORDS: Regional frequency analysis, Nonstationary, Spatial information, Bayesian Acknowledgement This research was supported by a grant (14AWMP-B082564-01) from Advanced Water Management Research Program funded by Ministry of Land, Infrastructure and Transport of Korean government.
Bayesian Inversion of 2D Models from Airborne Transient EM Data
NASA Astrophysics Data System (ADS)
Blatter, D. B.; Key, K.; Ray, A.
2016-12-01
The inherent non-uniqueness in most geophysical inverse problems leads to an infinite number of Earth models that fit observed data to within an adequate tolerance. To resolve this ambiguity, traditional inversion methods based on optimization techniques such as the Gauss-Newton and conjugate gradient methods rely on an additional regularization constraint on the properties that an acceptable model can possess, such as having minimal roughness. While allowing such an inversion scheme to converge on a solution, regularization makes it difficult to estimate the uncertainty associated with the model parameters. This is because regularization biases the inversion process toward certain models that satisfy the regularization constraint and away from others that don't, even when both may suitably fit the data. By contrast, a Bayesian inversion framework aims to produce not a single `most acceptable' model but an estimate of the posterior likelihood of the model parameters, given the observed data. In this work, we develop a 2D Bayesian framework for the inversion of transient electromagnetic (TEM) data. Our method relies on a reversible-jump Markov Chain Monte Carlo (RJ-MCMC) Bayesian inverse method with parallel tempering. Previous gradient-based inversion work in this area used a spatially constrained scheme wherein individual (1D) soundings were inverted together and non-uniqueness was tackled by using lateral and vertical smoothness constraints. By contrast, our work uses a 2D model space of Voronoi cells whose parameterization (including number of cells) is fully data-driven. To make the problem work practically, we approximate the forward solution for each TEM sounding using a local 1D approximation where the model is obtained from the 2D model by retrieving a vertical profile through the Voronoi cells. The implicit parsimony of the Bayesian inversion process leads to the simplest models that adequately explain the data, obviating the need for explicit smoothness constraints. In addition, credible intervals in model space are directly obtained, resolving some of the uncertainty introduced by regularization. An example application shows how the method can be used to quantify the uncertainty in airborne EM soundings for imaging subglacial brine channels and groundwater systems.
Sythesis of MCMC and Belief Propagation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahn, Sungsoo; Chertkov, Michael; Shin, Jinwoo
Markov Chain Monte Carlo (MCMC) and Belief Propagation (BP) are the most popular algorithms for computational inference in Graphical Models (GM). In principle, MCMC is an exact probabilistic method which, however, often suffers from exponentially slow mixing. In contrast, BP is a deterministic method, which is typically fast, empirically very successful, however in general lacking control of accuracy over loopy graphs. In this paper, we introduce MCMC algorithms correcting the approximation error of BP, i.e., we provide a way to compensate for BP errors via a consecutive BP-aware MCMC. Our framework is based on the Loop Calculus (LC) approach whichmore » allows to express the BP error as a sum of weighted generalized loops. Although the full series is computationally intractable, it is known that a truncated series, summing up all 2-regular loops, is computable in polynomial-time for planar pair-wise binary GMs and it also provides a highly accurate approximation empirically. Motivated by this, we first propose a polynomial-time approximation MCMC scheme for the truncated series of general (non-planar) pair-wise binary models. Our main idea here is to use the Worm algorithm, known to provide fast mixing in other (related) problems, and then design an appropriate rejection scheme to sample 2-regular loops. Furthermore, we also design an efficient rejection-free MCMC scheme for approximating the full series. The main novelty underlying our design is in utilizing the concept of cycle basis, which provides an efficient decomposition of the generalized loops. In essence, the proposed MCMC schemes run on transformed GM built upon the non-trivial BP solution, and our experiments show that this synthesis of BP and MCMC outperforms both direct MCMC and bare BP schemes.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karagiannis, Georgios; Lin, Guang
2014-02-15
Generalized polynomial chaos (gPC) expansions allow the representation of the solution of a stochastic system as a series of polynomial terms. The number of gPC terms increases dramatically with the dimension of the random input variables. When the number of the gPC terms is larger than that of the available samples, a scenario that often occurs if the evaluations of the system are expensive, the evaluation of the gPC expansion can be inaccurate due to over-fitting. We propose a fully Bayesian approach that allows for global recovery of the stochastic solution, both in spacial and random domains, by coupling Bayesianmore » model uncertainty and regularization regression methods. It allows the evaluation of the PC coefficients on a grid of spacial points via (1) Bayesian model average or (2) medial probability model, and their construction as functions on the spacial domain via spline interpolation. The former accounts the model uncertainty and provides Bayes-optimal predictions; while the latter, additionally, provides a sparse representation of the solution by evaluating the expansion on a subset of dominating gPC bases when represented as a gPC expansion. Moreover, the method quantifies the importance of the gPC bases through inclusion probabilities. We design an MCMC sampler that evaluates all the unknown quantities without the need of ad-hoc techniques. The proposed method is suitable for, but not restricted to, problems whose stochastic solution is sparse at the stochastic level with respect to the gPC bases while the deterministic solver involved is expensive. We demonstrate the good performance of the proposed method and make comparisons with others on 1D, 14D and 40D in random space elliptic stochastic partial differential equations.« less
Auxiliary Parameter MCMC for Exponential Random Graph Models
NASA Astrophysics Data System (ADS)
Byshkin, Maksym; Stivala, Alex; Mira, Antonietta; Krause, Rolf; Robins, Garry; Lomi, Alessandro
2016-11-01
Exponential random graph models (ERGMs) are a well-established family of statistical models for analyzing social networks. Computational complexity has so far limited the appeal of ERGMs for the analysis of large social networks. Efficient computational methods are highly desirable in order to extend the empirical scope of ERGMs. In this paper we report results of a research project on the development of snowball sampling methods for ERGMs. We propose an auxiliary parameter Markov chain Monte Carlo (MCMC) algorithm for sampling from the relevant probability distributions. The method is designed to decrease the number of allowed network states without worsening the mixing of the Markov chains, and suggests a new approach for the developments of MCMC samplers for ERGMs. We demonstrate the method on both simulated and actual (empirical) network data and show that it reduces CPU time for parameter estimation by an order of magnitude compared to current MCMC methods.
Meuwissen, Theo H E; Indahl, Ulf G; Ødegård, Jørgen
2017-12-27
Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. The BayesC model assumes a priori that markers have normally distributed effects with probability [Formula: see text] and no effect with probability (1 - [Formula: see text]). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP effects (SNP-BLUP model). When reducing marker density from WGS data to 30 K, SNP-BLUP tended to yield the highest accuracies, at least in the short term. Based on SVD of the genotype matrix, we developed a direct method for the calculation of BayesC estimates of marker effects. Although SVD- and MCMC-based marker effects differed slightly, their prediction accuracies were similar. Assuming that the SVD of the marker genotype matrix is already performed for other reasons (e.g. for SNP-BLUP), computation times for the BayesC predictions were comparable to those of SNP-BLUP.
NASA Astrophysics Data System (ADS)
Russo, T. A.; Devineni, N.; Lall, U.
2015-12-01
Lasting success of the Green Revolution in Punjab, India relies on continued availability of local water resources. Supplying primarily rice and wheat for the rest of India, Punjab supports crop irrigation with a canal system and groundwater, which is vastly over-exploited. The detailed data required to physically model future impacts on water supplies agricultural production is not readily available for this region, therefore we use Bayesian methods to estimate hydrologic properties and irrigation requirements for an under-constrained mass balance model. Using measured values of historical precipitation, total canal water delivery, crop yield, and water table elevation, we present a method using a Markov chain Monte Carlo (MCMC) algorithm to solve for a distribution of values for each unknown parameter in a conceptual mass balance model. Due to heterogeneity across the state, and the resolution of input data, we estimate model parameters at the district-scale using spatial pooling. The resulting model is used to predict the impact of precipitation change scenarios on groundwater availability under multiple cropping options. Predicted groundwater declines vary across the state, suggesting that crop selection and water management strategies should be determined at a local scale. This computational method can be applied in data-scarce regions across the world, where water resource management is required to resolve competition between food security and available resources in a changing climate.
Wan, Wai-Yin; Chan, Jennifer S K
2009-08-01
For time series of count data, correlated measurements, clustering as well as excessive zeros occur simultaneously in biomedical applications. Ignoring such effects might contribute to misleading treatment outcomes. A generalized mixture Poisson geometric process (GMPGP) model and a zero-altered mixture Poisson geometric process (ZMPGP) model are developed from the geometric process model, which was originally developed for modelling positive continuous data and was extended to handle count data. These models are motivated by evaluating the trend development of new tumour counts for bladder cancer patients as well as by identifying useful covariates which affect the count level. The models are implemented using Bayesian method with Markov chain Monte Carlo (MCMC) algorithms and are assessed using deviance information criterion (DIC).
Building an ACT-R Reader for Eye-Tracking Corpus Data.
Dotlačil, Jakub
2018-01-01
Cognitive architectures have often been applied to data from individual experiments. In this paper, I develop an ACT-R reader that can model a much larger set of data, eye-tracking corpus data. It is shown that the resulting model has a good fit to the data for the considered low-level processes. Unlike previous related works (most prominently, Engelmann, Vasishth, Engbert & Kliegl, ), the model achieves the fit by estimating free parameters of ACT-R using Bayesian estimation and Markov-Chain Monte Carlo (MCMC) techniques, rather than by relying on the mix of manual selection + default values. The method used in the paper is generalizable beyond this particular model and data set and could be used on other ACT-R models. Copyright © 2017 Cognitive Science Society, Inc.
Directional data analysis under the general projected normal distribution
Wang, Fangpo; Gelfand, Alan E.
2013-01-01
The projected normal distribution is an under-utilized model for explaining directional data. In particular, the general version provides flexibility, e.g., asymmetry and possible bimodality along with convenient regression specification. Here, we clarify the properties of this general class. We also develop fully Bayesian hierarchical models for analyzing circular data using this class. We show how they can be fit using MCMC methods with suitable latent variables. We show how posterior inference for distributional features such as the angular mean direction and concentration can be implemented as well as how prediction within the regression setting can be handled. With regard to model comparison, we argue for an out-of-sample approach using both a predictive likelihood scoring loss criterion and a cumulative rank probability score criterion. PMID:24046539
NASA Astrophysics Data System (ADS)
Quintero-Chavarria, E.; Ochoa Gutierrez, L. H.
2016-12-01
Applications of the Self-potential Method in the fields of Hydrogeology and Environmental Sciences have had significant developments during the last two decades with a strong use on groundwater flows identification. Although only few authors deal with the forward problem's solution -especially in geophysics literature- different inversion procedures are currently being developed but in most cases they are compared with unconventional groundwater velocity fields and restricted to structured meshes. This research solves the forward problem based on the finite element method using the St. Venant's Principle to transform a point dipole, which is the field generated by a single vector, into a distribution of electrical monopoles. Then, two simple aquifer models were generated with specific boundary conditions and head potentials, velocity fields and electric potentials in the medium were computed. With the model's surface electric potential, the inverse problem is solved to retrieve the source of electric potential (vector field associated to groundwater flow) using deterministic and stochastic approaches. The first approach was carried out by implementing a Tikhonov regularization with a stabilized operator adapted to the finite element mesh while for the second a hierarchical Bayesian model based on Markov chain Monte Carlo (McMC) and Markov Random Fields (MRF) was constructed. For all implemented methods, the result between the direct and inverse models was contrasted in two ways: 1) shape and distribution of the vector field, and 2) magnitude's histogram. Finally, it was concluded that inversion procedures are improved when the velocity field's behavior is considered, thus, the deterministic method is more suitable for unconfined aquifers than confined ones. McMC has restricted applications and requires a lot of information (particularly in potentials fields) while MRF has a remarkable response especially when dealing with confined aquifers.
Black, Andrew J.; Ross, Joshua V.
2013-01-01
The clinical serial interval of an infectious disease is the time between date of symptom onset in an index case and the date of symptom onset in one of its secondary cases. It is a quantity which is commonly collected during a pandemic and is of fundamental importance to public health policy and mathematical modelling. In this paper we present a novel method for calculating the serial interval distribution for a Markovian model of household transmission dynamics. This allows the use of Bayesian MCMC methods, with explicit evaluation of the likelihood, to fit to serial interval data and infer parameters of the underlying model. We use simulated and real data to verify the accuracy of our methodology and illustrate the importance of accounting for household size. The output of our approach can be used to produce posterior distributions of population level epidemic characteristics. PMID:24023679
NASA Astrophysics Data System (ADS)
Gbedo, Yémalin Gabin; Mangin-Brinet, Mariane
2017-07-01
We present a new procedure to determine parton distribution functions (PDFs), based on Markov chain Monte Carlo (MCMC) methods. The aim of this paper is to show that we can replace the standard χ2 minimization by procedures grounded on statistical methods, and on Bayesian inference in particular, thus offering additional insight into the rich field of PDFs determination. After a basic introduction to these techniques, we introduce the algorithm we have chosen to implement—namely Hybrid (or Hamiltonian) Monte Carlo. This algorithm, initially developed for Lattice QCD, turns out to be very interesting when applied to PDFs determination by global analyses; we show that it allows us to circumvent the difficulties due to the high dimensionality of the problem, in particular concerning the acceptance. A first feasibility study is performed and presented, which indicates that Markov chain Monte Carlo can successfully be applied to the extraction of PDFs and of their uncertainties.
Armstrong, Miles R; Husmeier, Dirk; Phillips, Mark S; Blok, Vivian C
2007-06-01
The discovery that the potato cyst nematode Globodera pallida has a multipartite mitochondrial DNA (mtDNA) composed, at least in part, of six small circular mtDNAs (scmtDNAs) raised a number of questions concerning the population-level processes that might act on such a complex genome. Here we report our observations on the distribution of some scmtDNAs among a sample of European and South American G. pallida populations. The occurrence of sequence variants of scmtDNA IV in population P4A from South America, and that particular sequence variants are common to the individuals within a single cyst, is described. Evidence for recombination of sequence variants of scmtDNA IV in P4A is also reported. The mosaic structure of P4A scmtDNA IV sequences was revealed using several detection methods and recombination breakpoints were independently detected by maximum likelihood and Bayesian MCMC methods.
Bayesian Population Genomic Inference of Crossing Over and Gene Conversion
Padhukasahasram, Badri; Rannala, Bruce
2011-01-01
Meiotic recombination is a fundamental cellular mechanism in sexually reproducing organisms and its different forms, crossing over and gene conversion both play an important role in shaping genetic variation in populations. Here, we describe a coalescent-based full-likelihood Markov chain Monte Carlo (MCMC) method for jointly estimating the crossing-over, gene-conversion, and mean tract length parameters from population genomic data under a Bayesian framework. Although computationally more expensive than methods that use approximate likelihoods, the relative efficiency of our method is expected to be optimal in theory. Furthermore, it is also possible to obtain a posterior sample of genealogies for the data using this method. We first check the performance of the new method on simulated data and verify its correctness. We also extend the method for inference under models with variable gene-conversion and crossing-over rates and demonstrate its ability to identify recombination hotspots. Then, we apply the method to two empirical data sets that were sequenced in the telomeric regions of the X chromosome of Drosophila melanogaster. Our results indicate that gene conversion occurs more frequently than crossing over in the su-w and su-s gene sequences while the local rates of crossing over as inferred by our program are not low. The mean tract lengths for gene-conversion events are estimated to be ∼70 bp and 430 bp, respectively, for these data sets. Finally, we discuss ideas and optimizations for reducing the execution time of our algorithm. PMID:21840857
Minsley, Burke J.
2011-01-01
A meaningful interpretation of geophysical measurements requires an assessment of the space of models that are consistent with the data, rather than just a single, ‘best’ model which does not convey information about parameter uncertainty. For this purpose, a trans-dimensional Bayesian Markov chain Monte Carlo (MCMC) algorithm is developed for assessing frequencydomain electromagnetic (FDEM) data acquired from airborne or ground-based systems. By sampling the distribution of models that are consistent with measured data and any prior knowledge, valuable inferences can be made about parameter values such as the likely depth to an interface, the distribution of possible resistivity values as a function of depth and non-unique relationships between parameters. The trans-dimensional aspect of the algorithm allows the number of layers to be a free parameter that is controlled by the data, where models with fewer layers are inherently favoured, which provides a natural measure of parsimony and a significant degree of flexibility in parametrization. The MCMC algorithm is used with synthetic examples to illustrate how the distribution of acceptable models is affected by the choice of prior information, the system geometry and configuration and the uncertainty in the measured system elevation. An airborne FDEM data set that was acquired for the purpose of hydrogeological characterization is also studied. The results compare favorably with traditional least-squares analysis, borehole resistivity and lithology logs from the site, and also provide new information about parameter uncertainty necessary for model assessment.
Bayesian inversion of refraction seismic traveltime data
NASA Astrophysics Data System (ADS)
Ryberg, T.; Haberland, Ch
2018-03-01
We apply a Bayesian Markov chain Monte Carlo (McMC) formalism to the inversion of refraction seismic, traveltime data sets to derive 2-D velocity models below linear arrays (i.e. profiles) of sources and seismic receivers. Typical refraction data sets, especially when using the far-offset observations, are known as having experimental geometries which are very poor, highly ill-posed and far from being ideal. As a consequence, the structural resolution quickly degrades with depth. Conventional inversion techniques, based on regularization, potentially suffer from the choice of appropriate inversion parameters (i.e. number and distribution of cells, starting velocity models, damping and smoothing constraints, data noise level, etc.) and only local model space exploration. McMC techniques are used for exhaustive sampling of the model space without the need of prior knowledge (or assumptions) of inversion parameters, resulting in a large number of models fitting the observations. Statistical analysis of these models allows to derive an average (reference) solution and its standard deviation, thus providing uncertainty estimates of the inversion result. The highly non-linear character of the inversion problem, mainly caused by the experiment geometry, does not allow to derive a reference solution and error map by a simply averaging procedure. We present a modified averaging technique, which excludes parts of the prior distribution in the posterior values due to poor ray coverage, thus providing reliable estimates of inversion model properties even in those parts of the models. The model is discretized by a set of Voronoi polygons (with constant slowness cells) or a triangulated mesh (with interpolation within the triangles). Forward traveltime calculations are performed by a fast, finite-difference-based eikonal solver. The method is applied to a data set from a refraction seismic survey from Northern Namibia and compared to conventional tomography. An inversion test for a synthetic data set from a known model is also presented.
Shiino, Teiichiro; Hattori, Junko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru
2014-01-01
Background One major circulating HIV-1 subtype in Southeast Asian countries is CRF01_AE, but little is known about its epidemiology in Japan. We conducted a molecular phylodynamic study of patients newly diagnosed with CRF01_AE from 2003 to 2010. Methods Plasma samples from patients registered in Japanese Drug Resistance HIV-1 Surveillance Network were analyzed for protease-reverse transcriptase sequences; all sequences undergo subtyping and phylogenetic analysis using distance-matrix-based, maximum likelihood and Bayesian coalescent Markov Chain Monte Carlo (MCMC) phylogenetic inferences. Transmission clusters were identified using interior branch test and depth-first searches for sub-tree partitions. Times of most recent common ancestor (tMRCAs) of significant clusters were estimated using Bayesian MCMC analysis. Results Among 3618 patient registered in our network, 243 were infected with CRF01_AE. The majority of individuals with CRF01_AE were Japanese, predominantly male, and reported heterosexual contact as their risk factor. We found 5 large clusters with ≥5 members and 25 small clusters consisting of pairs of individuals with highly related CRF01_AE strains. The earliest cluster showed a tMRCA of 1996, and consisted of individuals with their known risk as heterosexual contacts. The other four large clusters showed later tMRCAs between 2000 and 2002 with members including intravenous drug users (IVDU) and non-Japanese, but not men who have sex with men (MSM). In contrast, small clusters included a high frequency of individuals reporting MSM risk factors. Phylogenetic analysis also showed that some individuals infected with HIV strains spread in East and South-eastern Asian countries. Conclusions Introduction of CRF01_AE viruses into Japan is estimated to have occurred in the 1990s. CFR01_AE spread via heterosexual behavior, then among persons connected with non-Japanese, IVDU, and MSM. Phylogenetic analysis demonstrated that some viral variants are largely restricted to Japan, while others have a broad geographic distribution. PMID:25025900
Cross-Border Sexual Transmission of the Newly Emerging HIV-1 Clade CRF51_01B
Cheong, Hui Ting; Ng, Kim Tien; Ong, Lai Yee; Chook, Jack Bee; Chan, Kok Gan; Takebe, Yutaka; Kamarulzaman, Adeeba; Tee, Kok Keng
2014-01-01
A novel HIV-1 recombinant clade (CRF51_01B) was recently identified among men who have sex with men (MSM) in Singapore. As cases of sexually transmitted HIV-1 infection increase concurrently in two socioeconomically intimate countries such as Malaysia and Singapore, cross transmission of HIV-1 between said countries is highly probable. In order to investigate the timeline for the emergence of HIV-1 CRF51_01B in Singapore and its possible introduction into Malaysia, 595 HIV-positive subjects recruited in Kuala Lumpur from 2008 to 2012 were screened. Phylogenetic relationship of 485 amplified polymerase gene sequences was determined through neighbour-joining method. Next, near-full length sequences were amplified for genomic sequences inferred to be CRF51_01B and subjected to further analysis implemented through Bayesian Markov chain Monte Carlo (MCMC) sampling and maximum likelihood methods. Based on the near full length genomes, two isolates formed a phylogenetic cluster with CRF51_01B sequences of Singapore origin, sharing identical recombination structure. Spatial and temporal information from Bayesian MCMC coalescent and maximum likelihood analysis of the protease, gp120 and gp41 genes suggest that Singapore is probably the country of origin of CRF51_01B (as early as in the mid-1990s) and featured a Malaysian who acquired the infection through heterosexual contact as host for its ancestral lineages. CRF51_01B then spread rapidly among the MSM in Singapore and Malaysia. Although the importation of CRF51_01B from Singapore to Malaysia is supported by coalescence analysis, the narrow timeframe of the transmission event indicates a closely linked epidemic. Discrepancies in the estimated divergence times suggest that CRF51_01B may have arisen through multiple recombination events from more than one parental lineage. We report the cross transmission of a novel CRF51_01B lineage between countries that involved different sexual risk groups. Understanding the cross-border transmission of HIV-1 involving sexual networks is crucial for effective intervention strategies in the region. PMID:25340817
Cross-border sexual transmission of the newly emerging HIV-1 clade CRF51_01B.
Cheong, Hui Ting; Ng, Kim Tien; Ong, Lai Yee; Chook, Jack Bee; Chan, Kok Gan; Takebe, Yutaka; Kamarulzaman, Adeeba; Tee, Kok Keng
2014-01-01
A novel HIV-1 recombinant clade (CRF51_01B) was recently identified among men who have sex with men (MSM) in Singapore. As cases of sexually transmitted HIV-1 infection increase concurrently in two socioeconomically intimate countries such as Malaysia and Singapore, cross transmission of HIV-1 between said countries is highly probable. In order to investigate the timeline for the emergence of HIV-1 CRF51_01B in Singapore and its possible introduction into Malaysia, 595 HIV-positive subjects recruited in Kuala Lumpur from 2008 to 2012 were screened. Phylogenetic relationship of 485 amplified polymerase gene sequences was determined through neighbour-joining method. Next, near-full length sequences were amplified for genomic sequences inferred to be CRF51_01B and subjected to further analysis implemented through Bayesian Markov chain Monte Carlo (MCMC) sampling and maximum likelihood methods. Based on the near full length genomes, two isolates formed a phylogenetic cluster with CRF51_01B sequences of Singapore origin, sharing identical recombination structure. Spatial and temporal information from Bayesian MCMC coalescent and maximum likelihood analysis of the protease, gp120 and gp41 genes suggest that Singapore is probably the country of origin of CRF51_01B (as early as in the mid-1990s) and featured a Malaysian who acquired the infection through heterosexual contact as host for its ancestral lineages. CRF51_01B then spread rapidly among the MSM in Singapore and Malaysia. Although the importation of CRF51_01B from Singapore to Malaysia is supported by coalescence analysis, the narrow timeframe of the transmission event indicates a closely linked epidemic. Discrepancies in the estimated divergence times suggest that CRF51_01B may have arisen through multiple recombination events from more than one parental lineage. We report the cross transmission of a novel CRF51_01B lineage between countries that involved different sexual risk groups. Understanding the cross-border transmission of HIV-1 involving sexual networks is crucial for effective intervention strategies in the region.
An Evaluation of a Markov Chain Monte Carlo Method for the Two-Parameter Logistic Model.
ERIC Educational Resources Information Center
Kim, Seock-Ho; Cohen, Allan S.
The accuracy of the Markov Chain Monte Carlo (MCMC) procedure Gibbs sampling was considered for estimation of item parameters of the two-parameter logistic model. Data for the Law School Admission Test (LSAT) Section 6 were analyzed to illustrate the MCMC procedure. In addition, simulated data sets were analyzed using the MCMC, marginal Bayesian…
Estimation Methods for One-Parameter Testlet Models
ERIC Educational Resources Information Center
Jiao, Hong; Wang, Shudong; He, Wei
2013-01-01
This study demonstrated the equivalence between the Rasch testlet model and the three-level one-parameter testlet model and explored the Markov Chain Monte Carlo (MCMC) method for model parameter estimation in WINBUGS. The estimation accuracy from the MCMC method was compared with those from the marginalized maximum likelihood estimation (MMLE)…
How Much Can We Learn from a Single Chromatographic Experiment? A Bayesian Perspective.
Wiczling, Paweł; Kaliszan, Roman
2016-01-05
In this work, we proposed and investigated a Bayesian inference procedure to find the desired chromatographic conditions based on known analyte properties (lipophilicity, pKa, and polar surface area) using one preliminary experiment. A previously developed nonlinear mixed effect model was used to specify the prior information about a new analyte with known physicochemical properties. Further, the prior (no preliminary data) and posterior predictive distribution (prior + one experiment) were determined sequentially to search towards the desired separation. The following isocratic high-performance reversed-phase liquid chromatographic conditions were sought: (1) retention time of a single analyte within the range of 4-6 min and (2) baseline separation of two analytes with retention times within the range of 4-10 min. The empirical posterior Bayesian distribution of parameters was estimated using the "slice sampling" Markov Chain Monte Carlo (MCMC) algorithm implemented in Matlab. The simulations with artificial analytes and experimental data of ketoprofen and papaverine were used to test the proposed methodology. The simulation experiment showed that for a single and two randomly selected analytes, there is 97% and 74% probability of obtaining a successful chromatogram using none or one preliminary experiment. The desired separation for ketoprofen and papaverine was established based on a single experiment. It was confirmed that the search for a desired separation rarely requires a large number of chromatographic analyses at least for a simple optimization problem. The proposed Bayesian-based optimization scheme is a powerful method of finding a desired chromatographic separation based on a small number of preliminary experiments.
Bayesian parameter estimation for nonlinear modelling of biological pathways.
Ghasemi, Omid; Lindsey, Merry L; Yang, Tianyi; Nguyen, Nguyen; Huang, Yufei; Jin, Yu-Fang
2011-01-01
The availability of temporal measurements on biological experiments has significantly promoted research areas in systems biology. To gain insight into the interaction and regulation of biological systems, mathematical frameworks such as ordinary differential equations have been widely applied to model biological pathways and interpret the temporal data. Hill equations are the preferred formats to represent the reaction rate in differential equation frameworks, due to their simple structures and their capabilities for easy fitting to saturated experimental measurements. However, Hill equations are highly nonlinearly parameterized functions, and parameters in these functions cannot be measured easily. Additionally, because of its high nonlinearity, adaptive parameter estimation algorithms developed for linear parameterized differential equations cannot be applied. Therefore, parameter estimation in nonlinearly parameterized differential equation models for biological pathways is both challenging and rewarding. In this study, we propose a Bayesian parameter estimation algorithm to estimate parameters in nonlinear mathematical models for biological pathways using time series data. We used the Runge-Kutta method to transform differential equations to difference equations assuming a known structure of the differential equations. This transformation allowed us to generate predictions dependent on previous states and to apply a Bayesian approach, namely, the Markov chain Monte Carlo (MCMC) method. We applied this approach to the biological pathways involved in the left ventricle (LV) response to myocardial infarction (MI) and verified our algorithm by estimating two parameters in a Hill equation embedded in the nonlinear model. We further evaluated our estimation performance with different parameter settings and signal to noise ratios. Our results demonstrated the effectiveness of the algorithm for both linearly and nonlinearly parameterized dynamic systems. Our proposed Bayesian algorithm successfully estimated parameters in nonlinear mathematical models for biological pathways. This method can be further extended to high order systems and thus provides a useful tool to analyze biological dynamics and extract information using temporal data.
Technical Note: Approximate Bayesian parameterization of a process-based tropical forest model
NASA Astrophysics Data System (ADS)
Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.
2014-02-01
Inverse parameter estimation of process-based models is a long-standing problem in many scientific disciplines. A key question for inverse parameter estimation is how to define the metric that quantifies how well model predictions fit to the data. This metric can be expressed by general cost or objective functions, but statistical inversion methods require a particular metric, the probability of observing the data given the model parameters, known as the likelihood. For technical and computational reasons, likelihoods for process-based stochastic models are usually based on general assumptions about variability in the observed data, and not on the stochasticity generated by the model. Only in recent years have new methods become available that allow the generation of likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional Markov chain Monte Carlo (MCMC) sampler, performs well in retrieving known parameter values from virtual inventory data generated by the forest model. We analyze the results of the parameter estimation, examine its sensitivity to the choice and aggregation of model outputs and observed data (summary statistics), and demonstrate the application of this method by fitting the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss how this approach differs from approximate Bayesian computation (ABC), another method commonly used to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can be successfully applied to process-based models of high complexity. The methodology is particularly suitable for heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models.
Bayesian analysis for erosion modelling of sediments in combined sewer systems.
Kanso, A; Chebbo, G; Tassin, B
2005-01-01
Previous research has confirmed that the sediments at the bed of combined sewer systems are the main source of particulate and organic pollution during rain events contributing to combined sewer overflows. However, existing urban stormwater models utilize inappropriate sediment transport formulas initially developed from alluvial hydrodynamics. Recently, a model has been formulated and profoundly assessed based on laboratory experiments to simulate the erosion of sediments in sewer pipes taking into account the increase in strength with depth in the weak layer of deposits. In order to objectively evaluate this model, this paper presents a Bayesian analysis of the model using field data collected in sewer pipes in Paris under known hydraulic conditions. The test has been performed using a MCMC sampling method for calibration and uncertainty assessment. Results demonstrate the capacity of the model to reproduce erosion as a direct response to the increase in bed shear stress. This is due to the model description of the erosional strength in the deposits and to the shape of the measured bed shear stress. However, large uncertainties in some of the model parameters suggest that the model could be over-parameterised and necessitates a large amount of informative data for its calibration.
Allen, Bruce C; Hack, C Eric; Clewell, Harvey J
2007-08-01
A Bayesian approach, implemented using Markov Chain Monte Carlo (MCMC) analysis, was applied with a physiologically-based pharmacokinetic (PBPK) model of methylmercury (MeHg) to evaluate the variability of MeHg exposure in women of childbearing age in the U.S. population. The analysis made use of the newly available National Health and Nutrition Survey (NHANES) blood and hair mercury concentration data for women of age 16-49 years (sample size, 1,582). Bayesian analysis was performed to estimate the population variability in MeHg exposure (daily ingestion rate) implied by the variation in blood and hair concentrations of mercury in the NHANES database. The measured variability in the NHANES blood and hair data represents the result of a process that includes interindividual variation in exposure to MeHg and interindividual variation in the pharmacokinetics (distribution, clearance) of MeHg. The PBPK model includes a number of pharmacokinetic parameters (e.g., tissue volumes, partition coefficients, rate constants for metabolism and elimination) that can vary from individual to individual within the subpopulation of interest. Using MCMC analysis, it was possible to combine prior distributions of the PBPK model parameters with the NHANES blood and hair data, as well as with kinetic data from controlled human exposures to MeHg, to derive posterior distributions that refine the estimates of both the population exposure distribution and the pharmacokinetic parameters. In general, based on the populations surveyed by NHANES, the results of the MCMC analysis indicate that a small fraction, less than 1%, of the U.S. population of women of childbearing age may have mercury exposures greater than the EPA RfD for MeHg of 0.1 microg/kg/day, and that there are few, if any, exposures greater than the ATSDR MRL of 0.3 microg/kg/day. The analysis also indicates that typical exposures may be greater than previously estimated from food consumption surveys, but that the variability in exposure within the population of U.S. women of childbearing age may be less than previously assumed.
Stochastic Inversion of 2D Magnetotelluric Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Jinsong
2010-07-01
The algorithm is developed to invert 2D magnetotelluric (MT) data based on sharp boundary parametrization using a Bayesian framework. Within the algorithm, we consider the locations and the resistivity of regions formed by the interfaces are as unknowns. We use a parallel, adaptive finite-element algorithm to forward simulate frequency-domain MT responses of 2D conductivity structure. Those unknown parameters are spatially correlated and are described by a geostatistical model. The joint posterior probability distribution function is explored by Markov Chain Monte Carlo (MCMC) sampling methods. The developed stochastic model is effective for estimating the interface locations and resistivity. Most importantly, itmore » provides details uncertainty information on each unknown parameter. Hardware requirements: PC, Supercomputer, Multi-platform, Workstation; Software requirements C and Fortan; Operation Systems/version is Linux/Unix or Windows« less
Technical Note: Approximate Bayesian parameterization of a complex tropical forest model
NASA Astrophysics Data System (ADS)
Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.
2013-08-01
Inverse parameter estimation of process-based models is a long-standing problem in ecology and evolution. A key problem of inverse parameter estimation is to define a metric that quantifies how well model predictions fit to the data. Such a metric can be expressed by general cost or objective functions, but statistical inversion approaches are based on a particular metric, the probability of observing the data given the model, known as the likelihood. Deriving likelihoods for dynamic models requires making assumptions about the probability for observations to deviate from mean model predictions. For technical reasons, these assumptions are usually derived without explicit consideration of the processes in the simulation. Only in recent years have new methods become available that allow generating likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional MCMC, performs well in retrieving known parameter values from virtual field data generated by the forest model. We analyze the results of the parameter estimation, examine the sensitivity towards the choice and aggregation of model outputs and observed data (summary statistics), and show results from using this method to fit the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss differences of this approach to Approximate Bayesian Computing (ABC), another commonly used method to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can successfully be applied to process-based models of high complexity. The methodology is particularly suited to heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models in ecology and evolution.
Sargeant, Glen A.; Sovada, Marsha A.; Slivinski, Christiane C.; Johnson, Douglas H.
2005-01-01
Accurate maps of species distributions are essential tools for wildlife research and conservation. Unfortunately, biologists often are forced to rely on maps derived from observed occurrences recorded opportunistically during observation periods of variable length. Spurious inferences are likely to result because such maps are profoundly affected by the duration and intensity of observation and by methods used to delineate distributions, especially when detection is uncertain. We conducted a systematic survey of swift fox (Vulpes velox) distribution in western Kansas, USA, and used Markov chain Monte Carlo (MCMC) image restoration to rectify these problems. During 1997–1999, we searched 355 townships (ca. 93 km) 1–3 times each for an average cost of $7,315 per year and achieved a detection rate (probability of detecting swift foxes, if present, during a single search) of = 0.69 (95% Bayesian confidence interval [BCI] = [0.60, 0.77]). Our analysis produced an estimate of the underlying distribution, rather than a map of observed occurrences, that reflected the uncertainty associated with estimates of model parameters. To evaluate our results, we analyzed simulated data with similar properties. Results of our simulations suggest negligible bias and good precision when probabilities of detection on ≥1 survey occasions (cumulative probabilities of detection) exceed 0.65. Although the use of MCMC image restoration has been limited by theoretical and computational complexities, alternatives do not possess the same advantages. Image models accommodate uncertain detection, do not require spatially independent data or a census of map units, and can be used to estimate species distributions directly from observations without relying on habitat covariates or parameters that must be estimated subjectively. These features facilitate economical surveys of large regions, the detection of temporal trends in distribution, and assessments of landscape-level relations between species and habitats. Requirements for the use of MCMC image restoration include study areas that can be partitioned into regular grids of mapping units, spatially contagious species distributions, reliable methods for identifying target species, and cumulative probabilities of detection ≥0.65.
Sargeant, G.A.; Sovada, M.A.; Slivinski, C.C.; Johnson, D.H.
2005-01-01
Accurate maps of species distributions are essential tools for wildlife research and conservation. Unfortunately, biologists often are forced to rely on maps derived from observed occurrences recorded opportunistically during observation periods of variable length. Spurious inferences are likely to result because such maps are profoundly affected by the duration and intensity of observation and by methods used to delineate distributions, especially when detection is uncertain. We conducted a systematic survey of swift fox (Vulpes velox) distribution in western Kansas, USA, and used Markov chain Monte Carlo (MCMC) image restoration to rectify these problems. During 1997-1999, we searched 355 townships (ca. 93 km2) 1-3 times each for an average cost of $7,315 per year and achieved a detection rate (probability of detecting swift foxes, if present, during a single search) of ?? = 0.69 (95% Bayesian confidence interval [BCI] = [0.60, 0.77]). Our analysis produced an estimate of the underlying distribution, rather than a map of observed occurrences, that reflected the uncertainty associated with estimates of model parameters. To evaluate our results, we analyzed simulated data with similar properties. Results of our simulations suggest negligible bias and good precision when probabilities of detection on ???1 survey occasions (cumulative probabilities of detection) exceed 0.65. Although the use of MCMC image restoration has been limited by theoretical and computational complexities, alternatives do not possess the same advantages. Image models accommodate uncertain detection, do not require spatially independent data or a census of map units, and can be used to estimate species distributions directly from observations without relying on habitat covariates or parameters that must be estimated subjectively. These features facilitate economical surveys of large regions, the detection of temporal trends in distribution, and assessments of landscape-level relations between species and habitats. Requirements for the use of MCMC image restoration include study areas that can be partitioned into regular grids of mapping units, spatially contagious species distributions, reliable methods for identifying target species, and cumulative probabilities of detection ???0.65.
Standard Error Estimation of 3PL IRT True Score Equating with an MCMC Method
ERIC Educational Resources Information Center
Liu, Yuming; Schulz, E. Matthew; Yu, Lei
2008-01-01
A Markov chain Monte Carlo (MCMC) method and a bootstrap method were compared in the estimation of standard errors of item response theory (IRT) true score equating. Three test form relationships were examined: parallel, tau-equivalent, and congeneric. Data were simulated based on Reading Comprehension and Vocabulary tests of the Iowa Tests of…
Jiang, Z; Dou, Z; Song, W L; Xu, J; Wu, Z Y
2017-11-10
Objective: To compare results of different methods: in organizing HIV viral load (VL) data with missing values mechanism. Methods We used software SPSS 17.0 to simulate complete and missing data with different missing value mechanism from HIV viral loading data collected from MSM in 16 cities in China in 2013. Maximum Likelihood Methods Using the Expectation and Maximization Algorithm (EM), regressive method, mean imputation, delete method, and Markov Chain Monte Carlo (MCMC) were used to supplement missing data respectively. The results: of different methods were compared according to distribution characteristics, accuracy and precision. Results HIV VL data could not be transferred into a normal distribution. All the methods showed good results in iterating data which is Missing Completely at Random Mechanism (MCAR). For the other types of missing data, regressive and MCMC methods were used to keep the main characteristic of the original data. The means of iterating database with different methods were all close to the original one. The EM, regressive method, mean imputation, and delete method under-estimate VL while MCMC overestimates it. Conclusion: MCMC can be used as the main imputation method for HIV virus loading missing data. The iterated data can be used as a reference for mean HIV VL estimation among the investigated population.
Bayesian spatiotemporal model of fMRI data using transfer functions.
Quirós, Alicia; Diez, Raquel Montes; Wilson, Simon P
2010-09-01
This research describes a new Bayesian spatiotemporal model to analyse BOLD fMRI studies. In the temporal dimension, we describe the shape of the hemodynamic response function (HRF) with a transfer function model. The spatial continuity and local homogeneity of the evoked responses are modelled by a Gaussian Markov random field prior on the parameter indicating activations. The proposal constitutes an extension of the spatiotemporal model presented in a previous approach [Quirós, A., Montes Diez, R. and Gamerman, D., 2010. Bayesian spatiotemporal model of fMRI data, Neuroimage, 49: 442-456], offering more flexibility in the estimation of the HRF and computational advantages in the resulting MCMC algorithm. Simulations from the model are performed in order to ascertain the performance of the sampling scheme and the ability of the posterior to estimate model parameters, as well as to check the model sensitivity to signal to noise ratio. Results are shown on synthetic data and on a real data set from a block-design fMRI experiment. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Minsley, B.J.
2011-01-01
A meaningful interpretation of geophysical measurements requires an assessment of the space of models that are consistent with the data, rather than just a single, 'best' model which does not convey information about parameter uncertainty. For this purpose, a trans-dimensional Bayesian Markov chain Monte Carlo (MCMC) algorithm is developed for assessing frequency-domain electromagnetic (FDEM) data acquired from airborne or ground-based systems. By sampling the distribution of models that are consistent with measured data and any prior knowledge, valuable inferences can be made about parameter values such as the likely depth to an interface, the distribution of possible resistivity values as a function of depth and non-unique relationships between parameters. The trans-dimensional aspect of the algorithm allows the number of layers to be a free parameter that is controlled by the data, where models with fewer layers are inherently favoured, which provides a natural measure of parsimony and a significant degree of flexibility in parametrization. The MCMC algorithm is used with synthetic examples to illustrate how the distribution of acceptable models is affected by the choice of prior information, the system geometry and configuration and the uncertainty in the measured system elevation. An airborne FDEM data set that was acquired for the purpose of hydrogeological characterization is also studied. The results compare favourably with traditional least-squares analysis, borehole resistivity and lithology logs from the site, and also provide new information about parameter uncertainty necessary for model assessment. ?? 2011. Geophysical Journal International ?? 2011 RAS.
Bayesian Regression of Thermodynamic Models of Redox Active Materials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnston, Katherine
Finding a suitable functional redox material is a critical challenge to achieving scalable, economically viable technologies for storing concentrated solar energy in the form of a defected oxide. Demonstrating e ectiveness for thermal storage or solar fuel is largely accomplished by using a thermodynamic model derived from experimental data. The purpose of this project is to test the accuracy of our regression model on representative data sets. Determining the accuracy of the model includes parameter tting the model to the data, comparing the model using di erent numbers of param- eters, and analyzing the entropy and enthalpy calculated from themore » model. Three data sets were considered in this project: two demonstrating materials for solar fuels by wa- ter splitting and the other of a material for thermal storage. Using Bayesian Inference and Markov Chain Monte Carlo (MCMC), parameter estimation was preformed on the three data sets. Good results were achieved, except some there was some deviations on the edges of the data input ranges. The evidence values were then calculated in a variety of ways and used to compare models with di erent number of parameters. It was believed that at least one of the parameters was unnecessary and comparing evidence values demonstrated that the parameter was need on one data set and not signi cantly helpful on another. The entropy was calculated by taking the derivative in one variable and integrating over another. and its uncertainty was also calculated by evaluating the entropy over multiple MCMC samples. Afterwards, all the parts were written up as a tutorial for the Uncertainty Quanti cation Toolkit (UQTk).« less
Liu, Fang; Eugenio, Evercita C
2018-04-01
Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review on beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing them with values close to zero/one via simulation studies; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.
Asteroid orbital inversion using uniform phase-space sampling
NASA Astrophysics Data System (ADS)
Muinonen, K.; Pentikäinen, H.; Granvik, M.; Oszkiewicz, D.; Virtanen, J.
2014-07-01
We review statistical inverse methods for asteroid orbit computation from a small number of astrometric observations and short time intervals of observations. With the help of Markov-chain Monte Carlo methods (MCMC), we present a novel inverse method that utilizes uniform sampling of the phase space for the orbital elements. The statistical orbital ranging method (Virtanen et al. 2001, Muinonen et al. 2001) was set out to resolve the long-lasting challenges in the initial computation of orbits for asteroids. The ranging method starts from the selection of a pair of astrometric observations. Thereafter, the topocentric ranges and angular deviations in R.A. and Decl. are randomly sampled. The two Cartesian positions allow for the computation of orbital elements and, subsequently, the computation of ephemerides for the observation dates. Candidate orbital elements are included in the sample of accepted elements if the χ^2-value between the observed and computed observations is within a pre-defined threshold. The sample orbital elements obtain weights based on a certain debiasing procedure. When the weights are available, the full sample of orbital elements allows the probabilistic assessments for, e.g., object classification and ephemeris computation as well as the computation of collision probabilities. The MCMC ranging method (Oszkiewicz et al. 2009; see also Granvik et al. 2009) replaces the original sampling algorithm described above with a proposal probability density function (p.d.f.), and a chain of sample orbital elements results in the phase space. MCMC ranging is based on a bivariate Gaussian p.d.f. for the topocentric ranges, and allows for the sampling to focus on the phase-space domain with most of the probability mass. In the virtual-observation MCMC method (Muinonen et al. 2012), the proposal p.d.f. for the orbital elements is chosen to mimic the a posteriori p.d.f. for the elements: first, random errors are simulated for each observation, resulting in a set of virtual observations; second, corresponding virtual least-squares orbital elements are derived using the Nelder-Mead downhill simplex method; third, repeating the procedure two times allows for a computation of a difference for two sets of virtual orbital elements; and, fourth, this orbital-element difference constitutes a symmetric proposal in a random-walk Metropolis-Hastings algorithm, avoiding the explicit computation of the proposal p.d.f. In a discrete approximation, the allowed proposals coincide with the differences that are based on a large number of pre-computed sets of virtual least-squares orbital elements. The virtual-observation MCMC method is thus based on the characterization of the relevant volume in the orbital-element phase space. Here we utilize MCMC to map the phase-space domain of acceptable solutions. We can make use of the proposal p.d.f.s from the MCMC ranging and virtual-observation methods. The present phase-space mapping produces, upon convergence, a uniform sampling of the solution space within a pre-defined χ^2-value. The weights of the sampled orbital elements are then computed on the basis of the corresponding χ^2-values. The present method resembles the original ranging method. On one hand, MCMC mapping is insensitive to local extrema in the phase space and efficiently maps the solution space. This is somewhat contrary to the MCMC methods described above. On the other hand, MCMC mapping can suffer from producing a small number of sample elements with small χ^2-values, in resemblance to the original ranging method. We apply the methods to example near-Earth, main-belt, and transneptunian objects, and highlight the utilization of the methods in the data processing and analysis pipeline of the ESA Gaia space mission.
Uncertainty quantification for environmental models
Hill, Mary C.; Lu, Dan; Kavetski, Dmitri; Clark, Martyn P.; Ye, Ming
2012-01-01
Environmental models are used to evaluate the fate of fertilizers in agricultural settings (including soil denitrification), the degradation of hydrocarbons at spill sites, and water supply for people and ecosystems in small to large basins and cities—to mention but a few applications of these models. They also play a role in understanding and diagnosing potential environmental impacts of global climate change. The models are typically mildly to extremely nonlinear. The persistent demand for enhanced dynamics and resolution to improve model realism [17] means that lengthy individual model execution times will remain common, notwithstanding continued enhancements in computer power. In addition, high-dimensional parameter spaces are often defined, which increases the number of model runs required to quantify uncertainty [2]. Some environmental modeling projects have access to extensive funding and computational resources; many do not. The many recent studies of uncertainty quantification in environmental model predictions have focused on uncertainties related to data error and sparsity of data, expert judgment expressed mathematically through prior information, poorly known parameter values, and model structure (see, for example, [1,7,9,10,13,18]). Approaches for quantifying uncertainty include frequentist (potentially with prior information [7,9]), Bayesian [13,18,19], and likelihood-based. A few of the numerous methods, including some sensitivity and inverse methods with consequences for understanding and quantifying uncertainty, are as follows: Bayesian hierarchical modeling and Bayesian model averaging; single-objective optimization with error-based weighting [7] and multi-objective optimization [3]; methods based on local derivatives [2,7,10]; screening methods like OAT (one at a time) and the method of Morris [14]; FAST (Fourier amplitude sensitivity testing) [14]; the Sobol' method [14]; randomized maximum likelihood [10]; Markov chain Monte Carlo (MCMC) [10]. There are also bootstrapping and cross-validation approaches.Sometimes analyses are conducted using surrogate models [12]. The availability of so many options can be confusing. Categorizing methods based on fundamental questions assists in communicating the essential results of uncertainty analyses to stakeholders. Such questions can focus on model adequacy (e.g., How well does the model reproduce observed system characteristics and dynamics?) and sensitivity analysis (e.g., What parameters can be estimated with available data? What observations are important to parameters and predictions? What parameters are important to predictions?), as well as on the uncertainty quantification (e.g., How accurate and precise are the predictions?). The methods can also be classified by the number of model runs required: few (10s to 1000s) or many (10,000s to 1,000,000s). Of the methods listed above, the most computationally frugal are generally those based on local derivatives; MCMC methods tend to be among the most computationally demanding. Surrogate models (emulators)do not necessarily produce computational frugality because many runs of the full model are generally needed to create a meaningful surrogate model. With this categorization, we can, in general, address all the fundamental questions mentioned above using either computationally frugal or demanding methods. Model development and analysis can thus be conducted consistently using either computation-ally frugal or demanding methods; alternatively, different fundamental questions can be addressed using methods that require different levels of effort. Based on this perspective, we pose the question: Can computationally frugal methods be useful companions to computationally demanding meth-ods? The reliability of computationally frugal methods generally depends on the model being reasonably linear, which usually means smooth nonlin-earities and the assumption of Gaussian errors; both tend to be more valid with more linear
Effective Online Bayesian Phylogenetics via Sequential Monte Carlo with Guided Proposals
Fourment, Mathieu; Claywell, Brian C; Dinh, Vu; McCoy, Connor; Matsen IV, Frederick A; Darling, Aaron E
2018-01-01
Abstract Modern infectious disease outbreak surveillance produces continuous streams of sequence data which require phylogenetic analysis as data arrives. Current software packages for Bayesian phylogenetic inference are unable to quickly incorporate new sequences as they become available, making them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, wherein new data can be continuously incorporated to update the estimate of the posterior probability distribution. In this article, we describe and evaluate several different online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior suffers from poor performance, and we develop “guided” proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that the situation can be resolved by heating the proposal density. The results demonstrate that relative to the widely used MCMC-based algorithm implemented in MrBayes, the total time required to compute a series of phylogenetic posteriors as sequences arrive can be significantly reduced by the use of OPSMC, without incurring a significant loss in accuracy. PMID:29186587
NASA Astrophysics Data System (ADS)
Astuti, Ani Budi; Iriawan, Nur; Irhamah, Kuswanto, Heri
2017-12-01
In the Bayesian mixture modeling requires stages the identification number of the most appropriate mixture components thus obtained mixture models fit the data through data driven concept. Reversible Jump Markov Chain Monte Carlo (RJMCMC) is a combination of the reversible jump (RJ) concept and the Markov Chain Monte Carlo (MCMC) concept used by some researchers to solve the problem of identifying the number of mixture components which are not known with certainty number. In its application, RJMCMC using the concept of the birth/death and the split-merge with six types of movement, that are w updating, θ updating, z updating, hyperparameter β updating, split-merge for components and birth/death from blank components. The development of the RJMCMC algorithm needs to be done according to the observed case. The purpose of this study is to know the performance of RJMCMC algorithm development in identifying the number of mixture components which are not known with certainty number in the Bayesian mixture modeling for microarray data in Indonesia. The results of this study represent that the concept RJMCMC algorithm development able to properly identify the number of mixture components in the Bayesian normal mixture model wherein the component mixture in the case of microarray data in Indonesia is not known for certain number.
Efficient Bayesian experimental design for contaminant source identification
NASA Astrophysics Data System (ADS)
Zhang, J.; Zeng, L.
2013-12-01
In this study, an efficient full Bayesian approach is developed for the optimal sampling well location design and source parameter identification of groundwater contaminants. An information measure, i.e., the relative entropy, is employed to quantify the information gain from indirect concentration measurements in identifying unknown source parameters such as the release time, strength and location. In this approach, the sampling location that gives the maximum relative entropy is selected as the optimal one. Once the sampling location is determined, a Bayesian approach based on Markov Chain Monte Carlo (MCMC) is used to estimate unknown source parameters. In both the design and estimation, the contaminant transport equation is required to be solved many times to evaluate the likelihood. To reduce the computational burden, an interpolation method based on the adaptive sparse grid is utilized to construct a surrogate for the contaminant transport. The approximated likelihood can be evaluated directly from the surrogate, which greatly accelerates the design and estimation process. The accuracy and efficiency of our approach are demonstrated through numerical case studies. Compared with the traditional optimal design, which is based on the Gaussian linear assumption, the method developed in this study can cope with arbitrary nonlinearity. It can be used to assist in groundwater monitor network design and identification of unknown contaminant sources. Contours of the expected information gain. The optimal observing location corresponds to the maximum value. Posterior marginal probability densities of unknown parameters, the thick solid black lines are for the designed location. For comparison, other 7 lines are for randomly chosen locations. The true values are denoted by vertical lines. It is obvious that the unknown parameters are estimated better with the desinged location.
Analysing child mortality in Nigeria with geoadditive discrete-time survival models.
Adebayo, Samson B; Fahrmeir, Ludwig
2005-03-15
Child mortality reflects a country's level of socio-economic development and quality of life. In developing countries, mortality rates are not only influenced by socio-economic, demographic and health variables but they also vary considerably across regions and districts. In this paper, we analysed child mortality in Nigeria with flexible geoadditive discrete-time survival models. This class of models allows us to measure small-area district-specific spatial effects simultaneously with possibly non-linear or time-varying effects of other factors. Inference is fully Bayesian and uses computationally efficient Markov chain Monte Carlo (MCMC) simulation techniques. The application is based on the 1999 Nigeria Demographic and Health Survey. Our method assesses effects at a high level of temporal and spatial resolution not available with traditional parametric models, and the results provide some evidence on how to reduce child mortality by improving socio-economic and public health conditions. Copyright (c) 2004 John Wiley & Sons, Ltd.
Bayesian analysis of Jolly-Seber type models
Matechou, Eleni; Nicholls, Geoff K.; Morgan, Byron J. T.; Collazo, Jaime A.; Lyons, James E.
2016-01-01
We propose the use of finite mixtures of continuous distributions in modelling the process by which new individuals, that arrive in groups, become part of a wildlife population. We demonstrate this approach using a data set of migrating semipalmated sandpipers (Calidris pussila) for which we extend existing stopover models to allow for individuals to have different behaviour in terms of their stopover duration at the site. We demonstrate the use of reversible jump MCMC methods to derive posterior distributions for the model parameters and the models, simultaneously. The algorithm moves between models with different numbers of arrival groups as well as between models with different numbers of behavioural groups. The approach is shown to provide new ecological insights about the stopover behaviour of semipalmated sandpipers but is generally applicable to any population in which animals arrive in groups and potentially exhibit heterogeneity in terms of one or more other processes.
Extreme-Scale Bayesian Inference for Uncertainty Quantification of Complex Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Biros, George
Uncertainty quantification (UQ)—that is, quantifying uncertainties in complex mathematical models and their large-scale computational implementations—is widely viewed as one of the outstanding challenges facing the field of CS&E over the coming decade. The EUREKA project set to address the most difficult class of UQ problems: those for which both the underlying PDE model as well as the uncertain parameters are of extreme scale. In the project we worked on these extreme-scale challenges in the following four areas: 1. Scalable parallel algorithms for sampling and characterizing the posterior distribution that exploit the structure of the underlying PDEs and parameter-to-observable map. Thesemore » include structure-exploiting versions of the randomized maximum likelihood method, which aims to overcome the intractability of employing conventional MCMC methods for solving extreme-scale Bayesian inversion problems by appealing to and adapting ideas from large-scale PDE-constrained optimization, which have been very successful at exploring high-dimensional spaces. 2. Scalable parallel algorithms for construction of prior and likelihood functions based on learning methods and non-parametric density estimation. Constructing problem-specific priors remains a critical challenge in Bayesian inference, and more so in high dimensions. Another challenge is construction of likelihood functions that capture unmodeled couplings between observations and parameters. We will create parallel algorithms for non-parametric density estimation using high dimensional N-body methods and combine them with supervised learning techniques for the construction of priors and likelihood functions. 3. Bayesian inadequacy models, which augment physics models with stochastic models that represent their imperfections. The success of the Bayesian inference framework depends on the ability to represent the uncertainty due to imperfections of the mathematical model of the phenomena of interest. This is a central challenge in UQ, especially for large-scale models. We propose to develop the mathematical tools to address these challenges in the context of extreme-scale problems. 4. Parallel scalable algorithms for Bayesian optimal experimental design (OED). Bayesian inversion yields quantified uncertainties in the model parameters, which can be propagated forward through the model to yield uncertainty in outputs of interest. This opens the way for designing new experiments to reduce the uncertainties in the model parameters and model predictions. Such experimental design problems have been intractable for large-scale problems using conventional methods; we will create OED algorithms that exploit the structure of the PDE model and the parameter-to-output map to overcome these challenges. Parallel algorithms for these four problems were created, analyzed, prototyped, implemented, tuned, and scaled up for leading-edge supercomputers, including UT-Austin’s own 10 petaflops Stampede system, ANL’s Mira system, and ORNL’s Titan system. While our focus is on fundamental mathematical/computational methods and algorithms, we will assess our methods on model problems derived from several DOE mission applications, including multiscale mechanics and ice sheet dynamics.« less
MPN estimation of qPCR target sequence recoveries from whole cell calibrator samples.
Sivaganesan, Mano; Siefring, Shawn; Varma, Manju; Haugland, Richard A
2011-12-01
DNA extracts from enumerated target organism cells (calibrator samples) have been used for estimating Enterococcus cell equivalent densities in surface waters by a comparative cycle threshold (Ct) qPCR analysis method. To compare surface water Enterococcus density estimates from different studies by this approach, either a consistent source of calibrator cells must be used or the estimates must account for any differences in target sequence recoveries from different sources of calibrator cells. In this report we describe two methods for estimating target sequence recoveries from whole cell calibrator samples based on qPCR analyses of their serially diluted DNA extracts and most probable number (MPN) calculation. The first method employed a traditional MPN calculation approach. The second method employed a Bayesian hierarchical statistical modeling approach and a Monte Carlo Markov Chain (MCMC) simulation method to account for the uncertainty in these estimates associated with different individual samples of the cell preparations, different dilutions of the DNA extracts and different qPCR analytical runs. The two methods were applied to estimate mean target sequence recoveries per cell from two different lots of a commercially available source of enumerated Enterococcus cell preparations. The mean target sequence recovery estimates (and standard errors) per cell from Lot A and B cell preparations by the Bayesian method were 22.73 (3.4) and 11.76 (2.4), respectively, when the data were adjusted for potential false positive results. Means were similar for the traditional MPN approach which cannot comparably assess uncertainty in the estimates. Cell numbers and estimates of recoverable target sequences in calibrator samples prepared from the two cell sources were also used to estimate cell equivalent and target sequence quantities recovered from surface water samples in a comparative Ct method. Our results illustrate the utility of the Bayesian method in accounting for uncertainty, the high degree of precision attainable by the MPN approach and the need to account for the differences in target sequence recoveries from different calibrator sample cell sources when they are used in the comparative Ct method. Published by Elsevier B.V.
Bayesian Hierarchical Random Intercept Model Based on Three Parameter Gamma Distribution
NASA Astrophysics Data System (ADS)
Wirawati, Ika; Iriawan, Nur; Irhamah
2017-06-01
Hierarchical data structures are common throughout many areas of research. Beforehand, the existence of this type of data was less noticed in the analysis. The appropriate statistical analysis to handle this type of data is the hierarchical linear model (HLM). This article will focus only on random intercept model (RIM), as a subclass of HLM. This model assumes that the intercept of models in the lowest level are varied among those models, and their slopes are fixed. The differences of intercepts were suspected affected by some variables in the upper level. These intercepts, therefore, are regressed against those upper level variables as predictors. The purpose of this paper would demonstrate a proven work of the proposed two level RIM of the modeling on per capita household expenditure in Maluku Utara, which has five characteristics in the first level and three characteristics of districts/cities in the second level. The per capita household expenditure data in the first level were captured by the three parameters Gamma distribution. The model, therefore, would be more complex due to interaction of many parameters for representing the hierarchical structure and distribution pattern of the data. To simplify the estimation processes of parameters, the computational Bayesian method couple with Markov Chain Monte Carlo (MCMC) algorithm and its Gibbs Sampling are employed.
An investigation into exoplanet transits and uncertainties
NASA Astrophysics Data System (ADS)
Ji, Y.; Banks, T.; Budding, E.; Rhodes, M. D.
2017-06-01
A simple transit model is described along with tests of this model against published results for 4 exoplanet systems (Kepler-1, 2, 8, and 77). Data from the Kepler mission are used. The Markov Chain Monte Carlo (MCMC) method is applied to obtain realistic error estimates. Optimisation of limb darkening coefficients is subject to data quality. It is more likely for MCMC to derive an empirical limb darkening coefficient for light curves with S/N (signal to noise) above 15. Finally, the model is applied to Kepler data for 4 Kepler candidate systems (KOI 760.01, 767.01, 802.01, and 824.01) with previously unpublished results. Error estimates for these systems are obtained via the MCMC method.
A Bayesian alternative for multi-objective ecohydrological model specification
NASA Astrophysics Data System (ADS)
Tang, Yating; Marshall, Lucy; Sharma, Ashish; Ajami, Hoori
2018-01-01
Recent studies have identified the importance of vegetation processes in terrestrial hydrologic systems. Process-based ecohydrological models combine hydrological, physical, biochemical and ecological processes of the catchments, and as such are generally more complex and parametric than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying the uncertainties in hydrological modeling with the development of Markov chain Monte Carlo (MCMC) techniques. The Bayesian approach offers an appealing alternative to traditional multi-objective hydrologic model calibrations by defining proper prior distributions that can be considered analogous to the ad-hoc weighting often prescribed in multi-objective calibration. Our study aims to develop appropriate prior distributions and likelihood functions that minimize the model uncertainties and bias within a Bayesian ecohydrological modeling framework based on a traditional Pareto-based model calibration technique. In our study, a Pareto-based multi-objective optimization and a formal Bayesian framework are implemented in a conceptual ecohydrological model that combines a hydrological model (HYMOD) and a modified Bucket Grassland Model (BGM). Simulations focused on one objective (streamflow/LAI) and multiple objectives (streamflow and LAI) with different emphasis defined via the prior distribution of the model error parameters. Results show more reliable outputs for both predicted streamflow and LAI using Bayesian multi-objective calibration with specified prior distributions for error parameters based on results from the Pareto front in the ecohydrological modeling. The methodology implemented here provides insight into the usefulness of multiobjective Bayesian calibration for ecohydrologic systems and the importance of appropriate prior distributions in such approaches.
NASA Astrophysics Data System (ADS)
Norris, P. M.; da Silva, A. M., Jr.
2016-12-01
Norris and da Silva recently published a method to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation (CDA). The gridcolumn model includes assumed-PDF intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used are MODIS cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. The new approach not only significantly reduces mean and standard deviation biases with respect to the assimilated observables, but also improves the simulated rotational-Ramman scattering cloud optical centroid pressure against independent (non-assimilated) retrievals from the OMI instrument. One obvious difficulty for the method, and other CDA methods, is the lack of information content in passive cloud observables on cloud vertical structure, beyond cloud-top and thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard is helpful, better honoring inversion structures in the background state.
NASA Astrophysics Data System (ADS)
Yee, Eugene
2007-04-01
Although a great deal of research effort has been focused on the forward prediction of the dispersion of contaminants (e.g., chemical and biological warfare agents) released into the turbulent atmosphere, much less work has been directed toward the inverse prediction of agent source location and strength from the measured concentration, even though the importance of this problem for a number of practical applications is obvious. In general, the inverse problem of source reconstruction is ill-posed and unsolvable without additional information. It is demonstrated that a Bayesian probabilistic inferential framework provides a natural and logically consistent method for source reconstruction from a limited number of noisy concentration data. In particular, the Bayesian approach permits one to incorporate prior knowledge about the source as well as additional information regarding both model and data errors. The latter enables a rigorous determination of the uncertainty in the inference of the source parameters (e.g., spatial location, emission rate, release time, etc.), hence extending the potential of the methodology as a tool for quantitative source reconstruction. A model (or, source-receptor relationship) that relates the source distribution to the concentration data measured by a number of sensors is formulated, and Bayesian probability theory is used to derive the posterior probability density function of the source parameters. A computationally efficient methodology for determination of the likelihood function for the problem, based on an adjoint representation of the source-receptor relationship, is described. Furthermore, we describe the application of efficient stochastic algorithms based on Markov chain Monte Carlo (MCMC) for sampling from the posterior distribution of the source parameters, the latter of which is required to undertake the Bayesian computation. The Bayesian inferential methodology for source reconstruction is validated against real dispersion data for two cases involving contaminant dispersion in highly disturbed flows over urban and complex environments where the idealizations of horizontal homogeneity and/or temporal stationarity in the flow cannot be applied to simplify the problem. Furthermore, the methodology is applied to the case of reconstruction of multiple sources.
NASA Astrophysics Data System (ADS)
Zhang, D.; Liao, Q.
2016-12-01
The Bayesian inference provides a convenient framework to solve statistical inverse problems. In this method, the parameters to be identified are treated as random variables. The prior knowledge, the system nonlinearity, and the measurement errors can be directly incorporated in the posterior probability density function (PDF) of the parameters. The Markov chain Monte Carlo (MCMC) method is a powerful tool to generate samples from the posterior PDF. However, since the MCMC usually requires thousands or even millions of forward simulations, it can be a computationally intensive endeavor, particularly when faced with large-scale flow and transport models. To address this issue, we construct a surrogate system for the model responses in the form of polynomials by the stochastic collocation method. In addition, we employ interpolation based on the nested sparse grids and takes into account the different importance of the parameters, under the condition of high random dimensions in the stochastic space. Furthermore, in case of low regularity such as discontinuous or unsmooth relation between the input parameters and the output responses, we introduce an additional transform process to improve the accuracy of the surrogate model. Once we build the surrogate system, we may evaluate the likelihood with very little computational cost. We analyzed the convergence rate of the forward solution and the surrogate posterior by Kullback-Leibler divergence, which quantifies the difference between probability distributions. The fast convergence of the forward solution implies fast convergence of the surrogate posterior to the true posterior. We also tested the proposed algorithm on water-flooding two-phase flow reservoir examples. The posterior PDF calculated from a very long chain with direct forward simulation is assumed to be accurate. The posterior PDF calculated using the surrogate model is in reasonable agreement with the reference, revealing a great improvement in terms of computational efficiency.
Bayesian random local clocks, or one rate to rule them all
2010-01-01
Background Relaxed molecular clock models allow divergence time dating and "relaxed phylogenetic" inference, in which a time tree is estimated in the face of unequal rates across lineages. We present a new method for relaxing the assumption of a strict molecular clock using Markov chain Monte Carlo to implement Bayesian modeling averaging over random local molecular clocks. The new method approaches the problem of rate variation among lineages by proposing a series of local molecular clocks, each extending over a subregion of the full phylogeny. Each branch in a phylogeny (subtending a clade) is a possible location for a change of rate from one local clock to a new one. Thus, including both the global molecular clock and the unconstrained model results, there are a total of 22n-2 possible rate models available for averaging with 1, 2, ..., 2n - 2 different rate categories. Results We propose an efficient method to sample this model space while simultaneously estimating the phylogeny. The new method conveniently allows a direct test of the strict molecular clock, in which one rate rules them all, against a large array of alternative local molecular clock models. We illustrate the method's utility on three example data sets involving mammal, primate and influenza evolution. Finally, we explore methods to visualize the complex posterior distribution that results from inference under such models. Conclusions The examples suggest that large sequence datasets may only require a small number of local molecular clocks to reconcile their branch lengths with a time scale. All of the analyses described here are implemented in the open access software package BEAST 1.5.4 (http://beast-mcmc.googlecode.com/). PMID:20807414
NASA Astrophysics Data System (ADS)
Jakkareddy, Pradeep S.; Balaji, C.
2016-09-01
This paper employs the Bayesian based Metropolis Hasting - Markov Chain Monte Carlo algorithm to solve inverse heat transfer problem of determining the spatially varying heat transfer coefficient from a flat plate with flush mounted discrete heat sources with measured temperatures at the bottom of the plate. The Nusselt number is assumed to be of the form Nu = aReb(x/l)c . To input reasonable values of ’a’ and ‘b’ into the inverse problem, first limited two dimensional conjugate convection simulations were done with Comsol. Based on the guidance from this different values of ‘a’ and ‘b’ are input to a computationally less complex problem of conjugate conduction in the flat plate (15mm thickness) and temperature distributions at the bottom of the plate which is a more convenient location for measuring the temperatures without disturbing the flow were obtained. Since the goal of this work is to demonstrate the eficiacy of the Bayesian approach to accurately retrieve ‘a’ and ‘b’, numerically generated temperatures with known values of ‘a’ and ‘b’ are treated as ‘surrogate’ experimental data. The inverse problem is then solved by repeatedly using the forward solutions together with the MH-MCMC aprroach. To speed up the estimation, the forward model is replaced by an artificial neural network. The mean, maximum-a-posteriori and standard deviation of the estimated parameters ‘a’ and ‘b’ are reported. The robustness of the proposed method is examined, by synthetically adding noise to the temperatures.
Dependence of paracentric inversion rate on tract length.
York, Thomas L; Durrett, Rick; Nielsen, Rasmus
2007-04-03
We develop a Bayesian method based on MCMC for estimating the relative rates of pericentric and paracentric inversions from marker data from two species. The method also allows estimation of the distribution of inversion tract lengths. We apply the method to data from Drosophila melanogaster and D. yakuba. We find that pericentric inversions occur at a much lower rate compared to paracentric inversions. The average paracentric inversion tract length is approx. 4.8 Mb with small inversions being more frequent than large inversions. If the two breakpoints defining a paracentric inversion tract are uniformly and independently distributed over chromosome arms there will be more short tract-length inversions than long; we find an even greater preponderance of short tract lengths than this would predict. Thus there appears to be a correlation between the positions of breakpoints which favors shorter tract lengths. The method developed in this paper provides the first statistical estimator for estimating the distribution of inversion tract lengths from marker data. Application of this method for a number of data sets may help elucidate the relationship between the length of an inversion and the chance that it will get accepted.
Dependence of paracentric inversion rate on tract length
York, Thomas L; Durrett, Rick; Nielsen, Rasmus
2007-01-01
Background We develop a Bayesian method based on MCMC for estimating the relative rates of pericentric and paracentric inversions from marker data from two species. The method also allows estimation of the distribution of inversion tract lengths. Results We apply the method to data from Drosophila melanogaster and D. yakuba. We find that pericentric inversions occur at a much lower rate compared to paracentric inversions. The average paracentric inversion tract length is approx. 4.8 Mb with small inversions being more frequent than large inversions. If the two breakpoints defining a paracentric inversion tract are uniformly and independently distributed over chromosome arms there will be more short tract-length inversions than long; we find an even greater preponderance of short tract lengths than this would predict. Thus there appears to be a correlation between the positions of breakpoints which favors shorter tract lengths. Conclusion The method developed in this paper provides the first statistical estimator for estimating the distribution of inversion tract lengths from marker data. Application of this method for a number of data sets may help elucidate the relationship between the length of an inversion and the chance that it will get accepted. PMID:17407601
Model Reduction via Principe Component Analysis and Markov Chain Monte Carlo (MCMC) Methods
NASA Astrophysics Data System (ADS)
Gong, R.; Chen, J.; Hoversten, M. G.; Luo, J.
2011-12-01
Geophysical and hydrogeological inverse problems often include a large number of unknown parameters, ranging from hundreds to millions, depending on parameterization and problems undertaking. This makes inverse estimation and uncertainty quantification very challenging, especially for those problems in two- or three-dimensional spatial domains. Model reduction technique has the potential of mitigating the curse of dimensionality by reducing total numbers of unknowns while describing the complex subsurface systems adequately. In this study, we explore the use of principal component analysis (PCA) and Markov chain Monte Carlo (MCMC) sampling methods for model reduction through the use of synthetic datasets. We compare the performances of three different but closely related model reduction approaches: (1) PCA methods with geometric sampling (referred to as 'Method 1'), (2) PCA methods with MCMC sampling (referred to as 'Method 2'), and (3) PCA methods with MCMC sampling and inclusion of random effects (referred to as 'Method 3'). We consider a simple convolution model with five unknown parameters as our goal is to understand and visualize the advantages and disadvantages of each method by comparing their inversion results with the corresponding analytical solutions. We generated synthetic data with noise added and invert them under two different situations: (1) the noised data and the covariance matrix for PCA analysis are consistent (referred to as the unbiased case), and (2) the noise data and the covariance matrix are inconsistent (referred to as biased case). In the unbiased case, comparison between the analytical solutions and the inversion results show that all three methods provide good estimates of the true values and Method 1 is computationally more efficient. In terms of uncertainty quantification, Method 1 performs poorly because of relatively small number of samples obtained, Method 2 performs best, and Method 3 overestimates uncertainty due to inclusion of random effects. However, in the biased case, only Method 3 correctly estimates all the unknown parameters, and both Methods 1 and 2 provide wrong values for the biased parameters. The synthetic case study demonstrates that if the covariance matrix for PCA analysis is inconsistent with true models, the PCA methods with geometric or MCMC sampling will provide incorrect estimates.
Jump spillover between oil prices and exchange rates
NASA Astrophysics Data System (ADS)
Li, Xiao-Ping; Zhou, Chun-Yang; Wu, Chong-Feng
2017-11-01
In this paper, we investigate the jump spillover effects between oil prices and exchange rates. To identify the latent historical jumps for exchange rates and oil prices, we use a Bayesian MCMC approach to estimate the stochastic volatility model with correlated jumps in both returns and volatilities for each. We examine the simultaneous jump intensities and the conditional jump spillover probabilities between oil prices and exchange rates, finding strong evidence of jump spillover effects. Further analysis shows that the jump spillovers are mainly due to exogenous events such as financial crises and geopolitical events. Thus, the findings have important implications for financial risk management.
Sharp Boundary Inversion of 2D Magnetotelluric Data using Bayesian Method.
NASA Astrophysics Data System (ADS)
Zhou, S.; Huang, Q.
2017-12-01
Normally magnetotelluric(MT) inversion method cannot show the distribution of underground resistivity with clear boundary, even if there are obviously different blocks. Aiming to solve this problem, we develop a Bayesian structure to inverse 2D MT sharp boundary data, using boundary location and inside resistivity as the random variables. Firstly, we use other MT inversion results, like ModEM, to analyze the resistivity distribution roughly. Then, we select the suitable random variables and change its data format to traditional staggered grid parameters, which can be used to do finite difference forward part. Finally, we can shape the posterior probability density(PPD), which contains all the prior information and model-data correlation, by Markov Chain Monte Carlo(MCMC) sampling from prior distribution. The depth, resistivity and their uncertainty can be valued. It also works for sensibility estimation. We applied the method to a synthetic case, which composes two large abnormal blocks in a trivial background. We consider the boundary smooth and the near true model weight constrains that mimic joint inversion or constrained inversion, then we find that the model results a more precise and focused depth distribution. And we also test the inversion without constrains and find that the boundary could also be figured, though not as well. Both inversions have a good valuation of resistivity. The constrained result has a lower root mean square than ModEM inversion result. The data sensibility obtained via PPD shows that the resistivity is the most sensible, center depth comes second and both sides are the worst.
Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Toledo, Fernando H.; Montesinos-López, José C.; Singh, Pawan; Juliana, Philomin; Salinas-Ruiz, Josafhat
2017-01-01
When a plant scientist wishes to make genomic-enabled predictions of multiple traits measured in multiple individuals in multiple environments, the most common strategy for performing the analysis is to use a single trait at a time taking into account genotype × environment interaction (G × E), because there is a lack of comprehensive models that simultaneously take into account the correlated counting traits and G × E. For this reason, in this study we propose a multiple-trait and multiple-environment model for count data. The proposed model was developed under the Bayesian paradigm for which we developed a Markov Chain Monte Carlo (MCMC) with noninformative priors. This allows obtaining all required full conditional distributions of the parameters leading to an exact Gibbs sampler for the posterior distribution. Our model was tested with simulated data and a real data set. Results show that the proposed multi-trait, multi-environment model is an attractive alternative for modeling multiple count traits measured in multiple environments. PMID:28364037
MultiNest: Efficient and Robust Bayesian Inference
NASA Astrophysics Data System (ADS)
Feroz, F.; Hobson, M. P.; Bridges, M.
2011-09-01
We present further development and the first public release of our multimodal nested sampling algorithm, called MultiNest. This Bayesian inference tool calculates the evidence, with an associated error estimate, and produces posterior samples from distributions that may contain multiple modes and pronounced (curving) degeneracies in high dimensions. The developments presented here lead to further substantial improvements in sampling efficiency and robustness, as compared to the original algorithm presented in Feroz & Hobson (2008), which itself significantly outperformed existing MCMC techniques in a wide range of astrophysical inference problems. The accuracy and economy of the MultiNest algorithm is demonstrated by application to two toy problems and to a cosmological inference problem focusing on the extension of the vanilla LambdaCDM model to include spatial curvature and a varying equation of state for dark energy. The MultiNest software is fully parallelized using MPI and includes an interface to CosmoMC. It will also be released as part of the SuperBayeS package, for the analysis of supersymmetric theories of particle physics, at this http URL.
Spectral decompositions of multiple time series: a Bayesian non-parametric approach.
Macaro, Christian; Prado, Raquel
2014-01-01
We consider spectral decompositions of multiple time series that arise in studies where the interest lies in assessing the influence of two or more factors. We write the spectral density of each time series as a sum of the spectral densities associated to the different levels of the factors. We then use Whittle's approximation to the likelihood function and follow a Bayesian non-parametric approach to obtain posterior inference on the spectral densities based on Bernstein-Dirichlet prior distributions. The prior is strategically important as it carries identifiability conditions for the models and allows us to quantify our degree of confidence in such conditions. A Markov chain Monte Carlo (MCMC) algorithm for posterior inference within this class of frequency-domain models is presented.We illustrate the approach by analyzing simulated and real data via spectral one-way and two-way models. In particular, we present an analysis of functional magnetic resonance imaging (fMRI) brain responses measured in individuals who participated in a designed experiment to study pain perception in humans.
Mahardika, G N K; Dibia, N; Budayanti, N S; Susilawathi, N M; Subrata, K; Darwinata, A E; Wignall, F S; Richt, J A; Valdivia-Granda, W A; Sudewi, A A R
2014-06-01
The emergence of human and animal rabies in Bali since November 2008 has attracted local, national and international interest. The potential origin and time of introduction of rabies virus to Bali is described. The nucleoprotein (N) gene of rabies virus from dog brain and human clinical specimens was sequenced using an automated DNA sequencer. Phylogenetic inference with Bayesian Markov Chain Monte Carlo (MCMC) analysis using the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) v. 1.7.5 software confirmed that the outbreak of rabies in Bali was caused by an Indonesian lineage virus following a single introduction. The ancestor of Bali viruses was the descendant of a virus from Kalimantan. Contact tracing showed that the event most likely occurred in early 2008. The introduction of rabies into a large unvaccinated dog population in Bali clearly demonstrates the risk of disease transmission for government agencies and should lead to an increased preparedness and efforts for sustained risk reduction to prevent such events from occurring in future.
NASA Astrophysics Data System (ADS)
Zhang, Junlong; Li, Yongping; Huang, Guohe; Chen, Xi; Bao, Anming
2016-07-01
Without a realistic assessment of parameter uncertainty, decision makers may encounter difficulties in accurately describing hydrologic processes and assessing relationships between model parameters and watershed characteristics. In this study, a Markov-Chain-Monte-Carlo-based multilevel-factorial-analysis (MCMC-MFA) method is developed, which can not only generate samples of parameters from a well constructed Markov chain and assess parameter uncertainties with straightforward Bayesian inference, but also investigate the individual and interactive effects of multiple parameters on model output through measuring the specific variations of hydrological responses. A case study is conducted for addressing parameter uncertainties in the Kaidu watershed of northwest China. Effects of multiple parameters and their interactions are quantitatively investigated using the MCMC-MFA with a three-level factorial experiment (totally 81 runs). A variance-based sensitivity analysis method is used to validate the results of parameters' effects. Results disclose that (i) soil conservation service runoff curve number for moisture condition II (CN2) and fraction of snow volume corresponding to 50% snow cover (SNO50COV) are the most significant factors to hydrological responses, implying that infiltration-excess overland flow and snow water equivalent represent important water input to the hydrological system of the Kaidu watershed; (ii) saturate hydraulic conductivity (SOL_K) and soil evaporation compensation factor (ESCO) have obvious effects on hydrological responses; this implies that the processes of percolation and evaporation would impact hydrological process in this watershed; (iii) the interactions of ESCO and SNO50COV as well as CN2 and SNO50COV have an obvious effect, implying that snow cover can impact the generation of runoff on land surface and the extraction of soil evaporative demand in lower soil layers. These findings can help enhance the hydrological model's capability for simulating/predicting water resources.
Real-time characterization of partially observed epidemics using surrogate models.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Safta, Cosmin; Ray, Jaideep; Lefantzi, Sophia
We present a statistical method, predicated on the use of surrogate models, for the 'real-time' characterization of partially observed epidemics. Observations consist of counts of symptomatic patients, diagnosed with the disease, that may be available in the early epoch of an ongoing outbreak. Characterization, in this context, refers to estimation of epidemiological parameters that can be used to provide short-term forecasts of the ongoing epidemic, as well as to provide gross information on the dynamics of the etiologic agent in the affected population e.g., the time-dependent infection rate. The characterization problem is formulated as a Bayesian inverse problem, and epidemiologicalmore » parameters are estimated as distributions using a Markov chain Monte Carlo (MCMC) method, thus quantifying the uncertainty in the estimates. In some cases, the inverse problem can be computationally expensive, primarily due to the epidemic simulator used inside the inversion algorithm. We present a method, based on replacing the epidemiological model with computationally inexpensive surrogates, that can reduce the computational time to minutes, without a significant loss of accuracy. The surrogates are created by projecting the output of an epidemiological model on a set of polynomial chaos bases; thereafter, computations involving the surrogate model reduce to evaluations of a polynomial. We find that the epidemic characterizations obtained with the surrogate models is very close to that obtained with the original model. We also find that the number of projections required to construct a surrogate model is O(10)-O(10{sup 2}) less than the number of samples required by the MCMC to construct a stationary posterior distribution; thus, depending upon the epidemiological models in question, it may be possible to omit the offline creation and caching of surrogate models, prior to their use in an inverse problem. The technique is demonstrated on synthetic data as well as observations from the 1918 influenza pandemic collected at Camp Custer, Michigan.« less
NASA Astrophysics Data System (ADS)
Coelho, Flavio Codeço; Carvalho, Luiz Max De
2015-12-01
Quantifying the attack ratio of disease is key to epidemiological inference and public health planning. For multi-serotype pathogens, however, different levels of serotype-specific immunity make it difficult to assess the population at risk. In this paper we propose a Bayesian method for estimation of the attack ratio of an epidemic and the initial fraction of susceptibles using aggregated incidence data. We derive the probability distribution of the effective reproductive number, Rt, and use MCMC to obtain posterior distributions of the parameters of a single-strain SIR transmission model with time-varying force of infection. Our method is showcased in a data set consisting of 18 years of dengue incidence in the city of Rio de Janeiro, Brazil. We demonstrate that it is possible to learn about the initial fraction of susceptibles and the attack ratio even in the absence of serotype specific data. On the other hand, the information provided by this approach is limited, stressing the need for detailed serological surveys to characterise the distribution of serotype-specific immunity in the population.
Safety performance of traffic phases and phase transitions in three phase traffic theory.
Xu, Chengcheng; Liu, Pan; Wang, Wei; Li, Zhibin
2015-12-01
Crash risk prediction models were developed to link safety to various phases and phase transitions defined by the three phase traffic theory. Results of the Bayesian conditional logit analysis showed that different traffic states differed distinctly with respect to safety performance. The random-parameter logit approach was utilized to account for the heterogeneity caused by unobserved factors. The Bayesian inference approach based on the Markov Chain Monte Carlo (MCMC) method was used for the estimation of the random-parameter logit model. The proposed approach increased the prediction performance of the crash risk models as compared with the conventional logit model. The three phase traffic theory can help us better understand the mechanism of crash occurrences in various traffic states. The contributing factors to crash likelihood can be well explained by the mechanism of phase transitions. We further discovered that the free flow state can be divided into two sub-phases on the basis of safety performance, including a true free flow state in which the interactions between vehicles are minor, and a platooned traffic state in which bunched vehicles travel in successions. The results of this study suggest that a safety perspective can be added to the three phase traffic theory. The results also suggest that the heterogeneity between different traffic states should be considered when estimating the risks of crash occurrences on freeways. Copyright © 2015 Elsevier Ltd. All rights reserved.
Toward a probabilistic acoustic emission source location algorithm: A Bayesian approach
NASA Astrophysics Data System (ADS)
Schumacher, Thomas; Straub, Daniel; Higgins, Christopher
2012-09-01
Acoustic emissions (AE) are stress waves initiated by sudden strain releases within a solid body. These can be caused by internal mechanisms such as crack opening or propagation, crushing, or rubbing of crack surfaces. One application for the AE technique in the field of Structural Engineering is Structural Health Monitoring (SHM). With piezo-electric sensors mounted to the surface of the structure, stress waves can be detected, recorded, and stored for later analysis. An important step in quantitative AE analysis is the estimation of the stress wave source locations. Commonly, source location results are presented in a rather deterministic manner as spatial and temporal points, excluding information about uncertainties and errors. Due to variability in the material properties and uncertainty in the mathematical model, measures of uncertainty are needed beyond best-fit point solutions for source locations. This paper introduces a novel holistic framework for the development of a probabilistic source location algorithm. Bayesian analysis methods with Markov Chain Monte Carlo (MCMC) simulation are employed where all source location parameters are described with posterior probability density functions (PDFs). The proposed methodology is applied to an example employing data collected from a realistic section of a reinforced concrete bridge column. The selected approach is general and has the advantage that it can be extended and refined efficiently. Results are discussed and future steps to improve the algorithm are suggested.
NASA Astrophysics Data System (ADS)
Yozgatligil, Ceylan; Aslan, Sipan; Iyigun, Cem; Batmaz, Inci
2013-04-01
This study aims to compare several imputation methods to complete the missing values of spatio-temporal meteorological time series. To this end, six imputation methods are assessed with respect to various criteria including accuracy, robustness, precision, and efficiency for artificially created missing data in monthly total precipitation and mean temperature series obtained from the Turkish State Meteorological Service. Of these methods, simple arithmetic average, normal ratio (NR), and NR weighted with correlations comprise the simple ones, whereas multilayer perceptron type neural network and multiple imputation strategy adopted by Monte Carlo Markov Chain based on expectation-maximization (EM-MCMC) are computationally intensive ones. In addition, we propose a modification on the EM-MCMC method. Besides using a conventional accuracy measure based on squared errors, we also suggest the correlation dimension (CD) technique of nonlinear dynamic time series analysis which takes spatio-temporal dependencies into account for evaluating imputation performances. Depending on the detailed graphical and quantitative analysis, it can be said that although computational methods, particularly EM-MCMC method, are computationally inefficient, they seem favorable for imputation of meteorological time series with respect to different missingness periods considering both measures and both series studied. To conclude, using the EM-MCMC algorithm for imputing missing values before conducting any statistical analyses of meteorological data will definitely decrease the amount of uncertainty and give more robust results. Moreover, the CD measure can be suggested for the performance evaluation of missing data imputation particularly with computational methods since it gives more precise results in meteorological time series.
Markov switching multinomial logit model: An application to accident-injury severities.
Malyshkina, Nataliya V; Mannering, Fred L
2009-07-01
In this study, two-state Markov switching multinomial logit models are proposed for statistical modeling of accident-injury severities. These models assume Markov switching over time between two unobserved states of roadway safety as a means of accounting for potential unobserved heterogeneity. The states are distinct in the sense that in different states accident-severity outcomes are generated by separate multinomial logit processes. To demonstrate the applicability of the approach, two-state Markov switching multinomial logit models are estimated for severity outcomes of accidents occurring on Indiana roads over a four-year time period. Bayesian inference methods and Markov Chain Monte Carlo (MCMC) simulations are used for model estimation. The estimated Markov switching models result in a superior statistical fit relative to the standard (single-state) multinomial logit models for a number of roadway classes and accident types. It is found that the more frequent state of roadway safety is correlated with better weather conditions and that the less frequent state is correlated with adverse weather conditions.
A Kolmogorov-Smirnov test for the molecular clock based on Bayesian ensembles of phylogenies
Antoneli, Fernando; Passos, Fernando M.; Lopes, Luciano R.
2018-01-01
Divergence date estimates are central to understand evolutionary processes and depend, in the case of molecular phylogenies, on tests of molecular clocks. Here we propose two non-parametric tests of strict and relaxed molecular clocks built upon a framework that uses the empirical cumulative distribution (ECD) of branch lengths obtained from an ensemble of Bayesian trees and well known non-parametric (one-sample and two-sample) Kolmogorov-Smirnov (KS) goodness-of-fit test. In the strict clock case, the method consists in using the one-sample Kolmogorov-Smirnov (KS) test to directly test if the phylogeny is clock-like, in other words, if it follows a Poisson law. The ECD is computed from the discretized branch lengths and the parameter λ of the expected Poisson distribution is calculated as the average branch length over the ensemble of trees. To compensate for the auto-correlation in the ensemble of trees and pseudo-replication we take advantage of thinning and effective sample size, two features provided by Bayesian inference MCMC samplers. Finally, it is observed that tree topologies with very long or very short branches lead to Poisson mixtures and in this case we propose the use of the two-sample KS test with samples from two continuous branch length distributions, one obtained from an ensemble of clock-constrained trees and the other from an ensemble of unconstrained trees. Moreover, in this second form the test can also be applied to test for relaxed clock models. The use of a statistically equivalent ensemble of phylogenies to obtain the branch lengths ECD, instead of one consensus tree, yields considerable reduction of the effects of small sample size and provides a gain of power. PMID:29300759
Rodrigues, Josemar; Cancho, Vicente G; de Castro, Mário; Balakrishnan, N
2012-12-01
In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic-model of radiation carcinogenesis--latent time distributions and their properties. Math Biosci 1993; 113: 51-75], and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de São Carlos, São Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it possesses an interesting and realistic interpretation of the biological mechanism of the occurrence of the event of interest as it includes a destructive process of tumour cells after an initial treatment or the capacity of an individual exposed to irradiation to repair altered cells that results in cancer induction. In other words, what is recorded is only the damaged portion of the original number of altered cells not eliminated by the treatment or repaired by the repair system of an individual. Markov Chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. Also, some discussions on the model selection and an illustration with a cutaneous melanoma data set analysed by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de São Carlos, São Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)] are presented.
Höhna, Sebastian; Landis, Michael J.
2016-01-01
Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com. [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.] PMID:27235697
Höhna, Sebastian; Landis, Michael J; Heath, Tracy A; Boussau, Bastien; Lartillot, Nicolas; Moore, Brian R; Huelsenbeck, John P; Ronquist, Fredrik
2016-07-01
Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
ERIC Educational Resources Information Center
Wollack, James A.; Bolt, Daniel M.; Cohen, Allan S.; Lee, Young-Sun
2002-01-01
Compared the quality of item parameter estimates for marginal maximum likelihood (MML) and Markov Chain Monte Carlo (MCMC) with the nominal response model using simulation. The quality of item parameter recovery was nearly identical for MML and MCMC, and both methods tended to produce good estimates. (SLD)
Enhancing Data Assimilation by Evolutionary Particle Filter and Markov Chain Monte Carlo
NASA Astrophysics Data System (ADS)
Moradkhani, H.; Abbaszadeh, P.; Yan, H.
2016-12-01
Particle Filters (PFs) have received increasing attention by the researchers from different disciplines in hydro-geosciences as an effective method to improve model predictions in nonlinear and non-Gaussian dynamical systems. The implication of dual state and parameter estimation by means of data assimilation in hydrology and geoscience has evolved since 2005 from SIR-PF to PF-MCMC and now to the most effective and robust framework through evolutionary PF approach based on Genetic Algorithm (GA) and Markov Chain Monte Carlo (MCMC), the so-called EPF-MCMC. In this framework, the posterior distribution undergoes an evolutionary process to update an ensemble of prior states that more closely resemble realistic posterior probability distribution. The premise of this approach is that the particles move to optimal position using the GA optimization coupled with MCMC increasing the number of effective particles, hence the particle degeneracy is avoided while the particle diversity is improved. The proposed algorithm is applied on a conceptual and highly nonlinear hydrologic model and the effectiveness, robustness and reliability of the method in jointly estimating the states and parameters and also reducing the uncertainty is demonstrated for few river basins across the United States.
A Bayesian Alternative for Multi-objective Ecohydrological Model Specification
NASA Astrophysics Data System (ADS)
Tang, Y.; Marshall, L. A.; Sharma, A.; Ajami, H.
2015-12-01
Process-based ecohydrological models combine the study of hydrological, physical, biogeochemical and ecological processes of the catchments, which are usually more complex and parametric than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying the uncertainties in hydrological modeling with the development of Markov Chain Monte Carlo (MCMC) techniques. Our study aims to develop appropriate prior distributions and likelihood functions that minimize the model uncertainties and bias within a Bayesian ecohydrological framework. In our study, a formal Bayesian approach is implemented in an ecohydrological model which combines a hydrological model (HyMOD) and a dynamic vegetation model (DVM). Simulations focused on one objective likelihood (Streamflow/LAI) and multi-objective likelihoods (Streamflow and LAI) with different weights are compared. Uniform, weakly informative and strongly informative prior distributions are used in different simulations. The Kullback-leibler divergence (KLD) is used to measure the dis(similarity) between different priors and corresponding posterior distributions to examine the parameter sensitivity. Results show that different prior distributions can strongly influence posterior distributions for parameters, especially when the available data is limited or parameters are insensitive to the available data. We demonstrate differences in optimized parameters and uncertainty limits in different cases based on multi-objective likelihoods vs. single objective likelihoods. We also demonstrate the importance of appropriately defining the weights of objectives in multi-objective calibration according to different data types.
Modelling past land use using archaeological and pollen data
NASA Astrophysics Data System (ADS)
Pirzamanbein, Behnaz; Lindström, johan; Poska, Anneli; Gaillard-Lemdahl, Marie-José
2016-04-01
Accurate maps of past land use are necessary for studying the impact of anthropogenic land-cover changes on climate and biodiversity. We develop a Bayesian hierarchical model to reconstruct the land use using Gaussian Markov random fields. The model uses two observations sets: 1) archaeological data, representing human settlements, urbanization and agricultural findings; and 2) pollen-based land estimates of the three land-cover types Coniferous forest, Broadleaved forest and Unforested/Open land. The pollen based estimates are obtained from the REVEALS model, based on pollen counts from lakes and bogs. Our developed model uses the sparse pollen-based estimations to reconstruct the spatial continuous cover of three land cover types. Using the open-land component and the archaeological data, the extent of land-use is reconstructed. The model is applied on three time periods - centred around 1900 CE, 1000 and, 4000 BCE over Sweden for which both pollen-based estimates and archaeological data are available. To estimate the model parameters and land use, a block updated Markov chain Monte Carlo (MCMC) algorithm is applied. Using the MCMC posterior samples uncertainties in land-use predictions are computed. Due to lack of good historic land use data, model results are evaluated by cross-validation. Keywords. Spatial reconstruction, Gaussian Markov random field, Fossil pollen records, Archaeological data, Human land-use, Prediction uncertainty
Bayesian resolution of TEM, CSEM and MT soundings: a comparative study
NASA Astrophysics Data System (ADS)
Blatter, D. B.; Ray, A.; Key, K.
2017-12-01
We examine the resolution of three electromagnetic exploration methods commonly used to map the electrical conductivity of the shallow crust - the magnetotelluric (MT) method, the controlled-source electromagnetic (CSEM) method and the transient electromagnetic (TEM) method. TEM and CSEM utilize an artificial source of EM energy, while MT makes use of natural variations in the Earth's electromagnetic field. For a given geological setting and acquisition parameters, each of these methods will have a different resolution due to differences in the source field polarization and the frequency range of the measurements. For example, the MT and TEM methods primarily rely on induced horizontal currents and are most sensitive to conductive layers while the CSEM method generates vertical loops of current and is more sensitive to resistive features. Our study seeks to provide a robust resolution comparison that can help inform exploration geophysicists about which technique is best suited for a particular target. While it is possible to understand and describe a difference in resolution qualitatively, it remains challenging to fully describe it quantitatively using optimization based approaches. Part of the difficulty here stems from the standard electromagnetic inversion toolkit, which makes heavy use of regularization (often in the form of smoothing) to constrain the non-uniqueness inherent in the inverse problem. This regularization makes it difficult to accurately estimate the uncertainty in estimated model parameters - and therefore obscures their true resolution. To overcome this difficulty, we compare the resolution of CSEM, airborne TEM, and MT data quantitatively using a Bayesian trans-dimensional Markov chain Monte Carlo (McMC) inversion scheme. Noisy synthetic data for this study are computed from various representative 1D test models: a conductive anomaly under a conductive/resistive overburden; and a resistive anomaly under a conductive/resistive overburden. In addition to obtaining the full posterior probability density function of the model parameters, we develop a metric to more directly compare the resolution of each method as a function of depth.
Neokosmidis, Ioannis; Kamalakis, Thomas; Chipouras, Aristides; Sphicopoulos, Thomas
2005-01-01
The performance of high-powered wavelength-division multiplexed (WDM) optical networks can be severely degraded by four-wave-mixing- (FWM-) induced distortion. The multicanonical Monte Carlo method (MCMC) is used to calculate the probability-density function (PDF) of the decision variable of a receiver, limited by FWM noise. Compared with the conventional Monte Carlo method previously used to estimate this PDF, the MCMC method is much faster and can accurately estimate smaller error probabilities. The method takes into account the correlation between the components of the FWM noise, unlike the Gaussian model, which is shown not to provide accurate results.
Application of Bayesian Inversion for Multilayer Reservoir Mapping while Drilling Measurements
NASA Astrophysics Data System (ADS)
Wang, J.; Chen, H.; Wang, X.
2017-12-01
Real-time geosteering technology plays a key role in horizontal well development, which keeps the wellbore trajectories within target zones to maximize reservoir contact. The new generation logging while drilling (LWD) resistivity tools have longer spacing and deeper investigation depth, but meanwhile bring a new challenge to inversion of logging data that is formation model not be restricted to few possible numbers of layer such as typical three layers model. If the inappropriate starting models of deterministic and gradient-based methods are adopted may mislead geophysicists in interpretation of subsurface structure. For this purpose, to take advantage of richness of the measurements and deep depth of investigation across multiple formation boundaries, a trans-dimensional Markov chain Monte Carlo(MCMC) inversion algorithm has been developed that combines phase and attenuation measurements at various frequencies and spacings. Unlike conventional gradient-based inversion approaches, MCMC algorithm does not introduce bias from prior information and require any subjective choice of regularization parameter. A synthetic three layers model example demonstrates how the algorithm can be used to image the subsurface using the LWD data. When the tool is far from top boundary, the inversion clearly resolves the boundary position; that is where the boundary histogram shows a large peak. But the measurements cannot resolve the bottom boundary; the large spread between quantiles reflects the uncertainty associated with the bed resolution. As the tool moves closer to the top boundary, the middle layer and bottom layer are resolved and retained models are more similar, the uncertainty associated with these two beds decreases. From the spread observed between models, we can evaluate actual depth of investigation, uncertainty, and sensitivity, which is more useful then just a single best model.
NASA Astrophysics Data System (ADS)
Eilert, Tobias; Beckers, Maximilian; Drechsler, Florian; Michaelis, Jens
2017-10-01
The analysis tool and software package Fast-NPS can be used to analyse smFRET data to obtain quantitative structural information about macromolecules in their natural environment. In the algorithm a Bayesian model gives rise to a multivariate probability distribution describing the uncertainty of the structure determination. Since Fast-NPS aims to be an easy-to-use general-purpose analysis tool for a large variety of smFRET networks, we established an MCMC based sampling engine that approximates the target distribution and requires no parameter specification by the user at all. For an efficient local exploration we automatically adapt the multivariate proposal kernel according to the shape of the target distribution. In order to handle multimodality, the sampler is equipped with a parallel tempering scheme that is fully adaptive with respect to temperature spacing and number of chains. Since the molecular surrounding of a dye molecule affects its spatial mobility and thus the smFRET efficiency, we introduce dye models which can be selected for every dye molecule individually. These models allow the user to represent the smFRET network in great detail leading to an increased localisation precision. Finally, a tool to validate the chosen model combination is provided. Programme Files doi:http://dx.doi.org/10.17632/7ztzj63r68.1 Licencing provisions: Apache-2.0 Programming language: GUI in MATLAB (The MathWorks) and the core sampling engine in C++ Nature of problem: Sampling of highly diverse multivariate probability distributions in order to solve for macromolecular structures from smFRET data. Solution method: MCMC algorithm with fully adaptive proposal kernel and parallel tempering scheme.
Parameter-expanded data augmentation for Bayesian analysis of capture-recapture models
Royle, J. Andrew; Dorazio, Robert M.
2012-01-01
Data augmentation (DA) is a flexible tool for analyzing closed and open population models of capture-recapture data, especially models which include sources of hetereogeneity among individuals. The essential concept underlying DA, as we use the term, is based on adding "observations" to create a dataset composed of a known number of individuals. This new (augmented) dataset, which includes the unknown number of individuals N in the population, is then analyzed using a new model that includes a reformulation of the parameter N in the conventional model of the observed (unaugmented) data. In the context of capture-recapture models, we add a set of "all zero" encounter histories which are not, in practice, observable. The model of the augmented dataset is a zero-inflated version of either a binomial or a multinomial base model. Thus, our use of DA provides a general approach for analyzing both closed and open population models of all types. In doing so, this approach provides a unified framework for the analysis of a huge range of models that are treated as unrelated "black boxes" and named procedures in the classical literature. As a practical matter, analysis of the augmented dataset by MCMC is greatly simplified compared to other methods that require specialized algorithms. For example, complex capture-recapture models of an augmented dataset can be fitted with popular MCMC software packages (WinBUGS or JAGS) by providing a concise statement of the model's assumptions that usually involves only a few lines of pseudocode. In this paper, we review the basic technical concepts of data augmentation, and we provide examples of analyses of closed-population models (M 0, M h , distance sampling, and spatial capture-recapture models) and open-population models (Jolly-Seber) with individual effects.
NASA Astrophysics Data System (ADS)
Foreman-Mackey, Daniel; Hogg, David W.; Lang, Dustin; Goodman, Jonathan
2013-03-01
We introduce a stable, well tested Python implementation of the affine-invariant ensemble sampler for Markov chain Monte Carlo (MCMC) proposed by Goodman & Weare (2010). The code is open source and has already been used in several published projects in the astrophysics literature. The algorithm behind emcee has several advantages over traditional MCMC sampling methods and it has excellent performance as measured by the autocorrelation time (or function calls per independent sample). One major advantage of the algorithm is that it requires hand-tuning of only 1 or 2 parameters compared to ˜N2 for a traditional algorithm in an N-dimensional parameter space. In this document, we describe the algorithm and the details of our implementation. Exploiting the parallelism of the ensemble method, emcee permits any user to take advantage of multiple CPU cores without extra effort. The code is available online at http://dan.iel.fm/emcee under the GNU General Public License v2.
NASA Astrophysics Data System (ADS)
Yang, P.; Ng, T. L.; Yang, W.
2015-12-01
Effective water resources management depends on the reliable estimation of the uncertainty of drought events. Confidence intervals (CIs) are commonly applied to quantify this uncertainty. A CI seeks to be at the minimal length necessary to cover the true value of the estimated variable with the desired probability. In drought analysis where two or more variables (e.g., duration and severity) are often used to describe a drought, copulas have been found suitable for representing the joint probability behavior of these variables. However, the comprehensive assessment of the parameter uncertainties of copulas of droughts has been largely ignored, and the few studies that have recognized this issue have not explicitly compared the various methods to produce the best CIs. Thus, the objective of this study to compare the CIs generated using two widely applied uncertainty estimation methods, bootstrapping and Markov Chain Monte Carlo (MCMC). To achieve this objective, (1) the marginal distributions lognormal, Gamma, and Generalized Extreme Value, and the copula functions Clayton, Frank, and Plackett are selected to construct joint probability functions of two drought related variables. (2) The resulting joint functions are then fitted to 200 sets of simulated realizations of drought events with known distribution and extreme parameters and (3) from there, using bootstrapping and MCMC, CIs of the parameters are generated and compared. The effect of an informative prior on the CIs generated by MCMC is also evaluated. CIs are produced for different sample sizes (50, 100, and 200) of the simulated drought events for fitting the joint probability functions. Preliminary results assuming lognormal marginal distributions and the Clayton copula function suggest that for cases with small or medium sample sizes (~50-100), MCMC to be superior method if an informative prior exists. Where an informative prior is unavailable, for small sample sizes (~50), both bootstrapping and MCMC yield the same level of performance, and for medium sample sizes (~100), bootstrapping is better. For cases with a large sample size (~200), there is little difference between the CIs generated using bootstrapping and MCMC regardless of whether or not an informative prior exists.
Browne, William J; Steele, Fiona; Golalizadeh, Mousa; Green, Martin J
2009-06-01
We consider the application of Markov chain Monte Carlo (MCMC) estimation methods to random-effects models and in particular the family of discrete time survival models. Survival models can be used in many situations in the medical and social sciences and we illustrate their use through two examples that differ in terms of both substantive area and data structure. A multilevel discrete time survival analysis involves expanding the data set so that the model can be cast as a standard multilevel binary response model. For such models it has been shown that MCMC methods have advantages in terms of reducing estimate bias. However, the data expansion results in very large data sets for which MCMC estimation is often slow and can produce chains that exhibit poor mixing. Any way of improving the mixing will result in both speeding up the methods and more confidence in the estimates that are produced. The MCMC methodological literature is full of alternative algorithms designed to improve mixing of chains and we describe three reparameterization techniques that are easy to implement in available software. We consider two examples of multilevel survival analysis: incidence of mastitis in dairy cattle and contraceptive use dynamics in Indonesia. For each application we show where the reparameterization techniques can be used and assess their performance.
NASA Astrophysics Data System (ADS)
Hernández, Mario R.; Francés, Félix
2015-04-01
One phase of the hydrological models implementation process, significantly contributing to the hydrological predictions uncertainty, is the calibration phase in which values of the unknown model parameters are tuned by optimizing an objective function. An unsuitable error model (e.g. Standard Least Squares or SLS) introduces noise into the estimation of the parameters. The main sources of this noise are the input errors and the hydrological model structural deficiencies. Thus, the biased calibrated parameters cause the divergence model phenomenon, where the errors variance of the (spatially and temporally) forecasted flows far exceeds the errors variance in the fitting period, and provoke the loss of part or all of the physical meaning of the modeled processes. In other words, yielding a calibrated hydrological model which works well, but not for the right reasons. Besides, an unsuitable error model yields a non-reliable predictive uncertainty assessment. Hence, with the aim of prevent all these undesirable effects, this research focuses on the Bayesian joint inference (BJI) of both the hydrological and error model parameters, considering a general additive (GA) error model that allows for correlation, non-stationarity (in variance and bias) and non-normality of model residuals. As hydrological model, it has been used a conceptual distributed model called TETIS, with a particular split structure of the effective model parameters. Bayesian inference has been performed with the aid of a Markov Chain Monte Carlo (MCMC) algorithm called Dream-ZS. MCMC algorithm quantifies the uncertainty of the hydrological and error model parameters by getting the joint posterior probability distribution, conditioned on the observed flows. The BJI methodology is a very powerful and reliable tool, but it must be used correctly this is, if non-stationarity in errors variance and bias is modeled, the Total Laws must be taken into account. The results of this research show that the application of BJI with a GA error model outperforms the hydrological parameters robustness (diminishing the divergence model phenomenon) and improves the reliability of the streamflow predictive distribution, in respect of the results of a bad error model as SLS. Finally, the most likely prediction in a validation period, for both BJI+GA and SLS error models shows a similar performance.
NASA Astrophysics Data System (ADS)
Schoups, G.; Vrugt, J. A.; Fenicia, F.; van de Giesen, N. C.
2010-10-01
Conceptual rainfall-runoff models have traditionally been applied without paying much attention to numerical errors induced by temporal integration of water balance dynamics. Reliance on first-order, explicit, fixed-step integration methods leads to computationally cheap simulation models that are easy to implement. Computational speed is especially desirable for estimating parameter and predictive uncertainty using Markov chain Monte Carlo (MCMC) methods. Confirming earlier work of Kavetski et al. (2003), we show here that the computational speed of first-order, explicit, fixed-step integration methods comes at a cost: for a case study with a spatially lumped conceptual rainfall-runoff model, it introduces artificial bimodality in the marginal posterior parameter distributions, which is not present in numerically accurate implementations of the same model. The resulting effects on MCMC simulation include (1) inconsistent estimates of posterior parameter and predictive distributions, (2) poor performance and slow convergence of the MCMC algorithm, and (3) unreliable convergence diagnosis using the Gelman-Rubin statistic. We studied several alternative numerical implementations to remedy these problems, including various adaptive-step finite difference schemes and an operator splitting method. Our results show that adaptive-step, second-order methods, based on either explicit finite differencing or operator splitting with analytical integration, provide the best alternative for accurate and efficient MCMC simulation. Fixed-step or adaptive-step implicit methods may also be used for increased accuracy, but they cannot match the efficiency of adaptive-step explicit finite differencing or operator splitting. Of the latter two, explicit finite differencing is more generally applicable and is preferred if the individual hydrologic flux laws cannot be integrated analytically, as the splitting method then loses its advantage.
Tom, Jennifer A; Sinsheimer, Janet S; Suchard, Marc A
Massive datasets in the gigabyte and terabyte range combined with the availability of increasingly sophisticated statistical tools yield analyses at the boundary of what is computationally feasible. Compromising in the face of this computational burden by partitioning the dataset into more tractable sizes results in stratified analyses, removed from the context that justified the initial data collection. In a Bayesian framework, these stratified analyses generate intermediate realizations, often compared using point estimates that fail to account for the variability within and correlation between the distributions these realizations approximate. However, although the initial concession to stratify generally precludes the more sensible analysis using a single joint hierarchical model, we can circumvent this outcome and capitalize on the intermediate realizations by extending the dynamic iterative reweighting MCMC algorithm. In doing so, we reuse the available realizations by reweighting them with importance weights, recycling them into a now tractable joint hierarchical model. We apply this technique to intermediate realizations generated from stratified analyses of 687 influenza A genomes spanning 13 years allowing us to revisit hypotheses regarding the evolutionary history of influenza within a hierarchical statistical framework.
Tom, Jennifer A.; Sinsheimer, Janet S.; Suchard, Marc A.
2015-01-01
Massive datasets in the gigabyte and terabyte range combined with the availability of increasingly sophisticated statistical tools yield analyses at the boundary of what is computationally feasible. Compromising in the face of this computational burden by partitioning the dataset into more tractable sizes results in stratified analyses, removed from the context that justified the initial data collection. In a Bayesian framework, these stratified analyses generate intermediate realizations, often compared using point estimates that fail to account for the variability within and correlation between the distributions these realizations approximate. However, although the initial concession to stratify generally precludes the more sensible analysis using a single joint hierarchical model, we can circumvent this outcome and capitalize on the intermediate realizations by extending the dynamic iterative reweighting MCMC algorithm. In doing so, we reuse the available realizations by reweighting them with importance weights, recycling them into a now tractable joint hierarchical model. We apply this technique to intermediate realizations generated from stratified analyses of 687 influenza A genomes spanning 13 years allowing us to revisit hypotheses regarding the evolutionary history of influenza within a hierarchical statistical framework. PMID:26681992
Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Toledo, Fernando H; Montesinos-López, José C; Singh, Pawan; Juliana, Philomin; Salinas-Ruiz, Josafhat
2017-05-05
When a plant scientist wishes to make genomic-enabled predictions of multiple traits measured in multiple individuals in multiple environments, the most common strategy for performing the analysis is to use a single trait at a time taking into account genotype × environment interaction (G × E), because there is a lack of comprehensive models that simultaneously take into account the correlated counting traits and G × E. For this reason, in this study we propose a multiple-trait and multiple-environment model for count data. The proposed model was developed under the Bayesian paradigm for which we developed a Markov Chain Monte Carlo (MCMC) with noninformative priors. This allows obtaining all required full conditional distributions of the parameters leading to an exact Gibbs sampler for the posterior distribution. Our model was tested with simulated data and a real data set. Results show that the proposed multi-trait, multi-environment model is an attractive alternative for modeling multiple count traits measured in multiple environments. Copyright © 2017 Montesinos-López et al.
Bayesian Total-Evidence Dating Reveals the Recent Crown Radiation of Penguins
Heath, Tracy A.; Ksepka, Daniel T.; Stadler, Tanja; Welch, David; Drummond, Alexei J.
2017-01-01
The total-evidence approach to divergence time dating uses molecular and morphological data from extant and fossil species to infer phylogenetic relationships, species divergence times, and macroevolutionary parameters in a single coherent framework. Current model-based implementations of this approach lack an appropriate model for the tree describing the diversification and fossilization process and can produce estimates that lead to erroneous conclusions. We address this shortcoming by providing a total-evidence method implemented in a Bayesian framework. This approach uses a mechanistic tree prior to describe the underlying diversification process that generated the tree of extant and fossil taxa. Previous attempts to apply the total-evidence approach have used tree priors that do not account for the possibility that fossil samples may be direct ancestors of other samples, that is, ancestors of fossil or extant species or of clades. The fossilized birth–death (FBD) process explicitly models the diversification, fossilization, and sampling processes and naturally allows for sampled ancestors. This model was recently applied to estimate divergence times based on molecular data and fossil occurrence dates. We incorporate the FBD model and a model of morphological trait evolution into a Bayesian total-evidence approach to dating species phylogenies. We apply this method to extant and fossil penguins and show that the modern penguins radiated much more recently than has been previously estimated, with the basal divergence in the crown clade occurring at \\documentclass[12pt]{minimal} \\usepackage{amsmath} \\usepackage{wasysym} \\usepackage{amsfonts} \\usepackage{amssymb} \\usepackage{amsbsy} \\usepackage{upgreek} \\usepackage{mathrsfs} \\setlength{\\oddsidemargin}{-69pt} \\begin{document} }{}${\\sim}12.7$\\end{document} Ma and most splits leading to extant species occurring in the last 2 myr. Our results demonstrate that including stem-fossil diversity can greatly improve the estimates of the divergence times of crown taxa. The method is available in BEAST2 (version 2.4) software www.beast2.org with packages SA (version at least 1.1.4) and morph-models (version at least 1.0.4) installed. [Birth–death process; calibration; divergence times; MCMC; phylogenetics.] PMID:28173531
Appraisal of jump distributions in ensemble-based sampling algorithms
NASA Astrophysics Data System (ADS)
Dejanic, Sanda; Scheidegger, Andreas; Rieckermann, Jörg; Albert, Carlo
2017-04-01
Sampling Bayesian posteriors of model parameters is often required for making model-based probabilistic predictions. For complex environmental models, standard Monte Carlo Markov Chain (MCMC) methods are often infeasible because they require too many sequential model runs. Therefore, we focused on ensemble methods that use many Markov chains in parallel, since they can be run on modern cluster architectures. Little is known about how to choose the best performing sampler, for a given application. A poor choice can lead to an inappropriate representation of posterior knowledge. We assessed two different jump moves, the stretch and the differential evolution move, underlying, respectively, the software packages EMCEE and DREAM, which are popular in different scientific communities. For the assessment, we used analytical posteriors with features as they often occur in real posteriors, namely high dimensionality, strong non-linear correlations or multimodality. For posteriors with non-linear features, standard convergence diagnostics based on sample means can be insufficient. Therefore, we resorted to an entropy-based convergence measure. We assessed the samplers by means of their convergence speed, robustness and effective sample sizes. For posteriors with strongly non-linear features, we found that the stretch move outperforms the differential evolution move, w.r.t. all three aspects.
Estimating diversifying selection and functional constraint in the presence of recombination.
Wilson, Daniel J; McVean, Gilean
2006-03-01
Models of molecular evolution that incorporate the ratio of nonsynonymous to synonymous polymorphism (dN/dS ratio) as a parameter can be used to identify sites that are under diversifying selection or functional constraint in a sample of gene sequences. However, when there has been recombination in the evolutionary history of the sequences, reconstructing a single phylogenetic tree is not appropriate, and inference based on a single tree can give misleading results. In the presence of high levels of recombination, the identification of sites experiencing diversifying selection can suffer from a false-positive rate as high as 90%. We present a model that uses a population genetics approximation to the coalescent with recombination and use reversible-jump MCMC to perform Bayesian inference on both the dN/dS ratio and the recombination rate, allowing each to vary along the sequence. We demonstrate that the method has the power to detect variation in the dN/dS ratio and the recombination rate and does not suffer from a high false-positive rate. We use the method to analyze the porB gene of Neisseria meningitidis and verify the inferences using prior sensitivity analysis and model criticism techniques.
Estimating Diversifying Selection and Functional Constraint in the Presence of Recombination
Wilson, Daniel J.; McVean, Gilean
2006-01-01
Models of molecular evolution that incorporate the ratio of nonsynonymous to synonymous polymorphism (dN/dS ratio) as a parameter can be used to identify sites that are under diversifying selection or functional constraint in a sample of gene sequences. However, when there has been recombination in the evolutionary history of the sequences, reconstructing a single phylogenetic tree is not appropriate, and inference based on a single tree can give misleading results. In the presence of high levels of recombination, the identification of sites experiencing diversifying selection can suffer from a false-positive rate as high as 90%. We present a model that uses a population genetics approximation to the coalescent with recombination and use reversible-jump MCMC to perform Bayesian inference on both the dN/dS ratio and the recombination rate, allowing each to vary along the sequence. We demonstrate that the method has the power to detect variation in the dN/dS ratio and the recombination rate and does not suffer from a high false-positive rate. We use the method to analyze the porB gene of Neisseria meningitidis and verify the inferences using prior sensitivity analysis and model criticism techniques. PMID:16387887
NASA Astrophysics Data System (ADS)
Eric, L.; Vrugt, J. A.
2010-12-01
Spatially distributed hydrologic models potentially contain hundreds of parameters that need to be derived by calibration against a historical record of input-output data. The quality of this calibration strongly determines the predictive capability of the model and thus its usefulness for science-based decision making and forecasting. Unfortunately, high-dimensional optimization problems are typically difficult to solve. Here we present our recent developments to the Differential Evolution Adaptive Metropolis (DREAM) algorithm (Vrugt et al., 2009) to warrant efficient solution of high-dimensional parameter estimation problems. The algorithm samples from an archive of past states (Ter Braak and Vrugt, 2008), and uses multiple-try Metropolis sampling (Liu et al., 2000) to decrease the required burn-in time for each individual chain and increase efficiency of posterior sampling. This approach is hereafter referred to as MT-DREAM. We present results for 2 synthetic mathematical case studies, and 2 real-world examples involving from 10 to 240 parameters. Results for those cases show that our multiple-try sampler, MT-DREAM, can consistently find better solutions than other Bayesian MCMC methods. Moreover, MT-DREAM is admirably suited to be implemented and ran on a parallel machine and is therefore a powerful method for posterior inference.
Approximate Bayesian computation in large-scale structure: constraining the galaxy-halo connection
NASA Astrophysics Data System (ADS)
Hahn, ChangHoon; Vakili, Mohammadjavad; Walsh, Kilian; Hearin, Andrew P.; Hogg, David W.; Campbell, Duncan
2017-08-01
Standard approaches to Bayesian parameter inference in large-scale structure assume a Gaussian functional form (chi-squared form) for the likelihood. This assumption, in detail, cannot be correct. Likelihood free inferences such as approximate Bayesian computation (ABC) relax these restrictions and make inference possible without making any assumptions on the likelihood. Instead ABC relies on a forward generative model of the data and a metric for measuring the distance between the model and data. In this work, we demonstrate that ABC is feasible for LSS parameter inference by using it to constrain parameters of the halo occupation distribution (HOD) model for populating dark matter haloes with galaxies. Using specific implementation of ABC supplemented with population Monte Carlo importance sampling, a generative forward model using HOD and a distance metric based on galaxy number density, two-point correlation function and galaxy group multiplicity function, we constrain the HOD parameters of mock observation generated from selected 'true' HOD parameters. The parameter constraints we obtain from ABC are consistent with the 'true' HOD parameters, demonstrating that ABC can be reliably used for parameter inference in LSS. Furthermore, we compare our ABC constraints to constraints we obtain using a pseudo-likelihood function of Gaussian form with MCMC and find consistent HOD parameter constraints. Ultimately, our results suggest that ABC can and should be applied in parameter inference for LSS analyses.
Nonstationary Extreme Value Analysis in a Changing Climate: A Software Package
NASA Astrophysics Data System (ADS)
Cheng, L.; AghaKouchak, A.; Gilleland, E.
2013-12-01
Numerous studies show that climatic extremes have increased substantially in the second half of the 20th century. For this reason, analysis of extremes under a nonstationary assumption has received a great deal of attention. This paper presents a software package developed for estimation of return levels, return periods, and risks of climatic extremes in a changing climate. This MATLAB software package offers tools for analysis of climate extremes under both stationary and non-stationary assumptions. The Nonstationary Extreme Value Analysis (hereafter, NEVA) provides an efficient and generalized framework for analyzing extremes using Bayesian inference. NEVA estimates the extreme value parameters using a Differential Evolution Markov Chain (DE-MC) which utilizes the genetic algorithm Differential Evolution (DE) for global optimization over the real parameter space with the Markov Chain Monte Carlo (MCMC) approach and has the advantage of simplicity, speed of calculation and convergence over conventional MCMC. NEVA also offers the confidence interval and uncertainty bounds of estimated return levels based on the sampled parameters. NEVA integrates extreme value design concepts, data analysis tools, optimization and visualization, explicitly designed to facilitate analysis extremes in geosciences. The generalized input and output files of this software package make it attractive for users from across different fields. Both stationary and nonstationary components of the package are validated for a number of case studies using empirical return levels. The results show that NEVA reliably describes extremes and their return levels.
Quenching histories of galaxies and the role of AGN feedback
NASA Astrophysics Data System (ADS)
Smethurst, Rebecca Jane; Lintott, Chris; Simmons, Brooke; Galaxy Zoo Team
2016-01-01
Two open issues in modern astrophysics are: (i) how do galaxies fully quench their star formation and (ii) how is this affected - or not - by AGN feedback? I present the results of a new Bayesian-MCMC analysis of the star formation histories of over 126,000 galaxies across the colour magnitude diagram showing that diverse quenching mechanisms are instrumental in the formation of the present day red sequence. Using classifications from Galaxy Zoo we show that the rate at which quenching can occur is morphologically dependent in each of the blue cloud, green valley and red sequence. We discuss the nature of these possible quenching mechanisms, considering the influence of secular evolution, galaxy interactions and mergers, both with and without black hole activity. We focus particularly on the relationship between these quenched star formation histories and the presence of an AGN by using this new Bayesian method to show a population of type 2 AGN host galaxies have recently (within 2 Gyr) undergone a rapid (τ < 1 Gyr) drop in their star formation rate. With this result we therefore present the first statistically supported observational evidence that AGN feedback is an important mechanism for the cessation of star formation in this population of galaxies. The diversity of this new method also highlights that such rapid quenching histories cannot account fully for all the quenching across the current AGN host population. We demonstrate that slower (τ > 2 Gyr) quenching rates dominate for high stellar mass (log10[M*/M⊙] > 10.75) hosts of AGN with both early- and late-type morphology. We discuss how these results show that both merger-driven and non-merger processes are contributing to the co-evolution of galaxies and supermassive black holes across the entirety of the colour magnitude diagram.
Cabassa, Leopoldo J; Humensky, Jennifer; Druss, Benjamin; Lewis-Fernández, Roberto; Gomes, Arminda P; Wang, Shuai; Blanco, Carlos
2013-06-01
The proportion of people in the United States with multiple chronic medical conditions (MCMC) is increasing. Yet, little is known about the relationship that race, ethnicity, and psychiatric disorders have on the prevalence of MCMCs in the general population. This study used data from wave 2 of the National Epidemiologic Survey on Alcohol and Related Conditions (N=33,107). Multinomial logistic regression models adjusting for sociodemographic variables, body mass index, and quality of life were used to examine differences in the 12-month prevalence of MCMC by race/ethnicity, psychiatric diagnosis, and the interactions between race/ethnicity and psychiatric diagnosis. Compared to non-Hispanic Whites, Hispanics reported lower odds of MCMC and African Americans reported higher odds of MCMC after adjusting for covariates. People with psychiatric disorders reported higher odds of MCMC compared with people without psychiatric disorders. There were significant interactions between race and psychiatric diagnosis associated with rates of MCMC. In the presence of certain psychiatric disorders, the odds of MCMC were higher among African Americans with psychiatric disorders compared to non-Hispanic Whites with similar psychiatric disorders. Our study results indicate that race, ethnicity, and psychiatric disorders are associated with the prevalence of MCMC. As the rates of MCMC rise, it is critical to identify which populations are at increased risk and how to best direct services to address their health care needs.
Riviere, Marie-Karelle; Ueckert, Sebastian; Mentré, France
2016-10-01
Non-linear mixed effect models (NLMEMs) are widely used for the analysis of longitudinal data. To design these studies, optimal design based on the expected Fisher information matrix (FIM) can be used instead of performing time-consuming clinical trial simulations. In recent years, estimation algorithms for NLMEMs have transitioned from linearization toward more exact higher-order methods. Optimal design, on the other hand, has mainly relied on first-order (FO) linearization to calculate the FIM. Although efficient in general, FO cannot be applied to complex non-linear models and with difficulty in studies with discrete data. We propose an approach to evaluate the expected FIM in NLMEMs for both discrete and continuous outcomes. We used Markov Chain Monte Carlo (MCMC) to integrate the derivatives of the log-likelihood over the random effects, and Monte Carlo to evaluate its expectation w.r.t. the observations. Our method was implemented in R using Stan, which efficiently draws MCMC samples and calculates partial derivatives of the log-likelihood. Evaluated on several examples, our approach showed good performance with relative standard errors (RSEs) close to those obtained by simulations. We studied the influence of the number of MC and MCMC samples and computed the uncertainty of the FIM evaluation. We also compared our approach to Adaptive Gaussian Quadrature, Laplace approximation, and FO. Our method is available in R-package MIXFIM and can be used to evaluate the FIM, its determinant with confidence intervals (CIs), and RSEs with CIs. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Tutorial: Asteroseismic Stellar Modelling with AIMS
NASA Astrophysics Data System (ADS)
Lund, Mikkel N.; Reese, Daniel R.
The goal of aims (Asteroseismic Inference on a Massive Scale) is to estimate stellar parameters and credible intervals/error bars in a Bayesian manner from a set of asteroseismic frequency data and so-called classical constraints. To achieve reliable parameter estimates and computational efficiency, it searches through a grid of pre-computed models using an MCMC algorithm—interpolation within the grid of models is performed by first tessellating the grid using a Delaunay triangulation and then doing a linear barycentric interpolation on matching simplexes. Inputs for the modelling consist of individual frequencies from peak-bagging, which can be complemented with classical spectroscopic constraints. aims is mostly written in Python with a modular structure to facilitate contributions from the community. Only a few computationally intensive parts have been rewritten in Fortran in order to speed up calculations.
Investigation of error sources in regional inverse estimates of greenhouse gas emissions in Canada
NASA Astrophysics Data System (ADS)
Chan, E.; Chan, D.; Ishizawa, M.; Vogel, F.; Brioude, J.; Delcloo, A.; Wu, Y.; Jin, B.
2015-08-01
Inversion models can use atmospheric concentration measurements to estimate surface fluxes. This study is an evaluation of the errors in a regional flux inversion model for different provinces of Canada, Alberta (AB), Saskatchewan (SK) and Ontario (ON). Using CarbonTracker model results as the target, the synthetic data experiment analyses examined the impacts of the errors from the Bayesian optimisation method, prior flux distribution and the atmospheric transport model, as well as their interactions. The scaling factors for different sub-regions were estimated by the Markov chain Monte Carlo (MCMC) simulation and cost function minimization (CFM) methods. The CFM method results are sensitive to the relative size of the assumed model-observation mismatch and prior flux error variances. Experiment results show that the estimation error increases with the number of sub-regions using the CFM method. For the region definitions that lead to realistic flux estimates, the numbers of sub-regions for the western region of AB/SK combined and the eastern region of ON are 11 and 4 respectively. The corresponding annual flux estimation errors for the western and eastern regions using the MCMC (CFM) method are -7 and -3 % (0 and 8 %) respectively, when there is only prior flux error. The estimation errors increase to 36 and 94 % (40 and 232 %) resulting from transport model error alone. When prior and transport model errors co-exist in the inversions, the estimation errors become 5 and 85 % (29 and 201 %). This result indicates that estimation errors are dominated by the transport model error and can in fact cancel each other and propagate to the flux estimates non-linearly. In addition, it is possible for the posterior flux estimates having larger differences than the prior compared to the target fluxes, and the posterior uncertainty estimates could be unrealistically small that do not cover the target. The systematic evaluation of the different components of the inversion model can help in the understanding of the posterior estimates and percentage errors. Stable and realistic sub-regional and monthly flux estimates for western region of AB/SK can be obtained, but not for the eastern region of ON. This indicates that it is likely a real observation-based inversion for the annual provincial emissions will work for the western region whereas; improvements are needed with the current inversion setup before real inversion is performed for the eastern region.
Cruz-Marcelo, Alejandro; Ensor, Katherine B; Rosner, Gary L
2011-06-01
The term structure of interest rates is used to price defaultable bonds and credit derivatives, as well as to infer the quality of bonds for risk management purposes. We introduce a model that jointly estimates term structures by means of a Bayesian hierarchical model with a prior probability model based on Dirichlet process mixtures. The modeling methodology borrows strength across term structures for purposes of estimation. The main advantage of our framework is its ability to produce reliable estimators at the company level even when there are only a few bonds per company. After describing the proposed model, we discuss an empirical application in which the term structure of 197 individual companies is estimated. The sample of 197 consists of 143 companies with only one or two bonds. In-sample and out-of-sample tests are used to quantify the improvement in accuracy that results from approximating the term structure of corporate bonds with estimators by company rather than by credit rating, the latter being a popular choice in the financial literature. A complete description of a Markov chain Monte Carlo (MCMC) scheme for the proposed model is available as Supplementary Material.
Cruz-Marcelo, Alejandro; Ensor, Katherine B.; Rosner, Gary L.
2011-01-01
The term structure of interest rates is used to price defaultable bonds and credit derivatives, as well as to infer the quality of bonds for risk management purposes. We introduce a model that jointly estimates term structures by means of a Bayesian hierarchical model with a prior probability model based on Dirichlet process mixtures. The modeling methodology borrows strength across term structures for purposes of estimation. The main advantage of our framework is its ability to produce reliable estimators at the company level even when there are only a few bonds per company. After describing the proposed model, we discuss an empirical application in which the term structure of 197 individual companies is estimated. The sample of 197 consists of 143 companies with only one or two bonds. In-sample and out-of-sample tests are used to quantify the improvement in accuracy that results from approximating the term structure of corporate bonds with estimators by company rather than by credit rating, the latter being a popular choice in the financial literature. A complete description of a Markov chain Monte Carlo (MCMC) scheme for the proposed model is available as Supplementary Material. PMID:21765566
NASA Astrophysics Data System (ADS)
Tang, Yuping; Wang, Daniel; Wilson, Grant; Gutermuth, Robert; Heyer, Mark
2018-01-01
We present the AzTEC/LMT survey of dust continuum at 1.1mm on the central ˜ 200pc (CMZ) of our Galaxy. A joint SED analysis of all existing dust continuum surveys on the CMZ is performed, from 160µm to 1.1mm. Our analysis follows a MCMC sampling strategy incorporating the knowledge of PSFs in different maps, which provides unprecedented spacial resolution on distributions of dust temperature, column density and emissivity index. The dense clumps in the CMZ typically show low dust temperature ( 20K), with no significant sign of buried star formation, and a weak evolution of higher emissivity index toward dense peak. A new model is proposed, allowing for varying dust temperature inside a cloud and self-shielding of dust emission, which leads to similar conclusions on dust temperature and grain properties. We further apply a hierarchical Bayesian analysis to infer the column density probability distribution function (N-PDF), while simultaneously removing the Galactic foreground and background emission. The N-PDF shows a steep power-law profile with α > 3, indicating that formation of dense structures are suppressed.
Zhang, Hua; Huo, Mingdong; Chao, Jianqian; Liu, Pei
2016-01-01
Hepatitis B virus (HBV) infection is a major problem for public health; timely antiviral treatment can significantly prevent the progression of liver damage from HBV by slowing down or stopping the virus from reproducing. In the study we applied Bayesian approach to cost-effectiveness analysis, using Markov Chain Monte Carlo (MCMC) simulation methods for the relevant evidence input into the model to evaluate cost-effectiveness of entecavir (ETV) and lamivudine (LVD) therapy for chronic hepatitis B (CHB) in Jiangsu, China, thus providing information to the public health system in the CHB therapy. Eight-stage Markov model was developed, a hypothetical cohort of 35-year-old HBeAg-positive patients with CHB was entered into the model. Treatment regimens were LVD100mg daily and ETV 0.5 mg daily. The transition parameters were derived either from systematic reviews of the literature or from previous economic studies. The outcome measures were life-years, quality-adjusted lifeyears (QALYs), and expected costs associated with the treatments and disease progression. For the Bayesian models all the analysis was implemented by using WinBUGS version 1.4. Expected cost, life expectancy, QALYs decreased with age. Cost-effectiveness increased with age. Expected cost of ETV was less than LVD, while life expectancy and QALYs were higher than that of LVD, ETV strategy was more cost-effective. Costs and benefits of the Monte Carlo simulation were very close to the results of exact form among the group, but standard deviation of each group indicated there was a big difference between individual patients. Compared with lamivudine, entecavir is the more cost-effective option. CHB patients should accept antiviral treatment as soon as possible as the lower age the more cost-effective. Monte Carlo simulation obtained costs and effectiveness distribution, indicate our Markov model is of good robustness.
Uncertainty Analysis and Parameter Estimation For Nearshore Hydrodynamic Models
NASA Astrophysics Data System (ADS)
Ardani, S.; Kaihatu, J. M.
2012-12-01
Numerical models represent deterministic approaches used for the relevant physical processes in the nearshore. Complexity of the physics of the model and uncertainty involved in the model inputs compel us to apply a stochastic approach to analyze the robustness of the model. The Bayesian inverse problem is one powerful way to estimate the important input model parameters (determined by apriori sensitivity analysis) and can be used for uncertainty analysis of the outputs. Bayesian techniques can be used to find the range of most probable parameters based on the probability of the observed data and the residual errors. In this study, the effect of input data involving lateral (Neumann) boundary conditions, bathymetry and off-shore wave conditions on nearshore numerical models are considered. Monte Carlo simulation is applied to a deterministic numerical model (the Delft3D modeling suite for coupled waves and flow) for the resulting uncertainty analysis of the outputs (wave height, flow velocity, mean sea level and etc.). Uncertainty analysis of outputs is performed by random sampling from the input probability distribution functions and running the model as required until convergence to the consistent results is achieved. The case study used in this analysis is the Duck94 experiment, which was conducted at the U.S. Army Field Research Facility at Duck, North Carolina, USA in the fall of 1994. The joint probability of model parameters relevant for the Duck94 experiments will be found using the Bayesian approach. We will further show that, by using Bayesian techniques to estimate the optimized model parameters as inputs and applying them for uncertainty analysis, we can obtain more consistent results than using the prior information for input data which means that the variation of the uncertain parameter will be decreased and the probability of the observed data will improve as well. Keywords: Monte Carlo Simulation, Delft3D, uncertainty analysis, Bayesian techniques, MCMC
NASA Astrophysics Data System (ADS)
Davis, A. D.; Huan, X.; Heimbach, P.; Marzouk, Y.
2017-12-01
Borehole data are essential for calibrating ice sheet models. However, field expeditions for acquiring borehole data are often time-consuming, expensive, and dangerous. It is thus essential to plan the best sampling locations that maximize the value of data while minimizing costs and risks. We present an uncertainty quantification (UQ) workflow based on rigorous probability framework to achieve these objectives. First, we employ an optimal experimental design (OED) procedure to compute borehole locations that yield the highest expected information gain. We take into account practical considerations of location accessibility (e.g., proximity to research sites, terrain, and ice velocity may affect feasibility of drilling) and robustness (e.g., real-time constraints such as weather may force researchers to drill at sub-optimal locations near those originally planned), by incorporating a penalty reflecting accessibility as well as sensitivity to deviations from the optimal locations. Next, we extract vertical temperature profiles from these boreholes and formulate a Bayesian inverse problem to reconstruct past surface temperatures. Using a model of temperature advection/diffusion, the top boundary condition (corresponding to surface temperatures) is calibrated via efficient Markov chain Monte Carlo (MCMC). The overall procedure can then be iterated to choose new optimal borehole locations for the next expeditions.Through this work, we demonstrate powerful UQ methods for designing experiments, calibrating models, making predictions, and assessing sensitivity--all performed under an uncertain environment. We develop a theoretical framework as well as practical software within an intuitive workflow, and illustrate their usefulness for combining data and models for environmental and climate research.
Cosmology with the largest galaxy cluster surveys: going beyond Fisher matrix forecasts
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khedekar, Satej; Majumdar, Subhabrata, E-mail: satej@mpa-garching.mpg.de, E-mail: subha@tifr.res.in
2013-02-01
We make the first detailed MCMC likelihood study of cosmological constraints that are expected from some of the largest, ongoing and proposed, cluster surveys in different wave-bands and compare the estimates to the prevalent Fisher matrix forecasts. Mock catalogs of cluster counts expected from the surveys — eROSITA, WFXT, RCS2, DES and Planck, along with a mock dataset of follow-up mass calibrations are analyzed for this purpose. A fair agreement between MCMC and Fisher results is found only in the case of minimal models. However, for many cases, the marginalized constraints obtained from Fisher and MCMC methods can differ bymore » factors of 30-100%. The discrepancy can be alarmingly large for a time dependent dark energy equation of state, w(a); the Fisher methods are seen to under-estimate the constraints by as much as a factor of 4-5. Typically, Fisher estimates become more and more inappropriate as we move away from ΛCDM, to a constant-w dark energy to varying-w dark energy cosmologies. Fisher analysis, also, predicts incorrect parameter degeneracies. There are noticeable offsets in the likelihood contours obtained from Fisher methods that is caused due to an asymmetry in the posterior likelihood distribution as seen through a MCMC analysis. From the point of mass-calibration uncertainties, a high value of unknown scatter about the mean mass-observable relation, and its redshift dependence, is seen to have large degeneracies with the cosmological parameters σ{sub 8} and w(a) and can degrade the cosmological constraints considerably. We find that the addition of mass-calibrated cluster datasets can improve dark energy and σ{sub 8} constraints by factors of 2-3 from what can be obtained from CMB+SNe+BAO only . Finally, we show that a joint analysis of datasets of two (or more) different cluster surveys would significantly tighten cosmological constraints from using clusters only. Since, details of future cluster surveys are still being planned, we emphasize that optimal survey design must be done using MCMC analysis rather than Fisher forecasting.« less
NASA Astrophysics Data System (ADS)
Susiluoto, Jouni; Raivonen, Maarit; Backman, Leif; Laine, Marko; Makela, Jarmo; Peltola, Olli; Vesala, Timo; Aalto, Tuula
2018-03-01
Estimating methane (CH4) emissions from natural wetlands is complex, and the estimates contain large uncertainties. The models used for the task are typically heavily parameterized and the parameter values are not well known. In this study, we perform a Bayesian model calibration for a new wetland CH4 emission model to improve the quality of the predictions and to understand the limitations of such models.The detailed process model that we analyze contains descriptions for CH4 production from anaerobic respiration, CH4 oxidation, and gas transportation by diffusion, ebullition, and the aerenchyma cells of vascular plants. The processes are controlled by several tunable parameters. We use a hierarchical statistical model to describe the parameters and obtain the posterior distributions of the parameters and uncertainties in the processes with adaptive Markov chain Monte Carlo (MCMC), importance resampling, and time series analysis techniques. For the estimation, the analysis utilizes measurement data from the Siikaneva flux measurement site in southern Finland. The uncertainties related to the parameters and the modeled processes are described quantitatively. At the process level, the flux measurement data are able to constrain the CH4 production processes, methane oxidation, and the different gas transport processes. The posterior covariance structures explain how the parameters and the processes are related. Additionally, the flux and flux component uncertainties are analyzed both at the annual and daily levels. The parameter posterior densities obtained provide information regarding importance of the different processes, which is also useful for development of wetland methane emission models other than the square root HelsinkI Model of MEthane buiLd-up and emIssion for peatlands (sqHIMMELI). The hierarchical modeling allows us to assess the effects of some of the parameters on an annual basis. The results of the calibration and the cross validation suggest that the early spring net primary production could be used to predict parameters affecting the annual methane production. Even though the calibration is specific to the Siikaneva site, the hierarchical modeling approach is well suited for larger-scale studies and the results of the estimation pave way for a regional or global-scale Bayesian calibration of wetland emission models.
NASA Astrophysics Data System (ADS)
Kubo, H.; Asano, K.; Iwata, T.; Aoi, S.
2014-12-01
Previous studies for the period-dependent source characteristics of the 2011 Tohoku earthquake (e.g., Koper et al., 2011; Lay et al., 2012) were based on the short and long period source models using different method. Kubo et al. (2013) obtained source models of the 2011 Tohoku earthquake using multi period-bands waveform data by a common inversion method and discussed its period-dependent source characteristics. In this study, to achieve more in detail spatiotemporal source rupture behavior of this event, we introduce a new fault surface model having finer sub-fault size and estimate the source models in multi period-bands using a Bayesian inversion method combined with a multi-time-window method. Three components of velocity waveforms at 25 stations of K-NET, KiK-net, and F-net of NIED are used in this analysis. The target period band is 10-100 s. We divide this period band into three period bands (10-25 s, 25-50 s, and 50-100 s) and estimate a kinematic source model in each period band using a Bayesian inversion method with MCMC sampling (e.g., Fukuda & Johnson, 2008; Minson et al., 2013, 2014). The parameterization of spatiotemporal slip distribution follows the multi-time-window method (Hartzell & Heaton, 1983). The Green's functions are calculated by the 3D FDM (GMS; Aoi & Fujiwara, 1999) using a 3D velocity structure model (JIVSM; Koketsu et al., 2012). The assumed fault surface model is based on the Pacific plate boundary of JIVSM and is divided into 384 subfaults of about 16 * 16 km^2. The estimated source models in multi period-bands show the following source image: (1) First deep rupture off Miyagi at 0-60 s toward down-dip mostly radiating relatively short period (10-25 s) seismic waves. (2) Shallow rupture off Miyagi at 45-90 s toward up-dip with long duration radiating long period (50-100 s) seismic wave. (3) Second deep rupture off Miyagi at 60-105 s toward down-dip radiating longer period seismic waves then that of the first deep rupture. (4) Deep rupture off Fukushima at 90-135 s. The dominant-period difference of the seismic-wave radiation between two deep ruptures off Miyagi may result from the mechanism that small-scale heterogeneities on the fault are removed by the first rupture. This difference can be also interpreted by the concept of multi-scale dynamic rupture (Ide & Aochi, 2005).
Benefits of seasonal forecasts of crop yields
NASA Astrophysics Data System (ADS)
Sakurai, G.; Okada, M.; Nishimori, M.; Yokozawa, M.
2017-12-01
Major factors behind recent fluctuations in food prices include increased biofuel production and oil price fluctuations. In addition, several extreme climate events that reduced worldwide food production coincided with upward spikes in food prices. The stabilization of crop yields is one of the most important tasks to stabilize food prices and thereby enhance food security. Recent development of technologies related to crop modeling and seasonal weather forecasting has made it possible to forecast future crop yields for maize and soybean. However, the effective use of these technologies remains limited. Here we present the potential benefits of seasonal crop-yield forecasts on a global scale for choice of planting day. For this purpose, we used a model (PRYSBI-2) that can well replicate past crop yields both for maize and soybean. This model system uses a Bayesian statistical approach to estimate the parameters of a basic process-based model of crop growth. The spatial variability of model parameters was considered by estimating the posterior distribution of the parameters from historical yield data by using the Markov-chain Monte Carlo (MCMC) method with a resolution of 1.125° × 1.125°. The posterior distributions of model parameters were estimated for each spatial grid with 30 000 MCMC steps of 10 chains each. By using this model and the estimated parameter distributions, we were able to estimate not only crop yield but also levels of associated uncertainty. We found that the global average crop yield increased about 30% as the result of the optimal selection of planting day and that the seasonal forecast of crop yield had a large benefit in and near the eastern part of Brazil and India for maize and the northern area of China for soybean. In these countries, the effects of El Niño and Indian Ocean dipole are large. The results highlight the importance of developing a system to forecast global crop yields.
On the estimation of the reproduction number based on misreported epidemic data.
Azmon, Amin; Faes, Christel; Hens, Niel
2014-03-30
Epidemic data often suffer from underreporting and delay in reporting. In this paper, we investigated the impact of delays and underreporting on estimates of reproduction number. We used a thinned version of the epidemic renewal equation to describe the epidemic process while accounting for the underlying reporting system. Assuming a constant reporting parameter, we used different delay patterns to represent the delay structure in our model. Instead of assuming a fixed delay distribution, we estimated the delay parameters while assuming a smooth function for the reproduction number over time. In order to estimate the parameters, we used a Bayesian semiparametric approach with penalized splines, allowing both flexibility and exact inference provided by MCMC. To show the performance of our method, we performed different simulation studies. We conducted sensitivity analyses to investigate the impact of misspecification of the delay pattern and the impact of assuming nonconstant reporting parameters on the estimates of the reproduction numbers. We showed that, whenever available, additional information about time-dependent underreporting can be taken into account. As an application of our method, we analyzed confirmed daily A(H1N1) v2009 cases made publicly available by the World Health Organization for Mexico and the USA. Copyright © 2013 John Wiley & Sons, Ltd.
Seismic imaging of Q structures by a trans-dimensional coda-wave analysis
NASA Astrophysics Data System (ADS)
Takahashi, Tsutomu
2017-04-01
Wave scattering and intrinsic attenuation are important processes to describe incoherent and complex wave trains of high frequency seismic wave (>1Hz). The multiple lapse time window analysis (MLTWA) has been used to estimate scattering and intrinsic Q values by assuming constant Q in a study area (e.g., Hoshiba 1993). This study generalizes this MLTWA to estimate lateral variations of Q values under the Bayesian framework in dimension variable space. Study area is partitioned into small areas by means of the Voronoi tessellation. Scattering and intrinsic Q in each small area are constant. We define a misfit function for spatiotemporal variations of wave energy as with the original MLTWA, and maximize the posterior probability with changing not only Q values but the number and spatial layout of the Voronoi cells. This maximization is conducted by means of the reversible jump Markov chain Monte Carlo (rjMCMC) (Green 1995) since the number of unknown parameters (i.e., dimension of posterior probability) is variable. After a convergence to the maximum posterior, we estimate Q structures from the ensemble averages of MCMC samples around the maximum posterior probability. Synthetic tests showed stable reconstructions of input structures with reasonable error distributions. We applied this method for seismic waveform data recorded by ocean bottom seismograms at the outer-rise area off Tohoku, and estimated Q values at 4-8Hz, 8-16Hz and 16-32Hz. Intrinsic Q are nearly constant at all frequency bands, and scattering Q shows two distinct strong scattering regions at petit spot area and high seismicity area. These strong scattering are probably related to magma inclusions and fractured structure, respectively. Difference between these two areas becomes clear at high frequencies. It means that scale dependences of inhomogeneities or smaller scale inhomogeneity is important to discuss medium property and origins of structural variations. While the generalized MLTWA is based on a classical waveform modeling in constant Q medium, this method can be a fundamental basis for Q structure imaging in the crust.
NASA Astrophysics Data System (ADS)
Tauscher, Keith A.; Burns, Jack O.; Rapetti, David; Mirocha, Jordan; Monsalve, Raul A.
2017-01-01
The Dark Ages Radio Explorer (DARE) is a mission concept proposed to NASA in which a crossed dipole antenna collects low frequency (40-120 MHz) radio measurements above the farside of the Moon to detect and characterize the global 21-cm signal from the early (z~35-11) Universe's neutral hydrogen. Simulated data for DARE includes: 1) the global signal modeled using the ares code, 2) spectrally smooth Galactic foregrounds with spatial structure taken from multiple radio foreground maps averaged over a large, well characterized beam, 3) systematics introduced in the data by antenna/receiver reflections, and 4) the Moon. This simulated data is fed into a signal extraction pipeline. As the signal is 4-5 orders of magnitude below the Galactic synchrotron contribution, it is best extracted from the data using Bayesian techniques which take full advantage of prior knowledge of the instrument and foregrounds. For the DARE pipeline, we use the affine-invariant MCMC algorithm implemented in the Python package, emcee. The pipeline also employs singular value decomposition to use known spectral features of the antenna and receiver to form a natural basis with which to fit instrumental systematics. Taking advantage of high-fidelity measurements of the antenna beam (to ~20 ppm) and precise calibration of the instrument, the pipeline extracts the global 21-cm signal with an average RMS error of 10-15 mK for multiple signal models.
Almost but not quite 2D, Non-linear Bayesian Inversion of CSEM Data
NASA Astrophysics Data System (ADS)
Ray, A.; Key, K.; Bodin, T.
2013-12-01
The geophysical inverse problem can be elegantly stated in a Bayesian framework where a probability distribution can be viewed as a statement of information regarding a random variable. After all, the goal of geophysical inversion is to provide information on the random variables of interest - physical properties of the earth's subsurface. However, though it may be simple to postulate, a practical difficulty of fully non-linear Bayesian inversion is the computer time required to adequately sample the model space and extract the information we seek. As a consequence, in geophysical problems where evaluation of a full 2D/3D forward model is computationally expensive, such as marine controlled source electromagnetic (CSEM) mapping of the resistivity of seafloor oil and gas reservoirs, Bayesian studies have largely been conducted with 1D forward models. While the 1D approximation is indeed appropriate for exploration targets with planar geometry and geological stratification, it only provides a limited, site-specific idea of uncertainty in resistivity with depth. In this work, we extend our fully non-linear 1D Bayesian inversion to a 2D model framework, without requiring the usual regularization of model resistivities in the horizontal or vertical directions used to stabilize quasi-2D inversions. In our approach, we use the reversible jump Markov-chain Monte-Carlo (RJ-MCMC) or trans-dimensional method and parameterize the subsurface in a 2D plane with Voronoi cells. The method is trans-dimensional in that the number of cells required to parameterize the subsurface is variable, and the cells dynamically move around and multiply or combine as demanded by the data being inverted. This approach allows us to expand our uncertainty analysis of resistivity at depth to more than a single site location, allowing for interactions between model resistivities at different horizontal locations along a traverse over an exploration target. While the model is parameterized in 2D, we efficiently evaluate the forward response using 1D profiles extracted from the model at the common-midpoints of the EM source-receiver pairs. Since the 1D approximation is locally valid at different midpoint locations, the computation time is far lower than is required by a full 2D or 3D simulation. We have applied this method to both synthetic and real CSEM survey data from the Scarborough gas field on the Northwest shelf of Australia, resulting in a spatially variable quantification of resistivity and its uncertainty in 2D. This Bayesian approach results in a large database of 2D models that comprise a posterior probability distribution, which we can subset to test various hypotheses about the range of model structures compatible with the data. For example, we can subset the model distributions to examine the hypothesis that a resistive reservoir extends overs a certain spatial extent. Depending on how this conditions other parts of the model space, light can be shed on the geological viability of the hypothesis. Since tackling spatially variable uncertainty and trade-offs in 2D and 3D is a challenging research problem, the insights gained from this work may prove valuable for subsequent full 2D and 3D Bayesian inversions.
O'Reilly, Joseph E; Donoghue, Philip C J
2018-03-01
Consensus trees are required to summarize trees obtained through MCMC sampling of a posterior distribution, providing an overview of the distribution of estimated parameters such as topology, branch lengths, and divergence times. Numerous consensus tree construction methods are available, each presenting a different interpretation of the tree sample. The rise of morphological clock and sampled-ancestor methods of divergence time estimation, in which times and topology are coestimated, has increased the popularity of the maximum clade credibility (MCC) consensus tree method. The MCC method assumes that the sampled, fully resolved topology with the highest clade credibility is an adequate summary of the most probable clades, with parameter estimates from compatible sampled trees used to obtain the marginal distributions of parameters such as clade ages and branch lengths. Using both simulated and empirical data, we demonstrate that MCC trees, and trees constructed using the similar maximum a posteriori (MAP) method, often include poorly supported and incorrect clades when summarizing diffuse posterior samples of trees. We demonstrate that the paucity of information in morphological data sets contributes to the inability of MCC and MAP trees to accurately summarise of the posterior distribution. Conversely, majority-rule consensus (MRC) trees represent a lower proportion of incorrect nodes when summarizing the same posterior samples of trees. Thus, we advocate the use of MRC trees, in place of MCC or MAP trees, in attempts to summarize the results of Bayesian phylogenetic analyses of morphological data.
O’Reilly, Joseph E; Donoghue, Philip C J
2018-01-01
Abstract Consensus trees are required to summarize trees obtained through MCMC sampling of a posterior distribution, providing an overview of the distribution of estimated parameters such as topology, branch lengths, and divergence times. Numerous consensus tree construction methods are available, each presenting a different interpretation of the tree sample. The rise of morphological clock and sampled-ancestor methods of divergence time estimation, in which times and topology are coestimated, has increased the popularity of the maximum clade credibility (MCC) consensus tree method. The MCC method assumes that the sampled, fully resolved topology with the highest clade credibility is an adequate summary of the most probable clades, with parameter estimates from compatible sampled trees used to obtain the marginal distributions of parameters such as clade ages and branch lengths. Using both simulated and empirical data, we demonstrate that MCC trees, and trees constructed using the similar maximum a posteriori (MAP) method, often include poorly supported and incorrect clades when summarizing diffuse posterior samples of trees. We demonstrate that the paucity of information in morphological data sets contributes to the inability of MCC and MAP trees to accurately summarise of the posterior distribution. Conversely, majority-rule consensus (MRC) trees represent a lower proportion of incorrect nodes when summarizing the same posterior samples of trees. Thus, we advocate the use of MRC trees, in place of MCC or MAP trees, in attempts to summarize the results of Bayesian phylogenetic analyses of morphological data. PMID:29106675
Li, Michael; Dushoff, Jonathan; Bolker, Benjamin M
2018-07-01
Simple mechanistic epidemic models are widely used for forecasting and parameter estimation of infectious diseases based on noisy case reporting data. Despite the widespread application of models to emerging infectious diseases, we know little about the comparative performance of standard computational-statistical frameworks in these contexts. Here we build a simple stochastic, discrete-time, discrete-state epidemic model with both process and observation error and use it to characterize the effectiveness of different flavours of Bayesian Markov chain Monte Carlo (MCMC) techniques. We use fits to simulated data, where parameters (and future behaviour) are known, to explore the limitations of different platforms and quantify parameter estimation accuracy, forecasting accuracy, and computational efficiency across combinations of modeling decisions (e.g. discrete vs. continuous latent states, levels of stochasticity) and computational platforms (JAGS, NIMBLE, Stan).
Neural Network Emulation of Reionization Simulations
NASA Astrophysics Data System (ADS)
Schmit, Claude J.; Pritchard, Jonathan R.
2018-05-01
Next generation radio experiments such as LOFAR, HERA and SKA are expected to probe the Epoch of Reionization and claim a first direct detection of the cosmic 21cm signal within the next decade. One of the major challenges for these experiments will be dealing with enormous incoming data volumes. Machine learning is key to increasing our data analysis efficiency. We consider the use of an artificial neural network to emulate 21cmFAST simulations and use it in a Bayesian parameter inference study. We then compare the network predictions to a direct evaluation of the EoR simulations and analyse the dependence of the results on the training set size. We find that the use of a training set of size 100 samples can recover the error contours of a full scale MCMC analysis which evaluates the model at each step.
SAChES: Scalable Adaptive Chain-Ensemble Sampling.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swiler, Laura Painton; Ray, Jaideep; Ebeida, Mohamed Salah
We present the development of a parallel Markov Chain Monte Carlo (MCMC) method called SAChES, Scalable Adaptive Chain-Ensemble Sampling. This capability is targed to Bayesian calibration of com- putationally expensive simulation models. SAChES involves a hybrid of two methods: Differential Evo- lution Monte Carlo followed by Adaptive Metropolis. Both methods involve parallel chains. Differential evolution allows one to explore high-dimensional parameter spaces using loosely coupled (i.e., largely asynchronous) chains. Loose coupling allows the use of large chain ensembles, with far more chains than the number of parameters to explore. This reduces per-chain sampling burden, enables high-dimensional inversions and the usemore » of computationally expensive forward models. The large number of chains can also ameliorate the impact of silent-errors, which may affect only a few chains. The chain ensemble can also be sampled to provide an initial condition when an aberrant chain is re-spawned. Adaptive Metropolis takes the best points from the differential evolution and efficiently hones in on the poste- rior density. The multitude of chains in SAChES is leveraged to (1) enable efficient exploration of the parameter space; and (2) ensure robustness to silent errors which may be unavoidable in extreme-scale computational platforms of the future. This report outlines SAChES, describes four papers that are the result of the project, and discusses some additional results.« less
Comparing models for perfluorooctanoic acid pharmacokinetics using Bayesian analysis.
Wambaugh, John F; Barton, Hugh A; Setzer, R Woodrow
2008-12-01
Selecting the appropriate pharmacokinetic (PK) model given the available data is investigated for perfluorooctanoic acid (PFOA), which has been widely analyzed with an empirical, one-compartment model. This research examined the results of experiments [Kemper R. A., DuPont Haskell Laboratories, USEPA Administrative Record AR-226.1499 (2003)] that administered single oral or iv doses of PFOA to adult male and female rats. PFOA concentration was observed over time; in plasma for some animals and in fecal and urinary excretion for others. There were four rats per dose group, for a total of 36 males and 36 females. Assuming that the PK parameters for each individual within a gender were drawn from the same, biologically varying population, plasma and excretion data were jointly analyzed using a hierarchical framework to separate uncertainty due to measurement error from actual biological variability. Bayesian analysis using Markov Chain Monte Carlo (MCMC) provides tools to perform such an analysis as well as quantitative diagnostics to evaluate and discriminate between models. Starting from a one-compartment PK model with separate clearances to urine and feces, the model was incrementally expanded using Bayesian measures to assess if the expansion was supported by the data. PFOA excretion is sexually dimorphic in rats; male rats have bi-phasic elimination that is roughly 40 times slower than that of the females, which appear to have a single elimination phase. The male and female data were analyzed separately, keeping only the parameters describing the measurement process in common. For male rats, including excretion data initially decreased certainty in the one-compartment parameter estimates compared to an analysis using plasma data only. Allowing a third, unspecified clearance improved agreement and increased certainty when all the data was used, however a significant amount of eliminated PFOA was estimated to be missing from the excretion data. Adding an additional PK compartment reduced the unaccounted-for elimination to amounts comparable to the cage wash. For both sexes, an MCMC estimate of the appropriateness of a model for a given data type, the Deviance Information Criterion, indicated that this two-compartment model was better suited to describing PFOA PK. The median estimate was 142.1 +/- 37.6 ml/kg for the volume of the primary compartment and 1.24 +/- 1.1 ml/kg/h for the clearances of male rats and 166.4 +/- 46.8 ml/kg and 30.3 +/- 13.2 ml/kg/h, respectively for female rats. The estimates for the second compartment differed greatly with gender-volume 311.8 +/- 453.9 ml/kg with clearance 3.2 +/- 6.2 for males and 1400 +/- 2507.5 ml/kg and 4.3 +/- 2.2 ml/kg/h for females. The median estimated clearance was 12 +/- 6% to feces and 85 +/- 7% to urine for male rats and 8 +/- 6% and 77 +/- 9% for female rats. We conclude that the available data may support more models for PFOA PK beyond two-compartments and that the methods employed here will be generally useful for more complicated, including PBPK, models.
Random Partition Distribution Indexed by Pairwise Information
Dahl, David B.; Day, Ryan; Tsai, Jerry W.
2017-01-01
We propose a random partition distribution indexed by pairwise similarity information such that partitions compatible with the similarities are given more probability. The use of pairwise similarities, in the form of distances, is common in some clustering algorithms (e.g., hierarchical clustering), but we show how to use this type of information to define a prior partition distribution for flexible Bayesian modeling. A defining feature of the distribution is that it allocates probability among partitions within a given number of subsets, but it does not shift probability among sets of partitions with different numbers of subsets. Our distribution places more probability on partitions that group similar items yet keeps the total probability of partitions with a given number of subsets constant. The distribution of the number of subsets (and its moments) is available in closed-form and is not a function of the similarities. Our formulation has an explicit probability mass function (with a tractable normalizing constant) so the full suite of MCMC methods may be used for posterior inference. We compare our distribution with several existing partition distributions, showing that our formulation has attractive properties. We provide three demonstrations to highlight the features and relative performance of our distribution. PMID:29276318
Multilocus lod scores in large pedigrees: combination of exact and approximate calculations.
Tong, Liping; Thompson, Elizabeth
2008-01-01
To detect the positions of disease loci, lod scores are calculated at multiple chromosomal positions given trait and marker data on members of pedigrees. Exact lod score calculations are often impossible when the size of the pedigree and the number of markers are both large. In this case, a Markov Chain Monte Carlo (MCMC) approach provides an approximation. However, to provide accurate results, mixing performance is always a key issue in these MCMC methods. In this paper, we propose two methods to improve MCMC sampling and hence obtain more accurate lod score estimates in shorter computation time. The first improvement generalizes the block-Gibbs meiosis (M) sampler to multiple meiosis (MM) sampler in which multiple meioses are updated jointly, across all loci. The second one divides the computations on a large pedigree into several parts by conditioning on the haplotypes of some 'key' individuals. We perform exact calculations for the descendant parts where more data are often available, and combine this information with sampling of the hidden variables in the ancestral parts. Our approaches are expected to be most useful for data on a large pedigree with a lot of missing data. (c) 2007 S. Karger AG, Basel
Multilocus Lod Scores in Large Pedigrees: Combination of Exact and Approximate Calculations
Tong, Liping; Thompson, Elizabeth
2007-01-01
To detect the positions of disease loci, lod scores are calculated at multiple chromosomal positions given trait and marker data on members of pedigrees. Exact lod score calculations are often impossible when the size of the pedigree and the number of markers are both large. In this case, a Markov Chain Monte Carlo (MCMC) approach provides an approximation. However, to provide accurate results, mixing performance is always a key issue in these MCMC methods. In this paper, we propose two methods to improve MCMC sampling and hence obtain more accurate lod score estimates in shorter computation time. The first improvement generalizes the block-Gibbs meiosis (M) sampler to multiple meiosis (MM) sampler in which multiple meioses are updated jointly, across all loci. The second one divides the computations on a large pedigree into several parts by conditioning on the haplotypes of some ‘key’ individuals. We perform exact calculations for the descendant parts where more data are often available, and combine this information with sampling of the hidden variables in the ancestral parts. Our approaches are expected to be most useful for data on a large pedigree with a lot of missing data. PMID:17934317
Borths, Matthew R; Holroyd, Patricia A; Seiffert, Erik R
2016-01-01
Hyaenodonta is a diverse, extinct group of carnivorous mammals that included weasel- to rhinoceros-sized species. The oldest-known hyaenodont fossils are from the middle Paleocene of North Africa and the antiquity of the group in Afro-Arabia led to the hypothesis that it originated there and dispersed to Asia, Europe, and North America. Here we describe two new hyaenodont species based on the oldest hyaenodont cranial specimens known from Afro-Arabia. The material was collected from the latest Eocene Locality 41 (L-41, ∼34 Ma) in the Fayum Depression, Egypt. Akhnatenavus nefertiticyon sp. nov. has specialized, hypercarnivorous molars and an elongate cranial vault. In A. nefertiticyon the tallest, piercing cusp on M 1 -M 2 is the paracone. Brychotherium ephalmos gen. et sp. nov. has more generalized molars that retain the metacone and complex talonids. In B. ephalmos the tallest, piercing cusp on M 1 -M 2 is the metacone. We incorporate this new material into a series of phylogenetic analyses using a character-taxon matrix that includes novel dental, cranial, and postcranial characters, and samples extensively from the global record of the group. The phylogenetic analysis includes the first application of Bayesian methods to hyaenodont relationships. B. ephalmos is consistently placed within Teratodontinae, an Afro-Arabian clade with several generalist and hypercarnivorous forms, and Akhnatenavus is consistently recovered in Hyainailourinae as part of an Afro-Arabian radiation. The phylogenetic results suggest that hypercarnivory evolved independently three times within Hyaenodonta: in Teratodontinae, in Hyainailourinae, and in Hyaenodontinae. Teratodontines are consistently placed in a close relationship with Hyainailouridae (Hyainailourinae + Apterodontinae) to the exclusion of "proviverrines," hyaenodontines, and several North American clades, and we propose that the superfamily Hyainailouroidea be used to describe this relationship. Using the topologies recovered from each phylogenetic method, we reconstructed the biogeographic history of Hyaenodonta using parsimony optimization (PO), likelihood optimization (LO), and Bayesian Binary Markov chain Monte Carlo (MCMC) to examine support for the Afro-Arabian origin of Hyaenodonta. Across all analyses, we found that Hyaenodonta most likely originated in Europe, rather than Afro-Arabia. The clade is estimated by tip-dating analysis to have undergone a rapid radiation in the Late Cretaceous and Paleocene; a radiation currently not documented by fossil evidence. During the Paleocene, lineages are reconstructed as dispersing to Asia, Afro-Arabia, and North America. The place of origin of Hyainailouroidea is likely Afro-Arabia according to the Bayesian topologies but it is ambiguous using parsimony. All topologies support the constituent clades-Hyainailourinae, Apterodontinae, and Teratodontinae-as Afro-Arabian and tip-dating estimates that each clade is established in Afro-Arabia by the middle Eocene.
Seiffert, Erik R.
2016-01-01
Hyaenodonta is a diverse, extinct group of carnivorous mammals that included weasel- to rhinoceros-sized species. The oldest-known hyaenodont fossils are from the middle Paleocene of North Africa and the antiquity of the group in Afro-Arabia led to the hypothesis that it originated there and dispersed to Asia, Europe, and North America. Here we describe two new hyaenodont species based on the oldest hyaenodont cranial specimens known from Afro-Arabia. The material was collected from the latest Eocene Locality 41 (L-41, ∼34 Ma) in the Fayum Depression, Egypt. Akhnatenavus nefertiticyon sp. nov. has specialized, hypercarnivorous molars and an elongate cranial vault. In A. nefertiticyon the tallest, piercing cusp on M1–M2 is the paracone. Brychotherium ephalmos gen. et sp. nov. has more generalized molars that retain the metacone and complex talonids. In B. ephalmos the tallest, piercing cusp on M1–M2 is the metacone. We incorporate this new material into a series of phylogenetic analyses using a character-taxon matrix that includes novel dental, cranial, and postcranial characters, and samples extensively from the global record of the group. The phylogenetic analysis includes the first application of Bayesian methods to hyaenodont relationships. B. ephalmos is consistently placed within Teratodontinae, an Afro-Arabian clade with several generalist and hypercarnivorous forms, and Akhnatenavus is consistently recovered in Hyainailourinae as part of an Afro-Arabian radiation. The phylogenetic results suggest that hypercarnivory evolved independently three times within Hyaenodonta: in Teratodontinae, in Hyainailourinae, and in Hyaenodontinae. Teratodontines are consistently placed in a close relationship with Hyainailouridae (Hyainailourinae + Apterodontinae) to the exclusion of “proviverrines,” hyaenodontines, and several North American clades, and we propose that the superfamily Hyainailouroidea be used to describe this relationship. Using the topologies recovered from each phylogenetic method, we reconstructed the biogeographic history of Hyaenodonta using parsimony optimization (PO), likelihood optimization (LO), and Bayesian Binary Markov chain Monte Carlo (MCMC) to examine support for the Afro-Arabian origin of Hyaenodonta. Across all analyses, we found that Hyaenodonta most likely originated in Europe, rather than Afro-Arabia. The clade is estimated by tip-dating analysis to have undergone a rapid radiation in the Late Cretaceous and Paleocene; a radiation currently not documented by fossil evidence. During the Paleocene, lineages are reconstructed as dispersing to Asia, Afro-Arabia, and North America. The place of origin of Hyainailouroidea is likely Afro-Arabia according to the Bayesian topologies but it is ambiguous using parsimony. All topologies support the constituent clades–Hyainailourinae, Apterodontinae, and Teratodontinae–as Afro-Arabian and tip-dating estimates that each clade is established in Afro-Arabia by the middle Eocene. PMID:27867761
2018-01-01
The genus Liolaemus comprises more than 260 species and can be divided in two subgenera: Eulaemus and Liolaemus sensu stricto. In this paper, we present a phylogenetic analysis, divergence times, and ancestral distribution ranges of the Liolaemus alticolor-bibronii group (Liolaemus sensu stricto subgenus). We inferred a total evidence phylogeny combining molecular (Cytb and 12S genes) and morphological characters using Maximum Parsimony and Bayesian Inference. Divergence times were calculated using Bayesian MCMC with an uncorrelated lognormal distributed relaxed clock, calibrated with a fossil record. Ancestral ranges were estimated using the Dispersal-Extinction-Cladogenesis (DEC-Lagrange). Effects of some a priori parameters of DEC were also tested. Distribution ranged from central Perú to southern Argentina, including areas at sea level up to the high Andes. The L. alticolor-bibronii group was recovered as monophyletic, formed by two clades: L. walkeri and L. gracilis, the latter can be split in two groups. Additionally, many species candidates were recognized. We estimate that the L. alticolor-bibronii group diversified 14.5 Myr ago, during the Middle Miocene. Our results suggest that the ancestor of the Liolaemus alticolor-bibronii group was distributed in a wide area including Patagonia and Puna highlands. The speciation pattern follows the South-North Diversification Hypothesis, following the Andean uplift. PMID:29479502
Analysis of Blood Transfusion Data Using Bivariate Zero-Inflated Poisson Model: A Bayesian Approach.
Mohammadi, Tayeb; Kheiri, Soleiman; Sedehi, Morteza
2016-01-01
Recognizing the factors affecting the number of blood donation and blood deferral has a major impact on blood transfusion. There is a positive correlation between the variables "number of blood donation" and "number of blood deferral": as the number of return for donation increases, so does the number of blood deferral. On the other hand, due to the fact that many donors never return to donate, there is an extra zero frequency for both of the above-mentioned variables. In this study, in order to apply the correlation and to explain the frequency of the excessive zero, the bivariate zero-inflated Poisson regression model was used for joint modeling of the number of blood donation and number of blood deferral. The data was analyzed using the Bayesian approach applying noninformative priors at the presence and absence of covariates. Estimating the parameters of the model, that is, correlation, zero-inflation parameter, and regression coefficients, was done through MCMC simulation. Eventually double-Poisson model, bivariate Poisson model, and bivariate zero-inflated Poisson model were fitted on the data and were compared using the deviance information criteria (DIC). The results showed that the bivariate zero-inflated Poisson regression model fitted the data better than the other models.
Analysis of Blood Transfusion Data Using Bivariate Zero-Inflated Poisson Model: A Bayesian Approach
Mohammadi, Tayeb; Sedehi, Morteza
2016-01-01
Recognizing the factors affecting the number of blood donation and blood deferral has a major impact on blood transfusion. There is a positive correlation between the variables “number of blood donation” and “number of blood deferral”: as the number of return for donation increases, so does the number of blood deferral. On the other hand, due to the fact that many donors never return to donate, there is an extra zero frequency for both of the above-mentioned variables. In this study, in order to apply the correlation and to explain the frequency of the excessive zero, the bivariate zero-inflated Poisson regression model was used for joint modeling of the number of blood donation and number of blood deferral. The data was analyzed using the Bayesian approach applying noninformative priors at the presence and absence of covariates. Estimating the parameters of the model, that is, correlation, zero-inflation parameter, and regression coefficients, was done through MCMC simulation. Eventually double-Poisson model, bivariate Poisson model, and bivariate zero-inflated Poisson model were fitted on the data and were compared using the deviance information criteria (DIC). The results showed that the bivariate zero-inflated Poisson regression model fitted the data better than the other models. PMID:27703493
Bayesian Geostatistical Modeling of Malaria Indicator Survey Data in Angola
Gosoniu, Laura; Veta, Andre Mia; Vounatsou, Penelope
2010-01-01
The 2006–2007 Angola Malaria Indicator Survey (AMIS) is the first nationally representative household survey in the country assessing coverage of the key malaria control interventions and measuring malaria-related burden among children under 5 years of age. In this paper, the Angolan MIS data were analyzed to produce the first smooth map of parasitaemia prevalence based on contemporary nationwide empirical data in the country. Bayesian geostatistical models were fitted to assess the effect of interventions after adjusting for environmental, climatic and socio-economic factors. Non-linear relationships between parasitaemia risk and environmental predictors were modeled by categorizing the covariates and by employing two non-parametric approaches, the B-splines and the P-splines. The results of the model validation showed that the categorical model was able to better capture the relationship between parasitaemia prevalence and the environmental factors. Model fit and prediction were handled within a Bayesian framework using Markov chain Monte Carlo (MCMC) simulations. Combining estimates of parasitaemia prevalence with the number of children under we obtained estimates of the number of infected children in the country. The population-adjusted prevalence ranges from in Namibe province to in Malanje province. The odds of parasitaemia in children living in a household with at least ITNs per person was by 41% lower (CI: 14%, 60%) than in those with fewer ITNs. The estimates of the number of parasitaemic children produced in this paper are important for planning and implementing malaria control interventions and for monitoring the impact of prevention and control activities. PMID:20351775
An Efficient MCMC Algorithm to Sample Binary Matrices with Fixed Marginals
ERIC Educational Resources Information Center
Verhelst, Norman D.
2008-01-01
Uniform sampling of binary matrices with fixed margins is known as a difficult problem. Two classes of algorithms to sample from a distribution not too different from the uniform are studied in the literature: importance sampling and Markov chain Monte Carlo (MCMC). Existing MCMC algorithms converge slowly, require a long burn-in period and yield…
Multi-chain Markov chain Monte Carlo methods for computationally expensive models
NASA Astrophysics Data System (ADS)
Huang, M.; Ray, J.; Ren, H.; Hou, Z.; Bao, J.
2017-12-01
Markov chain Monte Carlo (MCMC) methods are used to infer model parameters from observational data. The parameters are inferred as probability densities, thus capturing estimation error due to sparsity of the data, and the shortcomings of the model. Multiple communicating chains executing the MCMC method have the potential to explore the parameter space better, and conceivably accelerate the convergence to the final distribution. We present results from tests conducted with the multi-chain method to show how the acceleration occurs i.e., for loose convergence tolerances, the multiple chains do not make much of a difference. The ensemble of chains also seems to have the ability to accelerate the convergence of a few chains that might start from suboptimal starting points. Finally, we show the performance of the chains in the estimation of O(10) parameters using computationally expensive forward models such as the Community Land Model, where the sampling burden is distributed over multiple chains.
A High-Resolution Aerosol Retrieval Method for Urban Areas Using MISR Data
NASA Astrophysics Data System (ADS)
Moon, T.; Wang, Y.; Liu, Y.; Yu, B.
2012-12-01
Satellite-retrieved Aerosol Optical Depth (AOD) can provide a cost-effective way to monitor particulate air pollution without using expensive ground measurement sensors. One of the current state-of-the-art AOD retrieval method is NASA's Multi-angle Imaging SpectroRadiometer (MISR) operational algorithm, which has the spatial resolution of 17.6 km x 17.6 km. While the MISR baseline scheme already leads to exciting research opportunities to study particle compositions at regional scale, its spatial resolution is too coarse for analyzing urban areas where the AOD level has stronger spatial variations. We develop a novel high-resolution AOD retrieval algorithm that still uses MISR's radiance observations but has the resolution of 4.4km x 4.4km. We achieve the high resolution AOD retrieval by implementing a hierarchical Bayesian model and Monte-Carlo Markov Chain (MCMC) inference method. Our algorithm not only improves the spatial resolution, but also extends the coverage of AOD retrieval and provides with additional composition information of aerosol components that contribute to the AOD. We validate our method using the recent NASA's DISCOVER-AQ mission data, which contains the ground measured AOD values for Washington DC and Baltimore area. The validation result shows that, compared to the operational MISR retrievals, our scheme has 41.1% more AOD retrieval coverage for the DISCOVER-AQ data points and 24.2% improvement in mean-squared error (MSE) with respect to the AERONET ground measurements.
Jamieson, Andrew R; Giger, Maryellen L; Drukker, Karen; Li, Hui; Yuan, Yading; Bhooshan, Neha
2010-01-01
In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: Ultrasound (U.S.) with 1126 cases, dynamic contrast enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput. 15, 1373-1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008)]. These methods attempt to map originally high dimensional feature spaces to more human interpretable lower dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced dimension mapped feature output as input into both linear and nonlinear classifiers: Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+bootstrap validation, 95% empirical confidence intervals were computed for the each classifier's AUC performance. In the large U.S. data set, sample high performance results include, AUC0.632+ = 0.88 with 95% empirical bootstrap interval [0.787;0.895] for 13 ARD selected features and AUC0.632+ = 0.87 with interval [0.817;0.906] for four LSW selected features compared to 4D t-SNE mapping (from the original 81D feature space) giving AUC0.632+ = 0.90 with interval [0.847;0.919], all using the MCMC-BANN. Preliminary results appear to indicate capability for the new methods to match or exceed classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower dimensional representations for visual interpretation, revealing intricate data structure of the feature space.
Maximal compression of the redshift-space galaxy power spectrum and bispectrum
NASA Astrophysics Data System (ADS)
Gualdi, Davide; Manera, Marc; Joachimi, Benjamin; Lahav, Ofer
2018-05-01
We explore two methods of compressing the redshift-space galaxy power spectrum and bispectrum with respect to a chosen set of cosmological parameters. Both methods involve reducing the dimension of the original data vector (e.g. 1000 elements) to the number of cosmological parameters considered (e.g. seven ) using the Karhunen-Loève algorithm. In the first case, we run MCMC sampling on the compressed data vector in order to recover the 1D and 2D posterior distributions. The second option, approximately 2000 times faster, works by orthogonalizing the parameter space through diagonalization of the Fisher information matrix before the compression, obtaining the posterior distributions without the need of MCMC sampling. Using these methods for future spectroscopic redshift surveys like DESI, Euclid, and PFS would drastically reduce the number of simulations needed to compute accurate covariance matrices with minimal loss of constraining power. We consider a redshift bin of a DESI-like experiment. Using the power spectrum combined with the bispectrum as a data vector, both compression methods on average recover the 68 {per cent} credible regions to within 0.7 {per cent} and 2 {per cent} of those resulting from standard MCMC sampling, respectively. These confidence intervals are also smaller than the ones obtained using only the power spectrum by 81 per cent, 80 per cent, and 82 per cent respectively, for the bias parameter b1, the growth rate f, and the scalar amplitude parameter As.
A probabilistic model framework for evaluating year-to-year variation in crop productivity
NASA Astrophysics Data System (ADS)
Yokozawa, M.; Iizumi, T.; Tao, F.
2008-12-01
Most models describing the relation between crop productivity and weather condition have so far been focused on mean changes of crop yield. For keeping stable food supply against abnormal weather as well as climate change, evaluating the year-to-year variations in crop productivity rather than the mean changes is more essential. We here propose a new framework of probabilistic model based on Bayesian inference and Monte Carlo simulation. As an example, we firstly introduce a model on paddy rice production in Japan. It is called PRYSBI (Process- based Regional rice Yield Simulator with Bayesian Inference; Iizumi et al., 2008). The model structure is the same as that of SIMRIW, which was developed and used widely in Japan. The model includes three sub- models describing phenological development, biomass accumulation and maturing of rice crop. These processes are formulated to include response nature of rice plant to weather condition. This model inherently was developed to predict rice growth and yield at plot paddy scale. We applied it to evaluate the large scale rice production with keeping the same model structure. Alternatively, we assumed the parameters as stochastic variables. In order to let the model catch up actual yield at larger scale, model parameters were determined based on agricultural statistical data of each prefecture of Japan together with weather data averaged over the region. The posterior probability distribution functions (PDFs) of parameters included in the model were obtained using Bayesian inference. The MCMC (Markov Chain Monte Carlo) algorithm was conducted to numerically solve the Bayesian theorem. For evaluating the year-to-year changes in rice growth/yield under this framework, we firstly iterate simulations with set of parameter values sampled from the estimated posterior PDF of each parameter and then take the ensemble mean weighted with the posterior PDFs. We will also present another example for maize productivity in China. The framework proposed here provides us information on uncertainties, possibilities and limitations on future improvements in crop model as well.
SaaS enabled admission control for MCMC simulation in cloud computing infrastructures
NASA Astrophysics Data System (ADS)
Vázquez-Poletti, J. L.; Moreno-Vozmediano, R.; Han, R.; Wang, W.; Llorente, I. M.
2017-02-01
Markov Chain Monte Carlo (MCMC) methods are widely used in the field of simulation and modelling of materials, producing applications that require a great amount of computational resources. Cloud computing represents a seamless source for these resources in the form of HPC. However, resource over-consumption can be an important drawback, specially if the cloud provision process is not appropriately optimized. In the present contribution we propose a two-level solution that, on one hand, takes advantage of approximate computing for reducing the resource demand and on the other, uses admission control policies for guaranteeing an optimal provision to running applications.
Duffull, Stephen B; Graham, Gordon; Mengersen, Kerrie; Eccleston, John
2012-01-01
Information theoretic methods are often used to design studies that aim to learn about pharmacokinetic and linked pharmacokinetic-pharmacodynamic systems. These design techniques, such as D-optimality, provide the optimum experimental conditions. The performance of the optimum design will depend on the ability of the investigator to comply with the proposed study conditions. However, in clinical settings it is not possible to comply exactly with the optimum design and hence some degree of unplanned suboptimality occurs due to error in the execution of the study. In addition, due to the nonlinear relationship of the parameters of these models to the data, the designs are also locally dependent on an arbitrary choice of a nominal set of parameter values. A design that is robust to both study conditions and uncertainty in the nominal set of parameter values is likely to be of use clinically. We propose an adaptive design strategy to account for both execution error and uncertainty in the parameter values. In this study we investigate designs for a one-compartment first-order pharmacokinetic model. We do this in a Bayesian framework using Markov-chain Monte Carlo (MCMC) methods. We consider log-normal prior distributions on the parameters and investigate several prior distributions on the sampling times. An adaptive design was used to find the sampling window for the current sampling time conditional on the actual times of all previous samples.
Self-consistent Bulge/Disk/Halo Galaxy Dynamical Modeling Using Integral Field Kinematics
NASA Astrophysics Data System (ADS)
Taranu, D. S.; Obreschkow, D.; Dubinski, J. J.; Fogarty, L. M. R.; van de Sande, J.; Catinella, B.; Cortese, L.; Moffett, A.; Robotham, A. S. G.; Allen, J. T.; Bland-Hawthorn, J.; Bryant, J. J.; Colless, M.; Croom, S. M.; D'Eugenio, F.; Davies, R. L.; Drinkwater, M. J.; Driver, S. P.; Goodwin, M.; Konstantopoulos, I. S.; Lawrence, J. S.; López-Sánchez, Á. R.; Lorente, N. P. F.; Medling, A. M.; Mould, J. R.; Owers, M. S.; Power, C.; Richards, S. N.; Tonini, C.
2017-11-01
We introduce a method for modeling disk galaxies designed to take full advantage of data from integral field spectroscopy (IFS). The method fits equilibrium models to simultaneously reproduce the surface brightness, rotation, and velocity dispersion profiles of a galaxy. The models are fully self-consistent 6D distribution functions for a galaxy with a Sérsic profile stellar bulge, exponential disk, and parametric dark-matter halo, generated by an updated version of GalactICS. By creating realistic flux-weighted maps of the kinematic moments (flux, mean velocity, and dispersion), we simultaneously fit photometric and spectroscopic data using both maximum-likelihood and Bayesian (MCMC) techniques. We apply the method to a GAMA spiral galaxy (G79635) with kinematics from the SAMI Galaxy Survey and deep g- and r-band photometry from the VST-KiDS survey, comparing parameter constraints with those from traditional 2D bulge-disk decomposition. Our method returns broadly consistent results for shared parameters while constraining the mass-to-light ratios of stellar components and reproducing the H I-inferred circular velocity well beyond the limits of the SAMI data. Although the method is tailored for fitting integral field kinematic data, it can use other dynamical constraints like central fiber dispersions and H I circular velocities, and is well-suited for modeling galaxies with a combination of deep imaging and H I and/or optical spectra (resolved or otherwise). Our implementation (MagRite) is computationally efficient and can generate well-resolved models and kinematic maps in under a minute on modern processors.
Csuros, Miklos; Rogozin, Igor B.; Koonin, Eugene V.
2011-01-01
Protein-coding genes in eukaryotes are interrupted by introns, but intron densities widely differ between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing the three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing. PMID:21935348
Mohammed, Seid; Asfaw, Zeytu G
2018-01-01
The term malnutrition generally refers to both under-nutrition and over-nutrition, but this study uses the term to refer solely to a deficiency of nutrition. In Ethiopia, child malnutrition is one of the most serious public health problem and the highest in the world. The purpose of the present study was to identify the high risk factors of malnutrition and test different statistical models for childhood malnutrition and, thereafter weighing the preferable model through model comparison criteria. Bayesian Gaussian regression model was used to analyze the effect of selected socioeconomic, demographic, health and environmental covariates on malnutrition under five years old child's. Inference was made using Bayesian approach based on Markov Chain Monte Carlo (MCMC) simulation techniques in BayesX. The study found that the variables such as sex of a child, preceding birth interval, age of the child, father's education level, source of water, mother's body mass index, head of household sex, mother's age at birth, wealth index, birth order, diarrhea, child's size at birth and duration of breast feeding showed significant effects on children's malnutrition in Ethiopia. The age of child, mother's age at birth and mother's body mass index could also be important factors with a non linear effect for the child's malnutrition in Ethiopia. Thus, the present study emphasizes a special care on variables such as sex of child, preceding birth interval, father's education level, source of water, sex of head of household, wealth index, birth order, diarrhea, child's size at birth, duration of breast feeding, age of child, mother's age at birth and mother's body mass index to combat childhood malnutrition in developing countries.
Iterative updating of model error for Bayesian inversion
NASA Astrophysics Data System (ADS)
Calvetti, Daniela; Dunlop, Matthew; Somersalo, Erkki; Stuart, Andrew
2018-02-01
In computational inverse problems, it is common that a detailed and accurate forward model is approximated by a computationally less challenging substitute. The model reduction may be necessary to meet constraints in computing time when optimization algorithms are used to find a single estimate, or to speed up Markov chain Monte Carlo (MCMC) calculations in the Bayesian framework. The use of an approximate model introduces a discrepancy, or modeling error, that may have a detrimental effect on the solution of the ill-posed inverse problem, or it may severely distort the estimate of the posterior distribution. In the Bayesian paradigm, the modeling error can be considered as a random variable, and by using an estimate of the probability distribution of the unknown, one may estimate the probability distribution of the modeling error and incorporate it into the inversion. We introduce an algorithm which iterates this idea to update the distribution of the model error, leading to a sequence of posterior distributions that are demonstrated empirically to capture the underlying truth with increasing accuracy. Since the algorithm is not based on rejections, it requires only limited full model evaluations. We show analytically that, in the linear Gaussian case, the algorithm converges geometrically fast with respect to the number of iterations when the data is finite dimensional. For more general models, we introduce particle approximations of the iteratively generated sequence of distributions; we also prove that each element of the sequence converges in the large particle limit under a simplifying assumption. We show numerically that, as in the linear case, rapid convergence occurs with respect to the number of iterations. Additionally, we show through computed examples that point estimates obtained from this iterative algorithm are superior to those obtained by neglecting the model error.
A Web-Based System for Bayesian Benchmark Dose Estimation.
Shao, Kan; Shapiro, Andrew J
2018-01-11
Benchmark dose (BMD) modeling is an important step in human health risk assessment and is used as the default approach to identify the point of departure for risk assessment. A probabilistic framework for dose-response assessment has been proposed and advocated by various institutions and organizations; therefore, a reliable tool is needed to provide distributional estimates for BMD and other important quantities in dose-response assessment. We developed an online system for Bayesian BMD (BBMD) estimation and compared results from this software with U.S. Environmental Protection Agency's (EPA's) Benchmark Dose Software (BMDS). The system is built on a Bayesian framework featuring the application of Markov chain Monte Carlo (MCMC) sampling for model parameter estimation and BMD calculation, which makes the BBMD system fundamentally different from the currently prevailing BMD software packages. In addition to estimating the traditional BMDs for dichotomous and continuous data, the developed system is also capable of computing model-averaged BMD estimates. A total of 518 dichotomous and 108 continuous data sets extracted from the U.S. EPA's Integrated Risk Information System (IRIS) database (and similar databases) were used as testing data to compare the estimates from the BBMD and BMDS programs. The results suggest that the BBMD system may outperform the BMDS program in a number of aspects, including fewer failed BMD and BMDL calculations and estimates. The BBMD system is a useful alternative tool for estimating BMD with additional functionalities for BMD analysis based on most recent research. Most importantly, the BBMD has the potential to incorporate prior information to make dose-response modeling more reliable and can provide distributional estimates for important quantities in dose-response assessment, which greatly facilitates the current trend for probabilistic risk assessment. https://doi.org/10.1289/EHP1289.
NASA Astrophysics Data System (ADS)
Awatey, M. T.; Irving, J.; Oware, E. K.
2016-12-01
Markov chain Monte Carlo (McMC) inversion frameworks are becoming increasingly popular in geophysics due to their ability to recover multiple equally plausible geologic features that honor the limited noisy measurements. Standard McMC methods, however, become computationally intractable with increasing dimensionality of the problem, for example, when working with spatially distributed geophysical parameter fields. We present a McMC approach based on a sparse proper orthogonal decomposition (POD) model parameterization that implicitly incorporates the physics of the underlying process. First, we generate training images (TIs) via Monte Carlo simulations of the target process constrained to a conceptual model. We then apply POD to construct basis vectors from the TIs. A small number of basis vectors can represent most of the variability in the TIs, leading to dimensionality reduction. A projection of the starting model into the reduced basis space generates the starting POD coefficients. At each iteration, only coefficients within a specified sampling window are resimulated assuming a Gaussian prior. The sampling window grows at a specified rate as the number of iteration progresses starting from the coefficients corresponding to the highest ranked basis to those of the least informative basis. We found this gradual increment in the sampling window to be more stable compared to resampling all the coefficients right from the first iteration. We demonstrate the performance of the algorithm with both synthetic and lab-scale electrical resistivity imaging of saline tracer experiments, employing the same set of basis vectors for all inversions. We consider two scenarios of unimodal and bimodal plumes. The unimodal plume is consistent with the hypothesis underlying the generation of the TIs whereas bimodality in plume morphology was not theorized. We show that uncertainty quantification using McMC can proceed in the reduced dimensionality space while accounting for the physics of the underlying process.
Mass change distribution inverted from space-borne gravimetric data using a Monte Carlo method
NASA Astrophysics Data System (ADS)
Zhou, X.; Sun, X.; Wu, Y.; Sun, W.
2017-12-01
Mass estimate plays a key role in using temporally satellite gravimetric data to quantify the terrestrial water storage change. GRACE (Gravity Recovery and Climate Experiment) only observes the low degree gravity field changes, which can be used to estimate the total surface density or equivalent water height (EWH) variation, with a limited spatial resolution of 300 km. There are several methods to estimate the mass variation in an arbitrary region, such as averaging kernel, forward modelling and mass concentration (mascon). Mascon method can isolate the local mass from the gravity change at a large scale through solving the observation equation (objective function) which represents the relationship between unknown masses and the measurements. To avoid the unreasonable local mass inverted from smoothed gravity change map, regularization has to be used in the inversion. We herein give a Markov chain Monte Carlo (MCMC) method to objectively determine the regularization parameter for the non-negative mass inversion problem. We first apply this approach to the mass inversion from synthetic data. Result show MCMC can effectively reproduce the local mass variation taking GRACE measurement error into consideration. We then use MCMC to estimate the ground water change rate of North China Plain from GRACE gravity change rate from 2003 to 2014 under a supposition of the continuous ground water loss in this region. Inversion result show that the ground water loss rate in North China Plain is 7.6±0.2Gt/yr during past 12 years which is coincident with that from previous researches.
MCMC-ODPR: primer design optimization using Markov Chain Monte Carlo sampling.
Kitchen, James L; Moore, Jonathan D; Palmer, Sarah A; Allaby, Robin G
2012-11-05
Next generation sequencing technologies often require numerous primer designs that require good target coverage that can be financially costly. We aimed to develop a system that would implement primer reuse to design degenerate primers that could be designed around SNPs, thus find the fewest necessary primers and the lowest cost whilst maintaining an acceptable coverage and provide a cost effective solution. We have implemented Metropolis-Hastings Markov Chain Monte Carlo for optimizing primer reuse. We call it the Markov Chain Monte Carlo Optimized Degenerate Primer Reuse (MCMC-ODPR) algorithm. After repeating the program 1020 times to assess the variance, an average of 17.14% fewer primers were found to be necessary using MCMC-ODPR for an equivalent coverage without implementing primer reuse. The algorithm was able to reuse primers up to five times. We compared MCMC-ODPR with single sequence primer design programs Primer3 and Primer-BLAST and achieved a lower primer cost per amplicon base covered of 0.21 and 0.19 and 0.18 primer nucleotides on three separate gene sequences, respectively. With multiple sequences, MCMC-ODPR achieved a lower cost per base covered of 0.19 than programs BatchPrimer3 and PAMPS, which achieved 0.25 and 0.64 primer nucleotides, respectively. MCMC-ODPR is a useful tool for designing primers at various melting temperatures at good target coverage. By combining degeneracy with optimal primer reuse the user may increase coverage of sequences amplified by the designed primers at significantly lower costs. Our analyses showed that overall MCMC-ODPR outperformed the other primer-design programs in our study in terms of cost per covered base.
MCMC-ODPR: Primer design optimization using Markov Chain Monte Carlo sampling
2012-01-01
Background Next generation sequencing technologies often require numerous primer designs that require good target coverage that can be financially costly. We aimed to develop a system that would implement primer reuse to design degenerate primers that could be designed around SNPs, thus find the fewest necessary primers and the lowest cost whilst maintaining an acceptable coverage and provide a cost effective solution. We have implemented Metropolis-Hastings Markov Chain Monte Carlo for optimizing primer reuse. We call it the Markov Chain Monte Carlo Optimized Degenerate Primer Reuse (MCMC-ODPR) algorithm. Results After repeating the program 1020 times to assess the variance, an average of 17.14% fewer primers were found to be necessary using MCMC-ODPR for an equivalent coverage without implementing primer reuse. The algorithm was able to reuse primers up to five times. We compared MCMC-ODPR with single sequence primer design programs Primer3 and Primer-BLAST and achieved a lower primer cost per amplicon base covered of 0.21 and 0.19 and 0.18 primer nucleotides on three separate gene sequences, respectively. With multiple sequences, MCMC-ODPR achieved a lower cost per base covered of 0.19 than programs BatchPrimer3 and PAMPS, which achieved 0.25 and 0.64 primer nucleotides, respectively. Conclusions MCMC-ODPR is a useful tool for designing primers at various melting temperatures at good target coverage. By combining degeneracy with optimal primer reuse the user may increase coverage of sequences amplified by the designed primers at significantly lower costs. Our analyses showed that overall MCMC-ODPR outperformed the other primer-design programs in our study in terms of cost per covered base. PMID:23126469
Sethuraman, Arun; Hey, Jody
2015-01-01
IMa2 and related programs are used to study the divergence of closely related species and of populations within species. These methods are based on the sampling of genealogies using MCMC, and they can proceed quite slowly for larger data sets. We describe a parallel implementation, called IMa2p, that provides a nearly linear increase in genealogy sampling rate with the number of processors in use. IMa2p is written in OpenMPI and C++, and scales well for demographic analyses of a large number of loci and populations, which are difficult to study using the serial version of the program. PMID:26059786
The Chandra Source Catalog 2.0: Estimating Source Fluxes
NASA Astrophysics Data System (ADS)
Primini, Francis Anthony; Allen, Christopher E.; Miller, Joseph; Anderson, Craig S.; Budynkiewicz, Jamie A.; Burke, Douglas; Chen, Judy C.; Civano, Francesca Maria; D'Abrusco, Raffaele; Doe, Stephen M.; Evans, Ian N.; Evans, Janet D.; Fabbiano, Giuseppina; Gibbs, Danny G., II; Glotfelty, Kenny J.; Graessle, Dale E.; Grier, John D.; Hain, Roger; Hall, Diane M.; Harbo, Peter N.; Houck, John C.; Lauer, Jennifer L.; Laurino, Omar; Lee, Nicholas P.; Martínez-Galarza, Juan Rafael; McCollough, Michael L.; McDowell, Jonathan C.; McLaughlin, Warren; Morgan, Douglas L.; Mossman, Amy E.; Nguyen, Dan T.; Nichols, Joy S.; Nowak, Michael A.; Paxson, Charles; Plummer, David A.; Rots, Arnold H.; Siemiginowska, Aneta; Sundheim, Beth A.; Tibbetts, Michael; Van Stone, David W.; Zografou, Panagoula
2018-01-01
The Second Chandra Source Catalog (CSC2.0) will provide information on approximately 316,000 point or compact extended x-ray sources, derived from over 10,000 ACIS and HRC-I imaging observations available in the public archive at the end of 2014. As in the previous catalog release (CSC1.1), fluxes for these sources will be determined separately from source detection, using a Bayesian formalism that accounts for background, spatial resolution effects, and contamination from nearby sources. However, the CSC2.0 procedure differs from that used in CSC1.1 in three important aspects. First, for sources in crowded regions in which photometric apertures overlap, fluxes are determined jointly, using an extension of the CSC1.1 algorithm, as discussed in Primini & Kashyap (2014ApJ...796…24P). Second, an MCMC procedure is used to estimate marginalized posterior probability distributions for source fluxes. Finally, for sources observed in multiple observations, a Bayesian Blocks algorithm (Scargle, et al. 2013ApJ...764..167S) is used to group observations into blocks of constant source flux.In this poster we present details of the CSC2.0 photometry algorithms and illustrate their performance in actual CSC2.0 datasets.This work has been supported by NASA under contract NAS 8-03060 to the Smithsonian Astrophysical Observatory for operation of the Chandra X-ray Center.
Phylogenetic Analysis of Dengue Virus in Bangkalan, Madura Island, East Java Province, Indonesia.
Sucipto, Teguh Hari; Kotaki, Tomohiro; Mulyatno, Kris Cahyo; Churrotin, Siti; Labiqah, Amaliah; Soegijanto, Soegeng; Kameoka, Masanori
2018-01-01
Dengue virus (DENV) infection is a major health issue in tropical and subtropical areas. Indonesia is one of the biggest dengue endemic countries in the world. In the present study, the phylogenetic analysis of DENV in Bangkalan, Madura Island, Indonesia, was performed in order to obtain a clearer understanding of its dynamics in this country. A total of 359 blood samples from dengue-suspected patients were collected between 2012 and 2014. Serotyping was conducted using a multiplex Reverse Transcriptase-Polymerase Chain Reaction and a phylogenetic analysis of E gene sequences was performed using the Bayesian Markov chain Monte Carlo (MCMC) method. 17 out of 359 blood samples (4.7%) were positive for the isolation of DENV. Serotyping and the phylogenetic analysis revealed the predominance of DENV-1 genotype I (9/17, 52.9%), followed by DENV-2 Cosmopolitan type (7/17, 41.2%) and DENV-3 genotype I (1/17, 5.9%) . DENV-4 was not isolated. The Madura Island isolates showed high nucleotide similarity to other Indonesian isolates, indicating frequent virus circulation in Indonesia. The results of the present study highlight the importance of continuous viral surveillance in dengue endemic areas in order to obtain a clearer understanding of the dynamics of DENV in Indonesia.
Park, Taeyoung; Krafty, Robert T; Sánchez, Alvaro I
2012-07-27
A Poisson regression model with an offset assumes a constant baseline rate after accounting for measured covariates, which may lead to biased estimates of coefficients in an inhomogeneous Poisson process. To correctly estimate the effect of time-dependent covariates, we propose a Poisson change-point regression model with an offset that allows a time-varying baseline rate. When the nonconstant pattern of a log baseline rate is modeled with a nonparametric step function, the resulting semi-parametric model involves a model component of varying dimension and thus requires a sophisticated varying-dimensional inference to obtain correct estimates of model parameters of fixed dimension. To fit the proposed varying-dimensional model, we devise a state-of-the-art MCMC-type algorithm based on partial collapse. The proposed model and methods are used to investigate an association between daily homicide rates in Cali, Colombia and policies that restrict the hours during which the legal sale of alcoholic beverages is permitted. While simultaneously identifying the latent changes in the baseline homicide rate which correspond to the incidence of sociopolitical events, we explore the effect of policies governing the sale of alcohol on homicide rates and seek a policy that balances the economic and cultural dependencies on alcohol sales to the health of the public.
Yurko, Joseph P.; Buongiorno, Jacopo; Youngblood, Robert
2015-05-28
System codes for simulation of safety performance of nuclear plants may contain parameters whose values are not known very accurately. New information from tests or operating experience is incorporated into safety codes by a process known as calibration, which reduces uncertainty in the output of the code and thereby improves its support for decision-making. The work reported here implements several improvements on classic calibration techniques afforded by modern analysis techniques. The key innovation has come from development of code surrogate model (or code emulator) construction and prediction algorithms. Use of a fast emulator makes the calibration processes used here withmore » Markov Chain Monte Carlo (MCMC) sampling feasible. This study uses Gaussian Process (GP) based emulators, which have been used previously to emulate computer codes in the nuclear field. The present work describes the formulation of an emulator that incorporates GPs into a factor analysis-type or pattern recognition-type model. This “function factorization” Gaussian Process (FFGP) model allows overcoming limitations present in standard GP emulators, thereby improving both accuracy and speed of the emulator-based calibration process. Calibration of a friction-factor example using a Method of Manufactured Solution is performed to illustrate key properties of the FFGP based process.« less
Farr, W. M.; Mandel, I.; Stevens, D.
2015-01-01
Selection among alternative theoretical models given an observed dataset is an important challenge in many areas of physics and astronomy. Reversible-jump Markov chain Monte Carlo (RJMCMC) is an extremely powerful technique for performing Bayesian model selection, but it suffers from a fundamental difficulty and it requires jumps between model parameter spaces, but cannot efficiently explore both parameter spaces at once. Thus, a naive jump between parameter spaces is unlikely to be accepted in the Markov chain Monte Carlo (MCMC) algorithm and convergence is correspondingly slow. Here, we demonstrate an interpolation technique that uses samples from single-model MCMCs to propose intermodel jumps from an approximation to the single-model posterior of the target parameter space. The interpolation technique, based on a kD-tree data structure, is adaptive and efficient in modest dimensionality. We show that our technique leads to improved convergence over naive jumps in an RJMCMC, and compare it to other proposals in the literature to improve the convergence of RJMCMCs. We also demonstrate the use of the same interpolation technique as a way to construct efficient ‘global’ proposal distributions for single-model MCMCs without prior knowledge of the structure of the posterior distribution, and discuss improvements that permit the method to be used in higher dimensional spaces efficiently. PMID:26543580
Volatility modeling for IDR exchange rate through APARCH model with student-t distribution
NASA Astrophysics Data System (ADS)
Nugroho, Didit Budi; Susanto, Bambang
2017-08-01
The aim of this study is to empirically investigate the performance of APARCH(1,1) volatility model with the Student-t error distribution on five foreign currency selling rates to Indonesian rupiah (IDR), including the Swiss franc (CHF), the Euro (EUR), the British pound (GBP), Japanese yen (JPY), and the US dollar (USD). Six years daily closing rates over the period of January 2010 to December 2016 for a total number of 1722 observations have analysed. The Bayesian inference using the efficient independence chain Metropolis-Hastings and adaptive random walk Metropolis methods in the Markov chain Monte Carlo (MCMC) scheme has been applied to estimate the parameters of model. According to the DIC criterion, this study has found that the APARCH(1,1) model under Student-t distribution is a better fit than the model under normal distribution for any observed rate return series. The 95% highest posterior density interval suggested the APARCH models to model the IDR/JPY and IDR/USD volatilities. In particular, the IDR/JPY and IDR/USD data, respectively, have significant negative and positive leverage effect in the rate returns. Meanwhile, the optimal power coefficient of volatility has been found to be statistically different from 2 in adopting all rate return series, save the IDR/EUR rate return series.
Hernández, Moisés; Guerrero, Ginés D.; Cecilia, José M.; García, José M.; Inuggi, Alberto; Jbabdi, Saad; Behrens, Timothy E. J.; Sotiropoulos, Stamatios N.
2013-01-01
With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands. Modern graphics processing units (GPUs) are massively parallel processors that can execute simultaneously thousands of light-weight processes. In this study, we propose and implement a parallel GPU-based design of a popular method that is used for the analysis of brain magnetic resonance imaging (MRI). More specifically, we are concerned with a model-based approach for extracting tissue structural information from diffusion-weighted (DW) MRI data. DW-MRI offers, through tractography approaches, the only way to study brain structural connectivity, non-invasively and in-vivo. We parallelise the Bayesian inference framework for the ball & stick model, as it is implemented in the tractography toolbox of the popular FSL software package (University of Oxford). For our implementation, we utilise the Compute Unified Device Architecture (CUDA) programming model. We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version. We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation. PMID:23658616
Jamieson, Andrew R.; Giger, Maryellen L.; Drukker, Karen; Li, Hui; Yuan, Yading; Bhooshan, Neha
2010-01-01
Purpose: In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: Ultrasound (U.S.) with 1126 cases, dynamic contrast enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Comput. 15, 1373–1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res. 9, 2579–2605 (2008)]. Methods: These methods attempt to map originally high dimensional feature spaces to more human interpretable lower dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced dimension mapped feature output as input into both linear and nonlinear classifiers: Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+bootstrap validation, 95% empirical confidence intervals were computed for the each classifier’s AUC performance. Results: In the large U.S. data set, sample high performance results include, AUC0.632+=0.88 with 95% empirical bootstrap interval [0.787;0.895] for 13 ARD selected features and AUC0.632+=0.87 with interval [0.817;0.906] for four LSW selected features compared to 4D t-SNE mapping (from the original 81D feature space) giving AUC0.632+=0.90 with interval [0.847;0.919], all using the MCMC-BANN. Conclusions: Preliminary results appear to indicate capability for the new methods to match or exceed classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower dimensional representations for visual interpretation, revealing intricate data structure of the feature space. PMID:20175497
SCoPE: an efficient method of Cosmological Parameter Estimation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Das, Santanu; Souradeep, Tarun, E-mail: santanud@iucaa.ernet.in, E-mail: tarun@iucaa.ernet.in
Markov Chain Monte Carlo (MCMC) sampler is widely used for cosmological parameter estimation from CMB and other data. However, due to the intrinsic serial nature of the MCMC sampler, convergence is often very slow. Here we present a fast and independently written Monte Carlo method for cosmological parameter estimation named as Slick Cosmological Parameter Estimator (SCoPE), that employs delayed rejection to increase the acceptance rate of a chain, and pre-fetching that helps an individual chain to run on parallel CPUs. An inter-chain covariance update is also incorporated to prevent clustering of the chains allowing faster and better mixing of themore » chains. We use an adaptive method for covariance calculation to calculate and update the covariance automatically as the chains progress. Our analysis shows that the acceptance probability of each step in SCoPE is more than 95% and the convergence of the chains are faster. Using SCoPE, we carry out some cosmological parameter estimations with different cosmological models using WMAP-9 and Planck results. One of the current research interests in cosmology is quantifying the nature of dark energy. We analyze the cosmological parameters from two illustrative commonly used parameterisations of dark energy models. We also asses primordial helium fraction in the universe can be constrained by the present CMB data from WMAP-9 and Planck. The results from our MCMC analysis on the one hand helps us to understand the workability of the SCoPE better, on the other hand it provides a completely independent estimation of cosmological parameters from WMAP-9 and Planck data.« less
Improving flash flood frequency analyses by using non-systematic dendrogeomorphic data
NASA Astrophysics Data System (ADS)
Mediero, Luis; María Bodoque, Jose; Garrote, Julio; Ballesteros-Cánovas, Juan Antonio; Aroca-Jimenez, Estefania
2017-04-01
Flash floods have a rapid hydrological response in catchments with short lag times, characterized by ''peaky'' hydrographs. The peak flows are reached within a few hours, thus giving little or no advance warning to prevent and mitigate flood damage. As a result, flash floods may result in a high social risk, as shown for instance by the 1997 Biescas disaster in Spain. The analysis and management of flood risk are clearly conditioned by data availability, especially in mountain areas where usually flash-floods occur. Nevertheless, in mountain basins there is often short data series available that are not accurate in terms of statistical significance. In addition, when flow data is ready for use maximum annual values are generally not as reliable as average flow values, since conventional stream gauge stations may not record the extreme floods, leading to gaps in the time series. Dendrogeomorphology has been shown to be especially useful for improving flood frequency analyses in catchments where short flood series limit the use of conventional hydrological methods. This study presents pros and cons of using a given probability distribution function, such as the Generalized Extreme Value (GEV), and Bayesian Markov Chain Monte Carlo (MCMC) methods to account for non-systematic data provided by dendrogeomorphic techniques, in order to asses flood quantile estimates accuracy. To this end, we have considered a set of locations in Central Spain, where systematic flow available at a gauging site can be extended with non-systematic data obtained from implementation of dendrogeomorphic techniques.
Stochastic approach for radionuclides quantification
NASA Astrophysics Data System (ADS)
Clement, A.; Saurel, N.; Perrin, G.
2018-01-01
Gamma spectrometry is a passive non-destructive assay used to quantify radionuclides present in more or less complex objects. Basic methods using empirical calibration with a standard in order to quantify the activity of nuclear materials by determining the calibration coefficient are useless on non-reproducible, complex and single nuclear objects such as waste packages. Package specifications as composition or geometry change from one package to another and involve a high variability of objects. Current quantification process uses numerical modelling of the measured scene with few available data such as geometry or composition. These data are density, material, screen, geometric shape, matrix composition, matrix and source distribution. Some of them are strongly dependent on package data knowledge and operator backgrounds. The French Commissariat à l'Energie Atomique (CEA) is developing a new methodology to quantify nuclear materials in waste packages and waste drums without operator adjustment and internal package configuration knowledge. This method suggests combining a global stochastic approach which uses, among others, surrogate models available to simulate the gamma attenuation behaviour, a Bayesian approach which considers conditional probability densities of problem inputs, and Markov Chains Monte Carlo algorithms (MCMC) which solve inverse problems, with gamma ray emission radionuclide spectrum, and outside dimensions of interest objects. The methodology is testing to quantify actinide activity in different kind of matrix, composition, and configuration of sources standard in terms of actinide masses, locations and distributions. Activity uncertainties are taken into account by this adjustment methodology.
Wang, Jiabiao; Zhao, Jianshi; Lei, Xiaohui; Wang, Hao
2018-06-13
Pollution risk from the discharge of industrial waste or accidental spills during transportation poses a considerable threat to the security of rivers. The ability to quickly identify the pollution source is extremely important to enable emergency disposal of pollutants. This study proposes a new approach for point source identification of sudden water pollution in rivers, which aims to determine where (source location), when (release time) and how much pollutant (released mass) was introduced into the river. Based on the backward probability method (BPM) and the linear regression model (LR), the proposed LR-BPM converts the ill-posed problem of source identification into an optimization model, which is solved using a Differential Evolution Algorithm (DEA). The decoupled parameters of released mass are not dependent on prior information, which improves the identification efficiency. A hypothetical case study with a different number of pollution sources was conducted to test the proposed approach, and the largest relative errors for identified location, release time, and released mass in all tests were not greater than 10%. Uncertainty in the LR-BPM is mainly due to a problem with model equifinality, but averaging the results of repeated tests greatly reduces errors. Furthermore, increasing the gauging sections further improves identification results. A real-world case study examines the applicability of the LR-BPM in practice, where it is demonstrated to be more accurate and time-saving than two existing approaches, Bayesian-MCMC and basic DEA. Copyright © 2018 Elsevier Ltd. All rights reserved.
FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods.
Zierke, Stephanie; Bakos, Jason D
2010-04-12
Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10x speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).
Spatio-temporal analysis of Modified Omori law in Bayesian framework
NASA Astrophysics Data System (ADS)
Rezanezhad, V.; Narteau, C.; Shebalin, P.; Zoeller, G.; Holschneider, M.
2017-12-01
This work presents a study of the spatio temporal evolution of the modified Omori parameters in southern California in then time period of 1981-2016. A nearest-neighbor approach is applied for earthquake clustering. This study targets small mainshocks and corresponding big aftershocks ( 2.5 ≤ mmainshocks ≤ 4.5 and 1.8 ≤ maftershocks ≤ 2.8 ). We invert for the spatio temporal behavior of c and p values (especially c) all over the area using a MCMC based maximum likelihood estimator. As parameterizing families we use Voronoi cells with randomly distributed cell centers. Considering that c value represents a physical character like stress change we expect to see a coherent c value pattern over seismologically coacting areas. This correlation of c valus can actually be seen for the San Andreas, San Jacinto and Elsinore faults. Moreover, the depth dependency of c value is studied which shows a linear behavior of log(c) with respect to aftershock's depth within 5 to 15 km depth.
NASA Astrophysics Data System (ADS)
Zhou, Weimin; Anastasio, Mark A.
2018-03-01
It has been advocated that task-based measures of image quality (IQ) should be employed to evaluate and optimize imaging systems. Task-based measures of IQ quantify the performance of an observer on a medically relevant task. The Bayesian Ideal Observer (IO), which employs complete statistical information of the object and noise, achieves the upper limit of the performance for a binary signal classification task. However, computing the IO performance is generally analytically intractable and can be computationally burdensome when Markov-chain Monte Carlo (MCMC) techniques are employed. In this paper, supervised learning with convolutional neural networks (CNNs) is employed to approximate the IO test statistics for a signal-known-exactly and background-known-exactly (SKE/BKE) binary detection task. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are compared to those produced by the analytically computed IO. The advantages of the proposed supervised learning approach for approximating the IO are demonstrated.
Active Subspace Methods for Data-Intensive Inverse Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Qiqi
2017-04-27
The project has developed theory and computational tools to exploit active subspaces to reduce the dimension in statistical calibration problems. This dimension reduction enables MCMC methods to calibrate otherwise intractable models. The same theoretical and computational tools can also reduce the measurement dimension for calibration problems that use large stores of data.
NASA Astrophysics Data System (ADS)
Engeland, K.; Steinsland, I.; Petersen-Øverleir, A.; Johansen, S.
2012-04-01
The aim of this study is to assess the uncertainties in streamflow simulations when uncertainties in both observed inputs (precipitation and temperature) and streamflow observations used in the calibration of the hydrological model are explicitly accounted for. To achieve this goal we applied the elevation distributed HBV model operating on daily time steps to a small catchment in high elevation in Southern Norway where the seasonal snow cover is important. The uncertainties in precipitation inputs were quantified using conditional simulation. This procedure accounts for the uncertainty related to the density of the precipitation network, but neglects uncertainties related to measurement bias/errors and eventual elevation gradients in precipitation. The uncertainties in temperature inputs were quantified using a Bayesian temperature interpolation procedure where the temperature lapse rate is re-estimated every day. The uncertainty in the lapse rate was accounted for whereas the sampling uncertainty related to network density was neglected. For every day a random sample of precipitation and temperature inputs were drawn to be applied as inputs to the hydrologic model. The uncertainties in observed streamflow were assessed based on the uncertainties in the rating curve model. A Bayesian procedure was applied to estimate the probability for rating curve models with 1 to 3 segments and the uncertainties in their parameters. This method neglects uncertainties related to errors in observed water levels. Note that one rating curve was drawn to make one realisation of a whole time series of streamflow, thus the rating curve errors lead to a systematic bias in the streamflow observations. All these uncertainty sources were linked together in both calibration and evaluation of the hydrologic model using a DREAM based MCMC routine. Effects of having less information (e.g. missing one streamflow measurement for defining the rating curve or missing one precipitation station) was also investigated.
Estimating safety effects of pavement management factors utilizing Bayesian random effect models.
Jiang, Ximiao; Huang, Baoshan; Zaretzki, Russell L; Richards, Stephen; Yan, Xuedong
2013-01-01
Previous studies of pavement management factors that relate to the occurrence of traffic-related crashes are rare. Traditional research has mostly employed summary statistics of bidirectional pavement quality measurements in extended longitudinal road segments over a long time period, which may cause a loss of important information and result in biased parameter estimates. The research presented in this article focuses on crash risk of roadways with overall fair to good pavement quality. Real-time and location-specific data were employed to estimate the effects of pavement management factors on the occurrence of crashes. This research is based on the crash data and corresponding pavement quality data for the Tennessee state route highways from 2004 to 2009. The potential temporal and spatial correlations among observations caused by unobserved factors were considered. Overall 6 models were built accounting for no correlation, temporal correlation only, and both the temporal and spatial correlations. These models included Poisson, negative binomial (NB), one random effect Poisson and negative binomial (OREP, ORENB), and two random effect Poisson and negative binomial (TREP, TRENB) models. The Bayesian method was employed to construct these models. The inference is based on the posterior distribution from the Markov chain Monte Carlo (MCMC) simulation. These models were compared using the deviance information criterion. Analysis of the posterior distribution of parameter coefficients indicates that the pavement management factors indexed by Present Serviceability Index (PSI) and Pavement Distress Index (PDI) had significant impacts on the occurrence of crashes, whereas the variable rutting depth was not significant. Among other factors, lane width, median width, type of terrain, and posted speed limit were significant in affecting crash frequency. The findings of this study indicate that a reduction in pavement roughness would reduce the likelihood of traffic-related crashes. Hence, maintaining a low level of pavement roughness is strongly suggested. In addition, the results suggested that the temporal correlation among observations was significant and that the ORENB model outperformed all other models.
Estimation of k-ε parameters using surrogate models and jet-in-crossflow data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lefantzi, Sophia; Ray, Jaideep; Arunajatesan, Srinivasan
2014-11-01
We demonstrate a Bayesian method that can be used to calibrate computationally expensive 3D RANS (Reynolds Av- eraged Navier Stokes) models with complex response surfaces. Such calibrations, conditioned on experimental data, can yield turbulence model parameters as probability density functions (PDF), concisely capturing the uncertainty in the parameter estimates. Methods such as Markov chain Monte Carlo (MCMC) estimate the PDF by sampling, with each sample requiring a run of the RANS model. Consequently a quick-running surrogate is used instead to the RANS simulator. The surrogate can be very difficult to design if the model's response i.e., the dependence of themore » calibration variable (the observable) on the parameter being estimated is complex. We show how the training data used to construct the surrogate can be employed to isolate a promising and physically realistic part of the parameter space, within which the response is well-behaved and easily modeled. We design a classifier, based on treed linear models, to model the "well-behaved region". This classifier serves as a prior in a Bayesian calibration study aimed at estimating 3 k - ε parameters ( C μ, C ε2 , C ε1 ) from experimental data of a transonic jet-in-crossflow interaction. The robustness of the calibration is investigated by checking its predictions of variables not included in the cal- ibration data. We also check the limit of applicability of the calibration by testing at off-calibration flow regimes. We find that calibration yield turbulence model parameters which predict the flowfield far better than when the nomi- nal values of the parameters are used. Substantial improvements are still obtained when we use the calibrated RANS model to predict jet-in-crossflow at Mach numbers and jet strengths quite different from those used to generate the ex- perimental (calibration) data. Thus the primary reason for poor predictive skill of RANS, when using nominal values of the turbulence model parameters, was parametric uncertainty, which was rectified by calibration. Post-calibration, the dominant contribution to model inaccuraries are due to the structural errors in RANS.« less
NASA Astrophysics Data System (ADS)
Musenge, Eustasius; Chirwa, Tobias Freeman; Kahn, Kathleen; Vounatsou, Penelope
2013-06-01
Longitudinal mortality data with few deaths usually have problems of zero-inflation. This paper presents and applies two Bayesian models which cater for zero-inflation, spatial and temporal random effects. To reduce the computational burden experienced when a large number of geo-locations are treated as a Gaussian field (GF) we transformed the field to a Gaussian Markov Random Fields (GMRF) by triangulation. We then modelled the spatial random effects using the Stochastic Partial Differential Equations (SPDEs). Inference was done using a computationally efficient alternative to Markov chain Monte Carlo (MCMC) called Integrated Nested Laplace Approximation (INLA) suited for GMRF. The models were applied to data from 71,057 children aged 0 to under 10 years from rural north-east South Africa living in 15,703 households over the years 1992-2010. We found protective effects on HIV/TB mortality due to greater birth weight, older age and more antenatal clinic visits during pregnancy (adjusted RR (95% CI)): 0.73(0.53;0.99), 0.18(0.14;0.22) and 0.96(0.94;0.97) respectively. Therefore childhood HIV/TB mortality could be reduced if mothers are better catered for during pregnancy as this can reduce mother-to-child transmissions and contribute to improved birth weights. The INLA and SPDE approaches are computationally good alternatives in modelling large multilevel spatiotemporal GMRF data structures.
Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks
2017-01-01
Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees. PMID:28545083
Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks.
Klinkenberg, Don; Backer, Jantien A; Didelot, Xavier; Colijn, Caroline; Wallinga, Jacco
2017-05-01
Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees.
Modelling the effect of bednet coverage on malaria transmission in South Sudan.
Mukhtar, Abdulaziz Y A; Munyakazi, Justin B; Ouifki, Rachid; Clark, Allan E
2018-01-01
A campaign for malaria control, using Long Lasting Insecticide Nets (LLINs) was launched in South Sudan in 2009. The success of such a campaign often depends upon adequate available resources and reliable surveillance data which help officials understand existing infections. An optimal allocation of resources for malaria control at a sub-national scale is therefore paramount to the success of efforts to reduce malaria prevalence. In this paper, we extend an existing SIR mathematical model to capture the effect of LLINs on malaria transmission. Available data on malaria is utilized to determine realistic parameter values of this model using a Bayesian approach via Markov Chain Monte Carlo (MCMC) methods. Then, we explore the parasite prevalence on a continued rollout of LLINs in three different settings in order to create a sub-national projection of malaria. Further, we calculate the model's basic reproductive number and study its sensitivity to LLINs' coverage and its efficacy. From the numerical simulation results, we notice a basic reproduction number, [Formula: see text], confirming a substantial increase of incidence cases if no form of intervention takes place in the community. This work indicates that an effective use of LLINs may reduce [Formula: see text] and hence malaria transmission. We hope that this study will provide a basis for recommending a scaling-up of the entry point of LLINs' distribution that targets households in areas at risk of malaria.
Non-stationary hydrologic frequency analysis using B-spline quantile regression
NASA Astrophysics Data System (ADS)
Nasri, B.; Bouezmarni, T.; St-Hilaire, A.; Ouarda, T. B. M. J.
2017-11-01
Hydrologic frequency analysis is commonly used by engineers and hydrologists to provide the basic information on planning, design and management of hydraulic and water resources systems under the assumption of stationarity. However, with increasing evidence of climate change, it is possible that the assumption of stationarity, which is prerequisite for traditional frequency analysis and hence, the results of conventional analysis would become questionable. In this study, we consider a framework for frequency analysis of extremes based on B-Spline quantile regression which allows to model data in the presence of non-stationarity and/or dependence on covariates with linear and non-linear dependence. A Markov Chain Monte Carlo (MCMC) algorithm was used to estimate quantiles and their posterior distributions. A coefficient of determination and Bayesian information criterion (BIC) for quantile regression are used in order to select the best model, i.e. for each quantile, we choose the degree and number of knots of the adequate B-spline quantile regression model. The method is applied to annual maximum and minimum streamflow records in Ontario, Canada. Climate indices are considered to describe the non-stationarity in the variable of interest and to estimate the quantiles in this case. The results show large differences between the non-stationary quantiles and their stationary equivalents for an annual maximum and minimum discharge with high annual non-exceedance probabilities.
Hierarchical animal movement models for population-level inference
Hooten, Mevin B.; Buderman, Frances E.; Brost, Brian M.; Hanks, Ephraim M.; Ivans, Jacob S.
2016-01-01
New methods for modeling animal movement based on telemetry data are developed regularly. With advances in telemetry capabilities, animal movement models are becoming increasingly sophisticated. Despite a need for population-level inference, animal movement models are still predominantly developed for individual-level inference. Most efforts to upscale the inference to the population level are either post hoc or complicated enough that only the developer can implement the model. Hierarchical Bayesian models provide an ideal platform for the development of population-level animal movement models but can be challenging to fit due to computational limitations or extensive tuning required. We propose a two-stage procedure for fitting hierarchical animal movement models to telemetry data. The two-stage approach is statistically rigorous and allows one to fit individual-level movement models separately, then resample them using a secondary MCMC algorithm. The primary advantages of the two-stage approach are that the first stage is easily parallelizable and the second stage is completely unsupervised, allowing for an automated fitting procedure in many cases. We demonstrate the two-stage procedure with two applications of animal movement models. The first application involves a spatial point process approach to modeling telemetry data, and the second involves a more complicated continuous-time discrete-space animal movement model. We fit these models to simulated data and real telemetry data arising from a population of monitored Canada lynx in Colorado, USA.
Smoothing spline ANOVA frailty model for recurrent event data.
Du, Pang; Jiang, Yihua; Wang, Yuedong
2011-12-01
Gap time hazard estimation is of particular interest in recurrent event data. This article proposes a fully nonparametric approach for estimating the gap time hazard. Smoothing spline analysis of variance (ANOVA) decompositions are used to model the log gap time hazard as a joint function of gap time and covariates, and general frailty is introduced to account for between-subject heterogeneity and within-subject correlation. We estimate the nonparametric gap time hazard function and parameters in the frailty distribution using a combination of the Newton-Raphson procedure, the stochastic approximation algorithm (SAA), and the Markov chain Monte Carlo (MCMC) method. The convergence of the algorithm is guaranteed by decreasing the step size of parameter update and/or increasing the MCMC sample size along iterations. Model selection procedure is also developed to identify negligible components in a functional ANOVA decomposition of the log gap time hazard. We evaluate the proposed methods with simulation studies and illustrate its use through the analysis of bladder tumor data. © 2011, The International Biometric Society.
Bayesian spatial analysis of childhood diseases in Zimbabwe.
Tsiko, Rodney Godfrey
2015-09-02
Many sub-Saharan countries are confronted with persistently high levels of childhood morbidity and mortality because of the impact of a range of demographic, biological and social factors or situational events that directly precipitate ill health. In particular, under-five morbidity and mortality have increased in recent decades due to childhood diarrhoea, cough and fever. Understanding the geographic distribution of such diseases and their relationships to potential risk factors can be invaluable for cost effective intervention. Bayesian semi-parametric regression models were used to quantify the spatial risk of childhood diarrhoea, fever and cough, as well as associations between childhood diseases and a range of factors, after accounting for spatial correlation between neighbouring areas. Such semi-parametric regression models allow joint analysis of non-linear effects of continuous covariates, spatially structured variation, unstructured heterogeneity, and other fixed effects on childhood diseases. Modelling and inference made use of the fully Bayesian approach via Markov Chain Monte Carlo (MCMC) simulation techniques. The analysis was based on data derived from the 1999, 2005/6 and 2010/11 Zimbabwe Demographic and Health Surveys (ZDHS). The results suggest that until recently, sex of child had little or no significant association with childhood diseases. However, a higher proportion of male than female children within a given province had a significant association with childhood cough, fever and diarrhoea. Compared to their counterparts in rural areas, children raised in an urban setting had less exposure to cough, fever and diarrhoea across all the survey years with the exception of diarrhoea in 2010. In addition, the link between sanitation, parental education, antenatal care, vaccination and childhood diseases was found to be both intuitive and counterintuitive. Results also showed marked geographical differences in the prevalence of childhood diarrhoea, fever and cough. Across all the survey years Manicaland province reported the highest cases of childhood diseases. There is also clear evidence of significant high prevalence of childhood diseases in Mashonaland than in Matabeleland provinces.
Integration of Geophysical Data into Structural Geological Modelling through Bayesian Networks
NASA Astrophysics Data System (ADS)
de la Varga, Miguel; Wellmann, Florian; Murdie, Ruth
2016-04-01
Structural geological models are widely used to represent the spatial distribution of relevant geological features. Several techniques exist to construct these models on the basis of different assumptions and different types of geological observations (e.g. Jessell et al., 2014). However, two problems are prevalent when constructing models: (i) observations and assumptions, and therefore also the constructed model, are subject to uncertainties, and (ii) additional information, such as geophysical data, is often available, but cannot be considered directly in the geological modelling step. In our work, we propose the integration of all available data into a Bayesian network including the generation of the implicit geological method by means of interpolation functions (Mallet, 1992; Lajaunie et al., 1997; Mallet, 2004; Carr et al., 2001; Hillier et al., 2014). As a result, we are able to increase the certainty of the resultant models as well as potentially learn features of our regional geology through data mining and information theory techniques. MCMC methods are used in order to optimize computational time and assure the validity of the results. Here, we apply the aforementioned concepts in a 3-D model of the Sandstone Greenstone Belt in the Archean Yilgarn Craton in Western Australia. The example given, defines the uncertainty in the thickness of greenstone as limited by Bouguer anomaly and the internal structure of the greenstone as limited by the magnetic signature of a banded iron formation. The incorporation of the additional data and specially the gravity provides an important reduction of the possible outcomes and therefore the overall uncertainty. References Carr, C. J., K. R. Beatson, B. J. Cherrie, J. T. Mitchell, R. W. Fright, C. B. McCallum, and R. T. Evans, 2001, Reconstruction and representation of 3D objects with radial basis functions: Proceedings of the 28th annual conference on Computer graphics and interactive techniques, 67-76. Jessell, M., Aillères, L., de Kemp, E., Lindsay, M., Wellmann, F., Hillier, M., ... & Martin, R. (2014). Next Generation Three-Dimensional Geologic Modeling and Inversion. Lajaunie, C., G. Courrioux, and L. Manuel, 1997, Foliation fields and 3D cartography in geology: Principles of a method based on potential interpolation: Mathematical Geology, 29, 571-584. Mallet, J.-L., 1992, Discrete smooth interpolation in geometric modelling: Computer-Aided Design, 24, 178-191 Mallet, L. J., 2004, Space-time mathematical framework for sedimentary geology: Mathematical Geology, 36, 1-32.
Kim, Kyungmok; Lee, Jaewook
2016-01-01
This paper describes a sliding friction model for an electro-deposited coating. Reciprocating sliding tests using ball-on-flat plate test apparatus are performed to determine an evolution of the kinetic friction coefficient. The evolution of the friction coefficient is classified into the initial running-in period, steady-state sliding, and transition to higher friction. The friction coefficient during the initial running-in period and steady-state sliding is expressed as a simple linear function. The friction coefficient in the transition to higher friction is described with a mathematical model derived from Kachanov-type damage law. The model parameters are then estimated using the Markov Chain Monte Carlo (MCMC) approach. It is identified that estimated friction coefficients obtained by MCMC approach are in good agreement with measured ones. PMID:28773359
On the Adequacy of Bayesian Evaluations of Categorization Models: Reply to Vanpaemel and Lee (2012)
ERIC Educational Resources Information Center
Wills, Andy J.; Pothos, Emmanuel M.
2012-01-01
Vanpaemel and Lee (2012) argued, and we agree, that the comparison of formal models can be facilitated by Bayesian methods. However, Bayesian methods neither precede nor supplant our proposals (Wills & Pothos, 2012), as Bayesian methods can be applied both to our proposals and to their polar opposites. Furthermore, the use of Bayesian methods to…
Birth-death prior on phylogeny and speed dating
2008-01-01
Background In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. Results We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We demonstrate also that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rates and times parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this nevertheless can be a problem, for instance when we in addition to the parameters also infer the tree topology, we show that the problem can be evaded by using a simulated-annealing like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on. Conclusion Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitutions rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models. PMID:18318893
Birth-death prior on phylogeny and speed dating.
Akerborg, Orjan; Sennblad, Bengt; Lagergren, Jens
2008-03-04
In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We demonstrate also that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rates and times parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this nevertheless can be a problem, for instance when we in addition to the parameters also infer the tree topology, we show that the problem can be evaded by using a simulated-annealing like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on. Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitutions rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models.
Bayesian data analysis in population ecology: motivations, methods, and benefits
Dorazio, Robert
2016-01-01
During the 20th century ecologists largely relied on the frequentist system of inference for the analysis of their data. However, in the past few decades ecologists have become increasingly interested in the use of Bayesian methods of data analysis. In this article I provide guidance to ecologists who would like to decide whether Bayesian methods can be used to improve their conclusions and predictions. I begin by providing a concise summary of Bayesian methods of analysis, including a comparison of differences between Bayesian and frequentist approaches to inference when using hierarchical models. Next I provide a list of problems where Bayesian methods of analysis may arguably be preferred over frequentist methods. These problems are usually encountered in analyses based on hierarchical models of data. I describe the essentials required for applying modern methods of Bayesian computation, and I use real-world examples to illustrate these methods. I conclude by summarizing what I perceive to be the main strengths and weaknesses of using Bayesian methods to solve ecological inference problems.
NASA Astrophysics Data System (ADS)
Rana, Sachin; Ertekin, Turgay; King, Gregory R.
2018-05-01
Reservoir history matching is frequently viewed as an optimization problem which involves minimizing misfit between simulated and observed data. Many gradient and evolutionary strategy based optimization algorithms have been proposed to solve this problem which typically require a large number of numerical simulations to find feasible solutions. Therefore, a new methodology referred to as GP-VARS is proposed in this study which uses forward and inverse Gaussian processes (GP) based proxy models combined with a novel application of variogram analysis of response surface (VARS) based sensitivity analysis to efficiently solve high dimensional history matching problems. Empirical Bayes approach is proposed to optimally train GP proxy models for any given data. The history matching solutions are found via Bayesian optimization (BO) on forward GP models and via predictions of inverse GP model in an iterative manner. An uncertainty quantification method using MCMC sampling in conjunction with GP model is also presented to obtain a probabilistic estimate of reservoir properties and estimated ultimate recovery (EUR). An application of the proposed GP-VARS methodology on PUNQ-S3 reservoir is presented in which it is shown that GP-VARS provides history match solutions in approximately four times less numerical simulations as compared to the differential evolution (DE) algorithm. Furthermore, a comparison of uncertainty quantification results obtained by GP-VARS, EnKF and other previously published methods shows that the P50 estimate of oil EUR obtained by GP-VARS is in close agreement to the true values for the PUNQ-S3 reservoir.
Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study
Gascuel, Olivier
2017-01-01
Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987
Bayesian demography 250 years after Bayes
Bijak, Jakub; Bryant, John
2016-01-01
Bayesian statistics offers an alternative to classical (frequentist) statistics. It is distinguished by its use of probability distributions to describe uncertain quantities, which leads to elegant solutions to many difficult statistical problems. Although Bayesian demography, like Bayesian statistics more generally, is around 250 years old, only recently has it begun to flourish. The aim of this paper is to review the achievements of Bayesian demography, address some misconceptions, and make the case for wider use of Bayesian methods in population studies. We focus on three applications: demographic forecasts, limited data, and highly structured or complex models. The key advantages of Bayesian methods are the ability to integrate information from multiple sources and to describe uncertainty coherently. Bayesian methods also allow for including additional (prior) information next to the data sample. As such, Bayesian approaches are complementary to many traditional methods, which can be productively re-expressed in Bayesian terms. PMID:26902889
Geochemical Characterization Using Geophysical Data and Markov Chain Monte Carlo Methods
NASA Astrophysics Data System (ADS)
Chen, J.; Hubbard, S.; Rubin, Y.; Murray, C.; Roden, E.; Majer, E.
2002-12-01
Although the spatial distribution of geochemical parameters is extremely important for many subsurface remediation approaches, traditional characterization of those parameters is invasive and laborious, and thus is rarely performed sufficiently to describe natural hydrogeological variability at the field-scale. This study is an effort to jointly use multiple sources of information, including noninvasive geophysical data, for geochemical characterization of the saturated and anaerobic portion of the DOE South Oyster Bacterial Transport Site in Virginia. Our data set includes hydrogeological and geochemical measurements from five boreholes and ground-penetrating radar (GPR) and seismic tomographic data along two profiles that traverse the boreholes. The primary geochemical parameters are the concentrations of extractable ferrous iron Fe(II) and ferric iron Fe(III). Since iron-reducing bacteria can reduce Fe(III) to Fe(II) under certain conditions, information about the spatial distributions of Fe(II) and Fe(III) may indicate both where microbial iron reduction has occurred and in which zone it is likely to occur in the future. In addition, as geochemical heterogeneity influences bacterial transport and activity, estimates of the geochemical parameters provide important input to numerical flow and contaminant transport models geared toward bioremediation. Motivated by our previous research, which demonstrated that crosshole geophysical data could be very useful for estimating hydrogeological parameters, we hypothesize in this study that geochemical and geophysical parameters may be linked through their mutual dependence on hydrogeological parameters such as lithofacies. We attempt to estimate geochemical parameters using both hydrogeological and geophysical measurements in a Bayesian framework. Within the two-dimensional study domain (12m x 6m vertical cross section divided into 0.25m x 0.25m pixels), geochemical and hydrogeological parameters were considered as data if they were available from direct measurements or as variables otherwise. To estimate the geochemical parameters, we first assigned a prior model for each variable and a likelihood model for each type of data, which together define posterior probability distributions for each variable on the domain. Since the posterior probability distribution may involve hundreds of variables, we used a Markov Chain Monte Carlo (MCMC) method to explore each variable by generating and subsequently evaluating hundreds of realizations. Results from this case study showed that although geophysical attributes are not necessarily directly related to geochemical parameters, geophysical data could be very useful for providing accurate and high-resolution information about geochemical parameter distribution through their joint and indirect connections with hydrogeological properties such as lithofacies. This case study also demonstrated that MCMC methods were particularly useful for geochemical parameter estimation using geophysical data because they allow incorporation into the procedure of spatial correlation information, measurement errors, and cross correlations among different types of parameters.
NASA Astrophysics Data System (ADS)
Kwon, Hyun-Han; Lall, Upmanu; Engel, Vic
2011-09-01
The ability to map relationships between ecological outcomes and hydrologic conditions in the Everglades National Park (ENP) is a key building block for their restoration program, a primary goal of which is to improve conditions for wading birds. This paper presents a model linking wading bird foraging numbers to hydrologic conditions in the ENP. Seasonal hydrologic statistics derived from a single water level recorder are well correlated with water depths throughout most areas of the ENP, and are effective as predictors of wading bird numbers when using a nonlinear hierarchical Bayesian model to estimate the conditional distribution of bird populations. Model parameters are estimated using a Markov chain Monte Carlo (MCMC) procedure. Parameter and model uncertainty is assessed as a byproduct of the estimation process. Water depths at the beginning of the nesting season, the average dry season water level, and the numbers of reversals from the dry season recession are identified as significant predictors, consistent with the hydrologic conditions considered important in the production and concentration of prey organisms in this system. Long-term hydrologic records at the index location allow for a retrospective analysis (1952-2006) of foraging bird numbers showing low frequency oscillations in response to decadal fluctuations in hydroclimatic conditions. Simulations of water levels at the index location used in the Bayesian model under alternative water management scenarios allow the posterior probability distributions of the number of foraging birds to be compared, thus providing a mechanism for linking management schemes to seasonal rainfall forecasts.
Phylogeny and biogeography of the amphi-Pacific genus Aphananthe
Yang, Mei-Qing; Li, De-Zhu; Wen, Jun; Yi, Ting-Shuang
2017-01-01
Aphananthe is a small genus of five species showing an intriguing amphi-Pacific distribution in eastern, southern and southeastern Asia, Australia, and Mexico, also with one species in Madagascar. The phylogenetic relationships of Aphananthe were reconstructed with two nuclear (ITS & ETS) and two plastid (psbA-trnH & trnL-trnF) regions. Clade divergence times were estimated with a Bayesian approach, and the ancestral areas were inferred using the dispersal-extinction-cladogenesis and Bayesian Binary MCMC analyses. Aphananthe was supported to be monophyletic, with the eastern Asian A. aspera resolved as sister to a clade of the remaining four species. Aphananthe was inferred to have originated in the Late Cretaceous (71.5 mya, with 95% HPD: 66.6–81.3 mya), and the crown age of the genus was dated to be in the early Miocene (19.1 mya, with 95% HPD: 12.4–28.9 mya). The fossil record indicates that Aphananthe was present in the high latitude thermophilic forests in the early Tertiary, and experienced extinctions from the middle Tertiary onwards. Aphananthe originated in Europe based on the inference that included fossil and extant species, but eastern Asia was estimated to be the ancestral area of the clade of the extant species of Aphananthe. Both the West Gondwanan vicariance hypothesis and the boreotropics hypothesis could be excluded as explanation for its amphi-Pacific distribution. Long-distance dispersals out of eastern Asia into North America, southern and southeastern Asia and Australia, and Madagascar during the Miocene account for its wide intercontinental disjunct distribution. PMID:28170425
NASA Astrophysics Data System (ADS)
Madsen, Line Meldgaard; Fiandaca, Gianluca; Auken, Esben; Christiansen, Anders Vest
2017-12-01
The application of time-domain induced polarization (TDIP) is increasing with advances in acquisition techniques, data processing and spectral inversion schemes. An inversion of TDIP data for the spectral Cole-Cole parameters is a non-linear problem, but by applying a 1-D Markov Chain Monte Carlo (MCMC) inversion algorithm, a full non-linear uncertainty analysis of the parameters and the parameter correlations can be accessed. This is essential to understand to what degree the spectral Cole-Cole parameters can be resolved from TDIP data. MCMC inversions of synthetic TDIP data, which show bell-shaped probability distributions with a single maximum, show that the Cole-Cole parameters can be resolved from TDIP data if an acquisition range above two decades in time is applied. Linear correlations between the Cole-Cole parameters are observed and by decreasing the acquisitions ranges, the correlations increase and become non-linear. It is further investigated how waveform and parameter values influence the resolution of the Cole-Cole parameters. A limiting factor is the value of the frequency exponent, C. As C decreases, the resolution of all the Cole-Cole parameters decreases and the results become increasingly non-linear. While the values of the time constant, τ, must be in the acquisition range to resolve the parameters well, the choice between a 50 per cent and a 100 per cent duty cycle for the current injection does not have an influence on the parameter resolution. The limits of resolution and linearity are also studied in a comparison between the MCMC and a linearized gradient-based inversion approach. The two methods are consistent for resolved models, but the linearized approach tends to underestimate the uncertainties for poorly resolved parameters due to the corresponding non-linear features. Finally, an MCMC inversion of 1-D field data verifies that spectral Cole-Cole parameters can also be resolved from TD field measurements.
Optimizing cosmological surveys in a crowded market
NASA Astrophysics Data System (ADS)
Bassett, Bruce A.
2005-04-01
Optimizing the major next-generation cosmological surveys (such as SNAP, KAOS, etc.) is a key problem given our ignorance of the physics underlying cosmic acceleration and the plethora of surveys planned. We propose a Bayesian design framework which (1) maximizes the discrimination power of a survey without assuming any underlying dark-energy model, (2) finds the best niche survey geometry given current data and future competing experiments, (3) maximizes the cross section for serendipitous discoveries and (4) can be adapted to answer specific questions (such as “is dark energy dynamical?”). Integrated parameter-space optimization (IPSO) is a design framework that integrates projected parameter errors over an entire dark energy parameter space and then extremizes a figure of merit (such as Shannon entropy gain which we show is stable to off-diagonal covariance matrix perturbations) as a function of survey parameters using analytical, grid or MCMC techniques. We discuss examples where the optimization can be performed analytically. IPSO is thus a general, model-independent and scalable framework that allows us to appropriately use prior information to design the best possible surveys.
Drummond, Alexei J; Nicholls, Geoff K; Rodrigo, Allen G; Solomon, Wiremu
2002-01-01
Molecular sequences obtained at different sampling times from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here, we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences. PMID:12136032
Drummond, Alexei J; Nicholls, Geoff K; Rodrigo, Allen G; Solomon, Wiremu
2002-07-01
Molecular sequences obtained at different sampling times from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here, we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences.
Experiences with Markov Chain Monte Carlo Convergence Assessment in Two Psychometric Examples
ERIC Educational Resources Information Center
Sinharay, Sandip
2004-01-01
There is an increasing use of Markov chain Monte Carlo (MCMC) algorithms for fitting statistical models in psychometrics, especially in situations where the traditional estimation techniques are very difficult to apply. One of the disadvantages of using an MCMC algorithm is that it is not straightforward to determine the convergence of the…
A Technology Solution Strengthens Comprehensive Environmental Management
2012-05-23
General Navigation Chemical Approval Example NEPA Coordination Example Safety PPE Example Summary Marine Corps Support Facility...coordination, completion and documentation through automated workflows of various business processes Chemical Approval NEPA Coordination Safety ...Completion Diagram Government Employee/M CMC MCMC Chemical Manager MCMC HS&E Specialist IMO Chemical Safety Specialist IMO Chemical Environmental
NASA Astrophysics Data System (ADS)
Wang, Ji; Zhang, Ru; Yan, Yuting; Dong, Xiaoqiang; Li, Jun Ming
2017-05-01
Hazardous gas leaks in the atmosphere can cause significant economic losses in addition to environmental hazards, such as fires and explosions. A three-stage hazardous gas leak source localization method was developed that uses movable and stationary gas concentration sensors. The method calculates a preliminary source inversion with a modified genetic algorithm (MGA) and has the potential to crossover with eliminated individuals from the population, following the selection of the best candidate. The method then determines a search zone using Markov Chain Monte Carlo (MCMC) sampling, utilizing a partial evaluation strategy. The leak source is then accurately localized using a modified guaranteed convergence particle swarm optimization algorithm with several bad-performing individuals, following selection of the most successful individual with dynamic updates. The first two stages are based on data collected by motionless sensors, and the last stage is based on data from movable robots with sensors. The measurement error adaptability and the effect of the leak source location were analyzed. The test results showed that this three-stage localization process can localize a leak source within 1.0 m of the source for different leak source locations, with measurement error standard deviation smaller than 2.0.
Somarathna, P D S N; Minasny, Budiman; Malone, Brendan P; Stockmann, Uta; McBratney, Alex B
2018-08-01
Spatial modelling of environmental data commonly only considers spatial variability as the single source of uncertainty. In reality however, the measurement errors should also be accounted for. In recent years, infrared spectroscopy has been shown to offer low cost, yet invaluable information needed for digital soil mapping at meaningful spatial scales for land management. However, spectrally inferred soil carbon data are known to be less accurate compared to laboratory analysed measurements. This study establishes a methodology to filter out the measurement error variability by incorporating the measurement error variance in the spatial covariance structure of the model. The study was carried out in the Lower Hunter Valley, New South Wales, Australia where a combination of laboratory measured, and vis-NIR and MIR inferred topsoil and subsoil soil carbon data are available. We investigated the applicability of residual maximum likelihood (REML) and Markov Chain Monte Carlo (MCMC) simulation methods to generate parameters of the Matérn covariance function directly from the data in the presence of measurement error. The results revealed that the measurement error can be effectively filtered-out through the proposed technique. When the measurement error was filtered from the data, the prediction variance almost halved, which ultimately yielded a greater certainty in spatial predictions of soil carbon. Further, the MCMC technique was successfully used to define the posterior distribution of measurement error. This is an important outcome, as the MCMC technique can be used to estimate the measurement error if it is not explicitly quantified. Although this study dealt with soil carbon data, this method is amenable for filtering the measurement error of any kind of continuous spatial environmental data. Copyright © 2018 Elsevier B.V. All rights reserved.
2013-08-01
cost due to potential warranty costs, repairs and loss of market share. Reliability is the probability that the system will perform its intended...MCMC and splitting sampling schemes. Our proposed SS/ STP method is presented in Section 4, including accuracy bounds and computational effort
Markov chain Monte Carlo linkage analysis: effect of bin width on the probability of linkage.
Slager, S L; Juo, S H; Durner, M; Hodge, S E
2001-01-01
We analyzed part of the Genetic Analysis Workshop (GAW) 12 simulated data using Monte Carlo Markov chain (MCMC) methods that are implemented in the computer program Loki. The MCMC method reports the "probability of linkage" (PL) across the chromosomal regions of interest. The point of maximum PL can then be taken as a "location estimate" for the location of the quantitative trait locus (QTL). However, Loki does not provide a formal statistical test of linkage. In this paper, we explore how the bin width used in the calculations affects the max PL and the location estimate. We analyzed age at onset (AO) and quantitative trait number 5, Q5, from 26 replicates of the general simulated data in one region where we knew a major gene, MG5, is located. For each trait, we found the max PL and the corresponding location estimate, using four different bin widths. We found that bin width, as expected, does affect the max PL and the location estimate, and we recommend that users of Loki explore how their results vary with different bin widths.
Prior approval: the growth of Bayesian methods in psychology.
Andrews, Mark; Baguley, Thom
2013-02-01
Within the last few years, Bayesian methods of data analysis in psychology have proliferated. In this paper, we briefly review the history or the Bayesian approach to statistics, and consider the implications that Bayesian methods have for the theory and practice of data analysis in psychology.
An introduction to using Bayesian linear regression with clinical data.
Baldwin, Scott A; Larson, Michael J
2017-11-01
Statistical training psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods as well has how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Li, L.; Xu, C.-Y.; Engeland, K.
2012-04-01
With respect to model calibration, parameter estimation and analysis of uncertainty sources, different approaches have been used in hydrological models. Bayesian method is one of the most widely used methods for uncertainty assessment of hydrological models, which incorporates different sources of information into a single analysis through Bayesian theorem. However, none of these applications can well treat the uncertainty in extreme flows of hydrological models' simulations. This study proposes a Bayesian modularization method approach in uncertainty assessment of conceptual hydrological models by considering the extreme flows. It includes a comprehensive comparison and evaluation of uncertainty assessments by a new Bayesian modularization method approach and traditional Bayesian models using the Metropolis Hasting (MH) algorithm with the daily hydrological model WASMOD. Three likelihood functions are used in combination with traditional Bayesian: the AR (1) plus Normal and time period independent model (Model 1), the AR (1) plus Normal and time period dependent model (Model 2) and the AR (1) plus multi-normal model (Model 3). The results reveal that (1) the simulations derived from Bayesian modularization method are more accurate with the highest Nash-Sutcliffe efficiency value, and (2) the Bayesian modularization method performs best in uncertainty estimates of entire flows and in terms of the application and computational efficiency. The study thus introduces a new approach for reducing the extreme flow's effect on the discharge uncertainty assessment of hydrological models via Bayesian. Keywords: extreme flow, uncertainty assessment, Bayesian modularization, hydrological model, WASMOD
Mining geographic variations of Plasmodium vivax for active surveillance: a case study in China.
Shi, Benyun; Tan, Qi; Zhou, Xiao-Nong; Liu, Jiming
2015-05-27
Geographic variations of an infectious disease characterize the spatial differentiation of disease incidences caused by various impact factors, such as environmental, demographic, and socioeconomic factors. Some factors may directly determine the force of infection of the disease (namely, explicit factors), while many other factors may indirectly affect the number of disease incidences via certain unmeasurable processes (namely, implicit factors). In this study, the impact of heterogeneous factors on geographic variations of Plasmodium vivax incidences is systematically investigate in Tengchong, Yunnan province, China. A space-time model that resembles a P. vivax transmission model and a hidden time-dependent process, is presented by taking into consideration both explicit and implicit factors. Specifically, the transmission model is built upon relevant demographic, environmental, and biophysical factors to describe the local infections of P. vivax. While the hidden time-dependent process is assessed by several socioeconomic factors to account for the imported cases of P. vivax. To quantitatively assess the impact of heterogeneous factors on geographic variations of P. vivax infections, a Markov chain Monte Carlo (MCMC) simulation method is developed to estimate the model parameters by fitting the space-time model to the reported spatial-temporal disease incidences. Since there is no ground-truth information available, the performance of the MCMC method is first evaluated against a synthetic dataset. The results show that the model parameters can be well estimated using the proposed MCMC method. Then, the proposed model is applied to investigate the geographic variations of P. vivax incidences among all 18 towns in Tengchong, Yunnan province, China. Based on the geographic variations, the 18 towns can be further classify into five groups with similar socioeconomic causality for P. vivax incidences. Although this study focuses mainly on the transmission of P. vivax, the proposed space-time model is general and can readily be extended to investigate geographic variations of other diseases. Practically, such a computational model will offer new insights into active surveillance and strategic planning for disease surveillance and control.
An MCMC determination of the primordial helium abundance
NASA Astrophysics Data System (ADS)
Aver, Erik; Olive, Keith A.; Skillman, Evan D.
2012-04-01
Spectroscopic observations of the chemical abundances in metal-poor H II regions provide an independent method for estimating the primordial helium abundance. H II regions are described by several physical parameters such as electron density, electron temperature, and reddening, in addition to y, the ratio of helium to hydrogen. It had been customary to estimate or determine self-consistently these parameters to calculate y. Frequentist analyses of the parameter space have been shown to be successful in these parameter determinations, and Markov Chain Monte Carlo (MCMC) techniques have proven to be very efficient in sampling this parameter space. Nevertheless, accurate determination of the primordial helium abundance from observations of H II regions is constrained by both systematic and statistical uncertainties. In an attempt to better reduce the latter, and continue to better characterize the former, we apply MCMC methods to the large dataset recently compiled by Izotov, Thuan, & Stasińska (2007). To improve the reliability of the determination, a high quality dataset is needed. In pursuit of this, a variety of cuts are explored. The efficacy of the He I λ4026 emission line as a constraint on the solutions is first examined, revealing the introduction of systematic bias through its absence. As a clear measure of the quality of the physical solution, a χ2 analysis proves instrumental in the selection of data compatible with the theoretical model. Nearly two-thirds of the observations fall outside a standard 95% confidence level cut, which highlights the care necessary in selecting systems and warrants further investigation into potential deficiencies of the model or data. In addition, the method also allows us to exclude systems for which parameter estimations are statistical outliers. As a result, the final selected dataset gains in reliability and exhibits improved consistency. Regression to zero metallicity yields Yp = 0.2534 ± 0.0083, in broad agreement with the WMAP result. The inclusion of more observations shows promise for further reducing the uncertainty, but more high quality spectra are required.
A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research
van de Schoot, Rens; Kaplan, David; Denissen, Jaap; Asendorpf, Jens B; Neyer, Franz J; van Aken, Marcel AG
2014-01-01
Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First, the ingredients underlying Bayesian methods are introduced using a simplified example. Thereafter, the advantages and pitfalls of the specification of prior knowledge are discussed. To illustrate Bayesian methods explained in this study, in a second example a series of studies that examine the theoretical framework of dynamic interactionism are considered. In the Discussion the advantages and disadvantages of using Bayesian statistics are reviewed, and guidelines on how to report on Bayesian statistics are provided. PMID:24116396
Detecting consistent patterns of directional adaptation using differential selection codon models.
Parto, Sahar; Lartillot, Nicolas
2017-06-23
Phylogenetic codon models are often used to characterize the selective regimes acting on protein-coding sequences. Recent methodological developments have led to models explicitly accounting for the interplay between mutation and selection, by modeling the amino acid fitness landscape along the sequence. However, thus far, most of these models have assumed that the fitness landscape is constant over time. Fluctuations of the fitness landscape may often be random or depend on complex and unknown factors. However, some organisms may be subject to systematic changes in selective pressure, resulting in reproducible molecular adaptations across independent lineages subject to similar conditions. Here, we introduce a codon-based differential selection model, which aims to detect and quantify the fine-grained consistent patterns of adaptation at the protein-coding level, as a function of external conditions experienced by the organism under investigation. The model parameterizes the global mutational pressure, as well as the site- and condition-specific amino acid selective preferences. This phylogenetic model is implemented in a Bayesian MCMC framework. After validation with simulations, we applied our method to a dataset of HIV sequences from patients with known HLA genetic background. Our differential selection model detects and characterizes differentially selected coding positions specifically associated with two different HLA alleles. Our differential selection model is able to identify consistent molecular adaptations as a function of repeated changes in the environment of the organism. These models can be applied to many other problems, ranging from viral adaptation to evolution of life-history strategies in plants or animals.
Genomic analysis of the Chinese genotype 1F rubella virus that disappeared after 2002 in China.
Zhu, Zhen; Chen, Min-Hsin; Abernathy, Emily; Zhou, Shujie; Wang, Changyin; Icenogle, Joseph; Xu, Wenbo
2014-12-01
Genotype 1F was likely localized geographically to China as it has not been reported elsewhere. In this study, whole genome sequences of two rubella 1F virus isolates were completed. Both viruses contained 9,761 nt with a single nucleotide deletion in the intergenic region, compared to the NCBI rubella reference sequence (NC 001545). No evidence of recombination was found between 1F and other rubella viruses. The genetic distance between 1F viruses and 10 other rubella virus genotypes (1a, 1B, 1C, 1D, 1E, 1G, 1J 2A, 2B, and 2C) ranged from 3.9% to 8.6% by pairwise comparison. A region known to be hypervariable in other rubella genotypes was also the most variable region in the 1F genomes. Comparisons to all available rubella virus sequences from GenBank identified 22 nucleotide variations exclusively in 1F viruses. Among these unique variations, C9306U is located within the recommended molecular window for rubella virus genotyping assignment, could be useful to confirm 1F viruses. Using the Bayesian Markov Chain Monte Carlo (MCMC) method, the time of the most recent common ancestor for the genotype 1F was estimated between 1976 and 1995. Recent rubella molecular surveillance suggests that this indigenous strain may have circulated for less than three decades, as it has not been detected since 2002. © 2014 Wiley Periodicals, Inc.
The behavior of Metropolis-coupled Markov chains when sampling rugged phylogenetic distributions.
Brown, Jeremy M; Thomson, Robert C
2018-02-15
Bayesian phylogenetic inference involves sampling from posterior distributions of trees, which sometimes exhibit local optima, or peaks, separated by regions of low posterior density. Markov chain Monte Carlo (MCMC) algorithms are the most widely used numerical method for generating samples from these posterior distributions, but they are susceptible to entrapment on individual optima in rugged distributions when they are unable to easily cross through or jump across regions of low posterior density. Ruggedness of posterior distributions can result from a variety of factors, including unmodeled variation in evolutionary processes and unrecognized variation in the true topology across sites or genes. Ruggedness can also become exaggerated when constraints are placed on topologies that require the presence or absence of particular bipartitions (often referred to as positive or negative constraints, respectively). These types of constraints are frequently employed when conducting tests of topological hypotheses (Bergsten et al. 2013; Brown and Thomson 2017). Negative constraints can lead to particularly rugged distributions when the data strongly support a forbidden clade, because monophyly of the clade can be disrupted by inserting outgroup taxa in many different ways. However, topological moves between the alternative disruptions are very difficult, because they require swaps between the inserted outgroup taxa while the data constrain taxa from the forbidden clade to remain close together on the tree. While this precise form of ruggedness is particular to negative constraints, trees with high posterior density can be separated by similarly complicated topological rearrangements, even in the absence of constraints.
Krafty, Robert T; Rosen, Ori; Stoffer, David S; Buysse, Daniel J; Hall, Martica H
2017-01-01
This article considers the problem of analyzing associations between power spectra of multiple time series and cross-sectional outcomes when data are observed from multiple subjects. The motivating application comes from sleep medicine, where researchers are able to non-invasively record physiological time series signals during sleep. The frequency patterns of these signals, which can be quantified through the power spectrum, contain interpretable information about biological processes. An important problem in sleep research is drawing connections between power spectra of time series signals and clinical characteristics; these connections are key to understanding biological pathways through which sleep affects, and can be treated to improve, health. Such analyses are challenging as they must overcome the complicated structure of a power spectrum from multiple time series as a complex positive-definite matrix-valued function. This article proposes a new approach to such analyses based on a tensor-product spline model of Cholesky components of outcome-dependent power spectra. The approach exibly models power spectra as nonparametric functions of frequency and outcome while preserving geometric constraints. Formulated in a fully Bayesian framework, a Whittle likelihood based Markov chain Monte Carlo (MCMC) algorithm is developed for automated model fitting and for conducting inference on associations between outcomes and spectral measures. The method is used to analyze data from a study of sleep in older adults and uncovers new insights into how stress and arousal are connected to the amount of time one spends in bed.
NASA Astrophysics Data System (ADS)
Li, Ming-Hua; Zhu, Weishan; Zhao, Dong
2018-05-01
The gas is the dominant component of baryonic matter in most galaxy groups and clusters. The spatial offsets of gas centre from the halo centre could be an indicator of the dynamical state of cluster. Knowledge of such offsets is important for estimate the uncertainties when using clusters as cosmological probes. In this paper, we study the centre offsets roff between the gas and that of all the matter within halo systems in ΛCDM cosmological hydrodynamic simulations. We focus on two kinds of centre offsets: one is the three-dimensional PB offsets between the gravitational potential minimum of the entire halo and the barycentre of the ICM, and the other is the two-dimensional PX offsets between the potential minimum of the halo and the iterative centroid of the projected synthetic X-ray emission of the halo. Haloes at higher redshifts tend to have larger values of rescaled offsets roff/r200 and larger gas velocity dispersion σ v^gas/σ _{200}. For both types of offsets, we find that the correlation between the rescaled centre offsets roff/r200 and the rescaled 3D gas velocity dispersion, σ _v^gas/σ _{200} can be approximately described by a quadratic function as r_{off}/r_{200} ∝ (σ v^gas/σ _{200} - k_2)2. A Bayesian analysis with MCMC method is employed to estimate the model parameters. Dependence of the correlation relation on redshifts and the gas mass fraction are also investigated.
A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research
ERIC Educational Resources Information Center
van de Schoot, Rens; Kaplan, David; Denissen, Jaap; Asendorpf, Jens B.; Neyer, Franz J.; van Aken, Marcel A. G.
2014-01-01
Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First, the ingredients underlying Bayesian methods are…
NASA Astrophysics Data System (ADS)
Lenoir, Guillaume; Crucifix, Michel
2018-03-01
We develop a general framework for the frequency analysis of irregularly sampled time series. It is based on the Lomb-Scargle periodogram, but extended to algebraic operators accounting for the presence of a polynomial trend in the model for the data, in addition to a periodic component and a background noise. Special care is devoted to the correlation between the trend and the periodic component. This new periodogram is then cast into the Welch overlapping segment averaging (WOSA) method in order to reduce its variance. We also design a test of significance for the WOSA periodogram, against the background noise. The model for the background noise is a stationary Gaussian continuous autoregressive-moving-average (CARMA) process, more general than the classical Gaussian white or red noise processes. CARMA parameters are estimated following a Bayesian framework. We provide algorithms that compute the confidence levels for the WOSA periodogram and fully take into account the uncertainty in the CARMA noise parameters. Alternatively, a theory using point estimates of CARMA parameters provides analytical confidence levels for the WOSA periodogram, which are more accurate than Markov chain Monte Carlo (MCMC) confidence levels and, below some threshold for the number of data points, less costly in computing time. We then estimate the amplitude of the periodic component with least-squares methods, and derive an approximate proportionality between the squared amplitude and the periodogram. This proportionality leads to a new extension for the periodogram: the weighted WOSA periodogram, which we recommend for most frequency analyses with irregularly sampled data. The estimated signal amplitude also permits filtering in a frequency band. Our results generalise and unify methods developed in the fields of geosciences, engineering, astronomy and astrophysics. They also constitute the starting point for an extension to the continuous wavelet transform developed in a companion article (Lenoir and Crucifix, 2018). All the methods presented in this paper are available to the reader in the Python package WAVEPAL.
Steen Magnussen
2009-01-01
Areas burned annually in 29 Canadian forest fire regions show a patchy and irregular correlation structure that significantly influences the distribution of annual totals for Canada and for groups of regions. A binary Monte Carlo Markov Chain (MCMC) is constructed for the purpose of joint simulation of regional areas burned in forest fires. For each year the MCMC...
ERIC Educational Resources Information Center
Matthews-Lopez, Joy L.; Hombo, Catherine M.
The purpose of this study was to examine the recovery of item parameters in simulated Automatic Item Generation (AIG) conditions, using Markov chain Monte Carlo (MCMC) estimation methods to attempt to recover the generating distributions. To do this, variability in item and ability parameters was manipulated. Realistic AIG conditions were…
Marginal Maximum A Posteriori Item Parameter Estimation for the Generalized Graded Unfolding Model
ERIC Educational Resources Information Center
Roberts, James S.; Thompson, Vanessa M.
2011-01-01
A marginal maximum a posteriori (MMAP) procedure was implemented to estimate item parameters in the generalized graded unfolding model (GGUM). Estimates from the MMAP method were compared with those derived from marginal maximum likelihood (MML) and Markov chain Monte Carlo (MCMC) procedures in a recovery simulation that varied sample size,…
A study of finite mixture model: Bayesian approach on financial time series data
NASA Astrophysics Data System (ADS)
Phoong, Seuk-Yen; Ismail, Mohd Tahir
2014-07-01
Recently, statistician have emphasized on the fitting finite mixture model by using Bayesian method. Finite mixture model is a mixture of distributions in modeling a statistical distribution meanwhile Bayesian method is a statistical method that use to fit the mixture model. Bayesian method is being used widely because it has asymptotic properties which provide remarkable result. In addition, Bayesian method also shows consistency characteristic which means the parameter estimates are close to the predictive distributions. In the present paper, the number of components for mixture model is studied by using Bayesian Information Criterion. Identify the number of component is important because it may lead to an invalid result. Later, the Bayesian method is utilized to fit the k-component mixture model in order to explore the relationship between rubber price and stock market price for Malaysia, Thailand, Philippines and Indonesia. Lastly, the results showed that there is a negative effect among rubber price and stock market price for all selected countries.
Teaching Bayesian Statistics in a Health Research Methodology Program
ERIC Educational Resources Information Center
Pullenayegum, Eleanor M.; Thabane, Lehana
2009-01-01
Despite the appeal of Bayesian methods in health research, they are not widely used. This is partly due to a lack of courses in Bayesian methods at an appropriate level for non-statisticians in health research. Teaching such a course can be challenging because most statisticians have been taught Bayesian methods using a mathematical approach, and…
Yoshihara, Keisuke; Le, Minh Nhat; Nagasawa, Koo; Tsukagoshi, Hiroyuki; Nguyen, Hien Anh; Toizumi, Michiko; Moriuchi, Hiroyuki; Hashizume, Masahiro; Ariyoshi, Koya; Dang, Duc Anh; Kimura, Hirokazu; Yoshida, Lay-Myint
2016-11-01
We performed molecular evolutionary analyses of the G gene C-terminal 3rd hypervariable region of RSV-A genotypes NA1 and ON1 strains from the paediatric acute respiratory infection patients in central Vietnam during the 2010-2012 study period. Time-scaled phylogenetic analyses were performed using Bayesian Markov Chain Monte Carlo (MCMC) method, and pairwise distances (p-distances) were calculated. Bayesian Skyline Plot (BSP) was constructed to analyze the time-trend relative genetic diversity of central Vietnam RSV-A strains. We also estimated the N-glycosylation sites within G gene hypervariable region. Amino acid substitutions under positive and negative selection pressure were examined using Conservative Single Likelihood Ancestor Counting (SLAC), Fixed Effects Likelihood (FEL), Internal Fixed Effects Likelihood (IFEL) and Mixed Effects Model for Episodic Diversifying Selection (MEME) models. The majority of central Vietnam ON1 strains detected in 2012 were classified into lineage 1 with few positively selected substitutions. As for the Vietnamese NA1 strains, four lineages were circulating during the study period with a few positive selection sites. Shifting patterns of the predominantly circulating NA1 lineage were observed in each year during the investigation period. Median p-distance of central Vietnam NA1 strains was wider (p-distance=0.028) than that of ON1 (p-distance=0.012). The molecular evolutionary rate of central Vietnam ON1 strains was estimated to be 2.55×10 -2 (substitutions/site/year) and was faster than NA1 (7.12×10 -3 (substitutions/site/year)). Interestingly, the evolutionary rates of both genotypes ON1 and NA1 strains from central Vietnam were faster than the global strains respectively. Furthermore, the shifts of N-glycosylation pattern within the G gene 3rd hypervariable region of Vietnamese NA1 strains were observed in each year. BSP analysis indicated the rapid growth of RSV-A effective population size in early 2012. These results suggested that the molecular evolution of RSV-A G gene detected in central Vietnam was fast with unique evolutionary dynamics. Copyright © 2016 Elsevier B.V. All rights reserved.
2014-01-01
Background Given that most species that have ever existed on earth are extinct, it stands to reason that the evolutionary history can be better understood with fossil taxa. Bauhinia is a typical genus of pantropical intercontinental disjunction among the Asian, African, and American continents. Geographic distribution patterns are better recognized when fossil records and molecular sequences are combined in the analyses. Here, we describe a new macrofossil species of Bauhinia from the Upper Miocene Xiaolongtan Formation in Wenshan County, Southeast Yunnan, China, and elucidate the biogeographic significance through the analyses of molecules and fossils. Results Morphometric analysis demonstrates that the leaf shapes of B. acuminata, B. championii, B. chalcophylla, B. purpurea, and B. podopetala closely resemble the leaf shapes of the new finding fossil. Phylogenetic relationships among the Bauhinia species were reconstructed using maximum parsimony and Bayesian inference, which inferred that species in Bauhinia species are well-resolved into three main groups. Divergence times were estimated by the Bayesian Markov chain Monte Carlo (MCMC) method under a relaxed clock, and inferred that the stem diversification time of Bauhinia was ca. 62.7 Ma. The Asian lineage first diverged at ca. 59.8 Ma, followed by divergence of the Africa lineage starting during the late Eocene, whereas that of the neotropical lineage starting during the middle Miocene. Conclusions Hypotheses relying on vicariance or continental history to explain pantropical disjunct distributions are dismissed because they require mostly Palaeogene and older tectonic events. We suggest that Bauhinia originated in the middle Paleocene in Laurasia, probably in Asia, implying a possible Tethys Seaway origin or an “Out of Tropical Asia”, and dispersal of legumes. Its present pantropical disjunction resulted from disruption of the boreotropical flora by climatic cooling after the Paleocene-Eocene Thermal Maximum (PETM). North Atlantic land bridges (NALB) seem the most plausible route for migration of Bauhinia from Asia to America; and additional aspects of the Bauhinia species distribution are explained by migration and long distance dispersal (LDD) from Eurasia to the African and American continents. PMID:25288346
Meng, Hong-Hu; Jacques, Frédéric Mb; Su, Tao; Huang, Yong-Jiang; Zhang, Shi-Tao; Ma, Hong-Jie; Zhou, Zhe-Kun
2014-08-10
Given that most species that have ever existed on earth are extinct, it stands to reason that the evolutionary history can be better understood with fossil taxa. Bauhinia is a typical genus of pantropical intercontinental disjunction among the Asian, African, and American continents. Geographic distribution patterns are better recognized when fossil records and molecular sequences are combined in the analyses. Here, we describe a new macrofossil species of Bauhinia from the Upper Miocene Xiaolongtan Formation in Wenshan County, Southeast Yunnan, China, and elucidate the biogeographic significance through the analyses of molecules and fossils. Morphometric analysis demonstrates that the leaf shapes of B. acuminata, B. championii, B. chalcophylla, B. purpurea, and B. podopetala closely resemble the leaf shapes of the new finding fossil. Phylogenetic relationships among the Bauhinia species were reconstructed using maximum parsimony and Bayesian inference, which inferred that species in Bauhinia species are well-resolved into three main groups. Divergence times were estimated by the Bayesian Markov chain Monte Carlo (MCMC) method under a relaxed clock, and inferred that the stem diversification time of Bauhinia was ca. 62.7 Ma. The Asian lineage first diverged at ca. 59.8 Ma, followed by divergence of the Africa lineage starting during the late Eocene, whereas that of the neotropical lineage starting during the middle Miocene. Hypotheses relying on vicariance or continental history to explain pantropical disjunct distributions are dismissed because they require mostly Palaeogene and older tectonic events. We suggest that Bauhinia originated in the middle Paleocene in Laurasia, probably in Asia, implying a possible Tethys Seaway origin or an "Out of Tropical Asia", and dispersal of legumes. Its present pantropical disjunction resulted from disruption of the boreotropical flora by climatic cooling after the Paleocene-Eocene Thermal Maximum (PETM). North Atlantic land bridges (NALB) seem the most plausible route for migration of Bauhinia from Asia to America; and additional aspects of the Bauhinia species distribution are explained by migration and long distance dispersal (LDD) from Eurasia to the African and American continents.
Enhancements to the Bayesian Infrasound Source Location Method
2012-09-01
ENHANCEMENTS TO THE BAYESIAN INFRASOUND SOURCE LOCATION METHOD Omar E. Marcillo, Stephen J. Arrowsmith, Rod W. Whitaker, and Dale N. Anderson Los...ABSTRACT We report on R&D that is enabling enhancements to the Bayesian Infrasound Source Location (BISL) method for infrasound event location...the Bayesian Infrasound Source Location Method 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER
Hierarchical multistage MCMC follow-up of continuous gravitational wave candidates
NASA Astrophysics Data System (ADS)
Ashton, G.; Prix, R.
2018-05-01
Leveraging Markov chain Monte Carlo optimization of the F statistic, we introduce a method for the hierarchical follow-up of continuous gravitational wave candidates identified by wide-parameter space semicoherent searches. We demonstrate parameter estimation for continuous wave sources and develop a framework and tools to understand and control the effective size of the parameter space, critical to the success of the method. Monte Carlo tests of simulated signals in noise demonstrate that this method is close to the theoretical optimal performance.
Uncertainty analysis for fluorescence tomography with Monte Carlo method
NASA Astrophysics Data System (ADS)
Reinbacher-Köstinger, Alice; Freiberger, Manuel; Scharfetter, Hermann
2011-07-01
Fluorescence tomography seeks to image an inaccessible fluorophore distribution inside an object like a small animal by injecting light at the boundary and measuring the light emitted by the fluorophore. Optical parameters (e.g. the conversion efficiency or the fluorescence life-time) of certain fluorophores depend on physiologically interesting quantities like the pH value or the oxygen concentration in the tissue, which allows functional rather than just anatomical imaging. To reconstruct the concentration and the life-time from the boundary measurements, a nonlinear inverse problem has to be solved. It is, however, difficult to estimate the uncertainty of the reconstructed parameters in case of iterative algorithms and a large number of degrees of freedom. Uncertainties in fluorescence tomography applications arise from model inaccuracies, discretization errors, data noise and a priori errors. Thus, a Markov chain Monte Carlo method (MCMC) was used to consider all these uncertainty factors exploiting Bayesian formulation of conditional probabilities. A 2-D simulation experiment was carried out for a circular object with two inclusions. Both inclusions had a 2-D Gaussian distribution of the concentration and constant life-time inside of a representative area of the inclusion. Forward calculations were done with the diffusion approximation of Boltzmann's transport equation. The reconstruction results show that the percent estimation error of the lifetime parameter is by a factor of approximately 10 lower than that of the concentration. This finding suggests that lifetime imaging may provide more accurate information than concentration imaging only. The results must be interpreted with caution, however, because the chosen simulation setup represents a special case and a more detailed analysis remains to be done in future to clarify if the findings can be generalized.
Hierarchical modeling of population stability and species group attributes from survey data
Sauer, J.R.; Link, W.A.
2002-01-01
Many ecological studies require analysis of collections of estimates. For example, population change is routinely estimated for many species from surveys such as the North American Breeding Bird Survey (BBS), and the species are grouped and used in comparative analyses. We developed a hierarchical model for estimation of group attributes from a collection of estimates of population trend. The model uses information from predefined groups of species to provide a context and to supplement data for individual species; summaries of group attributes are improved by statistical methods that simultaneously analyze collections of trend estimates. The model is Bayesian; trends are treated as random variables rather than fixed parameters. We use Markov Chain Monte Carlo (MCMC) methods to fit the model. Standard assessments of population stability cannot distinguish magnitude of trend and statistical significance of trend estimates, but the hierarchical model allows us to legitimately describe the probability that a trend is within given bounds. Thus we define population stability in terms of the probability that the magnitude of population change for a species is less than or equal to a predefined threshold. We applied the model to estimates of trend for 399 species from the BBS to estimate the proportion of species with increasing populations and to identify species with unstable populations. Analyses are presented for the collection of all species and for 12 species groups commonly used in BBS summaries. Overall, we estimated that 49% of species in the BBS have positive trends and 33 species have unstable populations. However, the proportion of species with increasing trends differs among habitat groups, with grassland birds having only 19% of species with positive trend estimates and wetland birds having 68% of species with positive trend estimates.
Hybrid Inflation: Multi-field Dynamics and Cosmological Constraints
NASA Astrophysics Data System (ADS)
Clesse, Sébastien
2011-09-01
The dynamics of hybrid models is usually approximated by the evolution of a scalar field slowly rolling along a nearly flat valley. Inflation ends with a waterfall phase, due to a tachyonic instability. This final phase is usually assumed to be nearly instantaneous. In this thesis, we go beyond these approximations and analyze the exact 2-field dynamics of hybrid models. Several effects are put in evidence: 1) the possible slow-roll violations along the valley induce the non existence of inflation at small field values. Provided super-planckian fields, the scalar spectrum of the original model is red, in agreement with observations. 2) The initial field values are not fine-tuned along the valley but also occupy a considerable part of the field space exterior to it. They form a structure with fractal boundaries. Using bayesian methods, their distribution in the whole parameter space is studied. Natural bounds on the potential parameters are derived. 3) For the original model, inflation is found to continue for more than 60 e-folds along waterfall trajectories in some part of the parameter space. The scalar power spectrum of adiabatic perturbations is modified and is generically red, possibly in agreement with CMB observations. Topological defects are conveniently stretched outside the observable Universe. 4) The analysis of the initial conditions is extended to the case of a closed Universe, in which the initial singularity is replaced by a classical bounce. In the third part of the thesis, we study how the present CMB constraints on the cosmological parameters could be ameliorated with the observation of the 21cm cosmic background, by future giant radio-telescopes. Forecasts are determined for a characteristic Fast Fourier Transform Telescope, by using both Fisher matrix and MCMC methods.
NASA Astrophysics Data System (ADS)
Kimura, H.; Ito, T.; Tadokoro, K.
2017-12-01
Introduction In southwest Japan, Philippine sea plate is subducting under the overriding plate such as Amurian plate, and mega interplate earthquakes has occurred at about 100 years interval. There is no occurrence of mega interplate earthquakes in southwest Japan, although it has passed about 70 years since the last mega interplate earthquakes: 1944 and 1946 along Nankai trough, meaning that the strain has been accumulated at plate interface. Therefore, it is essential to reveal the interplate coupling more precisely for predicting or understanding the mechanism of next occurring mega interplate earthquake. Recently, seafloor geodetic observation revealed the detailed interplate coupling distribution in expected source region of Nankai trough earthquake (e.g., Yokota et al. [2016]). In this study, we estimated interplate coupling in southwest Japan, considering block motion model and using seafloor geodetic observation data as well as onland GNSS observation data, based on Markov Chain Monte Carlo (MCMC) method. Method Observed crustal deformation is assumed that sum of rigid block motion and elastic deformation due to coupling at block boundaries. We modeled this relationship as a non-linear inverse problem that the unknown parameters are Euler pole of each block and coupling at each subfault, and solved them simultaneously based on MCMC method. Input data we used in this study are 863 onland GNSS observation data and 24 seafloor GPS/A observation data. We made some block division models based on the map of active fault tracing and selected the best model based on Akaike's Information Criterion (AIC): that is consist of 12 blocks. Result We find that the interplate coupling along Nankai trough has heterogeneous spatial distribution, strong at the depth of 0 to 20km at off Tokai region, and 0 to 30km at off Shikoku region. Moreover, we find that observed crustal deformation at off Tokai region is well explained by elastic deformation due to subducting Izu Micro Plate. We will present more details of our result, and discuss about not only interplate coupling but also rigid block motion, elastic deformation due to inland fault coupling, and resolution of estimated parameters.
Kruschke, John K; Liddell, Torrin M
2018-02-01
In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.
NASA Astrophysics Data System (ADS)
Li, Lu; Xu, Chong-Yu; Engeland, Kolbjørn
2013-04-01
SummaryWith respect to model calibration, parameter estimation and analysis of uncertainty sources, various regression and probabilistic approaches are used in hydrological modeling. A family of Bayesian methods, which incorporates different sources of information into a single analysis through Bayes' theorem, is widely used for uncertainty assessment. However, none of these approaches can well treat the impact of high flows in hydrological modeling. This study proposes a Bayesian modularization uncertainty assessment approach in which the highest streamflow observations are treated as suspect information that should not influence the inference of the main bulk of the model parameters. This study includes a comprehensive comparison and evaluation of uncertainty assessments by our new Bayesian modularization method and standard Bayesian methods using the Metropolis-Hastings (MH) algorithm with the daily hydrological model WASMOD. Three likelihood functions were used in combination with standard Bayesian method: the AR(1) plus Normal model independent of time (Model 1), the AR(1) plus Normal model dependent on time (Model 2) and the AR(1) plus Multi-normal model (Model 3). The results reveal that the Bayesian modularization method provides the most accurate streamflow estimates measured by the Nash-Sutcliffe efficiency and provide the best in uncertainty estimates for low, medium and entire flows compared to standard Bayesian methods. The study thus provides a new approach for reducing the impact of high flows on the discharge uncertainty assessment of hydrological models via Bayesian method.
Enhancing hydrologic data assimilation by evolutionary Particle Filter and Markov Chain Monte Carlo
NASA Astrophysics Data System (ADS)
Abbaszadeh, Peyman; Moradkhani, Hamid; Yan, Hongxiang
2018-01-01
Particle Filters (PFs) have received increasing attention by researchers from different disciplines including the hydro-geosciences, as an effective tool to improve model predictions in nonlinear and non-Gaussian dynamical systems. The implication of dual state and parameter estimation using the PFs in hydrology has evolved since 2005 from the PF-SIR (sampling importance resampling) to PF-MCMC (Markov Chain Monte Carlo), and now to the most effective and robust framework through evolutionary PF approach based on Genetic Algorithm (GA) and MCMC, the so-called EPFM. In this framework, the prior distribution undergoes an evolutionary process based on the designed mutation and crossover operators of GA. The merit of this approach is that the particles move to an appropriate position by using the GA optimization and then the number of effective particles is increased by means of MCMC, whereby the particle degeneracy is avoided and the particle diversity is improved. In this study, the usefulness and effectiveness of the proposed EPFM is investigated by applying the technique on a conceptual and highly nonlinear hydrologic model over four river basins located in different climate and geographical regions of the United States. Both synthetic and real case studies demonstrate that the EPFM improves both the state and parameter estimation more effectively and reliably as compared with the PF-MCMC.
NASA Astrophysics Data System (ADS)
Zielke, Olaf; McDougall, Damon; Mai, Martin; Babuska, Ivo
2014-05-01
Seismic, often augmented with geodetic data, are frequently used to invert for the spatio-temporal evolution of slip along a rupture plane. The resulting images of the slip evolution for a single event, inferred by different research teams, often vary distinctly, depending on the adopted inversion approach and rupture model parameterization. This observation raises the question, which of the provided kinematic source inversion solutions is most reliable and most robust, and — more generally — how accurate are fault parameterization and solution predictions? These issues are not included in "standard" source inversion approaches. Here, we present a statistical inversion approach to constrain kinematic rupture parameters from teleseismic body waves. The approach is based a) on a forward-modeling scheme that computes synthetic (body-)waves for a given kinematic rupture model, and b) on the QUESO (Quantification of Uncertainty for Estimation, Simulation, and Optimization) library that uses MCMC algorithms and Bayes theorem for sample selection. We present Bayesian inversions for rupture parameters in synthetic earthquakes (i.e. for which the exact rupture history is known) in an attempt to identify the cross-over at which further model discretization (spatial and temporal resolution of the parameter space) is no longer attributed to a decreasing misfit. Identification of this cross-over is of importance as it reveals the resolution power of the studied data set (i.e. teleseismic body waves), enabling one to constrain kinematic earthquake rupture histories of real earthquakes at a resolution that is supported by data. In addition, the Bayesian approach allows for mapping complete posterior probability density functions of the desired kinematic source parameters, thus enabling us to rigorously assess the uncertainties in earthquake source inversions.
Bayesian estimation of post-Messinian divergence times in Balearic Island lizards.
Brown, R P; Terrasa, B; Pérez-Mellado, V; Castro, J A; Hoskisson, P A; Picornell, A; Ramon, M M
2008-07-01
Phylogenetic relationships and timings of major cladogenesis events are investigated in the Balearic Island lizards Podarcislilfordi and P.pityusensis using 2675bp of mitochondrial and nuclear DNA sequences. Partitioned Bayesian and Maximum Parsimony analyses provided a well-resolved phylogeny with high node-support values. Bayesian MCMC estimation of node dates was investigated by comparing means of posterior distributions from different subsets of the sequence against the most robust analysis which used multiple partitions and allowed for rate heterogeneity among branches under a rate-drift model. Evolutionary rates were systematically underestimated and thus divergence times overestimated when sequences containing lower numbers of variable sites were used (based on ingroup node constraints). The following analyses allowed the best recovery of node times under the constant-rate (i.e., perfect clock) model: (i) all cytochrome b sequence (partitioned by codon position), (ii) cytochrome b (codon position 3 alone), (iii) NADH dehydrogenase (subunits 1 and 2; partitioned by codon position), (iv) cytochrome b and NADH dehydrogenase sequence together (six gene-codon partitions), (v) all unpartitioned sequence, (vi) a full multipartition analysis (nine partitions). Of these, only (iv) and (vi) performed well under the rate-drift model. These findings have significant implications for dating of recent divergence times in other taxa. The earliest P.lilfordi cladogenesis event (divergence of Menorcan populations), occurred before the end of the Pliocene, some 2.6Ma. Subsequent events led to a West Mallorcan lineage (2.0Ma ago), followed 1.2Ma ago by divergence of populations from the southern part of the Cabrera archipelago from a widely-distributed group from north Cabrera, northern and southern Mallorcan islets. Divergence within P.pityusensis is more recent with the main Ibiza and Formentera clades sharing a common ancestor at about 1.0Ma ago. Climatic and sea level changes are likely to have initiated cladogenesis, with lineages making secondary contact during periodic landbridge formation. This oscillating cross-archipelago pattern in which ancient divergence is followed by repeated contact resembles that seen between East-West refugia populations from mainland Europe.
Uncertainty and the Social Cost of Methane Using Bayesian Constrained Climate Models
NASA Astrophysics Data System (ADS)
Errickson, F. C.; Anthoff, D.; Keller, K.
2016-12-01
Social cost estimates of greenhouse gases are important for the design of sound climate policies and are also plagued by uncertainty. One major source of uncertainty stems from the simplified representation of the climate system used in the integrated assessment models that provide these social cost estimates. We explore how uncertainty over the social cost of methane varies with the way physical processes and feedbacks in the methane cycle are modeled by (i) coupling three different methane models to a simple climate model, (ii) using MCMC to perform a Bayesian calibration of the three coupled climate models that simulates direct sampling from the joint posterior probability density function (pdf) of model parameters, and (iii) producing probabilistic climate projections that are then used to calculate the Social Cost of Methane (SCM) with the DICE and FUND integrated assessment models. We find that including a temperature feedback in the methane cycle acts as an additional constraint during the calibration process and results in a correlation between the tropospheric lifetime of methane and several climate model parameters. This correlation is not seen in the models lacking this feedback. Several of the estimated marginal pdfs of the model parameters also exhibit different distributional shapes and expected values depending on the methane model used. As a result, probabilistic projections of the climate system out to the year 2300 exhibit different levels of uncertainty and magnitudes of warming for each of the three models under an RCP8.5 scenario. We find these differences in climate projections result in differences in the distributions and expected values for our estimates of the SCM. We also examine uncertainty about the SCM by performing a Monte Carlo analysis using a distribution for the climate sensitivity while holding all other climate model parameters constant. Our SCM estimates using the Bayesian calibration are lower and exhibit less uncertainty about extremely high values in the right tail of the distribution compared to the Monte Carlo approach. This finding has important climate policy implications and suggests previous work that accounts for climate model uncertainty by only varying the climate sensitivity parameter may overestimate the SCM.
Tau-REx: A new look at the retrieval of exoplanetary atmospheres
NASA Astrophysics Data System (ADS)
Waldmann, Ingo
2014-11-01
The field of exoplanetary spectroscopy is as fast moving as it is new. With an increasing amount of space and ground based instruments obtaining data on a large set of extrasolar planets we are indeed entering the era of exoplanetary characterisation. Permanently at the edge of instrument feasibility, it is as important as it is difficult to find the most optimal and objective methodologies to analysing and interpreting current data. This is particularly true for smaller and fainter Earth and Super-Earth type planets.For low to mid signal to noise (SNR) observations, we are prone to two sources of biases: 1) Prior selection in the data reduction and analysis; 2) Prior constraints on the spectral retrieval. In Waldmann et al. (2013), Morello et al. (2014) and Waldmann (2012, 2014) we have shown a prior-free approach to data analysis based on non-parametric machine learning techniques. Following these approaches we will present a new take on the spectral retrieval of extrasolar planets. Tau-REx (tau-retrieval of exoplanets) is a new line-by-line, atmospheric retrieval framework. In the past the decision on what opacity sources go into an atmospheric model were usually user defined. Manual input can lead to model biases and poor convergence of the atmospheric model to the data. In Tau-REx we have set out to solve this. Through custom built pattern recognition software, Tau-REx is able to rapidly identify the most likely atmospheric opacities from a large number of possible absorbers/emitters (ExoMol or HiTran data bases) and non-parametrically constrain the prior space for the Bayesian retrieval. Unlike other (MCMC based) techniques, Tau-REx is able to fully integrate high-dimensional log-likelihood spaces and to calculate the full Bayesian Evidence of the atmospheric models. We achieve this through a combination of Nested Sampling and a high degree of code parallelisation. This allows for an exact and unbiased Bayesian model selection and a fully mapping of potential model-data degeneracies. Together with non-parametric data de-trending of exoplanetary spectra, we can reach an un- precedented level of objectivity in our atmospheric characterisation of these foreign worlds.
Habibi, Neda
2014-10-15
The preparation and characterization of magnetite-carboxymethyl cellulose nano-composite (M-CMC) material is described. Magnetite nano-particles were synthesized by a modified co-precipitation method using ferrous chloride tetrahydrate and ferric chloride hexahydrate in ammonium hydroxide solution. The M-CMC nano-composite particles were synthesized by embedding the magnetite nanoparticles inside carboxymethyl cellulose (CMC) using a freshly prepared mixture of Fe3O4 with CMC precursor. Morphology, particle size, and structural properties of magnetite-carboxymethyl cellulose nano-composite was accomplished using X-ray powder diffraction (XRD), transmission electron microscopy (TEM), Fourier transformed infrared (FTIR) and field emission scanning electron microscopy (FESEM) analysis. As a result, magnetite nano-particles with an average size of 35nm were obtained. The biocompatible Fe3O4-carboxymethyl cellulose nano-composite particles obtained from the natural CMC polymers have a potential range of application in biomedical field. Copyright © 2014 Elsevier B.V. All rights reserved.
Asteroid mass estimation using Markov-chain Monte Carlo
NASA Astrophysics Data System (ADS)
Siltala, Lauri; Granvik, Mikael
2017-11-01
Estimates for asteroid masses are based on their gravitational perturbations on the orbits of other objects such as Mars, spacecraft, or other asteroids and/or their satellites. In the case of asteroid-asteroid perturbations, this leads to an inverse problem in at least 13 dimensions where the aim is to derive the mass of the perturbing asteroid(s) and six orbital elements for both the perturbing asteroid(s) and the test asteroid(s) based on astrometric observations. We have developed and implemented three different mass estimation algorithms utilizing asteroid-asteroid perturbations: the very rough 'marching' approximation, in which the asteroids' orbital elements are not fitted, thereby reducing the problem to a one-dimensional estimation of the mass, an implementation of the Nelder-Mead simplex method, and most significantly, a Markov-chain Monte Carlo (MCMC) approach. We describe each of these algorithms with particular focus on the MCMC algorithm, and present example results using both synthetic and real data. Our results agree with the published mass estimates, but suggest that the published uncertainties may be misleading as a consequence of using linearized mass-estimation methods. Finally, we discuss remaining challenges with the algorithms as well as future plans.
Estimating Tree Height-Diameter Models with the Bayesian Method
Duan, Aiguo; Zhang, Jianguo; Xiang, Congwei
2014-01-01
Six candidate height-diameter models were used to analyze the height-diameter relationships. The common methods for estimating the height-diameter models have taken the classical (frequentist) approach based on the frequency interpretation of probability, for example, the nonlinear least squares method (NLS) and the maximum likelihood method (ML). The Bayesian method has an exclusive advantage compared with classical method that the parameters to be estimated are regarded as random variables. In this study, the classical and Bayesian methods were used to estimate six height-diameter models, respectively. Both the classical method and Bayesian method showed that the Weibull model was the “best” model using data1. In addition, based on the Weibull model, data2 was used for comparing Bayesian method with informative priors with uninformative priors and classical method. The results showed that the improvement in prediction accuracy with Bayesian method led to narrower confidence bands of predicted value in comparison to that for the classical method, and the credible bands of parameters with informative priors were also narrower than uninformative priors and classical method. The estimated posterior distributions for parameters can be set as new priors in estimating the parameters using data2. PMID:24711733
Estimating tree height-diameter models with the Bayesian method.
Zhang, Xiongqing; Duan, Aiguo; Zhang, Jianguo; Xiang, Congwei
2014-01-01
Six candidate height-diameter models were used to analyze the height-diameter relationships. The common methods for estimating the height-diameter models have taken the classical (frequentist) approach based on the frequency interpretation of probability, for example, the nonlinear least squares method (NLS) and the maximum likelihood method (ML). The Bayesian method has an exclusive advantage compared with classical method that the parameters to be estimated are regarded as random variables. In this study, the classical and Bayesian methods were used to estimate six height-diameter models, respectively. Both the classical method and Bayesian method showed that the Weibull model was the "best" model using data1. In addition, based on the Weibull model, data2 was used for comparing Bayesian method with informative priors with uninformative priors and classical method. The results showed that the improvement in prediction accuracy with Bayesian method led to narrower confidence bands of predicted value in comparison to that for the classical method, and the credible bands of parameters with informative priors were also narrower than uninformative priors and classical method. The estimated posterior distributions for parameters can be set as new priors in estimating the parameters using data2.
NASA Astrophysics Data System (ADS)
Ramirez, A. L.; Foxall, W.
2011-12-01
Surface displacements caused by reservoir pressure perturbations resulting from CO2 injection can often be measured by geodetic methods such as InSAR, tilt and GPS. We have developed a Markov Chain Monte Carlo (MCMC) approach to invert surface displacements measured by InSAR to map the pressure distribution associated with CO2 injection at the In Salah Krechba field, Algeria. The MCMC inversion entails sampling the solution space by proposing a series of trial 3D pressure-plume models. In the case of In Salah, the range of allowable models is constrained by prior information provided by well and geophysical data for the reservoir and possible fluid pathways in the overburden, and injection pressures and volumes. Each trial pressure distribution source is run through a (mathematical) forward model to calculate a set of synthetic surface deformation data. The likelihood that a particular proposal represents the true source is determined from the fit of the calculated data to the InSAR measurements, and those having higher likelihoods are passed to the posterior distribution. This procedure is repeated over typically ~104 - 105 trials until the posterior distribution converges to a stable solution. The solution to each stochastic inversion is in the form of Bayesian posterior probability density function (pdf) over the range of the alternative models that are consistent with the measured data and prior information. Therefore, the solution provides not only the highest likelihood model but also a realistic estimate of the solution uncertainty. Our InSalah work considered three flow model alternatives: 1) The first model assumed that the CO2 saturation and fluid pressure changes were confined to the reservoir; 2) the second model allowed the perturbations to occur also in a damage zone inferred in the lower caprock from 3D seismic surveys; and 3) the third model allowed fluid pressure changes anywhere within the reservoir and overburden. Alternative (2) yielded optimal fits to the data in inversions of InSAR data collected in 2007. The results indicate that pressure changes developed near the injection well and then penetrated into the lower caprock along the postulated damage zone. As in many geophysical inverse problems, inversion of surface displacement data for subsurface sources of deformation is inherently uncertain and non-unique. We will also discuss the approach used to characterize solution uncertainty. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
A SAS Interface for Bayesian Analysis with WinBUGS
ERIC Educational Resources Information Center
Zhang, Zhiyong; McArdle, John J.; Wang, Lijuan; Hamagami, Fumiaki
2008-01-01
Bayesian methods are becoming very popular despite some practical difficulties in implementation. To assist in the practical application of Bayesian methods, we show how to implement Bayesian analysis with WinBUGS as part of a standard set of SAS routines. This implementation procedure is first illustrated by fitting a multiple regression model…
Groth, Katrina M.; Smith, Curtis L.; Swiler, Laura P.
2014-04-05
In the past several years, several international agencies have begun to collect data on human performance in nuclear power plant simulators [1]. This data provides a valuable opportunity to improve human reliability analysis (HRA), but there improvements will not be realized without implementation of Bayesian methods. Bayesian methods are widely used in to incorporate sparse data into models in many parts of probabilistic risk assessment (PRA), but Bayesian methods have not been adopted by the HRA community. In this article, we provide a Bayesian methodology to formally use simulator data to refine the human error probabilities (HEPs) assigned by existingmore » HRA methods. We demonstrate the methodology with a case study, wherein we use simulator data from the Halden Reactor Project to update the probability assignments from the SPAR-H method. The case study demonstrates the ability to use performance data, even sparse data, to improve existing HRA methods. Furthermore, this paper also serves as a demonstration of the value of Bayesian methods to improve the technical basis of HRA.« less
DLRS: gene tree evolution in light of a species tree.
Sjöstrand, Joel; Sennblad, Bengt; Arvestad, Lars; Lagergren, Jens
2012-11-15
PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events, a relaxed molecular clock and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters. PrIME-DLRS is available for Java SE 6+ under the New BSD License, and JAR files and source code can be downloaded from http://code.google.com/p/jprime/. There is also a slightly older C++ version available as a binary package for Ubuntu, with download instructions at http://prime.sbc.su.se. The C++ source code is available upon request. joel.sjostrand@scilifelab.se or jens.lagergren@scilifelab.se. PrIME-DLRS is based on a sound probabilistic model (Åkerborg et al., 2009) and has been thoroughly validated on synthetic and biological datasets (Supplementary Material online).
An information-theoretic approach to the gravitational-wave burst detection problem
NASA Astrophysics Data System (ADS)
Katsavounidis, E.; Lynch, R.; Vitale, S.; Essick, R.; Robinet, F.
2016-03-01
The advanced era of gravitational-wave astronomy, with data collected in part by the LIGO gravitational-wave interferometers, has begun as of fall 2015. One potential type of detectable gravitational waves is short-duration gravitational-wave bursts, whose waveforms can be difficult to predict. We present the framework for a new detection algorithm - called oLIB - that can be used in relatively low-latency to turn calibrated strain data into a detection significance statement. This pipeline consists of 1) a sine-Gaussian matched-filter trigger generator based on the Q-transform - known as Omicron -, 2) incoherent down-selection of these triggers to the most signal-like set, and 3) a fully coherent analysis of this signal-like set using the Markov chain Monte Carlo (MCMC) Bayesian evidence calculator LALInferenceBurst (LIB). We optimally extract this information by using a likelihood-ratio test (LRT) to map these search statistics into a significance statement. Using representative archival LIGO data, we show that the algorithm can detect gravitational-wave burst events of realistic strength in realistic instrumental noise with good detection efficiencies across different burst waveform morphologies. With support from the National Science Foundation under Grant PHY-0757058.
Chen, Changyou; Buntine, Wray; Ding, Nan; Xie, Lexing; Du, Lan
2015-02-01
In applications we may want to compare different document collections: they could have shared content but also different and unique aspects in particular collections. This task has been called comparative text mining or cross-collection modeling. We present a differential topic model for this application that models both topic differences and similarities. For this we use hierarchical Bayesian nonparametric models. Moreover, we found it was important to properly model power-law phenomena in topic-word distributions and thus we used the full Pitman-Yor process rather than just a Dirichlet process. Furthermore, we propose the transformed Pitman-Yor process (TPYP) to incorporate prior knowledge such as vocabulary variations in different collections into the model. To deal with the non-conjugate issue between model prior and likelihood in the TPYP, we thus propose an efficient sampling algorithm using a data augmentation technique based on the multinomial theorem. Experimental results show the model discovers interesting aspects of different collections. We also show the proposed MCMC based algorithm achieves a dramatically reduced test perplexity compared to some existing topic models. Finally, we show our model outperforms the state-of-the-art for document classification/ideology prediction on a number of text collections.
Pattern analysis of community health center location in Surabaya using spatial Poisson point process
NASA Astrophysics Data System (ADS)
Kusumaningrum, Choriah Margareta; Iriawan, Nur; Winahju, Wiwiek Setya
2017-11-01
Community health center (puskesmas) is one of the closest health service facilities for the community, which provide healthcare for population on sub-district level as one of the government-mandated community health clinics located across Indonesia. The increasing number of this puskesmas does not directly comply the fulfillment of basic health services needed in such region. Ideally, a puskesmas has to cover up to maximum 30,000 people. The number of puskesmas in Surabaya indicates an unbalance spread in all of the area. This research aims to analyze the spread of puskesmas in Surabaya using spatial Poisson point process model in order to get the effective location of Surabaya's puskesmas which based on their location. The results of the analysis showed that the distribution pattern of puskesmas in Surabaya is non-homogeneous Poisson process and is approched by mixture Poisson model. Based on the estimated model obtained by using Bayesian mixture model couple with MCMC process, some characteristics of each puskesmas have no significant influence as factors to decide the addition of health center in such location. Some factors related to the areas of sub-districts have to be considered as covariate to make a decision adding the puskesmas in Surabaya.
Probing dim point sources in the inner Milky Way using PCAT
NASA Astrophysics Data System (ADS)
Daylan, Tansu; Portillo, Stephen K. N.; Finkbeiner, Douglas P.
2017-01-01
Poisson regression of the Fermi-LAT data in the inner Milky Way reveals an extended gamma-ray excess. An important question is whether the signal is coming from a collection of unresolved point sources, possibly old recycled pulsars, or constitutes a truly diffuse emission component. Previous analyses have relied on non-Poissonian template fits or wavelet decomposition of the Fermi-LAT data, which find evidence for a population of dim point sources just below the 3FGL flux limit. In order to be able to draw conclusions about the flux distribution of point sources at the dim end, we employ a Bayesian trans-dimensional MCMC framework by taking samples from the space of catalogs consistent with the observed gamma-ray emission in the inner Milky Way. The software implementation, PCAT (Probabilistic Cataloger), is designed to efficiently explore that catalog space in the crowded field limit such as in the galactic plane, where the model PSF, point source positions and fluxes are highly degenerate. We thus generate fair realizations of the underlying MSP population in the inner galaxy and constrain the population characteristics such as the radial and flux distribution of such sources.
Andersen, Heidi L; Ekman, Stefan
2005-01-01
The phylogeny of the family Micareaceae and the genus Micarea was studied using mitochondrial small subunit ribosomal DNA sequences. Phylogenetic reconstructions were performed using Bayesian MCMC tree sampling and a maximum likelihood approach. The Micareaceae in its current sense is highly heterogeneous, and Helocarpon, Psilolechia, and Scutula, all thought to be close relatives of Micarea, are shown to be only distantly related. The genus Micarea is paraphyletic unless the entire Pilocarpaceae and Ectolechiaceae are included, as also indicated by an expected likelihood weights test. It is suggested that the Micareaceae is reduced to synonymy with the Pilocarpaceae, which also includes the Ectolechiaceae, and that Micarea may have to be divided into a series of smaller genera in the future. Micarea species with a 'non-micareoid' photobiont group with Psora and the Ramalinaceae, whereas Micarea intrusa appears to belong in Scoliciosporum. Three species fall inside the paraphyletic Micarea: Szczawinskia tsugae, Catillaria contristans, and Fellhaneropsis vezdae. Tropical foliicolous taxa are nested within groups of mainly temperate and arctic-alpine distribution. A 'micareoid' photobiont appears to be plesiomorphic in the Pilocarpaceae but has been lost a few times.
Buesing, Lars; Bill, Johannes; Nessler, Bernhard; Maass, Wolfgang
2011-01-01
The organization of computations in networks of spiking neurons in the brain is still largely unknown, in particular in view of the inherently stochastic features of their firing activity and the experimentally observed trial-to-trial variability of neural systems in the brain. In principle there exists a powerful computational framework for stochastic computations, probabilistic inference by sampling, which can explain a large number of macroscopic experimental data in neuroscience and cognitive science. But it has turned out to be surprisingly difficult to create a link between these abstract models for stochastic computations and more detailed models of the dynamics of networks of spiking neurons. Here we create such a link and show that under some conditions the stochastic firing activity of networks of spiking neurons can be interpreted as probabilistic inference via Markov chain Monte Carlo (MCMC) sampling. Since common methods for MCMC sampling in distributed systems, such as Gibbs sampling, are inconsistent with the dynamics of spiking neurons, we introduce a different approach based on non-reversible Markov chains that is able to reflect inherent temporal processes of spiking neuronal activity through a suitable choice of random variables. We propose a neural network model and show by a rigorous theoretical analysis that its neural activity implements MCMC sampling of a given distribution, both for the case of discrete and continuous time. This provides a step towards closing the gap between abstract functional models of cortical computation and more detailed models of networks of spiking neurons. PMID:22096452
Bayesian Inference for Functional Dynamics Exploring in fMRI Data.
Guo, Xuan; Liu, Bing; Chen, Le; Chen, Guantao; Pan, Yi; Zhang, Jing
2016-01-01
This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM), Bayesian Connectivity Change Point Model (BCCPM), and Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.
NASA Astrophysics Data System (ADS)
Stanaway, D. J.; Flores, A. N.; Haggerty, R.; Benner, S. G.; Feris, K. P.
2011-12-01
Concurrent assessment of biogeochemical and solute transport data (i.e. advection, dispersion, transient storage) within lotic systems remains a challenge in eco-hydrological research. Recently, the Resazurin-Resorufin Smart Tracer System (RRST) was proposed as a mechanism to measure microbial activity at the sediment-water interface [Haggerty et al., 2008, 2009] associating metabolic and hydrologic processes and allowing for the reach scale extrapolation of biotic function in the context of a dynamic physical environment. This study presents a Markov Chain Monte Carlo (MCMC) data assimilation technique to solve the inverse model of the Raz Rru Advection Dispersion Equation (RRADE). The RRADE is a suite of dependent 1-D reactive ADEs, associated through the microbially mediated reduction of Raz to Rru (k12). This reduction is proportional to DO consumption (R^2=0.928). MCMC is a suite of algorithms that solve Bayes theorem to condition uncertain model states and parameters on imperfect observations. Here, the RRST is employed to quantify the effect of chronic metal exposure on hyporheic microbial metabolism along a 100+ year old metal contamination gradient in the Clark Fork River (CF). We hypothesized that 1) the energetic cost of metal tolerance limits heterotrophic microbial respiration in communities evolved in chronic metal contaminated environments, with respiration inhibition directly correlated to degree of contamination (observational experiment) and 2) when experiencing acute metal stress, respiration rate inhibition of metal tolerant communities is less than that of naïve communities (manipulative experiment). To test these hypotheses, 4 replicate columns containing sediment collected from differently contaminated CF reaches and reference sites were fed a solution of RRST, NaCl, and cadmium (manipulative experiment only) within 24 hrs post collection. Column effluent was collected and measured for Raz, Rru, and EC to determine the Raz Rru breakthrough curves (BTC), subsequently modeled by the RRADE and thereby allowing derivation of in situ rates of metabolism. RRADE parameter values are estimated through Metropolis Hastings MCMC optimization. Unknown prior parameter distributions (PD) were constrained via a sensitivity analysis, except for the empirically estimated velocity. MCMC simulations were initiated at random points within the PD. Convergence of target distributions (TD) is achieved when the variance of the mode values of the six RRADE parameters in independent model replication is at least 10^{-3} less than the mode value. Convergence of k12, the parameter of interest, was more resolved, with modal variance of replicate simulations ranging from 10^{-4} less than the modal value to 0. The MCMC algorithm presented here offers a robust approach to solve the inverse RRST model and could be easily adapted to other inverse problems.
On Bayesian Testing of Additive Conjoint Measurement Axioms Using Synthetic Likelihood
ERIC Educational Resources Information Center
Karabatsos, George
2017-01-01
This article introduces a Bayesian method for testing the axioms of additive conjoint measurement. The method is based on an importance sampling algorithm that performs likelihood-free, approximate Bayesian inference using a synthetic likelihood to overcome the analytical intractability of this testing problem. This new method improves upon…
Power in Bayesian Mediation Analysis for Small Sample Research
Miočević, Milica; MacKinnon, David P.; Levy, Roy
2018-01-01
It was suggested that Bayesian methods have potential for increasing power in mediation analysis (Koopman, Howe, Hollenbeck, & Sin, 2015; Yuan & MacKinnon, 2009). This paper compares the power of Bayesian credibility intervals for the mediated effect to the power of normal theory, distribution of the product, percentile, and bias-corrected bootstrap confidence intervals at N≤ 200. Bayesian methods with diffuse priors have power comparable to the distribution of the product and bootstrap methods, and Bayesian methods with informative priors had the most power. Varying degrees of precision of prior distributions were also examined. Increased precision led to greater power only when N≥ 100 and the effects were small, N < 60 and the effects were large, and N < 200 and the effects were medium. An empirical example from psychology illustrated a Bayesian analysis of the single mediator model from prior selection to interpreting results. PMID:29662296
Power in Bayesian Mediation Analysis for Small Sample Research.
Miočević, Milica; MacKinnon, David P; Levy, Roy
2017-01-01
It was suggested that Bayesian methods have potential for increasing power in mediation analysis (Koopman, Howe, Hollenbeck, & Sin, 2015; Yuan & MacKinnon, 2009). This paper compares the power of Bayesian credibility intervals for the mediated effect to the power of normal theory, distribution of the product, percentile, and bias-corrected bootstrap confidence intervals at N≤ 200. Bayesian methods with diffuse priors have power comparable to the distribution of the product and bootstrap methods, and Bayesian methods with informative priors had the most power. Varying degrees of precision of prior distributions were also examined. Increased precision led to greater power only when N≥ 100 and the effects were small, N < 60 and the effects were large, and N < 200 and the effects were medium. An empirical example from psychology illustrated a Bayesian analysis of the single mediator model from prior selection to interpreting results.
Spatio-temporal interpolation of precipitation during monsoon periods in Pakistan
NASA Astrophysics Data System (ADS)
Hussain, Ijaz; Spöck, Gunter; Pilz, Jürgen; Yu, Hwa-Lung
2010-08-01
Spatio-temporal estimation of precipitation over a region is essential to the modeling of hydrologic processes for water resources management. The changes of magnitude and space-time heterogeneity of rainfall observations make space-time estimation of precipitation a challenging task. In this paper we propose a Box-Cox transformed hierarchical Bayesian multivariate spatio-temporal interpolation method for the skewed response variable. The proposed method is applied to estimate space-time monthly precipitation in the monsoon periods during 1974-2000, and 27-year monthly average precipitation data are obtained from 51 stations in Pakistan. The results of transformed hierarchical Bayesian multivariate spatio-temporal interpolation are compared to those of non-transformed hierarchical Bayesian interpolation by using cross-validation. The software developed by [11] is used for Bayesian non-stationary multivariate space-time interpolation. It is observed that the transformed hierarchical Bayesian method provides more accuracy than the non-transformed hierarchical Bayesian method.
A General Model for Estimating Macroevolutionary Landscapes.
Boucher, Florian C; Démery, Vincent; Conti, Elena; Harmon, Luke J; Uyeda, Josef
2018-03-01
The evolution of quantitative characters over long timescales is often studied using stochastic diffusion models. The current toolbox available to students of macroevolution is however limited to two main models: Brownian motion and the Ornstein-Uhlenbeck process, plus some of their extensions. Here, we present a very general model for inferring the dynamics of quantitative characters evolving under both random diffusion and deterministic forces of any possible shape and strength, which can accommodate interesting evolutionary scenarios like directional trends, disruptive selection, or macroevolutionary landscapes with multiple peaks. This model is based on a general partial differential equation widely used in statistical mechanics: the Fokker-Planck equation, also known in population genetics as the Kolmogorov forward equation. We thus call the model FPK, for Fokker-Planck-Kolmogorov. We first explain how this model can be used to describe macroevolutionary landscapes over which quantitative traits evolve and, more importantly, we detail how it can be fitted to empirical data. Using simulations, we show that the model has good behavior both in terms of discrimination from alternative models and in terms of parameter inference. We provide R code to fit the model to empirical data using either maximum-likelihood or Bayesian estimation, and illustrate the use of this code with two empirical examples of body mass evolution in mammals. FPK should greatly expand the set of macroevolutionary scenarios that can be studied since it opens the way to estimating macroevolutionary landscapes of any conceivable shape. [Adaptation; bounds; diffusion; FPK model; macroevolution; maximum-likelihood estimation; MCMC methods; phylogenetic comparative data; selection.].
Mcmc Signal Extraction For 21-cm Global Signal Experiments
NASA Astrophysics Data System (ADS)
Harker, Geraint
2012-05-01
Measurements of the highly redshifted 21-cm line promise to provide a great deal of information about the dark ages of the Universe, the cosmic dawn and the epoch of reionization. It is generally accepted that strong astrophysical foregrounds are a major obstacle to overcome before this promise is realised, largely because of the way they are filtered through a complicated instrumental response. A great deal of work has therefore been devoted to studying foreground removal for observations with the low-frequency radio arrays which are starting to collect data. The case of so-called 'global signal' experiments has received less attention, however. I will compare the foreground fitting problem in these two types of experiments, and describe a foreground fitting methodology which has been developed for a proposed global signal experiment, the Dark Ages Radio Explorer (DARE), which will make use of the pristine radio-frequency environment over the far side of the Moon. The method, a fully Bayesian technique based on a Markov Chain Monte Carlo code will, however, be applicable more generally to other space- and ground-based experiments, including the prototype DARE antenna being deployed in Western Australia. For ground-based experiments, we must also contend with effects from the Earth's ionosphere and low-level radio-frequency interference. I will show early results from applying our algorithm to data from the prototype and the EDGES experiment. GH is a member of the LUNAR consortium, which is funded by the NASA Lunar Science Institute (via Cooperative Agreement NNA09DB30A) to investigate concepts for astrophysical observatories on the Moon.
A Comparison of the β-Substitution Method and a Bayesian Method for Analyzing Left-Censored Data
Huynh, Tran; Quick, Harrison; Ramachandran, Gurumurthy; Banerjee, Sudipto; Stenzel, Mark; Sandler, Dale P.; Engel, Lawrence S.; Kwok, Richard K.; Blair, Aaron; Stewart, Patricia A.
2016-01-01
Classical statistical methods for analyzing exposure data with values below the detection limits are well described in the occupational hygiene literature, but an evaluation of a Bayesian approach for handling such data is currently lacking. Here, we first describe a Bayesian framework for analyzing censored data. We then present the results of a simulation study conducted to compare the β-substitution method with a Bayesian method for exposure datasets drawn from lognormal distributions and mixed lognormal distributions with varying sample sizes, geometric standard deviations (GSDs), and censoring for single and multiple limits of detection. For each set of factors, estimates for the arithmetic mean (AM), geometric mean, GSD, and the 95th percentile (X0.95) of the exposure distribution were obtained. We evaluated the performance of each method using relative bias, the root mean squared error (rMSE), and coverage (the proportion of the computed 95% uncertainty intervals containing the true value). The Bayesian method using non-informative priors and the β-substitution method were generally comparable in bias and rMSE when estimating the AM and GM. For the GSD and the 95th percentile, the Bayesian method with non-informative priors was more biased and had a higher rMSE than the β-substitution method, but use of more informative priors generally improved the Bayesian method’s performance, making both the bias and the rMSE more comparable to the β-substitution method. An advantage of the Bayesian method is that it provided estimates of uncertainty for these parameters of interest and good coverage, whereas the β-substitution method only provided estimates of uncertainty for the AM, and coverage was not as consistent. Selection of one or the other method depends on the needs of the practitioner, the availability of prior information, and the distribution characteristics of the measurement data. We suggest the use of Bayesian methods if the practitioner has the computational resources and prior information, as the method would generally provide accurate estimates and also provides the distributions of all of the parameters, which could be useful for making decisions in some applications. PMID:26209598
NASA Astrophysics Data System (ADS)
Mendez, Rene A.; Claveria, Ruben M.; Orchard, Marcos E.; Silva, Jorge F.
2017-11-01
We present orbital elements and mass sums for 18 visual binary stars of spectral types B to K (five of which are new orbits) with periods ranging from 20 to more than 500 yr. For two double-line spectroscopic binaries with no previous orbits, the individual component masses, using combined astrometric and radial velocity data, have a formal uncertainty of ˜ 0.1 {M}⊙ . Adopting published photometry and trigonometric parallaxes, plus our own measurements, we place these objects on an H-R diagram and discuss their evolutionary status. These objects are part of a survey to characterize the binary population of stars in the Southern Hemisphere using the SOAR 4 m telescope+HRCAM at CTIO. Orbital elements are computed using a newly developed Markov chain Monte Carlo (MCMC) algorithm that delivers maximum-likelihood estimates of the parameters, as well as posterior probability density functions that allow us to evaluate the uncertainty of our derived parameters in a robust way. For spectroscopic binaries, using our approach, it is possible to derive a self-consistent parallax for the system from the combined astrometric and radial velocity data (“orbital parallax”), which compares well with the trigonometric parallaxes. We also present a mathematical formalism that allows a dimensionality reduction of the feature space from seven to three search parameters (or from 10 to seven dimensions—including parallax—in the case of spectroscopic binaries with astrometric data), which makes it possible to explore a smaller number of parameters in each case, improving the computational efficiency of our MCMC code. Based on observations obtained at the Southern Astrophysical Research (SOAR) telescope, which is a joint project of the Ministério da Ciência, Tecnologia, e Inovação (MCTI) da República Federativa do Brasil, the U.S. National Optical Astronomy Observatory (NOAO), the University of North Carolina at Chapel Hill (UNC), and Michigan State University (MSU).
Fernando, Rohan L; Cheng, Hao; Golden, Bruce L; Garrick, Dorian J
2016-12-08
Two types of models have been used for single-step genomic prediction and genome-wide association studies that include phenotypes from both genotyped animals and their non-genotyped relatives. The two types are breeding value models (BVM) that fit breeding values explicitly and marker effects models (MEM) that express the breeding values in terms of the effects of observed or imputed genotypes. MEM can accommodate a wider class of analyses, including variable selection or mixture model analyses. The order of the equations that need to be solved and the inverses required in their construction vary widely, and thus the computational effort required depends upon the size of the pedigree, the number of genotyped animals and the number of loci. We present computational strategies to avoid storing large, dense blocks of the MME that involve imputed genotypes. Furthermore, we present a hybrid model that fits a MEM for animals with observed genotypes and a BVM for those without genotypes. The hybrid model is computationally attractive for pedigree files containing millions of animals with a large proportion of those being genotyped. We demonstrate the practicality on both the original MEM and the hybrid model using real data with 6,179,960 animals in the pedigree with 4,934,101 phenotypes and 31,453 animals genotyped at 40,214 informative loci. To complete a single-trait analysis on a desk-top computer with four graphics cards required about 3 h using the hybrid model to obtain both preconditioned conjugate gradient solutions and 42,000 Markov chain Monte-Carlo (MCMC) samples of breeding values, which allowed making inferences from posterior means, variances and covariances. The MCMC sampling required one quarter of the effort when the hybrid model was used compared to the published MEM. We present a hybrid model that fits a MEM for animals with genotypes and a BVM for those without genotypes. Its practicality and considerable reduction in computing effort was demonstrated. This model can readily be extended to accommodate multiple traits, multiple breeds, maternal effects, and additional random effects such as polygenic residual effects.
NASA Astrophysics Data System (ADS)
Yan, Hongxiang; Moradkhani, Hamid; Abbaszadeh, Peyman
2017-04-01
Assimilation of satellite soil moisture and streamflow data into hydrologic models using has received increasing attention over the past few years. Currently, these observations are increasingly used to improve the model streamflow and soil moisture predictions. However, the performance of this land data assimilation (DA) system still suffers from two limitations: 1) satellite data scarcity and quality; and 2) particle weight degeneration. In order to overcome these two limitations, we propose two possible solutions in this study. First, the general Gaussian geostatistical approach is proposed to overcome the limitation in the space/time resolution of satellite soil moisture products thus improving their accuracy at uncovered/biased grid cells. Secondly, an evolutionary PF approach based on Genetic Algorithm (GA) and Markov Chain Monte Carlo (MCMC), the so-called EPF-MCMC, is developed to further reduce weight degeneration and improve the robustness of the land DA system. This study provides a detailed analysis of the joint and separate assimilation of streamflow and satellite soil moisture into a distributed Sacramento Soil Moisture Accounting (SAC-SMA) model, with the use of recently developed EPF-MCMC and the general Gaussian geostatistical approach. Performance is assessed over several basins in the USA selected from Model Parameter Estimation Experiment (MOPEX) and located in different climate regions. The results indicate that: 1) the general Gaussian approach can predict the soil moisture at uncovered grid cells within the expected satellite data quality threshold; 2) assimilation of satellite soil moisture inferred from the general Gaussian model can significantly improve the soil moisture predictions; and 3) in terms of both deterministic and probabilistic measures, the EPF-MCMC can achieve better streamflow predictions. These results recommend that the geostatistical model is a helpful tool to aid the remote sensing technique and the EPF-MCMC is a reliable and effective DA approach in hydrologic applications.
Bayesian flood forecasting methods: A review
NASA Astrophysics Data System (ADS)
Han, Shasha; Coulibaly, Paulin
2017-08-01
Over the past few decades, floods have been seen as one of the most common and largely distributed natural disasters in the world. If floods could be accurately forecasted in advance, then their negative impacts could be greatly minimized. It is widely recognized that quantification and reduction of uncertainty associated with the hydrologic forecast is of great importance for flood estimation and rational decision making. Bayesian forecasting system (BFS) offers an ideal theoretic framework for uncertainty quantification that can be developed for probabilistic flood forecasting via any deterministic hydrologic model. It provides suitable theoretical structure, empirically validated models and reasonable analytic-numerical computation method, and can be developed into various Bayesian forecasting approaches. This paper presents a comprehensive review on Bayesian forecasting approaches applied in flood forecasting from 1999 till now. The review starts with an overview of fundamentals of BFS and recent advances in BFS, followed with BFS application in river stage forecasting and real-time flood forecasting, then move to a critical analysis by evaluating advantages and limitations of Bayesian forecasting methods and other predictive uncertainty assessment approaches in flood forecasting, and finally discusses the future research direction in Bayesian flood forecasting. Results show that the Bayesian flood forecasting approach is an effective and advanced way for flood estimation, it considers all sources of uncertainties and produces a predictive distribution of the river stage, river discharge or runoff, thus gives more accurate and reliable flood forecasts. Some emerging Bayesian forecasting methods (e.g. ensemble Bayesian forecasting system, Bayesian multi-model combination) were shown to overcome limitations of single model or fixed model weight and effectively reduce predictive uncertainty. In recent years, various Bayesian flood forecasting approaches have been developed and widely applied, but there is still room for improvements. Future research in the context of Bayesian flood forecasting should be on assimilation of various sources of newly available information and improvement of predictive performance assessment methods.
Ghosh, Sujit K
2010-01-01
Bayesian methods are rapidly becoming popular tools for making statistical inference in various fields of science including biology, engineering, finance, and genetics. One of the key aspects of Bayesian inferential method is its logical foundation that provides a coherent framework to utilize not only empirical but also scientific information available to a researcher. Prior knowledge arising from scientific background, expert judgment, or previously collected data is used to build a prior distribution which is then combined with current data via the likelihood function to characterize the current state of knowledge using the so-called posterior distribution. Bayesian methods allow the use of models of complex physical phenomena that were previously too difficult to estimate (e.g., using asymptotic approximations). Bayesian methods offer a means of more fully understanding issues that are central to many practical problems by allowing researchers to build integrated models based on hierarchical conditional distributions that can be estimated even with limited amounts of data. Furthermore, advances in numerical integration methods, particularly those based on Monte Carlo methods, have made it possible to compute the optimal Bayes estimators. However, there is a reasonably wide gap between the background of the empirically trained scientists and the full weight of Bayesian statistical inference. Hence, one of the goals of this chapter is to bridge the gap by offering elementary to advanced concepts that emphasize linkages between standard approaches and full probability modeling via Bayesian methods.
Natanegara, Fanni; Neuenschwander, Beat; Seaman, John W; Kinnersley, Nelson; Heilmann, Cory R; Ohlssen, David; Rochester, George
2014-01-01
Bayesian applications in medical product development have recently gained popularity. Despite many advances in Bayesian methodology and computations, increase in application across the various areas of medical product development has been modest. The DIA Bayesian Scientific Working Group (BSWG), which includes representatives from industry, regulatory agencies, and academia, has adopted the vision to ensure Bayesian methods are well understood, accepted more broadly, and appropriately utilized to improve decision making and enhance patient outcomes. As Bayesian applications in medical product development are wide ranging, several sub-teams were formed to focus on various topics such as patient safety, non-inferiority, prior specification, comparative effectiveness, joint modeling, program-wide decision making, analytical tools, and education. The focus of this paper is on the recent effort of the BSWG Education sub-team to administer a Bayesian survey to statisticians across 17 organizations involved in medical product development. We summarize results of this survey, from which we provide recommendations on how to accelerate progress in Bayesian applications throughout medical product development. The survey results support findings from the literature and provide additional insight on regulatory acceptance of Bayesian methods and information on the need for a Bayesian infrastructure within an organization. The survey findings support the claim that only modest progress in areas of education and implementation has been made recently, despite substantial progress in Bayesian statistical research and software availability. Copyright © 2013 John Wiley & Sons, Ltd.
Model Diagnostics for Bayesian Networks
ERIC Educational Resources Information Center
Sinharay, Sandip
2006-01-01
Bayesian networks are frequently used in educational assessments primarily for learning about students' knowledge and skills. There is a lack of works on assessing fit of Bayesian networks. This article employs the posterior predictive model checking method, a popular Bayesian model checking tool, to assess fit of simple Bayesian networks. A…
On the validity of cosmological Fisher matrix forecasts
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolz, Laura; Kilbinger, Martin; Weller, Jochen
2012-09-01
We present a comparison of Fisher matrix forecasts for cosmological probes with Monte Carlo Markov Chain (MCMC) posterior likelihood estimation methods. We analyse the performance of future Dark Energy Task Force (DETF) stage-III and stage-IV dark-energy surveys using supernovae, baryon acoustic oscillations and weak lensing as probes. We concentrate in particular on the dark-energy equation of state parameters w{sub 0} and w{sub a}. For purely geometrical probes, and especially when marginalising over w{sub a}, we find considerable disagreement between the two methods, since in this case the Fisher matrix can not reproduce the highly non-elliptical shape of the likelihood function.more » More quantitatively, the Fisher method underestimates the marginalized errors for purely geometrical probes between 30%-70%. For cases including structure formation such as weak lensing, we find that the posterior probability contours from the Fisher matrix estimation are in good agreement with the MCMC contours and the forecasted errors only changing on the 5% level. We then explore non-linear transformations resulting in physically-motivated parameters and investigate whether these parameterisations exhibit a Gaussian behaviour. We conclude that for the purely geometrical probes and, more generally, in cases where it is not known whether the likelihood is close to Gaussian, the Fisher matrix is not the appropriate tool to produce reliable forecasts.« less
A guide to Bayesian model selection for ecologists
Hooten, Mevin B.; Hobbs, N.T.
2015-01-01
The steady upward trend in the use of model selection and Bayesian methods in ecological research has made it clear that both approaches to inference are important for modern analysis of models and data. However, in teaching Bayesian methods and in working with our research colleagues, we have noticed a general dissatisfaction with the available literature on Bayesian model selection and multimodel inference. Students and researchers new to Bayesian methods quickly find that the published advice on model selection is often preferential in its treatment of options for analysis, frequently advocating one particular method above others. The recent appearance of many articles and textbooks on Bayesian modeling has provided welcome background on relevant approaches to model selection in the Bayesian framework, but most of these are either very narrowly focused in scope or inaccessible to ecologists. Moreover, the methodological details of Bayesian model selection approaches are spread thinly throughout the literature, appearing in journals from many different fields. Our aim with this guide is to condense the large body of literature on Bayesian approaches to model selection and multimodel inference and present it specifically for quantitative ecologists as neutrally as possible. We also bring to light a few important and fundamental concepts relating directly to model selection that seem to have gone unnoticed in the ecological literature. Throughout, we provide only a minimal discussion of philosophy, preferring instead to examine the breadth of approaches as well as their practical advantages and disadvantages. This guide serves as a reference for ecologists using Bayesian methods, so that they can better understand their options and can make an informed choice that is best aligned with their goals for inference.
Iterative Importance Sampling Algorithms for Parameter Estimation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grout, Ray W; Morzfeld, Matthias; Day, Marcus S.
In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near perfect scaling with the number of cores on high performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is a challenging task. Several sampling algorithms have been proposed over the past years that take an iterative approach to constructing a proposal distribution. We investigate the applicabilitymore » of such algorithms by applying them to two realistic and challenging test problems, one in subsurface flow, and one in combustion modeling. More specifically, we implement importance sampling algorithms that iterate over the mean and covariance matrix of Gaussian or multivariate t-proposal distributions. Our implementation leverages massively parallel computers, and we present strategies to initialize the iterations using 'coarse' MCMC runs or Gaussian mixture models.« less
Park, Eun Sug; Symanski, Elaine; Han, Daikwon; Spiegelman, Clifford
2015-06-01
A major difficulty with assessing source-specific health effects is that source-specific exposures cannot be measured directly; rather, they need to be estimated by a source-apportionment method such as multivariate receptor modeling. The uncertainty in source apportionment (uncertainty in source-specific exposure estimates and model uncertainty due to the unknown number of sources and identifiability conditions) has been largely ignored in previous studies. Also, spatial dependence of multipollutant data collected from multiple monitoring sites has not yet been incorporated into multivariate receptor modeling. The objectives of this project are (1) to develop a multipollutant approach that incorporates both sources of uncertainty in source-apportionment into the assessment of source-specific health effects and (2) to develop enhanced multivariate receptor models that can account for spatial correlations in the multipollutant data collected from multiple sites. We employed a Bayesian hierarchical modeling framework consisting of multivariate receptor models, health-effects models, and a hierarchical model on latent source contributions. For the health model, we focused on the time-series design in this project. Each combination of number of sources and identifiability conditions (additional constraints on model parameters) defines a different model. We built a set of plausible models with extensive exploratory data analyses and with information from previous studies, and then computed posterior model probability to estimate model uncertainty. Parameter estimation and model uncertainty estimation were implemented simultaneously by Markov chain Monte Carlo (MCMC*) methods. We validated the methods using simulated data. We illustrated the methods using PM2.5 (particulate matter ≤ 2.5 μm in aerodynamic diameter) speciation data and mortality data from Phoenix, Arizona, and Houston, Texas. The Phoenix data included counts of cardiovascular deaths and daily PM2.5 speciation data from 1995-1997. The Houston data included respiratory mortality data and 24-hour PM2.5 speciation data sampled every six days from a region near the Houston Ship Channel in years 2002-2005. We also developed a Bayesian spatial multivariate receptor modeling approach that, while simultaneously dealing with the unknown number of sources and identifiability conditions, incorporated spatial correlations in the multipollutant data collected from multiple sites into the estimation of source profiles and contributions based on the discrete process convolution model for multivariate spatial processes. This new modeling approach was applied to 24-hour ambient air concentrations of 17 volatile organic compounds (VOCs) measured at nine monitoring sites in Harris County, Texas, during years 2000 to 2005. Simulation results indicated that our methods were accurate in identifying the true model and estimated parameters were close to the true values. The results from our methods agreed in general with previous studies on the source apportionment of the Phoenix data in terms of estimated source profiles and contributions. However, we had a greater number of statistically insignificant findings, which was likely a natural consequence of incorporating uncertainty in the estimated source contributions into the health-effects parameter estimation. For the Houston data, a model with five sources (that seemed to be Sulfate-Rich Secondary Aerosol, Motor Vehicles, Industrial Combustion, Soil/Crustal Matter, and Sea Salt) showed the highest posterior model probability among the candidate models considered when fitted simultaneously to the PM2.5 and mortality data. There was a statistically significant positive association between respiratory mortality and same-day PM2.5 concentrations attributed to one of the sources (probably industrial combustion). The Bayesian spatial multivariate receptor modeling approach applied to the VOC data led to a highest posterior model probability for a model with five sources (that seemed to be refinery, petrochemical production, gasoline evaporation, natural gas, and vehicular exhaust) among several candidate models, with the number of sources varying between three and seven and with different identifiability conditions. Our multipollutant approach assessing source-specific health effects is more advantageous than a single-pollutant approach in that it can estimate total health effects from multiple pollutants and can also identify emission sources that are responsible for adverse health effects. Our Bayesian approach can incorporate not only uncertainty in the estimated source contributions, but also model uncertainty that has not been addressed in previous studies on assessing source-specific health effects. The new Bayesian spatial multivariate receptor modeling approach enables predictions of source contributions at unmonitored sites, minimizing exposure misclassification and providing improved exposure estimates along with their uncertainty estimates, as well as accounting for uncertainty in the number of sources and identifiability conditions.
Single neuron modeling and data assimilation in BNST neurons
NASA Astrophysics Data System (ADS)
Farsian, Reza
Neurons, although tiny in size, are vastly complicated systems, which are responsible for the most basic yet essential functions of any nervous system. Even the most simple models of single neurons are usually high dimensional, nonlinear, and contain many parameters and states which are unobservable in a typical neurophysiological experiment. One of the most fundamental problems in experimental neurophysiology is the estimation of these parameters and states, since knowing their values is essential in identification, model construction, and forward prediction of biological neurons. Common methods of parameter and state estimation do not perform well for neural models due to their high dimensionality and nonlinearity. In this dissertation, two alternative approaches for parameters and state estimation of biological neurons have been demonstrated: dynamical parameter estimation (DPE) and a Markov Chain Monte Carlo (MCMC) method. The first method uses elements of chaos control and synchronization theory for parameter and state estimation. MCMC is a statistical approach which uses a path integral formulation to evaluate a mean and an error bound for these unobserved parameters and states. These methods have been applied to biological system of neurons in Bed Nucleus of Stria Termialis neurons (BNST) of rats. State and parameters of neurons in both systems were estimated, and their value were used for recreating a realistic model and predicting the behavior of the neurons successfully. The knowledge of biological parameters can ultimately provide a better understanding of the internal dynamics of a neuron in order to build robust models of neuron networks.
A Comparison of the β-Substitution Method and a Bayesian Method for Analyzing Left-Censored Data.
Huynh, Tran; Quick, Harrison; Ramachandran, Gurumurthy; Banerjee, Sudipto; Stenzel, Mark; Sandler, Dale P; Engel, Lawrence S; Kwok, Richard K; Blair, Aaron; Stewart, Patricia A
2016-01-01
Classical statistical methods for analyzing exposure data with values below the detection limits are well described in the occupational hygiene literature, but an evaluation of a Bayesian approach for handling such data is currently lacking. Here, we first describe a Bayesian framework for analyzing censored data. We then present the results of a simulation study conducted to compare the β-substitution method with a Bayesian method for exposure datasets drawn from lognormal distributions and mixed lognormal distributions with varying sample sizes, geometric standard deviations (GSDs), and censoring for single and multiple limits of detection. For each set of factors, estimates for the arithmetic mean (AM), geometric mean, GSD, and the 95th percentile (X0.95) of the exposure distribution were obtained. We evaluated the performance of each method using relative bias, the root mean squared error (rMSE), and coverage (the proportion of the computed 95% uncertainty intervals containing the true value). The Bayesian method using non-informative priors and the β-substitution method were generally comparable in bias and rMSE when estimating the AM and GM. For the GSD and the 95th percentile, the Bayesian method with non-informative priors was more biased and had a higher rMSE than the β-substitution method, but use of more informative priors generally improved the Bayesian method's performance, making both the bias and the rMSE more comparable to the β-substitution method. An advantage of the Bayesian method is that it provided estimates of uncertainty for these parameters of interest and good coverage, whereas the β-substitution method only provided estimates of uncertainty for the AM, and coverage was not as consistent. Selection of one or the other method depends on the needs of the practitioner, the availability of prior information, and the distribution characteristics of the measurement data. We suggest the use of Bayesian methods if the practitioner has the computational resources and prior information, as the method would generally provide accurate estimates and also provides the distributions of all of the parameters, which could be useful for making decisions in some applications. © The Author 2015. Published by Oxford University Press on behalf of the British Occupational Hygiene Society.
LATTE Linking Acoustic Tests and Tagging Using Statistical Estimation
2015-09-30
the complexity of the model: (from simplest to most complex) Kalman filter , Markov chain Monte-Carlo (MCMC) and ABC. Many of these methods have been...using SMMs fitted using Kalman filters . Therefore, using the DTAG data, we can estimate the distributions associated with 2D horizontal displacement...speed (a key problem in the previous Kalman filter implementation). This new approach also allows the animal’s horizontal movement direction to differ
Asteroid mass estimation using Markov-Chain Monte Carlo techniques
NASA Astrophysics Data System (ADS)
Siltala, Lauri; Granvik, Mikael
2016-10-01
Estimates for asteroid masses are based on their gravitational perturbations on the orbits of other objects such as Mars, spacecraft, or other asteroids and/or their satellites. In the case of asteroid-asteroid perturbations, this leads to a 13-dimensional inverse problem where the aim is to derive the mass of the perturbing asteroid and six orbital elements for both the perturbing asteroid and the test asteroid using astrometric observations. We have developed and implemented three different mass estimation algorithms utilizing asteroid-asteroid perturbations into the OpenOrb asteroid-orbit-computation software: the very rough 'marching' approximation, in which the asteroid orbits are fixed at a given epoch, reducing the problem to a one-dimensional estimation of the mass, an implementation of the Nelder-Mead simplex method, and most significantly, a Markov-Chain Monte Carlo (MCMC) approach. We will introduce each of these algorithms with particular focus on the MCMC algorithm, and present example results for both synthetic and real data. Our results agree with the published mass estimates, but suggest that the published uncertainties may be misleading as a consequence of using linearized mass-estimation methods. Finally, we discuss remaining challenges with the algorithms as well as future plans, particularly in connection with ESA's Gaia mission.
Hierarchical models and the analysis of bird survey information
Sauer, J.R.; Link, W.A.
2003-01-01
Management of birds often requires analysis of collections of estimates. We describe a hierarchical modeling approach to the analysis of these data, in which parameters associated with the individual species estimates are treated as random variables, and probability statements are made about the species parameters conditioned on the data. A Markov-Chain Monte Carlo (MCMC) procedure is used to fit the hierarchical model. This approach is computer intensive, and is based upon simulation. MCMC allows for estimation both of parameters and of derived statistics. To illustrate the application of this method, we use the case in which we are interested in attributes of a collection of estimates of population change. Using data for 28 species of grassland-breeding birds from the North American Breeding Bird Survey, we estimate the number of species with increasing populations, provide precision-adjusted rankings of species trends, and describe a measure of population stability as the probability that the trend for a species is within a certain interval. Hierarchical models can be applied to a variety of bird survey applications, and we are investigating their use in estimation of population change from survey data.
Du, Yuanwei; Guo, Yubin
2015-01-01
The intrinsic mechanism of multimorbidity is difficult to recognize and prediction and diagnosis are difficult to carry out accordingly. Bayesian networks can help to diagnose multimorbidity in health care, but it is difficult to obtain the conditional probability table (CPT) because of the lack of clinically statistical data. Today, expert knowledge and experience are increasingly used in training Bayesian networks in order to help predict or diagnose diseases, but the CPT in Bayesian networks is usually irrational or ineffective for ignoring realistic constraints especially in multimorbidity. In order to solve these problems, an evidence reasoning (ER) approach is employed to extract and fuse inference data from experts using a belief distribution and recursive ER algorithm, based on which evidence reasoning method for constructing conditional probability tables in Bayesian network of multimorbidity is presented step by step. A multimorbidity numerical example is used to demonstrate the method and prove its feasibility and application. Bayesian network can be determined as long as the inference assessment is inferred by each expert according to his/her knowledge or experience. Our method is more effective than existing methods for extracting expert inference data accurately and is fused effectively for constructing CPTs in a Bayesian network of multimorbidity.