A guide to Bayesian model selection for ecologists
Hooten, Mevin B.; Hobbs, N.T.
2015-01-01
The steady upward trend in the use of model selection and Bayesian methods in ecological research has made it clear that both approaches to inference are important for modern analysis of models and data. However, in teaching Bayesian methods and in working with our research colleagues, we have noticed a general dissatisfaction with the available literature on Bayesian model selection and multimodel inference. Students and researchers new to Bayesian methods quickly find that the published advice on model selection is often preferential in its treatment of options for analysis, frequently advocating one particular method above others. The recent appearance of many articles and textbooks on Bayesian modeling has provided welcome background on relevant approaches to model selection in the Bayesian framework, but most of these are either very narrowly focused in scope or inaccessible to ecologists. Moreover, the methodological details of Bayesian model selection approaches are spread thinly throughout the literature, appearing in journals from many different fields. Our aim with this guide is to condense the large body of literature on Bayesian approaches to model selection and multimodel inference and present it specifically for quantitative ecologists as neutrally as possible. We also bring to light a few important and fundamental concepts relating directly to model selection that seem to have gone unnoticed in the ecological literature. Throughout, we provide only a minimal discussion of philosophy, preferring instead to examine the breadth of approaches as well as their practical advantages and disadvantages. This guide serves as a reference for ecologists using Bayesian methods, so that they can better understand their options and can make an informed choice that is best aligned with their goals for inference.
Yang, Ziheng; Zhu, Tianqi
2018-02-20
The Bayesian method is noted to produce spuriously high posterior probabilities for phylogenetic trees in analysis of large datasets, but the precise reasons for this overconfidence are unknown. In general, the performance of Bayesian selection of misspecified models is poorly understood, even though this is of great scientific interest since models are never true in real data analysis. Here we characterize the asymptotic behavior of Bayesian model selection and show that when the competing models are equally wrong, Bayesian model selection exhibits surprising and polarized behaviors in large datasets, supporting one model with full force while rejecting the others. If one model is slightly less wrong than the other, the less wrong model will eventually win when the amount of data increases, but the method may become overconfident before it becomes reliable. We suggest that this extreme behavior may be a major factor behind the spuriously high posterior probabilities for evolutionary trees. The philosophical implications of our results for the application of Bayesian model selection to the evaluation of opposing scientific hypotheses are yet to be explored, as are the behaviors of non-Bayesian methods in similar situations.
Link, William; Sauer, John R.
2016-01-01
The analysis of ecological data has changed in two important ways over the last 15 years. The development and easy availability of Bayesian computational methods has allowed and encouraged the fitting of complex hierarchical models. At the same time, there has been increasing emphasis on acknowledging and accounting for model uncertainty. Unfortunately, the ability to fit complex models has outstripped the development of tools for model selection and model evaluation: familiar model selection tools such as Akaike's information criterion and the deviance information criterion are widely known to be inadequate for hierarchical models. In addition, little attention has been paid to the evaluation of model adequacy in the context of hierarchical modeling, i.e., to the evaluation of fit for a single model. In this paper, we describe Bayesian cross-validation, which provides tools for model selection and evaluation. We describe the Bayesian predictive information criterion (BPIC) and a Bayesian approximation to the BPIC known as the Watanabe-Akaike information criterion (WAIC). We illustrate the use of these tools for model selection, and the use of Bayesian cross-validation as a tool for model evaluation, using three large data sets from the North American Breeding Bird Survey.
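The WAIC mentioned in this abstract can be computed directly from a matrix of pointwise posterior log-likelihoods. The following is a minimal illustrative sketch, not the authors' implementation; the array shapes and data are invented:

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (S, N) array of pointwise log-likelihoods:
    S posterior draws, N observations. Lower is better on the
    deviance scale used here."""
    # lppd: log pointwise predictive density
    lppd = np.sum(np.log(np.mean(np.exp(log_lik), axis=0)))
    # effective number of parameters: per-point variance of the
    # log-likelihood across posterior draws
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)

# toy example with simulated posterior log-likelihoods
rng = np.random.default_rng(0)
log_lik = rng.normal(-1.0, 0.1, size=(1000, 50))
print(waic(log_lik))
```

In practice the `log_lik` matrix would come from an MCMC fit, with one row per posterior draw and one column per observation.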
Bayesian Parameter Inference and Model Selection by Population Annealing in Systems Biology
Murakami, Yohei
2014-01-01
Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. In particular, the framework known as approximate Bayesian computation (ABC) is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific parameter value with high credibility as the representative value of the distribution. To overcome these problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that it can also be used to compute Bayesian posterior distributions in the ABC framework. To deal with the non-identifiability of representative parameter values, we proposed running simulations with a parameter ensemble sampled from the posterior distribution, named the "posterior parameter ensemble". We showed that population annealing is an efficient and convenient algorithm for generating a posterior parameter ensemble. We also showed that simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference but also capture and predict data that were not used for parameter inference. Lastly, we introduced the marginal likelihood in the ABC framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the ABC framework and to conduct model selection based on the Bayes factor.
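The ABC framework the abstract refers to replaces likelihood evaluation with simulation. A minimal rejection-ABC sketch conveys the idea (this is plain rejection sampling, not the population-annealing variant of the paper; the model, prior, and tolerance are invented):

```python
import numpy as np

def abc_rejection(data, simulate, prior_sample, eps, n_draws, rng):
    """Basic ABC rejection: accept prior draws whose simulated
    summary statistic lies within eps of the observed one."""
    obs = data.mean()                       # summary statistic
    accepted = []
    while len(accepted) < n_draws:
        theta = prior_sample(rng)
        sim = simulate(theta, data.size, rng)
        if abs(sim.mean() - obs) < eps:
            accepted.append(theta)
    return np.array(accepted)               # the "posterior parameter ensemble"

rng = np.random.default_rng(1)
data = rng.normal(3.0, 1.0, size=200)       # observed data, true mean 3
post = abc_rejection(
    data,
    simulate=lambda th, n, r: r.normal(th, 1.0, size=n),
    prior_sample=lambda r: r.uniform(0.0, 10.0),
    eps=0.2, n_draws=300, rng=rng)
print(post.mean())
```

The accepted draws play the role of the paper's "posterior parameter ensemble": downstream simulations are run over the whole ensemble rather than a single representative value.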
Bayesian multimodel inference for dose-response studies
Link, W.A.; Albers, P.H.
2007-01-01
Statistical inference in dose–response studies is model-based: the analyst posits a mathematical model of the relation between exposure and response, estimates parameters of the model, and reports conclusions conditional on the model. Such analyses rarely include any accounting for the uncertainties associated with model selection. The Bayesian inferential system provides a convenient framework for model selection and multimodel inference. In this paper we briefly describe the Bayesian paradigm and Bayesian multimodel inference. We then present a family of models for multinomial dose–response data and apply Bayesian multimodel inferential methods to the analysis of data on the reproductive success of American kestrels (Falco sparverius) exposed to various sublethal dietary concentrations of methylmercury.
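Bayesian multimodel inference of this kind weights each candidate model by its posterior probability, obtained from per-model marginal likelihoods and prior model probabilities. A small numerically stable sketch (the log-marginal-likelihood values are invented, not from the kestrel analysis):

```python
import numpy as np

def posterior_model_probs(log_marglik, prior=None):
    """Posterior model probabilities from per-model log marginal
    likelihoods and (optionally non-uniform) prior model probabilities."""
    log_ml = np.asarray(log_marglik, dtype=float)
    if prior is None:
        prior = np.full(log_ml.size, 1.0 / log_ml.size)
    log_post = log_ml + np.log(prior)
    log_post -= log_post.max()       # stabilise the exponentiation
    w = np.exp(log_post)
    return w / w.sum()

# three hypothetical dose-response models
p = posterior_model_probs([-120.3, -118.9, -125.0])
print(p)
```

Model-averaged inference then weights each model's predictions by these probabilities, which is how model-selection uncertainty is carried into the reported conclusions.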
Bayesian model selection: Evidence estimation based on DREAM simulation and bridge sampling
NASA Astrophysics Data System (ADS)
Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.
2017-04-01
Bayesian inference has found widespread application in Earth and Environmental Systems Modeling, providing an effective tool for prediction, data assimilation, parameter estimation, uncertainty analysis and hypothesis testing. Under multiple competing hypotheses, the Bayesian approach also provides an attractive alternative to traditional information criteria (e.g. AIC, BIC) for model selection. The key variable for Bayesian model selection is the evidence (or marginal likelihood), the normalizing constant in the denominator of Bayes' theorem; while it is fundamental for model selection, the evidence is not required for Bayesian parameter inference. It is computed for each hypothesis (model) by averaging the likelihood function over the prior parameter distribution, rather than maximizing it as information criteria do; the larger a model's evidence, the more support the model receives among a collection of hypotheses, as the simulated values assign relatively high probability density to the observed data. Hence, the evidence naturally acts as an Occam's razor, preferring simpler and more constrained models over the over-fitted ones selected by information criteria that incorporate only the likelihood maximum. Since the evidence is not particularly easy to estimate in practice, Bayesian model selection via the marginal likelihood has not yet found mainstream use. We illustrate here the properties of a new estimator of the Bayesian model evidence, which provides robust and unbiased estimates of the marginal likelihood; the method is coined Gaussian Mixture Importance Sampling (GMIS). GMIS uses multidimensional numerical integration of the posterior parameter distribution via bridge sampling (a generalization of importance sampling) of a mixture distribution fitted to samples of the posterior distribution derived from the DREAM algorithm (Vrugt et al., 2008; 2009).
Some illustrative examples are presented to show the robustness and superiority of the GMIS estimator with respect to other commonly used approaches in the literature.
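The core idea behind evidence estimators of this family is importance sampling: average the joint density over draws from a proposal fitted to the posterior. A minimal sketch on a toy conjugate model where the evidence is known in closed form (this is plain importance sampling, not the GMIS/bridge-sampling estimator of the paper; all names and numbers are illustrative):

```python
import numpy as np
from scipy.stats import norm

def evidence_is(log_joint, proposal_rvs, proposal_logpdf, n, rng):
    """Importance-sampling estimate of the marginal likelihood:
    average of p(y, theta) / q(theta) over draws theta ~ q,
    computed in log space for stability."""
    theta = proposal_rvs(n, rng)
    log_w = log_joint(theta) - proposal_logpdf(theta)
    m = log_w.max()
    return np.exp(m) * np.mean(np.exp(log_w - m))

# toy conjugate model: y ~ N(theta, 1), theta ~ N(0, 1); one datum y
y = 1.3
log_joint = lambda th: norm.logpdf(y, th, 1.0) + norm.logpdf(th, 0.0, 1.0)
# proposal fitted to the posterior (here exact: N(y/2, sqrt(1/2)))
rvs = lambda n, r: r.normal(y / 2, np.sqrt(0.5), size=n)
logq = lambda th: norm.logpdf(th, y / 2, np.sqrt(0.5))

rng = np.random.default_rng(0)
est = evidence_is(log_joint, rvs, logq, 5000, rng)
print(est, norm.pdf(y, 0.0, np.sqrt(2.0)))   # estimate vs exact evidence
```

Because the proposal here equals the exact posterior, every importance weight equals the evidence and the estimator is exact; with an imperfect proposal (the realistic case GMIS addresses), the variance of the weights drives the estimator's accuracy.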
Universal Darwinism As a Process of Bayesian Inference.
Campbell, John O
2016-01-01
Many of the mathematical frameworks describing natural selection are equivalent to Bayes' Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus, natural selection serves as a counterexample to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an "experiment" in the external world environment, and the results of that "experiment" or the "surprise" entailed by predicted and actual outcomes of the "experiment." Minimization of free energy implies that the implicit measure of "surprise" experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian processes proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature.
Metrics for evaluating performance and uncertainty of Bayesian network models
Bruce G. Marcot
2012-01-01
This paper presents a selected set of existing and new metrics for gauging Bayesian network model performance and uncertainty. Selected existing and new metrics are discussed for conducting model sensitivity analysis (variance reduction, entropy reduction, case file simulation); evaluating scenarios (influence analysis); depicting model complexity (numbers of model...
Hippert, Henrique S; Taylor, James W
2010-04-01
Artificial neural networks have frequently been proposed for electricity load forecasting because of their capabilities for the nonlinear modelling of large multivariate data sets. Modelling with neural networks is not an easy task though; two of the main challenges are defining the appropriate level of model complexity, and choosing the input variables. This paper evaluates techniques for automatic neural network modelling within a Bayesian framework, as applied to six samples containing daily load and weather data for four different countries. We analyse input selection as carried out by the Bayesian 'automatic relevance determination', and the usefulness of the Bayesian 'evidence' for the selection of the best structure (in terms of number of neurones), as compared to methods based on cross-validation.
Posterior Predictive Bayesian Phylogenetic Model Selection
Lewis, Paul O.; Xie, Wangang; Chen, Ming-Hui; Fan, Yu; Kuo, Lynn
2014-01-01
We present two distinctly different posterior predictive approaches to Bayesian phylogenetic model selection and illustrate these methods using examples from green algal protein-coding cpDNA sequences and flowering plant rDNA sequences. The Gelfand–Ghosh (GG) approach allows dissection of an overall measure of model fit into components due to posterior predictive variance (GGp) and goodness-of-fit (GGg), which distinguishes this method from the posterior predictive P-value approach. The conditional predictive ordinate (CPO) method provides a site-specific measure of model fit useful for exploratory analyses and can be combined over sites yielding the log pseudomarginal likelihood (LPML) which is useful as an overall measure of model fit. CPO provides a useful cross-validation approach that is computationally efficient, requiring only a sample from the posterior distribution (no additional simulation is required). Both GG and CPO add new perspectives to Bayesian phylogenetic model selection based on the predictive abilities of models and complement the perspective provided by the marginal likelihood (including Bayes Factor comparisons) based solely on the fit of competing models to observed data. [Bayesian; conditional predictive ordinate; CPO; L-measure; LPML; model selection; phylogenetics; posterior predictive.]
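The CPO/LPML quantities described above are simple functions of the per-site likelihoods evaluated over posterior draws: each CPO is the harmonic mean of the site's likelihoods, and LPML sums their logs. A hedged sketch (array shapes and data are invented, not the cpDNA/rDNA analyses):

```python
import numpy as np

def lpml(log_lik):
    """LPML from an (S, N) array of per-site log-likelihoods over
    S posterior draws: CPO_i is the harmonic mean of the site-i
    likelihoods, and LPML = sum_i log CPO_i. Computed in log space."""
    S = log_lik.shape[0]
    # log CPO_i = log S - logsumexp_s(-log_lik[s, i])
    m = (-log_lik).max(axis=0)
    log_cpo = np.log(S) - (m + np.log(np.exp(-log_lik - m).sum(axis=0)))
    return log_cpo.sum()

# constant case for sanity: every site, every draw has log-lik -1.5,
# so LPML = 20 * (-1.5) = -30
log_lik = np.full((100, 20), -1.5)
print(lpml(log_lik))
```

As the abstract notes, this needs only the posterior sample itself, which is what makes CPO-based cross-validation cheap relative to refitting under held-out data.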
Howard B. Stauffer; Cynthia J. Zabel; Jeffrey R. Dunk
2005-01-01
We compared a set of competing logistic regression habitat selection models for Northern Spotted Owls (Strix occidentalis caurina) in California. The habitat selection models were estimated, compared, evaluated, and tested using multiple sample datasets collected on federal forestlands in northern California. We used Bayesian methods in interpreting...
Cross-validation to select Bayesian hierarchical models in phylogenetics.
Duchêne, Sebastián; Duchêne, David A; Di Giallonardo, Francesca; Eden, John-Sebastian; Geoghegan, Jemma L; Holt, Kathryn E; Ho, Simon Y W; Holmes, Edward C
2016-05-26
Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data. Accordingly, model selection has become an important component of phylogenetic analysis. Methods of model selection generally consider the likelihood of the data under the model in question. In the context of Bayesian phylogenetics, the most common approach involves estimating the marginal likelihood, which is typically done by integrating the likelihood across model parameters, weighted by the prior. Although this method is accurate, it is sensitive to the presence of improper priors. We explored an alternative approach based on cross-validation that is widely used in evolutionary analysis. This involves comparing models according to their predictive performance. We analysed simulated data and a range of viral and bacterial data sets using a cross-validation approach to compare a variety of molecular clock and demographic models. Our results show that cross-validation can be effective in distinguishing between strict- and relaxed-clock models and in identifying demographic models that allow growth in population size over time. In most of our empirical data analyses, the model selected using cross-validation was able to match that selected using marginal-likelihood estimation. The accuracy of cross-validation appears to improve with longer sequence data, particularly when distinguishing between relaxed-clock models. Cross-validation is a useful method for Bayesian phylogenetic model selection. This method can be readily implemented even when considering complex models where selecting an appropriate prior for all parameters may be difficult.
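Predictive cross-validation of the kind described above scores each candidate model by its held-out predictive density. A minimal non-phylogenetic sketch using k-fold splits and polynomial models (the models, fold count, and scoring are illustrative assumptions, not the authors' clock/demographic comparison):

```python
import numpy as np

def cv_log_score(x, y, degree, k, rng):
    """k-fold cross-validation score for a polynomial model:
    mean held-out Gaussian log-density, a stand-in for the
    predictive performance used to compare models."""
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, k)
    score = 0.0
    for f in folds:
        train = np.setdiff1d(idx, f)
        coef = np.polyfit(x[train], y[train], degree)
        resid_tr = y[train] - np.polyval(coef, x[train])
        sigma = resid_tr.std() + 1e-9          # plug-in noise scale
        resid = y[f] - np.polyval(coef, x[f])
        score += np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                        - 0.5 * (resid / sigma) ** 2)
    return score / x.size

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 120)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, x.size)   # truly linear data
s1 = cv_log_score(x, y, 1, 5, rng)                 # simple model
s5 = cv_log_score(x, y, 5, 5, rng)                 # over-flexible model
print(s1, s5)
```

The model with the higher held-out log score is preferred; unlike the marginal likelihood, this comparison does not require proper priors, which is the advantage the abstract highlights.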
Bayesian accounts of covert selective attention: A tutorial review.
Vincent, Benjamin T
2015-05-01
Decision making and optimal observer models offer an important theoretical approach to the study of covert selective attention. While their probabilistic formulation allows quantitative comparison to human performance, the models can be complex and their insights are not always immediately apparent. Part 1 establishes the theoretical appeal of the Bayesian approach, and introduces the way in which probabilistic approaches can be applied to covert search paradigms. Part 2 presents novel formulations of Bayesian models of 4 important covert attention paradigms, illustrating optimal observer predictions over a range of experimental manipulations. Graphical model notation is used to present models in an accessible way and Supplementary Code is provided to help bridge the gap between model theory and practical implementation. Part 3 reviews a large body of empirical and modelling evidence showing that many experimental phenomena in the domain of covert selective attention are a set of by-products. These effects emerge as the result of observers conducting Bayesian inference with noisy sensory observations, prior expectations, and knowledge of the generative structure of the stimulus environment.
Thomas, D.L.; Johnson, D.; Griffith, B.
2006-01-01
We present a Bayesian random-effects model to assess resource selection, modeling the probability of use of land units characterized by discrete and continuous measures. This model provides simultaneous estimation of both individual- and population-level selection. The deviance information criterion (DIC), a Bayesian alternative to AIC that is sample-size specific, is used for model selection. Aerial radiolocation data from 76 adult female caribou (Rangifer tarandus) and calf pairs during 1 year on an Arctic coastal plain calving ground were used to illustrate the models and assess population-level selection of landscape attributes, as well as individual heterogeneity of selection. Landscape attributes included elevation, NDVI (a measure of forage greenness), and land cover-type classification. Results from the first stage of a 2-stage model-selection procedure indicated that there is substantial heterogeneity among cow-calf pairs with respect to selection of the landscape attributes. In the second stage, selection of models with heterogeneity included indicated that at the population level, NDVI and land cover class were significant attributes for selection of different landscapes by pairs on the calving ground. Population-level selection coefficients indicate that the pairs generally select landscapes with higher levels of NDVI, but the relationship is quadratic: the highest rate of selection occurs at values of NDVI less than the maximum observed. Results for land cover-class selection coefficients indicate that wet sedge, moist sedge, herbaceous tussock tundra, and shrub tussock tundra are selected at approximately the same rate, while alpine and sparsely vegetated landscapes are selected at a lower rate. Furthermore, the variability in selection by individual caribou for moist sedge and sparsely vegetated landscapes is large relative to the variability in selection of other land cover types.
The example analysis illustrates that, while sometimes computationally intense, a Bayesian hierarchical discrete-choice model for resource selection can provide managers with 2 components of population-level inference: average population selection and variability of selection. Both components are necessary to make sound management decisions based on animal selection.
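The DIC used for model selection above combines the posterior mean deviance with an effective-parameter penalty. A minimal sketch of the arithmetic (the deviance values are invented, not from the caribou analysis):

```python
import numpy as np

def dic(deviance_draws, deviance_at_mean):
    """DIC = Dbar + pD, where Dbar is the posterior mean deviance
    and pD = Dbar - D(theta_bar) is the effective number of
    parameters (Spiegelhalter et al.'s formulation)."""
    d_bar = np.mean(deviance_draws)
    p_d = d_bar - deviance_at_mean
    return d_bar + p_d, p_d

# deviance evaluated at each posterior draw, and at the posterior mean
dev = np.array([102.0, 98.0, 101.0, 99.0])
print(dic(dev, 96.5))
```

The sample-size sensitivity the abstract mentions enters through `pD`, which adapts to how much the data actually constrain the random effects.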
NASA Astrophysics Data System (ADS)
Farrell, Kathryn; Oden, J. Tinsley; Faghihi, Danial
2015-08-01
A general adaptive modeling algorithm for selection and validation of coarse-grained models of atomistic systems is presented. A Bayesian framework is developed to address uncertainties in parameters, data, and model selection. Algorithms for computing output sensitivities to parameter variances, model evidence and posterior model plausibilities for given data, and for computing what are referred to as Occam Categories in reference to a rough measure of model simplicity, make up components of the overall approach. Computational results are provided for representative applications.
Additive Genetic Variability and the Bayesian Alphabet
Gianola, Daniel; de los Campos, Gustavo; Hill, William G.; Manfredi, Eduardo; Fernando, Rohan
2009-01-01
The use of all available molecular markers in statistical models for prediction of quantitative traits has led to what could be termed a genomic-assisted selection paradigm in animal and plant breeding. This article provides a critical review of some theoretical and statistical concepts in the context of genomic-assisted genetic evaluation of animals and crops. First, relationships between the (Bayesian) variance of marker effects in some regression models and additive genetic variance are examined under standard assumptions. Second, the connection between marker genotypes and resemblance between relatives is explored, and linkages between a marker-based model and the infinitesimal model are reviewed. Third, issues associated with the use of Bayesian models for marker-assisted selection, with a focus on the role of the priors, are examined from a theoretical angle. The sensitivity of a Bayesian specification that has been proposed (called “Bayes A”) with respect to priors is illustrated with a simulation. Methods that can solve potential shortcomings of some of these Bayesian regression procedures are discussed briefly.
Climatic Models Ensemble-based Mid-21st Century Runoff Projections: A Bayesian Framework
NASA Astrophysics Data System (ADS)
Achieng, K. O.; Zhu, J.
2017-12-01
There are a number of North American Regional Climate Change Assessment Program (NARCCAP) climatic models that have been used to project surface runoff in the mid-21st century. Statistical model selection techniques are often used to select the model that best fits the data; however, different selection techniques often lead to different conclusions. In this study, ten models are averaged in a Bayesian paradigm to project runoff. Bayesian Model Averaging (BMA) is used to project runoff and to identify the effect of model uncertainty on future runoff projections. Baseflow separation, using a two-parameter recursive digital filter (the Eckhardt filter), separates USGS streamflow (total runoff) into two components: baseflow and surface runoff. We use this surface runoff as the a priori runoff when conducting BMA of runoff simulated from the ten RCM models. The primary objective of this study is to evaluate how well RCM multi-model ensembles simulate surface runoff in a Bayesian framework. Specifically, we investigate and discuss the following questions: How well does a ten-model RCM ensemble jointly simulate surface runoff by averaging over all the models using BMA, given a priori surface runoff? What are the effects of model uncertainty on surface runoff simulation?
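At its core, BMA forms a predictive average of the member models weighted by how well each is supported by the data. A minimal evidence-weighted sketch (the operational BMA of Raftery et al. estimates weights by EM; this simplified version, with invented predictions and evidences, only illustrates the weighted combination):

```python
import numpy as np

def bma_prediction(preds, log_evidence):
    """Bayesian Model Averaging: combine per-model predictions
    (one row per model) with weights proportional to each model's
    evidence, normalised in log space for stability."""
    log_w = np.asarray(log_evidence, dtype=float)
    log_w -= log_w.max()
    w = np.exp(log_w)
    w /= w.sum()
    return w @ np.asarray(preds), w

# two hypothetical runoff models, two time steps each
preds = [[1.0, 2.0],
         [3.0, 4.0]]
mean, w = bma_prediction(preds, [0.0, 0.0])   # equal evidence
print(mean, w)
```

With equal evidence the BMA mean falls midway between the members; as one model's evidence dominates, the average collapses onto that model, which is how model uncertainty propagates into the runoff projection.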
ERIC Educational Resources Information Center
Vrieze, Scott I.
2012-01-01
This article reviews the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) in model selection and the appraisal of psychological theory. The focus is on latent variable models, given their growing use in theory testing and construction. Theoretical statistical results in regression are discussed, and more important…
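The AIC and BIC reviewed in this article are one-line computations from a fitted model's maximized log-likelihood; the key difference is that BIC's penalty grows with sample size. A minimal sketch with invented numbers:

```python
import numpy as np

def aic(log_lik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n):
    """Bayesian information criterion: -2 log L + k log(n);
    the penalty grows with the sample size n."""
    return -2.0 * log_lik + k * np.log(n)

# same fit (log L = -50, k = 3 parameters), n = 100 observations:
# BIC penalises the parameters more heavily than AIC
print(aic(-50.0, 3), bic(-50.0, 3, 100))
```

This divergence in penalties is exactly why the two criteria can rank latent variable models differently, which is the tension the article examines.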
Bayesian Group Bridge for Bi-level Variable Selection.
Mallick, Himel; Yi, Nengjun
2017-06-01
A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior. Posterior computation proceeds via an efficient MCMC algorithm. In addition to the usual ease-of-interpretation of hierarchical linear models, the Bayesian formulation produces valid standard errors, a feature that is notably absent in the frequentist framework. Empirical evidence of the attractiveness of the method is illustrated by extensive Monte Carlo simulations and real data analysis. Finally, several extensions of this new approach are presented, providing a unified framework for bi-level variable selection in general models with flexible penalties.
Protein construct storage: Bayesian variable selection and prediction with mixtures.
Clyde, M A; Parmigiani, G
1998-07-01
Determining optimal conditions for protein storage while maintaining a high level of protein activity is an important question in pharmaceutical research. A designed experiment based on a space-filling design was conducted to understand the effects of factors affecting protein storage and to establish optimal storage conditions. Different model-selection strategies to identify important factors may lead to very different answers about optimal conditions. Uncertainty about which factors are important, or model uncertainty, can be a critical issue in decision-making. We use Bayesian variable selection methods for linear models to identify important variables in the protein storage data, while accounting for model uncertainty. We also use the Bayesian framework to build predictions based on a large family of models, rather than an individual model, and to evaluate the probability that certain candidate storage conditions are optimal.
Sparse Bayesian Learning for Identifying Imaging Biomarkers in AD Prediction
Shen, Li; Qi, Yuan; Kim, Sungeun; Nho, Kwangsik; Wan, Jing; Risacher, Shannon L.; Saykin, Andrew J.
2010-01-01
We apply sparse Bayesian learning methods, automatic relevance determination (ARD) and predictive ARD (PARD), to Alzheimer's disease (AD) classification to make accurate predictions while identifying critical imaging markers relevant to AD. ARD is one of the most successful Bayesian feature selection methods. PARD is a powerful Bayesian feature selection method that provides sparse models that are easy to interpret. PARD selects the model with the best estimate of predictive performance instead of choosing the one with the largest marginal model likelihood. A comparative study with the support vector machine (SVM) shows that ARD/PARD in general outperform SVM in terms of prediction accuracy. An additional comparison with surface-based general linear model (GLM) analysis shows that the regions with the strongest signals are identified by both GLM and ARD/PARD. While the GLM P-map returns significant regions all over the cortex, ARD/PARD provide a small number of relevant and meaningful imaging markers with predictive power, including both cortical and subcortical measures.
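ARD gives each weight its own precision hyperparameter, so irrelevant features are shrunk toward zero. A small regression sketch using scikit-learn's `ARDRegression` (a generic ARD implementation, not the authors' classifier or data; the sparse-signal setup is invented):

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# only features 0 and 1 carry signal; the other eight are noise
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=200)

ard = ARDRegression()      # one precision hyperparameter per weight
ard.fit(X, y)
print(np.round(ard.coef_, 2))
```

The fitted coefficient vector is sparse in effect: the two informative features keep large weights while the irrelevant ones are driven near zero, which is the "small number of relevant markers" behaviour the abstract describes.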
Bayesian evidence computation for model selection in non-linear geoacoustic inference problems.
Dettmer, Jan; Dosso, Stan E; Osler, John C
2010-12-01
This paper applies a general Bayesian inference approach, based on Bayesian evidence computation, to geoacoustic inversion of interface-wave dispersion data. Quantitative model selection is carried out by computing the evidence (normalizing constants) for several model parameterizations using annealed importance sampling. The resulting posterior probability density estimate is compared to estimates obtained from Metropolis-Hastings sampling to ensure consistent results. The approach is applied to invert interface-wave dispersion data collected on the Scotian Shelf, off the east coast of Canada for the sediment shear-wave velocity profile. Results are consistent with previous work on these data but extend the analysis to a rigorous approach including model selection and uncertainty analysis. The results are also consistent with core samples and seismic reflection measurements carried out in the area.
ERIC Educational Resources Information Center
Hsieh, Chueh-An; Maier, Kimberly S.
2009-01-01
The capacity of Bayesian methods in estimating complex statistical models is undeniable. Bayesian data analysis is seen as having a range of advantages, such as an intuitive probabilistic interpretation of the parameters of interest, the efficient incorporation of prior information to empirical data analysis, model averaging and model selection.…
NASA Astrophysics Data System (ADS)
Ben Abdessalem, Anis; Dervilis, Nikolaos; Wagg, David; Worden, Keith
2018-01-01
This paper will introduce the use of the approximate Bayesian computation (ABC) algorithm for model selection and parameter estimation in structural dynamics. ABC is a likelihood-free method typically used when the likelihood function is either intractable or cannot be approached in a closed form. To circumvent the evaluation of the likelihood function, simulation from a forward model is at the core of the ABC algorithm. The algorithm offers the possibility to use different metrics and summary statistics representative of the data to carry out Bayesian inference. The efficacy of the algorithm in structural dynamics is demonstrated through three different illustrative examples of nonlinear system identification: cubic and cubic-quintic models, the Bouc-Wen model and the Duffing oscillator. The obtained results suggest that ABC is a promising alternative to deal with model selection and parameter estimation issues, specifically for systems with complex behaviours.
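For model selection, ABC can sample a model index along with parameters and read posterior model probabilities off the acceptance frequencies. A minimal sketch (a toy mean-shift comparison with invented models and tolerance, not the Bouc-Wen or Duffing examples):

```python
import numpy as np

def abc_model_choice(data, simulators, eps, n_accept, rng):
    """ABC model selection: draw a model index uniformly, simulate
    from it, and keep draws whose summary statistic falls within eps
    of the observed one. Accepted frequencies approximate the
    posterior model probabilities."""
    obs = data.mean()                        # summary statistic
    counts = np.zeros(len(simulators))
    accepted = 0
    while accepted < n_accept:
        m = rng.integers(len(simulators))
        sim = simulators[m](data.size, rng)
        if abs(sim.mean() - obs) < eps:
            counts[m] += 1
            accepted += 1
    return counts / counts.sum()

rng = np.random.default_rng(2)
data = rng.normal(0.0, 1.0, size=300)            # truth: zero-mean normal
sims = [
    lambda n, r: r.normal(0.0, 1.0, size=n),     # model 1: correct mean
    lambda n, r: r.normal(1.0, 1.0, size=n),     # model 2: shifted mean
]
probs = abc_model_choice(data, sims, eps=0.15, n_accept=400, rng=rng)
print(probs)
```

As the abstract notes, the choice of summary statistics and distance metric is the crucial modelling decision; here a single mean is enough because the candidate models differ only in location.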
A study of finite mixture model: Bayesian approach on financial time series data
NASA Astrophysics Data System (ADS)
Phoong, Seuk-Yen; Ismail, Mohd Tahir
2014-07-01
Recently, statisticians have emphasized fitting finite mixture models using Bayesian methods. A finite mixture model combines several component distributions to model a statistical distribution, while the Bayesian method is used to fit the mixture model. Bayesian methods are widely used because their asymptotic properties provide remarkable results, and they exhibit consistency, meaning that the parameter estimates are close to the predictive distributions. In the present paper, the number of components for the mixture model is selected using the Bayesian Information Criterion. Identifying the number of components is important because a misspecified number may lead to invalid results. The Bayesian method is then used to fit the k-component mixture model in order to explore the relationship between rubber prices and stock market prices for Malaysia, Thailand, the Philippines and Indonesia. The results showed a negative relationship between rubber prices and stock market prices for all selected countries.
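Selecting the number of mixture components with BIC, as this abstract describes, can be sketched in a few lines. The following hypothetical example fits one- and two-component Gaussian models to synthetic data; a crude split at the overall mean stands in for a proper EM fit, and all numbers are illustrative:

```python
import math
import random

random.seed(1)

def norm_logpdf(x, mu, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)

# Synthetic data from a well-separated two-component mixture.
data = ([random.gauss(-3.0, 1.0) for _ in range(100)]
        + [random.gauss(3.0, 1.0) for _ in range(100)])
n = len(data)

def mle_normal(xs):
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return mu, sd

# Model 1: a single normal (k = 2 free parameters).
mu1, sd1 = mle_normal(data)
ll1 = sum(norm_logpdf(x, mu1, sd1) for x in data)

# Model 2: two components, fitted by a crude split at the overall
# mean (a stand-in for EM; k = 5 free parameters).
left = [x for x in data if x < mu1]
right = [x for x in data if x >= mu1]
(mu_l, sd_l), (mu_r, sd_r) = mle_normal(left), mle_normal(right)
w = len(left) / n
ll2 = sum(math.log(w * math.exp(norm_logpdf(x, mu_l, sd_l))
                   + (1 - w) * math.exp(norm_logpdf(x, mu_r, sd_r)))
          for x in data)

def bic(loglik, k):
    # BIC = k * ln(n) - 2 * ln(L); lower is better.
    return k * math.log(n) - 2 * loglik

bic1, bic2 = bic(ll1, 2), bic(ll2, 5)
```

On clearly bimodal data like this, the two-component model's likelihood gain dwarfs the three extra parameters' ln(n) penalty, so BIC selects it.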
Adaptive selection and validation of models of complex systems in the presence of uncertainty
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farrell-Maupin, Kathryn; Oden, J. T.
2017-08-01
This study describes versions of OPAL, the Occam-Plausibility Algorithm, in which the use of Bayesian model plausibilities is replaced with information-theoretic methods, such as the Akaike Information Criterion and the Bayesian Information Criterion. Applications to complex systems of coarse-grained molecular models approximating atomistic models of polyethylene materials are described. All of these model selection methods take into account uncertainties in the model, the observational data, the model parameters, and the predicted quantities of interest. A comparison of the models chosen by Bayesian model selection criteria and those chosen by the information-theoretic criteria is given.
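The information criteria that these OPAL variants employ penalize complexity differently and can disagree on the same fits. A hedged toy computation, with invented maximized log-likelihoods that are not from the OPAL study, shows how:

```python
import math

# Hypothetical maximized log-likelihoods for two nested models
# (illustrative numbers only).
n = 1000                 # number of observations
loglik_small = -1500.0   # k = 3 parameters
loglik_large = -1497.5   # k = 4 parameters (log-likelihood gain of 2.5)

def aic(loglik, k):
    # AIC = 2k - 2 ln(L); penalty of 2 per extra parameter.
    return 2 * k - 2 * loglik

def bic(loglik, k):
    # BIC = k ln(n) - 2 ln(L); penalty of ln(1000) ~ 6.9 per parameter.
    return k * math.log(n) - 2 * loglik

aic_small, aic_large = aic(loglik_small, 3), aic(loglik_large, 4)
bic_small, bic_large = bic(loglik_small, 3), bic(loglik_large, 4)
```

Here AIC prefers the larger model (the fit gain of 5 in deviance exceeds its penalty of 2) while BIC prefers the smaller one, which is exactly why comparing the criteria's selections, as the study does, is informative.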
ERIC Educational Resources Information Center
Sebro, Negusse Yohannes; Goshu, Ayele Taye
2017-01-01
This study aims to explore Bayesian multilevel modeling to investigate variations of average academic achievement of grade eight school students. A sample of 636 students is randomly selected from 26 private and government schools by a two-stage stratified sampling design. Bayesian method is used to estimate the fixed and random effects. Input and…
Model selection and assessment for multi-species occupancy models
Broms, Kristin M.; Hooten, Mevin B.; Fitzpatrick, Ryan M.
2016-01-01
While multi-species occupancy models (MSOMs) are emerging as a popular method for analyzing biodiversity data, formal checking and validation approaches for this class of models have lagged behind. Concurrent with the rise in application of MSOMs among ecologists, a quiet regime shift is occurring in Bayesian statistics where predictive model comparison approaches are experiencing a resurgence. Unlike single-species occupancy models that use integrated likelihoods, MSOMs are usually couched in a Bayesian framework and contain multiple levels. Standard model checking and selection methods are often unreliable in this setting and there is only limited guidance in the ecological literature for this class of models. We examined several different contemporary Bayesian hierarchical approaches for checking and validating MSOMs and applied these methods to a freshwater aquatic study system in Colorado, USA, to better understand the diversity and distributions of plains fishes. Our findings indicated distinct differences among model selection approaches, with cross-validation techniques performing the best in terms of prediction.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marzouk, Youssef
Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computational expense. This project intends to make rigorous predictive modeling *feasible* in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale *sequential* data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.
Estimation of selection intensity under overdominance by Bayesian methods.
Buzbas, Erkan Ozge; Joyce, Paul; Abdo, Zaid
2009-01-01
A balanced pattern in the allele frequencies of polymorphic loci is a potential sign of selection, particularly of overdominance. Although this type of selection is of some interest in population genetics, no likelihood-based approach exists that is specifically tailored to inference on selection intensity. To fill this gap, we present Bayesian methods to estimate selection intensity under k-allele models with overdominance. Our model allows for an arbitrary number of loci and alleles within a locus. The neutral and selected variability within each locus are modeled with corresponding k-allele models. To estimate the posterior distribution of the mean selection intensity in a multilocus region, a hierarchical setup between loci is used. The methods are demonstrated with data at the Human Leukocyte Antigen loci from worldwide populations.
Bayesian Modeling of a Human MMORPG Player
NASA Astrophysics Data System (ADS)
Synnaeve, Gabriel; Bessière, Pierre
2011-03-01
This paper describes an application of Bayesian programming to the control of an autonomous avatar in a multiplayer role-playing game (the example is based on World of Warcraft). We model a particular task, which consists of choosing what to do and which target to select in a situation where both allies and foes are present. We explain the model in Bayesian programming and show how the conditional probabilities can be learned from data gathered during human-played sessions.
Schöniger, Anneli; Wöhling, Thomas; Samaniego, Luis; Nowak, Wolfgang
2014-01-01
Bayesian model selection or averaging objectively ranks a number of plausible, competing conceptual models based on Bayes' theorem. It implicitly performs an optimal trade-off between performance in fitting available data and minimum model complexity. The procedure requires determining Bayesian model evidence (BME), which is the likelihood of the observed data integrated over each model's parameter space. The computation of this integral is highly challenging because it is as high-dimensional as the number of model parameters. Three classes of techniques to compute BME are available, each with its own challenges and limitations: (1) exact and fast analytical solutions are limited by strong assumptions; (2) numerical evaluation quickly becomes infeasible for expensive models; (3) approximations known as information criteria (ICs), such as the AIC, BIC, or KIC (Akaike, Bayesian, or Kashyap information criterion, respectively), yield contradicting results with regard to model ranking. Our study features a theory-based intercomparison of these techniques. We further assess their accuracy in a simplistic synthetic example where for some scenarios an exact analytical solution exists. In more challenging scenarios, we use a brute-force Monte Carlo integration method as reference. We continue this analysis with a real-world application of hydrological model selection. This is a first-time benchmarking of the various methods for BME evaluation against true solutions. Results show that BME values from ICs are often heavily biased and that the choice of approximation method substantially influences the accuracy of model ranking. For reliable model selection, bias-free numerical methods should be preferred over ICs whenever computationally feasible.
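The brute-force Monte Carlo reference used in this study amounts to averaging the likelihood over draws from the prior. A minimal sketch on a toy conjugate Gaussian model, where the evidence is known in closed form as a check; all priors and the single observation are illustrative:

```python
import math
import random

random.seed(2)

def likelihood(y, theta):
    # Gaussian likelihood with unit observation noise.
    return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2 * math.pi)

y_obs = 1.0
n_draws = 20000

# Brute-force Monte Carlo estimate of Bayesian model evidence:
# BME = E_prior[ p(y | theta) ], averaged over prior draws.
def evidence(prior_mean):
    total = 0.0
    for _ in range(n_draws):
        theta = random.gauss(prior_mean, 1.0)  # draw from the prior
        total += likelihood(y_obs, theta)
    return total / n_draws

bme_a = evidence(0.0)  # model A: prior centered near the data
bme_b = evidence(5.0)  # model B: prior centered far from the data
# Analytic check: marginally y ~ N(prior_mean, 1 + 1),
# so BME_A = N(1; 0, 2) ~ 0.2197.
```

The closed-form check works only because prior and likelihood are conjugate here; for expensive simulation models this same average is exactly where the "quickly becomes infeasible" caveat bites.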
Bayes factors and multimodel inference
Link, W.A.; Barker, R.J.; Thomson, David L.; Cooch, Evan G.; Conroy, Michael J.
2009-01-01
Multimodel inference has two main themes: model selection, and model averaging. Model averaging is a means of making inference conditional on a model set, rather than on a selected model, allowing formal recognition of the uncertainty associated with model choice. The Bayesian paradigm provides a natural framework for model averaging, and provides a context for evaluation of the commonly used AIC weights. We review Bayesian multimodel inference, noting the importance of Bayes factors. Noting the sensitivity of Bayes factors to the choice of priors on parameters, we define and propose nonpreferential priors as offering a reasonable standard for objective multimodel inference.
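The sensitivity of Bayes factors to parameter priors, which motivates the nonpreferential priors proposed here, can be made concrete with a toy point-null comparison: the same observation favors the null more and more as the alternative's prior widens. A hedged, purely illustrative computation:

```python
import math

def norm_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

y_obs = 1.5  # a single observation with unit noise (illustrative)

# M0: theta fixed at 0.  M1: theta ~ N(0, tau^2), so the marginal
# likelihood under M1 is N(y; 0, 1 + tau^2).
def bayes_factor_01(tau):
    evidence_m0 = norm_pdf(y_obs, 0.0, 1.0)
    evidence_m1 = norm_pdf(y_obs, 0.0, 1.0 + tau * tau)
    return evidence_m0 / evidence_m1

bf_narrow = bayes_factor_01(1.0)    # modest prior: data mildly favor M1
bf_wide = bayes_factor_01(100.0)    # diffuse prior: BF swings toward M0
```

With tau = 1 the Bayes factor is below 1 (evidence for the alternative), yet with tau = 100 it is large in favor of the null; this Lindley/Bartlett-type behavior is why the choice of parameter prior matters so much for multimodel inference.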
Bayesian Factor Analysis as a Variable Selection Problem: Alternative Priors and Consequences
Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric
2016-01-01
Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, a Bayesian structural equation modeling (BSEM) approach (Muthén & Asparouhov, 2012) has been proposed as a way to explore the presence of cross-loadings in CFA models. We show that the issue of determining factor loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov’s approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike and slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set (Byrne, 2012; Pettegrew & Wolf, 1982) is used to demonstrate our approach.
Heuristic Bayesian segmentation for discovery of coexpressed genes within genomic regions.
Pehkonen, Petri; Wong, Garry; Törönen, Petri
2010-01-01
Segmentation aims to partition sequential data into homogeneous regions and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation to locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requiring user-given parameters. In order to perform the segmentation within a reasonable time, we use heuristics. Most heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished using asymptotic model selection methods like the Bayesian information criterion. Such methods are based on simplifications that can limit their usage. In this paper, we propose a Bayesian model selection criterion to choose the most appropriate result from heuristic segmentation. Our Bayesian model uses a simple prior for segmentation solutions with various segment numbers and a modified Dirichlet prior for modeling multinomial data. We show with various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. The application of our method to yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.
Comparing Families of Dynamic Causal Models
Penny, Will D.; Stephan, Klaas E.; Daunizeau, Jean; Rosa, Maria J.; Friston, Karl J.; Schofield, Thomas M.; Leff, Alex P.
2010-01-01
Mathematical models of scientific data can be formally compared using Bayesian model evidence. Previous applications in the biological sciences have mainly focussed on model selection in which one first selects the model with the highest evidence and then makes inferences based on the parameters of that model. This “best model” approach is very useful but can become brittle if there are a large number of models to compare, and if different subjects use different models. To overcome this shortcoming we propose the combination of two further approaches: (i) family level inference and (ii) Bayesian model averaging within families. Family level inference removes uncertainty about aspects of model structure other than the characteristic of interest. For example: What are the inputs to the system? Is processing serial or parallel? Is it linear or nonlinear? Is it mediated by a single, crucial connection? We apply Bayesian model averaging within families to provide inferences about parameters that are independent of further assumptions about model structure. We illustrate the methods using Dynamic Causal Models of brain imaging data.
Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.
2015-01-01
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach based on hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, a normal distribution model, a normal regression model, and predictive mean matching. The latter three models used both Bayesian and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the entry with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing, and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis, but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance.
Fitting Residual Error Structures for Growth Models in SAS PROC MCMC
ERIC Educational Resources Information Center
McNeish, Daniel
2017-01-01
In behavioral sciences broadly, estimating growth models with Bayesian methods is becoming increasingly common, especially to combat small samples common with longitudinal data. Although Mplus is becoming an increasingly common program for applied research employing Bayesian methods, the limited selection of prior distributions for the elements of…
Model selection and Bayesian inference for high-resolution seabed reflection inversion.
Dettmer, Jan; Dosso, Stan E; Holland, Charles W
2009-02-01
This paper applies Bayesian inference, including model selection and posterior parameter inference, to inversion of seabed reflection data to resolve sediment structure at a spatial scale below the pulse length of the acoustic source. A practical approach to model selection is used, employing the Bayesian information criterion to decide on the number of sediment layers needed to sufficiently fit the data while satisfying parsimony to avoid overparametrization. Posterior parameter inference is carried out using an efficient Metropolis-Hastings algorithm for high-dimensional models, and results are presented as marginal-probability depth distributions for sound velocity, density, and attenuation. The approach is applied to plane-wave reflection-coefficient inversion of single-bounce data collected on the Malta Plateau, Mediterranean Sea, which indicate complex fine structure close to the water-sediment interface. This fine structure is resolved in the geoacoustic inversion results in terms of four layers within the upper meter of sediments. The inversion results are in good agreement with parameter estimates from a gravity core taken at the experiment site.
Spatio-temporal Bayesian model selection for disease mapping
Carroll, R; Lawson, AB; Faes, C; Kirby, RS; Aregay, M; Watjou, K
2016-01-01
Spatio-temporal analysis of small area health data often involves choosing a fixed set of predictors prior to the final model fit. In this paper, we propose a spatio-temporal approach to Bayesian model selection that implements model selection for certain areas of the study region as well as certain years in the study timeline. Here, we examine the usefulness of this approach by way of a large-scale simulation study accompanied by a case study. Our results suggest that a special case of the model selection methods, a mixture model allowing a weight parameter to indicate whether the appropriate linear predictor is spatial, spatio-temporal, or a mixture of the two, offers the best option for fitting these spatio-temporal models. In addition, the case study illustrates the effectiveness of this mixture model within the model selection setting by easily accommodating lifestyle, socio-economic, and physical environmental variables to select a predominantly spatio-temporal linear predictor.
Bayesian model selection applied to artificial neural networks used for water resources modeling
NASA Astrophysics Data System (ADS)
Kingston, Greer B.; Maier, Holger R.; Lambert, Martin F.
2008-04-01
Artificial neural networks (ANNs) have proven to be extremely valuable tools in the field of water resources engineering. However, one of the most difficult tasks in developing an ANN is determining the optimum level of complexity required to model a given problem, as there is no formal systematic model selection method. This paper presents a Bayesian model selection (BMS) method for ANNs that provides an objective approach for comparing models of varying complexity in order to select the most appropriate ANN structure. The approach uses Markov Chain Monte Carlo posterior simulations to estimate the evidence in favor of competing models and, in this study, three known methods for doing this are compared in terms of their suitability for being incorporated into the proposed BMS framework for ANNs. However, it is acknowledged that it can be particularly difficult to accurately estimate the evidence of ANN models. Therefore, the proposed BMS approach for ANNs incorporates a further check of the evidence results by inspecting the marginal posterior distributions of the hidden-to-output layer weights, which unambiguously indicate any redundancies in the hidden layer nodes. The fact that this check is available is one of the greatest advantages of the proposed approach over conventional model selection methods, which do not provide such a test and instead rely on the modeler's subjective choice of selection criterion. The advantages of a total Bayesian approach to ANN development, including training and model selection, are demonstrated on two synthetic and one real world water resources case study.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Portone, Teresa; Niederhaus, John Henry; Sanchez, Jason James
This report introduces the concepts of Bayesian model selection, which provides a systematic means of calibrating and selecting an optimal model to represent a phenomenon. This has many potential applications, including for comparing constitutive models. The ideas described herein are applied to a model selection problem between different yield models for hardened steel under extreme loading conditions.
Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel
2012-09-25
Most Bayesian models for the analysis of complex traits are not analytically tractable, and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that arise from serial computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Markov chain Monte Carlo algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point, including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, and some variants are discussed as well. Features and strategies of parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which not only leads to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that the use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.
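The multiple-chains strategy described in this abstract can be sketched compactly. In this hypothetical example, threads stand in for the processes or MPI ranks used at scale, and each chain is a random-walk Metropolis sampler targeting a standard normal rather than a genomic model:

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def run_chain(seed, n_iter=5000):
    # Random-walk Metropolis chain targeting a standard normal.
    rng = random.Random(seed)   # per-chain RNG: chains are independent
    x = rng.uniform(-2.0, 2.0)
    samples = []
    for _ in range(n_iter):
        prop = x + rng.gauss(0.0, 1.0)
        # log acceptance ratio for the target density exp(-x^2 / 2)
        if math.log(rng.random() + 1e-12) < 0.5 * (x * x - prop * prop):
            x = prop
        samples.append(x)
    return samples[1000:]  # discard burn-in

# Multiple-chains strategy: launch independent chains concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    chains = list(pool.map(run_chain, [11, 22, 33, 44]))

pooled = [x for chain in chains for x in chain]
pooled_mean = sum(pooled) / len(pooled)
pooled_var = sum((x - pooled_mean) ** 2 for x in pooled) / len(pooled)
```

Running several chains also enables convergence diagnostics such as Gelman-Rubin, which compare between-chain and within-chain variance; the within-a-single-chain parallelization strategy the study also describes requires restructuring the sampler itself.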
Wheeler, David C.; Hickson, DeMarc A.; Waller, Lance A.
2010-01-01
Many diagnostic tools and goodness-of-fit measures, such as the Akaike information criterion (AIC) and the Bayesian deviance information criterion (DIC), are available to evaluate the overall adequacy of linear regression models. In addition, visually assessing adequacy in models has become an essential part of any regression analysis. In this paper, we focus on a spatial consideration of the local DIC measure for model selection and goodness-of-fit evaluation. We use a partitioning of the DIC into the local DIC, leverage, and deviance residuals to assess local model fit and influence for both individual observations and groups of observations in a Bayesian framework. We use visualization of the local DIC and differences in local DIC between models to assist in model selection and to visualize the global and local impacts of adding covariates or model parameters. We demonstrate the utility of the local DIC in assessing model adequacy using HIV prevalence data from pregnant women in the Butare province of Rwanda during 1989-1993 using a range of linear model specifications, from global effects only to spatially varying coefficient models, and a set of covariates related to sexual behavior. Results of applying the diagnostic visualization approach include more refined model selection and greater understanding of the models as applied to the data.
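The DIC that this paper partitions is the posterior mean deviance plus an effective-number-of-parameters term, and it is easy to compute from posterior draws. A minimal sketch for a toy Gaussian-mean model with a flat prior, where the posterior is available in closed form (illustrative, not the paper's spatial models):

```python
import math
import random

random.seed(3)

# Data from a unit-variance normal with unknown mean.
data = [random.gauss(1.0, 1.0) for _ in range(50)]
n = len(data)
ybar = sum(data) / n

def deviance(theta):
    # D(theta) = -2 log p(y | theta) for unit-variance Gaussian data.
    return sum((y - theta) ** 2 + math.log(2 * math.pi) for y in data)

# With a flat prior the posterior for theta is N(ybar, 1/n);
# we draw posterior samples directly instead of running MCMC.
draws = [random.gauss(ybar, 1.0 / math.sqrt(n)) for _ in range(4000)]

d_bar = sum(deviance(t) for t in draws) / len(draws)  # posterior mean deviance
d_hat = deviance(ybar)                                # deviance at posterior mean
p_d = d_bar - d_hat                                   # effective number of parameters
dic = d_bar + p_d                                     # DIC = D_bar + p_D
```

For this one-parameter model p_D lands near 1, as expected; the paper's local DIC decomposes the same quantities observation by observation to localize lack of fit.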
Ferragina, A.; de los Campos, G.; Vazquez, A. I.; Cecchinato, A.; Bittante, G.
2017-01-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict “difficult-to-predict” dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm−1 were available and averaged before data analysis. Three Bayesian models: Bayesian ridge regression (Bayes RR), Bayes A, and Bayes B, and 2 reference models: PLS and modified PLS (MPLS) procedures, were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. 
Accuracy increased in moving from calibration to external validation methods, and in moving from PLS and MPLS to Bayesian methods, particularly Bayes A and Bayes B. The maximum validation R2 values were obtained with Bayes B and Bayes A. Among the FA, C10:0 (% of each FA on a total-FA basis) had the highest R2 (0.75, achieved with Bayes A and Bayes B), and among the technological traits, fresh cheese yield had the highest R2 (0.82, achieved with Bayes B). These two methods proved to be useful instruments for shrinking and selecting very informative wavelengths and for inferring the structure and functions of the analyzed traits. We conclude that Bayesian models are powerful tools for deriving calibration equations, and, importantly, these equations can be easily developed using existing open-source software. As part of our study, we provide scripts based on the open-source R software BGLR, which can be used to train customized prediction equations for other traits or populations.
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. 
This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
Bayesian conditional-independence modeling of the AIDS epidemic in England and Wales
NASA Astrophysics Data System (ADS)
Gilks, Walter R.; De Angelis, Daniela; Day, Nicholas E.
We describe the use of conditional-independence modeling, Bayesian inference and Markov chain Monte Carlo, to model and project the HIV-AIDS epidemic in homosexual/bisexual males in England and Wales. Complexity in this analysis arises through selectively missing data, indirectly observed underlying processes, and measurement error. Our emphasis is on presentation and discussion of the concepts, not on the technicalities of this analysis, which can be found elsewhere [D. De Angelis, W.R. Gilks, N.E. Day, Bayesian projection of the acquired immune deficiency syndrome epidemic (with discussion), Applied Statistics, in press].
A solution to the static frame validation challenge problem using Bayesian model selection
Grigoriu, M. D.; Field, R. V.
2007-12-23
Within this paper, we provide a solution to the static frame validation challenge problem (see this issue) in a manner that is consistent with the guidelines provided by the Validation Challenge Workshop tasking document. The static frame problem is constructed such that variability in material properties is known to be the only source of uncertainty in the system description, but there is ignorance on the type of model that best describes this variability. Hence both types of uncertainty, aleatoric and epistemic, are present and must be addressed. Our approach is to consider a collection of competing probabilistic models for the material properties, and calibrate these models to the information provided; models of different levels of complexity and numerical efficiency are included in the analysis. A Bayesian formulation is used to select the optimal model from the collection, which is then used for the regulatory assessment. Lastly, Bayesian credible intervals are used to provide a measure of confidence in our regulatory assessment.
Increasing selection response by Bayesian modeling of heterogeneous environmental variances
USDA-ARS?s Scientific Manuscript database
Heterogeneity of environmental variance among genotypes reduces selection response because genotypes with higher variance are more likely to be selected than low-variance genotypes. Modeling heterogeneous variances to obtain weighted means corrected for heterogeneous variances is difficult in likel...
Refining value-at-risk estimates using a Bayesian Markov-switching GJR-GARCH copula-EVT model.
Sampid, Marius Galabe; Hasim, Haslifah M; Dai, Hongsheng
2018-01-01
In this paper, we propose a model for forecasting Value-at-Risk (VaR) using a Bayesian Markov-switching GJR-GARCH(1,1) model with skewed Student's-t innovation, copula functions and extreme value theory. A Bayesian Markov-switching GJR-GARCH(1,1) model that identifies non-constant volatility over time and allows the GARCH parameters to vary over time following a Markov process, is combined with copula functions and EVT to formulate the Bayesian Markov-switching GJR-GARCH(1,1) copula-EVT VaR model, which is then used to forecast the level of risk on financial asset returns. We further propose a new method for threshold selection in EVT analysis, which we term the hybrid method. Empirical and back-testing results show that the proposed VaR models capture VaR reasonably well in periods of calm and in periods of crisis.
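The usual baseline such models are benchmarked against is the plain empirical quantile of historical losses. A minimal sketch of that baseline follows (plain Python; this is not the authors' Markov-switching GJR-GARCH copula-EVT model, and the sample returns are made up for illustration):

```python
def value_at_risk(returns, level=0.99):
    """Empirical (historical-simulation) VaR: the loss magnitude that
    historical losses exceeded on only a (1 - level) fraction of days.
    Losses are negated returns, so a VaR of 0.05 means a 5% loss."""
    losses = sorted(-r for r in returns)
    # index of the level-quantile of the sorted loss distribution
    idx = min(len(losses) - 1, int(level * len(losses)))
    return losses[idx]

# Illustrative five-day return history
var_80 = value_at_risk([-0.02, 0.01, -0.05, 0.003, -0.01], level=0.8)
```

EVT-based models such as the one above refine exactly this tail quantile by fitting a parametric distribution to the exceedances over a threshold, which is where the authors' "hybrid" threshold-selection method comes in.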
Calus, M P L; de Haas, Y; Veerkamp, R F
2013-10-01
Genomic selection holds the promise to be particularly beneficial for traits that are difficult or expensive to measure, such that access to phenotypes on large daughter groups of bulls is limited. Instead, cow reference populations can be generated, potentially supplemented with existing information from the same or (highly) correlated traits available on bull reference populations. The objective of this study, therefore, was to develop a model to perform genomic predictions and genome-wide association studies based on a combined cow and bull reference data set, with the accuracy of the phenotypes differing between the cow and bull genomic selection reference populations. The developed bivariate Bayesian stochastic search variable selection model allowed for an unbalanced design by imputing residuals in the residual updating scheme for all missing records. The performance of this model is demonstrated on a real data example, where the analyzed trait, being milk fat or protein yield, was either measured only on a cow or a bull reference population, or recorded on both. Our results were that the developed bivariate Bayesian stochastic search variable selection model was able to analyze 2 traits, even though animals had measurements on only 1 of 2 traits. The Bayesian stochastic search variable selection model yielded consistently higher accuracy for fat yield compared with a model without variable selection, both for the univariate and bivariate analyses, whereas the accuracy of both models was very similar for protein yield. The bivariate model identified several additional quantitative trait loci peaks compared with the single-trait models on either trait. In addition, the bivariate models showed a marginal increase in accuracy of genomic predictions for the cow traits (0.01-0.05), although a greater increase in accuracy is expected as the size of the bull population increases. 
Our results emphasize that the chosen values of priors in Bayesian genomic prediction models are especially important in small data sets. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Model Selection Methods for Mixture Dichotomous IRT Models
ERIC Educational Resources Information Center
Li, Feiming; Cohen, Allan S.; Kim, Seock-Ho; Cho, Sun-Joo
2009-01-01
This study examines model selection indices for use with dichotomous mixture item response theory (IRT) models. Five indices are considered: Akaike's information coefficient (AIC), Bayesian information coefficient (BIC), deviance information coefficient (DIC), pseudo-Bayes factor (PsBF), and posterior predictive model checks (PPMC). The five…
A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction
Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R.; Buenrostro-Mariscal, Raymundo
2017-01-01
There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. PMID:28391241
IRT Model Selection Methods for Dichotomous Items
ERIC Educational Resources Information Center
Kang, Taehoon; Cohen, Allan S.
2007-01-01
Fit of the model to the data is important if the benefits of item response theory (IRT) are to be obtained. In this study, the authors compared model selection results using the likelihood ratio test, two information-based criteria, and two Bayesian methods. An example illustrated the potential for inconsistency in model selection depending on…
Ferragina, A; de los Campos, G; Vazquez, A I; Cecchinato, A; Bittante, G
2015-11-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict "difficult-to-predict" dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm⁻¹ were available and averaged before data analysis. Three Bayesian models (Bayesian ridge regression, Bayes RR; Bayes A; and Bayes B) and 2 reference models (PLS and modified PLS, MPLS) were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy.
Accuracy increased in moving from calibration to external validation methods, and in moving from PLS and MPLS to Bayesian methods, particularly Bayes A and Bayes B. The maximum R² value of validation was obtained with Bayes B and Bayes A. For the FA, C10:0 (% of each FA on total FA basis) had the highest R² (0.75, achieved with Bayes A and Bayes B), and among the technological traits, fresh cheese yield had the highest R² (0.82, achieved with Bayes B). These 2 methods have proven to be useful instruments in shrinking and selecting very informative wavelengths and inferring the structure and functions of the analyzed traits. We conclude that Bayesian models are powerful tools for deriving calibration equations, and, importantly, these equations can be easily developed using existing open-source software. As part of our study, we provide scripts based on the open source R software BGLR, which can be used to train customized prediction equations for other traits or populations. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Thomson, James R; Kimmerer, Wim J; Brown, Larry R; Newman, Ken B; Mac Nally, Ralph; Bennett, William A; Feyrer, Frederick; Fleishman, Erica
2010-07-01
We examined trends in abundance of four pelagic fish species (delta smelt, longfin smelt, striped bass, and threadfin shad) in the upper San Francisco Estuary, California, USA, over 40 years using Bayesian change point models. Change point models identify times of abrupt or unusual changes in absolute abundance (step changes) or in rates of change in abundance (trend changes). We coupled Bayesian model selection with linear regression splines to identify biotic or abiotic covariates with the strongest associations with abundances of each species. We then refitted change point models conditional on the selected covariates to explore whether those covariates could explain statistical trends or change points in species abundances. We also fitted a multispecies change point model that identified change points common to all species. All models included hierarchical structures to model data uncertainties, including observation errors and missing covariate values. There were step declines in abundances of all four species in the early 2000s, with a likely common decline in 2002. Abiotic variables, including water clarity, position of the 2‰ isohaline (X2), and the volume of freshwater exported from the estuary, explained some variation in species' abundances over the time series, but no selected covariates could statistically explain the post-2000 change points for any species.
Silva Junqueira, Vinícius; de Azevedo Peixoto, Leonardo; Galvêas Laviola, Bruno; Lopes Bhering, Leonardo; Mendonça, Simone; Agostini Costa, Tania da Silveira; Antoniassi, Rosemar
2016-01-01
The biggest challenge for jatropha breeding is to identify superior genotypes that present high seed yield and seed oil content with reduced toxicity levels. Therefore, the objective of this study was to estimate genetic parameters for three important traits (weight of 100 seed, oil seed content, and phorbol ester concentration), and to select superior genotypes to be used as progenitors in jatropha breeding. Additionally, the genotypic values and the genetic parameters estimated under the Bayesian multi-trait approach were used to evaluate different selection indices scenarios of 179 half-sib families. Three different scenarios and economic weights were considered. It was possible to simultaneously reduce toxicity and increase seed oil content and weight of 100 seed by using index selection based on genotypic value estimated by the Bayesian multi-trait approach. Indeed, we identified two families that present these characteristics by evaluating genetic diversity using the Ward clustering method, which suggested nine homogenous clusters. Future researches must integrate the Bayesian multi-trait methods with realized relationship matrix, aiming to build accurate selection indices models. PMID:27281340
Posada, David; Buckley, Thomas R
2004-10-01
Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001).
NASA Astrophysics Data System (ADS)
Määttä, A.; Laine, M.; Tamminen, J.; Veefkind, J. P.
2013-09-01
We study uncertainty quantification in remote sensing of aerosols in the atmosphere with top-of-the-atmosphere reflectance measurements from the nadir-viewing Ozone Monitoring Instrument (OMI). Focus is on the uncertainty in aerosol model selection of pre-calculated aerosol models and on the statistical modelling of the model inadequacies. The aim is to apply statistical methodologies that improve the uncertainty estimates of the aerosol optical thickness (AOT) retrieval by propagating model selection and model error related uncertainties more realistically. We utilise Bayesian model selection and model averaging methods for the model selection problem and use Gaussian processes to model the smooth systematic discrepancies between the modelled and observed reflectance. The systematic model error is learned from an ensemble of operational retrievals. The operational OMI multi-wavelength aerosol retrieval algorithm OMAERO is used for cloud-free, over-land pixels of the OMI instrument with the additional Bayesian model selection and model discrepancy techniques. The method is demonstrated with four examples with different aerosol properties: weakly absorbing aerosols, forest fires over Greece and Russia, and Sahara desert dust. The presented statistical methodology is general; it is not restricted to this particular satellite retrieval application.
Model selection with multiple regression on distance matrices leads to incorrect inferences.
Franckowiak, Ryan P; Panasci, Michael; Jarvis, Karl J; Acuña-Rodriguez, Ian S; Landguth, Erin L; Fortin, Marie-Josée; Wagner, Helene H
2017-01-01
In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM) to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC), its small-sample correction (AICc), and the Bayesian information criterion (BIC) to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems effectively increased with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and different sum-of-squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.
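The three criteria compared in this study have simple closed forms. A sketch of how AIC, its small-sample correction AICc, BIC, and the delta values and model weights mentioned above are computed from a model's maximized log-likelihood (these are the generic textbook formulas, not code from the study):

```python
import math

def info_criteria(log_lik, k, n):
    """AIC, small-sample AICc, and BIC from a model's maximized
    log-likelihood (log_lik), parameter count (k), and sample size (n).
    With MRM, n is the number of pairwise distances, which is exactly
    the inflated sample size that enters BIC's log(n) penalty."""
    aic = -2.0 * log_lik + 2.0 * k
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)
    bic = -2.0 * log_lik + k * math.log(n)
    return aic, aicc, bic

def akaike_weights(criterion_values):
    """Delta values and normalized model weights for any criterion."""
    best = min(criterion_values)
    deltas = [c - best for c in criterion_values]
    rel = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel)
    return deltas, [r / total for r in rel]
```

For example, a 3-parameter model with log-likelihood -100 at n = 30 gives AIC = 206; the delta values are differences from the best (lowest) criterion value, and the weights sum to 1 across the candidate set.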
Model Selection in Historical Research Using Approximate Bayesian Computation
Rubio-Campillo, Xavier
2016-01-01
Formal Models and History: Computational models are increasingly being used to study historical dynamics. This new trend, which could be named Model-Based History, makes use of recently published datasets and innovative quantitative methods to improve our understanding of past societies based on their written sources. The extensive use of formal models allows historians to re-evaluate hypotheses formulated decades ago and still subject to debate due to the lack of an adequate quantitative framework. The initiative has the potential to transform the discipline if it solves the challenges posed by the study of historical dynamics. These difficulties are based on the complexities of modelling social interaction, and the methodological issues raised by the evaluation of formal models against data with low sample size, high variance and strong fragmentation. Case Study: This work examines an alternative approach to this evaluation based on a Bayesian-inspired model selection method. The validity of the classical Lanchester's laws of combat is examined against a dataset comprising over a thousand battles spanning 300 years. Four variations of the basic equations are discussed, including the three most common formulations (linear, squared, and logarithmic) and a new variant introducing fatigue. Approximate Bayesian Computation is then used to infer both parameter values and model selection via Bayes Factors. Impact: Results indicate decisive evidence favouring the new fatigue model. The interpretation of both parameter estimations and model selection provides new insights into the factors guiding the evolution of warfare. At a methodological level, the case study shows how model selection methods can be used to guide historical research through the comparison between existing hypotheses and empirical evidence. PMID:26730953
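Rejection ABC for model choice can be sketched in a few lines: simulate each candidate model many times, accept simulations whose summary statistic lands close to the observed one, and take acceptance-rate ratios as approximate Bayes factors. The two toy models below are hypothetical stand-ins for illustration, not Lanchester's combat equations:

```python
import random
import statistics

def abc_model_select(observed_stat, models, n_sims=20000, eps=0.05, seed=1):
    """Rejection ABC for model choice: simulate each model's summary
    statistic n_sims times and accept runs landing within eps of the
    observed value. Each acceptance rate approximates that model's
    marginal likelihood, so rate ratios approximate Bayes factors."""
    rng = random.Random(seed)
    rates = {}
    for name, simulate in models.items():
        hits = sum(1 for _ in range(n_sims)
                   if abs(simulate(rng) - observed_stat) <= eps)
        rates[name] = hits / n_sims
    return rates

# Hypothetical stand-in models: each simulates a 50-observation
# dataset and returns its mean as the summary statistic.
def model_a(rng):  # matches the "observed" location of 0.0
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(50))

def model_b(rng):  # rival model centred at 0.3
    return statistics.fmean(rng.gauss(0.3, 1.0) for _ in range(50))

rates = abc_model_select(0.0, {"A": model_a, "B": model_b})
bayes_factor_ab = rates["A"] / rates["B"]  # evidence for A over B
```

With the observed statistic generated by model A's process, the acceptance rate for A dominates and the Bayes factor comes out well above 1, mirroring how ABC discriminated the fatigue variant in the study.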
NASA Astrophysics Data System (ADS)
Schöniger, Anneli; Wöhling, Thomas; Nowak, Wolfgang
2014-05-01
Bayesian model averaging ranks the predictive capabilities of alternative conceptual models based on Bayes' theorem. The individual models are weighted with their posterior probability to be the best one in the considered set of models. Finally, their predictions are combined into a robust weighted average and the predictive uncertainty can be quantified. This rigorous procedure does, however, not yet account for possible instabilities due to measurement noise in the calibration data set. This is a major drawback, since posterior model weights may suffer a lack of robustness related to the uncertainty in noisy data, which may compromise the reliability of model ranking. We present a new statistical concept to account for measurement noise as source of uncertainty for the weights in Bayesian model averaging. Our suggested upgrade reflects the limited information content of data for the purpose of model selection. It allows us to assess the significance of the determined posterior model weights, the confidence in model selection, and the accuracy of the quantified predictive uncertainty. Our approach rests on a brute-force Monte Carlo framework. We determine the robustness of model weights against measurement noise by repeatedly perturbing the observed data with random realizations of measurement error. Then, we analyze the induced variability in posterior model weights and introduce this "weighting variance" as an additional term into the overall prediction uncertainty analysis scheme. We further determine the theoretical upper limit in performance of the model set which is imposed by measurement noise. As an extension to the merely relative model ranking, this analysis provides a measure of absolute model performance. To finally decide, whether better data or longer time series are needed to ensure a robust basis for model selection, we resample the measurement time series and assess the convergence of model weights for increasing time series length. 
We illustrate our suggested approach with an application to model selection between different soil-plant models following up on a study by Wöhling et al. (2013). Results show that measurement noise compromises the reliability of model ranking and causes a significant amount of weighting uncertainty if the calibration data time series is not long enough to compensate for its noisiness. This additional contribution to the overall predictive uncertainty is neglected without our approach. Thus, we strongly advocate including our suggested upgrade in the Bayesian model averaging routine.
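The noise-perturbation idea can be illustrated with BIC-based model weights (a common approximation to posterior model weights; the study's own weights are computed differently): repeatedly perturb the calibration data with fresh realizations of measurement error and track the induced variance of the weights. The models and data below are made up for illustration:

```python
import math
import random

def bic_weights(models, x, y, sigma):
    """Posterior model weights approximated from BIC under Gaussian
    errors with known sigma and uniform model priors: w_i ~ exp(-BIC_i/2)."""
    n = len(y)
    bics = []
    for predict, k in models:
        rss = sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))
        log_lik = -0.5 * n * math.log(2 * math.pi * sigma ** 2) - rss / (2 * sigma ** 2)
        bics.append(-2.0 * log_lik + k * math.log(n))
    best = min(bics)
    rel = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(rel)
    return [r / total for r in rel]

def weight_variance(models, x, y, sigma, n_reps=200, seed=0):
    """Perturb the observed data with random realizations of
    measurement error and report the mean and variance of each model's
    weight; the variance plays the role of the 'weighting variance'
    described above."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_reps):
        y_pert = [yi + rng.gauss(0.0, sigma) for yi in y]
        samples.append(bic_weights(models, x, y_pert, sigma))
    n_m = len(models)
    means = [sum(w[i] for w in samples) / n_reps for i in range(n_m)]
    variances = [sum((w[i] - means[i]) ** 2 for w in samples) / n_reps
                 for i in range(n_m)]
    return means, variances

# Demo on made-up data: a linear and a constant model with fixed
# (pre-calibrated) predictions, compared on noisy linear observations.
x = [i / 10 for i in range(20)]
data_rng = random.Random(42)
y = [2.0 * xi + data_rng.gauss(0.0, 0.5) for xi in x]
models = [(lambda t: 2.0 * t, 1), (lambda t: 0.0, 1)]
mean_w, var_w = weight_variance(models, x, y, sigma=0.5)
```

A small weighting variance, as here, indicates that the ranking is robust to measurement noise; a large one signals exactly the lack of robustness the study warns about.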
A Rational Analysis of the Selection Task as Optimal Data Selection.
ERIC Educational Resources Information Center
Oaksford, Mike; Chater, Nick
1994-01-01
Experimental data on human reasoning in hypothesis-testing tasks is reassessed in light of a Bayesian model of optimal data selection in inductive hypothesis testing. The rational analysis provided by the model suggests that reasoning in such tasks may be rational rather than subject to systematic bias. (SLD)
Abanto-Valle, C. A.; Bandyopadhyay, D.; Lachos, V. H.; Enriquez, I.
2009-01-01
A Bayesian analysis of stochastic volatility (SV) models using the class of symmetric scale mixtures of normal (SMN) distributions is considered. In the face of non-normality, this provides an appealing robust alternative to the routine use of the normal distribution. Specific distributions examined include the normal, Student-t, slash and the variance gamma distributions. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced for parameter estimation. Moreover, the mixing parameters obtained as a by-product of the scale mixture representation can be used to identify outliers. The methods developed are applied to analyze daily stock returns data on the S&P500 index. Bayesian model selection criteria as well as out-of-sample forecasting results reveal that the SV models based on heavy-tailed SMN distributions provide significant improvement in model fit as well as prediction to the S&P500 index data over the usual normal model. PMID:20730043
Strelioff, Christopher C; Crutchfield, James P; Hübler, Alfred W
2007-07-01
Markov chains are a natural and well understood tool for describing one-dimensional patterns in time or space. We show how to infer kth order Markov chains, for arbitrary k , from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between Bayesian evidence and the partition function which allows for straightforward calculation of the expectation and variance of the conditional relative entropy and the source entropy rate. Finally, we introduce a method that uses finite data-size scaling with model-order comparison to infer the structure of out-of-class processes.
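For discrete data, the Bayesian evidence of a k-th order Markov chain with Dirichlet priors has a closed form (the Dirichlet-multinomial integral), so model-order selection reduces to comparing these values across k. A sketch under that standard setup (this is the textbook construction, not the type-theoretic machinery of the paper):

```python
import math
import random
from collections import defaultdict

def log_evidence(seq, k, alphabet, alpha=1.0):
    """Log marginal likelihood of a k-th order Markov chain with a
    symmetric Dirichlet(alpha) prior on each context's next-symbol
    distribution, evaluated context by context in closed form."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(k, len(seq)):
        counts[tuple(seq[i - k:i])][seq[i]] += 1
    m = len(alphabet)
    log_ev = 0.0
    for nxt in counts.values():
        n_ctx = sum(nxt.values())
        log_ev += math.lgamma(m * alpha) - math.lgamma(m * alpha + n_ctx)
        for s in alphabet:
            log_ev += math.lgamma(alpha + nxt[s]) - math.lgamma(alpha)
    return log_ev

def select_order(seq, alphabet, max_k=3):
    """Model-order selection: the k with the highest evidence."""
    return max(range(max_k + 1), key=lambda k: log_evidence(seq, k, alphabet))

# Demo: a strongly persistent binary chain (keeps its state w.p. 0.95),
# i.e. a genuinely first-order process.
rng = random.Random(7)
seq = [0]
for _ in range(999):
    seq.append(seq[-1] if rng.random() < 0.95 else 1 - seq[-1])
best_k = select_order(seq, [0, 1])
```

The evidence automatically penalizes the extra contexts of higher-order models, so for this sequence the first-order model wins without any explicit complexity penalty being added.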
Application of Bayesian model averaging to measurements of the primordial power spectrum
NASA Astrophysics Data System (ADS)
Parkinson, David; Liddle, Andrew R.
2010-11-01
Cosmological parameter uncertainties are often stated assuming a particular model, neglecting the model uncertainty, even when Bayesian model selection is unable to identify a conclusive best model. Bayesian model averaging is a method for assessing parameter uncertainties in situations where there is also uncertainty in the underlying model. We apply model averaging to the estimation of the parameters associated with the primordial power spectra of curvature and tensor perturbations. We use CosmoNest and MultiNest to compute the model evidences and posteriors, using cosmic microwave data from WMAP, ACBAR, BOOMERanG, and CBI, plus large-scale structure data from the SDSS DR7. We find that the model-averaged 95% credible interval for the spectral index using all of the data is 0.940
A Bayesian Approach to Model Selection in Hierarchical Mixtures-of-Experts Architectures.
Tanner, Martin A.; Peng, Fengchun; Jacobs, Robert A.
1997-03-01
There does not exist a statistical model that shows good performance on all tasks. Consequently, the model selection problem is unavoidable; investigators must decide which model is best at summarizing the data for each task of interest. This article presents an approach to the model selection problem in hierarchical mixtures-of-experts architectures. These architectures combine aspects of generalized linear models with those of finite mixture models in order to perform tasks via a recursive "divide-and-conquer" strategy. Markov chain Monte Carlo methodology is used to estimate the distribution of the architectures' parameters. One part of our approach to model selection attempts to estimate the worth of each component of an architecture so that relatively unused components can be pruned from the architecture's structure. A second part of this approach uses a Bayesian hypothesis testing procedure in order to differentiate inputs that carry useful information from nuisance inputs. Simulation results suggest that the approach presented here adheres to the dictum of Occam's razor; simple architectures that are adequate for summarizing the data are favored over more complex structures. Copyright 1997 Elsevier Science Ltd. All Rights Reserved.
Bayesian Model Selection under Time Constraints
NASA Astrophysics Data System (ADS)
Hoege, M.; Nowak, W.; Illman, W. A.
2017-12-01
Bayesian model selection (BMS) provides a consistent framework for rating and comparing models in multi-model inference. In cases where models of vastly different complexity compete with each other, we also face vastly different computational runtimes of such models. For instance, time series of a quantity of interest can be simulated by an autoregressive process model that takes even less than a second for one run, or by a partial differential equations-based model with runtimes up to several hours or even days. The classical BMS is based on a quantity called Bayesian model evidence (BME). It determines the model weights in the selection process and represents a trade-off between bias of a model and its complexity. However, in practice, the runtime of models is another factor relevant to model selection. Hence, we believe that it should be included, leading to an overall trade-off problem between bias, variance and computing effort. We approach this triple trade-off from the viewpoint of our ability to generate realizations of the models under a given computational budget. One way to obtain BME values is through sampling-based integration techniques. We note that more expensive models can be sampled much less under time constraints than faster models (in straight proportion to their runtime). The computed evidence in favor of a more expensive model is statistically less significant than the evidence computed in favor of a faster model, since sampling-based strategies are always subject to statistical sampling error. We present a straightforward way to include this imbalance in the model weights that are the basis for model selection. Our approach follows directly from the idea of insufficient significance. It is based on a computationally cheap bootstrapping error estimate of model evidence and is easy to implement. The approach is illustrated in a small synthetic modeling study.
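The core point, that a model affording fewer Monte Carlo samples under a fixed budget has a noisier evidence estimate, can be illustrated with a bootstrap standard error of a simple arithmetic-mean BME estimator. The likelihood below is a hypothetical toy, not one of the study's models:

```python
import math
import random

def bootstrap_bme_error(likelihoods, n_boot=1000, seed=0):
    """Arithmetic-mean Monte Carlo estimate of Bayesian model evidence
    (BME) from likelihoods of prior samples, plus a bootstrap standard
    error of log(BME). Fewer samples means a larger error: the penalty
    a slow model pays under a fixed computational budget."""
    rng = random.Random(seed)
    n = len(likelihoods)
    bme = sum(likelihoods) / n
    log_bmes = []
    for _ in range(n_boot):
        resample = [likelihoods[rng.randrange(n)] for _ in range(n)]
        log_bmes.append(math.log(sum(resample) / n))
    mean = sum(log_bmes) / n_boot
    sd = math.sqrt(sum((v - mean) ** 2 for v in log_bmes) / (n_boot - 1))
    return bme, sd

# A fast model affords 2000 prior samples within the budget, a slow
# one only 50; the toy likelihood is a Gaussian density at a prior draw.
rng = random.Random(3)
fast = [math.exp(-0.5 * rng.gauss(0.0, 1.0) ** 2) for _ in range(2000)]
slow = [math.exp(-0.5 * rng.gauss(0.0, 1.0) ** 2) for _ in range(50)]
bme_fast, err_fast = bootstrap_bme_error(fast)
bme_slow, err_slow = bootstrap_bme_error(slow)
```

The slow model's evidence estimate carries a visibly larger bootstrap error, which is the "insufficient significance" the authors propose folding into the model weights.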
Garrard, Lili; Price, Larry R.; Bott, Marjorie J.; Gajewski, Byron J.
2016-01-01
Item response theory (IRT) models provide an appropriate alternative to the classical ordinal confirmatory factor analysis (CFA) during the development of patient-reported outcome measures (PROMs). Current literature has identified the assessment of IRT model fit as both challenging and underdeveloped (Sinharay & Johnson, 2003; Sinharay, Johnson, & Stern, 2006). This study evaluates the performance of Ordinal Bayesian Instrument Development (OBID), a Bayesian IRT model with a probit link function approach, through applications in two breast cancer-related instrument development studies. The primary focus is to investigate an appropriate method for comparing Bayesian IRT models in PROMs development. An exact Bayesian leave-one-out cross-validation (LOO-CV) approach (Vehtari & Lampinen, 2002) is implemented to assess prior selection for the item discrimination parameter in the IRT model and subject content experts’ bias (in a statistical sense and not to be confused with psychometric bias as in differential item functioning) toward the estimation of item-to-domain correlations. Results support the utilization of content subject experts’ information in establishing evidence for construct validity when sample size is small. However, the incorporation of subject experts’ content information in the OBID approach can be sensitive to the level of expertise of the recruited experts. More stringent efforts need to be invested in the appropriate selection of subject experts to efficiently use the OBID approach and reduce potential bias during PROMs development. PMID:27667878
Bayesian Inference in Satellite Gravity Inversion
NASA Technical Reports Server (NTRS)
Kis, K. I.; Taylor, Patrick T.; Wittmann, G.; Kim, Hyung Rae; Torony, B.; Mayer-Guerr, T.
2005-01-01
To solve a geophysical inverse problem means applying measurements to determine the parameters of the selected model. The inverse problem is formulated via Bayesian inference, and Gaussian probability density functions are applied in Bayes' equation. The CHAMP satellite gravity data are determined at an altitude of 400 kilometers over the southern part of the Pannonian basin. The interpretation model is a right vertical cylinder. The parameters of the model are obtained from a minimization problem solved by the simplex method.
Adrion, Christine; Mansmann, Ulrich
2012-09-10
A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). 
The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.
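The mean logarithmic score used above for model ranking can be sketched in a few lines. This toy illustration is not the INLA-based implementation: for each observed count, the model's mean is re-estimated without that observation (exact leave-one-out for this simple case), the log predictive density is evaluated at the held-out value, and the negative mean over observations ranks the models, smaller being better. The data and the two candidate count models (Poisson vs. a geometric with matched mean, standing in for an overdispersed alternative) are made up.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def geometric_pmf(y, mu):
    # Geometric pmf parameterized by its mean mu: success probability 1/(1+mu).
    p = 1.0 / (1.0 + mu)
    return p * (1.0 - p) ** y

def mean_log_score(counts, pmf):
    # Leave-one-out: the mean is re-estimated without observation i, then the
    # log predictive density is evaluated at the held-out count.
    n, total = len(counts), sum(counts)
    scores = []
    for y in counts:
        mu_loo = (total - y) / (n - 1)
        scores.append(-math.log(pmf(y, mu_loo)))
    return sum(scores) / n  # smaller mean log score = better predictions

counts = [0, 0, 1, 0, 9, 0, 2, 0, 8, 1, 0, 7]  # overdispersed toy counts
print("Poisson  :", mean_log_score(counts, poisson_pmf))
print("Geometric:", mean_log_score(counts, geometric_pmf))
```

On these overdispersed counts the heavier-tailed model wins, mirroring the abstract's finding that a naive Poisson model is often inappropriate for real-life count data.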
NASA Astrophysics Data System (ADS)
von der Linden, Wolfgang; Dose, Volker; von Toussaint, Udo
2014-06-01
Preface; Part I. Introduction: 1. The meaning of probability; 2. Basic definitions; 3. Bayesian inference; 4. Combinatrics; 5. Random walks; 6. Limit theorems; 7. Continuous distributions; 8. The central limit theorem; 9. Poisson processes and waiting times; Part II. Assigning Probabilities: 10. Transformation invariance; 11. Maximum entropy; 12. Qualified maximum entropy; 13. Global smoothness; Part III. Parameter Estimation: 14. Bayesian parameter estimation; 15. Frequentist parameter estimation; 16. The Cramer-Rao inequality; Part IV. Testing Hypotheses: 17. The Bayesian way; 18. The frequentist way; 19. Sampling distributions; 20. Bayesian vs frequentist hypothesis tests; Part V. Real World Applications: 21. Regression; 22. Inconsistent data; 23. Unrecognized signal contributions; 24. Change point problems; 25. Function estimation; 26. Integral equations; 27. Model selection; 28. Bayesian experimental design; Part VI. Probabilistic Numerical Techniques: 29. Numerical integration; 30. Monte Carlo methods; 31. Nested sampling; Appendixes; References; Index.
Zhang, Xiang; Faries, Douglas E; Boytsov, Natalie; Stamey, James D; Seaman, John W
2016-09-01
Observational studies are frequently used to assess the effectiveness of medical interventions in routine clinical practice. However, the use of observational data for comparative effectiveness is challenged by selection bias and the potential for unmeasured confounding. This is especially problematic for analyses using a health care administrative database, in which key clinical measures are often not available. This paper provides an approach to conducting a sensitivity analysis to investigate the impact of unmeasured confounding in observational studies. In a real-world osteoporosis comparative effectiveness study, the bone mineral density (BMD) score, an important predictor of fracture risk and a factor in the selection of osteoporosis treatments, is unavailable in the database, and the lack of baseline BMD could potentially lead to significant selection bias. We implemented Bayesian twin-regression models, which simultaneously model both the observed outcome and the unobserved unmeasured confounder, using information from external sources. A sensitivity analysis was also conducted to assess the robustness of our conclusions to changes in such external data. The use of Bayesian modeling in this study suggests that the lack of baseline BMD did have a strong impact on the analysis, reversing the direction of the estimated effect (odds ratio of fracture incidence at 24 months: 0.40 vs. 1.36, with/without adjusting for unmeasured baseline BMD). The Bayesian twin-regression models provide a flexible sensitivity analysis tool to quantitatively assess the impact of unmeasured confounding in observational studies. Copyright © 2016 John Wiley & Sons, Ltd.
Zeng, Xueqiang; Luo, Gang
2017-12-01
Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected, and the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, various automatic selection methods for algorithms and/or hyper-parameter values have been proposed, but existing methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method and show that, compared to a state-of-the-art automatic selection method, our method can significantly reduce search time, classification error rate, and the standard deviation of the error rate due to randomization. This is major progress towards enabling fast turnaround in identifying the high-quality solutions required by many machine learning-based clinical data analysis tasks.
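The progressive-sampling idea can be sketched in a successive-halving style. This is a simplified analogue, not the paper's algorithm: all candidate configurations are scored on a small data sample, the sample size is then repeatedly doubled while only the better-scoring half survives, so full-data evaluations are spent only on promising candidates. The `toy_error` function and the configuration "quality" field are fabricated stand-ins for "train on n samples and return validation error".

```python
import random

random.seed(7)

def toy_error(config, n_samples):
    # Hypothetical stand-in for "train on n_samples, return validation error";
    # error shrinks with more data and depends on the made-up config quality.
    noise = random.gauss(0, 0.5 / n_samples ** 0.5)
    return config["quality"] + 1.0 / n_samples ** 0.5 + noise

def progressive_selection(configs, start_n=50, max_n=800):
    # Score every candidate on a small sample, then repeatedly double the
    # sample size and keep the better half, until one configuration remains.
    survivors = list(configs)
    n = start_n
    while len(survivors) > 1 and n <= max_n:
        scored = sorted(survivors, key=lambda c: toy_error(c, n))
        survivors = scored[: max(1, len(scored) // 2)]
        n *= 2
    return survivors[0]

candidates = [{"name": f"cfg{i}", "quality": q}
              for i, q in enumerate([0.30, 0.22, 0.40, 0.10, 0.35, 0.28, 0.18, 0.25])]
best = progressive_selection(candidates)
print("selected:", best["name"])
```

The design choice is the usual one in progressive sampling: noisy small-sample estimates are cheap enough to eliminate clearly bad candidates early, and only the finalists ever see large samples.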
Bayesian transformation cure frailty models with multivariate failure time data.
Yin, Guosheng
2008-12-10
We propose a class of transformation cure frailty models to accommodate a survival fraction in multivariate failure time data. Established through a general power transformation, this family of cure frailty models includes the proportional hazards and the proportional odds modeling structures as two special cases. Within the Bayesian paradigm, we obtain the joint posterior distribution and the corresponding full conditional distributions of the model parameters for the implementation of Gibbs sampling. Model selection is based on the conditional predictive ordinate statistic and deviance information criterion. As an illustration, we apply the proposed method to a real data set from dentistry.
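The conditional predictive ordinate (CPO) used above for model selection has a standard Monte Carlo estimator: the harmonic mean of the likelihood of observation i across posterior draws, with the sum of log CPOs giving the log pseudo-marginal likelihood (LPML). A toy sketch under a plain normal model, with fabricated data and simulated "posterior" draws standing in for Gibbs output:

```python
import math
import random

random.seed(3)

def normal_pdf(y, mu, sigma=1.0):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def cpo(y_i, posterior_mus):
    # Harmonic-mean estimator: CPO_i = [ (1/S) * sum_s 1/p(y_i | theta_s) ]^-1
    s = len(posterior_mus)
    inv = sum(1.0 / normal_pdf(y_i, mu) for mu in posterior_mus) / s
    return 1.0 / inv

data = [0.2, -0.5, 1.1, 0.4, -0.1]
draws = [random.gauss(0.2, 0.3) for _ in range(2000)]  # stand-in posterior draws

lpml = sum(math.log(cpo(y, draws)) for y in data)  # log pseudo-marginal likelihood
print("LPML:", lpml)  # higher LPML = better predictive fit
```

In practice the draws would come from the Gibbs sampler the abstract describes, and competing models would be ranked by their LPML alongside DIC.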
ERIC Educational Resources Information Center
Beretvas, S. Natasha; Murphy, Daniel L.
2013-01-01
The authors assessed correct model identification rates of Akaike's information criterion (AIC), the corrected criterion (AICC), the consistent AIC (CAIC), Hannan and Quinn's information criterion (HQIC), and the Bayesian information criterion (BIC) for selecting among cross-classified random effects models. Performance of default values for the 5…
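The five criteria compared above are all simple functions of the maximized log-likelihood `ll`, the parameter count `k`, and the sample size `n`. A minimal sketch of the standard formulas, applied to two hypothetical nested fits:

```python
import math

def aic(ll, k):     return -2 * ll + 2 * k
def aicc(ll, k, n): return aic(ll, k) + 2 * k * (k + 1) / (n - k - 1)
def bic(ll, k, n):  return -2 * ll + k * math.log(n)
def caic(ll, k, n): return -2 * ll + k * (math.log(n) + 1)
def hqic(ll, k, n): return -2 * ll + 2 * k * math.log(math.log(n))

# Hypothetical example: two nested models fit to the same n = 120 observations.
n = 120
ll_small, k_small = -310.4, 3
ll_large, k_large = -305.1, 8

# AIC narrowly prefers the larger fit, while BIC's heavier per-parameter
# penalty (log n > 2 here) picks the smaller one.
print("AIC :", aic(ll_small, k_small), aic(ll_large, k_large))
print("BIC :", bic(ll_small, k_small, n), bic(ll_large, k_large, n))
```

This divergence between criteria on the same data is exactly why studies like the one above compare their identification rates empirically.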
Ekins, Sean; Freundlich, Joel S.; Hobrath, Judith V.; White, E. Lucile; Reynolds, Robert C
2013-01-01
Purpose: Tuberculosis treatments need to be shorter and to overcome drug resistance. Our previous large-scale phenotypic high-throughput screening against Mycobacterium tuberculosis (Mtb) identified 737 active compounds and thousands that are inactive. We have used these data to build computational models as an approach to minimize the number of compounds tested. Methods: A cheminformatics clustering approach followed by Bayesian machine learning models (based on publicly available Mtb screening data) was used to illustrate that applying these models to screening set selection can enrich the hit rate. Results: To explore chemical diversity around active cluster scaffolds of the dose-response hits obtained from our previous Mtb screens, a set of 1924 commercially available molecules was selected and evaluated for antitubercular activity and cytotoxicity using Vero, THP-1 and HepG2 cell lines, with 4.3%, 4.2% and 2.7% hit rates, respectively. We demonstrate that models incorporating antitubercular and cytotoxicity data in Vero cells can significantly enrich the selection of non-toxic actives compared to random selection. Across all cell lines, the Molecular Libraries Small Molecule Repository (MLSMR) and cytotoxicity model identified ~10% of the hits in the top 1% screened (>10-fold enrichment). We also showed that seven out of nine Mtb active compounds from different academic published studies and eight out of eleven Mtb active compounds from a pharmaceutical screen (GSK) would have been identified by these Bayesian models. Conclusion: Combining clustering and Bayesian models represents a useful strategy for compound prioritization and hit-to-lead optimization of antitubercular agents. PMID:24132686
Bayesian effect estimation accounting for adjustment uncertainty.
Wang, Chi; Parmigiani, Giovanni; Dominici, Francesca
2012-09-01
Model-based estimation of the effect of an exposure on an outcome is generally sensitive to the choice of which confounding factors are included in the model. We propose a new approach, which we call Bayesian adjustment for confounding (BAC), to estimate the effect of an exposure of interest on the outcome while accounting for the uncertainty in the choice of confounders. Our approach is based on specifying two models: (1) the outcome as a function of the exposure and the potential confounders (the outcome model); and (2) the exposure as a function of the potential confounders (the exposure model). We consider Bayesian variable selection on both models and link the two by introducing a dependence parameter, ω, denoting the prior odds of including a predictor in the outcome model, given that the same predictor is in the exposure model. In the absence of dependence (ω = 1), BAC reduces to traditional Bayesian model averaging (BMA). In simulation studies, we show that BAC, with ω > 1, estimates the exposure effect with smaller bias than traditional BMA, and with improved coverage. We then compare BAC, a recent approach of Crainiceanu, Dominici, and Parmigiani (2008, Biometrika 95, 635-651), and traditional BMA in a time series data set of hospital admissions, air pollution levels, and weather variables in Nassau, NY for the period 1999-2005. Using each approach, we estimate the short-term effects of air pollution on emergency admissions for cardiovascular diseases, accounting for confounding. This application illustrates the potentially significant pitfalls of misusing variable selection methods in the context of adjustment uncertainty. © 2012, The International Biometric Society.
Safari, Parviz; Danyali, Syyedeh Fatemeh; Rahimi, Mehdi
2018-06-02
Drought is the main abiotic stress seriously influencing wheat production. Information about the inheritance of drought tolerance is necessary to determine the most appropriate strategy to develop tolerant cultivars and populations. In this study, generation means analysis to identify the genetic effects controlling grain yield inheritance under water-deficit and normal conditions was treated as a model selection problem in a Bayesian framework. Stochastic search variable selection (SSVS) was applied to identify the most important genetic effects and the best-fitting models, using different generations obtained from two crosses under two water regimes in two growing seasons. SSVS evaluates the effect of each variable on the dependent variable via posterior variable inclusion probabilities, and the model with the highest posterior probability is selected as the best model. In this study, grain yield was controlled by main effects (additive and non-additive) and epistatic effects. The results demonstrate that breeding methods such as recurrent selection followed by the pedigree method, as well as hybrid production, can be useful for improving grain yield.
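SSVS itself samples inclusion indicators by MCMC; for a handful of candidate effects, the posterior inclusion probabilities it targets can be sketched by exhaustive enumeration, using the common BIC approximation to each submodel's marginal likelihood. This is a toy analogue of the idea, not the authors' implementation: the data are simulated, only one of three candidate predictors truly affects the response, and the null model is omitted for brevity.

```python
import itertools
import math
import random

random.seed(11)

def solve(a, b):
    # Gaussian elimination with partial pivoting for the normal equations.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rss(xcols, y):
    # Residual sum of squares of the least-squares fit y ~ xcols
    # (no intercept; the simulated predictors are mean-zero).
    p, n = len(xcols), len(y)
    xtx = [[sum(xcols[i][t] * xcols[j][t] for t in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(xcols[i][t] * y[t] for t in range(n)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((y[t] - sum(beta[i] * xcols[i][t] for i in range(p))) ** 2
               for t in range(n))

n = 100
x = [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]
y = [2.0 * x[0][t] + random.gauss(0, 1) for t in range(n)]  # only x[0] matters

# Weight each non-empty subset of predictors by exp(-BIC/2), then marginalize.
weights = {}
for subset in itertools.chain.from_iterable(
        itertools.combinations(range(3), k) for k in range(1, 4)):
    r = rss([x[i] for i in subset], y)
    b = n * math.log(r / n) + len(subset) * math.log(n)  # BIC of this subset
    weights[subset] = math.exp(-0.5 * b)

z = sum(weights.values())
incl = [sum(w for s, w in weights.items() if i in s) / z for i in range(3)]
print("posterior inclusion probabilities:", incl)
```

The true predictor's inclusion probability dominates the noise predictors', which is the signal SSVS reads off its sampled indicators.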
Nowakowska, Marzena
2017-04-01
The development of a Bayesian logistic regression model for classifying road accident severity is discussed. Previously exploited informative priors (method of moments, maximum likelihood estimation, and two-stage Bayesian updating), along with an original Boot prior proposal, are investigated for the case in which no expert opinion is available. In addition, two possible approaches to updating the priors, in the form of unbalanced and balanced training data sets, are presented. The obtained Bayesian logistic models are assessed on the basis of the deviance information criterion (DIC), highest probability density (HPD) intervals, and coefficients of variation estimated for the model parameters. Verification of model accuracy is based on sensitivity, specificity, and the harmonic mean of sensitivity and specificity, all calculated from a test data set. The models obtained from the balanced training data set have better classification quality than those obtained from the unbalanced training data set. The two-stage Bayesian updating prior model and the Boot prior model, both identified with the use of the balanced training data set, outperform the non-informative, method of moments, and maximum likelihood estimation prior models. It is important to note that one should be careful when interpreting the parameters, since different priors can lead to different models. Copyright © 2017 Elsevier Ltd. All rights reserved.
Archambeau, Cédric; Verleysen, Michel
2007-01-01
A new variational Bayesian learning algorithm for Student-t mixture models is introduced. This algorithm leads to (i) robust density estimation, (ii) robust clustering and (iii) robust automatic model selection. Gaussian mixture models are learning machines which are based on a divide-and-conquer approach. They are commonly used for density estimation and clustering tasks, but are sensitive to outliers. The Student-t distribution has heavier tails than the Gaussian distribution and is therefore less sensitive to any departure of the empirical distribution from Gaussianity. As a consequence, the Student-t distribution is suitable for constructing robust mixture models. In this work, we formalize the Bayesian Student-t mixture model as a latent variable model in a different way from Svensén and Bishop [Svensén, M., & Bishop, C. M. (2005). Robust Bayesian mixture modelling. Neurocomputing, 64, 235-252]. The main difference resides in the fact that it is not necessary to assume a factorized approximation of the posterior distribution on the latent indicator variables and the latent scale variables in order to obtain a tractable solution. Not neglecting the correlations between these unobserved random variables leads to a Bayesian model having an increased robustness. Furthermore, it is expected that the lower bound on the log-evidence is tighter. Based on this bound, the model complexity, i.e. the number of components in the mixture, can be inferred with a higher confidence.
NASA Astrophysics Data System (ADS)
Mustac, M.; Kim, S.; Tkalcic, H.; Rhie, J.; Chen, Y.; Ford, S. R.; Sebastian, N.
2015-12-01
Conventional approaches to inverse problems suffer from non-linearity and non-uniqueness in estimations of seismic structures and source properties. Estimated results and associated uncertainties are often biased by applied regularizations and additional constraints, which are commonly introduced to solve such problems. Bayesian methods, however, provide statistically meaningful estimations of models and their uncertainties constrained by data information. In addition, hierarchical and trans-dimensional (trans-D) techniques are inherently implemented in the Bayesian framework to account for the error statistics and model parameterizations involved, in turn allowing more rigorous estimation of both. Here, we apply Bayesian methods throughout the entire inference process to estimate seismic structures and source properties in Northeast Asia, including eastern China, the Korean peninsula, and the Japanese islands. Ambient noise analysis is first performed to obtain a base three-dimensional (3-D) heterogeneity model using continuous broadband waveforms from more than 300 stations. For the tomography of surface wave group and phase velocities in the 5-70 s band, we adopt a hierarchical and trans-D Bayesian inversion method using Voronoi partitions. The 3-D heterogeneity model is further improved by joint inversion of teleseismic receiver functions and dispersion data using a newly developed high-efficiency Bayesian technique. The obtained model is subsequently used to prepare 3-D structural Green's functions for source characterization. A hierarchical Bayesian method for point source inversion using regional complete waveform data is applied to selected events from the region. The seismic structures and source characteristics, with rigorously estimated uncertainties from these novel Bayesian methods, provide enhanced monitoring and discrimination of seismic events in Northeast Asia.
Comparisons of Means Using Exploratory and Confirmatory Approaches
ERIC Educational Resources Information Center
Kuiper, Rebecca M.; Hoijtink, Herbert
2010-01-01
This article discusses comparisons of means using exploratory and confirmatory approaches. Three methods are discussed: hypothesis testing, model selection based on information criteria, and Bayesian model selection. Throughout the article, an example is used to illustrate and evaluate the two approaches and the three methods. We demonstrate that…
Maximum entropy perception-action space: a Bayesian model of eye movement selection
NASA Astrophysics Data System (ADS)
Colas, Francis; Bessière, Pierre; Girard, Benoît
2011-03-01
In this article, we investigate the issue of the selection of eye movements in a free-eye Multiple Object Tracking task. We propose a Bayesian model of retinotopic maps with a complex logarithmic mapping. This model is structured in two parts: a representation of the visual scene, and a decision model based on that representation. We compare decision models based on different features of the representation and show that taking uncertainty into account helps predict the eye movements of subjects recorded in a psychophysics experiment. Finally, based on the experimental data, we postulate that the complex logarithmic mapping has functional relevance, as the density of objects in this space is more uniform than expected. This may indicate that the representation space and control strategies are such that the object density is of maximum entropy.
Scheel, Ida; Ferkingstad, Egil; Frigessi, Arnoldo; Haug, Ola; Hinnerichsen, Mikkel; Meze-Hausken, Elisabeth
2013-01-01
Climate change will affect the insurance industry. We develop a Bayesian hierarchical statistical approach to explain and predict insurance losses due to weather events at a local geographic scale. The number of weather-related insurance claims is modelled by combining generalized linear models with spatially smoothed variable selection. Using Gibbs sampling and reversible jump Markov chain Monte Carlo methods, this model is fitted on daily weather and insurance data from each of the 319 municipalities which constitute southern and central Norway for the period 1997–2006. Precise out-of-sample predictions validate the model. Our results show interesting regional patterns in the effect of different weather covariates. In addition to being useful for insurance pricing, our model can be used for short-term predictions based on weather forecasts and for long-term predictions based on downscaled climate models. PMID:23396890
Morales, Dinora Araceli; Bengoetxea, Endika; Larrañaga, Pedro; García, Miguel; Franco, Yosu; Fresnada, Mónica; Merino, Marisa
2008-05-01
In vitro fertilization (IVF) is a medically assisted reproduction technique that enables infertile couples to achieve successful pregnancy. Given the uncertainty of the treatment, we propose an intelligent decision support system, based on supervised classification by Bayesian classifiers, to aid in the selection of the most promising embryos to form the batch transferred to the woman's uterus. The aim of the supervised classification system is to improve the overall success rate of each IVF treatment in which a batch of embryos is transferred, where success means that implantation (i.e. pregnancy) is achieved. For ethical reasons, different legislative restrictions on this technique apply in each country; in Spain, legislation allows a maximum of three embryos per transfer batch. As a result, clinicians prefer to select embryos by non-invasive examination based on simple methods and observation focused on the morphology and dynamics of embryo development after fertilization. This paper proposes the application of Bayesian classifiers to this embryo selection problem in order to provide a decision support system that allows more accurate selection than the current procedures, which rely fully on the expertise and experience of embryologists. To this end, we consider a reduced subset of feature variables related to embryo morphology and clinical data of patients, and from these data we induce Bayesian classification models. We present the results of applying a filter technique to choose the subset of variables, and the performance of Bayesian classifiers using them.
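The simplest member of the Bayesian classifier family used in such studies is naive Bayes, which the following minimal sketch illustrates on fabricated data. The feature names (day-3 cell count, fragmentation percentage), values, and labels are all hypothetical illustrations, not the paper's variables; the abstract's models are induced from a filtered subset of real morphological and clinical features.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    # Per-class mean and variance for each feature, plus class priors.
    by_class = defaultdict(list)
    for row, lab in zip(rows, labels):
        by_class[lab].append(row)
    model = {}
    for lab, group in by_class.items():
        stats = []
        for j in range(len(group[0])):
            vals = [r[j] for r in group]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6
            stats.append((mu, var))
        model[lab] = (len(group) / len(rows), stats)
    return model

def predict(model, row):
    # Pick the class maximizing log prior + sum of log Gaussian likelihoods.
    best, best_lp = None, float("-inf")
    for lab, (prior, stats) in model.items():
        lp = math.log(prior)
        for x, (mu, var) in zip(row, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        if lp > best_lp:
            best, best_lp = lab, lp
    return best

# Hypothetical embryo features: [cell count at day 3, fragmentation %].
train = [[8, 5], [7, 10], [8, 8], [4, 40], [5, 35], [3, 50]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = implantation, 0 = no implantation
model = fit_gaussian_nb(train, labels)
print(predict(model, [8, 6]), predict(model, [4, 45]))
```

More elaborate Bayesian classifiers relax the naive conditional-independence assumption, but the fit-priors-and-likelihoods structure stays the same.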
Power in Bayesian Mediation Analysis for Small Sample Research
Miočević, Milica; MacKinnon, David P.; Levy, Roy
2018-01-01
It was suggested that Bayesian methods have potential for increasing power in mediation analysis (Koopman, Howe, Hollenbeck, & Sin, 2015; Yuan & MacKinnon, 2009). This paper compares the power of Bayesian credibility intervals for the mediated effect to the power of normal theory, distribution of the product, percentile, and bias-corrected bootstrap confidence intervals at N≤ 200. Bayesian methods with diffuse priors have power comparable to the distribution of the product and bootstrap methods, and Bayesian methods with informative priors had the most power. Varying degrees of precision of prior distributions were also examined. Increased precision led to greater power only when N≥ 100 and the effects were small, N < 60 and the effects were large, and N < 200 and the effects were medium. An empirical example from psychology illustrated a Bayesian analysis of the single mediator model from prior selection to interpreting results. PMID:29662296
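A crude Monte Carlo sketch shows how such power comparisons are set up. Everything here is a simplifying assumption rather than the paper's procedure: normal-approximation posteriors with diffuse priors stand in for full MCMC, the toy model has no direct X-to-Y effect (so the b path is a simple regression), and the effect sizes, sample size, and replication counts are made up. Power is the fraction of replications in which the 95% percentile interval of the product of posterior draws excludes zero.

```python
import math
import random

random.seed(5)

def simple_ols(x, y):
    # Slope and its standard error for the simple regression y ~ x.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - my - b * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b, se

def mediated_effect_excludes_zero(n, a=0.39, b=0.39, draws=400):
    # One replication: simulate X -> M -> Y, approximate each path's posterior
    # as Normal(estimate, se) under a diffuse prior, draw the product a*b,
    # and check whether the 95% percentile interval excludes zero.
    x = [random.gauss(0, 1) for _ in range(n)]
    m = [a * xi + random.gauss(0, 1) for xi in x]
    y = [b * mi + random.gauss(0, 1) for mi in m]
    a_hat, a_se = simple_ols(x, m)
    b_hat, b_se = simple_ols(m, y)
    prod = sorted(random.gauss(a_hat, a_se) * random.gauss(b_hat, b_se)
                  for _ in range(draws))
    lo, hi = prod[int(0.025 * draws)], prod[int(0.975 * draws)]
    return lo > 0 or hi < 0

reps = 200
power = sum(mediated_effect_excludes_zero(100) for _ in range(reps)) / reps
print("estimated power:", power)
```

Swapping the diffuse normal approximation for an informative prior (shifting and tightening the per-path posteriors) is what the paper finds raises power, at the cost of prior sensitivity.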
Bayesian Covariate Selection in Mixed-Effects Models For Longitudinal Shape Analysis
Muralidharan, Prasanna; Fishbaugh, James; Kim, Eun Young; Johnson, Hans J.; Paulsen, Jane S.; Gerig, Guido; Fletcher, P. Thomas
2016-01-01
The goal of longitudinal shape analysis is to understand how anatomical shape changes over time, in response to biological processes, including growth, aging, or disease. In many imaging studies, it is also critical to understand how these shape changes are affected by other factors, such as sex, disease diagnosis, IQ, etc. Current approaches to longitudinal shape analysis have focused on modeling age-related shape changes, but have not included the ability to handle covariates. In this paper, we present a novel Bayesian mixed-effects shape model that incorporates simultaneous relationships between longitudinal shape data and multiple predictors or covariates into the model. Moreover, we place an Automatic Relevance Determination (ARD) prior on the parameters, which lets us automatically select which covariates are most relevant to the model based on observed data. We evaluate our proposed model and inference procedure on a longitudinal study of Huntington's disease from PREDICT-HD. We first show the utility of the ARD prior for model selection in a univariate modeling of striatal volume, and next we apply the full high-dimensional longitudinal shape model to putamen shapes. PMID:28090246
A Bayesian network model for predicting pregnancy after in vitro fertilization.
Corani, G; Magli, C; Giusti, A; Gianaroli, L; Gambardella, L M
2013-11-01
We present a Bayesian network model for predicting the outcome of in vitro fertilization (IVF). The problem is characterized by a particular missingness process; we propose a simple but effective averaging approach which improves parameter estimates compared to the traditional MAP estimation. We present results with generated data and the analysis of a real data set. Moreover, we assess by means of a simulation study the effectiveness of the model in supporting the selection of the embryos to be transferred. © 2013 Elsevier Ltd. All rights reserved.
Assessment of parametric uncertainty for groundwater reactive transport modeling,
Shi, Xiaoqing; Ye, Ming; Curtis, Gary P.; Miller, Geoffery L.; Meyer, Philip D.; Kohler, Matthias; Yabusaki, Steve; Wu, Jichun
2014-01-01
The validity of using Gaussian assumptions for model residuals in uncertainty quantification of a groundwater reactive transport model was evaluated in this study. Least squares regression methods explicitly assume Gaussian residuals, and the assumption leads to Gaussian likelihood functions, model parameters, and model predictions. While the Bayesian methods do not explicitly require the Gaussian assumption, Gaussian residuals are widely used. This paper shows that the residuals of the reactive transport model are non-Gaussian, heteroscedastic, and correlated in time; characterizing them requires using a generalized likelihood function such as the formal generalized likelihood function developed by Schoups and Vrugt (2010). For the surface complexation model considered in this study for simulating uranium reactive transport in groundwater, parametric uncertainty is quantified using the least squares regression methods and Bayesian methods with both Gaussian and formal generalized likelihood functions. While the least squares methods and Bayesian methods with Gaussian likelihood function produce similar Gaussian parameter distributions, the parameter distributions of Bayesian uncertainty quantification using the formal generalized likelihood function are non-Gaussian. In addition, predictive performance of formal generalized likelihood function is superior to that of least squares regression and Bayesian methods with Gaussian likelihood function. The Bayesian uncertainty quantification is conducted using the differential evolution adaptive metropolis (DREAM(zs)) algorithm; as a Markov chain Monte Carlo (MCMC) method, it is a robust tool for quantifying uncertainty in groundwater reactive transport models. For the surface complexation model, the regression-based local sensitivity analysis and Morris- and DREAM(ZS)-based global sensitivity analysis yield almost identical ranking of parameter importance. 
The uncertainty analysis may help select appropriate likelihood functions, improve model calibration, and reduce predictive uncertainty in other groundwater reactive transport and environmental modeling.
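The point above about likelihood choice can be made concrete with a minimal sketch. The residual values below are hypothetical, and the Student-t density merely stands in for a heavy-tailed alternative; it is not the Schoups and Vrugt (2010) formal generalized likelihood, which additionally handles heteroscedasticity and temporal correlation. The sketch shows why the Gaussian assumption is fragile: a single outlying residual is penalized far less under a heavy-tailed likelihood.

```python
import numpy as np
from math import lgamma, log, pi

def gaussian_loglik(res, sigma):
    """Gaussian log-likelihood of residuals (the classical least-squares assumption)."""
    res = np.asarray(res, dtype=float)
    return float(np.sum(-0.5 * np.log(2 * pi * sigma**2) - res**2 / (2 * sigma**2)))

def student_t_loglik(res, sigma, nu):
    """Student-t log-likelihood with nu degrees of freedom: a simple
    heavy-tailed alternative (illustrative only, not the generalized
    likelihood of Schoups and Vrugt)."""
    res = np.asarray(res, dtype=float)
    c = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi * sigma**2)
    return float(np.sum(c - (nu + 1) / 2 * np.log(1 + res**2 / (nu * sigma**2))))

# Hypothetical residuals with one large outlier, as might arise from
# non-Gaussian, heteroscedastic model errors.
residuals = [0.10, -0.20, 0.05, 3.00]
g = gaussian_loglik(residuals, sigma=0.3)
s = student_t_loglik(residuals, sigma=0.3, nu=3)
# The single outlier dominates the Gaussian log-likelihood, while the
# heavy-tailed model absorbs it: s is much larger (less negative) than g.
```

Under a misspecified Gaussian likelihood, the posterior is pulled toward whatever parameter values reduce the outlier, which is one mechanism behind the biased parameter distributions reported above.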
Time-varying nonstationary multivariate risk analysis using a dynamic Bayesian copula
NASA Astrophysics Data System (ADS)
Sarhadi, Ali; Burn, Donald H.; Concepción Ausín, María.; Wiper, Michael P.
2016-03-01
A time-varying risk analysis is proposed for an adaptive design framework in nonstationary conditions arising from climate change. A Bayesian, dynamic conditional copula is developed for modeling the time-varying dependence structure between mixed continuous and discrete multiattributes of multidimensional hydrometeorological phenomena. Joint Bayesian inference is carried out to fit the marginals and copula in an illustrative example using an adaptive, Gibbs Markov Chain Monte Carlo (MCMC) sampler. Posterior mean estimates and credible intervals are provided for the model parameters, and the Deviance Information Criterion (DIC) is used to select the model that best captures different forms of nonstationarity over time. This study also introduces a fully Bayesian, time-varying joint return period for multivariate time-dependent risk analysis in nonstationary environments. The results demonstrate that the nature and the risk of extreme-climate multidimensional processes change over time under the impact of climate change, and accordingly long-term decision-making strategies should be updated based on the anomalies of the nonstationary environment.
Meinerz, Kelsey; Beeman, Scott C; Duan, Chong; Bretthorst, G Larry; Garbow, Joel R; Ackerman, Joseph J H
2018-01-01
Recently, a number of MRI protocols have been reported that seek to exploit the effect of dissolved oxygen (O2, paramagnetic) on the longitudinal 1H relaxation of tissue water, thus providing image contrast related to tissue oxygen content. However, tissue water relaxation is dependent on a number of mechanisms, and this raises the issue of how best to model the relaxation data. This problem, the model selection problem, occurs in many branches of science and is optimally addressed by Bayesian probability theory. High signal-to-noise, densely sampled, longitudinal 1H relaxation data were acquired from rat brain in vivo and from a cross-linked bovine serum albumin (xBSA) phantom, a sample that recapitulates the relaxation characteristics of tissue water in vivo. Bayesian-based model selection was applied to a cohort of five competing relaxation models: (i) monoexponential, (ii) stretched-exponential, (iii) biexponential, (iv) Gaussian (normal) R1-distribution, and (v) gamma R1-distribution. Bayesian joint analysis of multiple replicate datasets revealed that water relaxation of both the xBSA phantom and in vivo rat brain was best described by a biexponential model, while xBSA relaxation datasets truncated to remove evidence of the fast relaxation component were best modeled as a stretched exponential. In all cases, estimated model parameters were compared to the commonly used monoexponential model. Reducing the sampling density of the relaxation data and adding Gaussian-distributed noise served to simulate cases in which the data are acquisition-time or signal-to-noise restricted, respectively. As expected, reducing either the number of data points or the signal-to-noise increases the uncertainty in estimated parameters and, ultimately, reduces support for more complex relaxation models.
A Bayesian method for assessing multiscale species-habitat relationships
Stuber, Erica F.; Gruber, Lutz F.; Fontaine, Joseph J.
2017-01-01
Context: Scientists face several theoretical and methodological challenges in appropriately describing fundamental wildlife-habitat relationships in models. The spatial scales of habitat relationships are often unknown, and are expected to follow a multi-scale hierarchy. Typical frequentist or information-theoretic approaches often suffer under collinearity in multi-scale studies, fail to converge when models are complex, or represent an intractable computational burden when candidate model sets are large. Objectives: Our objective was to implement an automated, Bayesian method for inference on the spatial scales of habitat variables that best predict animal abundance. Methods: We introduce Bayesian latent indicator scale selection (BLISS), a Bayesian method to select spatial scales of predictors using latent scale indicator variables that are estimated with reversible-jump Markov chain Monte Carlo sampling. BLISS does not suffer from collinearity, and substantially reduces computation time of studies. We present a simulation study to validate our method and apply our method to a case study of land cover predictors for ring-necked pheasant (Phasianus colchicus) abundance in Nebraska, USA. Results: Our method returns accurate descriptions of the explanatory power of multiple spatial scales, and unbiased and precise parameter estimates under commonly encountered data limitations including spatial scale autocorrelation, effect size, and sample size. BLISS outperforms commonly used model selection methods including stepwise and AIC, and reduces runtime by 90%. Conclusions: Given the pervasiveness of scale-dependency in ecology, and the implications of mismatches between the scales of analyses and ecological processes, identifying the spatial scales over which species are integrating habitat information is an important step in understanding species-habitat relationships.
BLISS is a widely applicable method for identifying important spatial scales, propagating scale uncertainty, and testing hypotheses of scaling relationships.
Model weights and the foundations of multimodel inference
Link, W.A.; Barker, R.J.
2006-01-01
Statistical thinking in wildlife biology and ecology has been profoundly influenced by the introduction of AIC (Akaike's information criterion) as a tool for model selection and as a basis for model averaging. In this paper, we advocate the Bayesian paradigm as a broader framework for multimodel inference, one in which model averaging and model selection are naturally linked, and in which the performance of AIC-based tools is naturally evaluated. Prior model weights implicitly associated with the use of AIC are seen to highly favor complex models: in some cases, all but the most highly parameterized models in the model set are virtually ignored a priori. We suggest the usefulness of the weighted BIC (Bayesian information criterion) as a computationally simple alternative to AIC, based on explicit selection of prior model probabilities rather than acceptance of default priors associated with AIC. We note, however, that both procedures are only approximations to the use of exact Bayes factors. We discuss and illustrate technical difficulties associated with Bayes factors, and suggest approaches to avoiding these difficulties in the context of model selection for a logistic regression. Our example highlights the predisposition of AIC weighting to favor complex models and suggests a need for caution in using the BIC for computing approximate posterior model weights.
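The AIC/BIC weighting discussed above is a simple computation once the information-criterion values are in hand. The sketch below is a generic implementation (not the authors' code), with hypothetical IC values; the same function applies to AIC or BIC, since both use weights proportional to exp(-delta/2).

```python
import numpy as np

def information_criterion_weights(ic_values):
    """Convert AIC or BIC values into normalized model weights.

    delta_i = IC_i - min_j IC_j; weight_i is proportional to exp(-delta_i / 2).
    With BIC, these weights approximate posterior model probabilities
    under the chosen prior model probabilities.
    """
    ic = np.asarray(ic_values, dtype=float)
    delta = ic - ic.min()
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical IC values for three candidate models (smaller is better).
weights = information_criterion_weights([100.0, 102.0, 110.0])
# The best model gets the largest weight; a 10-unit gap leaves the
# third model with essentially no weight.
```

Note that the mapping from IC differences to weights is the same for AIC and BIC; what differs, as the paper emphasizes, is the implicit prior over models that each criterion encodes.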
Posada, David
2006-01-01
ModelTest server is a web-based application for the selection of models of nucleotide substitution using the program ModelTest. The server takes as input a text file with likelihood scores for the set of candidate models. Models can be selected with hierarchical likelihood ratio tests, or with the Akaike or Bayesian information criteria. The output includes several statistics for the assessment of model selection uncertainty, for model averaging or to estimate the relative importance of model parameters. The server can be accessed at . PMID:16845102
Computational statistics using the Bayesian Inference Engine
NASA Astrophysics Data System (ADS)
Weinberg, Martin D.
2013-09-01
This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.
FBST for Cointegration Problems
NASA Astrophysics Data System (ADS)
Diniz, M.; Pereira, C. A. B.; Stern, J. M.
2008-11-01
In order to estimate causal relations, time series econometrics has to be aware of spurious correlation, a problem first mentioned by Yule [21]. To solve the problem, one can work with differenced series or use multivariate models like VAR or VEC models. In this case, the analysed series will present a long-run relation, i.e., a cointegration relation. Even though the Bayesian literature about inference on VAR/VEC models is quite advanced, Bauwens et al. [2] highlight that "the topic of selecting the cointegrating rank has not yet given very useful and convincing results." This paper presents the Full Bayesian Significance Test applied to cointegration rank selection tests in multivariate (VAR/VEC) time series models and shows how to implement it using both data sets available in the literature and simulated data sets. A standard non-informative prior is assumed.
Impact of petrophysical uncertainty on Bayesian hydrogeophysical inversion and model selection
NASA Astrophysics Data System (ADS)
Brunetti, Carlotta; Linde, Niklas
2018-01-01
Quantitative hydrogeophysical studies rely heavily on petrophysical relationships that link geophysical properties to hydrogeological properties and state variables. Coupled inversion studies are frequently based on the questionable assumption that these relationships are perfect (i.e., no scatter). Using synthetic examples and crosshole ground-penetrating radar (GPR) data from the South Oyster Bacterial Transport Site in Virginia, USA, we investigate the impact of spatially-correlated petrophysical uncertainty on inferred posterior porosity and hydraulic conductivity distributions and on Bayes factors used in Bayesian model selection. Our study shows that accounting for petrophysical uncertainty in the inversion (I) decreases bias of the inferred variance of hydrogeological subsurface properties, (II) provides more realistic uncertainty assessment and (III) reduces the overconfidence in the ability of geophysical data to falsify conceptual hydrogeological models.
The discounting model selector: Statistical software for delay discounting applications.
Gilroy, Shawn P; Franck, Christopher T; Hantula, Donald A
2017-05-01
Original, open-source computer software was developed and validated against established delay discounting methods in the literature. The software executed approximate Bayesian model selection methods from user-supplied temporal discounting data and computed the effective delay 50 (ED50) from the best performing model. Software was custom-designed to enable behavior analysts to conveniently apply recent statistical methods to temporal discounting data with the aid of a graphical user interface (GUI). The results of independent validation of the approximate Bayesian model selection methods indicated that the program provided results identical to those of the original source paper and its methods. Monte Carlo simulation (n = 50,000) confirmed that the true model was selected most often in each setting. Simulation code and data for this study were posted to an online repository for use by other researchers. The model selection approach was applied to three existing delay discounting data sets from the literature in addition to the data from the source paper. Comparisons of model-selected ED50 were consistent with traditional indices of discounting. Conceptual issues related to the development and use of computer software by behavior analysts and the opportunities afforded by free and open-source software are discussed, and a review of possible expansions of this software is provided. © 2017 Society for the Experimental Analysis of Behavior.
A bayesian hierarchical model for classification with selection of functional predictors.
Zhu, Hongxiao; Vannucci, Marina; Cox, Dennis D
2010-06-01
In functional data classification, functional observations are often contaminated by various systematic effects, such as random batch effects caused by device artifacts, or fixed effects caused by sample-related factors. These effects may lead to classification bias and thus should not be neglected. Another issue of concern is the selection of functions when predictors consist of multiple functions, some of which may be redundant. The above issues arise in a real data application where we use fluorescence spectroscopy to detect cervical precancer. In this article, we propose a Bayesian hierarchical model that takes into account random batch effects and selects effective functions among multiple functional predictors. Fixed effects or predictors in nonfunctional form are also included in the model. The dimension of the functional data is reduced through orthonormal basis expansion or functional principal components. For posterior sampling, we use a hybrid Metropolis-Hastings/Gibbs sampler, which suffers from slow mixing. An evolutionary Monte Carlo algorithm is applied to improve the mixing. Simulation and real data application show that the proposed model provides accurate selection of functional predictors as well as good classification.
Bayesian GGE biplot models applied to maize multi-environments trials.
de Oliveira, L A; da Silva, C P; Nuvunga, J J; da Silva, A Q; Balestre, M
2016-06-17
The additive main effects and multiplicative interaction (AMMI) and the genotype main effects and genotype x environment interaction (GGE) models stand out among the linear-bilinear models used in genotype x environment interaction studies. Despite the advantages of their use to describe genotype x environment (AMMI) or genotype and genotype x environment (GGE) interactions, these methods have known limitations that are inherent to fixed effects models, including difficulty in treating variance heterogeneity and missing data. Traditional biplots include no measure of uncertainty regarding the principal components. The present study aimed to apply the Bayesian approach to GGE biplot models and assess the implications for selecting stable and adapted genotypes. Our results demonstrated that the Bayesian approach applied to GGE models with non-informative priors was consistent with the traditional GGE biplot analysis, although the credible region incorporated into the biplot enabled distinguishing, based on probability, the performance of genotypes, and their relationships with the environments in the biplot. Those regions also enabled the identification of groups of genotypes and environments with similar effects in terms of adaptability and stability. The relative position of genotypes and environments in biplots is highly affected by the experimental accuracy. Thus, incorporation of uncertainty in biplots is a key tool for breeders to make decisions regarding stability selection and adaptability and the definition of mega-environments.
NASA Astrophysics Data System (ADS)
Elshall, A. S.; Ye, M.; Niu, G. Y.; Barron-Gafford, G.
2015-12-01
Models in biogeoscience involve uncertainties in observation data, model inputs, model structure, model processes and modeling scenarios. To accommodate different sources of uncertainty, multimodel analysis approaches such as model combination, model selection, model elimination or model discrimination are becoming more popular. To illustrate theoretical and practical challenges of multimodel analysis, we use an example about microbial soil respiration modeling. Global soil respiration releases more than ten times more carbon dioxide to the atmosphere than all anthropogenic emissions. Thus, improving our understanding of microbial soil respiration is essential for improving climate change models. This study focuses on a poorly understood phenomenon: soil microbial respiration pulses in response to episodic rainfall pulses (the "Birch effect"). We hypothesize that the "Birch effect" is generated by three mechanisms. To test our hypothesis, we developed and assessed five evolving microbial-enzyme models against field measurements from a semiarid savanna that is characterized by pulsed precipitation. These five models evolve step-wise such that the first model includes none of the three mechanisms, while the fifth model includes all three. The basic component of Bayesian multimodel analysis is the estimation of marginal likelihood to rank the candidate models based on their overall likelihood with respect to observation data. The first part of the study focuses on using this Bayesian scheme to discriminate between these five candidate models. The second part discusses some theoretical and practical challenges, which are mainly the effect of likelihood function selection and the marginal likelihood estimation methods on both model ranking and Bayesian model averaging.
The study shows that making valid inference from scientific data is not a trivial task, since we are not only uncertain about the candidate scientific models, but also about the statistical methods that are used to discriminate between these models.
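Once log marginal likelihoods have been estimated, ranking the candidate models reduces to a short computation. The sketch below is generic (not the authors' code, and the values are hypothetical); it also illustrates the polarized behavior Yang and Zhu describe: even a modest gap in log marginal likelihood drives posterior model probabilities to essentially 0 or 1.

```python
import numpy as np

def posterior_model_probs(log_ml, prior=None):
    """Posterior model probabilities from log marginal likelihoods.

    P(M_k | data) is proportional to p(data | M_k) * P(M_k); the max is
    subtracted before exponentiating for numerical stability.
    """
    log_ml = np.asarray(log_ml, dtype=float)
    k = len(log_ml)
    prior = np.full(k, 1.0 / k) if prior is None else np.asarray(prior, dtype=float)
    logp = log_ml + np.log(prior)
    logp -= logp.max()
    p = np.exp(logp)
    return p / p.sum()

# Two equally supported models split the probability evenly...
p_tie = posterior_model_probs([-100.0, -100.0])
# ...but a 20-unit gap in log marginal likelihood yields near-certainty
# for the first model, even if both models are wrong.
p_gap = posterior_model_probs([-100.0, -120.0])
```

This sensitivity is why the choice of likelihood function and of marginal likelihood estimator, discussed in the abstract above, can flip the model ranking entirely.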
Bayesian Item Selection in Constrained Adaptive Testing Using Shadow Tests
ERIC Educational Resources Information Center
Veldkamp, Bernard P.
2010-01-01
Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item…
Yu, Rongjie; Abdel-Aty, Mohamed
2013-07-01
The Bayesian inference method has been frequently adopted to develop safety performance functions. One advantage of Bayesian inference is that prior information for the independent variables can be included in the inference procedures. However, there are few studies that discuss how to formulate informative priors for the independent variables and evaluate the effects of incorporating informative priors in developing safety performance functions. This paper addresses this deficiency by introducing four approaches for developing informative priors for the independent variables based on historical data and expert experience. Merits of these informative priors have been tested along with two types of Bayesian hierarchical models (Poisson-gamma and Poisson-lognormal models). Deviance information criterion (DIC), R-square values, and coefficients of variance for the estimations were utilized as evaluation measures to select the best model(s). Comparison across the models indicated that the Poisson-gamma model is superior, with a better model fit, and is much more robust with the informative priors. Moreover, the two-stage Bayesian updating informative priors provided the best goodness-of-fit and coefficient estimation accuracies. Furthermore, informative priors for the inverse dispersion parameter have also been introduced and tested. The effects of different types of informative priors on model estimation and goodness-of-fit have been compared and summarized. Finally, based on the results, recommendations for future research topics and study applications have been made. Copyright © 2013 Elsevier Ltd. All rights reserved.
Optimal speech motor control and token-to-token variability: a Bayesian modeling approach.
Patri, Jean-François; Diard, Julien; Perrier, Pascal
2015-12-01
The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.
Comparing models for perfluorooctanoic acid pharmacokinetics using Bayesian analysis
Selecting the appropriate pharmacokinetic (PK) model given the available data is investigated for perfluorooctanoic acid (PFOA), which has been widely analyzed with an empirical, one-compartment model. This research examined the results of experiments [Kemper R. A., DuPont Haskel...
On the predictive information criteria for model determination in seismic hazard analysis
NASA Astrophysics Data System (ADS)
Varini, Elisa; Rotondi, Renata
2016-04-01
Many statistical tools have been developed for evaluating, understanding, and comparing models, from both frequentist and Bayesian perspectives. In particular, the problem of model selection can be addressed according to whether the primary goal is explanation or, alternatively, prediction. In the former case, the criteria for model selection are defined over the parameter space whose physical interpretation can be difficult; in the latter case, they are defined over the space of the observations, which has a more direct physical meaning. In the frequentist approaches, model selection is generally based on an asymptotic approximation which may be poor for small data sets (e.g. the F-test, the Kolmogorov-Smirnov test, etc.); moreover, these methods often apply under specific assumptions on models (e.g. models have to be nested in the likelihood ratio test). In the Bayesian context, among the criteria for explanation, the ratio of the observed marginal densities for two competing models, named Bayes Factor (BF), is commonly used for both model choice and model averaging (Kass and Raftery, J. Am. Stat. Ass., 1995). But BF does not apply to improper priors and, even when the prior is proper, it is not robust to the specification of the prior. These limitations can be extended to two famous penalized likelihood methods as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), since they are proved to be approximations of -2log BF . In the perspective that a model is as good as its predictions, the predictive information criteria aim at evaluating the predictive accuracy of Bayesian models or, in other words, at estimating expected out-of-sample prediction error using a bias-correction adjustment of within-sample error (Gelman et al., Stat. Comput., 2014). 
In particular, the Watanabe criterion is fully Bayesian because it averages the predictive distribution over the posterior distribution of parameters rather than conditioning on a point estimate, but it is hardly applicable to data that are not independent given the parameters (Watanabe, J. Mach. Learn. Res., 2010). A solution is given by the Ando and Tsay criterion, where the joint density may be decomposed into the product of the conditional densities (Ando and Tsay, Int. J. Forecast., 2010). The above-mentioned criteria are global summary measures of model performance, but more detailed analysis could be required to discover the reasons for poor global performance. In this latter case, a retrospective predictive analysis is performed on each individual observation. In this study we performed the Bayesian analysis of Italian data sets by four versions of a long-term hazard model known as the stress release model (Vere-Jones, J. Physics Earth, 1978; Bebbington and Harte, Geophys. J. Int., 2003; Varini and Rotondi, Environ. Ecol. Stat., 2015). We then illustrate their performance as evaluated by the Bayes Factor, predictive information criteria, and retrospective predictive analysis.
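As a concrete sketch of the Watanabe criterion (WAIC) mentioned above: given an S x N matrix of pointwise log-likelihoods evaluated at S posterior draws for N observations, WAIC combines the log pointwise predictive density with a variance-based effective-parameter penalty. This is a generic implementation, not the authors' code.

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (S draws x N observations) pointwise log-likelihood matrix.

    lppd_i = log( mean_s exp(log_lik[s, i]) )   # log pointwise predictive density
    p_waic = sum_i var_s( log_lik[:, i] )       # effective number of parameters
    WAIC   = -2 * ( sum_i lppd_i - p_waic )     # deviance scale, lower is better
    """
    log_lik = np.asarray(log_lik, dtype=float)
    # log-sum-exp for numerical stability when averaging likelihoods
    m = log_lik.max(axis=0)
    lppd = m + np.log(np.exp(log_lik - m).mean(axis=0))
    p_waic = log_lik.var(axis=0, ddof=1)
    return -2.0 * (lppd.sum() - p_waic.sum())

# Degenerate check: with identical draws the penalty vanishes and
# WAIC reduces to -2 * sum_i lppd_i.
example = waic(np.full((4, 3), -1.0))
```

Because WAIC averages over posterior draws per observation, it requires the pointwise independence assumption flagged above; for dependent data the joint density must first be factored into conditionals, as in the Ando and Tsay criterion.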
Modeling Dynamic Contrast-Enhanced MRI Data with a Constrained Local AIF.
Duan, Chong; Kallehauge, Jesper F; Pérez-Torres, Carlos J; Bretthorst, G Larry; Beeman, Scott C; Tanderup, Kari; Ackerman, Joseph J H; Garbow, Joel R
2018-02-01
This study aims to develop a constrained local arterial input function (cL-AIF) to improve quantitative analysis of dynamic contrast-enhanced (DCE)-magnetic resonance imaging (MRI) data by accounting for the contrast-agent bolus amplitude error in the voxel-specific AIF. Bayesian probability theory-based parameter estimation and model selection were used to compare tracer kinetic modeling employing either the measured remote-AIF (R-AIF, i.e., the traditional approach) or an inferred cL-AIF against both in silico DCE-MRI data and clinical, cervical cancer DCE-MRI data. When the data model included the cL-AIF, tracer kinetic parameters were correctly estimated from in silico data under contrast-to-noise conditions typical of clinical DCE-MRI experiments. Considering the clinical cervical cancer data, Bayesian model selection was performed for all tumor voxels of the 16 patients (35,602 voxels in total). Among those voxels, a tracer kinetic model that employed the voxel-specific cL-AIF was preferred (i.e., had a higher posterior probability) in 80% of the voxels compared to the direct use of a single R-AIF. Maps of spatial variation in voxel-specific AIF bolus amplitude and arrival time for heterogeneous tissues, such as cervical cancer, are accessible with the cL-AIF approach. The cL-AIF method, which estimates unique local-AIF amplitude and arrival time for each voxel within the tissue of interest, provides better modeling of DCE-MRI data than the use of a single, measured R-AIF. The Bayesian-based data analysis described herein affords estimates of uncertainties for each model parameter, via posterior probability density functions, and voxel-wise comparison across methods/models, via model selection in data modeling.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farrell, Kathryn, E-mail: kfarrell@ices.utexas.edu; Oden, J. Tinsley, E-mail: oden@ices.utexas.edu; Faghihi, Danial, E-mail: danial@ices.utexas.edu
A general adaptive modeling algorithm for selection and validation of coarse-grained models of atomistic systems is presented. A Bayesian framework is developed to address uncertainties in parameters, data, and model selection. Algorithms for computing output sensitivities to parameter variances, model evidence and posterior model plausibilities for given data, and for computing what are referred to as Occam Categories in reference to a rough measure of model simplicity, make up components of the overall approach. Computational results are provided for representative applications.
BayeSED: A General Approach to Fitting the Spectral Energy Distribution of Galaxies
NASA Astrophysics Data System (ADS)
Han, Yunkun; Han, Zhanwen
2014-11-01
We present a newly developed version of BayeSED, a general Bayesian approach to the spectral energy distribution (SED) fitting of galaxies. The new BayeSED code has been systematically tested on a mock sample of galaxies. The comparison between the estimated and input values of the parameters shows that BayeSED can recover the physical parameters of galaxies reasonably well. We then applied BayeSED to interpret the SEDs of a large Ks -selected sample of galaxies in the COSMOS/UltraVISTA field with stellar population synthesis models. Using the new BayeSED code, a Bayesian model comparison of stellar population synthesis models has been performed for the first time. We found that the 2003 model by Bruzual & Charlot, statistically speaking, has greater Bayesian evidence than the 2005 model by Maraston for the Ks -selected sample. In addition, while setting the stellar metallicity as a free parameter obviously increases the Bayesian evidence of both models, varying the initial mass function has a notable effect only on the Maraston model. Meanwhile, the physical parameters estimated with BayeSED are found to be generally consistent with those obtained using the popular grid-based FAST code, while the former parameters exhibit more natural distributions. Based on the estimated physical parameters of the galaxies in the sample, we qualitatively classified the galaxies in the sample into five populations that may represent galaxies at different evolution stages or in different environments. We conclude that BayeSED could be a reliable and powerful tool for investigating the formation and evolution of galaxies from the rich multi-wavelength observations currently available. A binary version of the BayeSED code parallelized with Message Passing Interface is publicly available at https://bitbucket.org/hanyk/bayesed.
Profile-Based LC-MS Data Alignment—A Bayesian Approach
Tsai, Tsung-Heng; Tadesse, Mahlet G.; Wang, Yue; Ressom, Habtom W.
2014-01-01
A Bayesian alignment model (BAM) is proposed for alignment of liquid chromatography-mass spectrometry (LC-MS) data. BAM belongs to the category of profile-based approaches, which are composed of two major components: a prototype function and a set of mapping functions. Appropriate estimation of these functions is crucial for good alignment results. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler and 2) an adaptive selection of knots. A block Metropolis-Hastings algorithm that mitigates the problem of the MCMC sampler getting stuck at local modes of the posterior distribution is used for the update of the mapping function coefficients. In addition, a stochastic search variable selection (SSVS) methodology is used to determine the number and positions of knots. We applied BAM to a simulated data set, an LC-MS proteomic data set, and two LC-MS metabolomic data sets, and compared its performance with the Bayesian hierarchical curve registration (BHCR) model, the dynamic time-warping (DTW) model, and the continuous profile model (CPM). The advantage of applying appropriate profile-based retention time correction prior to performing a feature-based approach is also demonstrated through the metabolomic data sets. PMID:23929872
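The block-update idea described above can be sketched with a toy block Metropolis-Hastings sampler: the whole coefficient vector is proposed and accepted or rejected jointly. This is an illustrative assumption-laden sketch on a bivariate normal target, not the BAM implementation; the names `block_mh` and `log_post` are hypothetical:

```python
import numpy as np

def block_mh(log_post, init, n_iter=5000, scale=0.5, seed=0):
    """Minimal block Metropolis-Hastings: propose an update for the whole
    parameter block at once with a symmetric Gaussian random walk, then
    accept or reject the block jointly."""
    rng = np.random.default_rng(seed)
    x = np.asarray(init, dtype=float)
    lp = log_post(x)
    samples = []
    for _ in range(n_iter):
        prop = x + scale * rng.standard_normal(x.shape)  # joint proposal
        lp_prop = log_post(prop)
        # symmetric proposal, so the MH ratio is just the posterior ratio
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x.copy())
    return np.array(samples)

# toy target: standard bivariate normal log-density
chain = block_mh(lambda v: -0.5 * v @ v, init=np.zeros(2))
```

Updating the block jointly is what mitigates the local-mode trapping mentioned in the abstract: coordinates that are strongly correlated a posteriori move together instead of one at a time.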
Bayesian Inference of High-Dimensional Dynamical Ocean Models
NASA Astrophysics Data System (ADS)
Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.
2015-12-01
This presentation addresses a holistic set of challenges in high-dimensional ocean Bayesian nonlinear estimation: i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); ii) assimilate data using Bayes' law with these pdfs; iii) predict the future data that optimally reduce uncertainties; and iv) rank the known and learn the new model formulations themselves. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.
Predicting Football Matches Results using Bayesian Networks for English Premier League (EPL)
NASA Astrophysics Data System (ADS)
Razali, Nazim; Mustapha, Aida; Yatim, Faiz Ahmad; Aziz, Ruhaya Ab
2017-08-01
The issue of modeling association football prediction has become increasingly popular in the last few years, and many different prediction models have been proposed with the aim of evaluating the attributes that lead a football team to lose, draw, or win a match. Three types of approaches have been considered for predicting football match results: statistical approaches, machine learning approaches, and Bayesian approaches. Lately, many studies on football prediction models have been produced using Bayesian approaches. This paper proposes a Bayesian Network (BN) to predict the results of football matches in terms of home win (H), away win (A), and draw (D). The English Premier League (EPL) for the three seasons of 2010-2011, 2011-2012 and 2012-2013 has been selected and reviewed. K-fold cross validation has been used for testing the accuracy of the prediction model. The required football data are sourced from a legitimate site at http://www.football-data.co.uk. The BN achieved a predictive accuracy of 75.09% on average across the three seasons. It is hoped that these results can serve as benchmark output for future research in predicting football match results.
NASA Astrophysics Data System (ADS)
Cucchi, K.; Kawa, N.; Hesse, F.; Rubin, Y.
2017-12-01
In order to reduce uncertainty in the prediction of subsurface flow and transport processes, practitioners should use all data available. However, classic inverse modeling frameworks typically only make use of information contained in in-situ field measurements to provide estimates of hydrogeological parameters. Such hydrogeological information about an aquifer is difficult and costly to acquire. In this data-scarce context, the transfer of ex-situ information coming from previously investigated sites can be critical for improving predictions by better constraining the estimation procedure. Bayesian inverse modeling provides a coherent framework to represent such ex-situ information via the prior distribution and to combine it with in-situ information from the target site. In this study, we present an innovative data-driven approach for defining such informative priors for hydrogeological parameters at the target site. Our approach consists of two steps, both relying on statistical and machine learning methods. The first step is data selection: it consists of selecting sites similar to the target site, using clustering methods based on observable hydrogeological features. The second step is data assimilation: it consists of assimilating data from the selected similar sites into the informative prior. We use a Bayesian hierarchical model to account for inter-site variability and to allow for the assimilation of multiple types of site-specific data. We present the application and validation of these methods on an established database of hydrogeological parameters. Data and methods are implemented in the form of an open-source R package and therefore facilitate easy use by other practitioners.
Wu, Wei Mo; Wang, Jia Qiang; Cao, Qi; Wu, Jia Ping
2017-02-01
Accurate prediction of soil organic carbon (SOC) distribution is crucial for soil resource utilization and conservation, climate change adaptation, and ecosystem health. In this study, we selected a 1300 m × 1700 m solonchak sampling area in the northern Tarim Basin, Xinjiang, China, and collected a total of 144 soil samples (5-10 cm). The objectives of this study were to build a Bayesian geostatistical model to predict SOC content, and to assess its performance by comparison with three other geostatistical approaches [ordinary kriging (OK), sequential Gaussian simulation (SGS), and inverse distance weighting (IDW)]. In the study area, soil organic carbon contents ranged from 1.59 to 9.30 g·kg⁻¹, with a mean of 4.36 g·kg⁻¹ and a standard deviation of 1.62 g·kg⁻¹. The sample semivariogram was best fitted by an exponential model with a nugget-to-sill ratio of 0.57. Using the Bayesian geostatistical approach, we generated the SOC content map and obtained the prediction variance and the upper and lower 95% bounds of SOC content, which were then used to evaluate prediction uncertainty. The Bayesian geostatistical approach performed better than OK, SGS, and IDW, demonstrating its advantages for SOC prediction.
Sa-Ngamuang, Chaitawat; Haddawy, Peter; Luvira, Viravarn; Piyaphanee, Watcharapong; Iamsirithaworn, Sopon; Lawpoolsri, Saranath
2018-06-18
Differentiating dengue patients from other acute febrile illness patients is a great challenge among physicians. Several dengue diagnosis methods are recommended by WHO. The application of specific laboratory tests is still limited due to high cost, lack of equipment, and uncertain validity. Therefore, clinical diagnosis remains a common practice especially in resource-limited settings. Bayesian networks have been shown to be a useful tool for diagnostic decision support. This study aimed to construct Bayesian network models using basic demographic, clinical, and laboratory profiles of acute febrile illness patients to diagnose dengue. Data of 397 acute undifferentiated febrile illness patients who visited the fever clinic of the Bangkok Hospital for Tropical Diseases, Thailand, were used for model construction and validation. The two best final models were selected: one with and one without NS1 rapid test result. The diagnostic accuracy of the models was compared with that of physicians on the same set of patients. The Bayesian network models provided good diagnostic accuracy of dengue infection, with ROC AUC of 0.80 and 0.75 for models with and without NS1 rapid test result, respectively. The models had approximately 80% specificity and 70% sensitivity, similar to the diagnostic accuracy of the hospital's fellows in infectious disease. Including information on NS1 rapid test improved the specificity, but reduced the sensitivity, both in model and physician diagnoses. The Bayesian network model developed in this study could be useful to assist physicians in diagnosing dengue, particularly in regions where experienced physicians and laboratory confirmation tests are limited.
A combined Fuzzy and Naive Bayesian strategy can be used to assign event codes to injury narratives.
Marucci-Wellman, H; Lehto, M; Corns, H
2011-12-01
Bayesian methods show promise for classifying injury narratives from large administrative datasets into cause groups. This study examined a combined approach where two Bayesian models (Fuzzy and Naïve) were used to either classify a narrative or select it for manual review. Injury narratives were extracted from claims filed with a workers' compensation insurance provider between January 2002 and December 2004. Narratives were separated into a training set (n=11,000) and a prediction set (n=3,000). Expert coders assigned two-digit Bureau of Labor Statistics Occupational Injury and Illness Classification event codes to each narrative. Fuzzy and Naïve Bayesian models were developed using manually classified cases in the training set. Two semi-automatic machine coding strategies were evaluated. The first strategy assigned cases for manual review if the Fuzzy and Naïve models disagreed on the classification. The second strategy selected additional cases for manual review from the Agree dataset using prediction strength, to reach a level of 50% computer coding and 50% manual coding. When agreement alone was used as the filtering strategy, the majority were coded by the computer (n=1,928, 64%), leaving 36% for manual review. The overall combined (human plus computer) sensitivity was 0.90, and the positive predictive value (PPV) was >0.90 for 11 of 18 two-digit event categories. Implementing the second strategy improved results, with an overall sensitivity of 0.95 and PPV >0.90 for 17 of 18 categories. A combined Naïve-Fuzzy Bayesian approach can classify some narratives with high accuracy and identify others most beneficial for manual review, reducing the burden on human coders.
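The agreement-based triage described above can be sketched in a few lines: narratives where the two models agree are auto-coded, while disagreements are routed to human coders. This is a hypothetical interface, not the study's code; `model_a` and `model_b` stand in for the trained Fuzzy and Naïve Bayesian classifiers:

```python
def triage(narratives, model_a, model_b):
    """Route each narrative: auto-code it when the two classifiers agree
    on the event code, otherwise flag it for expert manual review.
    model_a / model_b are any callables mapping a narrative string to a
    two-digit event code."""
    auto, manual = [], []
    for text in narratives:
        code_a, code_b = model_a(text), model_b(text)
        if code_a == code_b:
            auto.append((text, code_a))   # computer-coded (Agree set)
        else:
            manual.append(text)           # flagged for human review
    return auto, manual
```

The second strategy in the abstract refines this by also pulling low-confidence members of the Agree set into the manual queue, trading some automation for accuracy.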
Cruz-Ramírez, Nicandro; Acosta-Mesa, Héctor Gabriel; Mezura-Montes, Efrén; Guerra-Hernández, Alejandro; Hoyos-Rivera, Guillermo de Jesús; Barrientos-Martínez, Rocío Erandi; Gutiérrez-Fragoso, Karina; Nava-Fernández, Luis Alonso; González-Gaspar, Patricia; Novoa-del-Toro, Elva María; Aguilera-Rueda, Vicente Josué; Ameca-Alducin, María Yaneli
2014-01-01
The bias-variance dilemma is a well-known and important problem in Machine Learning. It basically relates the generalization capability (goodness of fit) of a learning method to its corresponding complexity. When we have enough data at hand, it is possible to use these data in such a way as to minimize overfitting (the risk of selecting a complex model that generalizes poorly). Unfortunately, there are many situations where we simply do not have this required amount of data. Thus, we need to find methods capable of efficiently exploiting the available data while avoiding overfitting. Different metrics have been proposed to achieve this goal: the Minimum Description Length principle (MDL), Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC), among others. In this paper, we focus on crude MDL and empirically evaluate its performance in selecting models with a good balance between goodness of fit and complexity: the so-called bias-variance dilemma, decomposition or tradeoff. Although the graphical interaction between these dimensions (bias and variance) is ubiquitous in the Machine Learning literature, few works present experimental evidence to recover such interaction. In our experiments, we argue that the resulting graphs allow us to gain insights that are difficult to unveil otherwise: that crude MDL naturally selects balanced models in terms of bias-variance, which need not necessarily be the gold-standard ones. We carry out these experiments using a specific model: a Bayesian network. In spite of these motivating results, we should also not overlook three other components that may significantly affect the final model selection: the search procedure, the noise rate and the sample size.
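The penalized-likelihood criteria mentioned above differ only in how they penalize complexity; a minimal sketch of the two formulas, where `log_likelihood` is the maximized log-likelihood, `k` the number of free parameters, and `n` the sample size:

```python
import numpy as np

def aic(log_likelihood, k):
    """Akaike's Information Criterion: 2k - 2*ln(L). Lower is better."""
    return 2.0 * k - 2.0 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L). Lower is better;
    the ln(n) factor penalizes extra parameters more harshly than AIC's
    constant 2 once n exceeds about 8 observations."""
    return k * np.log(n) - 2.0 * log_likelihood
```

Because BIC's penalty grows with the sample size, it tends to prefer simpler models than AIC on large datasets, which is the flavor of parsimony pressure the paper contrasts with crude MDL.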
Furtado-Junior, I; Abrunhosa, F A; Holanda, F C A F; Tavares, M C S
2016-06-01
Fishing selectivity of the mangrove crab Ucides cordatus on the north coast of Brazil can be defined as the fisherman's ability to capture and select individuals of a certain size or sex (or a combination of these factors), which suggests an empirical selectivity. Considering this hypothesis, we calculated selectivity curves for male and female crabs using the logit link of the logistic model. The Bayesian inference consisted of obtaining the posterior distribution by applying the Markov chain Monte Carlo (MCMC) method in the R software using the OpenBUGS, BRugs, and R2WinBUGS libraries. The estimated mean carapace widths at selection for males and females, compared with previous studies reporting the mean carapace width at sexual maturity, allow us to confirm the hypothesis that most mature individuals do not suffer from fishing pressure, thus ensuring their sustainability.
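A logistic selectivity curve of the kind fitted above can be sketched directly; the parameter values below are illustrative assumptions, not the paper's MCMC estimates:

```python
import numpy as np

def selectivity(width, a, b):
    """Logistic selectivity curve: probability that a crab of the given
    carapace width is retained by the fishery, via the logit link
    s(w) = 1 / (1 + exp(-(a + b*w)))."""
    return 1.0 / (1.0 + np.exp(-(a + b * width)))

def width_at_50(a, b):
    """Carapace width at 50% selection (L50), where a + b*w = 0."""
    return -a / b
```

Comparing the estimated L50 with the width at sexual maturity is what lets the authors judge whether mature crabs escape fishing pressure.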
Bayesian Model Averaging of Artificial Intelligence Models for Hydraulic Conductivity Estimation
NASA Astrophysics Data System (ADS)
Nadiri, A.; Chitsazan, N.; Tsai, F. T.; Asghari Moghaddam, A.
2012-12-01
This research presents a Bayesian artificial intelligence model averaging (BAIMA) method that incorporates multiple artificial intelligence (AI) models to estimate hydraulic conductivity and evaluate estimation uncertainties. Uncertainty in the AI model outputs stems from error in model input as well as non-uniqueness in selecting different AI methods. Using one single AI model tends to bias the estimation and underestimate uncertainty. BAIMA employs Bayesian model averaging (BMA) technique to address the issue of using one single AI model for estimation. BAIMA estimates hydraulic conductivity by averaging the outputs of AI models according to their model weights. In this study, the model weights were determined using the Bayesian information criterion (BIC) that follows the parsimony principle. BAIMA calculates the within-model variances to account for uncertainty propagation from input data to AI model output. Between-model variances are evaluated to account for uncertainty due to model non-uniqueness. We employed Takagi-Sugeno fuzzy logic (TS-FL), artificial neural network (ANN) and neurofuzzy (NF) to estimate hydraulic conductivity for the Tasuj plain aquifer, Iran. BAIMA combined three AI models and produced better fitting than individual models. While NF was expected to be the best AI model owing to its utilization of both TS-FL and ANN models, the NF model is nearly discarded by the parsimony principle. The TS-FL model and the ANN model showed equal importance although their hydraulic conductivity estimates were quite different. This resulted in significant between-model variances that are normally ignored by using one AI model.
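The BIC-based weighting and the within/between variance decomposition described above can be sketched as follows; this is a generic BMA illustration under stated assumptions (BIC-derived weights, known per-model means and variances), not the BAIMA code:

```python
import numpy as np

def bma_weights(bics):
    """BIC-based model weights: w_i proportional to exp(-ΔBIC_i / 2),
    where ΔBIC_i is measured from the best (smallest) BIC."""
    d = np.asarray(bics, dtype=float) - np.min(bics)
    w = np.exp(-0.5 * d)
    return w / w.sum()

def bma_combine(means, variances, bics):
    """BMA point estimate and total variance. Total variance is the
    within-model variance (uncertainty propagated through each model)
    plus the between-model variance (spread of the model estimates)."""
    w = bma_weights(bics)
    means = np.asarray(means, dtype=float)
    mean = np.sum(w * means)
    within = np.sum(w * np.asarray(variances, dtype=float))
    between = np.sum(w * (means - mean) ** 2)
    return mean, within + between
```

The between-model term is exactly what a single-model analysis ignores: if the TS-FL and ANN estimates differ sharply while carrying equal weight, that disagreement surfaces as additional variance.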
NASA Astrophysics Data System (ADS)
Perkins, S. J.; Marais, P. C.; Zwart, J. T. L.; Natarajan, I.; Tasse, C.; Smirnov, O.
2015-09-01
We present Montblanc, a GPU implementation of the Radio interferometer measurement equation (RIME) in support of the Bayesian inference for radio observations (BIRO) technique. BIRO uses Bayesian inference to select sky models that best match the visibilities observed by a radio interferometer. To accomplish this, BIRO evaluates the RIME multiple times, varying sky model parameters to produce multiple model visibilities. χ2 values computed from the model and observed visibilities are used as likelihood values to drive the Bayesian sampling process and select the best sky model. As most of the elements of the RIME and χ2 calculation are independent of one another, they are highly amenable to parallel computation. Additionally, Montblanc caters for iterative RIME evaluation to produce multiple χ2 values. Modified model parameters are transferred to the GPU between each iteration. We implemented Montblanc as a Python package based upon NVIDIA's CUDA architecture. As such, it is easy to extend and implement different pipelines. At present, Montblanc supports point and Gaussian morphologies, but is designed for easy addition of new source profiles. Montblanc's RIME implementation is performant: on an NVIDIA K40, it is approximately 250 times faster than MEQTREES on a dual hexacore Intel E5-2620v2 CPU. Compared to the OSKAR simulator's GPU-implemented RIME components, it is 7.7 and 12 times faster on the same K40 for single- and double-precision floating point, respectively. However, OSKAR's RIME implementation is more general than Montblanc's BIRO-tailored RIME. Theoretical analysis of Montblanc's dominant CUDA kernel suggests that it is memory bound. In practice, profiling shows that it is balanced between compute and memory, as much of the data required by the problem is retained in L1 and L2 caches.
NASA Astrophysics Data System (ADS)
Hernández-López, Mario R.; Romero-Cuéllar, Jonathan; Camilo Múnera-Estrada, Juan; Coccia, Gabriele; Francés, Félix
2017-04-01
It is noticeably important to emphasize the role of uncertainty, particularly when model forecasts are used to support decision-making and water management. This research compares two approaches for the evaluation of predictive uncertainty in hydrological modeling. The first approach is the Bayesian Joint Inference of hydrological and error models. The second approach is carried out through the Model Conditional Processor using the Truncated Normal Distribution in the transformed space. This comparison is focused on the reliability of the predictive distribution. The case study is applied to two basins included in the Model Parameter Estimation Experiment (MOPEX). These two basins, which have different hydrological complexity, are the French Broad River (North Carolina) and the Guadalupe River (Texas). The results indicate that, generally, both approaches are able to provide similar predictive performances. However, differences between them can arise in basins with complex hydrology (e.g., ephemeral basins). This is because the results obtained with Bayesian Joint Inference are strongly dependent on the suitability of the hypothesized error model. Similarly, the results from the Model Conditional Processor are mainly influenced by the selected model of the tails, or even by the selected full probability distribution model of the data in the real space, and by the definition of the Truncated Normal Distribution in the transformed space. In summary, the different hypotheses that the modeler chooses in each of the two approaches are the main cause of the different results. This research also explores a proper combination of both methodologies, which could be useful for achieving less biased hydrological parameter estimation. In this approach, the predictive distribution is first obtained through the Model Conditional Processor; it is then used to derive the corresponding additive error model, which is employed for hydrological parameter estimation with the Bayesian Joint Inference methodology.
Model selection for logistic regression models
NASA Astrophysics Data System (ADS)
Duller, Christine
2012-09-01
Model selection for logistic regression models decides which of the given potential regressors have an effect and hence should be included in the final model. The second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions will be answered with classical as well as with Bayesian methods. The application shows some results of recent research projects in medicine and business administration.
Constructing a Bayesian network model for improving safety behavior of employees at workplaces.
Mohammadfam, Iraj; Ghasemi, Fakhradin; Kalatpour, Omid; Moghimbeigi, Abbas
2017-01-01
Unsafe behavior increases the risk of accidents at workplaces and needs to be managed properly. The aim of the present study was to provide a model for managing and improving the safety behavior of employees using the Bayesian networks approach. The study was conducted in several power plant construction projects in Iran. The data were collected using a questionnaire composed of nine factors, including management commitment, supporting environment, safety management system, employees' participation, safety knowledge, safety attitude, motivation, resource allocation, and work pressure. In order to measure the score of each factor assigned by a responder, a measurement model was constructed for each of them. The Bayesian network was constructed using experts' opinions and Dempster-Shafer theory. Using belief updating, the best intervention strategies for improving safety behavior were also selected. The results of the present study demonstrated that the majority of employees do not tend to consider safety rules, regulations, procedures, and norms in their behavior at the workplace. Safety attitude, safety knowledge, and supporting environment were the best predictors of safety behavior. Moreover, it was determined that instantaneous improvement of the supporting environment and employee participation is the best strategy to reach a high proportion of safety behavior at the workplace. The lack of a comprehensive model that can be used for explaining safety behavior was one of the most problematic issues of the study. Furthermore, it can be concluded that belief updating is a unique feature of Bayesian networks that is very useful in comparing various intervention strategies and selecting the best one from them. Copyright © 2016 Elsevier Ltd. All rights reserved.
Context Relevant Prediction Model for COPD Domain Using Bayesian Belief Network
Saleh, Lokman; Ajami, Hicham; Mili, Hafedh
2017-01-01
In the last three decades, researchers have examined extensively how context-aware systems can assist people, specifically those suffering from incurable diseases, to help them cope with their medical illness. Over the years, a huge number of studies on Chronic Obstructive Pulmonary Disease (COPD) have been published. However, how to derive relevant attributes and achieve early detection of COPD exacerbations remains a challenge. In this research work, we use an efficient algorithm to select relevant attributes, for which no proper approach exists in this domain. The algorithm predicts exacerbations with high accuracy by adding a discretization process, and organizes the pertinent attributes in priority order based on their impact, to facilitate emergency medical treatment. In this paper, we propose an extension of our existing Helper Context-Aware Engine System (HCES) for COPD. This project uses a Bayesian network algorithm to depict the dependency between the COPD symptoms (attributes) in order to overcome the insufficiency and the independence hypothesis of naïve Bayes. In addition, the dependency in the Bayesian network is realized using the TAN algorithm rather than by consulting pneumologists. All these combined algorithms (discretization, selection, dependency, and the ordering of the relevant attributes) constitute an effective prediction model compared with existing ones. Moreover, an investigation and comparison of different scenarios of these algorithms are also done to verify which sequence of steps of the prediction model gives more accurate results. Finally, we designed and validated a computer-aided support application to integrate the different steps of this model. The findings of our system HCES have shown promising results using the Area Under the Receiver Operating Characteristic curve (AUC = 81.5%). PMID:28644419
A selection model for accounting for publication bias in a full network meta-analysis.
Mavridis, Dimitris; Welton, Nicky J; Sutton, Alex; Salanti, Georgia
2014-12-30
Copas and Shi suggested a selection model to explore the potential impact of publication bias via sensitivity analysis based on assumptions for the probability of publication of trials conditional on the precision of their results. Chootrakool et al. extended this model to three-arm trials but did not fully account for the implications of the consistency assumption, and their model is difficult to generalize for complex network structures with more than three treatments. Fitting these selection models within a frequentist setting requires maximization of a complex likelihood function, and identification problems are common. We have previously presented a Bayesian implementation of the selection model when multiple treatments are compared with a common reference treatment. We now present a general model suitable for complex, full network meta-analysis that accounts for consistency when adjusting results for publication bias. We developed a design-by-treatment selection model to describe the mechanism by which studies with different designs (sets of treatments compared in a trial) and precision may be selected for publication. We fit the model in a Bayesian setting because it avoids the numerical problems encountered in the frequentist setting, it is generalizable with respect to the number of treatments and study arms, and it provides a flexible framework for sensitivity analysis using external knowledge. Our model accounts for the additional uncertainty arising from publication bias more successfully than the standard Copas model or its previous extensions. We illustrate the methodology using a published triangular network for the failure of vascular graft or arterial patency. Copyright © 2014 John Wiley & Sons, Ltd.
Bobb, Jennifer F; Dominici, Francesca; Peng, Roger D
2011-12-01
Estimating the risks heat waves pose to human health is a critical part of assessing the future impact of climate change. In this article, we propose a flexible class of time series models to estimate the relative risk of mortality associated with heat waves and conduct Bayesian model averaging (BMA) to account for the multiplicity of potential models. Applying these methods to data from 105 U.S. cities for the period 1987-2005, we identify those cities having a high posterior probability of increased mortality risk during heat waves, examine the heterogeneity of the posterior distributions of mortality risk across cities, assess sensitivity of the results to the selection of prior distributions, and compare our BMA results to a model selection approach. Our results show that no single model best predicts risk across the majority of cities, and that for some cities heat-wave risk estimation is sensitive to model choice. Although model averaging leads to posterior distributions with increased variance as compared to statistical inference conditional on a model obtained through model selection, we find that the posterior mean of heat wave mortality risk is robust to accounting for model uncertainty over a broad class of models. © 2011, The International Biometric Society.
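As an illustration of the averaging step described in this abstract, the sketch below computes Bayesian model averaging weights via the common BIC approximation to the marginal likelihood. This is illustrative only: the paper fits time-series models in a fully Bayesian way, and all function names here are hypothetical.

```python
import numpy as np

def bma_weights(log_likelihoods, n_params, n_obs):
    """Approximate posterior model probabilities from BIC.

    Uses the standard BIC approximation to the marginal likelihood,
    assuming equal prior probability for each candidate model."""
    log_liks = np.asarray(log_likelihoods, dtype=float)
    k = np.asarray(n_params, dtype=float)
    bic = -2.0 * log_liks + k * np.log(n_obs)
    # exp(-BIC/2) is proportional to the approximate marginal likelihood
    rel = np.exp(-0.5 * (bic - bic.min()))
    return rel / rel.sum()

def bma_estimate(estimates, weights):
    """Model-averaged point estimate (e.g. a relative-risk coefficient)."""
    return float(np.dot(weights, estimates))
```

A model-averaged risk estimate is then a weighted combination of the per-model estimates, which is why its posterior variance is typically larger than that of any single selected model.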
USDA-ARS?s Scientific Manuscript database
Bacterial cold water disease (BCWD) causes significant economic losses in salmonid aquaculture. At the National Center for Cool and Cold Water Aquaculture (NCCCWA), we have pursued selective breeding to increase rainbow trout genetic resistance against BCWD and found that post-challenge survival is ...
Attention in a Bayesian Framework
Whiteley, Louise; Sahani, Maneesh
2012-01-01
The behavioral phenomena of sensory attention are thought to reflect the allocation of a limited processing resource, but there is little consensus on the nature of the resource or why it should be limited. Here we argue that a fundamental bottleneck emerges naturally within Bayesian models of perception, and use this observation to frame a new computational account of the need for, and action of, attention – unifying diverse attentional phenomena in a way that goes beyond previous inferential, probabilistic and Bayesian models. Attentional effects are most evident in cluttered environments, and include both selective phenomena, where attention is invoked by cues that point to particular stimuli, and integrative phenomena, where attention is invoked dynamically by endogenous processing. However, most previous Bayesian accounts of attention have focused on describing relatively simple experimental settings, where cues shape expectations about a small number of upcoming stimuli and thus convey “prior” information about clearly defined objects. While operationally consistent with the experiments it seeks to describe, this view of attention as prior seems to miss many essential elements of both its selective and integrative roles, and thus cannot be easily extended to complex environments. We suggest that the resource bottleneck stems from the computational intractability of exact perceptual inference in complex settings, and that attention reflects an evolved mechanism for approximate inference which can be shaped to refine the local accuracy of perception. We show that this approach extends the simple picture of attention as prior, so as to provide a unified and computationally driven account of both selective and integrative attentional phenomena. PMID:22712010
Model Uncertainty and Bayesian Model Averaged Benchmark Dose Estimation for Continuous Data
The benchmark dose (BMD) approach has gained acceptance as a valuable risk assessment tool, but risk assessors still face significant challenges associated with selecting an appropriate BMD/BMDL estimate from the results of a set of acceptable dose-response models. Current approa...
Nicoulaud-Gouin, V; Garcia-Sanchez, L; Giacalone, M; Attard, J C; Martin-Garin, A; Bois, F Y
2016-10-01
This paper addresses the methodological conditions, particularly experimental design and statistical inference, that ensure the identifiability of sorption parameters from breakthrough curves measured during stirred flow-through reactor experiments, also known as continuous-flow stirred-tank reactor (CSTR) experiments. The equilibrium-kinetic (EK) sorption model was selected as a nonequilibrium parameterization embedding the K_d approach. Parameter identifiability was studied formally on the equations governing outlet concentrations. It was also studied numerically on 6 simulated CSTR experiments on a soil with known equilibrium-kinetic sorption parameters. EK sorption parameters cannot be identified from a single breakthrough curve of a CSTR experiment, because K_d,1 and k_- were diagnosed as collinear. For pairs of CSTR experiments, Bayesian inference allowed selection of the correct models of sorption and error among sorption alternatives. Bayesian inference was conducted with the SAMCAT software (Sensitivity Analysis and Markov Chain simulations Applied to Transfer models), which launched the simulations through the embedded simulation engine GNU-MCSim and automated their configuration and post-processing. Experimental designs consisting of varying flow rates between experiments reaching equilibrium at the contamination stage were found optimal, because they simultaneously gave accurate sorption parameters and predictions. Bayesian results were comparable to the maximum likelihood method, but they avoided convergence problems, the marginal likelihood allowed comparison of all models, and credible intervals directly gave the uncertainty of the sorption parameters θ. Although these findings are limited to the specific conditions studied here, in particular the considered sorption model, the chosen parameter values, and the error structure, they help in the conception and analysis of future CSTR experiments with radionuclides whose kinetic behaviour is suspected. Copyright © 2016 Elsevier Ltd. All rights reserved.
Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data
Zhao, Xin; Cheung, Leo Wang-Kit
2007-01-01
Background Designing appropriate machine learning methods for identifying genes that have significant discriminating power for disease outcomes has become increasingly important for our understanding of diseases at the genomic level. Although many machine learning methods have been developed and applied to microarray gene expression data analysis, the majority are based on linear models, which are not necessarily appropriate for the underlying connection between the target disease and its associated explanatory genes. Linear model-based methods also tend to bring in false-positive significant features. Furthermore, linear model-based algorithms often involve calculating the inverse of a matrix that may be singular when the number of potentially important genes is relatively large, leading to numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. Many of the existing non-linear methods have a couple of critical problems, the model selection problem and the model parameter tuning problem, that remain unsolved or even untouched. In general, a unified framework that allows model parameters of both linear and non-linear models to be easily tuned is always preferred in real-world applications. Kernel-induced learning methods form a class of approaches with promising potential to achieve this goal. Results A hierarchical statistical model named the kernel-imbedded Gaussian process (KIGP) is developed under a unified Bayesian framework for binary disease classification problems using microarray gene expression data. In particular, based on a probit regression setting, an adaptive algorithm with a cascading structure is designed to find the appropriate kernel, to discover the potentially significant genes, and to make the optimal class prediction accordingly. A Gibbs sampler is built as the core of the algorithm to make Bayesian inferences.
Simulation studies showed that, even without any knowledge of the underlying generative model, the KIGP performed very close to the theoretical Bayesian bound, not only in the case of a linear Bayesian classifier but also in the case of a very non-linear one. This sheds light on its broader applicability to microarray data analysis problems, especially those for which linear methods work awkwardly. The KIGP was also applied to four published microarray datasets, and the results showed that it performed better than, or at least as well as, any of the referenced state-of-the-art methods in all of these cases. Conclusion Mathematically built on the kernel-induced feature space concept under a Bayesian framework, the KIGP method presented in this paper provides a unified machine learning approach to explore both the linear and the possibly non-linear underlying relationship between the target features of a given binary disease classification problem and the related explanatory gene expression data. More importantly, it incorporates model parameter tuning into the framework. The model selection problem is addressed in the form of selecting a proper kernel type. The KIGP method also gives Bayesian probabilistic predictions for disease classification. These properties and features are beneficial to most real-world applications. The algorithm is naturally robust in numerical computation. The simulation studies and the published data studies demonstrated that the proposed KIGP performs satisfactorily and consistently. PMID:17328811
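The probit-regression Gibbs sampling at the core of KIGP can be illustrated, in a much-simplified linear form without the kernel machinery, by the classic Albert-Chib data-augmentation scheme. This is a sketch under a flat prior, not the authors' implementation; all names are hypothetical.

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, n_iter=2000, seed=0):
    """Albert-Chib Gibbs sampler for probit regression with a flat prior.

    Alternates between latent utilities z (truncated normal) and
    coefficients beta (multivariate normal): the same data-augmentation
    idea that underlies more elaborate Bayesian probit models."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    V = np.linalg.inv(X.T @ X)        # posterior covariance given z
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = []
    for _ in range(n_iter):
        mu = X @ beta
        # z_i | beta, y_i ~ N(mu_i, 1), truncated to z>0 if y=1, z<0 if y=0
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        # beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1})
        beta = V @ X.T @ z + L @ rng.standard_normal(p)
        draws.append(beta)
    return np.array(draws)
```

In the KIGP setting the design matrix is replaced by a kernel-induced feature representation and the sampler additionally moves over kernel and gene-inclusion indicators, but the probit augmentation step has this same structure.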
Bayesian Recurrent Neural Network for Language Modeling.
Chien, Jen-Tzung; Ku, Yuan-Chu
2016-02-01
A language model (LM) calculates the probability of a word sequence, providing the solution to word prediction in a variety of information systems. A recurrent neural network (RNN) is powerful for learning the large-span dynamics of a word sequence in continuous space. However, training an RNN-LM is an ill-posed problem because of the many parameters arising from a large dictionary size and a high-dimensional hidden layer. This paper presents a Bayesian approach to regularizing the RNN-LM and applies it to continuous speech recognition. We aim to penalize overly complex RNN-LMs by compensating for the uncertainty of the estimated model parameters, which is represented by a Gaussian prior. The objective function of the Bayesian classification network is formed as a regularized cross-entropy error function. The regularized model is constructed not only by calculating the regularized parameters according to the maximum a posteriori criterion but also by estimating the Gaussian hyperparameter through maximization of the marginal likelihood. A rapid approximation to the Hessian matrix is developed to implement the Bayesian RNN-LM (BRNN-LM) by selecting a small set of salient outer products. The proposed BRNN-LM achieves a sparser model than the RNN-LM. Experiments on different corpora show the robustness of system performance under the rapid BRNN-LM across different conditions.
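The regularized objective described here, the cross-entropy error plus a penalty induced by a zero-mean Gaussian prior on the parameters, can be written in a few lines. This is a generic sketch with hypothetical names, not the paper's implementation.

```python
import numpy as np

def regularized_cross_entropy(probs, targets, weights, alpha):
    """MAP objective: cross-entropy error plus a Gaussian-prior penalty.

    probs:   predicted word probabilities, shape (T, V)
    targets: index of the correct word at each step, shape (T,)
    weights: flattened model parameters
    alpha:   precision of the zero-mean Gaussian prior (the hyperparameter
             the paper estimates by maximizing the marginal likelihood)
    """
    # negative log-probability of each observed word
    ce = -np.sum(np.log(probs[np.arange(len(targets)), targets]))
    # log of the Gaussian prior density, up to a constant in the weights
    penalty = 0.5 * alpha * np.dot(weights, weights)
    return ce + penalty
```

Minimizing this objective in the weights is the maximum a posteriori step; the paper's additional contribution is choosing alpha itself by evidence maximization rather than cross-validation.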
Carabin, Hélène; Escalona, Marisela; Marshall, Clare; Vivas-Martínez, Sarai; Botto, Carlos; Joseph, Lawrence; Basáñez, María-Gloria
2003-01-01
OBJECTIVE: To develop a Bayesian hierarchical model for human onchocerciasis with which to explore the factors that influence prevalence of microfilariae in the Amazonian focus of onchocerciasis and predict the probability of any community being at least mesoendemic (>20% prevalence of microfilariae), and thus in need of priority ivermectin treatment. METHODS: Models were developed with data from 732 individuals aged > or =15 years who lived in 29 Yanomami communities along four rivers of the south Venezuelan Orinoco basin. The models' abilities to predict prevalences of microfilariae in communities were compared. The deviance information criterion, Bayesian P-values, and residual values were used to select the best model with an approximate cross-validation procedure. FINDINGS: A three-level model that acknowledged clustering of infection within communities performed best, with host age and sex included at the individual level, a river-dependent altitude effect at the community level, and additional clustering of communities along rivers. This model correctly classified 25/29 (86%) villages with respect to their need for priority ivermectin treatment. CONCLUSION: Bayesian methods are a flexible and useful approach for public health research and control planning. Our model acknowledges the clustering of infection within communities, allows investigation of links between individual- or community-specific characteristics and infection, incorporates additional uncertainty due to missing covariate data, and informs policy decisions by predicting the probability that a new community is at least mesoendemic. PMID:12973640
NASA Astrophysics Data System (ADS)
Silva, F. E. O. E.; Naghettini, M. D. C.; Fernandes, W.
2014-12-01
This paper evaluated the uncertainties associated with the estimation of the parameters of a conceptual rainfall-runoff model, using Bayesian inference techniques with Monte Carlo simulation. The Pará River sub-basin, located in the upper São Francisco river basin in southeastern Brazil, was selected as the study area. We used the Rio Grande conceptual hydrologic model (EHR/UFMG, 2001) and the Markov chain Monte Carlo simulation method DREAM (VRUGT, 2008a). Two probabilistic models for the residuals were analyzed: (i) the classic normal likelihood, r ~ N(0, σ²); and (ii) a generalized likelihood (SCHOUPS & VRUGT, 2010), in which the differences between observed and simulated flows are assumed to be correlated, non-stationary, and distributed according to a skew exponential power density. The assumptions made for both models were checked to ensure that the estimation of parameter uncertainties was not biased. The results showed that the Bayesian approach was adequate for the proposed objectives, reinforcing the importance of assessing the uncertainties associated with hydrological modeling.
NASA Astrophysics Data System (ADS)
Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan; Medeiros, Lia; Özel, Feryal; Psaltis, Dimitrios
2016-12-01
The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.
Bayesian model selection validates a biokinetic model for zirconium processing in humans
2012-01-01
Background In radiation protection, biokinetic models for zirconium processing are of crucial importance in dose estimation and further risk analysis for humans exposed to this radioactive substance. They provide limiting values of detrimental effects and build the basis for applications in internal dosimetry, the prediction for radioactive zirconium retention in various organs as well as retrospective dosimetry. Multi-compartmental models are the tool of choice for simulating the processing of zirconium. Although easily interpretable, determining the exact compartment structure and interaction mechanisms is generally daunting. In the context of observing the dynamics of multiple compartments, Bayesian methods provide efficient tools for model inference and selection. Results We are the first to apply a Markov chain Monte Carlo approach to compute Bayes factors for the evaluation of two competing models for zirconium processing in the human body after ingestion. Based on in vivo measurements of human plasma and urine levels we were able to show that a recently published model is superior to the standard model of the International Commission on Radiological Protection. The Bayes factors were estimated by means of the numerically stable thermodynamic integration in combination with a recently developed copula-based Metropolis-Hastings sampler. Conclusions In contrast to the standard model the novel model predicts lower accretion of zirconium in bones. This results in lower levels of noxious doses for exposed individuals. Moreover, the Bayesian approach allows for retrospective dose assessment, including credible intervals for the initially ingested zirconium, in a significantly more reliable fashion than previously possible. All methods presented here are readily applicable to many modeling tasks in systems biology. PMID:22863152
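Thermodynamic integration, the marginal-likelihood estimator used above for the Bayes factors, can be sketched on a toy conjugate model where every power posterior can be sampled exactly. In the paper's application an MCMC sampler plays that role; all names here are illustrative.

```python
import numpy as np

def log_marginal_ti(y, n_temps=32, n_draws=4000, seed=0):
    """Estimate log p(y) by thermodynamic integration over power posteriors
    p(theta | y, beta) proportional to p(y|theta)^beta * p(theta).

    Toy conjugate model: y_i ~ N(theta, 1) with prior theta ~ N(0, 1),
    so each power posterior is normal and can be sampled directly."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # temperature ladder concentrated near beta = 0, where the
    # integrand changes fastest
    betas = np.linspace(0.0, 1.0, n_temps) ** 5
    means = []
    for b in betas:
        prec = 1.0 + b * n                  # power-posterior precision
        mu = b * y.sum() / prec             # power-posterior mean
        draws = rng.normal(mu, 1.0 / np.sqrt(prec), size=n_draws)
        resid2 = (y[None, :] - draws[:, None]) ** 2
        ll = -0.5 * n * np.log(2.0 * np.pi) - 0.5 * resid2.sum(axis=1)
        means.append(ll.mean())             # E_beta[log p(y | theta)]
    m = np.array(means)
    # trapezoidal rule over the temperature ladder
    return float(np.sum(np.diff(betas) * 0.5 * (m[1:] + m[:-1])))
```

A Bayes factor between two compartment models is then the exponential of the difference of two such log marginal likelihoods, each computed with the model's own likelihood and prior.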
Inflation model selection meets dark radiation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tram, Thomas; Vallance, Robert; Vennin, Vincent, E-mail: thomas.tram@port.ac.uk, E-mail: robert.vallance@student.manchester.ac.uk, E-mail: vincent.vennin@port.ac.uk
2017-01-01
We investigate how inflation model selection is affected by the presence of additional free-streaming relativistic degrees of freedom, i.e. dark radiation. We perform a full Bayesian analysis of both inflation parameters and cosmological parameters taking reheating into account self-consistently. We compute the Bayesian evidence for a few representative inflation scenarios in both the standard ΛCDM model and an extension including dark radiation parametrised by its effective number of relativistic species N_eff. Using a minimal dataset (Planck low-ℓ polarisation, temperature power spectrum and lensing reconstruction), we find that the observational status of most inflationary models is unchanged. The exceptions are potentials such as power-law inflation that predict large values for the scalar spectral index that can only be realised when N_eff is allowed to vary. Adding baryon acoustic oscillations data and the B-mode data from BICEP2/Keck makes power-law inflation disfavoured, while adding local measurements of the Hubble constant H_0 makes power-law inflation slightly favoured compared to the best single-field plateau potentials. This illustrates how the dark radiation solution to the H_0 tension would have deep consequences for inflation model selection.
Bayesian block-diagonal variable selection and model averaging
Papaspiliopoulos, O.; Rossell, D.
2018-01-01
Summary We propose a scalable algorithmic framework for exact Bayesian variable selection and model averaging in linear models under the assumption that the Gram matrix is block-diagonal, and as a heuristic for exploring the model space for general designs. In block-diagonal designs our approach returns the most probable model of any given size without resorting to numerical integration. The algorithm also provides a novel and efficient solution to the frequentist best subset selection problem for block-diagonal designs. Posterior probabilities for any number of models are obtained by evaluating a single one-dimensional integral, and other quantities of interest such as variable inclusion probabilities and model-averaged regression estimates are obtained by an adaptive, deterministic one-dimensional numerical integration. The overall computational cost scales linearly with the number of blocks, which can be processed in parallel, and exponentially with the block size, rendering it most adequate in situations where predictors are organized in many moderately-sized blocks. For general designs, we approximate the Gram matrix by a block-diagonal matrix using spectral clustering and propose an iterative algorithm that capitalizes on the block-diagonal algorithms to explore efficiently the model space. All methods proposed in this paper are implemented in the R library mombf. PMID:29861501
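The key computational point, that a block-diagonal Gram matrix lets the model-space search decompose into independent within-block enumerations, can be sketched on a toy with a Gaussian coefficient prior and known unit noise variance. This is an illustration of the decomposition, not the mombf implementation; names and the prior choice are assumptions.

```python
import numpy as np
from itertools import combinations

def block_scores(X, y, blocks, g=10.0):
    """Score every within-block submodel under a N(0, g I) coefficient
    prior with unit noise variance.

    Because the Gram matrix is block-diagonal, the log marginal
    likelihood of any model is a shared constant plus the sum of its
    blocks' scores, so enumeration is exponential only in block size."""
    scores = []
    for block in blocks:
        per_block = {}
        for k in range(len(block) + 1):
            for subset in combinations(block, k):
                if subset:
                    Xs = X[:, list(subset)]
                    M = np.eye(len(subset)) + g * Xs.T @ Xs
                    quad = g * y @ Xs @ np.linalg.solve(M, Xs.T @ y)
                    score = -0.5 * np.linalg.slogdet(M)[1] + 0.5 * quad
                else:
                    score = 0.0          # empty submodel contributes nothing
                per_block[subset] = score
        scores.append(per_block)
    return scores

def best_model(scores):
    # the top model overall is the union of each block's top submodel
    return tuple(max(s, key=s.get) for s in scores)
```

Normalizing the per-block scores likewise yields exact within-block inclusion probabilities, which is what keeps the overall cost linear in the number of blocks.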
Rowley, Mark I.; Coolen, Anthonius C. C.; Vojnovic, Borivoj; Barber, Paul R.
2016-01-01
We present novel Bayesian methods for the analysis of exponential decay data that exploit the evidence carried by every detected decay event and enable robust extension to advanced processing. Our algorithms are presented in the context of fluorescence lifetime imaging microscopy (FLIM), and particular attention has been paid to modelling the time-domain system (based on time-correlated single photon counting) with unprecedented accuracy. We present estimates of decay parameters for mono- and bi-exponential systems, offering up to a factor of two improvement in accuracy compared to previous popular techniques. Results of the analysis of synthetic and experimental data are presented, and areas where the superior precision of our techniques can be exploited in Förster Resonance Energy Transfer (FRET) experiments are described. Furthermore, we demonstrate two advanced processing methods: decay model selection to choose between differing models such as mono- and bi-exponential, and the simultaneous estimation of instrument and decay parameters. PMID:27355322
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhao, Kaiguang; Valle, Denis; Popescu, Sorin
2013-05-15
Model specification remains challenging in spectroscopy of plant biochemistry, as exemplified by the availability of various spectral indices or band combinations for estimating the same biochemical. This lack of consensus in model choice across applications argues for a paradigm shift in hyperspectral methods to address model uncertainty and misspecification. We demonstrated one such method using Bayesian model averaging (BMA), which performs variable/band selection and quantifies the relative merits of many candidate models to synthesize a weighted average model with improved predictive performance. The utility of BMA was examined using a portfolio of 27 foliage spectral–chemical datasets representing over 80 species across the globe to estimate multiple biochemical properties, including nitrogen, hydrogen, carbon, cellulose, lignin, chlorophyll (a or b), carotenoid, polar and nonpolar extractives, leaf mass per area, and equivalent water thickness. We also compared BMA with partial least squares (PLS) and stepwise multiple regression (SMR). Results showed that all the biochemicals except carotenoid were accurately estimated from hyperspectral data with R2 values > 0.80.
NASA Astrophysics Data System (ADS)
Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.
2018-05-01
The Bayesian method can be used to estimate the parameters of a multivariate multiple regression model. It involves two distributions: the prior and the posterior. The posterior distribution is influenced by the choice of prior distribution. Jeffreys' prior is a kind of non-informative prior distribution, used when information about the parameter is not available. The non-informative Jeffreys' prior is combined with the sample information, resulting in the posterior distribution, which is then used to estimate the parameters. The purpose of this research is to estimate the parameters of a multivariate regression model using the Bayesian method with the non-informative Jeffreys' prior. Based on the results and discussion, the parameter estimates of β and Σ are obtained as expected values under the marginal posterior distribution of each parameter. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart, respectively. However, calculating these expected values involves integrals that are difficult to evaluate analytically. Therefore, an approach is needed that generates random samples according to the posterior distribution of each parameter, using the Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.
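The Gibbs sampling step can be illustrated on the univariate analogue of this setting, where the Jeffreys prior p(β, σ²) ∝ 1/σ² yields normal and inverse-gamma full conditionals. This is a sketch with hypothetical names, not the authors' code, and it sidesteps the inverse-Wishart draw needed in the full multivariate case.

```python
import numpy as np

def gibbs_regression(X, y, n_iter=2000, seed=1):
    """Gibbs sampler for linear regression under the Jeffreys prior
    p(beta, sigma^2) proportional to 1/sigma^2.

    The full conditionals are exact: beta is normal and sigma^2 is
    inverse-gamma, so the sampler alternates direct draws from each."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y          # OLS estimate
    L = np.linalg.cholesky(XtX_inv)
    sigma2 = 1.0
    betas, sigma2s = [], []
    for _ in range(n_iter):
        # beta | sigma^2, y ~ N(beta_hat, sigma^2 (X'X)^{-1})
        beta = beta_hat + np.sqrt(sigma2) * L @ rng.standard_normal(p)
        # sigma^2 | beta, y ~ Inv-Gamma(n/2, RSS(beta)/2)
        rss = np.sum((y - X @ beta) ** 2)
        sigma2 = rss / (2.0 * rng.gamma(n / 2.0, 1.0))
        betas.append(beta)
        sigma2s.append(sigma2)
    return np.array(betas), np.array(sigma2s)
```

Posterior means and credible intervals for the parameters are then read off the retained draws after a burn-in period, exactly as in the multivariate β, Σ case.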
NASA Astrophysics Data System (ADS)
Määttä, A.; Laine, M.; Tamminen, J.; Veefkind, J. P.
2014-05-01
Satellite instruments are nowadays successfully utilised for measuring atmospheric aerosol in many applications as well as in research. Therefore, there is a growing need for rigorous error characterisation of the measurements. Here, we introduce a methodology for quantifying the uncertainty in the retrieval of aerosol optical thickness (AOT). In particular, we concentrate on two aspects: uncertainty due to aerosol microphysical model selection and uncertainty due to imperfect forward modelling. We apply the introduced methodology to aerosol optical thickness retrieval from the Ozone Monitoring Instrument (OMI) on board NASA's Earth Observing System (EOS) Aura satellite, launched in 2004. We apply statistical methodologies that improve the uncertainty estimates of the aerosol optical thickness retrieval by propagating aerosol microphysical model selection and forward model error more realistically. For the microphysical model selection problem, we utilise Bayesian model selection and model averaging methods. Gaussian processes are utilised to characterise the smooth systematic discrepancies between the measured and modelled reflectances (i.e. the residuals). The spectral correlation is composed empirically by exploring a set of residuals. The operational OMI multi-wavelength aerosol retrieval algorithm OMAERO is used for cloud-free, over-land pixels of the OMI instrument, with the additional Bayesian model selection and model discrepancy techniques introduced here. The method and the improved uncertainty characterisation are demonstrated by several examples with different aerosol properties: weakly absorbing aerosols, forest fires over Greece and Russia, and Sahara desert dust. The statistical methodology presented is general; it is not restricted to this particular satellite retrieval application.
Peng, Bin; Zhu, Dianwen; Ander, Bradley P.; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R.; Yang, Xiaowei
2013-01-01
The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions for dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used to validate the performance of iBVS in a probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed. PMID:23844055
Peng, Bin; Zhu, Dianwen; Ander, Bradley P; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei
2013-01-01
The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses face difficulty to incorporate correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point view of systems biology, iBVS enables user to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilitic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed.
Musella, Vincenzo; Rinaldi, Laura; Lagazio, Corrado; Cringoli, Giuseppe; Biggeri, Annibale; Catelan, Dolores
2014-09-15
Model-based geostatistics and Bayesian approaches are appropriate in veterinary epidemiology when point data have been collected under valid study designs. The aim is to predict a continuous infection risk surface. Little work has been done on the use of predictive infection probabilities at the farm-unit level. In this paper we show how to use the predictive infection probability and its related uncertainty from a Bayesian kriging model to draw informative samples from the 8794 geo-referenced sheep farms of the Campania region (southern Italy). Parasitological data come from a first cross-sectional survey carried out to study the spatial distribution of selected helminths in sheep farms. Grid sampling was performed to select the farms for coprological examination. Faecal samples were collected from 121 sheep farms and the presence of 21 different helminths was investigated using the FLOTAC technique. The 21 responses differ greatly in geographical distribution and prevalence of infection; the observed prevalence ranges from 0.83% to 96.69%. The distributions of the posterior predictive probabilities for the 21 parasites are very heterogeneous. We show how the results of the Bayesian kriging model can be used to plan a second-wave survey. Several alternatives can be chosen depending on the purposes of the second survey: weighting by the posterior predictive probabilities, by their uncertainty, or by a combination of both. The proposed Bayesian kriging model is simple, and the proposed sampling strategy is a useful tool for targeting infection control treatments and surveillance campaigns. It is easily extendable to other fields of research. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Mazrou, H.; Bezoubiri, F.
2018-07-01
In this work, a new program developed in the MATLAB environment and supported by the Bayesian software WinBUGS was combined with the traditional unfolding codes MAXED and GRAVEL to evaluate a neutron spectrum from Bonner sphere counts measured around a shielded 241AmBe-based neutron irradiator located at a Secondary Standards Dosimetry Laboratory (SSDL) at CRNA. In the first step, the results obtained by the standalone Bayesian program, using a parametric neutron spectrum model based on a linear superposition of three components (a thermal Maxwellian distribution, an epithermal 1/E component, and Watt fission and evaporation models to represent the fast component), were compared with those from MAXED and GRAVEL assuming a Monte Carlo default spectrum. By selecting new upper limits for some free parameters of both models, taking into account the physical characteristics of the irradiation source, good agreement was obtained for the investigated integral quantities, i.e. fluence rate and ambient dose equivalent rate, compared with the MAXED and GRAVEL results. The difference was generally below 4% for the investigated parameters, suggesting the reliability of the proposed models. In the second step, the Bayesian results from the previous calculations were used as initial guess spectra for the traditional unfolding codes MAXED and GRAVEL to derive the solution spectra. Here again the results were in very good agreement, confirming the stability of the Bayesian solution.
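A hedged sketch of the kind of three-component parametric spectrum model the abstract describes. All weights and shape parameters below are invented placeholders, not fitted CRNA values:

```python
import numpy as np

# Illustrative three-component parametric neutron spectrum: a thermal
# Maxwellian, an epithermal 1/E term, and a Watt-type fast component.
# The weights (a1, a2, a3) and shape parameters are hypothetical.

def neutron_spectrum(E, a1, a2, a3, kT=0.0253e-6, a=1.0, b=2.0):
    """Fluence per unit energy at energy E (MeV); toy parameterisation."""
    thermal = a1 * E * np.exp(-E / kT)                    # Maxwellian
    epithermal = a2 / E                                   # 1/E behaviour
    fast = a3 * np.exp(-E / a) * np.sinh(np.sqrt(b * E))  # Watt form
    return thermal + epithermal + fast

E = np.logspace(-8, 1, 200)  # 1e-8 MeV (thermal) up to 10 MeV
phi = neutron_spectrum(E, a1=1e12, a2=1e-3, a3=0.1)
```

In the Bayesian setting described above, the free parameters of such a model would be given priors (e.g. the upper limits discussed in the abstract) and inferred from the Bonner sphere counts.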
The multicategory case of the sequential Bayesian pixel selection and estimation procedure
NASA Technical Reports Server (NTRS)
Pore, M. D.; Dennis, T. B. (Principal Investigator)
1980-01-01
A Bayesian technique for stratified proportion estimation and a sampling scheme based on minimizing the mean squared error of this estimator were developed and tested on LANDSAT multispectral scanner data, using the beta density function to model the prior distribution in the two-class case. An extension of this procedure to the k-class case is considered. A generalization of the beta function is shown to be a density function for the general case, which allows the procedure to be extended.
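The beta-prior proportion estimation in this abstract reduces to conjugate updating, which can be sketched as follows (toy counts, not LANDSAT data):

```python
# Minimal sketch of Bayesian proportion estimation with a beta prior,
# as in the two-class pixel-selection setting.  With prior Beta(a, b)
# and k successes observed in n Bernoulli draws, the posterior is
# Beta(a + k, b + n - k).  The counts below are illustrative only.

def posterior_params(a, b, k, n):
    return a + k, b + n - k

def posterior_mean(a, b, k, n):
    a_post, b_post = posterior_params(a, b, k, n)
    return a_post / (a_post + b_post)

# Example: uniform prior Beta(1, 1), 30 "class 1" pixels out of 100 sampled.
est = posterior_mean(1.0, 1.0, 30, 100)  # (1 + 30) / (2 + 100)
```

The k-class generalization mentioned in the abstract follows the same pattern with a Dirichlet prior over the k class proportions, updated by the vector of class counts.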
Hamilton, B H
1999-08-01
The fraction of US Medicare recipients enrolled in health maintenance organizations (HMOs) has increased substantially over the past 10 years. However, the impact of HMOs on health care costs is still hotly debated. In particular, it is argued that HMOs achieve cost reduction through 'cream-skimming' and enrolling relatively healthy patients. This paper develops a Bayesian panel data tobit model of HMO selection and Medicare expenditures for recent US retirees that accounts for mortality over the course of the panel. The model is estimated using Markov Chain Monte Carlo (MCMC) simulation methods, and is novel in that a multivariate t-link is used in place of normality to allow for the heavy-tailed distributions often found in health care expenditure data. The findings indicate that HMOs select individuals who are less likely to have positive health care expenditures prior to enrollment. However, there is no evidence that HMOs disenrol high cost patients. The results also indicate the importance of accounting for survival over the panel, since high mortality probabilities are associated with higher health care expenditures in the last year of life.
Chipman, Hugh A.; Hamada, Michael S.
2016-06-02
Regular two-level fractional factorial designs have complete aliasing, in which the associated columns of multiple effects are identical. Here, we show how Bayesian variable selection can be used to analyze experiments that use such designs. In addition to sparsity and hierarchy, Bayesian variable selection naturally incorporates heredity. This prior information is used to identify the most likely combinations of active terms. We also demonstrate the method on simulated and real experiments.
eDNAoccupancy: An R package for multi-scale occupancy modeling of environmental DNA data
Dorazio, Robert; Erickson, Richard A.
2017-01-01
In this article we describe eDNAoccupancy, an R package for fitting Bayesian, multi-scale occupancy models. These models are appropriate for occupancy surveys that include three nested levels of sampling: primary sample units within a study area, secondary sample units collected from each primary unit, and replicates of each secondary sample unit. This design is commonly used in occupancy surveys of environmental DNA (eDNA). eDNAoccupancy allows users to specify and fit multi-scale occupancy models with or without covariates, to estimate posterior summaries of occurrence and detection probabilities, and to compare different models using Bayesian model-selection criteria. We illustrate these features by analyzing two published data sets: eDNA surveys of a fungal pathogen of amphibians and eDNA surveys of an endangered fish species.
Bayesian survival analysis in clinical trials: What methods are used in practice?
Brard, Caroline; Le Teuff, Gwénaël; Le Deley, Marie-Cécile; Hampson, Lisa V
2017-02-01
Background Bayesian statistics are an appealing alternative to the traditional frequentist approach to designing, analysing, and reporting of clinical trials, especially in rare diseases. Time-to-event endpoints are widely used in many medical fields. There are additional complexities to designing Bayesian survival trials which arise from the need to specify a model for the survival distribution. The objective of this article was to critically review the use and reporting of Bayesian methods in survival trials. Methods A systematic review of clinical trials using Bayesian survival analyses was performed through PubMed and Web of Science databases. This was complemented by a full text search of the online repositories of pre-selected journals. Cost-effectiveness, dose-finding studies, meta-analyses, and methodological papers using clinical trials were excluded. Results In total, 28 articles met the inclusion criteria, 25 were original reports of clinical trials and 3 were re-analyses of a clinical trial. Most trials were in oncology (n = 25), were randomised controlled (n = 21) phase III trials (n = 13), and half considered a rare disease (n = 13). Bayesian approaches were used for monitoring in 14 trials and for the final analysis only in 14 trials. In the latter case, Bayesian survival analyses were used for the primary analysis in four cases, for the secondary analysis in seven cases, and for the trial re-analysis in three cases. Overall, 12 articles reported fitting Bayesian regression models (semi-parametric, n = 3; parametric, n = 9). Prior distributions were often incompletely reported: 20 articles did not define the prior distribution used for the parameter of interest. Over half of the trials used only non-informative priors for monitoring and the final analysis (n = 12) when it was specified. Indeed, no articles fitting Bayesian regression models placed informative priors on the parameter of interest. 
The prior for the treatment effect was based on historical data in only four trials. Decision rules were pre-defined in eight cases when trials used Bayesian monitoring, and in only one case when trials adopted a Bayesian approach to the final analysis. Conclusion Few trials implemented a Bayesian survival analysis and few incorporated external data into priors. There is scope to improve the quality of reporting of Bayesian methods in survival trials. Extension of the Consolidated Standards of Reporting Trials statement for reporting Bayesian clinical trials is recommended.
Applying Bayesian Item Selection Approaches to Adaptive Tests Using Polytomous Items
ERIC Educational Resources Information Center
Penfield, Randall D.
2006-01-01
This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…
Quantitative trait nucleotide analysis using Bayesian model selection.
Blangero, John; Goring, Harald H H; Kent, Jack W; Williams, Jeff T; Peterson, Charles P; Almasy, Laura; Dyer, Thomas D
2005-10-01
Although much attention has been given to statistical genetic methods for the initial localization and fine mapping of quantitative trait loci (QTLs), little methodological work has been done to date on the problem of statistically identifying the most likely functional polymorphisms using sequence data. In this paper we provide a general statistical genetic framework, called Bayesian quantitative trait nucleotide (BQTN) analysis, for assessing the likely functional status of genetic variants. The approach requires the initial enumeration of all genetic variants in a set of resequenced individuals. These polymorphisms are then typed in a large number of individuals (potentially in families), and marker variation is related to quantitative phenotypic variation using Bayesian model selection and averaging. For each sequence variant a posterior probability of effect is obtained and can be used to prioritize additional molecular functional experiments. An example of this quantitative nucleotide analysis is provided using the GAW12 simulated data. The results show that the BQTN method may be useful for choosing the most likely functional variants within a gene (or set of genes). We also include instructions on how to use our computer program, SOLAR, for association analysis and BQTN analysis.
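The model-averaged posterior probability of effect can be illustrated with a toy version. This sketch uses BIC-based weights over all variant subsets as a simplified stand-in for the full BQTN machinery implemented in SOLAR; the data are simulated:

```python
import itertools
import numpy as np

# Toy model averaging for variant effects: fit least squares for each
# subset of variants, score it with BIC, convert BIC to an approximate
# posterior model weight, and sum weights over models containing each
# variant to get its posterior probability of effect.

def bic_linear(y, X):
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return n * np.log(np.mean(resid ** 2)) + p * np.log(n)

def posterior_effect_probs(y, variants):
    n, p = variants.shape
    ones = np.ones((n, 1))
    models, bics = [], []
    for k in range(p + 1):
        for idx in itertools.combinations(range(p), k):
            X = np.hstack([ones, variants[:, list(idx)]])
            models.append(set(idx))
            bics.append(bic_linear(y, X))
    w = np.exp(-0.5 * (np.array(bics) - min(bics)))  # BIC -> model weights
    w /= w.sum()
    return np.array([sum(wi for wi, m in zip(w, models) if j in m)
                     for j in range(p)])

rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(200, 3)).astype(float)  # 3 variants, 200 people
y = 1.0 * G[:, 0] + rng.normal(size=200)             # only variant 0 is causal
probs = posterior_effect_probs(y, G)
```

As in the abstract, the per-variant posterior probabilities could then be used to prioritize functional follow-up experiments.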
2010-01-01
Background The information provided by dense genome-wide markers using high throughput technology is of considerable potential in human disease studies and livestock breeding programs. Genome-wide association studies relate individual single nucleotide polymorphisms (SNP) from dense SNP panels to individual measurements of complex traits, with the underlying assumption being that any association is caused by linkage disequilibrium (LD) between SNP and quantitative trait loci (QTL) affecting the trait. Often SNP are in genomic regions of no trait variation. Whole genome Bayesian models are an effective way of incorporating this and other important prior information into modelling. However a full Bayesian analysis is often not feasible due to the large computational time involved. Results This article proposes an expectation-maximization (EM) algorithm called emBayesB which allows only a proportion of SNP to be in LD with QTL and incorporates prior information about the distribution of SNP effects. The posterior probability of being in LD with at least one QTL is calculated for each SNP along with estimates of the hyperparameters for the mixture prior. A simulated example of genomic selection from an international workshop is used to demonstrate the features of the EM algorithm. The accuracy of prediction is comparable to a full Bayesian analysis but the EM algorithm is considerably faster. The EM algorithm was accurate in locating QTL which explained more than 1% of the total genetic variation. A computational algorithm for very large SNP panels is described. Conclusions emBayesB is a fast and accurate EM algorithm for implementing genomic selection and predicting complex traits by mapping QTL in genome-wide dense SNP marker data. Its accuracy is similar to Bayesian methods but it takes only a fraction of the time. PMID:20969788
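The mixture-prior idea behind emBayesB can be illustrated for a single SNP. The spike-and-slab calculation below is a simplified stand-in with invented values, not the EM algorithm itself:

```python
import numpy as np

# Mixture prior: a SNP effect is either zero (probability 1 - pi) or
# drawn from N(0, tau2) (probability pi).  Given an estimated effect
# b_hat with sampling variance se2, Bayes' rule gives the posterior
# probability that the SNP is in LD with at least one QTL.

def normal_pdf(x, var):
    return np.exp(-x * x / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def posterior_inclusion(b_hat, se2, pi, tau2):
    like_qtl = normal_pdf(b_hat, se2 + tau2)   # slab: SNP tags a QTL
    like_null = normal_pdf(b_hat, se2)         # spike: no QTL effect
    return pi * like_qtl / (pi * like_qtl + (1.0 - pi) * like_null)

# A large estimated effect relative to its standard error gets a high
# posterior probability of tagging a QTL; a tiny one does not:
p_big = posterior_inclusion(b_hat=0.8, se2=0.01, pi=0.05, tau2=0.1)
p_small = posterior_inclusion(b_hat=0.02, se2=0.01, pi=0.05, tau2=0.1)
```

In the EM algorithm described above, the hyperparameters (pi, tau2) are not fixed as here but estimated from all SNPs jointly.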
Mujalli, Randa Oqab; de Oña, Juan
2011-10-01
This study describes a method for reducing the number of variables frequently considered in modeling the severity of traffic accidents, with the method's efficiency assessed by constructing Bayesian networks (BN). It is based on a two-stage selection process. Several variable selection algorithms commonly used in data mining are applied in order to select subsets of variables. BNs are built using the selected subsets and their performance is compared with that of the original BN (with all the variables) using five indicators. The BNs that improve the indicators' values are further analyzed to identify the most significant variables (accident type, age, atmospheric factors, gender, lighting, number of injured, and occupant involved). A new BN is built using these variables, and the indicators show, in most cases, a statistically significant improvement with respect to the original BN. It is thus possible to reduce the number of variables used to model traffic accident injury severity through BNs without reducing the performance of the model. The study provides safety analysts with a methodology for minimizing the number of variables needed to determine efficiently the injury severity of traffic accidents without reducing the performance of the model. Copyright © 2011 Elsevier Ltd. All rights reserved.
Spielman, Stephanie J; Wilke, Claus O
2016-11-01
The mutation-selection model of coding sequence evolution has received renewed attention for its use in estimating site-specific amino acid propensities and selection coefficient distributions. Two computationally tractable mutation-selection inference frameworks have been introduced: One framework employs a fixed-effects, highly parameterized maximum likelihood approach, whereas the other employs a random-effects Bayesian Dirichlet Process approach. While both implementations follow the same model, they appear to make distinct predictions about the distribution of selection coefficients. The fixed-effects framework estimates a large proportion of highly deleterious substitutions, whereas the random-effects framework estimates that all substitutions are either nearly neutral or weakly deleterious. It remains unknown, however, how accurately each method infers evolutionary constraints at individual sites. Indeed, selection coefficient distributions pool all site-specific inferences, thereby obscuring a precise assessment of site-specific estimates. Therefore, in this study, we use a simulation-based strategy to determine how accurately each approach recapitulates the selective constraint at individual sites. We find that the fixed-effects approach, despite its extensive parameterization, consistently and accurately estimates site-specific evolutionary constraint. By contrast, the random-effects Bayesian approach systematically underestimates the strength of natural selection, particularly for slowly evolving sites. We also find that, despite the strong differences between their inferred selection coefficient distributions, the fixed- and random-effects approaches yield surprisingly similar inferences of site-specific selective constraint. We conclude that the fixed-effects mutation-selection framework provides the more reliable software platform for model application and future development. © The Author 2016. 
Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Meirelles, S L C; Mokry, F B; Espasandín, A C; Dias, M A D; Baena, M M; de A Regitano, L C
2016-06-10
Genetic parameters and correlations for backfat thickness (BFT), rib eye area (REA), and body weight (BW) were estimated for Canchim beef cattle raised on natural pastures in Brazil. Data from 1648 animals were analyzed using a multi-trait (BFT, REA, and BW) animal model under the Bayesian approach. This model included the effects of contemporary group, age, and individual heterozygosity as covariates, as well as direct additive genetic and random residual effects. Heritabilities estimated for BFT (0.16), REA (0.50), and BW (0.44) indicated their potential for genetic improvement and response to selection. Furthermore, genetic correlations between BW and the remaining traits were high (P > 0.50), suggesting that selection for BW could improve REA and BFT. On the other hand, the genetic correlation between BFT and REA was low (P = 0.39 ± 0.17) and showed considerable variation, suggesting that these traits can be jointly included as selection criteria without influencing each other. We found that REA and BFT, as measured by ultrasound, responded to selection. Therefore, selection for yearling weight results in changes in REA and BFT.
Hydrologic Model Selection using Markov chain Monte Carlo methods
NASA Astrophysics Data System (ADS)
Marshall, L.; Sharma, A.; Nott, D.
2002-12-01
Estimation of parameter uncertainty (and in turn model uncertainty) allows assessment of the risk in likely applications of hydrological models. Bayesian statistical inference provides an ideal means of assessing parameter uncertainty whereby prior knowledge about the parameter is combined with information from the available data to produce a probability distribution (the posterior distribution) that describes uncertainty about the parameter and serves as a basis for selecting appropriate values for use in modelling applications. Widespread use of Bayesian techniques in hydrology has been hindered by difficulties in summarizing and exploring the posterior distribution. These difficulties have been largely overcome by recent advances in Markov chain Monte Carlo (MCMC) methods that involve random sampling of the posterior distribution. This study presents an adaptive MCMC sampling algorithm which has characteristics that are well suited to model parameters with a high degree of correlation and interdependence, as is often evident in hydrological models. The MCMC sampling technique is used to compare six alternative configurations of a commonly used conceptual rainfall-runoff model, the Australian Water Balance Model (AWBM), using 11 years of daily rainfall runoff data from the Bass river catchment in Australia. The alternative configurations considered fall into two classes - those that consider model errors to be independent of prior values, and those that model the errors as an autoregressive process. Each such class consists of three formulations that represent increasing levels of complexity (and parameterisation) of the original model structure. The results from this study point both to the importance of using Bayesian approaches in evaluating model performance, as well as the simplicity of the MCMC sampling framework that has the ability to bring such approaches within the reach of the applied hydrological community.
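The MCMC machinery at the heart of this comparison can be sketched with a minimal random-walk Metropolis sampler. The Gaussian toy likelihood below stands in for the AWBM; all data and settings are synthetic:

```python
import numpy as np

# Minimal random-walk Metropolis sampler for a single model parameter,
# illustrating MCMC-based Bayesian calibration.  The "model output"
# here is simply the parameter itself, with a flat prior.

def log_posterior(theta, data, sigma=1.0):
    return -0.5 * np.sum((data - theta) ** 2) / sigma ** 2

def metropolis(log_post, data, theta0=0.0, step=0.5, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    samples = np.empty(n_iter)
    theta, lp = theta0, log_post(theta0, data)
    for i in range(n_iter):
        prop = theta + step * rng.normal()      # random-walk proposal
        lp_prop = log_post(prop, data)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject
            theta, lp = prop, lp_prop
        samples[i] = theta
    return samples

rng = np.random.default_rng(42)
data = rng.normal(2.0, 1.0, size=50)        # synthetic observations
chain = metropolis(log_posterior, data)
posterior_mean = chain[1000:].mean()        # discard burn-in
```

Real applications such as the AWBM comparison above replace the toy likelihood with the rainfall-runoff model's error model and sample many correlated parameters, which is where adaptive proposal schemes become important.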
Bayesian inference in geomagnetism
NASA Technical Reports Server (NTRS)
Backus, George E.
1988-01-01
The inverse problem in empirical geomagnetic modeling is investigated, with critical examination of recently published studies. Particular attention is given to the use of Bayesian inference (BI) to select the damping parameter lambda in the uniqueness portion of the inverse problem. The mathematical bases of BI and stochastic inversion are explored, with consideration of bound-softening problems and resolution in linear Gaussian BI. The problem of estimating the radial magnetic field B(r) at the earth core-mantle boundary from surface and satellite measurements is then analyzed in detail, with specific attention to the selection of lambda in the studies of Gubbins (1983) and Gubbins and Bloxham (1985). It is argued that the selection method is inappropriate and leads to lambda values much larger than those that would result if a reasonable bound on the heat flow at the CMB were assumed.
Bayesian switching factor analysis for estimating time-varying functional connectivity in fMRI.
Taghia, Jalil; Ryali, Srikanth; Chen, Tianwen; Supekar, Kaustubh; Cai, Weidong; Menon, Vinod
2017-07-15
There is growing interest in understanding the dynamical properties of functional interactions between distributed brain regions. However, robust estimation of temporal dynamics from functional magnetic resonance imaging (fMRI) data remains challenging due to limitations in extant multivariate methods for modeling time-varying functional interactions between multiple brain areas. Here, we develop a Bayesian generative model for fMRI time-series within the framework of hidden Markov models (HMMs). The model is a dynamic variant of the static factor analysis model (Ghahramani and Beal, 2000). We refer to this model as Bayesian switching factor analysis (BSFA) as it integrates factor analysis into a generative HMM in a unified Bayesian framework. In BSFA, brain dynamic functional networks are represented by latent states which are learnt from the data. Crucially, BSFA is a generative model which estimates the temporal evolution of brain states and transition probabilities between states as a function of time. An attractive feature of BSFA is the automatic determination of the number of latent states via Bayesian model selection arising from penalization of excessively complex models. Key features of BSFA are validated using extensive simulations on carefully designed synthetic data. We further validate BSFA using fingerprint analysis of multisession resting-state fMRI data from the Human Connectome Project (HCP). Our results show that modeling temporal dependencies in the generative model of BSFA results in improved fingerprinting of individual participants. Finally, we apply BSFA to elucidate the dynamic functional organization of the salience, central-executive, and default mode networks, three core neurocognitive systems with a central role in cognitive and affective information processing (Menon, 2011). 
Across two HCP sessions, we demonstrate a high level of dynamic interactions between these networks and determine that the salience network has the highest temporal flexibility among the three networks. Our proposed methods provide a novel and powerful generative model for investigating dynamic brain connectivity. Copyright © 2017 Elsevier Inc. All rights reserved.
A new Bayesian recursive technique for parameter estimation
NASA Astrophysics Data System (ADS)
Kaheil, Yasir H.; Gill, M. Kashif; McKee, Mac; Bastidas, Luis
2006-08-01
The performance of any model depends on how well its associated parameters are estimated. In the current application, a localized Bayesian recursive estimation (LOBARE) approach is devised for parameter estimation. The LOBARE methodology is an extension of the Bayesian recursive estimation (BARE) method. It is applied in this paper on two different types of models: an artificial intelligence (AI) model in the form of a support vector machine (SVM) application for forecasting soil moisture and a conceptual rainfall-runoff (CRR) model represented by the Sacramento soil moisture accounting (SAC-SMA) model. Support vector machines, based on statistical learning theory (SLT), represent the modeling task as a quadratic optimization problem and have already been used in various applications in hydrology. They require estimation of three parameters. SAC-SMA is a very well known model that estimates runoff. It has a 13-dimensional parameter space. In the LOBARE approach presented here, Bayesian inference is used in an iterative fashion to estimate the parameter space that will most likely enclose a best parameter set. This is done by narrowing the sampling space through updating the "parent" bounds based on their fitness. These bounds are actually the parameter sets that were selected by BARE runs on subspaces of the initial parameter space. The new approach results in faster convergence toward the optimal parameter set using minimum training/calibration data and fewer sets of parameter values. The efficacy of the localized methodology is also compared with the previously used BARE algorithm.
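The bound-narrowing idea described for LOBARE can be caricatured as follows. The quadratic objective and all settings are invented for illustration, not the SVM or SAC-SMA applications themselves:

```python
import numpy as np

# Toy sketch of iterative bound narrowing: repeatedly sample parameter
# sets inside the current bounds, keep the best-scoring fraction, and
# shrink the bounds to the range spanned by those survivors ("parent"
# bounds updated based on fitness, as the abstract puts it).

def narrow_bounds(objective, lo, hi, n_samples=200, keep=0.2,
                  n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    for _ in range(n_iter):
        theta = rng.uniform(lo, hi, size=(n_samples, lo.size))
        scores = np.array([objective(t) for t in theta])
        best = theta[np.argsort(scores)[: int(keep * n_samples)]]
        lo, hi = best.min(axis=0), best.max(axis=0)  # updated bounds
    return lo, hi

# Stand-in objective: squared distance to a known optimum (lower is better).
target = np.array([0.3, -1.2])
def obj(t):
    return np.sum((t - target) ** 2)

lo, hi = narrow_bounds(obj, lo=[-5, -5], hi=[5, 5])
```

The returned interval encloses a region likely to contain a good parameter set, which is the spirit of the "best parameter set enclosure" the abstract describes.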
Feature selection for elderly faller classification based on wearable sensors.
Howcroft, Jennifer; Kofman, Jonathan; Lemaire, Edward D
2017-05-30
Wearable sensors can be used to derive numerous gait pattern features for elderly fall risk and faller classification; however, an appropriate feature set is required to avoid high computational costs and the inclusion of irrelevant features. The objectives of this study were to identify and evaluate smaller feature sets for faller classification from large feature sets derived from wearable accelerometer and pressure-sensing insole gait data. A convenience sample of 100 older adults (75.5 ± 6.7 years; 76 non-fallers, 24 fallers based on 6 month retrospective fall occurrence) walked 7.62 m while wearing pressure-sensing insoles and tri-axial accelerometers at the head, pelvis, left and right shanks. Feature selection was performed using correlation-based feature selection (CFS), fast correlation based filter (FCBF), and Relief-F algorithms. Faller classification was performed using multi-layer perceptron neural network, naïve Bayesian, and support vector machine classifiers, with 75:25 single stratified holdout and repeated random sampling. The best performing model was a support vector machine with 78% accuracy, 26% sensitivity, 95% specificity, 0.36 F1 score, and 0.31 MCC and one posterior pelvis accelerometer input feature (left acceleration standard deviation). The second best model achieved better sensitivity (44%) and used a support vector machine with 74% accuracy, 83% specificity, 0.44 F1 score, and 0.29 MCC. This model had ten input features: maximum, mean and standard deviation posterior acceleration; maximum, mean and standard deviation anterior acceleration; mean superior acceleration; and three impulse features. The best multi-sensor model sensitivity (56%) was achieved using posterior pelvis and both shank accelerometers and a naïve Bayesian classifier. The best single-sensor model sensitivity (41%) was achieved using the posterior pelvis accelerometer and a naïve Bayesian classifier. 
Feature selection provided models with smaller feature sets and improved faller classification compared to faller classification without feature selection. CFS and FCBF provided the best feature subset (one posterior pelvis accelerometer feature) for faller classification. However, better sensitivity was achieved by the second best model based on a Relief-F feature subset with three pressure-sensing insole features and seven head accelerometer features. Feature selection should be considered as an important step in faller classification using wearable sensors.
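A minimal sketch of a correlation-based filter of the kind CFS builds on. The data are synthetic, and ranking by single-feature correlation is a simplification of the actual CFS merit function (which also penalizes inter-feature redundancy):

```python
import numpy as np

# Rank features by |point-biserial correlation with the binary class
# label| and keep the top k.  Toy stand-in for CFS/FCBF-style filters.

def rank_features(X, y, k):
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    order = np.argsort(-np.abs(corr))
    return order[:k], corr

rng = np.random.default_rng(7)
y = rng.integers(0, 2, size=100).astype(float)   # faller / non-faller labels
X = rng.normal(size=(100, 5))                    # 5 candidate gait features
X[:, 2] += 2.0 * y                               # feature 2 carries the signal
top, corr = rank_features(X, y, k=2)
```

The reduced feature subset would then feed a classifier (e.g. the SVM or naive Bayes models in the study), exactly to avoid the computational cost and irrelevant features the abstract mentions.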
Berthet, Pierre; Hellgren-Kotaleski, Jeanette; Lansner, Anders
2012-01-01
Several studies have shown a strong involvement of the basal ganglia (BG) in action selection and dopamine dependent learning. The dopaminergic signal to striatum, the input stage of the BG, has been commonly described as coding a reward prediction error (RPE), i.e., the difference between the predicted and actual reward. The RPE has been hypothesized to be critical in the modulation of the synaptic plasticity in cortico-striatal synapses in the direct and indirect pathway. We developed an abstract computational model of the BG, with a dual pathway structure functionally corresponding to the direct and indirect pathways, and compared its behavior to biological data as well as other reinforcement learning models. The computations in our model are inspired by Bayesian inference, and the synaptic plasticity changes depend on a three factor Hebbian–Bayesian learning rule based on co-activation of pre- and post-synaptic units and on the value of the RPE. The model builds on a modified Actor-Critic architecture and implements the direct (Go) and the indirect (NoGo) pathway, as well as the reward prediction (RP) system, acting in a complementary fashion. We investigated the performance of the model system when different configurations of the Go, NoGo, and RP system were utilized, e.g., using only the Go, NoGo, or RP system, or combinations of those. Learning performance was investigated in several types of learning paradigms, such as learning-relearning, successive learning, stochastic learning, reversal learning and a two-choice task. The RPE and the activity of the model during learning were similar to monkey electrophysiological and behavioral data. Our results, however, show that there is not a unique best way to configure this BG model to handle well all the learning paradigms tested. We thus suggest that an agent might dynamically configure its action selection mode, possibly depending on task characteristics and also on how much time is available. PMID:23060764
Moving beyond qualitative evaluations of Bayesian models of cognition.
Hemmer, Pernille; Tauber, Sean; Steyvers, Mark
2015-06-01
Bayesian models of cognition provide a powerful way to understand the behavior and goals of individuals from a computational point of view. Much of the focus in the Bayesian cognitive modeling approach has been on qualitative model evaluations, where predictions from the models are compared to data that is often averaged over individuals. In many cognitive tasks, however, there are pervasive individual differences. We introduce an approach to directly infer individual differences related to subjective mental representations within the framework of Bayesian models of cognition. In this approach, Bayesian data analysis methods are used to estimate cognitive parameters and motivate the inference process within a Bayesian cognitive model. We illustrate this integrative Bayesian approach on a model of memory. We apply the model to behavioral data from a memory experiment involving the recall of heights of people. A cross-validation analysis shows that the Bayesian memory model with inferred subjective priors predicts withheld data better than a Bayesian model where the priors are based on environmental statistics. In addition, the model with inferred priors at the individual subject level led to the best overall generalization performance, suggesting that individual differences are important to consider in Bayesian models of cognition.
Streck, André Felipe; Homeier, Timo; Foerster, Tessa; Truyen, Uwe
2013-09-01
To estimate the impact of porcine parvovirus (PPV) vaccines on the emergence of new phenotypes, the population dynamic history of the virus was calculated using the Bayesian Markov chain Monte Carlo method with a Bayesian skyline coalescent model. Additionally, an in vitro model was performed with consecutive passages of the 'Challenge' strain (a virulent field strain) and NADL2 strain (a vaccine strain) in a PK-15 cell line supplemented with polyclonal antibodies raised against the vaccine strain. A decrease in genetic diversity was observed in the presence of antibodies in vitro or after vaccination (as estimated by the in silico model). We hypothesized that the antibodies induced a selective pressure that may reduce the incidence of neutral selection, which should play a major role in the emergence of new mutations. In this scenario, vaccine failures and non-vaccinated populations (e.g. wild boars) may have an important impact in the emergence of new phenotypes.
Cawley, Gavin C; Talbot, Nicola L C
2006-10-01
Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar; however, the BLogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm.
BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense. A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/
The Impact of Various Class-Distinction Features on Model Selection in the Mixture Rasch Model
ERIC Educational Resources Information Center
Choi, In-Hee; Paek, Insu; Cho, Sun-Joo
2017-01-01
The purpose of the current study is to examine the performance of four information criteria (Akaike's information criterion [AIC], corrected AIC [AICC], Bayesian information criterion [BIC], sample-size adjusted BIC [SABIC]) for detecting the correct number of latent classes in the mixture Rasch model through simulations. The simulation study…
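The four criteria compared in the study above share a common form, a goodness-of-fit term plus a complexity penalty, and differ only in how the penalty scales with the number of parameters and the sample size. As a hedged illustration (not the study's code, and with made-up inputs), a minimal Python sketch of their computation from a maximized log-likelihood:

```python
import math

def information_criteria(log_lik, k, n):
    """Common model-selection criteria from a maximized log-likelihood.

    log_lik: maximized log-likelihood of the fitted model
    k: number of free parameters
    n: sample size
    """
    aic = -2.0 * log_lik + 2.0 * k
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)       # small-sample correction
    bic = -2.0 * log_lik + k * math.log(n)
    sabic = -2.0 * log_lik + k * math.log((n + 2) / 24)  # sample-size adjusted BIC
    return {"AIC": aic, "AICC": aicc, "BIC": bic, "SABIC": sabic}

# Hypothetical fit; smaller criterion values indicate the preferred model
print(information_criteria(log_lik=-512.3, k=8, n=400))
```

When comparing candidate numbers of latent classes, each candidate model is fit, the criteria are computed, and the class count minimizing a given criterion is selected; BIC penalizes complexity most heavily, SABIC and AIC less so.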
A Bayesian Hierarchical Selection Model for Academic Growth with Missing Data
ERIC Educational Resources Information Center
Allen, Jeff
2017-01-01
Using a sample of schools testing annually in grades 9-11 with a vertically linked series of assessments, a latent growth curve model is used to model test scores with student intercepts and slopes nested within school. Missed assessments can occur because of student mobility, student dropout, absenteeism, and other reasons. Missing data…
A Bayesian Multilevel Model for Microcystin Prediction in ...
The frequency of cyanobacteria blooms in North American lakes is increasing. A major concern with rising cyanobacteria blooms is microcystin, a common cyanobacterial hepatotoxin. To explore the conditions that promote high microcystin concentrations, we analyzed the US EPA National Lake Assessment (NLA) dataset collected in the summer of 2007. The NLA dataset is reported for nine eco-regions. We used the results of random forest modeling as a means of variable selection, from which we developed a Bayesian multilevel model of microcystin concentrations. Model parameters under a multilevel modeling framework are eco-region specific, but they are also assumed to be exchangeable across eco-regions for broad continental scaling. The exchangeability assumption ensures that both the common patterns and the eco-region specific features will be reflected in the model. Furthermore, the method incorporates appropriate estimates of uncertainty. Our preliminary results show associations between microcystin and turbidity, total nutrients, and N:P ratios. The NLA 2012 will be used for Bayesian updating. The results will help develop management strategies to alleviate microcystin impacts and improve lake quality. This work provides a probabilistic framework for predicting microcystin presence in lakes. It would allow insights into how changes in nutrient concentrations could potentially change toxin levels.
A Bayesian analysis of HAT-P-7b using the EXONEST algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Placek, Ben; Knuth, Kevin H.
2015-01-13
The study of exoplanets (planets orbiting other stars) is revolutionizing the way we view our universe. High-precision photometric data provided by the Kepler Space Telescope (Kepler) enables not only the detection of such planets, but also their characterization. This presents a unique opportunity to apply Bayesian methods to better characterize the multitude of previously confirmed exoplanets. This paper focuses on applying the EXONEST algorithm to characterize the transiting short-period hot Jupiter HAT-P-7b (also referred to as Kepler-2b). EXONEST evaluates a suite of exoplanet photometric models by applying Bayesian Model Selection, which is implemented with the MultiNest algorithm. These models take into account planetary effects, such as reflected light and thermal emissions, as well as the effect of the planetary motion on the host star, such as Doppler beaming, or boosting, of light from the reflex motion of the host star, and photometric variations due to the planet-induced ellipsoidal shape of the host star. By calculating model evidences, one can determine which model best describes the observed data, thus identifying which effects dominate the planetary system. Presented are parameter estimates and model evidences for HAT-P-7b.
Andrinopoulou, Eleni-Rosalina; Rizopoulos, Dimitris
2016-11-20
The joint modeling of longitudinal and survival data has recently received much attention. Several extensions of the standard joint model that consists of one longitudinal and one survival outcome have been proposed, including the use of different association structures between the longitudinal and the survival outcomes. However, in general, relatively little attention has been given to the selection of the most appropriate functional form to link the two outcomes. In common practice, it is assumed that the underlying value of the longitudinal outcome is associated with the survival outcome. However, it could be that different characteristics of the patients' longitudinal profiles influence the hazard: for example, not only the current value but also the slope or the area under the curve of the longitudinal outcome. The choice of which functional form to use is an important decision that needs to be investigated because it could influence the results. In this paper, we use a Bayesian shrinkage approach in order to determine the most appropriate functional forms. We propose a joint model that includes different association structures of different biomarkers and assume informative priors for the regression coefficients that correspond to the terms of the longitudinal process. Specifically, we assume Bayesian lasso, Bayesian ridge, Bayesian elastic net, and horseshoe. These methods are applied to a dataset consisting of patients with a chronic liver disease, where it is important to investigate which characteristics of the biomarkers have an influence on survival. Copyright © 2016 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Sipkens, Timothy A.; Hadwin, Paul J.; Grauer, Samuel J.; Daun, Kyle J.
2018-03-01
Competing theories have been proposed to account for how the latent heat of vaporization of liquid iron varies with temperature, but experimental confirmation remains elusive, particularly at high temperatures. We propose time-resolved laser-induced incandescence measurements on iron nanoparticles combined with Bayesian model plausibility, as a novel method for evaluating these relationships. Our approach scores the explanatory power of candidate models, accounting for parameter uncertainty, model complexity, measurement noise, and goodness-of-fit. The approach is first validated with simulated data and then applied to experimental data for iron nanoparticles in argon. Our results justify the use of Román's equation to account for the temperature dependence of the latent heat of vaporization of liquid iron.
Advancing understanding of affect labeling with dynamic causal modeling
Torrisi, Salvatore J.; Lieberman, Matthew D.; Bookheimer, Susan Y.; Altshuler, Lori L.
2013-01-01
Mechanistic understandings of forms of incidental emotion regulation have implications for basic and translational research in the affective sciences. In this study we applied Dynamic Causal Modeling (DCM) for fMRI to a common paradigm of labeling facial affect to elucidate prefrontal to subcortical influences. Four brain regions were used to model affect labeling, including right ventrolateral prefrontal cortex (vlPFC), amygdala and Broca’s area. 64 models were compared, for each of 45 healthy subjects. Family level inference split the model space to a likely driving input and Bayesian Model Selection within the winning family of 32 models revealed a strong pattern of endogenous network connectivity. Modulatory effects of labeling were most prominently observed following Bayesian Model Averaging, with the dampening influence on amygdala originating from Broca’s area but much more strongly from right vlPFC. These results solidify and extend previous correlation and regression-based estimations of negative corticolimbic coupling. PMID:23774393
NASA Astrophysics Data System (ADS)
Li, L.; Xu, C.-Y.; Engeland, K.
2012-04-01
With respect to model calibration, parameter estimation and analysis of uncertainty sources, different approaches have been used in hydrological models. The Bayesian method is one of the most widely used methods for uncertainty assessment of hydrological models; it incorporates different sources of information into a single analysis through Bayes' theorem. However, none of these applications treats the uncertainty in the extreme flows of hydrological model simulations well. This study proposes a Bayesian modularization approach for uncertainty assessment of conceptual hydrological models that considers the extreme flows. It includes a comprehensive comparison and evaluation of uncertainty assessments by the new Bayesian modularization approach and traditional Bayesian models using the Metropolis-Hastings (MH) algorithm with the daily hydrological model WASMOD. Three likelihood functions are used in combination with the traditional Bayesian approach: the AR(1) plus Normal, time-period-independent model (Model 1); the AR(1) plus Normal, time-period-dependent model (Model 2); and the AR(1) plus multi-normal model (Model 3). The results reveal that (1) the simulations derived from the Bayesian modularization method are more accurate, with the highest Nash-Sutcliffe efficiency value, and (2) the Bayesian modularization method performs best in uncertainty estimates of entire flows and in terms of application and computational efficiency. The study thus introduces a new approach for reducing the effect of extreme flows on the discharge uncertainty assessment of hydrological models via Bayesian inference. Keywords: extreme flow, uncertainty assessment, Bayesian modularization, hydrological model, WASMOD
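The Metropolis-Hastings algorithm used in the study above can be sketched in a few lines. This is a generic one-dimensional illustration (not the WASMOD setup; the target log-posterior, step size, and chain length are placeholders chosen for the demo):

```python
import math
import random

def metropolis_hastings(log_post, x0, n_samples, step=1.0, seed=1):
    """Random-walk Metropolis-Hastings sampler for a 1-D posterior."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)   # symmetric random-walk proposal
        lp_prop = log_post(prop)
        # accept with probability min(1, exp(lp_prop - lp))
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Standard-normal target: after burn-in the draws should centre near 0
draws = metropolis_hastings(lambda x: -0.5 * x * x, x0=3.0, n_samples=20000)
```

In a real calibration, `log_post` would be the sum of the log-likelihood (e.g. one of the three AR(1) likelihoods above) and the log-prior over the model parameters, and the chain would run over a parameter vector rather than a scalar.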
Model selection criterion in survival analysis
NASA Astrophysics Data System (ADS)
Karabey, Uǧur; Tutkun, Nihal Ata
2017-07-01
Survival analysis deals with the time until occurrence of an event of interest, such as death, recurrence of an illness, the failure of equipment, or divorce. There are various survival models with semi-parametric or parametric approaches used in the medical, natural or social sciences. The decision on the most appropriate model for the data is an important point of the analysis. In the literature, the Akaike information criterion or the Bayesian information criterion is used to select among nested models. In this study, the behavior of these information criteria is discussed for a real data set.
NASA Astrophysics Data System (ADS)
Elshall, A. S.; Ye, M.; Niu, G. Y.; Barron-Gafford, G.
2016-12-01
Bayesian multimodel inference is increasingly being used in hydrology. Estimating Bayesian model evidence (BME) is of central importance in many Bayesian multimodel analyses, such as Bayesian model averaging and model selection. BME is the overall probability of the model in reproducing the data, accounting for the trade-off between goodness-of-fit and model complexity. Yet estimating BME is challenging, especially for high-dimensional problems with a complex sampling space. Estimating BME using Monte Carlo numerical methods is preferred, as these methods yield higher accuracy than semi-analytical solutions (e.g. Laplace approximations, BIC, KIC, etc.). However, numerical methods are prone to numerical demons arising from arithmetic underflow and round-off errors. Although a few studies have alluded to this issue, to our knowledge this is the first study that illustrates these numerical demons. We show that finite-precision arithmetic can impose a threshold on likelihood values and the Metropolis acceptance ratio, which results in trimming parameter regions (when the likelihood function is less than the smallest floating-point number that a computer can represent) and corrupting the empirical measures of the random states of the MCMC sampler (when using the log-likelihood function). We consider two of the most powerful numerical estimators of BME: the path sampling method of thermodynamic integration (TI) and the importance sampling method of steppingstone sampling (SS). We also consider the two most widely used numerical estimators, the prior sampling arithmetic mean (AM) and the posterior sampling harmonic mean (HM). We investigate the vulnerability of these four estimators to the numerical demons. Interestingly, the most biased estimator, namely the HM, turned out to be the least vulnerable.
While it is generally assumed that AM is a bias-free estimator that will approximate the true BME given sufficient computational effort, we show that arithmetic underflow can hamper AM, resulting in severe underestimation of BME. TI turned out to be the most vulnerable, resulting in BME overestimation. Finally, we show how SS can be largely invariant to rounding errors, yielding the most accurate and computationally efficient results. These results are useful for Monte Carlo simulations that estimate Bayesian model evidence.
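The underflow problem described above is concrete: for large datasets the likelihood of a single MCMC draw can fall below the smallest representable double (about 1e-308), so a naive arithmetic mean of likelihoods evaluates to zero. A standard remedy, shown here as a hedged sketch with made-up log-likelihood values, is to form the arithmetic-mean estimator entirely in log space via the log-sum-exp trick:

```python
import math

def log_bme_arithmetic_mean(log_likelihoods):
    """Arithmetic-mean BME estimate computed in log space.

    Naively averaging exp(log_L_i) underflows to zero when the
    log-likelihoods are very negative; shifting by the maximum first
    (the log-sum-exp trick) keeps every term representable.
    """
    m = max(log_likelihoods)
    # log((1/N) * sum_i exp(ll_i)) = m + log((1/N) * sum_i exp(ll_i - m))
    return m + math.log(
        sum(math.exp(ll - m) for ll in log_likelihoods) / len(log_likelihoods)
    )

# Illustrative values: each exp() underflows to 0.0 in double precision
lls = [-80000.0, -80001.0, -80002.5]
naive = sum(math.exp(ll) for ll in lls) / len(lls)  # 0.0, so its log is -inf
stable = log_bme_arithmetic_mean(lls)               # finite log-evidence
```

The same shift-by-maximum idea applies to the harmonic-mean and steppingstone estimators; it removes the underflow threshold but not, of course, the statistical bias of a given estimator.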
BMDS: A Collection of R Functions for Bayesian Multidimensional Scaling
ERIC Educational Resources Information Center
Okada, Kensuke; Shigemasu, Kazuo
2009-01-01
Bayesian multidimensional scaling (MDS) has attracted a great deal of attention because: (1) it provides a better fit than do classical MDS and ALSCAL; (2) it provides estimation errors of the distances; and (3) the Bayesian dimension selection criterion, MDSIC, provides a direct indication of optimal dimensionality. However, Bayesian MDS is not…
Item selection via Bayesian IRT models.
Arima, Serena
2015-02-10
With reference to a questionnaire that aimed to assess the quality of life for dysarthric speakers, we investigate the usefulness of a model-based procedure for reducing the number of items. We propose a mixed cumulative logit model, which is known in the psychometrics literature as the graded response model: responses to different items are modelled as a function of individual latent traits and as a function of item characteristics, such as their difficulty and their discrimination power. We jointly model the discrimination and the difficulty parameters by using a k-component mixture of normal distributions. Mixture components correspond to disjoint groups of items. Items that belong to the same groups can be considered equivalent in terms of both difficulty and discrimination power. According to decision criteria, we select a subset of items such that the reduced questionnaire is able to provide the same information that the complete questionnaire provides. The model is estimated by using a Bayesian approach, and the choice of the number of mixture components is justified according to information criteria. We illustrate the proposed approach on the basis of data that are collected for 104 dysarthric patients by local health authorities in Lecce and in Milan. Copyright © 2014 John Wiley & Sons, Ltd.
Comparison of two integration methods for dynamic causal modeling of electrophysiological data.
Lemaréchal, Jean-Didier; George, Nathalie; David, Olivier
2018-06-01
Dynamic causal modeling (DCM) is a methodological approach to study effective connectivity among brain regions. Based on a set of observations and a biophysical model of brain interactions, DCM uses a Bayesian framework to estimate the posterior distribution of the free parameters of the model (e.g. modulation of connectivity) and infer architectural properties of the most plausible model (i.e. model selection). When modeling electrophysiological event-related responses, the estimation of the model relies on the integration of the system of delay differential equations (DDEs) that describe the dynamics of the system. In this technical note, we compared two numerical schemes for the integration of DDEs. The first, and standard, scheme approximates the DDEs (more precisely, the state of the system, with respect to conduction delays among brain regions) using ordinary differential equations (ODEs) and solves it with a fixed step size. The second scheme uses a dedicated DDEs solver with adaptive step sizes to control error, making it theoretically more accurate. To highlight the effects of the approximation used by the first integration scheme in regard to parameter estimation and Bayesian model selection, we performed simulations of local field potentials using first, a simple model comprising 2 regions and second, a more complex model comprising 6 regions. In these simulations, the second integration scheme served as the standard to which the first one was compared. Then, the performances of the two integration schemes were directly compared by fitting a public mismatch negativity EEG dataset with different models. The simulations revealed that the use of the standard DCM integration scheme was acceptable for Bayesian model selection but underestimated the connectivity parameters and did not allow an accurate estimation of conduction delays. 
Fitting to empirical data showed that the models systematically obtained an increased accuracy when using the second integration scheme. We conclude that inference on connectivity strength and delay based on DCM for EEG/MEG requires an accurate integration scheme. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Bayesian model for fate and transport of polychlorinated biphenyl in upper Hudson River
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steinberg, L.J.; Reckhow, K.H.; Wolpert, R.L.
1996-05-01
Modelers of contaminant fate and transport in surface waters typically rely on literature values when selecting parameter values for mechanistic models. While the expert judgment with which these selections are made is valuable, the information contained in contaminant concentration measurements should not be ignored. In this full-scale Bayesian analysis of polychlorinated biphenyl (PCB) contamination in the upper Hudson River, these two sources of information are combined using Bayes' theorem. A simulation model for the fate and transport of the PCBs in the upper Hudson River forms the basis of the likelihood function, while the prior density is developed from literature values. The method provides estimates for the anaerobic biodegradation half-life, aerobic biodegradation plus volatilization half-life, contaminated sediment depth, and resuspension velocity of 4,400 d, 3.2 d, 0.32 m, and 0.02 m/yr, respectively. These are significantly different from the values obtained with more traditional methods, and are shown to produce better predictions than those methods when used in a cross-validation study.
Inferring the Growth of Massive Galaxies Using Bayesian Spectral Synthesis Modeling
NASA Astrophysics Data System (ADS)
Stillman, Coley Michael; Poremba, Megan R.; Moustakas, John
2018-01-01
The most massive galaxies in the universe are typically found at the centers of massive galaxy clusters. Studying these galaxies can provide valuable insight into the hierarchical growth of massive dark matter halos. One of the key challenges of measuring the stellar mass growth of massive galaxies is converting the measured light profiles into stellar mass. We use Prospector, a state-of-the-art Bayesian spectral synthesis modeling code, to infer the total stellar masses of a pilot sample of massive central galaxies selected from the Sloan Digital Sky Survey. We compare our stellar mass estimates to previous measurements, and present some of the quantitative diagnostics provided by Prospector.
Multimodel Ensemble Methods for Prediction of Wake-Vortex Transport and Decay
NASA Technical Reports Server (NTRS)
Korner, Stephan; Ahmad, Nashat N.; Holzapfel, Frank; VanValkenburg, Randal L.
2017-01-01
Several multimodel ensemble methods are selected and further developed to improve the deterministic and probabilistic prediction skills of individual wake-vortex transport and decay models. The different multimodel ensemble methods are introduced, and their suitability for wake applications is demonstrated. The selected methods include direct ensemble averaging, Bayesian model averaging, and Monte Carlo simulation. The different methodologies are evaluated employing data from wake-vortex field measurement campaigns conducted in the United States and Germany.
Bayesian model evidence as a model evaluation metric
NASA Astrophysics Data System (ADS)
Guthke, Anneli; Höge, Marvin; Nowak, Wolfgang
2017-04-01
When building environmental systems models, we are typically confronted with the questions of how to choose an appropriate model (i.e., which processes to include or neglect) and how to measure its quality. Various metrics have been proposed that are intended to guide the modeller towards the most robust and realistic representation of the system under study. Criteria for evaluation often address aspects of accuracy (absence of bias) or of precision (absence of unnecessary variance) and need to be combined in a meaningful way in order to address the inherent bias-variance dilemma. We suggest using Bayesian model evidence (BME) as a model evaluation metric that implicitly performs a tradeoff between bias and variance. BME is typically associated with model weights in the context of Bayesian model averaging (BMA). However, it can also be seen as a model evaluation metric in a single-model context or in model comparison. It combines a measure of goodness of fit with a penalty for unjustifiable complexity, where "unjustifiable" refers to the fact that the appropriate level of model complexity is limited by the amount of information available for calibration. Derived in a Bayesian context, BME naturally accounts for measurement errors in the calibration data as well as for input and parameter uncertainty. BME is therefore well suited to assessing model quality under uncertainty. We will explain in detail and with schematic illustrations what BME measures, i.e. how complexity is defined in the Bayesian setting and how this complexity is balanced with goodness of fit. We will further discuss how BME compares to other model evaluation metrics that address accuracy and precision, such as the predictive log-score, and to other model selection criteria, such as the AIC, BIC or KIC. Although computationally more expensive than other metrics or criteria, BME represents an appealing alternative because it provides a global measure of model quality.
Even if not applicable to each and every case, we aim to stimulate discussion about how to judge the quality of hydrological models in the presence of uncertainty in general by dissecting the mechanism behind BME.
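The bias-variance tradeoff that BME performs can be made explicit in a conjugate toy model, where the marginal likelihood has a closed form. The following sketch is our own illustration (not from the abstract): it scores a normal-mean model under a N(0, prior_var) prior, and widening the prior adds flexibility that the evidence penalizes unless the data demand it.

```python
import math

def log_evidence_normal(data, prior_var, noise_var=1.0):
    """Exact log marginal likelihood for y_i ~ N(mu, noise_var),
    mu ~ N(0, prior_var); the conjugate integral over mu is analytic."""
    n = len(data)
    s = sum(data)                  # sufficient statistic: sum
    ss = sum(x * x for x in data)  # sufficient statistic: sum of squares
    post_prec = n / noise_var + 1.0 / prior_var
    return (-0.5 * n * math.log(2 * math.pi * noise_var)
            - 0.5 * ss / noise_var
            - 0.5 * math.log(prior_var * post_prec)
            + 0.5 * (s / noise_var) ** 2 / post_prec)

# Data near zero: the tighter prior (simpler model) earns higher evidence;
# data far from zero: only the wider prior can explain them
near_zero = [0.1, -0.2, 0.05, 0.0, 0.15]
far = [5.1, 4.9, 5.0, 5.2, 4.8]
print(log_evidence_normal(near_zero, prior_var=1.0),
      log_evidence_normal(near_zero, prior_var=100.0))
```

This is the mechanism in miniature: the `-0.5 * log(prior_var * post_prec)` term is the complexity penalty (an "Occam factor"), and it grows with the prior volume regardless of fit, so the evidence rewards the flexible model only when the data actually use the flexibility.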
Prediction-error variance in Bayesian model updating: a comparative study
NASA Astrophysics Data System (ADS)
Asadollahi, Parisa; Li, Jian; Huang, Yong
2017-04-01
In Bayesian model updating, the likelihood function is commonly formulated by stochastic embedding, in which the maximum information entropy probability model of the prediction error variances plays an important role; it is a Gaussian distribution subject to the first two moments as constraints. The selection of prediction error variances can be formulated as a model class selection problem, which automatically involves a trade-off between the average data-fit of the model class and the information it extracts from the data. Therefore, it is critical for robustness in the updating of the structural model, especially in the presence of modeling errors. To date, three ways of treating the prediction error variances have been seen in the literature: 1) setting constant values empirically, 2) estimating them based on the goodness-of-fit of the measured data, and 3) updating them as uncertain parameters by applying Bayes' Theorem at the model class level. In this paper, the effect of different strategies for dealing with the prediction error variances on model updating performance is investigated explicitly. A six-story shear building model with six uncertain stiffness parameters is employed as an illustrative example. Transitional Markov Chain Monte Carlo is used to draw samples of the posterior probability density function of the structural model parameters as well as the uncertain prediction variances. Different levels of modeling uncertainty and complexity are represented by three FE models: a true model, a model with more complexity, and a model with modeling error. Bayesian updating is performed for the three FE models considering the three aforementioned treatments of the prediction error variances. The effect of the number of measurements on model updating performance is also examined in the study.
The results are compared based on model class assessment and indicate that updating the prediction error variances as uncertain parameters at the model class level produces more robust results, especially when the number of measurements is small.
A Bayesian Nonparametric Approach to Test Equating
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2009-01-01
A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are…
Model Diagnostics for Bayesian Networks
ERIC Educational Resources Information Center
Sinharay, Sandip
2006-01-01
Bayesian networks are frequently used in educational assessments primarily for learning about students' knowledge and skills. There is a lack of works on assessing fit of Bayesian networks. This article employs the posterior predictive model checking method, a popular Bayesian model checking tool, to assess fit of simple Bayesian networks. A…
A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection
Sabourin, Jeremy A; Valdar, William; Nobel, Andrew B
2015-01-01
Summary: We describe a simple, computationally efficient, permutation-based procedure for selecting the penalty parameter in LASSO penalized regression. The procedure, permutation selection, is intended for applications where variable selection is the primary focus, and can be applied in a variety of structural settings, including that of generalized linear models. We briefly discuss connections between permutation selection and existing theory for the LASSO. In addition, we present a simulation study and an analysis of real biomedical data sets in which permutation selection is compared with selection based on the following: cross-validation (CV), the Bayesian information criterion (BIC), Scaled Sparse Linear Regression, and a selection method based on recently developed testing procedures for the LASSO. PMID:26243050
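The permutation idea above can be sketched compactly: under each permutation of the response, the smallest penalty at which the LASSO would select nothing is max_j |x_j' y_perm| / n, and the chosen penalty is an upper quantile of these null thresholds. This is our hedged reconstruction of the idea, not the authors' code; the function and parameter names are ours.

```python
import numpy as np

def permutation_lambda(X, y, n_perm=100, quantile=0.95, seed=0):
    """Permutation-based choice of the LASSO penalty (illustrative sketch).

    For centred, standardised X, the LASSO solution is identically zero
    whenever the penalty exceeds max_j |x_j' y| / n; permuting y breaks
    any real association, so these null thresholds calibrate the penalty.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    thresholds = [
        np.max(np.abs(X.T @ rng.permutation(y))) / n
        for _ in range(n_perm)
    ]
    return float(np.quantile(thresholds, quantile))
```

With a real signal present, the observed threshold max_j |x_j' y| / n exceeds the permutation quantile, so genuinely associated variables survive the chosen penalty while null variables are screened out.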
Chan, Jennifer S K
2016-05-01
Dropouts are common in longitudinal studies. If the dropout probability depends on the missing observations at or after dropout, this type of dropout is called informative (or nonignorable) dropout (ID). Failure to accommodate such a dropout mechanism in the model will bias the parameter estimates. We propose a conditional autoregressive model for longitudinal binary data with an ID model such that the probabilities of positive outcomes, as well as the dropout indicator at each occasion, are logit-linear in some covariates and outcomes. This model, adopting a marginal model for outcomes and a conditional model for dropouts, is called a selection model. To allow for heterogeneity and clustering effects, the outcome model is extended to incorporate mixture and random effects. Lastly, the model is further extended to a novel model in which the outcome and dropout are modeled jointly, with their dependency formulated through an odds-ratio function. Parameters are estimated by a Bayesian approach implemented using the user-friendly Bayesian software WinBUGS. A methadone clinic dataset is analyzed to illustrate the proposed models. Results show that the treatment time effect is still significant but weaker after allowing for an ID process in the data. Finally, the effect of dropout on parameter estimates is evaluated through simulation studies. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Bayesian functional integral method for inferring continuous data from discrete measurements.
Heuett, William J; Miller, Bernard V; Racette, Susan B; Holloszy, John O; Chow, Carson C; Periwal, Vipul
2012-02-08
Inference of the insulin secretion rate (ISR) from C-peptide measurements as a quantification of pancreatic β-cell function is clinically important in diseases related to reduced insulin sensitivity and insulin action. ISR derived from C-peptide concentration is an example of nonparametric Bayesian model selection where a proposed ISR time-course is considered to be a "model". Inferring inaccessible continuous variables from discrete observable data is often problematic in biology and medicine, because it is a priori unclear how robust the inference is to the deletion of data points and, a closely related question, how much smoothness or continuity the data actually support. Predictions weighted by the posterior distribution can be cast as functional integrals as used in statistical field theory. Functional integrals are generally difficult to evaluate, especially for nonanalytic constraints such as positivity of the estimated parameters. We propose a computationally tractable method that uses the exact solution of an associated likelihood function as a prior probability distribution for a Markov-chain Monte Carlo evaluation of the posterior for the full model. As a concrete application of our method, we calculate the ISR from actual clinical C-peptide measurements in human subjects with varying degrees of insulin sensitivity. Our method demonstrates the feasibility of functional integral Bayesian model selection as a practical method for such data-driven inference, allowing the data to determine the smoothing timescale and the width of the prior probability distribution on the space of models. In particular, our model comparison method determines the discrete time-step for interpolation of the unobservable continuous variable that is supported by the data. Attempts to go to finer discrete time-steps lead to less likely models. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.
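The positivity constraint mentioned in the abstract can be handled in a random-walk Metropolis sampler simply by rejecting proposals below zero, which correctly samples the posterior truncated to positive values. This is a minimal generic sketch, not the authors' functional-integral machinery; `logpost` is a user-supplied log-posterior, and all settings are illustrative.

```python
import math
import random

def metropolis_positive(logpost, x0, n_iter=5000, step=0.2, seed=0):
    # Random-walk Metropolis for a parameter constrained to be positive
    # (e.g. a secretion rate): proposals <= 0 are always rejected, which
    # is equivalent to assigning them log-posterior -infinity.
    rng = random.Random(seed)
    x, lp = x0, logpost(x0)
    draws = []
    for _ in range(n_iter):
        prop = x + rng.gauss(0, step)
        if prop > 0:
            lp_prop = logpost(prop)
            # standard Metropolis accept/reject on the log scale
            if math.log(rng.random()) < lp_prop - lp:
                x, lp = prop, lp_prop
        draws.append(x)
    return draws
```

Rejecting an out-of-bounds proposal (i.e., staying put) is the standard way to sample a truncated target with a symmetric proposal.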
Hierarchical Bayesian spatial models for alcohol availability, drug "hot spots" and violent crime.
Zhu, Li; Gorman, Dennis M; Horel, Scott
2006-12-07
Ecologic studies have shown a relationship between alcohol outlet densities, illicit drug use and violence. The present study examined this relationship in the City of Houston, Texas, using a sample of 439 census tracts. Neighborhood sociostructural covariates, alcohol outlet density, drug crime density and violent crime data were collected for the year 2000, and analyzed using hierarchical Bayesian models. Model selection was accomplished by applying the Deviance Information Criterion. The counts of violent crime in each census tract were modeled as having a conditional Poisson distribution. Four neighborhood explanatory variables were identified using principal component analysis. The best-fitting model was the one including both unstructured and spatial-dependence random effects. The results showed that drug-law violation explained a greater amount of variance in violent crime rates than alcohol outlet densities. The relative risk for drug-law violation was 2.49 and that for alcohol outlet density was 1.16. Of the neighborhood sociostructural covariates, males of age 15 to 24 showed an effect on violence, with a 16% decrease in relative risk for each standard-deviation increase. Both the unstructured heterogeneity random effect and spatial dependence need to be included in the model. The analysis presented suggests that activity around illicit drug markets is more strongly associated with violent crime than is alcohol outlet density. Unique among the ecological studies in this field, the present study not only shows the direction and magnitude of impact of neighborhood sociostructural covariates as well as alcohol and illicit drug activities in a neighborhood, it also reveals the importance of applying hierarchical Bayesian models in this research field, as both spatial dependence and heterogeneity random effects need to be considered simultaneously.
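The Deviance Information Criterion used for model selection here is straightforward to compute from posterior draws: DIC = D-bar + pD, where D-bar is the mean posterior deviance and pD = D-bar - D(theta-bar) is the effective number of parameters. A toy sketch for a single Poisson rate (illustrative only, not the spatial models of the study):

```python
import math

def poisson_loglik(lam, counts):
    # Full Poisson log-likelihood of the observed counts at rate lam.
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)

def dic(posterior_lams, counts):
    # DIC from posterior draws of the rate:
    # DIC = mean(deviance) + pD, with pD = mean(dev) - dev(posterior mean).
    devs = [-2.0 * poisson_loglik(l, counts) for l in posterior_lams]
    dbar = sum(devs) / len(devs)
    lam_bar = sum(posterior_lams) / len(posterior_lams)
    d_at_mean = -2.0 * poisson_loglik(lam_bar, counts)
    p_d = dbar - d_at_mean  # effective number of parameters
    return dbar + p_d, p_d
```

Smaller DIC indicates better expected out-of-sample fit; pD penalizes model complexity.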
Bayesian Model Averaging for Propensity Score Analysis
ERIC Educational Resources Information Center
Kaplan, David; Chen, Jianshen
2013-01-01
The purpose of this study is to explore Bayesian model averaging in the propensity score context. Previous research on Bayesian propensity score analysis does not take into account model uncertainty. In this regard, an internally consistent Bayesian framework for model building and estimation must also account for model uncertainty. The…
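Bayesian model averaging in its simplest form weights each candidate model's estimate by an approximate posterior model probability; a common shortcut uses BIC weights, w_k proportional to exp(-delta_BIC_k / 2). This is a generic sketch of that idea, not the propensity-score machinery of the article; all names are ours.

```python
import math

def bic_weights(bics):
    # Approximate posterior model probabilities from BIC scores:
    # w_k is proportional to exp(-0.5 * (BIC_k - BIC_min)).
    b_min = min(bics)
    raw = [math.exp(-0.5 * (b - b_min)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]

def model_average(estimates, bics):
    # Model-averaged estimate: weight each model's estimate by its
    # approximate posterior probability, accounting for model uncertainty.
    w = bic_weights(bics)
    return sum(wk * ek for wk, ek in zip(w, estimates))
```

Averaging over models in this way propagates model uncertainty into the final estimate instead of conditioning on a single selected model.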
Bayesian-information-gap decision theory with an application to CO2 sequestration
O'Malley, D.; Vesselinov, V. V.
2015-09-04
Decisions related to subsurface engineering problems such as groundwater management, fossil fuel production, and geologic carbon sequestration are frequently challenging because of an overabundance of uncertainties (related to conceptualizations, parameters, observations, etc.). Because of the importance of these problems to agriculture, energy, and the climate (respectively), good decisions that are scientifically defensible must be made despite the uncertainties. We describe a general approach to making decisions for challenging problems such as these in the presence of severe uncertainties that combines probabilistic and non-probabilistic methods. The approach uses Bayesian sampling to assess parametric uncertainty and Information-Gap Decision Theory (IGDT) to address model inadequacy. The combined approach also resolves an issue that frequently arises when applying Bayesian methods to real-world engineering problems related to the enumeration of possible outcomes. In the case of zero non-probabilistic uncertainty, the method reduces to a Bayesian method. Lastly, to illustrate the approach, we apply it to a site-selection decision for geologic CO2 sequestration.
Genome-wide regression and prediction with the BGLR statistical package.
Pérez, Paulino; de los Campos, Gustavo
2014-10-01
Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis. Copyright © 2014 by the Genetics Society of America.
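BGLR's core computational pattern, a Gibbs sampler with scalar (coordinate-wise) updates, can be illustrated in the simplest case: Bayesian ridge regression with known noise and prior variances, where each coefficient has a Normal full conditional given the others. The sketch below is illustrative Python, not BGLR itself (which is an R package with compiled C and Fortran internals); all names and defaults are ours.

```python
import random

def gibbs_ridge(X, y, sigma2=1.0, tau2=1.0, n_iter=500, seed=0):
    # Gibbs sampler with scalar updates for Bayesian ridge regression:
    # beta_j | rest ~ Normal(mean, 1/prec), sampled one coordinate at a
    # time, with the residual vector maintained incrementally.
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual = y - X beta, and beta starts at zero
    draws = []
    xtx = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for it in range(n_iter):
        for j in range(p):
            # x_j'(y - X_{-j} beta_{-j}): add beta_j's own contribution back
            xj_r = sum(X[i][j] * resid[i] for i in range(n)) + xtx[j] * beta[j]
            prec = xtx[j] / sigma2 + 1.0 / tau2
            mean = (xj_r / sigma2) / prec
            new_bj = rng.gauss(mean, prec ** -0.5)
            delta = new_bj - beta[j]
            for i in range(n):
                resid[i] -= delta * X[i][j]
            beta[j] = new_bj
        if it >= n_iter // 2:  # keep post burn-in draws
            draws.append(list(beta))
    return draws
```

The incremental residual update is what makes scalar Gibbs updates cheap even when p is large, which is the regime BGLR targets.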
ESS++: a C++ object-oriented algorithm for Bayesian stochastic search model exploration
Bottolo, Leonardo; Langley, Sarah R.; Petretto, Enrico; Tiret, Laurence; Tregouet, David; Richardson, Sylvia
2011-01-01
Summary: ESS++ is a C++ implementation of a fully Bayesian variable selection approach for single and multiple response linear regression. ESS++ works well both when the number of observations is larger than the number of predictors and in the ‘large p, small n’ case. In the current version, ESS++ can handle several hundred observations, thousands of predictors and a few responses simultaneously. The core engine of ESS++ for the selection of relevant predictors is based on Evolutionary Monte Carlo. Our implementation is open source, allowing community-based alterations and improvements. Availability: C++ source code and documentation including compilation instructions are available under GNU licence at http://bgx.org.uk/software/ESS.html. Contact: l.bottolo@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21233165
Model-Selection Theory: The Need for a More Nuanced Picture of Use-Novelty and Double-Counting.
Steele, Katie; Werndl, Charlotte
2018-06-01
This article argues that common intuitions regarding (a) the specialness of 'use-novel' data for confirmation and (b) that this specialness implies the 'no-double-counting rule', which says that data used in 'constructing' (calibrating) a model cannot also play a role in confirming the model's predictions, are too crude. The intuitions in question are pertinent in all the sciences, but we appeal to a climate science case study to illustrate what is at stake. Our strategy is to analyse the intuitive claims in light of prominent accounts of confirmation of model predictions. We show that on the Bayesian account of confirmation, and also on the standard classical hypothesis-testing account, claims (a) and (b) are not generally true; but for some select cases, it is possible to distinguish data used for calibration from use-novel data, where only the latter confirm. The more specialized classical model-selection methods, on the other hand, uphold a nuanced version of claim (a), but this comes apart from (b), which must be rejected in favour of a more refined account of the relationship between calibration and confirmation. Thus, depending on the framework of confirmation, either the scope or the simplicity of the intuitive position must be revised.
Bayesian Inference for Functional Dynamics Exploring in fMRI Data.
Guo, Xuan; Liu, Bing; Chen, Le; Chen, Guantao; Pan, Yi; Zhang, Jing
2016-01-01
This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM), Bayesian Connectivity Change Point Model (BCCPM), and Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.
Bayesian inference for OPC modeling
NASA Astrophysics Data System (ADS)
Burbine, Andrew; Sturtevant, John; Fryer, David; Smith, Bruce W.
2016-03-01
The use of optical proximity correction (OPC) demands increasingly accurate models of the photolithographic process. Model building and inference techniques in the data science community have seen great strides in the past two decades which make better use of available information. This paper aims to demonstrate the predictive power of Bayesian inference as a method for parameter selection in lithographic models by quantifying the uncertainty associated with model inputs and wafer data. Specifically, the method combines the model builder's prior information about each modelling assumption with the maximization of each observation's likelihood as a Student's t-distributed random variable. Through the use of a Markov chain Monte Carlo (MCMC) algorithm, a model's parameter space is explored to find the most credible parameter values. During parameter exploration, the parameters' posterior distributions are generated by applying Bayes' rule, using a likelihood function and the a priori knowledge supplied. The MCMC algorithm used, an affine invariant ensemble sampler (AIES), is implemented by initializing many walkers which semi-independently explore the space. The convergence of these walkers to global maxima of the likelihood volume determines the parameter values' highest density intervals (HDI) to reveal champion models. We show that this method of parameter selection provides insights into the data that traditional methods do not and outline continued experiments to vet the method.
Swartz, Michael D; Cai, Yi; Chan, Wenyaw; Symanski, Elaine; Mitchell, Laura E; Danysh, Heather E; Langlois, Peter H; Lupo, Philip J
2015-02-09
While there is evidence that maternal exposure to benzene is associated with spina bifida in offspring, to our knowledge there have been no assessments to evaluate the role of multiple hazardous air pollutants (HAPs) simultaneously on the risk of this relatively common birth defect. In the current study, we evaluated the association between maternal exposure to HAPs identified by the United States Environmental Protection Agency (U.S. EPA) and spina bifida in offspring using hierarchical Bayesian modeling that includes Stochastic Search Variable Selection (SSVS). The Texas Birth Defects Registry provided data on spina bifida cases delivered between 1999 and 2004. The control group was a random sample of unaffected live births, frequency matched to cases on year of birth. Census tract-level estimates of annual HAP levels were obtained from the U.S. EPA's 1999 Assessment System for Population Exposure Nationwide. Using the distribution among controls, exposure was categorized as high exposure (>95th percentile), medium exposure (5th–95th percentile), and low exposure (<5th percentile, reference). We used hierarchical Bayesian logistic regression models with SSVS to evaluate the association between HAPs and spina bifida by computing an odds ratio (OR) for each HAP using the posterior mean, and a 95% credible interval (CI) using the 2.5th and 97.5th quantiles of the posterior samples. Based on previous assessments, any pollutant with a Bayes factor greater than 1 was selected for inclusion in a final model. Twenty-five HAPs were selected in the final analysis to represent "bins" of highly correlated HAPs (ρ > 0.80). We identified two out of 25 HAPs with a Bayes factor greater than 1: quinoline (OR for high exposure = 2.06, 95% CI: 1.11-3.87, Bayes factor = 1.01) and trichloroethylene (OR for medium exposure = 2.00, 95% CI: 1.14-3.61, Bayes factor = 3.79). Overall there is evidence that quinoline and trichloroethylene may be significant contributors to the risk of spina bifida.
Additionally, hierarchical Bayesian models with SSVS offer an alternative approach for evaluating the effects of multiple environmental pollutants on disease risk. This approach extends readily to other environmental exposures, where novel methods are needed in the context of multi-pollutant modeling.
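The SSVS step at the heart of this approach places a spike-and-slab mixture prior on each coefficient and, within the Gibbs sampler, draws an inclusion indicator from its conditional probability given the current coefficient value. Below is a minimal George-and-McCulloch-style sketch of that single step; the prior settings (tau2, c2, prior inclusion probability) are illustrative placeholders, not those of the study.

```python
import math
import random

def normal_pdf(x, var):
    # Density of a mean-zero Normal with the given variance.
    return math.exp(-0.5 * x * x / var) / math.sqrt(2 * math.pi * var)

def update_indicator(beta_j, tau2=0.01, c2=100.0, prior_incl=0.5, rng=None):
    # SSVS indicator update: given the current coefficient draw, sample
    # whether it came from the slab (variance c2 * tau2, "included") or
    # the spike (variance tau2, "effectively excluded").
    rng = rng or random.Random(0)
    slab = prior_incl * normal_pdf(beta_j, c2 * tau2)
    spike = (1 - prior_incl) * normal_pdf(beta_j, tau2)
    p_incl = slab / (slab + spike)
    return (1 if rng.random() < p_incl else 0), p_incl
```

Averaging the sampled indicators over MCMC iterations yields each predictor's posterior inclusion probability, the quantity behind the Bayes factors reported above.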
Funamizu, Akihiro; Ito, Makoto; Doya, Kenji; Kanzaki, Ryohei; Takahashi, Hirokazu
2012-01-01
The estimation of reward outcomes for action candidates is essential for decision making. In this study, we examined whether and how the uncertainty in reward outcome estimation affects the action choice and learning rate. We designed a choice task in which rats selected either the left-poking or right-poking hole and received a reward of a food pellet stochastically. The reward probabilities of the left and right holes were chosen from six settings (high, 100% vs. 66%; mid, 66% vs. 33%; low, 33% vs. 0% for the left vs. right holes, and the opposites) every 20–549 trials. We used Bayesian Q-learning models to estimate the time course of the probability distribution of action values and tested if they better explain the behaviors of rats than standard Q-learning models that estimate only the mean of action values. Model comparison by cross-validation revealed that a Bayesian Q-learning model with an asymmetric update for reward and non-reward outcomes fit the choice time course of the rats best. In the action-choice equation of the Bayesian Q-learning model, the estimated coefficient for the variance of action value was positive, meaning that rats were uncertainty-seeking. Further analysis of the Bayesian Q-learning model suggested that the uncertainty facilitated the effective learning rate. These results suggest that the rats consider uncertainty in action-value estimation and that they have an uncertainty-seeking action policy and uncertainty-dependent modulation of the effective learning rate. PMID:22487046
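The modeling idea, tracking a full belief over each action's value so that its variance both adds an uncertainty bonus at choice time and modulates the effective learning rate, can be sketched with Gaussian beliefs and Kalman-style updates. This is a simplification of the paper's Bayesian Q-learning models; the class and parameter names are ours.

```python
class BayesianQ:
    # Tracks a Normal belief (mean, variance) over each action's value.
    # Action choice adds a variance bonus (uncertainty seeking), and the
    # belief update is a standard Gaussian/Kalman step.
    def __init__(self, n_actions, prior_var=1.0, obs_var=1.0, bonus=1.0):
        self.mean = [0.5] * n_actions
        self.var = [prior_var] * n_actions
        self.obs_var = obs_var
        self.bonus = bonus

    def choose(self):
        # Positive coefficient on the variance = uncertainty-seeking policy.
        scores = [m + self.bonus * v for m, v in zip(self.mean, self.var)]
        return scores.index(max(scores))

    def update(self, a, reward):
        k = self.var[a] / (self.var[a] + self.obs_var)  # Kalman gain
        self.mean[a] += k * (reward - self.mean[a])     # k acts as the
        self.var[a] *= (1 - k)                          # effective learning rate
```

The Kalman gain k shrinks as the variance shrinks, so the effective learning rate is uncertainty-dependent, mirroring the paper's conclusion.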
Bayesian evidence and predictivity of the inflationary paradigm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gubitosi, Giulia; Lagos, Macarena; Magueijo, João
In this paper we consider the issue of paradigm evaluation by applying Bayes' theorem along the following nested hierarchy of progressively more complex structures: i) parameter estimation (within a model), ii) model selection and comparison (within a paradigm), iii) paradigm evaluation. In such a hierarchy the Bayesian evidence works both as the posterior's normalization at a given level and as the likelihood function at the next level up. Whilst raising no objections to the standard application of the procedure at the two lowest levels, we argue that it should receive a considerable modification when evaluating paradigms, when testability and fitting data are equally important. By considering toy models we illustrate how models and paradigms that are difficult to falsify are always favoured by the Bayes factor. We argue that the evidence for a paradigm should not only be high for a given dataset, but exceptional with respect to what it would have been, had the data been different. With this motivation we propose a measure which we term predictivity, as well as a prior to be incorporated into the Bayesian framework, penalising unpredictivity as much as not fitting data. We apply this measure to inflation seen as a whole, and to a scenario where a specific inflationary model is hypothetically deemed as the only one viable as a result of information alien to cosmology (e.g. Solar System gravity experiments, or particle physics input). We conclude that cosmic inflation is currently hard to falsify, but that this could change were external/additional information to cosmology to select one of its many models. We also compare this state of affairs to bimetric varying speed of light cosmology.
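The role of the Bayesian evidence as "the likelihood one level up" rests on the marginal likelihood p(data | model) = integral of p(data | theta) p(theta | model) d theta. For a one-parameter Gaussian model that integral can be brute-forced on a grid, which also makes the automatic Occam behavior visible: a needlessly diffuse prior dilutes the evidence. This is an illustrative toy, not the paper's cosmological setup.

```python
import math

def normal_pdf(x, mu, var):
    # Normal density with mean mu and variance var.
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def evidence(data, prior_mu, prior_var, obs_var=1.0, grid_n=4001):
    # Marginal likelihood p(data | model): integrate likelihood * prior
    # over the parameter on a grid wide enough to cover the prior mass.
    half = 8.0 * max(prior_var ** 0.5, 1.0)
    step = 2 * half / (grid_n - 1)
    total = 0.0
    for i in range(grid_n):
        theta = prior_mu - half + i * step
        like = 1.0
        for x in data:
            like *= normal_pdf(x, theta, obs_var)
        total += like * normal_pdf(theta, prior_mu, prior_var) * step
    return total
```

For data tightly clustered near zero, a model with a sharp prior at zero earns a higher evidence than one with a very vague prior; the ratio of the two evidences is the Bayes factor used at the model-selection level.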
A Bayesian Multilevel Model for Microcystin Prediction in ...
The frequency of cyanobacteria blooms in North American lakes is increasing. A major concern with rising cyanobacteria blooms is microcystin, a common cyanobacterial hepatotoxin. To explore the conditions that promote high microcystin concentrations, we analyzed the US EPA National Lake Assessment (NLA) dataset collected in the summer of 2007. The NLA dataset is reported for nine eco-regions. We used the results of random forest modeling as a means of variable selection from which we developed a Bayesian multilevel model of microcystin concentrations. Model parameters under a multilevel modeling framework are eco-region specific, but they are also assumed to be exchangeable across eco-regions for broad continental scaling. The exchangeability assumption ensures that both the common patterns and eco-region specific features will be reflected in the model. Furthermore, the method incorporates appropriate estimates of uncertainty. Our preliminary results show associations between microcystin and turbidity, total nutrients, and N:P ratios. Upon release of a comparable 2012 NLA dataset, we will apply Bayesian updating. The results will help develop management strategies to alleviate microcystin impacts and improve lake quality. This work provides a probabilistic framework for predicting microcystin presence in lakes. It would allow insights into how changes in nutrient concentrations could potentially change toxin levels.
Bayesian inference of nonlinear unsteady aerodynamics from aeroelastic limit cycle oscillations
NASA Astrophysics Data System (ADS)
Sandhu, Rimple; Poirel, Dominique; Pettit, Chris; Khalil, Mohammad; Sarkar, Abhijit
2016-07-01
A Bayesian model selection and parameter estimation algorithm is applied to investigate the influence of nonlinear and unsteady aerodynamic loads on the limit cycle oscillation (LCO) of a pitching airfoil in the transitional Reynolds number regime. At small angles of attack, laminar boundary layer trailing edge separation causes negative aerodynamic damping leading to the LCO. The fluid-structure interaction of the rigid, but elastically mounted, airfoil and nonlinear unsteady aerodynamics is represented by two coupled nonlinear stochastic ordinary differential equations containing uncertain parameters and model approximation errors. Several plausible aerodynamic models with increasing complexity are proposed to describe the aeroelastic system leading to LCO. The likelihood in the posterior parameter probability density function (pdf) is available semi-analytically using the extended Kalman filter for the state estimation of the coupled nonlinear structural and unsteady aerodynamic model. The posterior parameter pdf is sampled using a parallel and adaptive Markov Chain Monte Carlo (MCMC) algorithm. The posterior probability of each model is estimated using the Chib-Jeliazkov method that directly uses the posterior MCMC samples for evidence (marginal likelihood) computation. The Bayesian algorithm is validated through a numerical study and then applied to model the nonlinear unsteady aerodynamic loads using wind-tunnel test data at various Reynolds numbers.
Digest: Demographic inferences accounting for selection at linked sites†.
Simon, Alexis; Duranton, Maud
2018-05-16
Complex demography and selection at linked sites can generate spurious signatures of divergent selection. Unfortunately, many attempts at demographic inference consider overly simple models and neglect the effect of selection at linked sites. In this issue, Rougemont and Bernatchez (2018) applied an approximate Bayesian computation (ABC) framework that accounts for indirect selection to reveal a complex history of secondary contacts in Atlantic salmon (Salmo salar) that might explain a high rate of latitudinal clines in this species. © 2018 The Author(s). Evolution © 2018 The Society for the Study of Evolution.
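The ABC framework mentioned here replaces likelihood evaluation with simulation: draw parameters from the prior, simulate data, and keep draws whose summary statistics fall close to the observed ones; the kept draws approximate the posterior. A minimal rejection-ABC sketch (generic, not the demographic models of Rougemont and Bernatchez; the summary statistic here is simply the sample mean):

```python
import random

def abc_rejection(observed_mean, n_obs, prior_sampler, simulate,
                  n_draws=2000, tol=0.1, seed=0):
    # ABC rejection sampling: accept a prior draw whenever its simulated
    # summary statistic lands within tol of the observed statistic.
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        sim = simulate(theta, n_obs, rng)
        stat = sum(sim) / n_obs
        if abs(stat - observed_mean) < tol:
            kept.append(theta)
    return kept
```

In realistic demographic applications the summary statistics are multivariate (e.g., site frequency spectra) and the tolerance is chosen as a quantile of the simulated distances, but the accept/reject logic is the same.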
Bayesian inference of selection in a heterogeneous environment from genetic time-series data.
Gompert, Zachariah
2016-01-01
Evolutionary geneticists have sought to characterize the causes and molecular targets of selection in natural populations for many years. Although this research programme has been somewhat successful, most statistical methods employed were designed to detect consistent, weak to moderate selection. In contrast, phenotypic studies in nature show that selection varies in time and that individual bouts of selection can be strong. Measurements of the genomic consequences of such fluctuating selection could help test and refine hypotheses concerning the causes of ecological specialization and the maintenance of genetic variation in populations. Herein, I propose a Bayesian nonhomogeneous hidden Markov model to estimate effective population sizes and quantify variable selection in heterogeneous environments from genetic time-series data. The model is described and then evaluated using a series of simulated data, including cases where selection occurs on a trait with a simple or polygenic molecular basis. The proposed method accurately distinguished neutral loci from non-neutral loci under strong selection, but not from those under weak selection. Selection coefficients were accurately estimated when selection was constant or when the fitness values of genotypes varied linearly with the environment, but these estimates were less accurate when fitness was polygenic or the relationship between the environment and the fitness of genotypes was nonlinear. Past studies of temporal evolutionary dynamics in laboratory populations have been remarkably successful. The proposed method makes similar analyses of genetic time-series data from natural populations more feasible and thereby could help answer fundamental questions about the causes and consequences of evolution in the wild. © 2015 John Wiley & Sons Ltd.
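For intuition about inferring selection from genetic time-series data: under constant haploid selection the allele frequency follows logit(p_t) = logit(p_0) + s*t, so a quick point estimate of the selection coefficient s is the least-squares slope on the logit scale. This deterministic caricature ignores drift and fluctuating selection, which the paper's hidden Markov model handles; it is offered only as a sketch.

```python
import math

def estimate_s(freqs):
    # Constant haploid selection: logit(p_t) = logit(p_0) + s * t,
    # so s is the least-squares slope of logit-frequencies over time.
    ts = list(range(len(freqs)))
    ys = [math.log(p / (1 - p)) for p in freqs]
    t_bar = sum(ts) / len(ts)
    y_bar = sum(ys) / len(ys)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    den = sum((t - t_bar) ** 2 for t in ts)
    return num / den
```

Real time-series data add binomial sampling noise and genetic drift around this trend, which is why a full probabilistic model is needed to separate selection from neutral fluctuation.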
Mossadegh, Somayyeh; He, Shan; Parker, Paul
2016-05-01
Various injury severity scores exist for trauma, but they are known to correlate poorly with military injuries. A promising anatomical scoring system for blast pelvic and perineal injury led to the development of an improved scoring system using machine-learning techniques. An unbiased genetic algorithm selected optimal anatomical and physiological parameters from 118 military cases. A Naïve Bayesian model was built using the proposed parameters to predict the probability of survival. Ten-fold cross validation was employed to evaluate its performance. Our model significantly outperformed Injury Severity Score (ISS), Trauma ISS, New ISS, and the Revised Trauma Score in virtually all areas; positive predictive value 0.8941, specificity 0.9027, accuracy 0.9056, and area under curve 0.9059. A two-sample t test showed that the predictive performance of the proposed scoring system was significantly better than that of the other systems (p < 0.001). With limited resources and the simplest of Bayesian methodologies, we have demonstrated that the Naïve Bayesian model performed significantly better in virtually all areas assessed by current scoring systems used for trauma. This is encouraging and highlights that more can be done to improve trauma systems not only for our military injured, but also for civilian trauma victims. Reprint & Copyright © 2016 Association of Military Surgeons of the U.S.
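A Naïve Bayesian model of the kind described treats predictors as conditionally independent given the outcome class; with continuous inputs, a Gaussian variant stores a per-class prior and per-feature mean and variance. This is a generic illustrative sketch, not the study's trained model or its parameters.

```python
import math

def train_naive_bayes(X, y):
    # Gaussian Naive Bayes: per class, store the class prior and a
    # (mean, variance) pair for each feature, assuming conditional
    # independence of features given the class.
    model = {}
    for c in set(y):
        rows = [x for x, yi in zip(X, y) if yi == c]
        prior = len(rows) / len(y)
        stats = []
        for j in range(len(X[0])):
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mu, var))
        model[c] = (prior, stats)
    return model

def predict(model, x):
    # Return the class with the highest log-posterior score.
    best, best_score = None, -math.inf
    for c, (prior, stats) in model.items():
        score = math.log(prior)
        for xi, (mu, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) \
                     - 0.5 * (xi - mu) ** 2 / var
        if score > best_score:
            best, best_score = c, score
    return best
```

Working in log space avoids underflow when many features are multiplied, which matters even for small clinical feature sets.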
Data-driven confounder selection via Markov and Bayesian networks.
Häggström, Jenny
2018-06-01
To estimate a causal effect on an outcome without bias, unconfoundedness is often assumed. If there is sufficient knowledge on the underlying causal structure then existing confounder selection criteria can be used to select subsets of the observed pretreatment covariates, X, sufficient for unconfoundedness, if such subsets exist. Here, estimation of these target subsets is considered when the underlying causal structure is unknown. The proposed method is to model the causal structure by a probabilistic graphical model, for example, a Markov or Bayesian network, estimate this graph from observed data and select the target subsets given the estimated graph. The approach is evaluated by simulation both in a high-dimensional setting where unconfoundedness holds given X and in a setting where unconfoundedness only holds given subsets of X. Several common target subsets are investigated and the selected subsets are compared with respect to accuracy in estimating the average causal effect. The proposed method is implemented with existing software that can easily handle high-dimensional data, in terms of large samples and large numbers of covariates. The results from the simulation study show that, if unconfoundedness holds given X, this approach is very successful in selecting the target subsets, outperforming alternative approaches based on random forests and LASSO, and that the subset estimating the target subset containing all causes of outcome yields the smallest MSE in the average causal effect estimation. © 2017, The International Biometric Society.
Yu, Fang; Chen, Ming-Hui; Kuo, Lynn; Talbott, Heather; Davis, John S
2015-08-07
Recently, the Bayesian method has become more popular for analyzing high dimensional gene expression data, as it allows us to borrow information across different genes and provides powerful estimators for evaluating gene expression levels. It is crucial to develop a simple but efficient gene selection algorithm for detecting differentially expressed (DE) genes based on the Bayesian estimators. In this paper, by extending the two-criterion idea of Chen et al. (Chen M-H, Ibrahim JG, Chi Y-Y. A new class of mixture models for differential gene expression in DNA microarray data. J Stat Plan Inference. 2008;138:387-404), we propose two new gene selection algorithms for general Bayesian models and name these new methods the confident difference criterion methods. One is based on the standardized differences between two mean expression values among genes; the other adds the differences between two variances to it. The proposed confident difference criterion methods first evaluate the posterior probability of a gene having different gene expressions between competitive samples and then declare a gene to be DE if the posterior probability is large. The theoretical connection between the proposed first method based on the means and the Bayes factor approach proposed by Yu et al. (Yu F, Chen M-H, Kuo L. Detecting differentially expressed genes using calibrated Bayes factors. Statistica Sinica. 2008;18:783-802) is established under the normal-normal model with equal variances between two samples. The empirical performance of the proposed methods is examined and compared to those of several existing methods via several simulations. The results from these simulation studies show that the proposed confident difference criterion methods outperform the existing methods when comparing gene expressions across different conditions for both microarray studies and sequence-based high-throughput studies. A real dataset is used to further demonstrate the proposed methodology.
In the real data application, the confident difference criterion methods successfully identified more clinically important DE genes than the other methods. The confident difference criterion method proposed in this paper provides a new efficient approach for both microarray studies and sequence-based high-throughput studies to identify differentially expressed genes.
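The flavor of such a criterion can be sketched with posterior draws: estimate the posterior probability that the standardized difference between two group means exceeds a threshold, and declare the gene DE when that probability is high. This is a minimal illustration, not the authors' exact criterion; all thresholds and data below are invented.

```python
import numpy as np

def confident_difference(mu1_samples, mu2_samples, sigma_samples, delta=1.0, gamma=0.95):
    """Declare DE when P(|mu1 - mu2| / sigma > delta | data) >= gamma.
    Toy stand-in for a confident-difference-style rule."""
    z = np.abs(mu1_samples - mu2_samples) / sigma_samples  # standardized difference per posterior draw
    prob = float(np.mean(z > delta))                       # Monte Carlo posterior probability
    return prob, prob >= gamma

rng = np.random.default_rng(0)
# Fake posterior draws for one clearly DE gene: group means near 5 and 2.
mu_a = rng.normal(5.0, 0.1, 4000)
mu_b = rng.normal(2.0, 0.1, 4000)
sigma = rng.gamma(50.0, 0.02, 4000)   # scale posterior concentrated near 1
p_de, is_de = confident_difference(mu_a, mu_b, sigma)
```

The same call on two overlapping posteriors returns a small probability, so the gene is not flagged.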
Bayesian structural equation modeling in sport and exercise psychology.
Stenling, Andreas; Ivarsson, Andreas; Johnson, Urban; Lindwall, Magnus
2015-08-01
Bayesian statistics is on the rise in mainstream psychology, but applications in sport and exercise psychology research are scarce. In this article, the foundations of Bayesian analysis are introduced, and we will illustrate how to apply Bayesian structural equation modeling in a sport and exercise psychology setting. More specifically, we contrasted a confirmatory factor analysis on the Sport Motivation Scale II estimated with the most commonly used estimator, maximum likelihood, and a Bayesian approach with weakly informative priors for cross-loadings and correlated residuals. The results indicated that the model with Bayesian estimation and weakly informative priors provided a good fit to the data, whereas the model estimated with a maximum likelihood estimator did not produce a well-fitting model. The reasons for this discrepancy between maximum likelihood and Bayesian estimation are discussed as well as potential advantages and caveats with the Bayesian approach.
Model-Selection Theory: The Need for a More Nuanced Picture of Use-Novelty and Double-Counting
Steele, Katie; Werndl, Charlotte
2018-01-01
This article argues that common intuitions regarding (a) the specialness of ‘use-novel’ data for confirmation and (b) that this specialness implies the ‘no-double-counting rule’, which says that data used in ‘constructing’ (calibrating) a model cannot also play a role in confirming the model’s predictions, are too crude. The intuitions in question are pertinent in all the sciences, but we appeal to a climate science case study to illustrate what is at stake. Our strategy is to analyse the intuitive claims in light of prominent accounts of confirmation of model predictions. We show that on the Bayesian account of confirmation, and also on the standard classical hypothesis-testing account, claims (a) and (b) are not generally true; but for some select cases, it is possible to distinguish data used for calibration from use-novel data, where only the latter confirm. The more specialized classical model-selection methods, on the other hand, uphold a nuanced version of claim (a), but this comes apart from (b), which must be rejected in favour of a more refined account of the relationship between calibration and confirmation. Thus, depending on the framework of confirmation, either the scope or the simplicity of the intuitive position must be revised. Contents: 1 Introduction; 2 A Climate Case Study; 3 The Bayesian Method vis-à-vis Intuitions; 4 Classical Tests vis-à-vis Intuitions; 5 Classical Model-Selection Methods vis-à-vis Intuitions (5.1 Introducing classical model-selection methods; 5.2 Two cases); 6 Re-examining Our Case Study; 7 Conclusion. PMID:29780170
Bayesian model reduction and empirical Bayes for group (DCM) studies
Friston, Karl J.; Litvak, Vladimir; Oswal, Ashwini; Razi, Adeel; Stephan, Klaas E.; van Wijk, Bernadette C.M.; Ziegler, Gabriel; Zeidman, Peter
2016-01-01
This technical note describes some Bayesian procedures for the analysis of group studies that use nonlinear models at the first (within-subject) level – e.g., dynamic causal models – and linear models at subsequent (between-subject) levels. Its focus is on using Bayesian model reduction to finesse the inversion of multiple models of a single dataset or a single (hierarchical or empirical Bayes) model of multiple datasets. These applications of Bayesian model reduction allow one to consider parametric random effects and make inferences about group effects very efficiently (in a few seconds). We provide the relatively straightforward theoretical background to these procedures and illustrate their application using a worked example. This example uses a simulated mismatch negativity study of schizophrenia. We illustrate the robustness of Bayesian model reduction to violations of the (commonly used) Laplace assumption in dynamic causal modelling and show how its recursive application can facilitate both classical and Bayesian inference about group differences. Finally, we consider the application of these empirical Bayesian procedures to classification and prediction. PMID:26569570
An introduction to using Bayesian linear regression with clinical data.
Baldwin, Scott A; Larson, Michael J
2017-11-01
Statistical training in psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods as well as how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses. Copyright © 2017 Elsevier Ltd. All rights reserved.
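The core workflow the article describes (prior, posterior, interval estimate for a slope) can be shown with a conjugate normal-prior linear regression. This sketch is in Python rather than the authors' R, and the anxiety/ERN data are simulated for illustration.

```python
import numpy as np

def bayes_linreg(X, y, sigma2=1.0, tau2=10.0):
    """Conjugate Bayesian linear regression: N(0, tau2*I) prior on
    coefficients, known noise variance sigma2. Returns the posterior
    mean and covariance of the coefficients."""
    d = X.shape[1]
    prec = X.T @ X / sigma2 + np.eye(d) / tau2   # posterior precision matrix
    cov = np.linalg.inv(prec)                    # posterior covariance
    mean = cov @ X.T @ y / sigma2                # posterior mean
    return mean, cov

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + anxiety score (simulated)
beta_true = np.array([0.5, -0.8])                      # true intercept and slope
y = X @ beta_true + rng.normal(0, 1.0, n)              # ERN-like outcome
mean, cov = bayes_linreg(X, y)
# 95% credible interval for the anxiety slope (normal posterior).
ci = mean[1] + 1.96 * np.sqrt(cov[1, 1]) * np.array([-1.0, 1.0])
```

With 200 observations the posterior mean for the slope lands close to the true value of -0.8, and the credible interval quantifies the remaining uncertainty.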
Variational Bayesian Learning for Wavelet Independent Component Analysis
NASA Astrophysics Data System (ADS)
Roussos, E.; Roberts, S.; Daubechies, I.
2005-11-01
In an exploratory approach to data analysis, it is often useful to consider the observations as generated from a set of latent generators or "sources" via a generally unknown mapping. For the noisy overcomplete case, where we have more sources than observations, the problem becomes extremely ill-posed. Solutions to such inverse problems can, in many cases, be achieved by incorporating prior knowledge about the problem, captured in the form of constraints. This setting is a natural candidate for the application of the Bayesian methodology, allowing us to incorporate "soft" constraints in a natural manner. The work described in this paper is mainly driven by problems in functional magnetic resonance imaging of the brain, for the neuro-scientific goal of extracting relevant "maps" from the data. This can be stated as a 'blind' source separation problem. Recent experiments in the field of neuroscience show that these maps are sparse, in some appropriate sense. The separation problem can be solved by independent component analysis (ICA), viewed as a technique for seeking sparse components, assuming appropriate distributions for the sources. We derive a hybrid wavelet-ICA model, transforming the signals into a domain where the modeling assumption of sparsity of the coefficients with respect to a dictionary is natural. We follow a graphical modeling formalism, viewing ICA as a probabilistic generative model. We use hierarchical source and mixing models and apply Bayesian inference to the problem. This allows us to perform model selection in order to infer the complexity of the representation, as well as automatic denoising. Since exact inference and learning in such a model is intractable, we follow a variational Bayesian mean-field approach in the conjugate-exponential family of distributions, for efficient unsupervised learning in multi-dimensional settings. The performance of the proposed algorithm is demonstrated on some representative experiments.
Zador, Zsolt; Sperrin, Matthew; King, Andrew T
2016-01-01
Traumatic brain injury remains a global health problem. Understanding the relative importance of outcome predictors helps optimize our treatment strategies by informing assessment protocols, clinical decisions and trial designs. In this study we establish an importance ranking for outcome predictors based on receiver operating indices to identify key predictors of outcome and create simple predictive models. We then explore the associations between key outcome predictors using Bayesian networks to gain further insight into predictor importance. We analyzed the corticosteroid randomization after significant head injury (CRASH) trial database of 10008 patients and included the 6945 patients for whom demographics, injury characteristics, computed tomography (CT) findings and Glasgow Coma Scale (GCS) were recorded (a total of 13 predictors, all of which would be available to clinicians within a few hours following the injury). Predictions of clinical outcome (death or severe disability at 6 months) were performed using logistic regression models with 5-fold cross validation. Predictive performance was measured using standardized partial area (pAUC) under the receiver operating curve (ROC), and we used the DeLong test for comparisons. Variable importance ranking was based on pAUC targeted at specificity (pAUCSP) and sensitivity (pAUCSE) intervals of 90-100%. Probabilistic associations were depicted using Bayesian networks. Complete AUC analysis showed very good predictive power (AUC = 0.8237, 95% CI: 0.8138-0.8336) for the complete model. Specificity-focused importance ranking highlighted age, pupillary and motor responses, obliteration of the basal cisterns/3rd ventricle, and midline shift. Interestingly, when targeting model sensitivity, the highest-ranking variables were age, severe extracranial injury, verbal response, hematoma on CT and motor response.
Simplified models, which included only these key predictors, had similar performance (pAUCSP = 0.6523, 95% CI: 0.6402-0.6641 and pAUCSE = 0.6332, 95% CI: 0.62-0.6477) compared to the complete models (pAUCSP = 0.6664, 95% CI: 0.6543-0.679, pAUCSE = 0.6436, 95% CI: 0.6289-0.6585; DeLong p-values 0.1165 and 0.3448, respectively). Bayesian networks showed that the predictors that did not feature in the simplified models were associated with those that did. We demonstrate that importance-based variable selection allows simplified predictive models to be created while maintaining prediction accuracy. Variable selection targeting specificity confirmed key components of clinical assessment in TBI, whereas sensitivity-based ranking suggested extracranial injury as one of the important predictors. These results help refine our approach to head injury assessment, decision-making and outcome prediction targeted at model sensitivity and specificity. Bayesian networks proved to be a comprehensive tool for depicting probabilistic associations among key predictors, giving insight into why the simplified model maintained its accuracy.
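The partial-AUC idea used for the importance ranking above can be computed directly: restrict the ROC integral to a high-specificity band (FPR in [0, 0.1]) and rescale so a perfect classifier scores 1. This is a generic sketch with simulated predictors, not the study's standardized pAUC code.

```python
import numpy as np

def partial_auc(scores, labels, max_fpr=0.1):
    """Area under the ROC curve over FPR in [0, max_fpr]
    (specificity 90-100%), rescaled to [0, 1]."""
    labels = np.asarray(labels)
    order = np.argsort(-np.asarray(scores, dtype=float))   # descending score
    lab = labels[order]
    tpr = np.concatenate([[0.0], np.cumsum(lab) / lab.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - lab) / (len(lab) - lab.sum())])
    grid = np.linspace(0.0, max_fpr, 101)
    tpr_i = np.interp(grid, fpr, tpr)
    # trapezoid rule, normalized by the band width
    return np.sum((tpr_i[1:] + tpr_i[:-1]) * np.diff(grid)) / (2 * max_fpr)

rng = np.random.default_rng(5)
y = rng.integers(0, 2, 2000)                 # outcome (e.g. death/severe disability)
strong = 1.5 * y + rng.normal(size=2000)     # strong predictor (age-like)
weak = 0.3 * y + rng.normal(size=2000)       # weak predictor
p_strong = partial_auc(strong, y)
p_weak = partial_auc(weak, y)
```

Ranking variables by this quantity (per variable, or by the drop when a variable is removed from a model) reproduces the kind of specificity-targeted importance ordering the abstract describes.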
Bayesian Analysis of High Dimensional Classification
NASA Astrophysics Data System (ADS)
Mukhopadhyay, Subhadeep; Liang, Faming
2009-12-01
Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. In these cases, there is much interest in searching for sparse models in the high-dimensional regression (or classification) setup. We first discuss two common challenges in analyzing high-dimensional data. The first is the curse of dimensionality: the complexity of many existing algorithms scales exponentially with the dimensionality of the space, so that the algorithms soon become computationally intractable and therefore inapplicable in many real applications. The second is multicollinearity among the predictors, which severely slows down the algorithms. To make Bayesian analysis operational in high dimensions, we propose a novel hierarchical stochastic approximation Monte Carlo algorithm (HSAMC), which overcomes the curse of dimensionality and the multicollinearity of predictors in high dimensions, and also possesses a self-adjusting mechanism for avoiding local minima separated by high energy barriers. Models and methods are illustrated by simulations inspired by the field of genomics. Numerical results indicate that HSAMC can work as a general model selection sampler in high-dimensional complex model spaces.
A Bayesian approach to estimate the biomass of anchovies off the coast of Perú.
Quiroz, Zaida C; Prates, Marcos O; Rue, Håvard
2015-03-01
The Northern Humboldt Current System (NHCS) is the world's most productive ecosystem in terms of fish. In particular, the Peruvian anchovy (Engraulis ringens) is the major prey of the main top predators, like seabirds, fish, humans, and other mammals. In this context, it is important to understand the dynamics of the anchovy distribution to preserve it as well as to exploit its economic capacities. Using the data collected by the "Instituto del Mar del Perú" (IMARPE) during a scientific survey in 2005, we present a statistical analysis that has as main goals: (i) to adapt to the characteristics of the sampled data, such as spatial dependence, high proportions of zeros and big size of samples; (ii) to provide important insights on the dynamics of the anchovy population; and (iii) to propose a model for estimation and prediction of anchovy biomass in the NHCS offshore from Perú. These data were analyzed in a Bayesian framework using the integrated nested Laplace approximation (INLA) method. Further, to select the best model and to study the predictive power of each model, we performed model comparisons and predictive checks, respectively. Finally, we carried out a Bayesian spatial influence diagnostic for the preferred model. © 2014, The International Biometric Society.
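The "high proportion of zeros" issue this abstract mentions is commonly handled with a hurdle model: a probability mass at zero plus a continuous density for positive biomass. The sketch below shows only that likelihood structure; it does not attempt the spatial INLA model of the paper, and all numbers are illustrative.

```python
import numpy as np

def hurdle_lognormal_loglik(y, p_zero, mu, sigma):
    """Log-likelihood of a simple hurdle model: observations are zero with
    probability p_zero, otherwise log-normal(mu, sigma). A stand-in for the
    zero-inflated biomass likelihoods used in acoustic survey modeling."""
    y = np.asarray(y, dtype=float)
    zero = y == 0
    ll = np.sum(zero) * np.log(p_zero)               # zero hauls
    pos = y[~zero]                                   # positive biomass values
    ll += np.sum(np.log1p(-p_zero)
                 - np.log(pos * sigma * np.sqrt(2 * np.pi))
                 - (np.log(pos) - mu) ** 2 / (2 * sigma ** 2))
    return ll

# Toy survey: 4 empty hauls out of 10, so the likelihood should favor
# a zero-probability near 0.4 over one near 0.
y = np.array([0.0, 0.0, 1.2, 3.4, 0.8, 0.0, 2.1, 0.0, 0.5, 1.7])
ll_good = hurdle_lognormal_loglik(y, 0.4, mu=0.2, sigma=0.8)
ll_bad = hurdle_lognormal_loglik(y, 0.01, mu=0.2, sigma=0.8)
```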
Nagy, László G; Urban, Alexander; Orstadius, Leif; Papp, Tamás; Larsson, Ellen; Vágvölgyi, Csaba
2010-12-01
Recently developed comparative phylogenetic methods offer a wide spectrum of applications in evolutionary biology, although it is generally accepted that their statistical properties are incompletely known. Here, we examine and compare the statistical power of the ML and Bayesian methods with regard to selection of best-fit models of fruiting-body evolution and hypothesis testing of ancestral states on a real-life data set of a physiological trait (autodigestion) in the family Psathyrellaceae. Our phylogenies are based on the first multigene data set generated for the family. Two different coding regimes (binary and multistate) and two data sets differing in taxon sampling density are examined. The Bayesian method outperformed Maximum Likelihood with regard to statistical power in all analyses. This is particularly evident if the signal in the data is weak, i.e. in cases when the ML approach does not provide support to choose among competing hypotheses. Results based on binary and multistate coding differed only modestly, although it was evident that multistate analyses were less conclusive in all cases. It seems that increased taxon sampling density has favourable effects on inference of ancestral states, while model parameters are influenced to a smaller extent. The model best fitting our data implies that the rate of losses of deliquescence equals zero, although model selection in ML does not provide proper support to reject three of the four candidate models. The results also support the hypothesis that non-deliquescence (lack of autodigestion) has been ancestral in Psathyrellaceae, and that deliquescent fruiting bodies represent the preferred state, having evolved independently several times during evolution. Copyright © 2010 Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Gilkey, Kelly M.; Myers, Jerry G.; McRae, Michael P.; Griffin, Elise A.; Kallrui, Aditya S.
2012-01-01
The Exploration Medical Capability project is creating a catalog of risk assessments using the Integrated Medical Model (IMM). The IMM is a software-based system intended to assist mission planners in preparing for spaceflight missions by helping them to make informed decisions about medical preparations and supplies needed for combating and treating various medical events using Probabilistic Risk Assessment. The objective is to use statistical analyses to inform the IMM decision tool with estimated probabilities of medical events occurring during an exploration mission. Because data regarding astronaut health are limited, Bayesian statistical analysis is used. Bayesian inference combines prior knowledge, such as data from the general U.S. population, the U.S. Submarine Force, or the analog astronaut population located at the NASA Johnson Space Center, with observed data for the medical condition of interest. The posterior results reflect the best evidence for specific medical events occurring in flight. Bayes theorem provides a formal mechanism for combining available observed data with data from similar studies to support the quantification process. The IMM team performed Bayesian updates on the following medical events: angina, appendicitis, atrial fibrillation, atrial flutter, dental abscess, dental caries, dental periodontal disease, gallstone disease, herpes zoster, renal stones, seizure, and stroke.
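The update the note describes, combining a prior built from an analog population with sparse in-flight observations, has a simple conjugate form for event rates. The Beta-Binomial sketch below uses made-up counts purely for illustration.

```python
import numpy as np

def beta_binomial_update(a_prior, b_prior, events, exposures):
    """Conjugate Bayesian update for an event probability: Beta(a, b) prior
    (e.g. encoded from general-population or submarine-force data) combined
    with `events` observed in `exposures` trials."""
    a_post = a_prior + events
    b_post = b_prior + exposures - events
    mean = a_post / (a_post + b_post)   # posterior mean event probability
    return a_post, b_post, mean

# Hypothetical prior: about 2 events per 1000 person-missions in an analog population.
a0, b0 = 2.0, 998.0
# Hypothetical observation: 1 event in 150 astronaut person-missions.
a1, b1, post_mean = beta_binomial_update(a0, b0, events=1, exposures=150)
```

The posterior mean moves from the prior rate of 0.002 toward the (higher) observed rate, weighted by the relative amounts of prior and observed data.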
NASA Astrophysics Data System (ADS)
Li, Lu; Xu, Chong-Yu; Engeland, Kolbjørn
2013-04-01
With respect to model calibration, parameter estimation and analysis of uncertainty sources, various regression and probabilistic approaches are used in hydrological modeling. A family of Bayesian methods, which incorporates different sources of information into a single analysis through Bayes' theorem, is widely used for uncertainty assessment. However, none of these approaches treats the impact of high flows in hydrological modeling well. This study proposes a Bayesian modularization uncertainty assessment approach in which the highest streamflow observations are treated as suspect information that should not influence the inference of the main bulk of the model parameters. This study includes a comprehensive comparison and evaluation of uncertainty assessments by our new Bayesian modularization method and standard Bayesian methods using the Metropolis-Hastings (MH) algorithm with the daily hydrological model WASMOD. Three likelihood functions were used in combination with the standard Bayesian method: the AR(1) plus Normal model independent of time (Model 1), the AR(1) plus Normal model dependent on time (Model 2) and the AR(1) plus Multi-normal model (Model 3). The results reveal that the Bayesian modularization method provides the most accurate streamflow estimates, as measured by the Nash-Sutcliffe efficiency, and the best uncertainty estimates for low, medium and entire flows compared to the standard Bayesian methods. The study thus provides a new approach for reducing the impact of high flows on the discharge uncertainty assessment of hydrological models via Bayesian methods.
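The Metropolis-Hastings sampler referenced above is a short algorithm: propose a perturbed parameter, accept with probability given by the posterior ratio. This random-walk sketch targets a toy one-parameter posterior standing in for a hydrological model's; it is not the WASMOD setup.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings for a scalar parameter."""
    rng = np.random.default_rng(seed)
    theta, lp = theta0, log_post(theta0)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.normal()          # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:    # accept w.p. min(1, ratio)
            theta, lp = prop, lp_prop
        chain[i] = theta                            # keep current state either way
    return chain

# Toy target: posterior of a mean with a N(0, 10^2) prior and 50 obs ~ N(3, 1).
obs = np.random.default_rng(1).normal(3.0, 1.0, 50)
log_post = lambda m: -0.5 * m ** 2 / 100 - 0.5 * np.sum((obs - m) ** 2)
chain = metropolis_hastings(log_post, theta0=0.0)
```

Discarding a burn-in, the chain's mean approximates the posterior mean, and its spread gives the parameter uncertainty the study propagates into streamflow estimates.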
Bayesian models: A statistical primer for ecologists
Hobbs, N. Thompson; Hooten, Mevin B.
2015-01-01
Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods—in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals. This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management. The book presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticians; covers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and more; deemphasizes computer coding in favor of basic principles; and explains how to write out properly factored statistical expressions representing Bayesian models.
Dynamics of attentional selection under conflict: toward a rational Bayesian account.
Yu, Angela J; Dayan, Peter; Cohen, Jonathan D
2009-06-01
The brain exhibits remarkable facility in exerting attentional control in most circumstances, but it also suffers apparent limitations in others. The authors' goal is to construct a rational account for why attentional control appears suboptimal under conditions of conflict and what this implies about the underlying computational principles. The formal framework used is based on Bayesian probability theory, which provides a convenient language for delineating the rationale and dynamics of attentional selection. The authors illustrate these issues with the Eriksen flanker task, a classical paradigm that explores the effects of competing sensory inputs on response tendencies. The authors show how 2 distinctly formulated models, based on compatibility bias and spatial uncertainty principles, can account for the behavioral data. They also suggest novel experiments that may differentiate these models. In addition, they elaborate a simplified model that approximates optimal computation and may map more directly onto the underlying neural machinery. This approximate model uses conflict monitoring, putatively mediated by the anterior cingulate cortex, as a proxy for compatibility representation. The authors also consider how this conflict information might be disseminated and used to control processing. (c) 2009 APA, all rights reserved.
NASA Astrophysics Data System (ADS)
Olson, R.; Evans, J. P.; Fan, Y.
2015-12-01
NARCliM (NSW/ACT Regional Climate Modelling Project) is a regional climate project for Australia and the surrounding region. It dynamically downscales 4 General Circulation Models (GCMs) using three Regional Climate Models (RCMs) to provide climate projections for the CORDEX-AustralAsia region at 50 km resolution, and for south-east Australia at 10 km resolution. The project differs from previous work in the level of sophistication of model selection. Specifically, the selection process for GCMs included (i) conducting a literature review to evaluate model performance, (ii) analysing model independence, and (iii) selecting models that span the future temperature and precipitation change space. RCMs for downscaling the GCMs were chosen based on their performance for several precipitation events over south-east Australia, and on model independence. Bayesian Model Averaging (BMA) provides a statistically consistent framework for weighting the models based on their likelihood given the available observations. These weights are used to provide probability distribution functions (pdfs) for model projections. We develop a BMA framework for constructing probabilistic climate projections for spatially-averaged variables from the NARCliM project. The first step in the procedure is smoothing model output in order to exclude the influence of internal climate variability. Our statistical model for model-observation residuals is a homoskedastic iid process. Model weights are determined through Monte Carlo integration by comparing the RCMs with Australian Water Availability Project (AWAP) observations. Posterior pdfs of the statistical parameters of the model-data residuals are obtained using Markov Chain Monte Carlo. The uncertainty in the properties of the model-data residuals is fully accounted for when constructing the projections. We present preliminary results of the BMA analysis for yearly maximum temperature for New South Wales state planning regions for the period 2060-2079.
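The weighting step at the heart of BMA is just normalized likelihoods (here with equal prior model probabilities). The sketch below is a deliberately simplified stand-in for the Monte Carlo integration described above; the log-likelihoods and projected warmings are invented numbers.

```python
import numpy as np

def bma_weights(log_likelihoods):
    """BMA posterior model weights from per-model log marginal likelihoods,
    assuming equal prior model probabilities."""
    ll = np.asarray(log_likelihoods, dtype=float)
    w = np.exp(ll - ll.max())      # subtract the max for numerical stability
    return w / w.sum()

# Toy example: three GCM/RCM combinations scored against observations.
weights = bma_weights([-120.0, -118.5, -125.0])
# BMA projection: weighted combination of each model's projected warming (degC).
projection = weights @ np.array([1.8, 2.4, 3.1])
```

The best-fitting model dominates but does not fully determine the projection, which is the point of averaging rather than selecting.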
Resolution analysis of marine seismic full waveform data by Bayesian inversion
NASA Astrophysics Data System (ADS)
Ray, A.; Sekar, A.; Hoversten, G. M.; Albertin, U.
2015-12-01
The Bayesian posterior density function (PDF) of earth models that fit full waveform seismic data conveys information on the uncertainty with which the elastic model parameters are resolved. In this work, we apply the trans-dimensional reversible jump Markov Chain Monte Carlo method (RJ-MCMC) for the 1D inversion of noisy synthetic full-waveform seismic data in the frequency-wavenumber domain. While seismic full waveform inversion (FWI) is a powerful method for characterizing subsurface elastic parameters, the uncertainty in the inverted models has remained poorly known, if quantified at all, and is highly dependent on the initial model. The Bayesian method we use is trans-dimensional in that the number of model layers is not fixed, and flexible in that the layer boundaries are free to move around. The resulting parameterization does not require regularization to stabilize the inversion. Depth resolution is traded off with the number of layers, providing an estimate of uncertainty in elastic parameters (compressional and shear velocities Vp and Vs as well as density) with depth. We find that in the absence of additional constraints, Bayesian inversion can result in a wide range of posterior PDFs on Vp, Vs and density. These PDFs range from being clustered around the true model, to those that contain little resolution of any particular features other than those in the near surface, depending on the particular data and target geometry. We present results for a suite of different frequencies and offset ranges, examining the differences in the posterior model densities thus derived. Though these results are for a 1D earth, they are applicable to areas with simple, layered geology and provide valuable insight into the resolving capabilities of FWI, as well as highlight the challenges in solving a highly non-linear problem.
The RJ-MCMC method also presents a tantalizing possibility for extension to 2D and 3D Bayesian inversion of full waveform seismic data in the future, as it objectively tackles the problem of model selection (i.e., the number of layers or cells for parameterization), which could ease the computational burden of evaluating forward models with many parameters.
Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E
2017-07-01
The leaf spotting diseases in wheat, which include Septoria tritici blotch (STB), Stagonospora nodorum blotch (SNB), and tan spot (TS), pose challenges to breeding programs selecting for resistance. A promising approach that could enable selection prior to phenotyping is genomic selection, which uses genome-wide markers to estimate breeding values (BVs) for quantitative traits. To evaluate this approach for seedling and/or adult plant resistance (APR) to STB, SNB, and TS, we compared the predictive ability of a least-squares (LS) approach with genomic-enabled prediction models including the genomic best linear unbiased predictor (GBLUP), Bayesian ridge regression (BRR), Bayes A (BA), Bayes B (BB), Bayes Cπ (BC), the Bayesian least absolute shrinkage and selection operator (BL), and reproducing kernel Hilbert spaces with markers (RKHS-M), a pedigree-based model (RKHS-P), and RKHS with markers and pedigree (RKHS-MP). We observed that LS gave the lowest prediction accuracies and RKHS-MP the highest. The genomic-enabled prediction models and RKHS-P gave similar accuracies. The increase in accuracy using genomic prediction models over LS was 48%. The mean genomic prediction accuracies were 0.45 for STB (APR), 0.55 for SNB (seedling), 0.66 for TS (seedling) and 0.48 for TS (APR). We also compared markers from two whole-genome profiling approaches, genotyping by sequencing (GBS) and diversity arrays technology sequencing (DArTseq), for prediction. While GBS markers performed slightly better than DArTseq, combining markers from the two approaches did not improve accuracies. We conclude that implementing GS in breeding for these diseases would help to achieve higher accuracies and rapid gains from selection. Copyright © 2017 Crop Science Society of America.
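Genomic prediction of the kind compared above can be illustrated with ridge regression on marker genotypes, the marker-space equivalent of GBLUP. Everything below (marker counts, heritability, the shrinkage value `lam`) is an invented toy setup, not the study's wheat data or models.

```python
import numpy as np

def ridge_gblup(X_train, y_train, X_test, lam=1.0):
    """Ridge regression on centered marker codes; lam plays the role of the
    variance-component ratio in GBLUP. Returns predicted breeding values."""
    d = X_train.shape[1]
    beta = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d),
                           X_train.T @ y_train)     # shrunken marker effects
    return X_test @ beta

rng = np.random.default_rng(2)
n, p = 300, 500
X = rng.integers(0, 3, size=(n, p)).astype(float)   # 0/1/2 genotype codes
X -= X.mean(axis=0)                                 # center each marker
effects = rng.normal(0, 0.1, p)
g = X @ effects                                     # true breeding values
y = g + rng.normal(0, g.std(), n)                   # phenotypes, heritability ~ 0.5
pred = ridge_gblup(X[:200], y[:200], X[200:], lam=50.0)
acc = np.corrcoef(pred, y[200:])[0, 1]              # predictive ability
```

The correlation `acc` between predicted and observed values in the held-out set is the "prediction accuracy" reported throughout the abstract.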
Fisher, Charles K; Mehta, Pankaj
2015-06-01
Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach--the Bayesian Ising Approximation (BIA)--to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model with weak couplings. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high-dimensional regression by analyzing a gene expression dataset with nearly 30 000 features. These results also highlight the impact of correlations between features on Bayesian feature selection. An implementation of the BIA in C++, along with data for reproducing our gene expression analyses, is freely available at http://physics.bu.edu/∼pankajm/BIACode. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
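The mean-field computation the abstract refers to is a self-consistent fixed point m_i = tanh(h_i + Σ_j J_ij m_j). In the BIA the fields h and weak couplings J are derived from feature-response and feature-feature correlations; the numbers below are illustrative stand-ins, not the paper's formulas.

```python
import numpy as np

def mean_field_magnetizations(h, J, n_iter=200):
    """Iterate the mean-field self-consistency equation
    m_i = tanh(h_i + sum_j J_ij m_j) to a fixed point."""
    m = np.zeros_like(h)
    for _ in range(n_iter):
        m = np.tanh(h + J @ m)
    return m

# Three features: strongly relevant, weakly relevant, irrelevant;
# a weak positive coupling links the first two (correlated features).
h = np.array([2.0, 0.5, 0.0])
J = np.array([[0.0, 0.1, 0.0],
              [0.1, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
m = mean_field_magnetizations(h, J)
# Map magnetizations in [-1, 1] to posterior inclusion probabilities in [0, 1].
posterior = (1 + m) / 2
```

A feature with no evidence and no couplings sits at probability 0.5, while correlated relevant features pull each other's probabilities up, which is the correlation effect the paper highlights.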
NASA Astrophysics Data System (ADS)
Frost, Andrew J.; Thyer, Mark A.; Srikanthan, R.; Kuczera, George
2007-07-01
Multi-site simulation of hydrological data is required for drought risk assessment of large multi-reservoir water supply systems. In this paper, a general Bayesian framework is presented for the calibration and evaluation of multi-site hydrological data at annual timescales. Models included within this framework are the hidden Markov model (HMM) and the widely used lag-1 autoregressive (AR(1)) model. These models are extended by the inclusion of a Box-Cox transformation and a spatial correlation function in a multi-site setting. Parameter uncertainty is evaluated using Markov chain Monte Carlo techniques. Models are evaluated by their ability to reproduce a range of important extreme statistics and compared using Bayesian model selection techniques which evaluate model probabilities. The case study, using multi-site annual rainfall data situated within catchments which contribute to Sydney's main water supply, provided the following results. Firstly, in terms of model probabilities and diagnostics, the inclusion of the Box-Cox transformation was preferred. Secondly, the AR(1) and HMM performed similarly, while some other proposed AR(1)/HMM models with regionally pooled parameters had greater posterior probability than these two models. The practical significance of parameter and model uncertainty was illustrated using a case study involving drought security analysis for urban water supply. It was shown that ignoring parameter uncertainty resulted in a significant overestimate of reservoir yield and an underestimate of system vulnerability to severe drought.
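The core generative component described here, a lag-1 autoregressive model on a Box-Cox transformed scale, can be sketched as follows. This is an illustrative single-site simulator with made-up parameter values, not the authors' multi-site Bayesian framework:

```python
import numpy as np

def boxcox(y, lam):
    # Box-Cox transform; lam = 0 is the log transform
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def inv_boxcox(z, lam):
    return np.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)

def simulate_ar1_boxcox(n, phi, mu, sigma, lam, rng):
    # AR(1) on the transformed scale, back-transformed to the data scale
    z = np.empty(n)
    z[0] = mu + rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))  # stationary start
    for t in range(1, n):
        z[t] = mu + phi * (z[t - 1] - mu) + rng.normal(0.0, sigma)
    return inv_boxcox(z, lam)

rng = np.random.default_rng(0)
series = simulate_ar1_boxcox(100, 0.5, 2.0, 0.3, 0.5, rng)  # hypothetical parameters
```

On the transformed scale the simulated series should show a lag-1 autocorrelation near `phi`, which is easy to verify empirically.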
Paz-Linares, Deirel; Vega-Hernández, Mayrim; Rojas-López, Pedro A.; Valdés-Hernández, Pedro A.; Martínez-Montes, Eduardo; Valdés-Sosa, Pedro A.
2017-01-01
The estimation of EEG generating sources constitutes an Inverse Problem (IP) in Neuroscience. This is an ill-posed problem due to the non-uniqueness of the solution, and regularization or prior information is needed to undertake Electrophysiology Source Imaging. Structured Sparsity priors can be attained through combinations of (L1 norm-based) and (L2 norm-based) constraints such as the Elastic Net (ENET) and Elitist Lasso (ELASSO) models. The former model is used to find solutions with a small number of smooth nonzero patches, while the latter imposes different degrees of sparsity simultaneously along different dimensions of the spatio-temporal matrix solutions. Both models have been addressed within the penalized regression approach, where the regularization parameters are selected heuristically, usually leading to non-optimal and computationally expensive solutions. The existing Bayesian formulation of ENET allows hyperparameter learning, but uses the computationally intensive Monte Carlo/Expectation Maximization methods, which makes its application to the EEG IP impractical, while ELASSO has not previously been considered in a Bayesian context. In this work, we attempt to solve the EEG IP using a Bayesian framework for the ENET and ELASSO models. We propose a Structured Sparse Bayesian Learning algorithm that combines Empirical Bayes and iterative coordinate descent procedures to estimate both the parameters and hyperparameters. Using realistic simulations and avoiding the inverse crime, we illustrate that our methods are able to recover complicated source setups more accurately, and with a more robust estimation of the hyperparameters and behavior under different sparsity scenarios, than classical LORETA, ENET and LASSO Fusion solutions. We also solve the EEG IP using data from a visual attention experiment, finding more interpretable neurophysiological patterns with our methods.
The Matlab codes used in this work, including Simulations, Methods, Quality Measures and Visualization Routines, are freely available on a public website.
Houngbedji, Clarisse A; Chammartin, Frédérique; Yapi, Richard B; Hürlimann, Eveline; N'Dri, Prisca B; Silué, Kigbafori D; Soro, Gotianwa; Koudou, Benjamin G; Assi, Serge-Brice; N'Goran, Eliézer K; Fantodji, Agathe; Utzinger, Jürg; Vounatsou, Penelope; Raso, Giovanna
2016-09-07
In Côte d'Ivoire, malaria remains a major public health issue, and thus a priority to be tackled. The aim of this study was to identify spatially explicit indicators of Plasmodium falciparum infection among school-aged children and to undertake a model-based spatial prediction of P. falciparum infection risk using environmental predictors. A cross-sectional survey was conducted, including parasitological examinations and interviews with more than 5,000 children from 93 schools across Côte d'Ivoire. A finger-prick blood sample was obtained from each child to determine Plasmodium species-specific infection and parasitaemia using Giemsa-stained thick and thin blood films. Household socioeconomic status was assessed through asset ownership and household characteristics. Children were interviewed for preventive measures against malaria. Environmental data were gathered from satellite images and digitized maps. A Bayesian geostatistical stochastic search variable selection procedure was employed to identify factors related to P. falciparum infection risk. Bayesian geostatistical logistic regression models were used to map the spatial distribution of P. falciparum infection and to predict the infection prevalence at non-sampled locations via Bayesian kriging. Complete data sets were available from 5,322 children aged 5-16 years across Côte d'Ivoire. P. falciparum was the predominant species (94.5%). The Bayesian geostatistical variable selection procedure identified land cover and socioeconomic status as important predictors for infection risk with P. falciparum. Model-based prediction identified high P. falciparum infection risk in the north, central-east, south-east, west and south-west of Côte d'Ivoire. Low-risk areas were found in the south-eastern area close to Abidjan and the south-central and west-central part of the country. The P. falciparum infection risk and related uncertainty estimates for school-aged children in Côte d'Ivoire represent the most up-to-date malaria risk maps. These tools can be used for spatial targeting of malaria control interventions.
Zhang, Xiaoshuai; Xue, Fuzhong; Liu, Hong; Zhu, Dianwen; Peng, Bin; Wiemels, Joseph L; Yang, Xiaowei
2014-12-10
Genome-wide Association Studies (GWAS) are typically designed to identify phenotype-associated single nucleotide polymorphisms (SNPs) individually using univariate analysis methods. Though providing valuable insights into genetic risks of common diseases, the genetic variants identified by GWAS generally account for only a small proportion of the total heritability for complex diseases. To solve this "missing heritability" problem, we implemented a strategy called integrative Bayesian Variable Selection (iBVS), which is based on a hierarchical model that incorporates an informative prior by considering the gene interrelationship as a network. It was applied here to both simulated and real data sets. Simulation studies indicated that the iBVS method was advantageous in its performance, with the highest AUC in both variable selection and outcome prediction, when compared to Stepwise and LASSO based strategies. In an analysis of a leprosy case-control study, iBVS selected 94 SNPs as predictors, while LASSO selected 100 SNPs. The Stepwise regression yielded a more parsimonious model with only 3 SNPs. The prediction results demonstrated that the iBVS method had comparable performance with that of LASSO, but better than Stepwise strategies. The proposed iBVS strategy is a novel and valid method for Genome-wide Association Studies, with the additional advantage that it produces more interpretable posterior probabilities for each variable, unlike LASSO and other penalized regression methods.
Simultaneous Optimization of Decisions Using a Linear Utility Function.
ERIC Educational Resources Information Center
Vos, Hans J.
1990-01-01
An approach is presented to simultaneously optimize decision rules for combinations of elementary decisions through a framework derived from Bayesian decision theory. The developed linear utility model for selection-mastery decisions was applied to a sample of 43 first year medical students to illustrate the procedure.
USING BAYESIAN SPATIAL MODELS TO FACILITATE WATER QUALITY MONITORING
The Clean Water Act of 1972 requires states to monitor the quality of their surface water. The number of sites sampled on streams and rivers varies widely by state. A few states are now using probability survey designs to select sites, while most continue to rely on other proce...
Stochastic Modeling of the Environmental Impacts of the Mingtang Tunneling Project
NASA Astrophysics Data System (ADS)
Li, Xiaojun; Li, Yandong; Chang, Ching-Fu; Chen, Ziyang; Tan, Benjamin Zhi Wen; Sege, Jon; Wang, Changhong; Rubin, Yoram
2017-04-01
This paper investigates the environmental impacts of a major tunneling project in China. Of particular interest is the drawdown of the water table, due to its potential impacts on ecosystem health and on agricultural activity. Due to scarcity of data, the study pursues a Bayesian stochastic approach, which is built around a numerical model. We adopted the Bayesian approach with the goal of deriving the posterior distributions of the dependent variables conditional on local data. The choice of the Bayesian approach for this study is somewhat non-trivial because of the scarcity of in-situ measurements. The thought guiding this selection is that prior distributions for the model input variables are valuable tools even when not all inputs are available, and the Bayesian approach provides a good starting point for further updates as and when additional data become available. To construct informative priors, a systematic approach was developed and implemented based on other, well-documented sites which bear geological and hydrological similarity to the target site (the Mingtang tunneling project). The approach is built around two classes of similarity criteria: a physically-based set of criteria and an additional set covering epistemic criteria. The prior construction strategy was implemented for the hydraulic conductivity of the various types of rocks at the site (granite and gneiss) and for modeling the geometry and conductivity of the fault zones. Additional elements of our strategy include (1) modeling the water table through bounding surfaces representing upper and lower limits, and (2) modeling the effective conductivity as a random variable (varying between realizations, not in space). The approach was tested successfully against its ability to predict the tunnel infiltration fluxes and against observations of drying soils.
Funamizu, Akihiro; Ito, Makoto; Doya, Kenji; Kanzaki, Ryohei; Takahashi, Hirokazu
2012-04-01
The estimation of reward outcomes for action candidates is essential for decision making. In this study, we examined whether and how the uncertainty in reward outcome estimation affects the action choice and learning rate. We designed a choice task in which rats selected either the left-poking or right-poking hole and received a reward of a food pellet stochastically. The reward probabilities of the left and right holes were chosen from six settings (high, 100% vs. 66%; mid, 66% vs. 33%; low, 33% vs. 0% for the left vs. right holes, and the opposites) in every 20-549 trials. We used Bayesian Q-learning models to estimate the time course of the probability distribution of action values and tested whether they better explain the behaviors of rats than standard Q-learning models that estimate only the mean of action values. Model comparison by cross-validation revealed that a Bayesian Q-learning model with an asymmetric update for reward and non-reward outcomes fit the choice time course of the rats best. In the action-choice equation of the Bayesian Q-learning model, the estimated coefficient for the variance of action value was positive, meaning that rats were uncertainty seeking. Further analysis of the Bayesian Q-learning model suggested that the uncertainty facilitated the effective learning rate. These results suggest that the rats consider uncertainty in action-value estimation and that they have an uncertainty-seeking action policy and uncertainty-dependent modulation of the effective learning rate.
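The qualitative mechanism described in this abstract (a posterior over action values, an uncertainty-seeking choice rule, and asymmetric updates for reward and non-reward) can be caricatured with a Beta-Bernoulli bandit. This is a simplified stand-in, not the paper's Bayesian Q-learning model; the coefficients `w`, `k_pos` and `k_neg` are invented for illustration:

```python
import numpy as np

def run_session(p_reward, n_trials, w=2.0, k_pos=1.0, k_neg=0.5, seed=0):
    # Beta(a, b) posterior over each hole's reward probability
    rng = np.random.default_rng(seed)
    a = np.ones(2)
    b = np.ones(2)
    choices = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        mean = a / (a + b)
        var = a * b / ((a + b) ** 2 * (a + b + 1))
        act = int(np.argmax(mean + w * var))   # uncertainty-seeking choice rule
        reward = rng.random() < p_reward[act]
        if reward:
            a[act] += k_pos                    # asymmetric updates for
        else:
            b[act] += k_neg                    # reward vs. non-reward
        choices[t] = act
    return choices
```

With clearly separated reward probabilities the simulated agent should come to prefer the better hole while still occasionally sampling the more uncertain one.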
Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty
Baele, Guy; Lemey, Philippe; Suchard, Marc A.
2016-01-01
Marginal likelihood estimates to compare models using Bayes factors frequently accompany Bayesian phylogenetic inference. Approaches to estimate marginal likelihoods have garnered increased attention over the past decade. In particular, the introduction of path sampling (PS) and stepping-stone sampling (SS) into Bayesian phylogenetics has tremendously improved the accuracy of model selection. These sampling techniques are now used to evaluate complex evolutionary and population genetic models on empirical data sets, but considerable computational demands hamper their widespread adoption. Further, when very diffuse but proper priors are specified for model parameters, numerical issues complicate the exploration of the priors, a necessary step in marginal likelihood estimation using PS or SS. To avoid such instabilities, generalized SS (GSS) has recently been proposed, introducing the concept of “working distributions” to facilitate, or shorten, the integration process that underlies marginal likelihood estimation. However, the need to fix the tree topology currently limits GSS in a coalescent-based framework. Here, we extend GSS by relaxing the fixed underlying tree topology assumption. To this purpose, we introduce a “working” distribution on the space of genealogies, which enables estimating marginal likelihoods while accommodating phylogenetic uncertainty. We propose two different “working” distributions that help GSS to outperform PS and SS in terms of accuracy when comparing demographic and evolutionary models applied to synthetic data and real-world examples. Further, we show that the use of very diffuse priors can lead to a considerable overestimation in marginal likelihood when using PS and SS, while still retrieving the correct marginal likelihood using both GSS approaches. The methods used in this article are available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.
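Stepping-stone sampling itself is easy to demonstrate on a toy conjugate model, where each power posterior can be sampled exactly and the true marginal likelihood is known in closed form. A minimal sketch (normal likelihood with known variance, normal prior; the temperature schedule and sample sizes are arbitrary choices, not those used in BEAST):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=20)   # data: y_i ~ N(mu, 1), prior mu ~ N(0, 1)
n, s = len(y), y.sum()

# Analytic log marginal likelihood of y ~ N(0, I + 11^T), via Sherman-Morrison:
# (I + 11^T)^{-1} = I - 11^T / (1 + n),  det(I + 11^T) = 1 + n
log_ml = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n)
          - 0.5 * (y @ y - s**2 / (1 + n)))

def loglik(mu):
    # log likelihood evaluated at a vector of mu samples
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * ((y - mu[:, None]) ** 2).sum(axis=1)

# Stepping-stone estimator over power posteriors at beta_k = (k/K)^3.
# Conjugacy lets us sample each power posterior exactly:
# precision = 1 + beta*n, mean = beta*s / (1 + beta*n).
K, m = 32, 4000
betas = (np.arange(K + 1) / K) ** 3
log_r = 0.0
for bk, bk1 in zip(betas[:-1], betas[1:]):
    prec = 1.0 + bk * n
    mu_samples = rng.normal(bk * s / prec, np.sqrt(1.0 / prec), size=m)
    d = (bk1 - bk) * loglik(mu_samples)
    log_r += np.max(d) + np.log(np.mean(np.exp(d - np.max(d))))  # log-mean-exp
```

The sum of the per-step log ratios `log_r` estimates `log_ml`; on this toy problem the two should agree to well within Monte Carlo error.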
Aerosol-type retrieval and uncertainty quantification from OMI data
NASA Astrophysics Data System (ADS)
Kauppi, Anu; Kolmonen, Pekka; Laine, Marko; Tamminen, Johanna
2017-11-01
We discuss uncertainty quantification for aerosol-type selection in satellite-based atmospheric aerosol retrieval. The retrieval procedure uses precalculated aerosol microphysical models stored in look-up tables (LUTs) and top-of-atmosphere (TOA) spectral reflectance measurements to solve the aerosol characteristics. The forward model approximations cause systematic differences between the modelled and observed reflectance. Acknowledging this model discrepancy as a source of uncertainty allows us to produce more realistic uncertainty estimates and assists the selection of the most appropriate LUTs for each individual retrieval. This paper focuses on the aerosol microphysical model selection and characterisation of uncertainty in the retrieved aerosol type and aerosol optical depth (AOD). The concept of model evidence is used as a tool for model comparison. The method is based on a Bayesian inference approach, in which all uncertainties are described as a posterior probability distribution. When there is no single best-matching aerosol microphysical model, we use a statistical technique based on Bayesian model averaging to combine AOD posterior probability densities of the best-fitting models to obtain an averaged AOD estimate. We also determine the shared evidence of the best-matching models of a certain main aerosol type in order to quantify how plausible it is that it represents the underlying atmospheric aerosol conditions. The developed method is applied to Ozone Monitoring Instrument (OMI) measurements using a multiwavelength approach for retrieving the aerosol type and AOD estimate with uncertainty quantification for cloud-free over-land pixels. Several larger pixel set areas were studied in order to investigate the robustness of the developed method. We evaluated the retrieved AOD by comparison with ground-based measurements at example sites.
We found that the uncertainty of AOD expressed by the posterior probability distribution reflects the difficulty in model selection. The posterior probability distribution can provide a comprehensive characterisation of the uncertainty in this kind of aerosol-type selection problem. As a result, the proposed method can account for the model error and also include the model selection uncertainty in the total uncertainty budget.
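The Bayesian model averaging step described above, combining per-model AOD posteriors weighted by model evidence, can be sketched for the special case of Gaussian posteriors. The numbers below are invented for illustration; the paper's posteriors need not be Gaussian:

```python
import numpy as np

def bma_combine(log_evidence, means, sds):
    # Evidence-weighted mixture of Gaussian per-model posteriors
    log_evidence = np.asarray(log_evidence, float)
    means = np.asarray(means, float)
    sds = np.asarray(sds, float)
    w = np.exp(log_evidence - log_evidence.max())   # stable normalisation
    w /= w.sum()
    mean = np.sum(w * means)
    # mixture variance = within-model variance + between-model spread
    var = np.sum(w * (sds**2 + (means - mean) ** 2))
    return w, mean, np.sqrt(var)

# hypothetical AOD posteriors from two aerosol microphysical models
weights, aod, aod_sd = bma_combine([-10.0, -10.7], [0.20, 0.40], [0.05, 0.05])
```

Note that the averaged uncertainty exceeds the within-model uncertainties whenever the models disagree, which is exactly the "model selection uncertainty in the total uncertainty budget" idea.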
Bayesian Data-Model Fit Assessment for Structural Equation Modeling
ERIC Educational Resources Information Center
Levy, Roy
2011-01-01
Bayesian approaches to modeling are receiving an increasing amount of attention in the areas of model construction and estimation in factor analysis, structural equation modeling (SEM), and related latent variable models. However, model diagnostics and model criticism remain relatively understudied aspects of Bayesian SEM. This article describes…
Discriminative Bayesian Dictionary Learning for Classification.
Akhtar, Naveed; Shafait, Faisal; Mian, Ajmal
2016-12-01
We propose a Bayesian approach to learn discriminative dictionaries for sparse representation of data. The proposed approach infers probability distributions over the atoms of a discriminative dictionary using a finite approximation of Beta Process. It also computes sets of Bernoulli distributions that associate class labels to the learned dictionary atoms. This association signifies the selection probabilities of the dictionary atoms in the expansion of class-specific data. Furthermore, the non-parametric character of the proposed approach allows it to infer the correct size of the dictionary. We exploit the aforementioned Bernoulli distributions in separately learning a linear classifier. The classifier uses the same hierarchical Bayesian model as the dictionary, which we present along the analytical inference solution for Gibbs sampling. For classification, a test instance is first sparsely encoded over the learned dictionary and the codes are fed to the classifier. We performed experiments for face and action recognition; and object and scene-category classification using five public datasets and compared the results with state-of-the-art discriminative sparse representation approaches. Experiments show that the proposed Bayesian approach consistently outperforms the existing approaches.
Allele frequency changes due to hitch-hiking in genomic selection programs
2014-01-01
Background Genomic selection makes it possible to reduce pedigree-based inbreeding over best linear unbiased prediction (BLUP) by increasing emphasis on own rather than family information. However, pedigree inbreeding might not accurately reflect loss of genetic variation and the true level of inbreeding due to changes in allele frequencies and hitch-hiking. This study aimed at understanding the impact of using long-term genomic selection on changes in allele frequencies, genetic variation and level of inbreeding. Methods Selection was performed in simulated scenarios with a population of 400 animals for 25 consecutive generations. Six genetic models were considered with different heritabilities and numbers of QTL (quantitative trait loci) affecting the trait. Four selection criteria were used, including selection on own phenotype and on estimated breeding values (EBV) derived using phenotype-BLUP, genomic BLUP and Bayesian Lasso. Changes in allele frequencies at QTL, markers and linked neutral loci were investigated for the different selection criteria and different scenarios, along with the loss of favourable alleles and the rate of inbreeding measured by pedigree and runs of homozygosity. Results For each selection criterion, hitch-hiking in the vicinity of the QTL appeared more extensive when accuracy of selection was higher and the number of QTL was lower. When inbreeding was measured by pedigree information, selection on genomic BLUP EBV resulted in lower levels of inbreeding than selection on phenotype BLUP EBV, but this did not always apply when inbreeding was measured by runs of homozygosity. Compared to genomic BLUP, selection on EBV from Bayesian Lasso led to less genetic drift, reduced loss of favourable alleles and more effectively controlled the rate of both pedigree and genomic inbreeding in all simulated scenarios. 
In addition, selection on EBV from Bayesian Lasso showed a higher selection differential for Mendelian sampling terms than selection on genomic BLUP EBV. Conclusions Neutral variation can be shaped to a great extent by the hitch-hiking effects associated with selection, rather than just by genetic drift. When implementing long-term genomic selection, strategies for genomic control of inbreeding are essential, due to a considerable hitch-hiking effect, regardless of the method that is used for prediction of EBV.
Fan, Yue; Wang, Xiao; Peng, Qinke
2017-01-01
Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.
Bayesian Peak Picking for NMR Spectra
Cheng, Yichen; Gao, Xin; Liang, Faming
2013-01-01
Protein structure determination is a very important topic in structural genomics, which helps people to understand varieties of biological functions, such as protein-protein interactions, protein-DNA interactions and so on. Nowadays, nuclear magnetic resonance (NMR) is often used to determine the three-dimensional structures of proteins in vivo. This study aims to automate the peak picking step, the most important and tricky step in NMR structure determination. We propose to model the NMR spectrum by a mixture of bivariate Gaussian densities and use the stochastic approximation Monte Carlo algorithm as the computational tool to solve the problem. Under the Bayesian framework, the peak picking problem is cast as a variable selection problem. The proposed method can automatically distinguish true peaks from false ones without preprocessing the data. To the best of our knowledge, this is the first effort in the literature that tackles the peak picking problem for NMR spectrum data using a Bayesian method.
NASA Astrophysics Data System (ADS)
Skilling, John
2005-11-01
This tutorial gives a basic overview of Bayesian methodology, from its axiomatic foundation through the conventional development of data analysis and model selection to its role in quantum mechanics, ending with some comments on inference in general human affairs. The central theme is that probability calculus is the unique language within which we can develop models of our surroundings that have predictive capability. These models are patterns of belief; there is no need to claim external reality. Contents: 1. Logic and probability 2. Probability and inference 3. Probability and model selection 4. Prior probabilities 5. Probability and frequency 6. Probability and quantum mechanics 7. Probability and fundamentalism 8. Probability and deception 9. Prediction and truth
A Bayesian model for estimating population means using a link-tracing sampling design.
St Clair, Katherine; O'Connell, Daniel
2012-03-01
Link-tracing sampling designs can be used to study human populations that contain "hidden" groups who tend to be linked together by a common social trait. These links can be used to increase the sampling intensity of a hidden domain by tracing links from individuals selected in an initial wave of sampling to additional domain members. Chow and Thompson (2003, Survey Methodology 29, 197-205) derived a Bayesian model to estimate the size or proportion of individuals in the hidden population for certain link-tracing designs. We propose an addition to their model that will allow for the modeling of a quantitative response. We assess properties of our model using a constructed population and a real population of at-risk individuals, both of which contain two domains of hidden and nonhidden individuals. Our results show that our model can produce good point and interval estimates of the population mean and domain means when our population assumptions are satisfied.
Feng, Dai; Baumgartner, Richard; Svetnik, Vladimir
2018-04-05
The concordance correlation coefficient (CCC) is a widely used scaled index in the study of agreement. In this article, we propose estimating the CCC by a unified Bayesian framework that can (1) accommodate symmetric or asymmetric and light- or heavy-tailed data; (2) select model from several candidates; and (3) address other issues frequently encountered in practice such as confounding covariates and missing data. The performance of the proposal was studied and demonstrated using simulated as well as real-life biomarker data from a clinical study of an insomnia drug. The implementation of the proposal is accessible through a package in the Comprehensive R Archive Network.
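For reference, the moment (sample) estimator of Lin's CCC is short enough to sketch directly; this is the standard textbook estimator, not the unified Bayesian framework proposed in the article:

```python
import numpy as np

def concordance_cc(x, y):
    # Lin's concordance correlation coefficient (moment estimator):
    # 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    sxy = np.mean((x - mx) * (y - my))   # population covariance
    return 2.0 * sxy / (x.var() + y.var() + (mx - my) ** 2)
```

Perfect agreement gives a CCC of 1, while a constant location shift between the two measurements lowers it even when the correlation stays perfect.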
A Comparison of a Bayesian and a Maximum Likelihood Tailored Testing Procedure.
ERIC Educational Resources Information Center
McKinley, Robert L.; Reckase, Mark D.
A study was conducted to compare tailored testing procedures based on a Bayesian ability estimation technique and on a maximum likelihood ability estimation technique. The Bayesian tailored testing procedure selected items so as to minimize the posterior variance of the ability estimate distribution, while the maximum likelihood tailored testing…
Bayesian LASSO, scale space and decision making in association genetics.
Pasanen, Leena; Holmström, Lasse; Sillanpää, Mikko J
2015-01-01
LASSO is a penalized regression method that facilitates model fitting in situations where there are as many explanatory variables as observations, or even more, and only a few variables are relevant in explaining the data. We focus on the Bayesian version of LASSO and consider four problems that need special attention: (i) controlling false positives, (ii) multiple comparisons, (iii) collinearity among explanatory variables, and (iv) the choice of the tuning parameter that controls the amount of shrinkage and the sparsity of the estimates. The particular application considered is association genetics, where LASSO regression can be used to find links between chromosome locations and phenotypic traits in a biological organism. However, the proposed techniques are relevant also in other contexts where LASSO is used for variable selection. We separate the true associations from false positives using the posterior distribution of the effects (regression coefficients) provided by Bayesian LASSO. We propose to solve the multiple comparisons problem by using simultaneous inference based on the joint posterior distribution of the effects. Bayesian LASSO also tends to distribute an effect among collinear variables, making detection of an association difficult. We propose to solve this problem by considering not only individual effects but also their functionals (i.e. sums and differences). Finally, whereas in Bayesian LASSO the tuning parameter is often regarded as a random variable, we adopt a scale space view and consider a whole range of fixed tuning parameters instead. The effect estimates and the associated inference are considered for all tuning parameters in the selected range and the results are visualized with color maps that provide useful insights into data and the association problem considered. The methods are illustrated using two sets of artificial data and one real data set, all representing typical settings in association genetics.
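The scale-space idea, examining estimates across a whole range of fixed tuning parameters rather than a single chosen one, is easiest to see in the orthonormal-design case, where the (non-Bayesian) LASSO solution is plain soft thresholding. A minimal sketch with made-up effect sizes:

```python
import numpy as np

def soft_threshold(z, lam):
    # Closed-form LASSO solution for an orthonormal design
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([3.0, 1.2, 0.4, -2.5])    # hypothetical least-squares effects
lams = np.linspace(0.0, 3.0, 31)       # the whole range of tuning parameters
path = np.array([soft_threshold(z, lam) for lam in lams])
```

Each row of `path` is the effect vector at one tuning parameter, so the matrix can be rendered directly as a color map of effect versus shrinkage, the kind of scale-space visualization the abstract describes.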
High-throughput Bayesian Network Learning using Heterogeneous Multicore Computers
Linderman, Michael D.; Athalye, Vivek; Meng, Teresa H.; Asadi, Narges Bani; Bruggner, Robert; Nolan, Garry P.
2017-01-01
Aberrant intracellular signaling plays an important role in many diseases. The causal structure of signal transduction networks can be modeled as Bayesian Networks (BNs), and computationally learned from experimental data. However, learning the structure of BNs is an NP-hard problem that, even with fast heuristics, is too time consuming for large, clinically important networks (20-50 nodes). In this paper, we present a novel graphics processing unit (GPU)-accelerated implementation of a Markov chain Monte Carlo-based algorithm for learning BNs that is up to 7.5-fold faster than current general-purpose processor (GPP)-based implementations. The GPU-based implementation is just one of several implementations within the larger application, each optimized for a different input or machine configuration. We describe the methodology we use to build an extensible application, assembled from these variants, that can target a broad range of heterogeneous systems, e.g., GPUs and multicore GPPs. Specifically, we show how we use the Merge programming model to efficiently integrate, test and intelligently select among the different potential implementations.
NASA Astrophysics Data System (ADS)
Gong, Maozhen
Selecting an appropriate prior distribution is a fundamental issue in Bayesian statistics. In this dissertation, under the framework provided by Berger and Bernardo, I derive the reference priors for several models, which include: Analysis of Variance (ANOVA)/Analysis of Covariance (ANCOVA) models with a categorical variable under common ordering constraints, and the conditionally autoregressive (CAR) and simultaneous autoregressive (SAR) models with a spatial autoregression parameter rho. The performance of the reference priors for ANOVA/ANCOVA models is evaluated by simulation studies with comparisons to Jeffreys' prior and Least Squares Estimation (LSE). The priors are then illustrated in a Bayesian model of the "Risk of Type 2 Diabetes in New Mexico" data, where the relationship between the type 2 diabetes risk (through Hemoglobin A1c) and different smoking levels is investigated. In both the simulation studies and the real data modeling, the reference priors that incorporate internal order information show good performance and can be used as default priors. The reference priors for the CAR and SAR models are also illustrated in the "1999 SAT State Average Verbal Scores" data with a comparison to a Uniform prior distribution. Due to the complexity of the reference priors for both CAR and SAR models, only a portion (12 states in the Midwest) of the original data set is considered. The reference priors can give a different marginal posterior distribution compared to a Uniform prior, which provides an alternative for prior specifications for areal data in spatial statistics.
Simple summation rule for optimal fixation selection in visual search.
Najemnik, Jiri; Geisler, Wilson S
2009-06-01
When searching for a known target in a natural texture, practiced humans achieve near-optimal performance compared to a Bayesian ideal searcher constrained with the human map of target detectability across the visual field [Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387-391]. To do so, humans must be good at choosing where to fixate during the search [Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal strategy. Journal of Vision, 8(3):4, 1-14]; however, it seems unlikely that a biological nervous system would implement the computations for the Bayesian ideal fixation selection because of their complexity. Here we derive and test a simple heuristic for optimal fixation selection that appears to be a much better candidate for implementation within a biological nervous system. Specifically, we show that the near-optimal fixation location is the maximum of the current posterior probability distribution for target location after the distribution is filtered by (convolved with) the square of the retinotopic target detectability map. We term the model that uses this strategy the entropy limit minimization (ELM) searcher. We show that when constrained with a human-like retinotopic map of target detectability and human search error rates, the ELM searcher performs as well as the Bayesian ideal searcher, and produces fixation statistics similar to those of humans.
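The ELM rule stated above (fixate the maximum of the posterior convolved with the squared detectability map) is simple enough to sketch directly. The maps below are toy stand-ins, not the human detectability data from the study.

```python
import numpy as np

def filter_same(p, k):
    """'Same'-size 2D filtering by direct window sums; for a symmetric
    kernel this equals convolution."""
    kh, kw = k.shape[0] // 2, k.shape[1] // 2
    padded = np.pad(p, ((kh, kh), (kw, kw)))
    out = np.empty_like(p)
    for i in range(p.shape[0]):
        for j in range(p.shape[1]):
            out[i, j] = (padded[i:i + k.shape[0], j:j + k.shape[1]] * k).sum()
    return out

def elm_next_fixation(posterior, d2):
    """ELM rule: fixate the maximum of the posterior over target location
    filtered by the squared retinotopic detectability map d2."""
    score = filter_same(posterior, d2)
    return np.unravel_index(np.argmax(score), score.shape), score

# Toy example: foveated (centered) detectability map, posterior massed at one cell
g = np.arange(-10, 11)
yy, xx = np.meshgrid(g, g, indexing="ij")
d2 = np.exp(-(xx ** 2 + yy ** 2) / 20.0)   # squared detectability, peaked at fovea
posterior = np.zeros((41, 41))
posterior[30, 8] = 1.0                     # target most likely at this location
fix, score = elm_next_fixation(posterior, d2)
```

With a posterior concentrated at a single cell and a detectability map peaked at the fovea, the rule sends the next fixation to that cell, which matches the intuition that the searcher foveates the most probable target location.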
The decisive future of inflation
NASA Astrophysics Data System (ADS)
Hardwick, Robert J.; Vennin, Vincent; Wands, David
2018-05-01
How much more will we learn about single-field inflationary models in the future? We address this question in the context of Bayesian design and information theory. We develop a novel method to compute the expected utility of deciding between models and apply it to a set of futuristic measurements. This necessarily requires one to evaluate the Bayesian evidence many thousands of times over, which is numerically challenging. We show how this can be done using a number of simplifying assumptions and discuss their validity. We also modify the form of the expected utility, as previously introduced in the literature in different contexts, in order to partition each possible future into either the rejection of models at the level of the maximum likelihood or the decision between models using Bayesian model comparison. We then quantify the ability of future experiments to constrain the reheating temperature and the scalar running. Our approach allows us to discuss possible strategies for maximising information from future cosmological surveys. In particular, our conclusions suggest that, in the context of inflationary model selection, a decrease in the measurement uncertainty of the scalar spectral index would be more decisive than a decrease in the uncertainty in the tensor-to-scalar ratio. We have incorporated our approach into a publicly available Python class, foxi, that can be readily applied to any survey optimisation problem.
On the Adequacy of Bayesian Evaluations of Categorization Models: Reply to Vanpaemel and Lee (2012)
ERIC Educational Resources Information Center
Wills, Andy J.; Pothos, Emmanuel M.
2012-01-01
Vanpaemel and Lee (2012) argued, and we agree, that the comparison of formal models can be facilitated by Bayesian methods. However, Bayesian methods neither precede nor supplant our proposals (Wills & Pothos, 2012), as Bayesian methods can be applied both to our proposals and to their polar opposites. Furthermore, the use of Bayesian methods to…
Uncertainty aggregation and reduction in structure-material performance prediction
NASA Astrophysics Data System (ADS)
Hu, Zhen; Mahadevan, Sankaran; Ao, Dan
2018-02-01
An uncertainty aggregation and reduction framework is presented for structure-material performance prediction. Different types of uncertainty sources, structural analysis model, and material performance prediction model are connected through a Bayesian network for systematic uncertainty aggregation analysis. To reduce the uncertainty in the computational structure-material performance prediction model, Bayesian updating using experimental observation data is investigated based on the Bayesian network. It is observed that the Bayesian updating results will have large error if the model cannot accurately represent the actual physics, and that this error will be propagated to the predicted performance distribution. To address this issue, this paper proposes a novel uncertainty reduction method by integrating Bayesian calibration with model validation adaptively. The observation domain of the quantity of interest is first discretized into multiple segments. An adaptive algorithm is then developed to perform model validation and Bayesian updating over these observation segments sequentially. Only information from observation segments where the model prediction is highly reliable is used for Bayesian updating; this is found to increase the effectiveness and efficiency of uncertainty reduction. A composite rotorcraft hub component fatigue life prediction model, which combines a finite element structural analysis model and a material damage model, is used to demonstrate the proposed method.
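The adaptive scheme described above (discretize the observation domain, validate the model per segment, and update only with segments where the prediction is reliable) can be sketched with a conjugate normal update of a model-bias parameter. The data, segment edges, and reliability tolerance below are illustrative assumptions, not the rotorcraft application.

```python
import numpy as np

def segmented_bayes_update(obs, pred, edges, mu0, var0, noise_var, tol):
    """Discretize the observation domain into segments; do conjugate normal
    updating of a model-bias parameter using only segments where the
    prediction passes a simple validation check (mean |error| <= tol)."""
    mu, var = mu0, var0
    used = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (obs >= lo) & (obs < hi)
        if not m.any():
            continue
        err = obs[m] - pred[m]
        if np.abs(err).mean() > tol:          # validation fails: skip segment
            continue
        used.append((lo, hi))
        for e in err:                          # sequential conjugate updates
            var_new = 1.0 / (1.0 / var + 1.0 / noise_var)
            mu = var_new * (mu / var + e / noise_var)
            var = var_new
    return mu, var, used

rng = np.random.default_rng(1)
pred = np.linspace(0.0, 10.0, 200)
obs = pred + 0.5 + 0.1 * rng.standard_normal(200)   # true bias of 0.5
bad = (pred > 4) & (pred < 6)
obs[bad] += 3.0                                     # model badly wrong here
edges = np.linspace(0.0, 11.0, 12)
mu, var, used = segmented_bayes_update(obs, pred, edges, 0.0, 10.0, 0.01, 1.0)
```

Segments contaminated by the badly-wrong region fail validation and are excluded, so the posterior for the bias concentrates near the true value of 0.5 instead of being pulled off by unreliable observations.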
Gu, Hairong; Kim, Woojae; Hou, Fang; Lesmes, Luis Andres; Pitt, Mark A; Lu, Zhong-Lin; Myung, Jay I
2016-01-01
Measurement efficiency is of concern when a large number of observations are required to obtain reliable estimates for parametric models of vision. The standard entropy-based Bayesian adaptive testing procedures addressed the issue by selecting the most informative stimulus in sequential experimental trials. Noninformative, diffuse priors were commonly used in those tests. Hierarchical adaptive design optimization (HADO; Kim, Pitt, Lu, Steyvers, & Myung, 2014) further improves the efficiency of the standard Bayesian adaptive testing procedures by constructing an informative prior using data from observers who have already participated in the experiment. The present study represents an empirical validation of HADO in estimating the human contrast sensitivity function. The results show that HADO significantly improves the accuracy and precision of parameter estimates, and therefore requires many fewer observations to obtain reliable inference about contrast sensitivity, compared to the method of quick contrast sensitivity function (Lesmes, Lu, Baek, & Albright, 2010), which uses the standard Bayesian procedure. The improvement with HADO was maintained even when the prior was constructed from heterogeneous populations or a relatively small number of observers. The results of this case study support the conclusion that HADO can be used in Bayesian adaptive testing by replacing noninformative, diffuse priors with statistically justified informative priors without introducing unwanted bias. PMID:27105061
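The core HADO idea described above (replace a diffuse prior with an informative prior built from earlier observers' data) can be illustrated with a one-parameter conjugate normal model. The numbers are toy assumptions; the actual method operates on full contrast sensitivity function parameters.

```python
import numpy as np

def posterior_normal(prior_mu, prior_var, data, noise_var):
    """Conjugate normal posterior for a mean parameter with known noise."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mu = post_var * (prior_mu / prior_var + data.sum() / noise_var)
    return post_mu, post_var

rng = np.random.default_rng(0)
past = rng.normal(2.0, 0.3, 50)          # parameter estimates from prior observers
prior_mu, prior_var = past.mean(), past.var()

new_data = rng.normal(2.1, 1.0, 5)       # only a few trials from a new observer
mu_inf, var_inf = posterior_normal(prior_mu, prior_var, new_data, 1.0)
mu_dif, var_dif = posterior_normal(0.0, 100.0, new_data, 1.0)
# The informative prior yields a tighter posterior from the same few trials.
```

This mirrors the reported benefit: with an informative, population-derived prior, reliable inference needs many fewer observations than with a diffuse prior.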
2012-09-01
make end of life (EOL) and remaining useful life (RUL) estimations. Model-based prognostics approaches perform these tasks with the help of first... [figure residue: degradation modeling, parameter estimation, prediction, thermal/electrical stress, experimental data, state-space model, RUL, EOL] ...distribution at a given single time point k, and use this for multi-step predictions to EOL. There are several methods that exist for selecting the sigma
Applications of Bayesian spectrum representation in acoustics
NASA Astrophysics Data System (ADS)
Botts, Jonathan M.
This dissertation utilizes a Bayesian inference framework to enhance the solution of inverse problems where the forward model maps to acoustic spectra. A Bayesian solution to filter design inverts acoustic spectra to the pole-zero locations of a discrete-time filter model. Spatial sound field analysis with a spherical microphone array is a data analysis problem that requires inversion of spatio-temporal spectra to directions of arrival. As with many inverse problems, a probabilistic analysis results in richer solutions than can be achieved with ad-hoc methods. In the filter design problem, the Bayesian inversion results in globally optimal coefficient estimates as well as an estimate of the most concise filter capable of representing the given spectrum, within a single framework. This approach is demonstrated on synthetic spectra, head-related transfer function spectra, and measured acoustic reflection spectra. The Bayesian model-based analysis of spatial room impulse responses is presented as an analogous problem with an equally rich solution. The model selection mechanism provides an estimate of the number of arrivals, which is necessary to properly infer the directions of simultaneous arrivals. Although spectrum inversion problems are fairly ubiquitous, the scope of this dissertation has been limited to these two and derivative problems. The Bayesian approach to filter design is demonstrated on an artificial spectrum to illustrate the model comparison mechanism and then on measured head-related transfer functions to show the potential range of application. Coupled with sampling methods, the Bayesian approach is shown to outperform least-squares filter design methods commonly used in commercial software, confirming the need for a global search of the parameter space.
The resulting designs are shown to be comparable to those that result from global optimization methods, but the Bayesian approach has the added advantage of a filter length estimate within the same unified framework. The application to reflection data is useful for representing frequency-dependent impedance boundaries in finite difference acoustic simulations. Furthermore, since the filter transfer function is a parametric model, it can be modified to incorporate arbitrary frequency weighting and account for the band-limited nature of measured reflection spectra. Finally, the model is modified to compensate for dispersive error in the finite difference simulation, from the filter design process. Stemming from the filter boundary problem, the implementation of pressure sources in finite difference simulation is addressed in order to assure that schemes properly converge. A class of parameterized source functions is proposed and shown to offer straightforward control of residual error in the simulation. Guided by the notion that the solution to be approximated affects the approximation error, sources are designed which reduce residual dispersive error to the size of round-off errors. The early part of a room impulse response can be characterized by a series of isolated plane waves. Measured with an array of microphones, plane waves map to a directional response of the array or spatial intensity map. Probabilistic inversion of this response results in estimates of the number and directions of image source arrivals. The model-based inversion is shown to avoid ambiguities associated with peak-finding or inspection of the spatial intensity map. For this problem, determining the number of arrivals in a given frame is critical for properly inferring the state of the sound field. This analysis is effectively compression of the spatial room response, which is useful for analysis or encoding of the spatial sound field. 
Parametric, model-based formulations of these problems enhance the solution in all cases, and a Bayesian interpretation provides a principled approach to model comparison and parameter estimation.
Jones, Matt; Love, Bradley C
2011-08-01
The prominence of Bayesian modeling of cognition has increased recently largely because of mathematical advances in specifying and deriving predictions from complex probabilistic models. Much of this research aims to demonstrate that cognitive behavior can be explained from rational principles alone, without recourse to psychological or neurological processes and representations. We note commonalities between this rational approach and other movements in psychology - namely, Behaviorism and evolutionary psychology - that set aside mechanistic explanations or make use of optimality assumptions. Through these comparisons, we identify a number of challenges that limit the rational program's potential contribution to psychological theory. Specifically, rational Bayesian models are significantly unconstrained, both because they are uninformed by a wide range of process-level data and because their assumptions about the environment are generally not grounded in empirical measurement. The psychological implications of most Bayesian models are also unclear. Bayesian inference itself is conceptually trivial, but strong assumptions are often embedded in the hypothesis sets and the approximation algorithms used to derive model predictions, without a clear delineation between psychological commitments and implementational details. Comparing multiple Bayesian models of the same task is rare, as is the realization that many Bayesian models recapitulate existing (mechanistic level) theories. Despite the expressive power of current Bayesian models, we argue they must be developed in conjunction with mechanistic considerations to offer substantive explanations of cognition. We lay out several means for such an integration, which take into account the representations on which Bayesian inference operates, as well as the algorithms and heuristics that carry it out. 
We argue this unification will better facilitate lasting contributions to psychological theory, avoiding the pitfalls that have plagued previous theoretical movements.
Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data
Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.
2016-01-01
We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires many fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides a variable selection criterion and interpretation for each subset. We develop an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872
ERIC Educational Resources Information Center
Griffiths, Thomas L.; Chater, Nick; Norris, Dennis; Pouget, Alexandre
2012-01-01
Bowers and Davis (2012) criticize Bayesian modelers for telling "just so" stories about cognition and neuroscience. Their criticisms are weakened by not giving an accurate characterization of the motivation behind Bayesian modeling or the ways in which Bayesian models are used and by not evaluating this theoretical framework against specific…
Wavelet extractor: A Bayesian well-tie and wavelet extraction program
NASA Astrophysics Data System (ADS)
Gunning, James; Glinsky, Michael E.
2006-06-01
We introduce a new open-source toolkit for the well-tie or wavelet extraction problem of estimating seismic wavelets from seismic data, time-to-depth information, and well-log suites. The wavelet extraction model is formulated as a Bayesian inverse problem, and the software will simultaneously estimate wavelet coefficients, other parameters associated with uncertainty in the time-to-depth mapping, positioning errors in the seismic imaging, and useful amplitude-variation-with-offset (AVO) related parameters in multi-stack extractions. It is capable of multi-well, multi-stack extractions, and uses continuous seismic data-cube interpolation to cope with the problem of arbitrary well paths. Velocity constraints in the form of checkshot data, interpreted markers, and sonic logs are integrated in a natural way. The Bayesian formulation allows computation of full posterior uncertainties of the model parameters, and the important problem of the uncertain wavelet span is addressed using a multi-model posterior developed from Bayesian model selection theory. The wavelet extraction tool is distributed as part of the Delivery seismic inversion toolkit. A simple log and seismic viewing tool is included in the distribution. The code is written in Java, and thus platform-independent, but the Seismic Unix (SU) data model makes the inversion particularly suited to Unix/Linux environments. It is a natural companion piece of software to Delivery, having the capacity to produce maximum likelihood wavelet and noise estimates, but will also be of significant utility to practitioners wanting to produce wavelet estimates for other inversion codes or purposes. The generation of full parameter uncertainties is a crucial function for workers wishing to investigate questions of wavelet stability before proceeding to more advanced inversion studies.
NASA Technical Reports Server (NTRS)
Shih, Ann T.; Ancel, Ersin; Jones, Sharon M.
2012-01-01
The concern for reducing aviation safety risk is rising as the National Airspace System in the United States transforms to the Next Generation Air Transportation System (NextGen). The NASA Aviation Safety Program is committed to developing an effective aviation safety technology portfolio to meet the challenges of this transformation and to mitigate relevant safety risks. The paper focuses on the reasoning behind selecting Object-Oriented Bayesian Networks (OOBN) as the technique, and on the commercial software used, for the accident modeling and portfolio assessment. To illustrate the benefits of OOBN in a large and complex aviation accident model, the in-flight Loss-of-Control Accident Framework (LOCAF) constructed as an influence diagram is presented. An OOBN approach not only simplifies construction and maintenance of complex causal networks for the modelers, but also offers a well-organized hierarchical network that makes it easier for decision makers to use the model to examine the effectiveness of risk mitigation strategies through technology insertions.
Zou, W; Ouyang, H
2016-02-01
We propose a multiple estimation adjustment (MEA) method to correct effect overestimation due to selection bias from a hypothesis-generating study (HGS) in pharmacogenetics. MEA uses a hierarchical Bayesian approach to model individual effect estimates from maximal likelihood estimation (MLE) in a region jointly and shrinks them toward the regional effect. Unlike many methods that model a fixed selection scheme, MEA capitalizes on local multiplicity independent of selection. We compared mean square errors (MSEs) in simulated HGSs from naive MLE, MEA and a conditional likelihood adjustment (CLA) method that models threshold selection bias. We observed that MEA effectively reduced MSE from MLE on null effects with or without selection, and had a clear advantage over CLA on extreme MLE estimates from null effects under lenient threshold selection in small samples, which are common among 'top' associations from a pharmacogenetics HGS.
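The shrinkage-toward-a-regional-effect idea above can be sketched with a simple normal-normal hierarchical model, using a method-of-moments estimate of the between-effect variance as an empirical-Bayes stand-in for the full hierarchical fit. The numbers and function name are illustrative assumptions, not the MEA implementation.

```python
import numpy as np

def shrink_to_region(est, se):
    """Hierarchical normal-normal shrinkage: individual estimates `est`
    with standard errors `se` are pulled toward the regional mean, with
    the between-estimate variance tau2 set by method of moments."""
    est, se = np.asarray(est, float), np.asarray(se, float)
    mu = est.mean()                                      # regional effect
    tau2 = max(est.var(ddof=1) - (se ** 2).mean(), 0.0)  # between-effect variance
    w = tau2 / (tau2 + se ** 2)                          # shrinkage weight
    return mu + w * (est - mu)

# Mostly-null effects with noise: the extreme estimate (a plausible
# selection artifact) is shrunk toward the regional mean.
est = [0.1, -0.2, 1.8, 0.0, -0.1]
se = [0.5, 0.5, 0.5, 0.5, 0.5]
adj = shrink_to_region(est, se)
```

The extreme estimate is pulled in hardest in relative terms, which is the mechanism by which this family of adjustments reduces MSE on null effects inflated by selection.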
Lee, Kyu Ha; Tadesse, Mahlet G; Baccarelli, Andrea A; Schwartz, Joel; Coull, Brent A
2017-03-01
The analysis of multiple outcomes is becoming increasingly common in modern biomedical studies. It is well-known that joint statistical models for multiple outcomes are more flexible and more powerful than fitting a separate model for each outcome; they yield more powerful tests of exposure or treatment effects by taking into account the dependence among outcomes and pooling evidence across outcomes. It is, however, unlikely that all outcomes are related to the same subset of covariates. Therefore, there is interest in identifying exposures or treatments associated with particular outcomes, which we term outcome-specific variable selection. In this work, we propose a variable selection approach for multivariate normal responses that incorporates not only information on the mean model, but also information on the variance-covariance structure of the outcomes. The approach effectively leverages evidence from all correlated outcomes to estimate the effect of a particular covariate on a given outcome. To implement this strategy, we develop a Bayesian method that builds a multivariate prior for the variable selection indicators based on the variance-covariance of the outcomes. We show via simulation that the proposed variable selection strategy can boost power to detect subtle effects without increasing the probability of false discoveries. We apply the approach to the Normative Aging Study (NAS) epigenetic data and identify a subset of five genes in the asthma pathway for which gene-specific DNA methylations are associated with exposures to either black carbon, a marker of traffic pollution, or sulfate, a marker of particles generated by power plants. © 2016, The International Biometric Society.
Model-Averaged ℓ1 Regularization using Markov Chain Monte Carlo Model Composition
Fraley, Chris; Percival, Daniel
2014-01-01
Bayesian Model Averaging (BMA) is an effective technique for addressing model uncertainty in variable selection problems. However, current BMA approaches have computational difficulty dealing with data in which there are many more measurements (variables) than samples. This paper presents a method for combining ℓ1 regularization and Markov chain Monte Carlo model composition techniques for BMA. By treating the ℓ1 regularization path as a model space, we propose a method to resolve the model uncertainty issues arising in model averaging from solution path point selection. We show that this method is computationally and empirically effective for regression and classification in high-dimensional datasets. We apply our technique in simulations, as well as to some applications that arise in genomics. PMID:25642001
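Markov chain Monte Carlo model composition (MC3), the sampler underlying the approach above, can be sketched as a random walk over variable subsets with moves accepted by a BIC-approximated Bayes factor. This is a generic MC3 sketch on simulated data, not the paper's ℓ1-path model space; all names and settings are assumptions.

```python
import numpy as np

def bic(X, y, subset):
    """BIC of a linear model restricted to the given variable subset."""
    n = len(y)
    if subset:
        Xs = X[:, subset]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
    else:
        resid = y
    k = len(subset) + 1
    return n * np.log(resid @ resid / n) + k * np.log(n)

def mc3(X, y, n_steps=2000, seed=0):
    """MC3: flip one variable's inclusion per step; accept with
    probability min(1, exp(-(BIC_new - BIC_old)/2)), an approximate
    Bayes factor. Returns posterior inclusion probabilities."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    cur, cur_bic = [], bic(X, y, [])
    visits = np.zeros(p)
    for _ in range(n_steps):
        j = rng.integers(p)
        prop = sorted(set(cur) ^ {int(j)})
        prop_bic = bic(X, y, prop)
        if np.log(rng.random()) < (cur_bic - prop_bic) / 2:
            cur, cur_bic = prop, prop_bic
        visits[cur] += 1
    return visits / n_steps

rng = np.random.default_rng(0)
n, p = 120, 8
X = rng.standard_normal((n, p))
y = 1.5 * X[:, 1] - 2.0 * X[:, 5] + rng.standard_normal(n)
pip = mc3(X, y)     # high inclusion probability expected for variables 1 and 5
```

Model-averaged quantities (e.g., averaged coefficients) would then be computed by weighting each visited model by its visit frequency, which is what distinguishes BMA from selecting a single best model.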
Fang, Jiansong; Yang, Ranyao; Gao, Li; Zhou, Dan; Yang, Shengqian; Liu, Ai-Lin; Du, Guan-hua
2013-11-25
Butyrylcholinesterase (BuChE, EC 3.1.1.8) is an important pharmacological target for Alzheimer's disease (AD) treatment. However, the currently available BuChE inhibitor screening assays are expensive, labor-intensive, and compound-dependent. It is necessary to develop robust in silico methods to predict the activities of BuChE inhibitors for lead identification. In this investigation, support vector machine (SVM) models and naive Bayesian models were built to discriminate BuChE inhibitors (BuChEIs) from the noninhibitors. Each molecule was initially represented by 1,870 structural descriptors (1235 from ADRIANA.Code, 334 from MOE, and 301 from Discovery Studio). Correlation analysis and a stepwise variable selection method were applied to identify activity-related descriptors for the prediction models. Additionally, structural fingerprint descriptors were added to improve the predictive ability of the models, which was measured by cross-validation, a test set validation with 1001 compounds, and an external test set validation with 317 diverse chemicals. The best two models gave Matthews correlation coefficients of 0.9551 and 0.9550 for the test set and 0.9132 and 0.9221 for the external test set. To demonstrate the practical applicability of the models in virtual screening, we screened an in-house data set with 3601 compounds, and 30 compounds were selected for further bioactivity assay. The assay results showed that 10 out of 30 compounds exerted significant BuChE inhibitory activities with IC50 values ranging from 0.32 to 22.22 μM, among which three new scaffolds as BuChE inhibitors were identified for the first time. To the best of our knowledge, this is the first report on BuChE inhibitors using machine learning approaches. The models generated from SVM and naive Bayesian approaches successfully predicted BuChE inhibitors. The study proved the feasibility of a new method for predicting bioactivities of ligands and discovering novel lead compounds.
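A naive Bayesian classifier on binary fingerprint bits, of the general kind used above to separate inhibitors from noninhibitors, can be sketched as Bernoulli naive Bayes with Laplace smoothing. The synthetic fingerprints below are assumptions for illustration; the study used real structural descriptors and fingerprints.

```python
import numpy as np

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Bernoulli naive Bayes with Laplace smoothing on 0/1 features."""
    classes = np.unique(y)
    log_priors = np.log([(y == c).mean() for c in classes])
    # Per-class probability that each fingerprint bit is set
    probs = np.array([(X[y == c].sum(axis=0) + alpha) /
                      ((y == c).sum() + 2 * alpha) for c in classes])
    return classes, log_priors, probs

def predict(model, X):
    classes, log_priors, probs = model
    ll = X @ np.log(probs).T + (1 - X) @ np.log(1 - probs).T
    return classes[np.argmax(ll + log_priors, axis=1)]

# Toy data: bit 0 is informative (mostly set for "inhibitors"), rest is noise
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)
X = rng.random((n, 8)) < 0.2
X[:, 0] = np.where(y == 1, rng.random(n) < 0.9, rng.random(n) < 0.1)
X = X.astype(float)
model = fit_bernoulli_nb(X, y)
acc = (predict(model, X) == y).mean()
```

The smoothing keeps every bit probability strictly inside (0, 1), so unseen bit patterns never zero out a class likelihood, which matters when screening diverse external chemicals.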
Bayesian Regression with Network Prior: Optimal Bayesian Filtering Perspective
Qian, Xiaoning; Dougherty, Edward R.
2017-01-01
The recently introduced intrinsically Bayesian robust filter (IBRF) provides fully optimal filtering relative to a prior distribution over an uncertainty class of joint random process models, whereas formerly the theory was limited to model-constrained Bayesian robust filters, for which optimization was limited to the filters that are optimal for models in the uncertainty class. This paper extends the IBRF theory to the situation where there are both a prior on the uncertainty class and sample data. The result is optimal Bayesian filtering (OBF), where optimality is relative to the posterior distribution derived from the prior and the data. The IBRF theories for effective characteristics and canonical expansions extend to the OBF setting. A salient focus of the present work is to demonstrate the advantages of Bayesian regression within the OBF setting over the classical Bayesian approach in the context of linear Gaussian models. PMID:28824268
Modeling Diagnostic Assessments with Bayesian Networks
ERIC Educational Resources Information Center
Almond, Russell G.; DiBello, Louis V.; Moulder, Brad; Zapata-Rivera, Juan-Diego
2007-01-01
This paper defines Bayesian network models and examines their applications to IRT-based cognitive diagnostic modeling. These models are especially suited to building inference engines designed to be synchronous with the finer grained student models that arise in skills diagnostic assessment. Aspects of the theory and use of Bayesian network models…
NASA Astrophysics Data System (ADS)
Wheeler, David C.; Waller, Lance A.
2009-03-01
In this paper, we compare and contrast a Bayesian spatially varying coefficient process (SVCP) model with a geographically weighted regression (GWR) model for the estimation of the potentially spatially varying regression effects of alcohol outlets and illegal drug activity on violent crime in Houston, Texas. In addition, we focus on the inherent coefficient shrinkage properties of the Bayesian SVCP model as a way to address increased coefficient variance that follows from collinearity in GWR models. We outline the advantages of the Bayesian model in terms of reducing inflated coefficient variance, enhanced model flexibility, and more formal measuring of model uncertainty for prediction. We find spatially varying effects for alcohol outlets and drug violations, but the amount of variation depends on the type of model used. For the Bayesian model, this variation is controllable through the amount of prior influence placed on the variance of the coefficients. For example, the spatial pattern of coefficients is similar for the GWR and Bayesian models when a relatively large prior variance is used in the Bayesian model.
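The GWR side of the comparison above is straightforward to sketch: at each location, fit weighted least squares with Gaussian kernel weights that decay with distance, yielding spatially varying coefficients. The simulated surface and bandwidth are assumptions for illustration, not the Houston data.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Geographically weighted regression: one kernel-weighted least
    squares fit per location, giving spatially varying coefficients."""
    betas = []
    for c in coords:
        d2 = ((coords - c) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))   # Gaussian kernel weights
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        betas.append(beta)
    return np.array(betas)

# Toy surface: the slope on x1 drifts linearly with the east-west coordinate
rng = np.random.default_rng(0)
n = 150
coords = rng.random((n, 2))
x1 = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x1])
slope = 1.0 + 2.0 * coords[:, 0]                 # spatially varying effect
y = slope * x1 + 0.1 * rng.standard_normal(n)
betas = gwr_coefficients(coords, X, y, bandwidth=0.2)
```

In the Bayesian SVCP alternative discussed above, the analogous smoothing comes from the prior on the coefficient process rather than from a distance kernel, which is what gives the analyst explicit control over coefficient variance.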
Philosophy and the practice of Bayesian statistics
Gelman, Andrew; Shalizi, Cosma Rohilla
2015-01-01
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. PMID:22364575
Advances in the Application of Decision Theory to Test-Based Decision Making.
ERIC Educational Resources Information Center
van der Linden, Wim J.
This paper reviews recent research in the Netherlands on the application of decision theory to test-based decision making about personnel selection and student placement. The review is based on an earlier model proposed for the classification of decision problems, and emphasizes an empirical Bayesian framework. Classification decisions with…
ERIC Educational Resources Information Center
Kim, Deok-Hwan; Chung, Chin-Wan
2003-01-01
Discusses the collection fusion problem of image databases, concerned with retrieving relevant images by content based retrieval from image databases distributed on the Web. Focuses on a metaserver which selects image databases supporting similarity measures and proposes a new algorithm which exploits a probabilistic technique using Bayesian…
ERIC Educational Resources Information Center
van de Schoot, Rens; Hoijtink, Herbert; Mulder, Joris; Van Aken, Marcel A. G.; Orobio de Castro, Bram; Meeus, Wim; Romeijn, Jan-Willem
2011-01-01
Researchers often have expectations about the research outcomes in regard to inequality constraints between, e.g., group means. Consider the example of researchers who investigated the effects of inducing a negative emotional state in aggressive boys. It was expected that highly aggressive boys would, on average, score higher on aggressive…
USDA-ARS's Scientific Manuscript database
Single-step Genomic Best Linear Unbiased Predictor (ssGBLUP) has become increasingly popular for whole-genome prediction (WGP) modeling as it utilizes any available pedigree and phenotypes on both genotyped and non-genotyped individuals. The WGP accuracy of ssGBLUP has been demonstrated to be greate...
USDA-ARS's Scientific Manuscript database
Background Several studies have examined the accuracy of genomic selection both within and across purebred beef or dairy populations. However, the accuracy of direct genomic breeding values (DGVs) has been less well studied in crossbred or admixed cattle populations. We used a population of 3,240 cr...
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Factors affecting GEBV accuracy with single-step Bayesian models.
Zhou, Lei; Mrode, Raphael; Zhang, Shengli; Zhang, Qin; Li, Bugao; Liu, Jian-Feng
2018-01-01
A single-step approach to obtaining genomic predictions was first proposed in 2009. Many studies have investigated the components of GEBV accuracy in genomic selection. However, it is still unclear how the population structure and the relationships between training and validation populations influence GEBV accuracy in single-step analysis. Here, we explored the components of GEBV accuracy in single-step Bayesian analysis with a simulation study. Three scenarios with various numbers of QTL (5, 50, and 500) were simulated. Three models were implemented to analyze the simulated data: single-step genomic best linear unbiased prediction (GBLUP; SSGBLUP), single-step BayesA (SS-BayesA), and single-step BayesB (SS-BayesB). According to our results, GEBV accuracy was influenced by the relationships between the training and validation populations more strongly for ungenotyped animals than for genotyped animals. SS-BayesA/BayesB showed a clear advantage over SSGBLUP in the 5- and 50-QTL scenarios. The SS-BayesB model obtained the lowest accuracy in the 500-QTL scenario. The SS-BayesA model was the most efficient and robust across all QTL scenarios. Generally, both the relationships between training and validation populations and the LD between markers and QTL contributed to GEBV accuracy in the single-step analysis, and the advantages of single-step Bayesian models were more apparent when the trait was controlled by fewer QTL.
Bayesian inference based on dual generalized order statistics from the exponentiated Weibull model
NASA Astrophysics Data System (ADS)
Al Sobhi, Mashail M.
2015-02-01
Bayesian estimates of the two parameters and the reliability function of the exponentiated Weibull model are obtained based on dual generalized order statistics (DGOS). Bayesian prediction bounds for future DGOS from the exponentiated Weibull model are also obtained. Symmetric and asymmetric loss functions are considered for the Bayesian computations. Markov chain Monte Carlo (MCMC) methods are used for computing the Bayes estimates and prediction bounds. The results are specialized to lower record values. Comparisons are made between Bayesian and maximum likelihood estimators via Monte Carlo simulation.
Fundamentals and Recent Developments in Approximate Bayesian Computation
Lintusaari, Jarno; Gutmann, Michael U.; Dutta, Ritabrata; Kaski, Samuel; Corander, Jukka
2017-01-01
Abstract Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. [ABC; approximate Bayesian computation; Bayesian inference; likelihood-free inference; phylogenetics; simulator-based models; stochastic simulation models; tree-based models.] PMID:28175922
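The core rejection-ABC algorithm the review describes can be sketched in a few lines. The following is a minimal illustration rather than any of the reviewed algorithms: the toy model (Normal(mu, 1) data summarised by the sample mean), the flat prior on [-2, 2], and the tolerance are all assumptions chosen for the example. Only forward simulation from the model is required, which is the defining feature of ABC.

```python
import random

def abc_rejection(observed_mean, n_obs, n_draws=20000, tol=0.05, seed=1):
    """Rejection ABC for the mean of a Normal(mu, 1) model.

    Draw mu from a flat prior on [-2, 2], simulate a dataset, and keep
    mu whenever the simulated sample mean is within `tol` of the
    observed sample mean. No likelihood evaluation is needed.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-2.0, 2.0)                       # prior draw
        sim = [rng.gauss(mu, 1.0) for _ in range(n_obs)]  # forward simulation
        if abs(sum(sim) / n_obs - observed_mean) < tol:   # summary-statistic match
            accepted.append(mu)
    return accepted

# Toy data summarised by a sample mean of 0.5 from 50 observations.
posterior = abc_rejection(observed_mean=0.5, n_obs=50)
estimate = sum(posterior) / len(posterior)
```

The accepted draws approximate the posterior of mu; their mean should land near the observed sample mean, with spread reflecting both the data and the tolerance.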
NASA Astrophysics Data System (ADS)
Rajabi, Mohammad Mahdi; Ataie-Ashtiani, Behzad
2016-05-01
Bayesian inference has traditionally been conceived as the proper framework for the formal incorporation of expert knowledge in parameter estimation of groundwater models. However, conventional Bayesian inference is incapable of taking into account the imprecision essentially embedded in expert-provided information. To solve this problem, a number of extensions to conventional Bayesian inference have been introduced in recent years. One of these extensions is 'fuzzy Bayesian inference', the result of integrating fuzzy techniques into Bayesian statistics. Fuzzy Bayesian inference has a number of desirable features which make it an attractive approach for incorporating expert knowledge in the parameter estimation process of groundwater models: (1) it is well adapted to the nature of expert-provided information, (2) it allows uncertainty and imprecision to be modeled distinctly, and (3) it presents a framework for fusing expert-provided information regarding the various inputs of the Bayesian inference algorithm. However, an important obstacle to employing fuzzy Bayesian inference in groundwater numerical modeling applications is the computational burden, as the required number of numerical model simulations is often prohibitively large. In this paper, a novel approach to accelerating the fuzzy Bayesian inference algorithm is proposed, based on using approximate posterior distributions derived from surrogate modeling as a screening tool in the computations. The proposed approach is first applied to a synthetic test case of seawater intrusion (SWI) in a coastal aquifer. It is shown that for this synthetic test case, the proposed approach decreases the number of required numerical simulations by an order of magnitude. The proposed approach is then applied to a real-world test case involving three-dimensional numerical modeling of SWI in Kish Island, located in the Persian Gulf. An expert elicitation methodology is developed and applied to the real-world test case in order to provide a road map for the use of fuzzy Bayesian inference in groundwater modeling applications.
Probabilistic selection of high-redshift quasars
NASA Astrophysics Data System (ADS)
Mortlock, Daniel J.; Patel, Mitesh; Warren, Stephen J.; Hewett, Paul C.; Venemans, Bram P.; McMahon, Richard G.; Simpson, Chris
2012-01-01
High-redshift quasars (HZQs) with redshifts of z ≳ 6 are so rare that any photometrically selected sample of sources with HZQ-like colours is likely to be dominated by Galactic stars and brown dwarfs scattered from the stellar locus. It is impractical to re-observe all such candidates, so an alternative approach was developed in which Bayesian model comparison techniques are used to calculate the probability that a candidate is a HZQ, Pq, by combining models of the quasar and star populations with the photometric measurements of the object. This method was motivated specifically by the large number of HZQ candidates identified by cross-matching the UKIRT (United Kingdom Infrared Telescope) Infrared Deep Sky Survey (UKIDSS) Large Area Survey (LAS) to the Sloan Digital Sky Survey (SDSS): in the ? covered by the LAS in the UKIDSS Eighth Data Release (DR8) there are ~9 × 10³ real astronomical point sources with the measured colours of the target quasars, of which only ~10 are expected to be HZQs. Applying Bayesian model comparison to the sample reveals that most sources with HZQ-like colours have Pq ≲ 0.1 and can be confidently rejected without the need for any further observations. In the case of the UKIDSS DR8 LAS, there were just 107 candidates with Pq ≥ 0.1; these objects were prioritized for re-observation by ranking according to Pq (and their likely redshift, which was also inferred from the photometric data). Most candidates were rejected after one or two (moderate-depth) photometric measurements by recalculating Pq using the new data. That left 12 confirmed HZQs, six of which were previously identified in the SDSS and six of which were new UKIDSS discoveries. The high efficiency of this Bayesian selection method suggests that it could usefully be extended to other HZQ surveys (e.g. searches by the Panoramic Survey Telescope And Rapid Response System, Pan-STARRS, or the Visible and Infrared Survey Telescope for Astronomy, VISTA) as well as to other searches for rare objects.
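The Bayesian model comparison at the heart of this selection can be sketched as a two-model posterior probability. The sketch below is a deliberately simplified stand-in: it scores a single hypothetical "colour" measurement under one-dimensional Gaussian population models, whereas the real method combines full photometric models of the quasar and star populations; all numbers other than the ~10-in-9000 prior odds quoted in the abstract are invented.

```python
import math

def membership_probability(colour, quasar_model, star_model, prior_q):
    """Posterior probability that an object is a quasar, via Bayesian
    model comparison of two population models. Both models here are
    hypothetical Gaussians (mean, sd) over one colour measurement.
    """
    def gauss_pdf(x, mu, sigma):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    lq = gauss_pdf(colour, *quasar_model)  # likelihood under quasar population
    ls = gauss_pdf(colour, *star_model)    # likelihood under star population
    # Bayes' rule: weight each likelihood by the population prior.
    return prior_q * lq / (prior_q * lq + (1 - prior_q) * ls)

# Hypothetical populations: quasar colours centred at 2.0, star colours
# at 0.0, with ~10 quasars expected among ~9000 candidates as the prior.
pq = membership_probability(colour=2.5, quasar_model=(2.0, 0.5),
                            star_model=(0.0, 0.5), prior_q=10 / 9000)
pq_starlike = membership_probability(colour=0.5, quasar_model=(2.0, 0.5),
                                     star_model=(0.0, 0.5), prior_q=10 / 9000)
```

Even with a prior of roughly one in a thousand, a measurement far from the stellar locus can drive Pq close to 1, while a star-like measurement leaves Pq near 0 and the candidate can be rejected without re-observation.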
Non-ignorable missingness in logistic regression.
Wang, Joanna J J; Bartlett, Mark; Ryan, Louise
2017-08-30
Nonresponses and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimation, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non-ignorable missingness where the probability of nonresponse depends on the outcome. Using a simple case of logistic regression, we quantify the bias in regression estimates and show the observed likelihood is non-identifiable under non-ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in estimated parameters and the robustness of study conclusions against different assumptions. A Bayesian framework for model estimation is used as it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to follow-up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality-of-life. Copyright © 2017 John Wiley & Sons, Ltd.
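The bias mechanism the authors quantify can be seen in a toy simulation: when the probability of responding depends on the outcome itself, the complete-case estimate is biased. Everything below is invented for illustration (a true prevalence of 0.3, and hypothetical response rates of 50% for cases and 90% for non-cases); it is not the 45 and Up Study data or the paper's selection model.

```python
import random

def simulate_nonignorable_bias(n=100000, p_true=0.3, seed=7):
    """Simulate outcome-dependent nonresponse and return the naive
    complete-case estimate of P(Y = 1). Response probabilities are
    hypothetical: subjects with Y = 1 respond 50% of the time, Y = 0
    subjects 90% of the time, so responders under-represent cases.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        y = 1 if rng.random() < p_true else 0
        p_respond = 0.5 if y == 1 else 0.9  # nonresponse depends on the outcome
        if rng.random() < p_respond:
            observed.append(y)
    return sum(observed) / len(observed)

naive = simulate_nonignorable_bias()
# Expected complete-case value: 0.3*0.5 / (0.3*0.5 + 0.7*0.9) ~= 0.192,
# well below the true prevalence of 0.3.
```

The complete-case estimate converges to about 0.19 rather than 0.3, which is exactly the kind of bias a selection-model factorisation with sensitivity analysis is meant to expose and correct.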
Honey Bee Location- and Time-Linked Memory Use in Novel Foraging Situations: Floral Color Dependency
Amaya-Márquez, Marisol; Hill, Peggy S. M.; Abramson, Charles I.; Wells, Harrington
2014-01-01
Learning facilitates behavioral plasticity, leading to higher success rates when foraging. However, memory is of decreasing value with changes brought about by moving to novel resource locations or activity at different times of the day. These premises suggest a foraging model with location- and time-linked memory. Thus, each problem is novel, and selection should favor a maximum likelihood approach to achieve energy maximization results. Alternatively, information is potentially always applicable. This premise suggests a different foraging model, one where initial decisions should be based on previous learning regardless of the foraging site or time. Under this second model, no problem is considered novel, and selection should favor a Bayesian or pseudo-Bayesian approach to achieve energy maximization results. We tested these two models by offering honey bees a learning situation at one location in the morning, where nectar rewards differed between flower colors, and examined their behavior at a second location in the afternoon where rewards did not differ between flower colors. Both blue-yellow and blue-white dimorphic flower patches were used. Information learned in the morning was clearly used in the afternoon at a new foraging site. Memory was not location-time restricted in terms of use when visiting either flower color dimorphism. PMID:26462587
Liu, Zun-lei; Yuan, Xing-wei; Yang, Lin-lin; Yan, Li-ping; Zhang, Hui; Cheng, Jia-hua
2015-02-01
Multiple hypotheses are available to explain recruitment rate. Model selection methods can be used to identify the best model that supports a particular hypothesis. However, using a single model for estimating recruitment success is often inadequate for overexploited populations because of high model uncertainty. In this study, stock-recruitment data of small yellow croaker in the East China Sea collected from fishery-dependent and independent surveys between 1992 and 2012 were used to examine density-dependent effects on recruitment success. Model selection methods based on frequentist (AIC, maximum adjusted R2 and P-values) and Bayesian (Bayesian model averaging, BMA) approaches were applied to identify the relationship between recruitment and environmental conditions. Interannual variability of the East China Sea environment was indicated by sea surface temperature (SST), meridional wind stress (MWS), zonal wind stress (ZWS), sea surface pressure (SPP) and runoff of the Changjiang River (RCR). Mean absolute error, mean squared predictive error and continuous ranked probability score were calculated to evaluate the predictive performance for recruitment success. The results showed that model structures were not consistent across the three model selection methods: the selected predictors were spawning abundance and MWS under AIC; spawning abundance alone under P-values; and spawning abundance, MWS and RCR under maximum adjusted R2. Recruitment success decreased linearly with stock abundance (P < 0.01), suggesting that the overcompensation effect in recruitment success might be due to cannibalism or food competition. Meridional wind intensity showed marginally significant and positive effects on recruitment success (P = 0.06), while runoff of the Changjiang River showed a marginally negative effect (P = 0.07). Based on mean absolute error and continuous ranked probability score, the predictive error of the models obtained from BMA was the smallest among the approaches, while that of the models selected by the P-values of the independent variables was the highest; by mean squared predictive error, however, the models selected by maximum adjusted R2 had the highest error. We found that the BMA method could improve the prediction of recruitment success, derive more accurate prediction intervals and quantitatively evaluate model uncertainty.
Analyzing Single-Molecule Time Series via Nonparametric Bayesian Inference
Hines, Keegan E.; Bankston, John R.; Aldrich, Richard W.
2015-01-01
The ability to measure the properties of proteins at the single-molecule level offers an unparalleled glimpse into biological systems at the molecular scale. The interpretation of single-molecule time series has often been rooted in statistical mechanics and the theory of Markov processes. While existing analysis methods have been useful, they are not without significant limitations including problems of model selection and parameter nonidentifiability. To address these challenges, we introduce the use of nonparametric Bayesian inference for the analysis of single-molecule time series. These methods provide a flexible way to extract structure from data instead of assuming models beforehand. We demonstrate these methods with applications to several diverse settings in single-molecule biophysics. This approach provides a well-constrained and rigorously grounded method for determining the number of biophysical states underlying single-molecule data. PMID:25650922
Fully Bayesian tests of neutrality using genealogical summary statistics.
Drummond, Alexei J; Suchard, Marc A
2008-10-31
Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However, questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequently, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies and we demonstrate the utility of our method on four real data sets, identifying significant departures from neutrality in human influenza A virus, even after controlling for variation in population size. Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously neglected owing to the limited availability of theory and methods.
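The posterior predictive machinery the authors harness can be sketched for a far simpler model than a coalescent: draw the parameter from its posterior, simulate replicate data, and compare a summary statistic of the replicates with its observed value. The Normal(mu, 1) likelihood, flat prior, and range statistic below are assumptions chosen purely to show the mechanics, not the paper's genealogical models.

```python
import random

def posterior_predictive_pvalue(data, stat, n_rep=2000, seed=3):
    """Posterior predictive check: draw mu from its posterior, simulate
    replicate data sets, and report the fraction of replicates whose
    summary statistic is at least as large as the observed one.
    Assumed model: data ~ Normal(mu, 1) with a flat prior on mu, so the
    posterior of mu is Normal(mean(data), 1/n).
    """
    rng = random.Random(seed)
    n = len(data)
    xbar = sum(data) / n
    observed = stat(data)
    exceed = 0
    for _ in range(n_rep):
        mu = rng.gauss(xbar, (1.0 / n) ** 0.5)        # posterior draw
        rep = [rng.gauss(mu, 1.0) for _ in range(n)]  # posterior predictive data
        if stat(rep) >= observed:
            exceed += 1
    return exceed / n_rep

def sample_range(xs):
    return max(xs) - min(xs)

# Data with far heavier spread than the assumed Normal(mu, 1): the range
# statistic flags the misfit with an extreme posterior predictive p-value.
rng = random.Random(0)
heavy = [rng.gauss(0, 5) for _ in range(100)]
p = posterior_predictive_pvalue(heavy, sample_range)
```

An extreme p-value (near 0 or 1) signals that the fitted model cannot reproduce the chosen feature of the data, which is the sense in which the paper's checks separate model misfit from genuine selection signals.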
Bayesian networks for maritime traffic accident prevention: benefits and challenges.
Hänninen, Maria
2014-12-01
Bayesian networks are quantitative modeling tools whose applications to the maritime traffic safety context are becoming more popular. This paper discusses the utilization of Bayesian networks in maritime safety modeling. Based on the literature and the author's own experiences, the paper studies what Bayesian networks can offer to maritime accident prevention and safety modeling and discusses a few challenges in their application to this context. It is argued that the capability of representing rather complex, not necessarily causal but uncertain relationships makes Bayesian networks an attractive modeling tool for maritime safety and accident analysis. Furthermore, as maritime accident and safety data are still rather scarce and have some quality problems, the possibility of combining data with expert knowledge and the ease of updating the model after acquiring more evidence further enhance their feasibility. However, eliciting the probabilities from maritime experts might be challenging and the model validation can be tricky. It is concluded that with the utilization of several data sources, Bayesian updating, dynamic modeling, and hidden nodes for latent variables, Bayesian networks are rather well-suited tools for maritime safety management and decision-making. Copyright © 2014 Elsevier Ltd. All rights reserved.
Bayesian classification theory
NASA Technical Reports Server (NTRS)
Hanson, Robin; Stutz, John; Cheeseman, Peter
1991-01-01
The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework and using various mathematical and algorithmic approximations, the AutoClass system searches for the most probable classifications, automatically choosing the number of classes and complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently-verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit or share model parameters through a class hierarchy. We summarize the mathematical foundations of AutoClass.
Bayesian Framework for Water Quality Model Uncertainty Estimation and Risk Management
A formal Bayesian methodology is presented for integrated model calibration and risk-based water quality management using Bayesian Monte Carlo simulation and maximum likelihood estimation (BMCML). The primary focus is on lucid integration of model calibration with risk-based wat...
NASA Astrophysics Data System (ADS)
Narukawa, Takafumi; Yamaguchi, Akira; Jang, Sunghyon; Amaya, Masaki
2018-02-01
To estimate the fracture probability of fuel cladding tubes under loss-of-coolant accident conditions in light-water reactors, laboratory-scale integral thermal shock tests were conducted on non-irradiated Zircaloy-4 cladding tube specimens. The obtained binary data on fracture or non-fracture of each cladding tube specimen were then analyzed statistically. A method to obtain the fracture probability curve as a function of equivalent cladding reacted (ECR) was proposed using Bayesian inference for generalized linear models: probit, logit, and log-probit models. Model selection was then performed in terms of physical characteristics and two information criteria, the widely applicable information criterion and the widely applicable Bayesian information criterion. As a result, it was clarified that the log-probit model was the best of the three models for estimating the fracture probability, in terms of predictive accuracy both for the next data to be obtained and with respect to the true model. Using the log-probit model, it was shown that 20% ECR corresponded to a 5% fracture probability level, with 95% confidence, for the cladding tube specimens.
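The shape of such a binary-response analysis can be sketched with one of the three link functions the paper compares. The paper's actual analysis used Bayesian MCMC fits and Watanabe's information criteria; the sketch below substitutes a crude grid-search maximum likelihood fit of a probit curve to synthetic fracture/no-fracture data, with all test outcomes and grid ranges invented for the example.

```python
import math

def probit(x):
    """Standard normal CDF, the probit model's inverse link."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fit_probit(ecr, fractured, grid_a, grid_b):
    """Grid-search maximum likelihood fit of a fracture-probability
    curve P(fracture | ECR) = Phi(a + b * ECR) to binary outcomes.
    A stand-in for a Bayesian fit; it returns the best (a, b) pair.
    """
    best, best_ll = None, -math.inf
    for a in grid_a:
        for b in grid_b:
            ll = 0.0
            for x, y in zip(ecr, fractured):
                p = min(max(probit(a + b * x), 1e-12), 1 - 1e-12)
                ll += math.log(p) if y else math.log(1 - p)
            if ll > best_ll:
                best, best_ll = (a, b), ll
    return best

# Synthetic integral-test outcomes (not the experimental data): fracture
# becomes likely above roughly 30% ECR.
ecr = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
fractured = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
a, b = fit_probit(ecr, fractured,
                  grid_a=[x / 2 for x in range(-20, 1)],   # a in [-10, 0]
                  grid_b=[x / 100 for x in range(1, 51)])  # b in (0, 0.5]
p20 = probit(a + b * 20)  # fitted fracture probability at 20% ECR
```

With this synthetic data the fitted curve crosses 50% probability between 30% and 35% ECR, so the estimated fracture probability at 20% ECR is small, mirroring the kind of low-probability threshold statement the paper derives (with proper uncertainty quantification) from its Bayesian fit.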
Selecting Summary Statistics in Approximate Bayesian Computation for Calibrating Stochastic Models
Burr, Tom
2013-01-01
Approximate Bayesian computation (ABC) is an approach for using measurement data to calibrate stochastic computer models, which are common in biology applications. ABC is becoming the “go-to” option when the data and/or parameter dimension is large because it relies on user-chosen summary statistics rather than the full data and is therefore computationally feasible. One technical challenge with ABC is that the quality of the approximation to the posterior distribution of model parameters depends on the user-chosen summary statistics. In this paper, the user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics. We show that for some choices of summary statistics, the posterior distribution of model parameters is closely approximated and for other choices of summary statistics, the posterior distribution is not closely approximated. A strategy to choose effective summary statistics is suggested in cases where the stochastic computer model can be run at many trial parameter settings, as in the example. PMID:24288668
Selecting summary statistics in approximate Bayesian computation for calibrating stochastic models.
Burr, Tom; Skurikhin, Alexei
2013-01-01
Approximate Bayesian computation (ABC) is an approach for using measurement data to calibrate stochastic computer models, which are common in biology applications. ABC is becoming the "go-to" option when the data and/or parameter dimension is large because it relies on user-chosen summary statistics rather than the full data and is therefore computationally feasible. One technical challenge with ABC is that the quality of the approximation to the posterior distribution of model parameters depends on the user-chosen summary statistics. In this paper, the user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics. We show that for some choices of summary statistics, the posterior distribution of model parameters is closely approximated and for other choices of summary statistics, the posterior distribution is not closely approximated. A strategy to choose effective summary statistics is suggested in cases where the stochastic computer model can be run at many trial parameter settings, as in the example.
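The paper's central point, that the quality of the ABC posterior hinges on the chosen summary statistics, can be reproduced with a toy Normal(mu, sigma) model: matching on the sample mean alone leaves sigma essentially at its prior, while adding the sample standard deviation recovers it. The priors, tolerance, and data below are all invented for the illustration and have nothing to do with the paper's mitochondrial DNA example.

```python
import random

def abc_sigma_estimate(obs, use_std, n_draws=20000, tol=0.1, seed=2):
    """Rejection ABC for Normal(mu, sigma) data, run either with the
    sample mean alone or with (mean, std) as summary statistics.
    Assumed priors: mu ~ Uniform(-2, 2), sigma ~ Uniform(0.1, 3).
    Returns the posterior mean of sigma from the accepted draws.
    """
    rng = random.Random(seed)
    n = len(obs)
    m_obs = sum(obs) / n
    s_obs = (sum((x - m_obs) ** 2 for x in obs) / n) ** 0.5
    kept = []
    for _ in range(n_draws):
        mu, sigma = rng.uniform(-2, 2), rng.uniform(0.1, 3)
        sim = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(sim) / n
        s = (sum((x - m) ** 2 for x in sim) / n) ** 0.5
        ok = abs(m - m_obs) < tol               # always match the mean
        if use_std:
            ok = ok and abs(s - s_obs) < tol    # optionally match the std
        if ok:
            kept.append(sigma)
    return sum(kept) / len(kept)

rng = random.Random(0)
obs = [rng.gauss(0.0, 0.5) for _ in range(80)]  # true sigma = 0.5
sigma_mean_only = abc_sigma_estimate(obs, use_std=False)
sigma_both = abc_sigma_estimate(obs, use_std=True)
```

With the mean alone, the acceptance rate is nearly flat in sigma, so the "posterior" mean of sigma stays near the prior mean of about 1.55; adding the standard deviation as a second summary pulls it close to the true 0.5. This is exactly the failure mode the paper's summary-statistic selection strategy is designed to avoid.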
ERIC Educational Resources Information Center
Wu, Haiyan
2013-01-01
General diagnostic models (GDMs) and Bayesian networks are mathematical frameworks that cover a wide variety of psychometric models. Both extend latent class models, and while GDMs also extend item response theory (IRT) models, Bayesian networks can be parameterized using discretized IRT. The purpose of this study is to examine similarities and…
Variable screening via quantile partial correlation
Ma, Shujie; Tsai, Chih-Ling
2016-01-01
In quantile linear regression with ultra-high dimensional data, we propose an algorithm for screening all candidate variables and subsequently selecting relevant predictors. Specifically, we first employ quantile partial correlation for screening, and then we apply the extended Bayesian information criterion (EBIC) for best subset selection. Our proposed method can successfully select predictors when the variables are highly correlated, and it can also identify variables that make a contribution to the conditional quantiles but are marginally uncorrelated or weakly correlated with the response. Theoretical results show that the proposed algorithm can yield the sure screening set. By controlling the false selection rate, model selection consistency can be achieved theoretically. In practice, we proposed using EBIC for best subset selection so that the resulting model is screening consistent. Simulation studies demonstrate that the proposed algorithm performs well, and an empirical example is presented. PMID:28943683
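The EBIC step of the proposed procedure can be sketched directly using Chen and Chen's criterion, EBIC = n log(RSS/n) + k log n + 2γ log C(p, k). The brute-force subset enumeration and synthetic Gaussian design below stand in for the paper's post-screening best-subset search (which operates on the variables surviving quantile-partial-correlation screening); the data-generating coefficients are invented.

```python
import math
import random
from itertools import combinations

def lstsq_rss(X, y):
    """Residual sum of squares from ordinary least squares, solving the
    normal equations X'X b = X'y by Gaussian elimination."""
    n, p = len(y), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         for a in range(p)]
    c = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    for col in range(p):                       # forward elimination with pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= f * A[col][k]
            c[r] -= f * c[col]
    b = [0.0] * p
    for r in range(p - 1, -1, -1):             # back substitution
        b[r] = (c[r] - sum(A[r][k] * b[k] for k in range(r + 1, p))) / A[r][r]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2
               for i in range(n))

def ebic_best_subset(X, y, gamma=1.0):
    """Score every non-empty predictor subset by the extended BIC and
    return the subset with the smallest score."""
    n, p = len(y), len(X[0])
    best, best_score = (), math.inf
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            rss = lstsq_rss([[row[j] for j in subset] for row in X], y)
            score = (n * math.log(rss / n) + k * math.log(n)
                     + 2 * gamma * math.log(math.comb(p, k)))
            if score < best_score:
                best, best_score = subset, score
    return best

# Synthetic data: y depends on predictors 0 and 2 only, out of 5 candidates.
rng = random.Random(4)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [2.0 * row[0] - 1.5 * row[2] + rng.gauss(0, 1) for row in X]
selected = ebic_best_subset(X, y)
```

The extra 2γ log C(p, k) term is what distinguishes EBIC from BIC: it penalizes the size of the model space at each subset size, which is what keeps false selections controlled when p is large relative to n.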
Perceptual decision making: drift-diffusion model is equivalent to a Bayesian model
Bitzer, Sebastian; Park, Hame; Blankenburg, Felix; Kiebel, Stefan J.
2014-01-01
Behavioral data obtained with perceptual decision making experiments are typically analyzed with the drift-diffusion model. This parsimonious model accumulates noisy pieces of evidence toward a decision bound to explain the accuracy and reaction times of subjects. Recently, Bayesian models have been proposed to explain how the brain extracts information from noisy input as typically presented in perceptual decision making tasks. It has long been known that the drift-diffusion model is tightly linked with such functional Bayesian models but the precise relationship of the two mechanisms was never made explicit. Using a Bayesian model, we derived the equations which relate parameter values between these models. In practice we show that this equivalence is useful when fitting multi-subject data. We further show that the Bayesian model suggests different decision variables which all predict equal responses and discuss how these may be discriminated based on neural correlates of accumulated evidence. In addition, we discuss extensions to the Bayesian model which would be difficult to derive for the drift-diffusion model. We suggest that these and other extensions may be highly useful for deriving new experiments which test novel hypotheses. PMID:24616689
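The drift-diffusion process itself is easy to simulate, which makes the equivalence discussion concrete: the accumulating decision variable can be read as a (log-)posterior odds updated by each noisy evidence sample and compared against a fixed threshold. The sketch below simulates the standard DDM with assumed parameters (drift 1.0, bounds ±1.0, unit diffusion noise) and compares the simulated accuracy and mean decision time against the standard closed-form DDM results.

```python
import random

def ddm_trial(drift, bound, rng, dt=0.001, noise=1.0):
    """Simulate one drift-diffusion trial: noisy evidence accumulates
    from 0 until it hits +bound (correct) or -bound (error). Returns
    (correct?, decision time)."""
    x, t = 0.0, 0.0
    while abs(x) < bound:
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0, 1)
        t += dt
    return (x >= bound), t

rng = random.Random(11)
trials = [ddm_trial(drift=1.0, bound=1.0, rng=rng) for _ in range(2000)]
accuracy = sum(c for c, _ in trials) / len(trials)
mean_rt = sum(t for _, t in trials) / len(trials)
# Closed-form results for an unbiased DDM with drift v, bounds +/-a,
# noise s = 1: P(correct) = 1 / (1 + exp(-2*a*v)) ~= 0.881 and
# mean decision time = (a / v) * tanh(a * v) ~= 0.762 s here.
```

Non-decision time, trial-to-trial parameter variability, and the Bayesian reparameterization derived in the paper are all omitted; the point is only that the first-passage behaviour of the accumulator is what both model families describe.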
Flood quantile estimation at ungauged sites by Bayesian networks
NASA Astrophysics Data System (ADS)
Mediero, L.; Santillán, D.; Garrote, L.
2012-04-01
Estimating flood quantiles at a site for which no observed measurements are available is essential for water resources planning and management. Ungauged sites have no observations of the magnitude of floods, but some site and basin characteristics are known. The most common technique used is multiple regression analysis, which relates physical and climatic basin characteristics to flood quantiles. Regression equations are fitted from flood frequency data and basin characteristics at gauged sites. Regression equations are a rigid technique that assumes linear relationships between variables and cannot take measurement errors into account. In addition, the prediction intervals are estimated in a very simplistic way from the variance of the residuals in the estimated model. Bayesian networks are a probabilistic computational structure taken from the field of Artificial Intelligence, which have been widely and successfully applied to many scientific fields like medicine and informatics, but application to the field of hydrology is recent. Bayesian networks infer the joint probability distribution of several related variables from observations through nodes, which represent random variables, and links, which represent causal dependencies between them. A Bayesian network is more flexible than regression equations, as it captures non-linear relationships between variables. In addition, the probabilistic nature of Bayesian networks allows the different sources of estimation uncertainty to be taken into account, as they give a probability distribution as the result. A homogeneous region in the Tagus Basin was selected as the case study. A regression equation was fitted taking the basin area, the annual maximum 24-hour rainfall for a given recurrence interval and the mean height as explanatory variables. Flood quantiles at ungauged sites were estimated by Bayesian networks. Bayesian networks need to be learnt from a sufficiently large data set. As the observational data were limited, a stochastic generator of synthetic data was developed. Synthetic basin characteristics were randomised, keeping the statistical properties of observed physical and climatic variables in the homogeneous region. The synthetic flood quantiles were stochastically generated taking the regression equation as basis. The learnt Bayesian network was validated by the reliability diagram, the Brier score and the ROC diagram, which are common measures used in the validation of probabilistic forecasts. Summarising, flood quantile estimation through Bayesian networks supplies information about prediction uncertainty, as a probability distribution function of discharges is given as the result. Therefore, the Bayesian network model has application as a decision support tool for water resources planning and management.
Wang, Jiali; Zhang, Qingnian; Ji, Wenfeng
2014-01-01
Computation of an objective Bayesian network requires a large amount of data, which is hard to obtain in practice. The calculation method of the Bayesian network was improved in this paper to obtain a fuzzy-precise Bayesian network, which was then used for reasoning in the Bayesian network model when data are limited. The safety of passengers during shipping is affected by various factors and is hard to predict and control. An index system of the factors affecting passenger safety during shipping was established on the basis of multifield coupling theory. The fuzzy-precise Bayesian network was then applied to monitor passenger safety in the shipping process. The model was applied to monitor passenger safety during shipping at a shipping company in Hainan, and its effectiveness was examined. This research provides guidance for guaranteeing the safety of passengers during shipping. PMID:25254227
Testing and selection of cosmological models with (1+z)^6 corrections
DOE Office of Scientific and Technical Information (OSTI.GOV)
Szydlowski, Marek; Marc Kac Complex Systems Research Centre, Jagiellonian University, ul. Reymonta 4, 30-059 Cracow; Godlowski, Wlodzimierz
2008-02-15
In the paper we check whether the contribution of a (-)(1+z)^6 type term in the Friedmann equation can be tested. We consider some astronomical tests to constrain the density parameters in such models. We describe different interpretations of such an additional term: geometric effects of loop quantum cosmology, effects of braneworld cosmological models, nonstandard cosmological models in metric-affine gravity, and models with spinning fluid. Kinematical (or geometrical) tests based on null geodesics are insufficient to separate individual matter components when they behave like perfect fluid and scale in the same way. Still, it is possible to measure their overall effect. We use recent measurements of the coordinate distances from the Fanaroff-Riley type IIb radio galaxy data, supernovae type Ia data, the baryon oscillation peak and cosmic microwave background radiation observations to obtain stronger bounds for the contribution of the type considered. We demonstrate that, while ρ^2 corrections are very small, they can be tested by astronomical observations--at least in principle. Bayesian criteria of model selection (the Bayes factor, AIC, and BIC) are used to check whether the additional parameters are detectable in the present epoch. As it turns out, the ΛCDM model is favored over the bouncing model driven by loop quantum effects. In other words, the bounds obtained from cosmography are very weak, and from the point of view of the present data this model is indistinguishable from the ΛCDM one.
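The information criteria used above have simple closed forms, so a model comparison of this kind can be sketched in a few lines. The log-likelihoods and parameter counts below are invented for illustration, not fitted to any cosmological data:

```python
import numpy as np

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return k * np.log(n) - 2 * loglik

# Illustrative comparison: a LambdaCDM-like model with 2 free parameters
# versus an alternative with one extra (1+z)^6-type parameter that only
# marginally improves the fit.
n = 300                          # number of observations (assumed)
ll_base,  k_base  = -150.0, 2
ll_extra, k_extra = -149.5, 3

delta_bic = bic(ll_extra, k_extra, n) - bic(ll_base, k_base, n)
# Positive delta_bic: the extra parameter is not supported by the data.
```

BIC penalizes the additional parameter by ln n, so a small gain in likelihood is not enough to justify it, mirroring the paper's conclusion that the extra term is undetectable with present data.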
A Bayesian spawning habitat suitability model for American shad in southeastern United States rivers
Hightower, Joseph E.; Harris, Julianne E.; Raabe, Joshua K.; Brownell, Prescott; Drew, C. Ashton
2012-01-01
Habitat suitability index models for American shad Alosa sapidissima were developed by Stier and Crance in 1985. These models, which were based on a combination of published information and expert opinion, are often used to make decisions about hydropower dam operations and fish passage. The purpose of this study was to develop updated habitat suitability index models for spawning American shad in the southeastern United States, building on the many field and laboratory studies completed since 1985. We surveyed biologists who had knowledge about American shad spawning grounds, assembled a panel of experts to discuss important habitat variables, and used raw data from published and unpublished studies to develop new habitat suitability curves. The updated curves are based on resource selection functions, which can model habitat selectivity based on use and availability of particular habitats. Using field data collected in eight rivers from Virginia to Florida (Mattaponi, Pamunkey, Roanoke, Tar, Neuse, Cape Fear, Pee Dee, St. Johns), we obtained new curves for temperature, current velocity, and depth that were generally similar to the original models. Our new suitability function for substrate was also similar to the original pattern, except that sand (optimal in the original model) has a very low estimated suitability. The Bayesian approach that we used to develop habitat suitability curves provides an objective framework for updating the model as new studies are completed and for testing the model's applicability in other parts of the species' range.
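The simplest form of the use-versus-availability logic behind resource selection functions is a selection ratio per habitat class. The counts below are hypothetical and are not the American shad field data:

```python
import numpy as np

# Hypothetical counts: observed spawning events ("use") and sampled
# habitat units ("availability") for four substrate classes.
substrates = ["silt", "sand", "gravel", "bedrock"]
use        = np.array([ 5, 10, 70, 15])
available  = np.array([30, 40, 20, 10])

# Selection ratio w_i = (use proportion) / (availability proportion);
# w > 1 indicates selection, w < 1 indicates avoidance.
w = (use / use.sum()) / (available / available.sum())
```

A full resource selection function generalizes this idea with a regression on continuous covariates (temperature, velocity, depth), and the Bayesian treatment in the study allows the curves to be updated as new data accumulate.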
Liu, Feng; Walters, Stephen J; Julious, Steven A
2017-10-02
It is important to quantify the dose response for a drug in Phase 2a clinical trials so that optimal doses can then be selected for subsequent late-phase trials. In a Phase 2a clinical trial of a new lead drug being developed for the treatment of rheumatoid arthritis (RA), a U-shaped dose-response curve was observed. In the light of this result, further research was undertaken to design an efficient Phase 2a proof of concept (PoC) trial for a follow-on compound using the lessons learnt from the lead compound. The planned analysis for the Phase 2a trial of GSK123456 was a Bayesian Emax model, which assumes that the dose-response relationship follows a monotonic sigmoid "S"-shaped curve. This model was found to be suboptimal for modelling the U-shaped dose response observed in the data from this trial, and alternative approaches needed to be considered for the next compound, for which a normal dynamic linear model (NDLM) is proposed. This paper compares the statistical properties of the Bayesian Emax and NDLM models; both models are evaluated using simulation in the context of an adaptive Phase 2a PoC design under a variety of assumed dose-response curves: linear, Emax, U-shaped, and flat response. It is shown that the NDLM method is flexible and can handle a wide variety of dose responses, including monotonic and non-monotonic relationships. In comparison to the NDLM model, the Emax model excelled, with a higher probability of selecting the ED90 and a smaller average sample size, when the true dose response followed an Emax-like curve. In addition, the type I error, the probability of incorrectly concluding a drug may work when it does not, is inflated with the Bayesian NDLM model in all scenarios, which would represent a development risk to a pharmaceutical company.
The bias, which is the difference between the estimated effect from the Emax and NDLM models and the simulated value, is comparable if the true dose response follows a placebo-like curve, an Emax-like curve, or a log-linear curve under the fixed-allocation, non-adaptive, half-adaptive, and fully adaptive scenarios. The bias, though, is significantly increased for the Emax model if the true dose response follows a U-shaped curve. In most cases the Bayesian Emax model works effectively and efficiently, with low bias and a good probability of success in the case of a monotonic dose response. However, if there is a belief that the dose response could be non-monotonic, then the NDLM is the superior model to assess the dose response.
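The Emax curve at the heart of the comparison has a standard closed form, E(d) = E0 + Emax·d/(ED50 + d) with the Hill coefficient fixed at 1; a minimal sketch with illustrative parameters (not the trial's fitted values) shows why the ED90 is simply 9 x ED50:

```python
def emax_response(dose, e0=0.0, emax=1.0, ed50=10.0):
    """Monotonic Emax dose-response curve with Hill coefficient 1.
    e0: placebo response; emax: maximal effect; ed50: dose giving
    half of the maximal effect. Parameter values are illustrative."""
    return e0 + emax * dose / (ed50 + dose)

# ED90 is the dose achieving 90% of the maximal effect; solving
# emax*d/(ed50+d) = 0.9*emax gives d = 9*ed50.
ed90 = 9 * 10.0
resp = emax_response(ed90)
```

A Bayesian version places priors on e0, emax, and ed50; the model's monotonicity is exactly what makes it misspecified for a U-shaped response, which is where the NDLM's flexibility pays off.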
Hierarchical Bayesian Modeling of Fluid-Induced Seismicity
NASA Astrophysics Data System (ADS)
Broccardo, M.; Mignan, A.; Wiemer, S.; Stojadinovic, B.; Giardini, D.
2017-11-01
In this study, we present a Bayesian hierarchical framework to model fluid-induced seismicity. The framework is based on a nonhomogeneous Poisson process with a fluid-induced seismicity rate proportional to the rate of injected fluid. The fluid-induced seismicity rate model depends upon a set of physically meaningful parameters and has been validated for six fluid-induced case studies. In line with the vision of hierarchical Bayesian modeling, the rate parameters are considered as random variables. We develop both the Bayesian inference and updating rules, which are used to build a probabilistic forecasting model. We applied the framework to the Basel 2006 fluid-induced seismicity case study to show that the hierarchical Bayesian model offers a suitable framework to coherently encode both epistemic uncertainty and aleatory variability. Moreover, it provides a robust and consistent short-term seismic forecasting model suitable for online risk quantification and mitigation.
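The core modeling assumption, a Poisson event rate proportional to the injection rate, can be sketched as a forward simulation. The injection profile and the feedback parameter below are invented for illustration and are not the Basel 2006 values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative injection profile (volume per day, one value per time bin)
# and a hypothetical parameter linking flow rate to expected event counts.
flow = np.array([0.0, 50.0, 100.0, 100.0, 50.0, 0.0])
k = 0.04                         # expected events per unit volume (assumed)

lam = k * flow                   # nonhomogeneous Poisson rate per bin
events = rng.poisson(lam)        # simulated induced-event counts
```

In the hierarchical Bayesian version, k (and any other rate parameters) would itself carry a prior distribution, so forecasts integrate over parameter uncertainty rather than plugging in a point estimate.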
Varughese, Eunice A.; Brinkman, Nichole E; Anneken, Emily M; Cashdollar, Jennifer S; Fout, G. Shay; Furlong, Edward T.; Kolpin, Dana W.; Glassmeyer, Susan T.; Keely, Scott P
2017-01-01
incorporated into a Bayesian model to more accurately determine viral load in both source and treated water. Results of the Bayesian model indicated that viruses are present in source water and treated water. By using a Bayesian framework that incorporates inhibition, as well as many other parameters that affect viral detection, this study offers an approach for more accurately estimating the occurrence of viral pathogens in environmental waters.
A local approach for focussed Bayesian fusion
NASA Astrophysics Data System (ADS)
Sander, Jennifer; Heizmann, Michael; Goussev, Igor; Beyerer, Jürgen
2009-04-01
Local Bayesian fusion approaches aim to reduce the high storage and computational costs of Bayesian fusion, which is separated from fixed modeling assumptions. Using the small-world formalism, we argue why this approach conforms with Bayesian theory. Then, we concentrate on the realization of local Bayesian fusion by focussing the fusion process solely on local regions that are task relevant with a high probability. The resulting local models then correspond to restricted versions of the original one. In a previous publication, we used bounds for the probability of misleading evidence to show the validity of the pre-evaluation of task-specific knowledge and prior information which we perform to build local models. In this paper, we prove the validity of this approach using information-theoretic arguments. For additional efficiency, local Bayesian fusion can be realized in a distributed manner, in which several local Bayesian fusion tasks are evaluated and unified after the actual fusion process. Software agents are well suited for the practical realization of distributed local Bayesian fusion. There is a natural analogy between the resulting agent-based architecture and criminal investigations in real life, and we show how this analogy can be used to further improve the efficiency of distributed local Bayesian fusion. Using a landscape model, we present an experimental study of distributed local Bayesian fusion in the field of reconnaissance, which highlights its high potential.
Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A; van't Veld, Aart A
2012-03-15
To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended. Copyright © 2012 Elsevier Inc. All rights reserved.
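The key operational difference between stepwise selection and the LASSO is that the LASSO shrinks coefficients continuously and sets small ones exactly to zero, which is what yields sparse, interpretable NTCP models. In the special case of an orthonormal design the LASSO solution is the closed-form soft-thresholding operator; the coefficient values below are illustrative, not the xerostomia data:

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator: the LASSO solution for an orthonormal
    design. Coefficients within lam of zero are shrunk exactly to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Ordinary least-squares estimates for five candidate predictors
# (illustrative values only).
ols = np.array([2.5, -0.3, 0.8, 0.05, -1.2])
lasso = soft_threshold(ols, lam=0.5)
```

Weak predictors drop out entirely while strong ones are retained (shrunk toward zero), in contrast to stepwise selection's all-or-nothing, order-dependent inclusion decisions.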
Prediction of road accidents: A Bayesian hierarchical approach.
Deublein, Markus; Schubert, Matthias; Adey, Bryan T; Köhler, Jochen; Faber, Michael H
2013-03-01
In this paper a novel methodology for the prediction of the occurrence of road accidents is presented. The methodology utilizes a combination of three statistical methods: (1) gamma-updating of the occurrence rates of injury accidents and injured road users, (2) hierarchical multivariate Poisson-lognormal regression analysis taking into account correlations amongst multiple dependent model response variables and effects of discrete accident count data e.g. over-dispersion, and (3) Bayesian inference algorithms, which are applied by means of data mining techniques supported by Bayesian Probabilistic Networks in order to represent non-linearity between risk indicating and model response variables, as well as different types of uncertainties which might be present in the development of the specific models. Prior Bayesian Probabilistic Networks are first established by means of multivariate regression analysis of the observed frequencies of the model response variables, e.g. the occurrence of an accident, and observed values of the risk indicating variables, e.g. degree of road curvature. Subsequently, parameter learning is done using updating algorithms, to determine the posterior predictive probability distributions of the model response variables, conditional on the values of the risk indicating variables. The methodology is illustrated through a case study using data of the Austrian rural motorway network. In the case study, on randomly selected road segments the methodology is used to produce a model to predict the expected number of accidents in which an injury has occurred and the expected number of light, severe and fatally injured road users. Additionally, the methodology is used for geo-referenced identification of road sections with increased occurrence probabilities of injury accident events on a road link between two Austrian cities. 
It is shown that the proposed methodology can be used to develop models to estimate the occurrence of road accidents for any road network provided that the required data are available. Copyright © 2012 Elsevier Ltd. All rights reserved.
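The Poisson-lognormal layer of the methodology above exists to capture over-dispersion in accident counts; a short simulation with assumed (not study-fitted) parameters shows how a lognormal random effect on the rate inflates the variance beyond the mean:

```python
import numpy as np

rng = np.random.default_rng(42)

# Poisson-lognormal counts: each site's accident rate is lognormally
# distributed, and counts are Poisson given the rate. Parameters are
# illustrative, not from the Austrian motorway data.
mu, sigma, n = np.log(5.0), 0.6, 200_000
rates = np.exp(mu + sigma * rng.standard_normal(n))
counts = rng.poisson(rates)

# Variance-to-mean ratio: 1 for a plain Poisson model, > 1 here.
overdispersion = counts.var() / counts.mean()
```

A plain Poisson regression would force this ratio to 1 and understate uncertainty; the hierarchical model lets the data determine it.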
Bayesian SEM for Specification Search Problems in Testing Factorial Invariance.
Shi, Dexin; Song, Hairong; Liao, Xiaolan; Terry, Robert; Snyder, Lori A
2017-01-01
Specification search problems refer to two important but under-addressed issues in testing for factorial invariance: how to select proper reference indicators and how to locate specific non-invariant parameters. In this study, we propose a two-step procedure to solve these issues. Step 1 is to identify a proper reference indicator using the Bayesian structural equation modeling approach. An item is selected if it is associated with the highest likelihood to be invariant across groups. Step 2 is to locate specific non-invariant parameters, given that a proper reference indicator has already been selected in Step 1. A series of simulation analyses show that the proposed method performs well under a variety of data conditions, and optimal performance is observed under conditions of large magnitude of non-invariance, low proportion of non-invariance, and large sample sizes. We also provide an empirical example to demonstrate the specific procedures to implement the proposed method in applied research. The importance and influences are discussed regarding the choices of informative priors with zero mean and small variances. Extensions and limitations are also pointed out.
Li, Qian; Trivedi, Pravin K
2016-02-01
This paper develops an extended specification of the two-part model, which controls for unobservable self-selection and heterogeneity of health insurance, and analyzes the impact of Medicare supplemental plans on the prescription drug expenditure of the elderly, using a linked data set based on the Medicare Current Beneficiary Survey data for 2003-2004. The econometric analysis is conducted using a Bayesian econometric framework. We estimate the treatment effects for different counterfactuals and find significant evidence of endogeneity in plan choice and the presence of both adverse and advantageous selections in the supplemental insurance market. The average incentive effect is estimated to be $757 (2004 value) or 41% increase per person per year for the elderly enrolled in supplemental plans with drug coverage against the Medicare fee-for-service counterfactual and is $350 or 21% against the supplemental plans without drug coverage counterfactual. The incentive effect varies by different sources of drug coverage: highest for employer-sponsored insurance plans, followed by Medigap and managed medicare plans. Copyright © 2014 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Wentworth, Mami Tonoe
Uncertainty quantification plays an important role when making predictive estimates of model responses. In this context, uncertainty quantification is defined as quantifying and reducing uncertainties, and the objective is to quantify uncertainties in parameters, models and measurements, and to propagate the uncertainties through the model, so that one can make a predictive estimate with quantified uncertainties. Two of the aspects of uncertainty quantification that must be performed prior to propagating uncertainties are model calibration and parameter selection. There are several efficient techniques for these processes; however, the accuracy of these methods is often not verified. This is the motivation for our work, and in this dissertation, we present and illustrate verification frameworks for model calibration and parameter selection in the context of biological and physical models. First, HIV models, developed and improved by [2, 3, 8], describe the viral infection dynamics of an HIV disease. These are also used to make predictive estimates of viral loads and T-cell counts and to construct an optimal control for drug therapy. Estimating input parameters is an essential step prior to uncertainty quantification. However, not all the parameters are identifiable, implying that they cannot be uniquely determined from the observations. These unidentifiable parameters can be partially removed by performing parameter selection, a process in which parameters that have minimal impacts on the model response are determined. We provide verification techniques for Bayesian model calibration and parameter selection for an HIV model. As an example of a physical model, we employ a heat model with experimental measurements presented in [10]. A steady-state heat model represents prototypical behavior for the heat conduction and diffusion processes involved in a thermal-hydraulic model, which is part of a nuclear reactor model.
We employ this simple heat model to illustrate verification techniques for model calibration. For Bayesian model calibration, we employ adaptive Metropolis algorithms to construct densities for input parameters in the heat model and the HIV model. To quantify the uncertainty in the parameters, we employ two MCMC algorithms: Delayed Rejection Adaptive Metropolis (DRAM) [33] and Differential Evolution Adaptive Metropolis (DREAM) [66, 68]. The densities obtained using these methods are compared to those obtained through direct numerical evaluation of Bayes' formula. We also combine uncertainties in input parameters and measurement errors to construct predictive estimates for a model response. A significant emphasis is on the development and illustration of techniques to verify the accuracy of sampling-based Metropolis algorithms. We verify the accuracy of DRAM and DREAM by comparing the chains, densities and correlations obtained using DRAM, DREAM and the direct evaluation of Bayes' formula. We also perform a similar analysis for credible and prediction intervals for responses. Once the parameters are estimated, we employ the energy statistics test [63, 64] to compare the densities obtained by the different methods for the HIV model; the energy statistics are used to test the equality of distributions. We also consider parameter selection and verification techniques for models having one or more parameters that are noninfluential in the sense that they minimally impact model outputs. We illustrate these techniques for a dynamic HIV model but note that the parameter selection and verification framework is applicable to a wide range of biological and physical models.
To accommodate the nonlinear input to output relations, which are typical for such models, we focus on global sensitivity analysis techniques, including those based on partial correlations, Sobol indices based on second-order model representations, and Morris indices, as well as a parameter selection technique based on standard errors. A significant objective is to provide verification strategies to assess the accuracy of those techniques, which we illustrate in the context of the HIV model. Finally, we examine active subspace methods as an alternative to parameter subset selection techniques. The objective of active subspace methods is to determine the subspace of inputs that most strongly affect the model response, and to reduce the dimension of the input space. The major difference between active subspace methods and parameter selection techniques is that parameter selection identifies influential parameters whereas subspace selection identifies a linear combination of parameters that impacts the model responses significantly. We employ active subspace methods discussed in [22] for the HIV model and present a verification that the active subspace successfully reduces the input dimensions.
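The verification idea described above, checking a sampler against a posterior that can be evaluated directly, can be sketched with a plain random-walk Metropolis algorithm (no delayed rejection or adaptation, so this is a simplified stand-in for DRAM/DREAM) targeting a standard normal density whose moments are known exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_post(theta):
    """Log-density of a standard normal 'posterior' (known analytically,
    so the sampler's output can be verified against the truth)."""
    return -0.5 * theta ** 2

# Plain random-walk Metropolis.
theta, chain = 0.0, []
for _ in range(20_000):
    prop = theta + rng.normal(scale=1.0)
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    chain.append(theta)

samples = np.array(chain[5_000:])   # discard burn-in
```

Because the target has mean 0 and standard deviation 1, the chain's sample moments can be checked directly, which is the same comparison the dissertation performs (with DRAM and DREAM) against direct evaluation of Bayes' formula.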
Bayesian Models for Astrophysical Data Using R, JAGS, Python, and Stan
NASA Astrophysics Data System (ADS)
Hilbe, Joseph M.; de Souza, Rafael S.; Ishida, Emille E. O.
2017-05-01
This comprehensive guide to Bayesian methods in astronomy enables hands-on work by supplying complete R, JAGS, Python, and Stan code, to use directly or to adapt. It begins by examining the normal model from both frequentist and Bayesian perspectives and then progresses to a full range of Bayesian generalized linear and mixed or hierarchical models, as well as additional types of models such as ABC and INLA. The book provides code that is largely unavailable elsewhere and includes details on interpreting and evaluating Bayesian models. Initial discussions offer models in synthetic form so that readers can easily adapt them to their own data; later the models are applied to real astronomical data. The consistent focus is on hands-on modeling, analysis of data, and interpretations that address scientific questions. A must-have for astronomers, its concrete approach will also be attractive to researchers in the sciences more generally.
Approximate Bayesian computation in large-scale structure: constraining the galaxy-halo connection
NASA Astrophysics Data System (ADS)
Hahn, ChangHoon; Vakili, Mohammadjavad; Walsh, Kilian; Hearin, Andrew P.; Hogg, David W.; Campbell, Duncan
2017-08-01
Standard approaches to Bayesian parameter inference in large-scale structure assume a Gaussian (chi-squared) functional form for the likelihood. This assumption, in detail, cannot be correct. Likelihood-free inference methods such as approximate Bayesian computation (ABC) relax these restrictions and make inference possible without making any assumptions about the likelihood. Instead, ABC relies on a forward generative model of the data and a metric for measuring the distance between the model and the data. In this work, we demonstrate that ABC is feasible for LSS parameter inference by using it to constrain parameters of the halo occupation distribution (HOD) model for populating dark matter haloes with galaxies. Using a specific implementation of ABC supplemented with population Monte Carlo importance sampling, a generative forward model using the HOD, and a distance metric based on the galaxy number density, two-point correlation function and galaxy group multiplicity function, we constrain the HOD parameters of a mock observation generated from selected 'true' HOD parameters. The parameter constraints we obtain from ABC are consistent with the 'true' HOD parameters, demonstrating that ABC can be reliably used for parameter inference in LSS. Furthermore, we compare our ABC constraints to those we obtain using a pseudo-likelihood function of Gaussian form with MCMC and find consistent HOD parameter constraints. Ultimately, our results suggest that ABC can and should be applied in parameter inference for LSS analyses.
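The ABC logic above (prior draw, forward simulation, distance, accept/reject) is easiest to see on a toy problem. This sketch infers the location of a normal distribution without ever writing down a likelihood; the forward model, summary statistic, and tolerance are all illustrative, far simpler than the HOD pipeline:

```python
import numpy as np

rng = np.random.default_rng(7)

# "Observed" data from a generative model with unknown location mu.
true_mu = 2.0
obs = rng.normal(true_mu, 1.0, size=100)

def distance(sim, data):
    """Distance between summary statistics (here: sample means)."""
    return abs(sim.mean() - data.mean())

# ABC rejection sampling: draw mu from the prior, simulate data forward,
# keep mu only if the simulation lies within epsilon of the observations.
accepted = []
for _ in range(20_000):
    mu = rng.uniform(-5, 5)                  # prior draw
    sim = rng.normal(mu, 1.0, size=100)      # forward generative model
    if distance(sim, obs) < 0.2:             # tolerance epsilon
        accepted.append(mu)

posterior = np.array(accepted)
```

The accepted draws approximate the posterior over mu; shrinking epsilon (or adding population Monte Carlo importance sampling, as the paper does) trades acceptance rate for accuracy.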
Quantifying falsifiability of scientific theories
NASA Astrophysics Data System (ADS)
Nemenman, Ilya
I argue that the notion of falsifiability, a key concept in defining a valid scientific theory, can be quantified using Bayesian Model Selection, which is a standard tool in modern statistics. This relates falsifiability to the quantitative version of the statistical Occam's razor, and allows transforming some long-running arguments about validity of scientific theories from philosophical discussions to rigorous mathematical calculations.
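The quantitative Occam's razor invoked above is visible in the simplest possible model comparison: a sharp, falsifiable theory (a fair coin) against a maximally flexible one (any bias allowed). The data are invented for illustration:

```python
from math import comb

# Bayes factor for a fair-coin point hypothesis (p = 0.5) versus a
# flexible alternative with a uniform prior on p. The marginal
# likelihood under the uniform prior is 1/(n+1) for any head count,
# so the hard-to-falsify model pays an automatic Occam penalty.
n, k = 20, 11                       # 11 heads in 20 tosses (illustrative)
m_fair = comb(n, k) * 0.5 ** n      # P(data | fair coin)
m_flex = 1.0 / (n + 1)              # integral of P(data | p) over uniform p
bayes_factor = m_fair / m_flex      # > 1 favours the sharper theory
```

Even though the flexible model can fit the data at least as well at its best p, its marginal likelihood spreads probability over outcomes it never predicted, so moderately fair-looking data favour the falsifiable hypothesis.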
Ritchie, Andrew M; Lo, Nathan; Ho, Simon Y W
2017-05-01
In Bayesian phylogenetic analyses of genetic data, prior probability distributions need to be specified for the model parameters, including the tree. When Bayesian methods are used for molecular dating, available tree priors include those designed for species-level data, such as the pure-birth and birth-death priors, and coalescent-based priors designed for population-level data. However, molecular dating methods are frequently applied to data sets that include multiple individuals across multiple species. Such data sets violate the assumptions of both the speciation and coalescent-based tree priors, making it unclear which should be chosen and whether this choice can affect the estimation of node times. To investigate this problem, we used a simulation approach to produce data sets with different proportions of within- and between-species sampling under the multispecies coalescent model. These data sets were then analyzed under pure-birth, birth-death, constant-size coalescent, and skyline coalescent tree priors. We also explored the ability of Bayesian model testing to select the best-performing priors. We confirmed the applicability of our results to empirical data sets from cetaceans, phocids, and coregonid whitefish. Estimates of node times were generally robust to the choice of tree prior, but some combinations of tree priors and sampling schemes led to large differences in the age estimates. In particular, the pure-birth tree prior frequently led to inaccurate estimates for data sets containing a mixture of inter- and intraspecific sampling, whereas the birth-death and skyline coalescent priors produced stable results across all scenarios. Model testing provided an adequate means of rejecting inappropriate tree priors. Our results suggest that tree priors do not strongly affect Bayesian molecular dating results in most cases, even when severely misspecified. 
However, the choice of tree prior can be significant for the accuracy of dating results in the case of data sets with mixed inter- and intraspecies sampling. [Bayesian phylogenetic methods; model testing; molecular dating; node time; tree prior.]. © The authors 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please e-mail: journals.permission@oup.com.
Moving in time: Bayesian causal inference explains movement coordination to auditory beats
Elliott, Mark T.; Wing, Alan M.; Welchman, Andrew E.
2014-01-01
Many everyday skilled actions depend on moving in time with signals that are embedded in complex auditory streams (e.g. musical performance, dancing or simply holding a conversation). Such behaviour is apparently effortless; however, it is not known how humans combine auditory signals to support movement production and coordination. Here, we test how participants synchronize their movements when there are potentially conflicting auditory targets to guide their actions. Participants tapped their fingers in time with two simultaneously presented metronomes of equal tempo, but differing in phase and temporal regularity. Synchronization therefore depended on integrating the two timing cues into a single-event estimate or treating the cues as independent and thereby selecting one signal over the other. We show that a Bayesian inference process explains the situations in which participants choose to integrate or separate signals, and predicts motor timing errors. Simulations of this causal inference process demonstrate that this model provides a better description of the data than other plausible models. Our findings suggest that humans exploit a Bayesian inference process to control movement timing in situations where the origin of auditory signals needs to be resolved. PMID:24850915
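When the causal inference model described above decides the two metronomes share a common cause, the optimal fused estimate is the standard precision-weighted average of the cues; the timing values below are illustrative, not the study's data:

```python
# Two auditory cues give conflicting estimates of the next beat time (ms),
# each with its own variance (the irregular metronome is noisier).
t1, var1 = 500.0, 100.0     # regular metronome: precise
t2, var2 = 530.0, 400.0     # irregular metronome: noisy

# Forced-fusion (common-cause) estimate: precision-weighted average.
w1 = (1 / var1) / (1 / var1 + 1 / var2)
fused = w1 * t1 + (1 - w1) * t2
fused_var = 1 / (1 / var1 + 1 / var2)   # always below either cue's variance
```

The full causal inference model additionally weighs this fused estimate against the segregated (single-cue) estimates by the posterior probability of a common cause, which is what predicts when participants integrate versus select one signal.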
NASA Astrophysics Data System (ADS)
Zarekarizi, M.; Moradkhani, H.
2015-12-01
Extreme events have been shown to be affected by climate change, influencing hydrologic simulations for which stationarity is usually a main assumption. Studies have argued that this assumption leads to large biases in model estimates and, consequently, to higher flood hazard. Motivated by the importance of non-stationarity, we determined how the exceedance probabilities have changed over time in Johnson Creek, Oregon. This could help estimate the probability of failure of a structure that was primarily designed to resist less likely floods according to common practice. We therefore built a climate-informed Bayesian hierarchical model in which non-stationarity was considered in the modeling framework. Principal component analysis shows that the North Atlantic Oscillation (NAO), the Western Pacific index (WPI) and the Eastern Asia (EA) pattern most affect stream flow in this river. We modeled flood extremes using the peaks-over-threshold (POT) method rather than the conventional annual maximum flood (AMF) approach, mainly because it allows the model to be based on more information. We used available threshold selection methods to select a suitable threshold for the study area. To account for non-stationarity, model parameters vary through time with the climate indices. We developed several model scenarios and chose the one that best explained the variation in the data based on performance measures. We also estimated return periods under the non-stationary condition. Results show that ignoring non-stationarity could increase the flood hazard up to four times, which could increase the probability of an in-stream structure being overtopped.
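In a POT analysis, exceedances over the threshold follow a generalized Pareto distribution, and the T-year return level has a closed form. The parameters below are assumed for illustration (not fitted to Johnson Creek); in the non-stationary version they would vary with the climate indices:

```python
import numpy as np

def gpd_return_level(u, sigma, xi, rate, T):
    """T-year return level for a peaks-over-threshold model: exceedances
    of threshold u follow a generalized Pareto distribution with scale
    sigma and shape xi, occurring at `rate` exceedances per year."""
    if xi == 0:
        return u + sigma * np.log(rate * T)
    return u + (sigma / xi) * ((rate * T) ** xi - 1)

# Illustrative stationary parameters: threshold 30 m^3/s, scale 12,
# shape 0.1, 2.5 exceedances per year; 100-year return level.
q100 = gpd_return_level(u=30.0, sigma=12.0, xi=0.1, rate=2.5, T=100)
```

Letting sigma (or the rate) depend on NAO, WPI and EA turns q100 into a time-varying quantity, which is precisely how a stationary design flood can come to be exceeded far more often than intended.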
Meta-analysis of diagnostic test data: a bivariate Bayesian modeling approach.
Verde, Pablo E
2010-12-30
In the last decades, the amount of published results on clinical diagnostic tests has expanded very rapidly. The counterpart to this development has been the formal evaluation and synthesis of diagnostic results. However, published results present substantial heterogeneity, and they can be regarded as so far removed from the classical domain of meta-analysis that they provide a rather severe test of classical statistical methods. Recently, bivariate random-effects meta-analytic methods, which model the pairs of sensitivities and specificities, have been presented from the classical point of view. In this work a bivariate Bayesian modeling approach is presented. This approach substantially extends the scope of classical bivariate methods by allowing the structural distribution of the random effects to depend on multiple sources of variability. The meta-analysis is summarized by the predictive posterior distributions for sensitivity and specificity. The new approach also allows substantial model checking, model diagnostics and model selection to be performed. Statistical computations are implemented in public domain statistical software (WinBUGS and R) and illustrated with real data examples. Copyright © 2010 John Wiley & Sons, Ltd.
Bayesian Analysis of Nonlinear Structural Equation Models with Nonignorable Missing Data
ERIC Educational Resources Information Center
Lee, Sik-Yum
2006-01-01
A Bayesian approach is developed for analyzing nonlinear structural equation models with nonignorable missing data. The nonignorable missingness mechanism is specified by a logistic regression model. A hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm is used to produce the joint Bayesian estimates of…
Dynamic Bayesian Network Modeling of Game Based Diagnostic Assessments. CRESST Report 837
ERIC Educational Resources Information Center
Levy, Roy
2014-01-01
Digital games offer an appealing environment for assessing student proficiencies, including skills and misconceptions in a diagnostic setting. This paper proposes a dynamic Bayesian network modeling approach for observations of student performance from an educational video game. A Bayesian approach to model construction, calibration, and use in…
Steingroever, Helen; Pachur, Thorsten; Šmíra, Martin; Lee, Michael D
2018-06-01
The Iowa Gambling Task (IGT) is one of the most popular experimental paradigms for comparing complex decision-making across groups. Most commonly, IGT behavior is analyzed using frequentist tests to compare performance across groups, and to compare inferred parameters of cognitive models developed for the IGT. Here, we present a Bayesian alternative based on Bayesian repeated-measures ANOVA for comparing performance, and a suite of three complementary model-based methods for assessing the cognitive processes underlying IGT performance. The three model-based methods involve Bayesian hierarchical parameter estimation, Bayes factor model comparison, and Bayesian latent-mixture modeling. We illustrate these Bayesian methods by applying them to test the extent to which differences in intuitive versus deliberate decision style are associated with differences in IGT performance. The results show that intuitive and deliberate decision-makers behave similarly on the IGT, and the modeling analyses consistently suggest that both groups of decision-makers rely on similar cognitive processes. Our results challenge the notion that individual differences in intuitive and deliberate decision styles have a broad impact on decision-making. They also highlight the advantages of Bayesian methods, especially their ability to quantify evidence in favor of the null hypothesis, and that they allow model-based analyses to incorporate hierarchical and latent-mixture structures.
Molitor, John
2012-03-01
Bayesian methods have seen an increase in popularity in a wide variety of scientific fields, including epidemiology. One of the main reasons for their widespread application is the power of the Markov chain Monte Carlo (MCMC) techniques generally used to fit these models. As a result, researchers often implicitly associate Bayesian models with MCMC estimation procedures. However, Bayesian models do not always require Markov-chain-based methods for parameter estimation. This is important, as MCMC estimation methods, while generally quite powerful, are complex and computationally expensive and suffer from convergence problems related to the manner in which they generate correlated samples used to estimate probability distributions for parameters of interest. In this issue of the Journal, Cole et al. (Am J Epidemiol. 2012;175(5):368-375) present an interesting paper that discusses non-Markov-chain-based approaches to fitting Bayesian models. These methods, though limited, can overcome some of the problems associated with MCMC techniques and promise to provide simpler approaches to fitting Bayesian models. Applied researchers will find these estimation approaches intuitively appealing and will gain a deeper understanding of Bayesian models through their use. However, readers should be aware that other non-Markov-chain-based methods are currently in active development and have been widely published in other fields.
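A minimal illustration of fitting a Bayesian model without Markov chains, in the spirit of the non-MCMC approaches discussed here, is grid approximation: evaluate prior times likelihood on a grid of parameter values and normalize. The data counts below are made up.

```python
import numpy as np

# Posterior of a binomial proportion by grid approximation -- no Markov chains.
theta = np.linspace(0.001, 0.999, 999)
dx = theta[1] - theta[0]
prior = np.ones_like(theta)                     # flat Beta(1, 1) prior
k, n = 7, 20                                    # 7 successes in 20 trials

log_like = k * np.log(theta) + (n - k) * np.log1p(-theta)
post = prior * np.exp(log_like - log_like.max())  # subtract max for stability
post /= post.sum() * dx                           # normalize to a density

post_mean = np.sum(theta * post) * dx             # close to (k + 1) / (n + 2)
```

Because the likelihood is conjugate here, the grid result can be checked against the exact Beta(8, 14) posterior; for models with a few parameters the same scheme works where no closed form exists.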
The Bayesian reader: explaining word recognition as an optimal Bayesian decision process.
Norris, Dennis
2006-04-01
This article presents a theory of visual word recognition that assumes that, in the tasks of word identification, lexical decision, and semantic categorization, human readers behave as optimal Bayesian decision makers. This leads to the development of a computational model of word recognition, the Bayesian reader. The Bayesian reader successfully simulates some of the most significant data on human reading. The model accounts for the nature of the function relating word frequency to reaction time and identification threshold, the effects of neighborhood density and its interaction with frequency, and the variation in the pattern of neighborhood density effects seen in different experimental tasks. Both the general behavior of the model and the way the model predicts different patterns of results in different tasks follow entirely from the assumption that human readers approximate optimal Bayesian decision makers. ((c) 2006 APA, all rights reserved).
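The core computation assumed by an optimal Bayesian reader can be sketched as combining a frequency-based prior over words with a noisy-letter likelihood. The tiny lexicon, frequencies, and independent-letter noise model below are hypothetical stand-ins, not the model's actual implementation.

```python
import numpy as np

# p(word | percept) proportional to p(percept | word) * p(word),
# with p(word) proportional to corpus frequency.
words = ["cat", "car", "cot"]
freq = np.array([60.0, 30.0, 10.0])          # hypothetical corpus counts
prior = freq / freq.sum()

def letter_likelihood(word, percept, p_correct=0.8):
    # iid letter channel: each letter is perceived correctly with prob p_correct,
    # otherwise confused uniformly with one of the 25 other letters
    p_err = (1.0 - p_correct) / 25.0
    return np.prod([p_correct if w == p else p_err
                    for w, p in zip(word, percept)])

percept = "cas"                              # noisy percept of a 3-letter word
like = np.array([letter_likelihood(w, percept) for w in words])
post = prior * like
post /= post.sum()                           # posterior over candidate words
```

The frequency effect the article discusses falls out directly: "cat" and "car" match the percept equally well, so the higher-frequency "cat" wins on its prior alone.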
Moradi, Milad; Ghadiri, Nasser
2018-01-01
Automatic text summarization tools help users in the biomedical domain acquire their intended information from various textual resources more efficiently. Some biomedical text summarization systems base their sentence selection approach on the frequency of concepts extracted from the input text. However, exploring measures other than raw frequency for identifying valuable content within an input document, or considering correlations between concepts, may be more useful for this type of summarization. In this paper, we describe a Bayesian summarization method for biomedical text documents. The Bayesian summarizer first maps the input text to Unified Medical Language System (UMLS) concepts; it then selects the important ones to be used as classification features. We introduce six different feature selection approaches to identify the most important concepts of the text and to select the most informative content according to the distribution of these concepts. We show that with an appropriate feature selection approach, the Bayesian summarizer can improve the performance of biomedical summarization. Using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) toolkit, we perform extensive evaluations on a corpus of scientific papers in the biomedical domain. The results show that when the Bayesian summarizer uses feature selection methods that do not rely on raw frequency, it can outperform biomedical summarizers that rely on concept frequency, as well as domain-independent and baseline methods. Copyright © 2017 Elsevier B.V. All rights reserved.
Bayesian peak picking for NMR spectra.
Cheng, Yichen; Gao, Xin; Liang, Faming
2014-02-01
Protein structure determination is an important topic in structural genomics that helps researchers understand a variety of biological functions, such as protein-protein and protein-DNA interactions. Nowadays, nuclear magnetic resonance (NMR) is often used to determine the three-dimensional structures of proteins in vivo. This study aims to automate the peak picking step, the most important and trickiest step in NMR structure determination. We propose to model the NMR spectrum by a mixture of bivariate Gaussian densities and to use the stochastic approximation Monte Carlo algorithm as the computational tool. Under the Bayesian framework, the peak picking problem is cast as a variable selection problem. The proposed method can automatically distinguish true peaks from false ones without preprocessing the data. To the best of our knowledge, this is the first effort in the literature to tackle the peak picking problem for NMR spectrum data using a Bayesian method. Copyright © 2013. Production and hosting by Elsevier Ltd.
Model selection as a science driver for dark energy surveys
NASA Astrophysics Data System (ADS)
Mukherjee, Pia; Parkinson, David; Corasaniti, Pier Stefano; Liddle, Andrew R.; Kunz, Martin
2006-07-01
A key science goal of upcoming dark energy surveys is to seek time-evolution of the dark energy. This problem is one of model selection, where the aim is to differentiate between cosmological models with different numbers of parameters. However, the power of these surveys is traditionally assessed by estimating their ability to constrain parameters, which is a different statistical problem. In this paper, we use Bayesian model selection techniques, specifically forecasting of the Bayes factors, to compare the abilities of different proposed surveys in discovering dark energy evolution. We consider six experiments - supernova luminosity measurements by the Supernova Legacy Survey, SNAP, JEDI and ALPACA, and baryon acoustic oscillation measurements by WFMOS and JEDI - and use Bayes factor plots to compare their statistical constraining power. The concept of Bayes factor forecasting has much broader applicability than dark energy surveys.
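Forecasting full Bayes factors requires evidence integrals over each model's parameter space. A common rough surrogate (not the method used in this paper) is the BIC approximation, sketched here for a constant versus linearly evolving equation-of-state parameter w on synthetic data.

```python
import numpy as np

# BIC approximation to a Bayes factor between nested models:
# 2 ln B_01 is approximated by BIC_1 - BIC_0 (positive values favor M0).
def bic(log_like_max, n_params, n_obs):
    return n_params * np.log(n_obs) - 2.0 * log_like_max

def gauss_loglike_max(residuals):
    # Gaussian log-likelihood at the MLE of the noise variance
    n = residuals.size
    s2 = np.mean(residuals ** 2)
    return -0.5 * n * (np.log(2.0 * np.pi * s2) + 1.0)

rng = np.random.default_rng(7)
z = np.linspace(0.1, 1.5, 200)                   # mock redshifts
w = -1.0 + rng.normal(0.0, 0.05, z.size)         # truth: constant w = -1

res0 = w - w.mean()                              # M0: constant w (mean + noise sd)
slope, intercept = np.polyfit(z, w, 1)
res1 = w - (slope * z + intercept)               # M1: evolving w (slope extra param)

two_ln_b01 = bic(gauss_loglike_max(res1), 3, z.size) \
           - bic(gauss_loglike_max(res0), 2, z.size)
```

The paper's forecasts instead integrate the likelihood over parameter priors for each proposed survey; the BIC shortcut only conveys the penalize-extra-parameters logic.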
Continuous-time discrete-space models for animal movement
Hanks, Ephraim M.; Hooten, Mevin B.; Alldredge, Mat W.
2015-01-01
The processes influencing animal movement and resource selection are complex and varied. Past efforts to model behavioral changes over time used Bayesian statistical models with variable parameter space, such as reversible-jump Markov chain Monte Carlo approaches, which are computationally demanding and inaccessible to many practitioners. We present a continuous-time discrete-space (CTDS) model of animal movement that can be fit using standard generalized linear modeling (GLM) methods. This CTDS approach allows for the joint modeling of location-based as well as directional drivers of movement. Changing behavior over time is modeled using a varying-coefficient framework which maintains the computational simplicity of a GLM approach, and variable selection is accomplished using a group lasso penalty. We apply our approach to a study of two mountain lions (Puma concolor) in Colorado, USA.
Estimating the Uncertain Mathematical Structure of Hydrological Model via Bayesian Data Assimilation
NASA Astrophysics Data System (ADS)
Bulygina, N.; Gupta, H.; O'Donell, G.; Wheater, H.
2008-12-01
The structure of a hydrological model at the macro scale (e.g., watershed) is inherently uncertain due to many factors, including the lack of a robust hydrological theory at that scale. In this work, we assume that a suitable conceptual model for the hydrologic system has already been determined - i.e., the system boundaries have been specified, the important state variables and the input and output fluxes have been selected, and the major hydrological processes and the geometries of their interconnections have been identified. The structural identification problem is then to specify the mathematical form of the relationships between the inputs, state variables, and outputs, so that a computational model can be constructed for simulating and/or predicting the system's input-state-output behaviour. We show how Bayesian data assimilation can be used to merge prior beliefs, in the form of pre-assumed model equations, with information derived from the data to construct a posterior model. The approach, entitled Bayesian Estimation of Structure (BESt), is used to estimate a hydrological model for a small basin in England at hourly time scales, conditioned on an assumed three-store conceptual model structure (soil moisture storage plus fast- and slow-flow stores). Inputs to the system are precipitation and potential evapotranspiration; outputs are actual evapotranspiration and streamflow discharge. Results show the difference between prior and posterior mathematical structures, and provide prediction confidence intervals that reflect three types of uncertainty: due to initial conditions, due to inputs, and due to mathematical structure.
Kuiper, Rebecca M; Nederhoff, Tim; Klugkist, Irene
2015-05-01
In this paper, the performance of six types of techniques for comparisons of means is examined. These six emerge from the distinction between the method employed (hypothesis testing, model selection using information criteria, or Bayesian model selection) and the set of hypotheses that is investigated (a classical, exploration-based set of hypotheses containing equality constraints on the means, or a theory-based limited set of hypotheses with equality and/or order restrictions). A simulation study is conducted to examine the performance of these techniques. We demonstrate that, if one has specific, a priori specified hypotheses, confirmation (i.e., investigating theory-based hypotheses) has advantages over exploration (i.e., examining all possible equality-constrained hypotheses). Furthermore, examining reasonable order-restricted hypotheses has more power to detect the true effect/non-null hypothesis than evaluating only equality restrictions. Additionally, when investigating more than one theory-based hypothesis, model selection is preferred over hypothesis testing. Because of the first two results, we further examine the techniques that are able to evaluate order restrictions in a confirmatory fashion by examining their performance when the homogeneity of variance assumption is violated. Results show that the techniques are robust to heterogeneity when the sample sizes are equal. When the sample sizes are unequal, the performance is affected by heterogeneity. The size and direction of the deviations from the baseline, where there is no heterogeneity, depend on the effect size (of the means) and on the trend in the group variances with respect to the ordering of the group sizes. Importantly, the deviations are less pronounced when the group variances and sizes exhibit the same trend (e.g., are both increasing with group number). © 2014 The British Psychological Society.
Bayesian flood forecasting methods: A review
NASA Astrophysics Data System (ADS)
Han, Shasha; Coulibaly, Paulin
2017-08-01
Over the past few decades, floods have been among the most common and widely distributed natural disasters in the world. If floods could be accurately forecasted in advance, their negative impacts could be greatly minimized. It is widely recognized that quantification and reduction of the uncertainty associated with hydrologic forecasts is of great importance for flood estimation and rational decision making. The Bayesian forecasting system (BFS) offers an ideal theoretical framework for uncertainty quantification that can be developed for probabilistic flood forecasting via any deterministic hydrologic model. It provides a suitable theoretical structure, empirically validated models, and reasonable analytic-numerical computation methods, and can be developed into various Bayesian forecasting approaches. This paper presents a comprehensive review of Bayesian forecasting approaches applied to flood forecasting from 1999 to the present. The review starts with an overview of the fundamentals of BFS and recent advances in BFS, followed by BFS applications in river stage forecasting and real-time flood forecasting; it then moves to a critical analysis evaluating the advantages and limitations of Bayesian forecasting methods and other predictive uncertainty assessment approaches in flood forecasting, and finally discusses future research directions in Bayesian flood forecasting. Results show that the Bayesian flood forecasting approach is an effective and advanced way to estimate floods: it considers all sources of uncertainty and produces a predictive distribution of river stage, discharge, or runoff, thus giving more accurate and reliable flood forecasts. Some emerging Bayesian forecasting methods (e.g., the ensemble Bayesian forecasting system and Bayesian multi-model combination) were shown to overcome the limitations of a single model or fixed model weights and to effectively reduce predictive uncertainty.
In recent years, various Bayesian flood forecasting approaches have been developed and widely applied, but there is still room for improvements. Future research in the context of Bayesian flood forecasting should be on assimilation of various sources of newly available information and improvement of predictive performance assessment methods.
Bayesian modeling of flexible cognitive control
Jiang, Jiefeng; Heller, Katherine; Egner, Tobias
2014-01-01
“Cognitive control” describes endogenous guidance of behavior in situations where routine stimulus-response associations are suboptimal for achieving a desired goal. The computational and neural mechanisms underlying this capacity remain poorly understood. We examine recent advances stemming from the application of a Bayesian learner perspective that provides optimal prediction for control processes. In reviewing the application of Bayesian models to cognitive control, we note that an important limitation in current models is a lack of a plausible mechanism for the flexible adjustment of control over conflict levels changing at varying temporal scales. We then show that flexible cognitive control can be achieved by a Bayesian model with a volatility-driven learning mechanism that modulates dynamically the relative dependence on recent and remote experiences in its prediction of future control demand. We conclude that the emergent Bayesian perspective on computational mechanisms of cognitive control holds considerable promise, especially if future studies can identify neural substrates of the variables encoded by these models, and determine the nature (Bayesian or otherwise) of their neural implementation. PMID:24929218
Bayesian generalized linear mixed modeling of Tuberculosis using informative priors.
Ojo, Oluwatobi Blessing; Lougue, Siaka; Woldegerima, Woldegebriel Assefa
2017-01-01
TB is rated as one of the world's deadliest diseases, and South Africa ranks 9th among the 22 countries hit hardest by TB. Although much research has been carried out on this subject, this paper goes a step further by incorporating past knowledge into the model, using a Bayesian approach with an informative prior. Bayesian approaches are growing in popularity in data analysis, but most applications of Bayesian inference are limited to situations with non-informative priors, where there is no solid external information about the distribution of the parameter of interest. The main aim of this study is to profile people living with TB in South Africa. In this paper, identical regression models are fitted under the classical approach and under Bayesian approaches with both non-informative and informative priors, using the South Africa General Household Survey (GHS) data for the year 2014. For the Bayesian model with an informative prior, the South Africa General Household Survey datasets for the years 2011 to 2013 are used to set up the priors for the 2014 model.
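The informative-prior construction described here, building a prior from earlier survey waves and updating it with the 2014 wave, can be sketched for a single prevalence parameter with a conjugate Beta prior. All counts below are made up for illustration; the paper fits full regression models, not a single proportion.

```python
from scipy import stats

# Hypothetical counts: pooled 2011-2013 waves inform the prior,
# the 2014 wave supplies the likelihood.
past_cases, past_n = 180, 12_000                 # made-up 2011-2013 totals
a0, b0 = 1 + past_cases, 1 + past_n - past_cases # Beta(1,1) updated by past data
prior = stats.beta(a0, b0)

cases_14, n_14 = 55, 4_000                       # made-up 2014 survey counts
post = stats.beta(a0 + cases_14, b0 + n_14 - cases_14)
```

The posterior mean sits between the 2014 sample proportion and the prior mean, and the posterior is tighter than the prior: exactly the borrowing of strength an informative prior is meant to provide.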
Bayesian statistics in medicine: a 25 year review.
Ashby, Deborah
2006-11-15
This review examines the state of Bayesian thinking as Statistics in Medicine was launched in 1982, reflecting particularly on its applicability and uses in medical research. It then looks at each subsequent five-year epoch, with a focus on papers appearing in Statistics in Medicine, putting these in the context of major developments in Bayesian thinking and computation with reference to important books, landmark meetings and seminal papers. It charts the growth of Bayesian statistics as it is applied to medicine and makes predictions for the future. From sparse beginnings, where Bayesian statistics was barely mentioned, Bayesian statistics has now permeated all the major areas of medical statistics, including clinical trials, epidemiology, meta-analyses and evidence synthesis, spatial modelling, longitudinal modelling, survival modelling, molecular genetics and decision-making in respect of new technologies.
Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models
Cuevas, Jaime; Crossa, José; Montesinos-López, Osval A.; Burgueño, Juan; Pérez-Rodríguez, Paulino; de los Campos, Gustavo
2016-01-01
The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have recently been developed and used in genomic selection in plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects (u) that can be assessed by the Kronecker product of variance–covariance matrices of genetic correlations between environments and genomic kernels through markers under two linear kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model (u) plus an extra component, f, that captures random effects between environments that were not captured by the random effects u. We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always have superior prediction ability compared with single-environment models, and the higher prediction ability of multi-environment models with u and f over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect u. PMID:27793970
An introduction to Bayesian statistics in health psychology.
Depaoli, Sarah; Rus, Holly M; Clifton, James P; van de Schoot, Rens; Tiemensma, Jitske
2017-09-01
The aim of the current article is to provide a brief introduction to Bayesian statistics within the field of health psychology. Bayesian methods are increasing in prevalence in applied fields, and they have been shown in simulation research to improve the estimation accuracy of structural equation models, latent growth curve (and mixture) models, and hierarchical linear models. Likewise, Bayesian methods can be used with small sample sizes since they do not rely on large sample theory. In this article, we discuss several important components of Bayesian statistics as they relate to health-based inquiries. We discuss the incorporation and impact of prior knowledge into the estimation process and the different components of the analysis that should be reported in an article. We present an example implementing Bayesian estimation in the context of blood pressure changes after participants experienced an acute stressor. We conclude with final thoughts on the implementation of Bayesian statistics in health psychology, including suggestions for reviewing Bayesian manuscripts and grant proposals. We have also included an extensive amount of online supplementary material to complement the content presented here, including Bayesian examples using many different software programmes and an extensive sensitivity analysis examining the impact of priors.
Temporal BYY encoding, Markovian state spaces, and space dimension determination.
Xu, Lei
2004-09-01
As a complement to the mainstream temporal coding approaches, this paper examines Markovian state-space temporal models from the perspective of temporal Bayesian Ying-Yang (BYY) learning. It offers new insights and results not only for the discrete-state hidden Markov model and its extensions, but also for continuous-state linear state-space models and their extensions. In particular, it introduces a new learning mechanism that selects the number of states, or the dimension of the state space, either automatically during adaptive learning or afterwards via model selection criteria derived from this mechanism. Experiments demonstrate how the proposed approach works.
Bayesian Variable Selection for Hierarchical Gene-Environment and Gene-Gene Interactions
Liu, Changlu; Ma, Jianzhong; Amos, Christopher I.
2014-01-01
We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions and gene by environment interactions in the same model. Our approach incorporates the natural hierarchical structure between the main effects and interaction effects into a mixture model, such that our methods tend to remove the irrelevant interaction effects more effectively, resulting in more robust and parsimonious models. We consider both strong and weak hierarchical models. For a strong hierarchical model, both of the main effects between interacting factors must be present for the interactions to be considered in the model development, while for a weak hierarchical model, only one of the two main effects is required to be present for the interaction to be evaluated. Our simulation results show that the proposed strong and weak hierarchical mixture models work well in controlling false positive rates and provide a powerful approach for identifying the predisposing effects and interactions in gene-environment interaction studies, in comparison with the naive model that does not impose this hierarchical constraint in most of the scenarios simulated. We illustrated our approach using data for lung cancer and cutaneous melanoma. PMID:25154630
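The strong versus weak hierarchy constraint described in this abstract is easy to state in code. The sketch below simply enumerates which candidate interactions a model search would be allowed to consider under each rule; the variable names are illustrative, not from the study.

```python
from itertools import combinations

# Strong hierarchy: interaction A:B may enter only if BOTH main effects A and B
# are in the model. Weak hierarchy: at least one of A, B must be in.
def allowed_interactions(main_effects, candidates, strong=True):
    rule = all if strong else any
    return [pair for pair in candidates
            if rule(term in main_effects for term in pair)]

candidates = list(combinations(["G1", "G2", "E"], 2))   # two genes, one exposure
strong_ok = allowed_interactions({"G1", "E"}, candidates, strong=True)
weak_ok = allowed_interactions({"G1", "E"}, candidates, strong=False)
```

With main effects G1 and E present, the strong rule admits only G1:E, while the weak rule admits every candidate pair; the paper encodes the same constraint through the structure of its mixture priors rather than by explicit enumeration.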
Spatial Prediction and Optimized Sampling Design for Sodium Concentration in Groundwater
Shabbir, Javid; M. AbdEl-Salam, Nasser; Hussain, Tajammal
2016-01-01
Sodium is an integral part of water, and an excessive amount in drinking water causes high blood pressure and hypertension. In the present paper, the spatial distribution of sodium concentration in drinking water is modeled, and optimized sampling designs for selecting sampling locations are calculated for three divisions in Punjab, Pakistan. Universal kriging and Bayesian universal kriging are used to predict the sodium concentrations. Spatial simulated annealing is used to generate optimized sampling designs. Different estimation methods (i.e., maximum likelihood, restricted maximum likelihood, ordinary least squares, and weighted least squares) are used to estimate the parameters of the variogram models (i.e., exponential, Gaussian, spherical, and cubic). It is concluded that Bayesian universal kriging fits better than universal kriging. It is also observed that the universal kriging predictor provides the minimum mean universal kriging variance for both adding and deleting locations during sampling design. PMID:27683016
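One building block shared by both kriging variants is fitting a variogram model to empirical semivariances. A least-squares sketch for the exponential model on synthetic values is shown below; the study compares several variogram models and estimation methods, and the numbers here are invented.

```python
import numpy as np
from scipy.optimize import curve_fit

# Exponential variogram model: gamma(h) = nugget + psill * (1 - exp(-h / range))
def exp_variogram(h, nugget, psill, rng_par):
    return nugget + psill * (1.0 - np.exp(-h / rng_par))

# Synthetic "empirical" semivariances at 40 lag distances
h = np.linspace(0.5, 30.0, 40)
rng = np.random.default_rng(0)
emp = exp_variogram(h, 0.5, 2.0, 8.0) + rng.normal(0.0, 0.05, h.size)

# Ordinary least-squares fit of nugget, partial sill, and range
(nugget, psill, rng_hat), _ = curve_fit(exp_variogram, h, emp, p0=[0.1, 1.0, 5.0])
```

The fitted variogram then feeds the kriging weights; a Bayesian treatment would place priors on these three parameters instead of plugging in point estimates.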
On selecting a prior for the precision parameter of Dirichlet process mixture models
Dorazio, R.M.
2009-01-01
In hierarchical mixture models the Dirichlet process is used to specify latent patterns of heterogeneity, particularly when the distribution of latent parameters is thought to be clustered (multimodal). The parameters of a Dirichlet process include a precision parameter α and a base probability measure G0. In problems where α is unknown and must be estimated, inferences about the level of clustering can be sensitive to the choice of prior assumed for α. In this paper an approach is developed for computing a prior for the precision parameter α that can be used in the presence or absence of prior information about the level of clustering. This approach is illustrated in an analysis of counts of stream fishes. The results of this fully Bayesian analysis are compared with an empirical Bayes analysis of the same data and with a Bayesian analysis based on an alternative commonly used prior.
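A standard identity that helps when eliciting a prior for the precision parameter α is the expected number of clusters it induces on n observations. The sketch below computes it directly; this is background to, not a reproduction of, the paper's specific prior construction.

```python
import numpy as np

# Expected number of clusters under a Dirichlet process with precision alpha:
# E[K_n] = sum_{i=1}^{n} alpha / (alpha + i - 1)
def expected_clusters(alpha, n):
    i = np.arange(1, n + 1)
    return float(np.sum(alpha / (alpha + i - 1.0)))
```

Inverting this relationship, i.e. finding the α whose E[K_n] matches a prior guess about the number of clusters, is one simple way to translate substantive beliefs about clustering into a prior on α.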
A Bayesian modelling framework for tornado occurrences in North America
NASA Astrophysics Data System (ADS)
Cheng, Vincent Y. S.; Arhonditsis, George B.; Sills, David M. L.; Gough, William A.; Auld, Heather
2015-03-01
Tornadoes represent one of nature’s most hazardous phenomena that have been responsible for significant destruction and devastating fatalities. Here we present a Bayesian modelling approach for elucidating the spatiotemporal patterns of tornado activity in North America. Our analysis shows a significant increase in the Canadian Prairies and the Northern Great Plains during the summer, indicating a clear transition of tornado activity from the United States to Canada. The linkage between monthly-averaged atmospheric variables and likelihood of tornado events is characterized by distinct seasonality; the convective available potential energy is the predominant factor in the summer; vertical wind shear appears to have a strong signature primarily in the winter and secondarily in the summer; and storm relative environmental helicity is most influential in the spring. The present probabilistic mapping can be used to draw inference on the likelihood of tornado occurrence in any location in North America within a selected time period of the year.
Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression
Wiedenhoeft, John; Brugel, Eric; Schliep, Alexander
2016-01-01
By integrating Haar wavelets with Hidden Markov Models, we achieve drastically reduced running times for Bayesian inference using Forward-Backward Gibbs sampling. We show that this improves detection of genomic copy number variants (CNV) in array CGH experiments compared to the state-of-the-art, including standard Gibbs sampling. The method concentrates computational effort on chromosomal segments which are difficult to call, by dynamically and adaptively recomputing consecutive blocks of observations likely to share a copy number. This makes routine diagnostic use and re-analysis of legacy data collections feasible; to this end, we also propose an effective automatic prior. An open source software implementation of our method is available at http://schlieplab.org/Software/HaMMLET/ (DOI: 10.5281/zenodo.46262). This paper was selected for oral presentation at RECOMB 2016, and an abstract is published in the conference proceedings. PMID:27177143
A Bayesian Approach for Summarizing and Modeling Time-Series Exposure Data with Left Censoring.
Houseman, E Andres; Virji, M Abbas
2017-08-01
Direct reading instruments are valuable tools for measuring exposure as they provide real-time measurements for rapid decision making. However, their use is limited to general survey applications in part due to issues related to their performance. Moreover, statistical analysis of real-time data is complicated by autocorrelation among successive measurements, non-stationary time series, and the presence of left-censoring due to limit-of-detection (LOD). A Bayesian framework is proposed that accounts for non-stationary autocorrelation and LOD issues in exposure time-series data in order to model workplace factors that affect exposure and estimate summary statistics for tasks or other covariates of interest. A spline-based approach is used to model non-stationary autocorrelation with relatively few assumptions about autocorrelation structure. Left-censoring is addressed by integrating over the left tail of the distribution. The model is fit using Markov-Chain Monte Carlo within a Bayesian paradigm. The method can flexibly account for hierarchical relationships, random effects and fixed effects of covariates. The method is implemented using the rjags package in R, and is illustrated by applying it to real-time exposure data. Estimates for task means and covariates from the Bayesian model are compared to those from conventional frequentist models including linear regression, mixed-effects, and time-series models with different autocorrelation structures. Simulations studies are also conducted to evaluate method performance. Simulation studies with percent of measurements below the LOD ranging from 0 to 50% showed lowest root mean squared errors for task means and the least biased standard deviations from the Bayesian model compared to the frequentist models across all levels of LOD. In the application, task means from the Bayesian model were similar to means from the frequentist models, while the standard deviations were different. 
Parameter estimates for covariates were significant in some frequentist models, but in the Bayesian model their credible intervals contained zero; such discrepancies were observed in multiple datasets. Variance components from the Bayesian model reflected substantial autocorrelation, consistent with the frequentist models, except for the auto-regressive moving average model. Plots of means from the Bayesian model showed good fit to the observed data. The proposed Bayesian model provides an approach for modeling non-stationary autocorrelation in a hierarchical modeling framework to estimate task means, standard deviations, quantiles, and parameter estimates for covariates that are less biased and have better performance characteristics than some of the contemporary methods.
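The censoring treatment described above ("integrating over the left tail") can be sketched as follows. This is a minimal illustration assuming a lognormal exposure distribution, not the authors' rjags implementation; autocorrelation and hierarchical terms are omitted.

```python
# Hedged sketch: log-likelihood for left-censored lognormal exposure data.
# A value below the limit of detection (LOD) contributes the probability
# mass below log(LOD), i.e. the left tail is integrated over.
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_loglik(obs, lod, mu, sigma):
    """Log-likelihood of measurements `obs`; values below `lod` are censored."""
    ll = 0.0
    for x in obs:
        if x < lod:  # censored observation: add log P(X < LOD)
            ll += math.log(normal_cdf((math.log(lod) - mu) / sigma))
        else:        # detected observation: add the lognormal log-density
            z = (math.log(x) - mu) / sigma
            ll += -math.log(x * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z
    return ll
```

In a full MCMC analysis this likelihood would be combined with priors on mu and sigma and sampled as in the paper's Bayesian framework.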
2017-09-01
This dissertation explores the efficacy of statistical post-processing methods downstream of these dynamical model components with a hierarchical multivariate Bayesian approach. Keywords: Bayesian hierarchical modeling, Markov chain Monte Carlo methods, Metropolis algorithm, machine learning, atmospheric prediction.
Bayesian Learning and the Psychology of Rule Induction
ERIC Educational Resources Information Center
Endress, Ansgar D.
2013-01-01
In recent years, Bayesian learning models have been applied to an increasing variety of domains. While such models have been criticized on theoretical grounds, the underlying assumptions and predictions are rarely made concrete and tested experimentally. Here, I use Frank and Tenenbaum's (2011) Bayesian model of rule-learning as a case study to…
Properties of the Bayesian Knowledge Tracing Model
ERIC Educational Resources Information Center
van de Sande, Brett
2013-01-01
Bayesian Knowledge Tracing is used very widely to model student learning. It comes in two different forms: The first form is the Bayesian Knowledge Tracing "hidden Markov model" which predicts the probability of correct application of a skill as a function of the number of previous opportunities to apply that skill and the model…
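The hidden Markov form mentioned above follows the standard BKT update equations; a minimal sketch, where the parameter names p_learn, p_guess, and p_slip are the conventional ones rather than notation taken from the paper:

```python
# Standard Bayesian Knowledge Tracing update (hedged sketch).
def bkt_update(p_know, correct, p_learn, p_guess, p_slip):
    """Posterior P(skill known) after one observed response, then learning step."""
    if correct:
        post = p_know * (1.0 - p_slip) / (
            p_know * (1.0 - p_slip) + (1.0 - p_know) * p_guess)
    else:
        post = p_know * p_slip / (
            p_know * p_slip + (1.0 - p_know) * (1.0 - p_guess))
    return post + (1.0 - post) * p_learn  # transition: chance to learn this step

def p_correct(p_know, p_guess, p_slip):
    """Predicted probability of a correct application of the skill."""
    return p_know * (1.0 - p_slip) + (1.0 - p_know) * p_guess
```

Iterating bkt_update over a sequence of responses yields the predicted mastery curve as a function of the number of practice opportunities.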
Bayesian Analysis of Longitudinal Data Using Growth Curve Models
ERIC Educational Resources Information Center
Zhang, Zhiyong; Hamagami, Fumiaki; Wang, Lijuan Lijuan; Nesselroade, John R.; Grimm, Kevin J.
2007-01-01
Bayesian methods for analyzing longitudinal data in social and behavioral research are recommended for their ability to incorporate prior information in estimating simple and complex models. We first summarize the basics of Bayesian methods before presenting an empirical example in which we fit a latent basis growth curve model to achievement data…
Testing students' e-learning via Facebook through Bayesian structural equation modeling.
Salarzadeh Jenatabadi, Hashem; Moghavvemi, Sedigheh; Wan Mohamed Radzi, Che Wan Jasimah Bt; Babashamsi, Parastoo; Arashi, Mohammad
2017-01-01
Learning is an intentional activity, with several factors affecting students' intention to use new learning technology. Researchers have investigated technology acceptance in different contexts by developing various theories/models and testing them by a number of means. Although most theories/models developed have been examined through regression or structural equation modeling, Bayesian analysis offers more accurate data analysis results. To address this gap, the unified theory of acceptance and technology use in the context of e-learning via Facebook is re-examined in this study using Bayesian analysis. The data (S1 Data) were collected from 170 students enrolled in a business statistics course at University of Malaya, Malaysia, and tested with the maximum likelihood and Bayesian approaches. The difference between the two methods' results indicates that performance expectancy and hedonic motivation are the strongest factors influencing the intention to use e-learning via Facebook. The Bayesian estimation model exhibited better data fit than the maximum likelihood estimator model. The results of the Bayesian and maximum likelihood estimator approaches are compared and the reasons for the result discrepancy are deliberated.
When mechanism matters: Bayesian forecasting using models of ecological diffusion
Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.
2017-01-01
Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.
Efficient inference for genetic association studies with multiple outcomes.
Ruffieux, Helene; Davison, Anthony C; Hager, Jorg; Irincheeva, Irina
2017-10-01
Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modeling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson and others (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes.
Rodrigues, Josemar; Cancho, Vicente G; de Castro, Mário; Balakrishnan, N
2012-12-01
In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic-model of radiation carcinogenesis--latent time distributions and their properties. Math Biosci 1993; 113: 51-75], and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de São Carlos, São Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it possesses an interesting and realistic interpretation of the biological mechanism of the occurrence of the event of interest as it includes a destructive process of tumour cells after an initial treatment or the capacity of an individual exposed to irradiation to repair altered cells that results in cancer induction. In other words, what is recorded is only the damaged portion of the original number of altered cells not eliminated by the treatment or repaired by the repair system of an individual. Markov Chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. Also, some discussions on the model selection and an illustration with a cutaneous melanoma data set analysed by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de São Carlos, São Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)] are presented.
Approximate Bayesian estimation of extinction rate in the Finnish Daphnia magna metapopulation.
Robinson, John D; Hall, David W; Wares, John P
2013-05-01
Approximate Bayesian computation (ABC) is useful for parameterizing complex models in population genetics. In this study, ABC was applied to simultaneously estimate parameter values for a model of metapopulation coalescence and test two alternatives to a strict metapopulation model in the well-studied network of Daphnia magna populations in Finland. The models shared four free parameters: the subpopulation genetic diversity (θS), the rate of gene flow among patches (4Nm), the founding population size (N0) and the metapopulation extinction rate (e) but differed in the distribution of extinction rates across habitat patches in the system. The three models had either a constant extinction rate in all populations (strict metapopulation), one population that was protected from local extinction (i.e. a persistent source), or habitat-specific extinction rates drawn from a distribution with specified mean and variance. Our model selection analysis favoured the model including a persistent source population over the two alternative models. Of the closest 750,000 data sets in Euclidean space, 78% were simulated under the persistent source model (estimated posterior probability = 0.769). This fraction increased to more than 85% when only the closest 150,000 data sets were considered (estimated posterior probability = 0.774). Approximate Bayesian computation was then used to estimate parameter values that might produce the observed set of summary statistics. Our analysis provided posterior distributions for e that included the point estimate obtained from previous data from the Finnish D. magna metapopulation. Our results support the use of ABC and population genetic data for testing the strict metapopulation model and parameterizing complex models of demography.
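The rejection flavour of ABC used in studies like this one can be illustrated on a toy problem. Here an exponential rate stands in for the coalescent model of the study, and all numbers are illustrative:

```python
# Toy ABC rejection sampler: draw a parameter from the prior, simulate data,
# and keep the draw if a summary statistic falls close to the observed one.
import random

def abc_rejection(observed_mean, n_obs=50, n_sims=5000, tol=0.05, seed=1):
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        rate = rng.uniform(0.1, 5.0)                    # prior on the rate
        data = [rng.expovariate(rate) for _ in range(n_obs)]
        summary = sum(data) / n_obs                     # summary statistic
        if abs(summary - observed_mean) < tol:          # rejection step
            accepted.append(rate)
    return accepted  # approximate posterior sample
```

For an observed sample mean of 0.5 the accepted rates concentrate near 2, since an Exponential(2) sample has expected mean 0.5; in real applications the summary statistics and the simulator are the substantive modeling choices.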
Saleh, Mohammad I
2017-11-01
Pegylated interferon α-2a (PEG-IFN-α-2a) is an antiviral drug used for the treatment of chronic hepatitis C virus (HCV) infection. This study describes the population pharmacokinetics of PEG-IFN-α-2a in hepatitis C patients using a Bayesian approach. A possible association between patient characteristics and pharmacokinetic parameters is also explored. A Bayesian population pharmacokinetic modeling approach, using WinBUGS version 1.4.3, was applied to a cohort of patients (n = 292) with chronic HCV infection. Data were obtained from two phase III studies sponsored by Hoffmann-La Roche. Demographic and clinical information were evaluated as possible predictors of pharmacokinetic parameters during model development. A one-compartment model with an additive error best fitted the data, and a total of 2271 PEG-IFN-α-2a measurements from 292 subjects were analyzed using the proposed population pharmacokinetic model. Sex was identified as a predictor of PEG-IFN-α-2a clearance, and hemoglobin baseline level was identified as a predictor of PEG-IFN-α-2a volume of distribution. A population pharmacokinetic model of PEG-IFN-α-2a in patients with chronic HCV infection was presented in this study. The proposed model can be used to optimize PEG-IFN-α-2a dosing in patients with chronic HCV infection. Optimal PEG-IFN-α-2a selection is important to maximize response and/or to avoid potential side effects such as thrombocytopenia and neutropenia. NV15942 and NV15801.
NASA Astrophysics Data System (ADS)
Likhachev, Dmitriy V.
2017-06-01
Johs and Hale developed the Kramers-Kronig consistent B-spline formulation for dielectric function modeling in spectroscopic ellipsometry data analysis. In this article we use the popular Akaike, corrected Akaike, and Bayesian information criteria (AIC, AICc, and BIC, respectively) to determine an optimal number of knots for the B-spline model. These criteria strike a compromise between under- and overfitting of experimental data: they penalize an increasing number of knots and select the representation that achieves the best fit with the fewest knots. The proposed approach provides objective and practical guidance, as opposed to empirically driven or "gut feeling" decisions, for selecting the right number of knots for B-spline models in spectroscopic ellipsometry. The AIC, AICc, and BIC selection criteria work remarkably well, as we demonstrate in several real-data applications. This approach formalizes selection of the optimal number of knots and may be useful in the practice of spectroscopic ellipsometry data analysis.
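For least-squares fits with Gaussian errors the three criteria have simple closed forms; a sketch, with n data points, k free parameters, and rss the residual sum of squares:

```python
# Information criteria for a least-squares fit (Gaussian error assumption).
import math

def aic(n, k, rss):
    return n * math.log(rss / n) + 2 * k

def aicc(n, k, rss):
    # small-sample correction added to AIC
    return aic(n, k, rss) + 2 * k * (k + 1) / (n - k - 1)

def bic(n, k, rss):
    # penalty grows with log(n), so BIC is stricter for large data sets
    return n * math.log(rss / n) + k * math.log(n)
```

Knot selection then amounts to fitting each candidate number of knots and keeping the model with the smallest criterion value.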
Bayesian naturalness, simplicity, and testability applied to the B ‑ L MSSM GUT
NASA Astrophysics Data System (ADS)
Fundira, Panashe; Purves, Austin
2018-04-01
Recent years have seen increased use of Bayesian model comparison to quantify notions such as naturalness, simplicity, and testability, especially in the area of supersymmetric model building. After demonstrating that Bayesian model comparison can resolve a paradox that has been raised in the literature concerning the naturalness of the proton mass, we apply Bayesian model comparison to GUTs, an area to which it has not been applied before. We find that the GUTs are substantially favored over the nonunifying puzzle model. Of the GUTs we consider, the B ‑ L MSSM GUT is the most favored, but the MSSM GUT is almost equally favored.
NASA Astrophysics Data System (ADS)
Mohammad-Djafari, Ali
2015-01-01
The main objective of this tutorial article is first to review the main inference tools using the Bayesian approach, entropy, information theory, and their corresponding geometries. This review is focused mainly on the ways these tools have been used in data, signal, and image processing. After a short introduction of the different quantities related to the Bayes rule, the entropy and the Maximum Entropy Principle (MEP), relative entropy and the Kullback-Leibler divergence, and Fisher information, we study their use in different fields of data and signal processing, such as entropy in source separation, Fisher information in model order selection, different Maximum Entropy based methods in time series spectral estimation, and finally, general linear inverse problems.
English, Sangeeta B.; Shih, Shou-Ching; Ramoni, Marco F.; Smith, Lois E.; Butte, Atul J.
2014-01-01
Though genome-wide technologies, such as microarrays, are widely used, data from these methods are considered noisy; there is still varied success in downstream biological validation. We report a method that increases the likelihood of successfully validating microarray findings using real time RT-PCR, including genes at low expression levels and with small differences. We use a Bayesian network to identify the most relevant sources of noise based on the successes and failures in validation for an initial set of selected genes, and then improve our subsequent selection of genes for validation based on eliminating these sources of noise. The network displays the significant sources of noise in an experiment, and scores the likelihood of validation for every gene. We show how the method can significantly increase validation success rates. In conclusion, in this study, we have successfully added a new automated step to determine the contributory sources of noise that determine successful or unsuccessful downstream biological validation. PMID:18790084
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Kärkkäinen, Hanni P; Sillanpää, Mikko J
2013-09-04
Because of the increased availability of genome-wide sets of molecular markers along with reduced cost of genotyping large samples of individuals, genomic estimated breeding values have become an essential resource in plant and animal breeding. Bayesian methods for breeding value estimation have proven to be accurate and efficient; however, the ever-increasing data sets are placing heavy demands on the parameter estimation algorithms. Although a commendable number of fast estimation algorithms are available for Bayesian models of continuous Gaussian traits, there is a shortage of corresponding models for discrete or censored phenotypes. In this work, we consider a threshold approach of binary, ordinal, and censored Gaussian observations for Bayesian multilocus association models and Bayesian genomic best linear unbiased prediction and present a high-speed generalized expectation maximization algorithm for parameter estimation under these models. We demonstrate our method with simulated and real data. Our example analyses suggest that the use of the extra information present in an ordered categorical or censored Gaussian data set, instead of dichotomizing the data into case-control observations, increases the accuracy of genomic breeding values predicted by Bayesian multilocus association models or by Bayesian genomic best linear unbiased prediction. Furthermore, the example analyses indicate that the correct threshold model is more accurate than the directly used Gaussian model with censored Gaussian data, while with binary or ordinal data the superiority of the threshold model could not be confirmed.
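The threshold idea above links a latent Gaussian "liability" to the observed binary phenotype via the probit function; a minimal sketch of that link (the liability scale is assumed standardized):

```python
# Probit link of the threshold model: an underlying Gaussian liability
# above zero yields the observed category (binary case, residual scale 1).
import math

def probit(liability):
    """P(y = 1) given the latent genomic value."""
    return 0.5 * (1.0 + math.erf(liability / math.sqrt(2.0)))
```

In the full models, the liability is the sum of marker effects, and ordinal or censored traits add further thresholds along the same latent scale.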
Uncertainties in ozone concentrations predicted with a Lagrangian photochemical air quality model have been estimated using Bayesian Monte Carlo (BMC) analysis. Bayesian Monte Carlo analysis provides a means of combining subjective "prior" uncertainty estimates developed ...
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
Quantitative Rheological Model Selection
NASA Astrophysics Data System (ADS)
Freund, Jonathan; Ewoldt, Randy
2014-11-01
The more parameters in a rheological model, the better it will reproduce available data, though this does not mean that it is necessarily a better-justified model. Good fits are only part of model selection. We employ a Bayesian inference approach that quantifies model suitability by balancing closeness to data against both the number of model parameters and their a priori uncertainty. The penalty depends upon the prior-to-calibration expectation of the viable range of values that model parameters might take, which we discuss as an essential aspect of the selection criterion. Models that are physically grounded are usually accompanied by tighter physical constraints on their respective parameters. The analysis reflects a basic principle: models grounded in physics can be expected to enjoy greater generality and perform better away from where they are calibrated. In contrast, purely empirical models can provide comparable fits, but the model selection framework penalizes their a priori uncertainty. We demonstrate the approach by selecting the best-justified number of modes in a multi-mode Maxwell description of PVA-Borax. We also quantify the relative merits of the Maxwell model compared to power-law fits and purely empirical fits for PVA-Borax, a viscoelastic liquid, and for gluten.
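The penalty on a priori parameter range described above is the usual Occam factor in the model evidence; a toy illustration using a Gaussian likelihood with a uniform prior (not the rheological models of the abstract):

```python
# Evidence (marginal likelihood) under a uniform prior, by grid integration.
# The same best fit scores lower evidence under a wider prior: this is the
# "a priori uncertainty" penalty in Bayesian model selection.
import math

def evidence(loglik, lo, hi, n_grid=1000):
    """Integrate exp(loglik) against a uniform prior on [lo, hi]."""
    width = (hi - lo) / n_grid
    total = 0.0
    for i in range(n_grid):
        theta = lo + (i + 0.5) * width
        total += math.exp(loglik(theta)) * width / (hi - lo)  # prior density 1/(hi-lo)
    return total

def loglik(theta, xbar=1.0, n=20):
    """Gaussian log-likelihood for data with sample mean 1.0 (sigma = 1)."""
    return -0.5 * n * (xbar - theta) ** 2
```

A tight prior containing the best fit (say [0, 2]) yields higher evidence than a diffuse prior (say [-10, 10]), even though both achieve the same maximum likelihood.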
Method for Automatic Selection of Parameters in Normal Tissue Complication Probability Modeling.
Christophides, Damianos; Appelt, Ane L; Gusnanto, Arief; Lilley, John; Sebag-Montefiore, David
2018-07-01
To present a fully automatic method to generate multiparameter normal tissue complication probability (NTCP) models and compare its results with those of a published model, using the same patient cohort. Data were analyzed from 345 rectal cancer patients treated with external radiation therapy to predict the risk of patients developing grade 1 or ≥2 cystitis. In total, 23 clinical factors were included in the analysis as candidate predictors of cystitis. Principal component analysis was used to decompose the bladder dose-volume histogram into 8 principal components, explaining more than 95% of the variance. The data set of clinical factors and principal components was divided into training (70%) and test (30%) data sets, with the training data set used by the algorithm to compute an NTCP model. The first step of the algorithm was to obtain a bootstrap sample, followed by multicollinearity reduction using the variance inflation factor and genetic algorithm optimization to determine an ordinal logistic regression model that minimizes the Bayesian information criterion. The process was repeated 100 times, and the model with the minimum Bayesian information criterion was recorded on each iteration. The most frequent model was selected as the final "automatically generated model" (AGM). The published model and AGM were fitted on the training data sets, and the risk of cystitis was calculated. The 2 models had no significant differences in predictive performance, both for the training and test data sets (P value > .05) and found similar clinical and dosimetric factors as predictors. Both models exhibited good explanatory performance on the training data set (P values > .44), which was reduced on the test data sets (P values < .05). The predictive value of the AGM is equivalent to that of the expert-derived published model. It demonstrates potential in saving time, tackling problems with a large number of parameters, and standardizing variable selection in NTCP modeling. 
To fulfill its mission to protect human health and the environment, EPA has established National Ambient Air Quality Standards (NAAQS) on six selected air pollutants known as criteria pollutants: ozone (O3); carbon monoxide (CO); lead (Pb); nitrogen dioxide (NO2); sulfur dioxide ...
USDA-ARS?s Scientific Manuscript database
PURPOSE: Bacterial cold water disease (BCWD) causes significant economic loss in salmonid aquaculture, and in 2005, a rainbow trout breeding program was initiated at the NCCCWA to select for increased disease survival. The main objectives of this study were to determine the mode of inheritance of di...
Distinguishing between Selective Sweeps from Standing Variation and from a De Novo Mutation
Peter, Benjamin M.; Huerta-Sanchez, Emilia; Nielsen, Rasmus
2012-01-01
An outstanding question in human genetics has been the degree to which adaptation occurs from standing genetic variation or from de novo mutations. Here, we combine several common statistics used to detect selection in an Approximate Bayesian Computation (ABC) framework, with the goal of discriminating between models of selection and providing estimates of the age of selected alleles and the selection coefficients acting on them. We use simulations to assess the power and accuracy of our method and apply it to seven of the strongest sweeps currently known in humans. We identify two genes, ASPM and PSCA, that are most likely affected by selection on standing variation; and we find three genes, ADH1B, LCT, and EDAR, in which the adaptive alleles seem to have swept from a new mutation. We also confirm evidence of selection for one further gene, TRPV6. In one gene, G6PD, neither neutral models nor models of selective sweeps fit the data, presumably because this locus has been subject to balancing selection. PMID:23071458
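The ABC model-choice logic used above (simulate under each model, accept draws whose summary statistics match the data, read posterior model probabilities off the acceptance counts) can be sketched with a deliberately simplified toy in which each model is reduced to a prior over a single summary statistic; the distributions and the observed value are invented for illustration, and real applications simulate sweeps and use several statistics.

```python
import numpy as np

rng = np.random.default_rng(1)

# Observed summary statistic for a swept locus (invented value).
obs = 0.8

def simulate(model, n):
    """Caricature: each model implies a different distribution over one
    summary statistic (real ABC would simulate full sweep models)."""
    if model == "standing variation":
        return rng.beta(4, 2, size=n)   # favours larger summaries
    return rng.beta(2, 4, size=n)       # "de novo" favours smaller ones

# ABC rejection for model choice: simulate from each model under equal prior
# weight and accept draws whose summary lands within a tolerance of the data.
tol, draws = 0.05, 200_000
accepted = {m: int(np.sum(np.abs(simulate(m, draws) - obs) < tol))
            for m in ("standing variation", "de novo")}
total = sum(accepted.values())
posterior = {m: c / total for m, c in accepted.items()}
print(posterior)
```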
A Bayesian alternative for multi-objective ecohydrological model specification
NASA Astrophysics Data System (ADS)
Tang, Yating; Marshall, Lucy; Sharma, Ashish; Ajami, Hoori
2018-01-01
Recent studies have identified the importance of vegetation processes in terrestrial hydrologic systems. Process-based ecohydrological models combine hydrological, physical, biochemical and ecological processes of the catchments, and as such are generally more complex and more heavily parameterized than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying the uncertainties in hydrological modeling with the development of Markov chain Monte Carlo (MCMC) techniques. The Bayesian approach offers an appealing alternative to traditional multi-objective hydrologic model calibrations by defining proper prior distributions that can be considered analogous to the ad-hoc weighting often prescribed in multi-objective calibration. Our study aims to develop appropriate prior distributions and likelihood functions that minimize the model uncertainties and bias within a Bayesian ecohydrological modeling framework based on a traditional Pareto-based model calibration technique. In our study, a Pareto-based multi-objective optimization and a formal Bayesian framework are implemented in a conceptual ecohydrological model that combines a hydrological model (HYMOD) and a modified Bucket Grassland Model (BGM). Simulations focused on a single objective (streamflow or LAI) and on multiple objectives (streamflow and LAI) with different emphasis defined via the prior distribution of the model error parameters. Results show more reliable outputs for both predicted streamflow and LAI using Bayesian multi-objective calibration with prior distributions for the error parameters specified on the basis of results from the Pareto front.
The methodology implemented here provides insight into the usefulness of multiobjective Bayesian calibration for ecohydrologic systems and the importance of appropriate prior distributions in such approaches.
Ortega, Alonso; Labrenz, Stephan; Markowitsch, Hans J; Piefke, Martina
2013-01-01
In the last decade, different statistical techniques have been introduced to improve the assessment of malingering-related poor effort. In this context, we have recently shown preliminary evidence that a Bayesian latent group model may help to optimize classification accuracy using a simulation research design. In the present study, we conducted two analyses. Firstly, we evaluated how accurately this Bayesian approach can distinguish between participants answering in an honest way (honest response group) and participants feigning cognitive impairment (experimental malingering group). Secondly, we tested the accuracy of our model in differentiating between patients who had real cognitive deficits (cognitively impaired group) and participants who belonged to the experimental malingering group. All Bayesian analyses were conducted using the raw scores of a visual recognition forced-choice task (2AFC), the Test of Memory Malingering (TOMM, Trial 2), and the Word Memory Test (WMT, primary effort subtests). The first analysis showed 100% accuracy for the Bayesian model in distinguishing participants of both groups with all effort measures. The second analysis showed outstanding overall accuracy of the Bayesian model when estimates were obtained from the 2AFC and the TOMM raw scores. Diagnostic accuracy of the Bayesian model diminished when using the WMT total raw scores; despite this, overall diagnostic accuracy can still be considered excellent. The most plausible explanation for this decrement is the low performance in verbal recognition and fluency tasks of some patients in the cognitively impaired group. Additionally, the Bayesian model provides individual estimates, p(z_i | D), of examinees' effort levels. In conclusion, both the high classification accuracy levels and the Bayesian individual estimates of effort may be very useful for clinicians assessing effort in medico-legal settings.
Boos, Moritz; Seer, Caroline; Lange, Florian; Kopp, Bruno
2016-01-01
Cognitive determinants of probabilistic inference were examined using hierarchical Bayesian modeling techniques. A classic urn-ball paradigm served as experimental strategy, involving a factorial two (prior probabilities) by two (likelihoods) design. Five computational models of cognitive processes were compared with the observed behavior. Parameter-free Bayesian posterior probabilities and parameter-free base rate neglect provided inadequate models of probabilistic inference. The introduction of distorted subjective probabilities yielded more robust and generalizable results. A general class of (inverted) S-shaped probability weighting functions had been proposed; however, the possibility of large differences in probability distortions not only across experimental conditions, but also across individuals, seems critical for the model's success. It also seems advantageous to consider individual differences in parameters of probability weighting as being sampled from weakly informative prior distributions of individual parameter values. Thus, the results from hierarchical Bayesian modeling converge with previous results in revealing that probability weighting parameters show considerable task dependency and individual differences. Methodologically, this work exemplifies the usefulness of hierarchical Bayesian modeling techniques for cognitive psychology. Theoretically, human probabilistic inference might be best described as the application of individualized strategic policies for Bayesian belief revision. PMID:27303323
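The (inverted) S-shaped probability weighting functions discussed above have a standard one-parameter form, w(p) = p^γ / (p^γ + (1−p)^γ)^(1/γ) (the Tversky-Kahneman form; whether the study used this exact parameterization is not stated in the abstract). A short sketch shows the characteristic distortion:

```python
def weight(p, gamma):
    """One-parameter (inverted-)S-shaped probability weighting function
    w(p) = p^g / (p^g + (1-p)^g)^(1/g); gamma = 1 recovers w(p) = p."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)

# gamma < 1 gives the classic pattern: small probabilities are overweighted
# and large probabilities underweighted, while 0 and 1 stay fixed.
for p in (0.05, 0.50, 0.95):
    print(p, round(weight(p, 0.6), 3))
```

Hierarchical Bayesian modeling, as in the paper, would place a population-level prior over γ and estimate individual values from it.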
Estimating Tree Height-Diameter Models with the Bayesian Method
Zhang, Xiongqing; Duan, Aiguo; Zhang, Jianguo; Xiang, Congwei
2014-01-01
Six candidate height-diameter models were used to analyze the height-diameter relationships. The common methods for estimating height-diameter models take the classical (frequentist) approach based on the frequency interpretation of probability, for example, the nonlinear least squares method (NLS) and the maximum likelihood method (ML). The Bayesian method has a distinctive advantage over the classical methods in that the parameters to be estimated are regarded as random variables. In this study, both the classical and Bayesian methods were used to estimate the six height-diameter models. Both approaches showed that the Weibull model was the “best” model for data1. In addition, based on the Weibull model, data2 was used to compare the Bayesian method with informative priors against uninformative priors and the classical method. The results showed that the improved prediction accuracy of the Bayesian method led to narrower confidence bands for predicted values than the classical method produced, and the credible bands of parameters estimated with informative priors were also narrower than those with uninformative priors or the classical method. The estimated posterior distributions of the parameters can be set as new priors when estimating the parameters using data2. PMID:24711733
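A minimal random-walk Metropolis sampler shows how Bayesian estimation of such a height-diameter curve works. Here a Weibull-type model H = 1.3 + a(1 − e^(−bD))^c is fitted to simulated data with invented parameter values and flat priors; this is far simpler than the informative-prior analysis in the paper, and whether this is the exact Weibull form the authors used is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated height-diameter data from a Weibull-type curve
# H = 1.3 + a * (1 - exp(-b * D))^c  (heights in m, diameters in cm;
# 1.3 m is breast height). Parameter values are invented.
a, b, c, sigma = 25.0, 0.07, 1.2, 1.0
D = rng.uniform(5, 50, size=300)
H = 1.3 + a * (1.0 - np.exp(-b * D)) ** c + rng.normal(0.0, sigma, size=300)

def log_post(theta):
    """Gaussian log-likelihood (sigma known) with flat priors on a, b, c > 0."""
    a_, b_, c_ = theta
    if min(a_, b_, c_) <= 0:
        return -np.inf
    mu = 1.3 + a_ * (1.0 - np.exp(-b_ * D)) ** c_
    return -0.5 * np.sum((H - mu) ** 2) / sigma**2

# Random-walk Metropolis over (a, b, c); the second half is kept as draws.
theta, lp = np.array([20.0, 0.10, 1.0]), -np.inf
keep = []
for i in range(20_000):
    prop = theta + rng.normal(0.0, [0.3, 0.002, 0.03])
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    if i >= 10_000:
        keep.append(theta.copy())
post_mean = np.mean(keep, axis=0)
print("posterior means (a, b, c):", post_mean)
```

Replacing the flat priors with posteriors from an earlier data set is exactly the "set posteriors as new priors" idea the abstract describes.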
Gerber, Brian D.; Kendall, William L.; Hooten, Mevin B.; Dubovsky, James A.; Drewien, Roderick C.
2015-01-01
Prediction is fundamental to scientific enquiry and application; however, ecologists tend to favour explanatory modelling. We discuss a predictive modelling framework to evaluate ecological hypotheses and to explore novel/unobserved environmental scenarios to assist conservation and management decision-makers. We apply this framework to develop an optimal predictive model for juvenile (<1 year old) sandhill crane Grus canadensis recruitment of the Rocky Mountain Population (RMP). We consider spatial climate predictors motivated by hypotheses of how drought across multiple time-scales and spring/summer weather affects recruitment. Our predictive modelling framework focuses on developing a single model that includes all relevant predictor variables, regardless of collinearity. This model is then optimized for prediction by controlling model complexity using a data-driven approach that marginalizes or removes irrelevant predictors from the model. Specifically, we highlight two approaches of statistical regularization, Bayesian least absolute shrinkage and selection operator (LASSO) and ridge regression. Our optimal predictive Bayesian LASSO and ridge regression models were similar and on average 37% superior in predictive accuracy to an explanatory modelling approach. Our predictive models confirmed a priori hypotheses that drought and cold summers negatively affect juvenile recruitment in the RMP. The effects of long-term drought can be alleviated by short-term wet spring–summer months; however, the alleviation of long-term drought has a much greater positive effect on juvenile recruitment. The number of freezing days and snowpack during the summer months can also negatively affect recruitment, while spring snowpack has a positive effect. Breeding habitat, mediated through climate, is a limiting factor on population growth of sandhill cranes in the RMP, which could become more limiting with a changing climate (i.e. increased drought).
These effects are likely not unique to cranes. The alteration of hydrological patterns and water levels by drought may impact many migratory, wetland nesting birds in the Rocky Mountains and beyond. Generalizable predictive models (trained by out-of-sample fit and based on ecological hypotheses) are needed by conservation and management decision-makers. Statistical regularization improves predictions and provides a general framework for fitting models with a large number of predictors, even those with collinearity, to simultaneously identify an optimal predictive model while conducting rigorous Bayesian model selection. Our framework is important for understanding population dynamics under a changing climate and has direct applications for making harvest and habitat management decisions.
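Ridge regression, one of the two regularization methods highlighted above, has a closed form that makes the shrinkage mechanism easy to see. The sketch below uses simulated collinear predictors; the Bayesian LASSO requires MCMC and is not shown, and the data and penalty value are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data with strongly collinear predictors: 5 independent signals,
# each appearing three times with tiny noise; the response uses two of them.
n = 60
base = rng.normal(size=(n, 5))
X = np.hstack([base + 0.01 * rng.normal(size=(n, 5)) for _ in range(3)])
y = 2.0 * base[:, 0] - 1.5 * base[:, 1] + rng.normal(0.0, 1.0, size=n)

train, test = slice(0, 40), slice(40, 60)

def ridge(X, y, lam):
    """Closed-form ridge solution (X'X + lam I)^-1 X'y; lam -> 0 gives OLS."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta_ols = ridge(X[train], y[train], 1e-8)   # near-OLS (tiny jitter for stability)
beta_rdg = ridge(X[train], y[train], 5.0)    # shrunken coefficients

for name, bta in [("ols", beta_ols), ("ridge", beta_rdg)]:
    mse = np.mean((y[test] - X[test] @ bta) ** 2)
    print(name, "| coef norm:", round(float(np.linalg.norm(bta)), 2),
          "| test MSE:", round(float(mse), 3))
```

The ridge penalty is the non-Bayesian analogue of the Gaussian coefficient prior used in the paper's Bayesian formulation; the LASSO corresponds to a Laplace prior.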
Polynomial order selection in random regression models via penalizing adaptively the likelihood.
Corrales, J D; Munilla, S; Cantet, R J C
2015-08-01
Orthogonal Legendre polynomials (LP) are used to model the shape of additive genetic and permanent environmental effects in random regression models (RRM). Frequently, the Akaike (AIC) and the Bayesian (BIC) information criteria are employed to select LP order. However, it has been theoretically shown that neither AIC nor BIC is simultaneously optimal in terms of consistency and efficiency. Thus, the goal was to introduce a method, 'penalizing adaptively the likelihood' (PAL), as a criterion to select LP order in RRM. Four simulated data sets and real data (60,513 records, 6675 Colombian Holstein cows) were employed. Nested models were fitted to the data, and AIC, BIC and PAL were calculated for all of them. Results showed that PAL and BIC identified the true LP order for the additive genetic and permanent environmental effects with probability one, whereas AIC tended to favour overparameterized models. Conversely, when the true model was unknown, PAL selected the best model with higher probability than AIC did. In the latter case, BIC never favoured the best model. To summarize, PAL selected a correct model order regardless of whether the 'true' model was within the set of candidates. © 2015 Blackwell Verlag GmbH.
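The AIC/BIC order-selection setting that PAL improves upon can be sketched on ordinary polynomials; the paper applies it to Legendre polynomials inside a random regression model, and PAL itself is not implemented here. The data and true order are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

# Data from a quadratic curve; candidate polynomial orders 0..6.
n = 200
x = np.linspace(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0.0, 0.3, size=n)

def gaussian_ll(resid):
    """Maximized Gaussian log-likelihood given residuals (sigma profiled out)."""
    s2 = np.mean(resid**2)
    return -0.5 * n * (np.log(2.0 * np.pi * s2) + 1.0)

aic, bic = {}, {}
for order in range(7):
    coef = np.polyfit(x, y, order)
    ll = gaussian_ll(y - np.polyval(coef, x))
    k = order + 2                      # polynomial coefficients + variance
    aic[order] = 2.0 * k - 2.0 * ll
    bic[order] = k * np.log(n) - 2.0 * ll
print("AIC picks order", min(aic, key=aic.get),
      "| BIC picks order", min(bic, key=bic.get))
```

Because BIC's per-parameter penalty (ln n) exceeds AIC's (2) for n > 7, BIC can never select a higher order than AIC on the same fits; PAL's idea is to let the penalty adapt to the data instead of fixing it.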
Accurate Biomass Estimation via Bayesian Adaptive Sampling
NASA Technical Reports Server (NTRS)
Wheeler, Kevin R.; Knuth, Kevin H.; Castle, Joseph P.; Lvov, Nikolay
2005-01-01
The following concepts were introduced: a) Bayesian adaptive sampling for solving biomass estimation; b) characterization of MISR Rahman model parameters conditioned upon MODIS landcover; c) a rigorous non-parametric Bayesian approach to analytic mixture model determination; d) a unique U.S. asset for science product validation and verification.
Bayesian adaptive phase II screening design for combination trials.
Cai, Chunyan; Yuan, Ying; Johnson, Valen E
2013-01-01
Trials of combination therapies for the treatment of cancer are playing an increasingly important role in the battle against this disease. To more efficiently handle the large number of combination therapies that must be tested, we propose a novel Bayesian phase II adaptive screening design to simultaneously select among possible treatment combinations involving multiple agents. Our design is based on formulating the selection procedure as a Bayesian hypothesis testing problem in which the superiority of each treatment combination is equated to a single hypothesis. During the trial conduct, we use the current values of the posterior probabilities of all hypotheses to adaptively allocate patients to treatment combinations. Simulation studies show that the proposed design substantially outperforms the conventional multiarm balanced factorial trial design. The proposed design yields a significantly higher probability for selecting the best treatment while allocating substantially more patients to efficacious treatments. The proposed design is most appropriate for trials that combine multiple agents and screen for the efficacious combination to be investigated further. The proposed Bayesian adaptive phase II screening design substantially outperformed the conventional complete factorial design. Our design allocates more patients to better treatments while providing higher power to identify the best treatment at the end of the trial.
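Posterior-probability-guided allocation can be illustrated with Thompson sampling over Beta-Binomial arms. This is a generic sketch, not the hypothesis-testing design the authors propose, and the number of arms and the response rates are invented.

```python
import numpy as np

rng = np.random.default_rng(6)

# Four hypothetical treatment combinations with unknown response rates.
true_rates = np.array([0.20, 0.30, 0.35, 0.60])
alpha = np.ones(4)      # Beta(1, 1) priors on each response rate
beta = np.ones(4)
counts = np.zeros(4, dtype=int)

# Each new patient goes to the arm whose posterior draw is highest
# (Thompson sampling, one simple form of posterior-guided allocation).
for _ in range(400):
    arm = int(np.argmax(rng.beta(alpha, beta)))
    response = rng.binomial(1, true_rates[arm])
    alpha[arm] += response
    beta[arm] += 1 - response
    counts[arm] += 1

post_mean = alpha / (alpha + beta)
print("allocations:", counts, "| posterior means:", np.round(post_mean, 2))
```

The design in the paper instead tracks posterior probabilities of superiority hypotheses, but the effect is the same in spirit: efficacious arms accumulate patients as evidence grows.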
In Silico Syndrome Prediction for Coronary Artery Disease in Traditional Chinese Medicine
Lu, Peng; Chen, Jianxin; Zhao, Huihui; Gao, Yibo; Luo, Liangtao; Zuo, Xiaohan; Shi, Qi; Yang, Yiping; Yi, Jianqiang; Wang, Wei
2012-01-01
Coronary artery disease (CAD) is one of the leading causes of death in the world. The differentiation of syndrome (ZHENG) is the criterion of diagnosis and therapy in traditional Chinese medicine (TCM). Therefore, in silico syndrome prediction can improve the performance of treatment. In this paper, we present a Bayesian network framework to construct a high-confidence syndrome predictor based on the optimum symptom subset collected by Support Vector Machine (SVM) feature selection. Syndromes of CAD can be divided into asthenia and sthenia syndromes. Because syndromes are hierarchical, we first label every case with one of three syndrome classes (asthenia, sthenia, or both) to handle patients who present several syndromes. On the basis of these three syndrome classes, we apply SVM feature selection to obtain the optimum symptom subset and compare this subset against Markov blanket feature selection using ROC curves. Using this subset, six predictors of CAD syndromes are constructed with the Bayesian network technique. We also evaluate Naïve Bayes, C4.5, Logistic, and Radial basis function (RBF) network classifiers against the Bayesian network. In conclusion, the Bayesian network method based on the optimum symptoms offers a practical way to predict the six syndromes of CAD in TCM. PMID:22567030
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karagiannis, Georgios, E-mail: georgios.karagiannis@pnnl.gov; Lin, Guang, E-mail: guang.lin@pnnl.gov
2014-02-15
Generalized polynomial chaos (gPC) expansions allow us to represent the solution of a stochastic system using a series of polynomial chaos basis functions. The number of gPC terms increases dramatically as the dimension of the random input variables increases. When the number of the gPC terms is larger than that of the available samples, a scenario that often occurs when the corresponding deterministic solver is computationally expensive, evaluation of the gPC expansion can be inaccurate due to over-fitting. We propose a fully Bayesian approach that allows for global recovery of the stochastic solutions, in both spatial and random domains, by coupling Bayesian model uncertainty and regularization regression methods. It allows the evaluation of the PC coefficients on a grid of spatial points, via (1) the Bayesian model average (BMA) or (2) the median probability model, and their construction as spatial functions on the spatial domain via spline interpolation. The former accounts for the model uncertainty and provides Bayes-optimal predictions; while the latter provides a sparse representation of the stochastic solutions by evaluating the expansion on a subset of dominating gPC bases. Moreover, the proposed methods quantify the importance of the gPC bases in the probabilistic sense through inclusion probabilities. We design a Markov chain Monte Carlo (MCMC) sampler that evaluates all the unknown quantities without the need of ad-hoc techniques. The proposed methods are suitable for, but not restricted to, problems whose stochastic solutions are sparse in the stochastic space with respect to the gPC bases while the deterministic solver involved is expensive. We demonstrate the accuracy and performance of the proposed methods and make comparisons with other approaches on solving elliptic SPDEs with 1-, 14- and 40-random dimensions.
Sparse Event Modeling with Hierarchical Bayesian Kernel Methods
2016-01-05
The research objective of this proposal was to develop a predictive Bayesian kernel approach to model count data based on ... several predictive variables. Such an approach, which we refer to as the Poisson Bayesian kernel model, is able to model the rate of occurrence of ... which adds specificity to the model and can make nonlinear data more manageable. Early results show that the ...
Bayesian generalized linear mixed modeling of Tuberculosis using informative priors
Woldegerima, Woldegebriel Assefa
2017-01-01
TB is rated as one of the world's deadliest diseases, and South Africa ranks 9th among the 22 countries hardest hit by TB. Although much research has been carried out on this subject, this paper goes a step further by incorporating past knowledge into the model, using a Bayesian approach with informative priors. The Bayesian approach is becoming popular in data analysis, but most applications of Bayesian inference are limited to situations with non-informative priors, where there is no solid external information about the distribution of the parameter of interest. The main aim of this study is to profile people living with TB in South Africa. In this paper, identical regression models are fitted with the classical approach and with Bayesian approaches using both non-informative and informative priors, based on South Africa General Household Survey (GHS) data for the year 2014. For the Bayesian model with informative priors, GHS data for the years 2011 to 2013 are used to construct priors for the 2014 model. PMID:28257437
Phylodynamics of classical swine fever virus with emphasis on Ecuadorian strains.
Garrido Haro, A D; Barrera Valle, M; Acosta, A; J Flores, F
2018-06-01
Classical swine fever virus (CSFV) is a Pestivirus from the Flaviviridae family that affects pigs worldwide and is endemic in several Latin American countries. However, there are still some countries in the region, including Ecuador, for which CSFV molecular information is lacking. To better understand the epidemiology of CSFV in the Americas, sequences from CSFVs from Ecuador were generated and a phylodynamic analysis of the virus was performed. Sequences for the full-length glycoprotein E2 gene of twenty field isolates were obtained and, along with sequences from strains previously described in the Americas and from the most representative strains worldwide, were used to analyse the phylodynamics of the virus. Bayesian methods were used to test several molecular clock and demographic models. A calibrated ultrametric tree and a Bayesian skyline were constructed, and codons under positive selection involved in immune escape were detected. The best model according to Bayes factors was the strict molecular clock and Bayesian skyline model, which shows that CSFV has an evolutionary rate of 3.2 × 10⁻⁴ substitutions per site per year. The model estimates the origin of CSFV in the mid-1500s. There is a strong spatial structure for CSFV in the Americas, indicating that the virus is moving mainly through neighbouring countries. The genetic diversity of CSFV has increased constantly since its appearance, with a slight decrease in the mid-twentieth century, which coincides with eradication campaigns in North America. Even though there is no evidence of strong directional evolution of the E2 gene in CSFV, codons 713, 761, 762 and 975 appear to be under positive selection and could be related to virulence or pathogenesis. These results reveal how CSFV has spread and evolved since it first appeared in the Americas and provide important information for attaining the goal of eradication of this virus in Latin America. © 2018 Blackwell Verlag GmbH.
Prospects and Potential Uses of Genomic Prediction of Key Performance Traits in Tetraploid Potato.
Stich, Benjamin; Van Inghelandt, Delphine
2018-01-01
Genomic prediction is a routine tool in breeding programs of most major animal and plant species. However, its usefulness for potato breeding has not yet been evaluated in detail. The objectives of this study were to (i) examine the prospects of genomic prediction of key performance traits in a diversity panel of tetraploid potato modeling additive, dominance, and epistatic effects, (ii) investigate the effects of size and make up of training set, number of test environments and molecular markers on prediction accuracy, and (iii) assess the effect of including markers from candidate genes on the prediction accuracy. With genomic best linear unbiased prediction (GBLUP), BayesA, BayesCπ, and Bayesian LASSO, four different prediction methods were used for genomic prediction of relative area under disease progress curve after a Phytophthora infestans infection, plant maturity, maturity corrected resistance, tuber starch content, tuber starch yield (TSY), and tuber yield (TY) of 184 tetraploid potato clones or subsets thereof genotyped with the SolCAP 8.3k SNP array. The cross-validated prediction accuracies with GBLUP and the three Bayesian approaches for the six evaluated traits ranged from about 0.5 to about 0.8. For traits with a high expected genetic complexity, such as TSY and TY, we observed an 8% higher prediction accuracy using a model with additive and dominance effects compared with a model with additive effects only. Our results suggest that for oligogenic traits in general and when diagnostic markers are available in particular, the use of Bayesian methods for genomic prediction is highly recommended and that the diagnostic markers should be modeled as fixed effects. The evaluation of the relative performance of genomic prediction vs. phenotypic selection indicated that the former is superior, assuming cycle lengths and selection intensities that are possible to realize in commercial potato breeding programs.
Ridge, Lasso and Bayesian additive-dominance genomic models.
Azevedo, Camila Ferreira; de Resende, Marcos Deon Vilela; E Silva, Fabyano Fonseca; Viana, José Marcelo Soriano; Valente, Magno Sávio Ferreira; Resende, Márcio Fernando Ribeiro; Muñoz, Patricio
2015-08-25
A complete approach for genome-wide selection (GWS) involves reliable statistical genetics models and methods. Reports on this topic are common for additive genetic models but not for additive-dominance models. The objective of this paper was (i) to compare the performance of 10 additive-dominance predictive models (including current models and proposed modifications), fitted using Bayesian, Lasso and Ridge regression approaches; and (ii) to decompose genomic heritability and accuracy in terms of three quantitative genetic information sources, namely, linkage disequilibrium (LD), co-segregation (CS) and pedigree relationships or family structure (PR). The simulation study considered two broad sense heritability levels (0.30 and 0.50, associated with narrow sense heritabilities of 0.20 and 0.35, respectively) and two genetic architectures for traits (the first consisting of small gene effects and the second consisting of a mixed inheritance model with five major genes). G-REML/G-BLUP and a modified Bayesian/Lasso (called BayesA*B* or t-BLASSO) method performed best in the prediction of genomic breeding values as well as the total genotypic values of individuals in all four scenarios (two heritabilities × two genetic architectures). The BayesA*B*-type method showed a better ability to recover the dominance variance/additive variance ratio. Decomposition of genomic heritability and accuracy revealed the following descending importance order of information: LD, CS and PR not captured by markers, the last two being very close. Amongst the 10 models/methods evaluated, the G-BLUP, BAYESA*B* (-2,8) and BAYESA*B* (4,6) methods presented the best results and were found to be adequate for accurately predicting genomic breeding values and total genotypic values as well as for estimating additive and dominance effects in additive-dominance genomic models.
Nonparametric Bayesian inference of the microcanonical stochastic block model
NASA Astrophysics Data System (ADS)
Peixoto, Tiago P.
2017-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.
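The basic object the SBM machinery scores, the fit of a partition under block-constant edge probabilities, can be sketched as follows. This is the plain (non-hierarchical, two-group) Bernoulli SBM with maximum-likelihood block probabilities, far from the paper's nonparametric microcanonical treatment; the graph sizes and densities are invented.

```python
import numpy as np

rng = np.random.default_rng(7)

# Planted-partition graph: 2 groups of 30 nodes, dense within, sparse between.
n, half = 60, 30
labels = np.array([0] * half + [1] * half)
P = np.where(labels[:, None] == labels[None, :], 0.30, 0.05)
A = np.triu(rng.uniform(size=(n, n)) < P, k=1)
A = A | A.T                                   # symmetric adjacency, no self-loops

def sbm_loglik(A, z):
    """Bernoulli SBM log-likelihood of adjacency A under partition z, with
    block edge probabilities set to their maximum-likelihood values."""
    ll = 0.0
    for r in (0, 1):
        for s in (r, 1):
            mask = np.outer(z == r, z == s)
            if r == s:
                mask = np.triu(mask, k=1)     # count each within-block pair once
            pairs = mask.sum()
            edges = A[mask].sum()
            p = np.clip(edges / pairs, 1e-9, 1 - 1e-9)
            ll += edges * np.log(p) + (pairs - edges) * np.log(1 - p)
    return ll

ll_planted = sbm_loglik(A, labels)
ll_random = sbm_loglik(A, rng.permutation(labels))
print("log-likelihood, planted vs shuffled:",
      round(float(ll_planted), 1), round(float(ll_random), 1))
```

The microcanonical, hierarchical version replaces these ML probabilities with hard edge-count constraints and nested priors, and penalizes the partition's description length so the number of groups need not be fixed in advance.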
A Primer for Model Selection: The Decisive Role of Model Complexity
NASA Astrophysics Data System (ADS)
Höge, Marvin; Wöhling, Thomas; Nowak, Wolfgang
2018-03-01
Selecting a "best" model among several competing candidate models poses an often encountered problem in water resources modeling (and other disciplines which employ models). For a modeler, the best model fulfills a certain purpose best (e.g., flood prediction), which is typically assessed by comparing model simulations to data (e.g., stream flow). Model selection methods find the "best" trade-off between good fit with data and model complexity. In this context, the interpretations of model complexity implied by different model selection methods are crucial, because they represent different underlying goals of modeling. Over the last decades, numerous model selection criteria have been proposed, but modelers who primarily want to apply a model selection criterion often face a lack of guidance for choosing the right criterion that matches their goal. We propose a classification scheme for model selection criteria that helps to find the right criterion for a specific goal, i.e., which employs the correct complexity interpretation. We identify four model selection classes which seek to achieve high predictive density, low predictive error, high model probability, or shortest compression of data. These goals can be achieved by following either nonconsistent or consistent model selection and by either incorporating a Bayesian parameter prior or not. We allocate commonly used criteria to these four classes, analyze how they represent model complexity and what this means for the model selection task. Finally, we provide guidance on choosing the right type of criteria for specific model selection tasks. (A quick guide through all key points is given at the end of the introduction.)
Cholinergic stimulation enhances Bayesian belief updating in the deployment of spatial attention.
Vossel, Simone; Bauer, Markus; Mathys, Christoph; Adams, Rick A; Dolan, Raymond J; Stephan, Klaas E; Friston, Karl J
2014-11-19
The exact mechanisms whereby the cholinergic neurotransmitter system contributes to attentional processing remain poorly understood. Here, we applied computational modeling to psychophysical data (obtained from a spatial attention task) under a psychopharmacological challenge with the cholinesterase inhibitor galantamine (Reminyl). This allowed us to characterize the cholinergic modulation of selective attention formally, in terms of hierarchical Bayesian inference. In a placebo-controlled, within-subject, crossover design, 16 healthy human subjects performed a modified version of Posner's location-cueing task in which the proportion of validly and invalidly cued targets (percentage of cue validity, % CV) changed over time. Saccadic response speeds were used to estimate the parameters of a hierarchical Bayesian model to test whether cholinergic stimulation affected the trial-wise updating of probabilistic beliefs that underlie the allocation of attention or whether galantamine changed the mapping from those beliefs to subsequent eye movements. Behaviorally, galantamine led to a greater influence of probabilistic context (% CV) on response speed than placebo. Crucially, computational modeling suggested this effect was due to an increase in the rate of belief updating about cue validity (as opposed to the increased sensitivity of behavioral responses to those beliefs). We discuss these findings with respect to cholinergic effects on hierarchical cortical processing and in relation to the encoding of expected uncertainty or precision.
CHAI, Lian En; LAW, Chow Kuan; MOHAMAD, Mohd Saberi; CHONG, Chuii Khim; CHOON, Yee Wen; DERIS, Safaai; ILLIAS, Rosli Md
2014-01-01
Background: Gene expression data often contain missing expression values. Therefore, several imputation methods have been applied to impute the missing values, including k-nearest neighbour (kNN), local least squares (LLS), and Bayesian principal component analysis (BPCA). However, the effects of these imputation methods on the modelling of gene regulatory networks from gene expression data have rarely been investigated and analysed using a dynamic Bayesian network (DBN). Methods: In the present study, we separately imputed datasets of the Escherichia coli S.O.S. DNA repair pathway and the Saccharomyces cerevisiae cell cycle pathway with kNN, LLS, and BPCA, and subsequently used these to generate gene regulatory networks (GRNs) using a discrete DBN. We made comparisons on the basis of previous studies in order to select the gene network with the least error. Results: We found that BPCA and LLS performed better on larger networks (based on the S. cerevisiae dataset), whereas kNN performed better on smaller networks (based on the E. coli dataset). Conclusion: The results suggest that the performance of each imputation method is dependent on the size of the dataset, and this subsequently affects the modelling of the resultant GRNs using a DBN. In addition, on the basis of these results, a DBN has the capacity to discover potential edges, as well as display interactions, between genes.
Bayesian Methods for the Physical Sciences. Learning from Examples in Astronomy and Physics.
NASA Astrophysics Data System (ADS)
Andreon, Stefano; Weaver, Brian
2015-05-01
Chapter 1: This chapter presents some basic steps for performing a good statistical analysis, all summarized in about one page. Chapter 2: This short chapter introduces the basics of probability theory in an intuitive fashion using simple examples. It also illustrates, again with examples, how to propagate errors and the difference between marginal and profile likelihoods. Chapter 3: This chapter introduces the computational tools and methods that we use for sampling from the posterior distribution. Since all numerical computations, and Bayesian ones are no exception, may end in errors, we also provide a few tips to check that the numerical computation is sampling from the posterior distribution. Chapter 4: Many of the concepts of building, running, and summarizing the results of a Bayesian analysis are described with this step-by-step guide using a basic (Gaussian) model. The chapter also introduces examples using Poisson and Binomial likelihoods, and how to combine repeated independent measurements. Chapter 5: All statistical analyses make assumptions, and Bayesian analyses are no exception. This chapter emphasizes that results depend on data and priors (assumptions). We illustrate this concept with examples where the prior plays greatly different roles, from major to negligible. We also provide some advice on how to look for information useful for sculpting the prior. Chapter 6: In this chapter we consider examples for which we want to estimate more than a single parameter. These common problems include estimating location and spread. We also consider examples that require the modeling of two populations (one we are interested in and a nuisance population) or averaging incompatible measurements. We also introduce quite complex examples dealing with upper limits and with a larger-than-expected scatter. Chapter 7: Rarely is a sample randomly selected from the population we wish to study.
Often, samples are affected by selection effects, e.g., easier-to-collect events or objects are over-represented in samples and difficult-to-collect ones are under-represented, if not missing altogether. In this chapter we show how to account for non-random data collection to infer the properties of the population from the studied sample. Chapter 8: In this chapter we introduce regression models, i.e., how to fit (regress) one or more quantities against each other through a functional relationship and estimate any unknown parameters that dictate this relationship. Questions of interest include: how to deal with samples affected by selection effects? How does a rich data structure influence the fitted parameters? And what about non-linear multiple-predictor fits, upper/lower limits, measurement errors of different amplitudes, an intrinsic variety in the studied populations, or an extra source of variability? A number of examples illustrate how to answer these questions and how to predict the value of an unavailable quantity by exploiting the existence of a trend with another, available, quantity. Chapter 9: This chapter provides some advice on how the careful scientist should perform model checking and sensitivity analysis, i.e., how to answer the following questions: is the considered model at odds with the currently available data (the fitted data), for example because it is over-simplified compared to some specific complexity pointed out by the data? Furthermore, are the data informative about the quantity being measured, or are results sensitively dependent on details of the fitted model? And, finally, what if assumptions are uncertain? A number of examples illustrate how to answer these questions.
Chapter 10: This chapter compares the performance of Bayesian methods against simple, non-Bayesian alternatives, such as maximum likelihood, minimum chi-square, ordinary and weighted least squares, bivariate correlated errors and intrinsic scatter, and robust estimates of location and scale. Performance is evaluated in terms of quality of prediction, accuracy of the estimates, and fairness and noisiness of the quoted errors. We also focus on three failures of maximum likelihood methods occurring with small samples, with mixtures, and with regressions with errors in the predictor quantity.
NASA Astrophysics Data System (ADS)
Xu, T.; Valocchi, A. J.; Ye, M.; Liang, F.
2016-12-01
Due to simplification and/or misrepresentation of the real aquifer system, numerical groundwater flow and solute transport models are usually subject to model structural error. During model calibration, the hydrogeological parameters may be overly adjusted to compensate for unknown structural error. This may result in biased predictions when models are used to forecast aquifer response to new forcing. In this study, we extend a fully Bayesian method [Xu and Valocchi, 2015] to calibrate a real-world, regional groundwater flow model. The method uses a data-driven error model to describe model structural error and jointly infers model parameters and structural error. Here, Bayesian inference is facilitated using high performance computing and fast surrogate models. The surrogate models are constructed using machine learning techniques to emulate the response simulated by the computationally expensive groundwater model. We demonstrate in the real-world case study that explicitly accounting for model structural error yields parameter posterior distributions that are substantially different from those derived by classical Bayesian calibration that does not account for model structural error. In addition, the Bayesian method with the error model gives significantly more accurate predictions along with reasonable credible intervals.
Daee, Pedram; Mirian, Maryam S; Ahmadabadi, Majid Nili
2014-01-01
In a multisensory task, human adults integrate information from different sensory modalities--behaviorally in an optimal Bayesian fashion--while children mostly rely on a single sensory modality for decision making. The reason behind this change of behavior over age, and the process behind learning the required statistics for optimal integration, are still unclear and have not been explained by conventional Bayesian modeling. We propose an interactive multisensory learning framework without making any prior assumptions about the sensory models. In this framework, learning in every modality and in their joint space is done in parallel using a single-step reinforcement learning method. A simple statistical test on confidence intervals for the mean of the reward distributions is used to select the most informative source of information among the individual modalities and the joint space. Analyses of the method and simulation results on a multimodal localization task show that the learning system autonomously starts with sensory selection and gradually switches to sensory integration. This is because relying on individual modalities (i.e., selection) at early learning steps (childhood) is more rewarding than favoring decisions learned in the joint space, since the smaller state space of each modality results in faster learning. In contrast, after gaining sufficient experience (adulthood), the quality of learning in the joint space matures, while learning in the individual modalities suffers from insufficient accuracy due to perceptual aliasing. This results in a tighter confidence interval for the joint space and consequently causes a smooth shift from selection to integration. It suggests that sensory selection and integration are emergent behaviors and both are outputs of a single reward-maximization process, i.e., the transition is not a preprogrammed phenomenon.
The Bayesian Revolution Approaches Psychological Development
ERIC Educational Resources Information Center
Shultz, Thomas R.
2007-01-01
This commentary reviews five articles that apply Bayesian ideas to psychological development, some with psychology experiments, some with computational modeling, and some with both experiments and modeling. The reviewed work extends the current Bayesian revolution into tasks often studied in children, such as causal learning and word learning, and…
Entropic criterion for model selection
NASA Astrophysics Data System (ADS)
Tseng, Chih-Yuan
2006-10-01
Model or variable selection is usually achieved by ranking models in increasing order of preference. One such method applies the Kullback-Leibler distance, or relative entropy, as a selection criterion. Yet this raises two questions: why use this criterion, and are there other criteria? Moreover, conventional approaches require a reference prior, which is usually difficult to obtain. Following the logic of inductive inference proposed by Caticha [Relative entropy and inductive inference, in: G. Erickson, Y. Zhai (Eds.), Bayesian Inference and Maximum Entropy Methods in Science and Engineering, AIP Conference Proceedings, vol. 707, 2004 (available from arXiv.org/abs/physics/0311093)], we show relative entropy to be a unique criterion, which requires no prior information and can be applied to different fields. We examine this criterion on a physical problem, simple fluids, and the results are promising.
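To make the ranking idea concrete, here is a minimal sketch (the toy distributions are our own, not from the paper) that ranks two candidate models by their relative entropy to an observed distribution; the smaller D_KL(p‖q), the more preferred the model:

```python
import numpy as np

def kl_divergence(p, q):
    # D_KL(p || q) = sum_i p_i * log(p_i / q_i); assumes q_i > 0 wherever p_i > 0
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p_data = np.array([0.1, 0.4, 0.4, 0.1])             # observed distribution
models = {
    "uniform": np.array([0.25, 0.25, 0.25, 0.25]),
    "peaked": np.array([0.15, 0.35, 0.35, 0.15]),   # closer to the data
}
# Rank models by increasing relative entropy (most preferred first)
ranking = sorted(models, key=lambda m: kl_divergence(p_data, models[m]))
```

Relative entropy is zero only when the model matches the data distribution exactly, which is what makes it usable as an ordering.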
Bayesian Optimization for Neuroimaging Pre-processing in Brain Age Classification and Prediction
Lancaster, Jenessa; Lorenz, Romy; Leech, Rob; Cole, James H.
2018-01-01
Neuroimaging-based age prediction using machine learning is proposed as a biomarker of brain aging, relating to cognitive performance, health outcomes and progression of neurodegenerative disease. However, even leading age-prediction algorithms contain measurement error, motivating efforts to improve experimental pipelines. T1-weighted MRI is commonly used for age prediction, and the pre-processing of these scans involves normalization to a common template and resampling to a common voxel size, followed by spatial smoothing. Resampling parameters are often selected arbitrarily. Here, we sought to improve brain-age prediction accuracy by optimizing resampling parameters using Bayesian optimization. Using data on N = 2003 healthy individuals (aged 16–90 years) we trained support vector machines to (i) distinguish between young (<22 years) and old (>50 years) brains (classification) and (ii) predict chronological age (regression). We also evaluated generalisability of the age-regression model to an independent dataset (CamCAN, N = 648, aged 18–88 years). Bayesian optimization was used to identify optimal voxel size and smoothing kernel size for each task. This procedure adaptively samples the parameter space to evaluate accuracy across a range of possible parameters, using independent sub-samples to iteratively assess different parameter combinations to arrive at optimal values. When distinguishing between young and old brains a classification accuracy of 88.1% was achieved (optimal voxel size = 11.5 mm³, smoothing kernel = 2.3 mm). For predicting chronological age, a mean absolute error (MAE) of 5.08 years was achieved (optimal voxel size = 3.73 mm³, smoothing kernel = 3.68 mm). This was compared to performance using default values of 1.5 mm³ and 4 mm, respectively, resulting in MAE = 5.48 years, though this 7.3% improvement was not statistically significant.
When assessing generalisability, best performance was achieved when applying the entire Bayesian optimization framework to the new dataset, out-performing the parameters optimized for the initial training dataset. Our study outlines the proof-of-principle that neuroimaging models for brain-age prediction can use Bayesian optimization to derive case-specific pre-processing parameters. Our results suggest that different pre-processing parameters are selected when optimization is conducted in specific contexts. This potentially motivates use of optimization techniques at many different points during the experimental process, which may improve statistical sensitivity and reduce opportunities for experimenter-led bias.
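The adaptive sampling loop described above can be sketched in a few lines: fit a Gaussian-process surrogate to the evaluations so far, then evaluate next wherever expected improvement is highest. Everything below is our own illustrative choice, not the authors' pipeline: the one-dimensional objective stands in for "prediction error as a function of a resampling parameter", and the kernel length-scale and grid are arbitrary:

```python
import numpy as np
from math import erf

def rbf(a, b, ls=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, jitter=1e-6):
    # Zero-mean GP regression on centred targets
    ybar = y.mean()
    K = rbf(X, X) + jitter * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ (y - ybar) + ybar
    var = 1.0 - np.sum(Ks * (Kinv @ Ks), axis=0)
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, best):
    # EI for minimization: expected amount by which we beat the current best
    sd = np.sqrt(var)
    z = (best - mu) / sd
    cdf = 0.5 * (1.0 + np.array([erf(v / np.sqrt(2)) for v in z]))
    pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return (best - mu) * cdf + sd * pdf

def objective(v):
    # Hypothetical stand-in for "age-prediction error vs. voxel size"
    return (v - 3.7) ** 2 + 0.1

grid = np.linspace(1.0, 8.0, 141)
X = np.array([1.5, 6.0])                 # two initial evaluations
y = objective(X)
for _ in range(10):
    mu, var = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, var, y.min()))]
    X, y = np.append(X, x_next), np.append(y, objective(x_next))

best_x = X[np.argmin(y)]
```

The loop trades off exploration (high surrogate uncertainty) against exploitation (low predicted error), which is what lets Bayesian optimization tune expensive pipelines in few evaluations.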
Computational Neuropsychology and Bayesian Inference.
Parr, Thomas; Rees, Geraint; Friston, Karl J
2018-01-01
Computational theories of brain function have become very influential in neuroscience. They have facilitated the growth of formal approaches to disease, particularly in psychiatric research. In this paper, we provide a narrative review of the body of computational research addressing neuropsychological syndromes, and focus on those that employ Bayesian frameworks. Bayesian approaches to understanding brain function formulate perception and action as inferential processes. These inferences combine 'prior' beliefs with a generative (predictive) model to explain the causes of sensations. Under this view, neuropsychological deficits can be thought of as false inferences that arise due to aberrant prior beliefs (that are poor fits to the real world). This draws upon the notion of a Bayes optimal pathology - optimal inference with suboptimal priors - and provides a means for computational phenotyping. In principle, any given neuropsychological disorder could be characterized by the set of prior beliefs that would make a patient's behavior appear Bayes optimal. We start with an overview of some key theoretical constructs and use these to motivate a form of computational neuropsychology that relates anatomical structures in the brain to the computations they perform. Throughout, we draw upon computational accounts of neuropsychological syndromes. These are selected to emphasize the key features of a Bayesian approach, and the possible types of pathological prior that may be present. They range from visual neglect through hallucinations to autism. Through these illustrative examples, we review the use of Bayesian approaches to understand the link between biology and computation that is at the heart of neuropsychology.
Incorporating approximation error in surrogate based Bayesian inversion
NASA Astrophysics Data System (ADS)
Zhang, J.; Zeng, L.; Li, W.; Wu, L.
2015-12-01
There is increasing interest in applying surrogates in inverse Bayesian modeling to reduce repetitive evaluations of the original model and thereby save computational cost. However, the approximation error of the surrogate model is usually overlooked, partly because it is difficult to evaluate for many surrogates. Previous studies have shown that the direct combination of surrogates and Bayesian methods (e.g., Markov chain Monte Carlo, MCMC) may lead to biased estimates when the surrogate cannot emulate a highly nonlinear original system. This problem can be alleviated by implementing MCMC in a two-stage manner, but the computational cost remains high since a relatively large number of original model simulations is still required. In this study, we illustrate the importance of incorporating approximation error in inverse Bayesian modeling. A Gaussian process (GP) is chosen to construct the surrogate because its approximation error is convenient to evaluate. Numerical cases of Bayesian experimental design and parameter estimation for contaminant source identification are used to illustrate this idea. It is shown that, once the surrogate approximation error is properly incorporated into the Bayesian framework, promising results can be obtained even when the surrogate is used directly, with no further original model simulations required.
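A toy version of this idea (a one-parameter example of our own, not the study's contaminant-transport model): build a GP surrogate from a handful of runs of the "expensive" model, then add the GP's predictive variance to the data-noise variance in the likelihood. Including the approximation error widens the posterior where the surrogate is unreliable:

```python
import numpy as np

def rbf(a, b, ls=0.7):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def true_model(theta):          # stand-in for an expensive simulator
    return np.sin(theta)

# Train the GP surrogate on only four "expensive" runs
theta_train = np.array([0.0, 1.0, 2.5, 4.0])
y_train = true_model(theta_train)

grid = np.linspace(0.0, 4.0, 401)                # parameter grid for the posterior
K = rbf(theta_train, theta_train) + 1e-8 * np.eye(len(theta_train))
Ks = rbf(theta_train, grid)
Kinv = np.linalg.inv(K)
mu = Ks.T @ Kinv @ y_train                       # surrogate prediction
s2 = np.maximum(1.0 - np.sum(Ks * (Kinv @ Ks), axis=0), 0.0)  # GP approximation error

obs = true_model(1.8)                            # one observation of the true model
sigma2 = 0.1 ** 2                                # assumed measurement-noise variance

def grid_posterior(total_var):
    # Gaussian likelihood on a grid with a flat prior
    logp = -0.5 * (obs - mu) ** 2 / total_var - 0.5 * np.log(total_var)
    p = np.exp(logp - logp.max())
    return p / p.sum()

p_naive = grid_posterior(np.full_like(grid, sigma2))  # surrogate error ignored
p_full = grid_posterior(sigma2 + s2)                  # surrogate error included

def post_var(p):
    m = (grid * p).sum()
    return ((grid - m) ** 2 * p).sum()
```

Ignoring `s2` makes the posterior overconfident around wherever the surrogate happens to match the observation, which is exactly the bias the study warns against.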
Rodgers, Joseph Lee
2016-01-01
The Bayesian-frequentist debate typically portrays these statistical perspectives as opposing views. However, both Bayesian and frequentist statisticians have expanded their epistemological basis away from a singular focus on the null hypothesis, to a broader perspective involving the development and comparison of competing statistical/mathematical models. For frequentists, statistical developments such as structural equation modeling and multilevel modeling have facilitated this transition. For Bayesians, the Bayes factor has facilitated this transition. The Bayes factor is treated in articles within this issue of Multivariate Behavioral Research. The current presentation provides brief commentary on those articles and more extended discussion of the transition toward a modern modeling epistemology. In certain respects, Bayesians and frequentists share common goals.
A Bayesian Model of the Memory Colour Effect.
Witzel, Christoph; Olkkonen, Maria; Gegenfurtner, Karl R
2018-01-01
According to the memory colour effect, the colour of a colour-diagnostic object is not perceived independently of the object itself. Instead, it has been shown through an achromatic adjustment method that colour-diagnostic objects still appear slightly in their typical colour, even when they are colourimetrically grey. Bayesian models provide a promising approach to capture the effect of prior knowledge on colour perception and to link these effects to more general effects of cue integration. Here, we model memory colour effects using prior knowledge about typical colours as priors for the grey adjustments in a Bayesian model. This simple model does not involve any fitting of free parameters. The Bayesian model roughly captured the magnitude of the measured memory colour effect for photographs of objects. To some extent, the model predicted observed differences in memory colour effects across objects. The model could not account for the differences in memory colour effects across different levels of realism in the object images. The Bayesian model provides a particularly simple account of memory colour effects, capturing some of the multiple sources of variation of these effects.
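The core computation in such a model is Gaussian cue combination: a broad prior over the object's typical colour is multiplied with the likelihood from the current sensory input, giving a precision-weighted posterior. The numbers below are purely illustrative, not the paper's fitted values:

```python
def combine_gaussians(prior_mu, prior_var, like_mu, like_var):
    # Product of two Gaussians: precisions add, means are precision-weighted
    w_prior, w_like = 1.0 / prior_var, 1.0 / like_var
    post_var = 1.0 / (w_prior + w_like)
    post_mu = post_var * (w_prior * prior_mu + w_like * like_mu)
    return post_mu, post_var

# A colour-diagnostic object: broad prior at its typical chromaticity,
# sharp likelihood at the physically grey stimulus (arbitrary units)
typical_colour, grey = 20.0, 0.0
post_mu, post_var = combine_gaussians(typical_colour, 100.0, grey, 4.0)
# post_mu lies slightly toward the typical colour: the memory colour effect
```

Because the sensory likelihood is much more precise than the prior, the shift is small, matching the "slightly in their typical colour" phrasing of the abstract.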
Inference of reactive transport model parameters using a Bayesian multivariate approach
NASA Astrophysics Data System (ADS)
Carniato, Luca; Schoups, Gerrit; van de Giesen, Nick
2014-08-01
Parameter estimation of subsurface transport models from multispecies data requires the definition of an objective function that includes different types of measurements. Common approaches are weighted least squares (WLS), where weights are specified a priori for each measurement, and weighted least squares with weight estimation (WLS(we)), where weights are estimated from the data together with the parameters. In this study, we formulate the parameter estimation task as a multivariate Bayesian inference problem. The WLS and WLS(we) methods are special cases in this framework, corresponding to specific prior assumptions about the residual covariance matrix. The Bayesian perspective allows for generalizations to cases where residual correlation is important and for efficient inference by analytically integrating out the variances (weights) and selected covariances from the joint posterior. Specifically, the WLS and WLS(we) methods are compared to a multivariate (MV) approach that accounts for specific residual correlations without the need for explicit estimation of the error parameters. When applied to inference of reactive transport model parameters from column-scale data on dissolved species concentrations, the following results were obtained: (1) accounting for residual correlation between species provides more accurate parameter estimation for high residual correlation levels, whereas its influence on predictive uncertainty is negligible, (2) integrating out the (co)variances leads to an efficient estimation of the full joint posterior with a reduced computational effort compared to the WLS(we) method, and (3) in the presence of model structural errors, none of the methods is able to identify the correct parameter values.
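The difference between fixing weights a priori (WLS) and estimating them from the data (WLS(we)) can be sketched with a toy two-"species" regression. The data, shared-slope model, and the simple iterate-until-stable scheme below are our own illustration, not the paper's reactive-transport setup:

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0.1, 10.0, 50)
y1 = 2.0 * x + rng.normal(0, 0.5, x.size)   # precisely measured species
y2 = 2.0 * x + rng.normal(0, 3.0, x.size)   # noisy species

def wls_slope(w1, w2):
    # Weighted least squares for a shared slope through the origin
    X, Y = np.concatenate([x, x]), np.concatenate([y1, y2])
    W = np.concatenate([w1, w2])
    return np.sum(W * X * Y) / np.sum(W * X * X)

# WLS(we): alternate between the slope and per-species residual variances
slope = wls_slope(np.ones(x.size), np.ones(x.size))  # start from equal weights
for _ in range(10):
    v1 = np.mean((y1 - slope * x) ** 2)
    v2 = np.mean((y2 - slope * x) ** 2)
    slope = wls_slope(np.full(x.size, 1.0 / v1), np.full(x.size, 1.0 / v2))
```

The estimated variances down-weight the noisy species automatically; the Bayesian MV approach of the paper goes further by integrating such (co)variances out analytically rather than iterating on point estimates.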
An efficient Bayesian data-worth analysis using a multilevel Monte Carlo method
NASA Astrophysics Data System (ADS)
Lu, Dan; Ricciuto, Daniel; Evans, Katherine
2018-03-01
Improving the understanding of subsurface systems and thus reducing prediction uncertainty requires collection of data. As the collection of subsurface data is costly, it is important that the data collection scheme is cost-effective. Design of a cost-effective data collection scheme, i.e., data-worth analysis, requires quantifying model parameter, prediction, and both current and potential data uncertainties. Assessment of these uncertainties in large-scale stochastic subsurface hydrological model simulations using standard Monte Carlo (MC) sampling or surrogate modeling is extremely computationally intensive, sometimes even infeasible. In this work, we propose an efficient Bayesian data-worth analysis using a multilevel Monte Carlo (MLMC) method. Compared to the standard MC, which requires a significantly large number of high-fidelity model executions to achieve a prescribed accuracy in estimating expectations, the MLMC can substantially reduce computational costs using multifidelity approximations. Since the Bayesian data-worth analysis involves a great deal of expectation estimation, the cost savings from the MLMC can be substantial. While the proposed MLMC-based data-worth analysis is broadly applicable, we use it for a highly heterogeneous two-phase subsurface flow simulation to select an optimal candidate data set that gives the largest uncertainty reduction in predicting mass flow rates at four production wells. The choices made by the MLMC estimation are validated by the actual measurements of the potential data and are consistent with the standard MC estimation, while the MLMC greatly reduces the computational costs.
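The telescoping idea behind MLMC can be shown with a two-level toy problem (the stand-in functions are ours, not the subsurface flow model): estimate an expectation using many cheap low-fidelity samples plus a few coupled corrections from the high-fidelity model:

```python
import numpy as np

rng = np.random.default_rng(42)

def fine(x):                 # high-fidelity model (expensive in practice)
    return np.sin(x)

def coarse(x):               # low-fidelity approximation (cheap)
    return x - x ** 3 / 6

def mlmc_estimate(n_coarse, n_fine):
    # E[fine] = E[coarse] + E[fine - coarse], estimated level by level
    x0 = rng.normal(0.0, 1.0, n_coarse)
    level0 = coarse(x0).mean()                    # many cheap samples
    x1 = rng.normal(0.0, 1.0, n_fine)
    correction = (fine(x1) - coarse(x1)).mean()   # few coupled corrections
    return level0 + correction

est = mlmc_estimate(n_coarse=200_000, n_fine=2_000)
# For X ~ N(0, 1), E[sin(X)] = 0 by symmetry, so est should be near zero
```

Because `fine - coarse` has much smaller variance than `fine` itself, only a small number of expensive evaluations is needed at the top level, which is where the cost savings come from.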
Bayesian dynamic modeling of time series of dengue disease case counts.
Martínez-Bello, Daniel Adyro; López-Quílez, Antonio; Torres-Prieto, Alexander
2017-07-01
The aim of this study is to model the association between weekly time series of dengue case counts and meteorological variables in a high-incidence city of Colombia, applying Bayesian hierarchical dynamic generalized linear models over the period January 2008 to August 2015. Additionally, we evaluate the models' short-term performance for predicting dengue cases. The methodology employs dynamic Poisson log-link models including constant or time-varying coefficients for the meteorological variables. Calendar effects were modeled using constant or first- or second-order random walk time-varying coefficients. The meteorological variables were modeled using constant coefficients and first-order random walk time-varying coefficients. We applied Markov chain Monte Carlo simulation for parameter estimation and the deviance information criterion (DIC) for model selection. We assessed the short-term predictive performance of the selected final model at several time points within the study period using the mean absolute percentage error. The best model included first-order random walk time-varying coefficients for both the calendar trend and the meteorological variables. Beyond the computational challenges, interpreting the results requires a complete analysis of the dengue time series with respect to the parameter estimates of the meteorological effects. We found small mean absolute percentage errors for one- or two-week out-of-sample predictions at most prediction points, associated with low-volatility periods in the dengue counts. We discuss the advantages and limitations of the dynamic Poisson models for studying the association between time series of dengue disease and meteorological variables.
The key conclusion of the study is that dynamic Poisson models account for the dynamic nature of the variables involved in the modeling of time series of dengue disease, producing useful models for decision-making in public health.
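The DIC used for model selection above can be computed directly from posterior samples: the effective number of parameters p_D is the posterior mean deviance minus the deviance at the posterior mean. The sketch below uses a conjugate Poisson-Gamma toy model with simulated weekly counts (our own stand-in for the dengue series, not the study's hierarchical model):

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(1)
counts = rng.poisson(12.0, size=52)      # simulated weekly case counts

def deviance(lam, y):
    # -2 * Poisson log-likelihood at rate lam
    log_fact = sum(lgamma(v + 1.0) for v in y)
    return -2.0 * (np.sum(y * np.log(lam) - lam) - log_fact)

# Conjugate Gamma posterior for the rate: Gamma(a0 + sum(y), b0 + n)
a0, b0 = 1.0, 0.01
a, b = a0 + counts.sum(), b0 + len(counts)
lam_samples = rng.gamma(a, 1.0 / b, size=4000)

d_bar = np.mean([deviance(l, counts) for l in lam_samples])  # posterior mean deviance
d_hat = deviance(lam_samples.mean(), counts)                 # deviance at posterior mean
p_d = d_bar - d_hat                                          # effective parameter count
dic = d_bar + p_d
```

For this one-parameter model, p_D should come out close to 1; in the dengue models above, time-varying coefficients inflate p_D, and the DIC penalizes that added flexibility when ranking candidate models.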
Smith, Wade P; Doctor, Jason; Meyer, Jürgen; Kalet, Ira J; Phillips, Mark H
2009-06-01
The prognosis of cancer patients treated with intensity-modulated radiation therapy (IMRT) is inherently uncertain, depends on many decision variables, and requires that a physician balance competing objectives: maximum tumor control with minimal treatment complications. In order to better deal with the complex, multi-objective nature of the problem, we have combined a prognostic probabilistic model with multi-attribute decision theory, which incorporates patient preferences for outcomes. The response to IMRT for prostate cancer was modeled. A Bayesian network was used for prognosis for each treatment plan. Prognoses included predicting local tumor control, regional spread, distant metastases, and normal tissue complications resulting from treatment. A Markov model was constructed and used to calculate a quality-adjusted life expectancy, which aids in the multi-attribute decision process. Our method makes explicit the tradeoffs patients face between quality and quantity of life. This approach has advantages over current approaches because, with our approach, risks of health outcomes and patient preferences determine treatment decisions.
Maximum likelihood-based analysis of single-molecule photon arrival trajectories
NASA Astrophysics Data System (ADS)
Hajdziona, Marta; Molski, Andrzej
2011-02-01
In this work we explore the statistical properties of the maximum likelihood-based analysis of one-color photon arrival trajectories. This approach does not involve binning and, therefore, all of the information contained in an observed photon trajectory is used. We study the accuracy and precision of parameter estimates and the efficiency of the Akaike information criterion and the Bayesian information criterion (BIC) in selecting the true kinetic model. We focus on the low excitation regime where photon trajectories can be modeled as realizations of Markov modulated Poisson processes. The number of observed photons is the key parameter in determining model selection and parameter estimation. For example, the BIC can select the true three-state model from competing two-, three-, and four-state kinetic models even for relatively short trajectories made up of 2 × 10³ photons. When the intensity levels are well-separated and 10⁴ photons are observed, the two-state model parameters can be estimated with about 10% precision and those for a three-state model with about 20% precision.
Using Bayesian Networks to Improve Knowledge Assessment
ERIC Educational Resources Information Center
Millan, Eva; Descalco, Luis; Castillo, Gladys; Oliveira, Paula; Diogo, Sandra
2013-01-01
In this paper, we describe the integration and evaluation of an existing generic Bayesian student model (GBSM) into an existing computerized testing system within the Mathematics Education Project (PmatE--Projecto Matematica Ensino) of the University of Aveiro. This generic Bayesian student model had been previously evaluated with simulated…
Bayesian Posterior Odds Ratios: Statistical Tools for Collaborative Evaluations
ERIC Educational Resources Information Center
Hicks, Tyler; Rodríguez-Campos, Liliana; Choi, Jeong Hoon
2018-01-01
To begin statistical analysis, Bayesians quantify their confidence in modeling hypotheses with priors. A prior describes the probability of a certain modeling hypothesis apart from the data. Bayesians should be able to defend their choice of prior to a skeptical audience. Collaboration between evaluators and stakeholders could make their choices…
BCM: toolkit for Bayesian analysis of Computational Models using samplers.
Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A
2016-10-21
Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics, however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.
Bayesian inference based on stationary Fokker-Planck sampling.
Berrones, Arturo
2010-06-01
A novel formalism for Bayesian learning in the context of complex inference models is proposed. The method is based on the use of the stationary Fokker-Planck (SFP) approach to sample from the posterior density. Stationary Fokker-Planck sampling generalizes the Gibbs sampler algorithm to arbitrary and unknown conditional densities. By the SFP procedure, approximate analytical expressions for the conditionals and marginals of the posterior can be constructed. At each stage of SFP, the approximate conditionals are used to define a Gibbs sampling process, which is convergent to the full joint posterior. Using the analytical marginals, efficient learning methods in the context of artificial neural networks are outlined. Offline and incremental Bayesian inference and maximum likelihood estimation from the posterior are performed in classification and regression examples. A comparison of SFP with other Monte Carlo strategies in the general problem of sampling from arbitrary densities is also presented. It is shown that SFP is able to jump large low-probability regions without the need of a careful tuning of any step-size parameter. In fact, the SFP method requires only a small set of meaningful parameters that can be selected following clear, problem-independent guidelines. The computational cost of SFP, measured in terms of loss function evaluations, grows linearly with the given model's dimension.
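The core idea being generalized, defining a sampling process from full conditionals, can be illustrated with the textbook case where the conditionals are known exactly: a bivariate standard Gaussian with correlation ρ. This is a generic Gibbs-sampling sketch, not the SFP machinery itself.

```python
import math
import random

random.seed(1)

# Bivariate standard normal with correlation rho: the two full conditionals
# are x | y ~ N(rho*y, 1 - rho^2) and y | x ~ N(rho*x, 1 - rho^2).
rho = 0.8
sd = math.sqrt(1.0 - rho * rho)

x, y = 0.0, 0.0
xs, ys = [], []
for it in range(20000):
    x = random.gauss(rho * y, sd)   # draw from p(x | y)
    y = random.gauss(rho * x, sd)   # draw from p(y | x)
    if it >= 2000:                  # discard burn-in
        xs.append(x)
        ys.append(y)

# sample moments should match the target: zero means, correlation close to rho
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
vx = sum((a - mx) ** 2 for a in xs) / len(xs)
vy = sum((b - my) ** 2 for b in ys) / len(ys)
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
corr = cov / math.sqrt(vx * vy)
```

SFP's contribution, per the abstract, is constructing approximate conditionals when no closed form like the one above is available.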
Two Approaches to Calibration in Metrology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Campanelli, Mark
2014-04-01
Inferring mathematical relationships with quantified uncertainty from measurement data is common to computational science and metrology. Sufficient knowledge of measurement process noise enables Bayesian inference. Otherwise, an alternative approach is required, here termed compartmentalized inference, because collection of uncertain data and model inference occur independently. Bayesian parameterized model inference is compared to a Bayesian-compatible compartmentalized approach for ISO-GUM compliant calibration problems in renewable energy metrology. In either approach, model evidence can help reduce model discrepancy.
Bayesian data analysis in population ecology: motivations, methods, and benefits
Dorazio, Robert
2016-01-01
During the 20th century ecologists largely relied on the frequentist system of inference for the analysis of their data. However, in the past few decades ecologists have become increasingly interested in the use of Bayesian methods of data analysis. In this article I provide guidance to ecologists who would like to decide whether Bayesian methods can be used to improve their conclusions and predictions. I begin by providing a concise summary of Bayesian methods of analysis, including a comparison of differences between Bayesian and frequentist approaches to inference when using hierarchical models. Next I provide a list of problems where Bayesian methods of analysis may arguably be preferred over frequentist methods. These problems are usually encountered in analyses based on hierarchical models of data. I describe the essentials required for applying modern methods of Bayesian computation, and I use real-world examples to illustrate these methods. I conclude by summarizing what I perceive to be the main strengths and weaknesses of using Bayesian methods to solve ecological inference problems.
[Evaluation of the estimation of prevalence ratio using a Bayesian log-binomial regression model].
Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L
2017-03-10
To evaluate the estimation of the prevalence ratio (PR) using a Bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence relative to caregivers' recognition of risk signs of diarrhea in their infants, using a Bayesian log-binomial regression model in OpenBUGS software. The results showed that caregivers' recognition of an infant's risk signs of diarrhea was significantly associated with a 13% increase in medical care-seeking. We also compared the point and interval estimates of this PR, and the convergence of three models (model 1: not adjusting for covariates; model 2: adjusting for duration of caregivers' education; model 3: additionally adjusting for distance between village and township and child age in months), between the Bayesian log-binomial regression model and the conventional log-binomial regression model. All three Bayesian log-binomial regression models converged, with estimated PRs of 1.130 (95% CI: 1.005-1.265), 1.128 (95% CI: 1.001-1.264) and 1.132 (95% CI: 1.004-1.267), respectively. Conventional log-binomial regression models 1 and 2 converged, with PRs of 1.130 (95% CI: 1.055-1.206) and 1.126 (95% CI: 1.051-1.203), respectively, but model 3 failed to converge, so the COPY method was used to estimate its PR, which was 1.125 (95% CI: 1.051-1.200). The point and interval estimates of PR from the three Bayesian log-binomial regression models differed only slightly from those of the conventional log-binomial regression models, showing good consistency. Therefore, the Bayesian log-binomial regression model can effectively estimate PR with fewer convergence failures and offers practical advantages over the conventional log-binomial regression model.
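A Bayesian log-binomial model can be sketched without OpenBUGS using a hand-rolled random-walk Metropolis sampler. Everything below is a simplified, hypothetical analogue of the care-seeking analysis (one binary covariate, flat priors, simulated data with a true PR of 1.13), not the paper's model; note how the log link requires keeping all fitted risks below 1, which the sampler enforces by rejecting out-of-constraint proposals.

```python
import math
import random

random.seed(2)

# Hypothetical analogue: x = 1 if the caregiver recognizes risk signs;
# the true prevalence ratio exp(b1) is set to 1.13.
n = 1000
true_b0, true_b1 = math.log(0.5), math.log(1.13)
xs = [i % 2 for i in range(n)]
ys = [1 if random.random() < math.exp(true_b0 + true_b1 * x) else 0 for x in xs]

def loglik(b0, b1):
    # log-binomial model: P(y = 1 | x) = exp(b0 + b1 * x)
    ll = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        ll += eta if y else math.log(1.0 - math.exp(eta))
    return ll

# random-walk Metropolis with flat priors; proposals that would push any
# fitted risk to 1 or above are rejected outright
b0, b1 = math.log(sum(ys) / n), 0.0
cur = loglik(b0, b1)
draws = []
for it in range(4000):
    c0 = b0 + random.gauss(0.0, 0.05)
    c1 = b1 + random.gauss(0.0, 0.05)
    if c0 < 0.0 and c0 + c1 < 0.0:
        cand = loglik(c0, c1)
        if math.log(random.random()) < cand - cur:
            b0, b1, cur = c0, c1, cand
    if it >= 1000:
        draws.append(b1)

pr_hat = math.exp(sum(draws) / len(draws))  # posterior-mean prevalence ratio
```

The boundary constraint is exactly what makes conventional maximum-likelihood log-binomial fits prone to the convergence failures the abstract describes; the sampler sidesteps it by construction.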
Environmental Monitoring for Situation Assessment using Mobile and Fixed Sensors
NASA Technical Reports Server (NTRS)
Fikes, Richard
2004-01-01
This project was co-led by Dr. Sheila McIlraith and Prof. Richard Fikes. Substantial research results and published papers describing those results were produced in multiple technology areas, including the following: 1) Monitoring a Complex Physical System using a Hybrid Dynamic Bayes Net; 2) A Formal Theory of Testing for Dynamical Systems; 3) Diagnosing Hybrid Systems Using a Bayesian Model Selection Approach.
NASA Astrophysics Data System (ADS)
Wöhling, T.; Schöniger, A.; Geiges, A.; Nowak, W.; Gayler, S.
2013-12-01
The objective selection of appropriate models for realistic simulations of coupled soil-plant processes is a challenging task since the processes are complex, not fully understood at larger scales, and highly non-linear. Also, comprehensive data sets are scarce, and measurements are uncertain. In the past decades, a variety of different models have been developed that exhibit a wide range of complexity regarding their approximation of processes in the coupled model compartments. We present a method for evaluating experimental design for maximum confidence in the model selection task. The method considers uncertainty in parameters, measurements and model structures. Advancing the ideas behind Bayesian Model Averaging (BMA), we analyze the changes in posterior model weights and posterior model choice uncertainty when more data are made available. This allows assessing the power of different data types, data densities and data locations in identifying the best model structure from among a suite of plausible models. The models considered in this study are the crop models CERES, SUCROS, GECROS and SPASS, which are coupled to identical routines for simulating soil processes within the modelling framework Expert-N. The four models considerably differ in the degree of detail at which crop growth and root water uptake are represented. Monte-Carlo simulations were conducted for each of these models considering their uncertainty in soil hydraulic properties and selected crop model parameters. Using a Bootstrap Filter (BF), the models were then conditioned on field measurements of soil moisture, matric potential, leaf-area index, and evapotranspiration rates (from eddy-covariance measurements) during a vegetation period of winter wheat at a field site at the Swabian Alb in Southwestern Germany. Following our new method, we derived model weights when using all data or different subsets thereof. 
We discuss to what degree the posterior mean outperforms the prior mean and all individual posterior models, how informative the different data types were for reducing the prediction uncertainty of evapotranspiration and deep drainage, and how well the model structure can be identified based on the different data types and subsets. We further analyze the impact of measurement uncertainty and systematic model errors on the effective sample size of the BF and the resulting model weights.
Tau-REx: A new look at the retrieval of exoplanetary atmospheres
NASA Astrophysics Data System (ADS)
Waldmann, Ingo
2014-11-01
The field of exoplanetary spectroscopy is as fast moving as it is new. With an increasing number of space- and ground-based instruments obtaining data on a large set of extrasolar planets, we are indeed entering the era of exoplanetary characterisation. Permanently at the edge of instrument feasibility, it is as important as it is difficult to find optimal and objective methodologies for analysing and interpreting current data. This is particularly true for smaller and fainter Earth- and super-Earth-type planets. For low to mid signal-to-noise (SNR) observations, we are prone to two sources of bias: 1) prior selection in the data reduction and analysis; 2) prior constraints on the spectral retrieval. In Waldmann et al. (2013), Morello et al. (2014) and Waldmann (2012, 2014) we have shown a prior-free approach to data analysis based on non-parametric machine learning techniques. Following these approaches, we will present a new take on the spectral retrieval of extrasolar planets. Tau-REx (tau retrieval of exoplanets) is a new line-by-line atmospheric retrieval framework. In the past, the decision on which opacity sources go into an atmospheric model was usually user defined. Manual input can lead to model biases and poor convergence of the atmospheric model to the data. In Tau-REx we have set out to solve this. Through custom-built pattern recognition software, Tau-REx is able to rapidly identify the most likely atmospheric opacities from a large number of possible absorbers/emitters (the ExoMol or HITRAN databases) and non-parametrically constrain the prior space for the Bayesian retrieval. Unlike other (MCMC-based) techniques, Tau-REx is able to fully integrate high-dimensional log-likelihood spaces and to calculate the full Bayesian evidence of the atmospheric models. We achieve this through a combination of nested sampling and a high degree of code parallelisation.
This allows for exact and unbiased Bayesian model selection and a full mapping of potential model-data degeneracies. Together with non-parametric de-trending of exoplanetary spectra, we can reach an unprecedented level of objectivity in our atmospheric characterisation of these foreign worlds.
NASA Astrophysics Data System (ADS)
Smith, J. P.; Owens, P. N.; Gaspar, L.; Lobb, D. A.; Petticrew, E. L.
2015-12-01
An understanding of sediment redistribution processes and the main sediment sources within a watershed is needed to support watershed management strategies. The fingerprinting technique is increasingly being recognized as a method for establishing the source of the sediment transported within watersheds. However, the different behaviour of the various fingerprinting properties has been recognized as a major limitation of the technique, and the uncertainty associated with tracer selection needs to be addressed. There are also questions associated with which modelling approach (frequentist or Bayesian) is the best to unmix complex environmental mixtures, such as river sediment. This study aims to compare and evaluate the differences between fingerprinting predictions provided by a Bayesian unmixing model (MixSIAR) using different groups of tracer properties for use in sediment source identification. We used fallout radionuclides (e.g. 137Cs) and geochemical elements (e.g. As) as conventional fingerprinting properties, and colour parameters as emerging properties; both alone and in combination. These fingerprinting properties have been used (e.g. Koiter et al., 2013; Barthod et al., 2015) to determine the proportional contributions of fine sediment in the South Tobacco Creek Watershed, an agricultural watershed located in Manitoba, Canada. We show that the unmixing model using a combination of fallout radionuclides and geochemical tracers gave similar results to the model based on colour parameters. Furthermore, we show that a model that combines all tracers (i.e. radionuclide/geochemical and colour) gave similar results, showing that sediment sources change from predominantly topsoil in the upper reaches of the watershed to channel bank and bedrock outcrop material in the lower reaches. Barthod LRM et al. (2015). Selecting color-based tracers and classifying sediment sources in the assessment of sediment dynamics using sediment source fingerprinting. J Environ Qual. 
doi:10.2134/jeq2015.01.0043. Koiter AJ et al. (2013). Investigating the role of connectivity and scale in assessing the sources of sediment in an agricultural watershed in the Canadian prairies using sediment source fingerprinting. J Soils Sediments, 13, 1676-1691.
Spatiotemporal multivariate mixture models for Bayesian model selection in disease mapping.
Lawson, A B; Carroll, R; Faes, C; Kirby, R S; Aregay, M; Watjou, K
2017-12-01
It is often the case that researchers wish to simultaneously explore the behavior of and estimate overall risk for multiple, related diseases with varying rarity while accounting for potential spatial and/or temporal correlation. In this paper, we propose a flexible class of multivariate spatio-temporal mixture models to fill this role. Further, these models offer flexibility with the potential for model selection as well as the ability to accommodate lifestyle, socio-economic, and physical environmental variables with spatial, temporal, or both structures. Here, we explore the capability of this approach via a large-scale simulation study and examine a motivating data example involving three cancers in South Carolina. The results, which focus on four model variants, suggest that all models can recover the simulation ground truth and display improved model fit over two baseline Knorr-Held spatio-temporal interaction model variants in a real-data application.
NASA Astrophysics Data System (ADS)
Lowe, Rachel; Bailey, Trevor C.; Stephenson, David B.; Graham, Richard J.; Coelho, Caio A. S.; Sá Carvalho, Marilia; Barcellos, Christovam
2011-03-01
This paper considers the potential for using seasonal climate forecasts in developing an early warning system for dengue fever epidemics in Brazil. In the first instance, a generalised linear model (GLM) is used to select climate and other covariates which are both readily available and prove significant in prediction of confirmed monthly dengue cases based on data collected across the whole of Brazil for the period January 2001 to December 2008 at the microregion level (typically consisting of one large city and several smaller municipalities). The covariates explored include temperature and precipitation data on a 2.5°×2.5° longitude-latitude grid with time lags relevant to dengue transmission, an El Niño Southern Oscillation index and other relevant socio-economic and environmental variables. A negative binomial model formulation is adopted in this model selection to allow for extra-Poisson variation (overdispersion) in the observed dengue counts caused by unknown/unobserved confounding factors and possible correlations in these effects in both time and space. Subsequently, the selected global model is refined in the context of the South East region of Brazil, where dengue predominates, by reverting to a Poisson framework and explicitly modelling the overdispersion through a combination of unstructured and spatio-temporal structured random effects. The resulting spatio-temporal hierarchical model (or GLMM—generalised linear mixed model) is implemented via a Bayesian framework using Markov Chain Monte Carlo (MCMC). Dengue predictions are found to be enhanced both spatially and temporally when using the GLMM and the Bayesian framework allows posterior predictive distributions for dengue cases to be derived, which can be useful for developing a dengue alert system. Using this model, we conclude that seasonal climate forecasts could have potential value in helping to predict dengue incidence months in advance of an epidemic in South East Brazil.
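The reason the dengue model above switches to a negative binomial formulation, extra-Poisson variation, can be demonstrated numerically. The snippet below is illustrative only (a gamma-mixed Poisson stand-in for disease counts, with moment-based parameter estimates, not the paper's fitted GLM): for overdispersed counts, the negative binomial log-likelihood beats the Poisson's at comparable parameter values.

```python
import math
import random

random.seed(5)

def rpois(lam):
    # Knuth's method for Poisson sampling
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

# gamma-mixed Poisson counts: marginally negative binomial, variance > mean
data = [rpois(random.gammavariate(2.0, 2.5)) for _ in range(2000)]
n = len(data)
m = sum(data) / n                                  # sample mean
v = sum((x - m) ** 2 for x in data) / (n - 1)      # sample variance

# Poisson log-likelihood at its MLE (rate = sample mean)
ll_pois = sum(x * math.log(m) - m - math.lgamma(x + 1) for x in data)

# negative binomial via moment matching: size r, success probability q
r = m * m / (v - m)
q = r / (r + m)
ll_nb = sum(math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(q) + x * math.log(1 - q) for x in data)
```

The sample variance runs well above the mean, which a Poisson model (variance = mean) cannot represent; the negative binomial absorbs that overdispersion, just as it absorbs unobserved confounding in the dengue counts.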
A computer program for uncertainty analysis integrating regression and Bayesian methods
Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary
2014-01-01
This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.
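The MCMC Bayesian credible intervals in (3) are, once the sampler has run, simply percentiles of the posterior draws. A minimal self-contained sketch (plain Metropolis on a conjugate toy problem, not UCODE_2014's DREAM implementation) shows the 95% equal-tailed interval recovering the known closed form:

```python
import math
import random

random.seed(3)

# toy data: known sigma, unknown mean mu; a flat prior on mu makes the
# posterior N(xbar, sigma^2 / n), so the interval has a closed form to check
sigma = 2.0
data = [random.gauss(5.0, sigma) for _ in range(100)]
n = len(data)
xbar = sum(data) / n

def logpost(mu):
    # log posterior up to a constant (flat prior, Gaussian likelihood)
    return -n * (mu - xbar) ** 2 / (2.0 * sigma ** 2)

mu, cur = xbar, 0.0
draws = []
for it in range(20000):
    cand = mu + random.gauss(0.0, 0.5)
    lp = logpost(cand)
    if math.log(random.random()) < lp - cur:
        mu, cur = cand, lp
    if it >= 2000:
        draws.append(mu)

# 95% equal-tailed credible interval = 2.5% and 97.5% sample percentiles
draws.sort()
lo = draws[int(0.025 * len(draws))]
hi = draws[int(0.975 * len(draws))]
half = 1.96 * sigma / math.sqrt(n)  # analytic half-width for comparison
```

In this Gaussian toy case the credible and confidence intervals coincide; the point of the MCMC route, as the abstract notes, is that it keeps working when the linearity and Gaussian-error assumptions behind options (1) and (2) fail.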
Dynamic Bayesian network modeling for longitudinal brain morphometry
Chen, Rong; Resnick, Susan M; Davatzikos, Christos; Herskovits, Edward H
2011-01-01
Identifying interactions among brain regions from structural magnetic-resonance images presents one of the major challenges in computational neuroanatomy. We propose a Bayesian data-mining approach to the detection of longitudinal morphological changes in the human brain. Our method uses a dynamic Bayesian network to represent evolving inter-regional dependencies. The major advantage of dynamic Bayesian network modeling is that it can represent complicated interactions among temporal processes. We validated our approach by analyzing a simulated atrophy study, and found that this approach requires only a small number of samples to detect the ground-truth temporal model. We further applied dynamic Bayesian network modeling to a longitudinal study of normal aging and mild cognitive impairment — the Baltimore Longitudinal Study of Aging. We found that interactions among regional volume-change rates for the mild cognitive impairment group are different from those for the normal-aging group. PMID:21963916
Variational learning and bits-back coding: an information-theoretic view to Bayesian learning.
Honkela, Antti; Valpola, Harri
2004-07-01
The bits-back coding, first introduced by Wallace in 1990 and later by Hinton and van Camp in 1993, provides an interesting link between Bayesian learning and information-theoretic minimum-description-length (MDL) learning approaches. Bits-back coding allows interpreting the cost function used in the variational Bayesian method called ensemble learning as a code length, in addition to the Bayesian view of it as the misfit of the posterior approximation and a lower bound on the model evidence. Combining these two viewpoints provides interesting insights into the learning process and the functions of different parts of the model. In this paper, the problem of variational Bayesian learning of hierarchical latent variable models is used to demonstrate the benefits of the two views. The code-length interpretation provides new views of many parts of the problem, such as model comparison and pruning, and helps explain many phenomena occurring in learning.
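The dual reading of the ensemble-learning cost can be written out explicitly. With approximating posterior q(θ) and data D, the variational cost obeys the standard identity (generic notation, not this paper's):

```latex
\mathcal{C}(q)
  = \int q(\theta)\,\log\frac{q(\theta)}{p(D,\theta)}\,\mathrm{d}\theta
  = D_{\mathrm{KL}}\!\left(q(\theta)\,\middle\|\,p(\theta\mid D)\right) - \log p(D).
```

Since the KL term is non-negative, −C(q) is a lower bound on the log evidence log p(D) (the Bayesian view), while by the bits-back argument C(q)/ln 2 is the expected message length in bits for encoding the data together with the model (the MDL view); minimizing one quantity serves both goals.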
Bayesian Model Selection in Geophysics: The evidence
NASA Astrophysics Data System (ADS)
Vrugt, J. A.
2016-12-01
Bayesian inference has found widespread application and use in science and engineering to reconcile Earth system models with data, including prediction in space (interpolation), prediction in time (forecasting), assimilation of observations and deterministic/stochastic model output, and inference of the model parameters. Per Bayes' theorem, the posterior probability P(H|D) of a hypothesis H given the data D is equal to the product of its prior probability P(H) and likelihood L(H|D), divided by a normalization constant P(D). In geophysics, the hypothesis H often constitutes a description (parameterization) of the subsurface for some entity of interest (e.g. porosity, moisture content). The normalization constant P(D) is not required for inference of the subsurface structure, yet it is of great value for model selection. Unfortunately, it is not particularly easy to estimate P(D) in practice. Here, I will introduce the various building blocks of a general purpose method which provides robust and unbiased estimates of the evidence, P(D). This method uses multi-dimensional numerical integration of the posterior (parameter) distribution. I will then illustrate this new estimator by application to three competing subsurface models (hypotheses) using GPR travel time data from the South Oyster Bacterial Transport Site in Virginia, USA. The three subsurface models differ in their treatment of the porosity distribution and use (a) horizontal layering with fixed layer thicknesses, (b) vertical layering with fixed layer thicknesses and (c) a multi-Gaussian field. The results of the new estimator are compared against the brute force Monte Carlo method, and the Laplace-Metropolis method.
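The brute-force Monte Carlo baseline mentioned at the end, averaging the likelihood over draws from the prior, can be written in a few lines. The toy setup below (a single Gaussian observation with a Gaussian prior; all numbers hypothetical) is chosen so that the evidence P(D) is also available in closed form for checking:

```python
import math
import random

random.seed(4)

# one observation y with Gaussian likelihood N(theta, sigma^2) and
# Gaussian prior theta ~ N(0, tau^2); marginally y ~ N(0, sigma^2 + tau^2)
sigma, tau, y = 1.0, 2.0, 1.5

def likelihood(theta):
    return math.exp(-(y - theta) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

# brute-force Monte Carlo evidence: P(D) ~= average likelihood over prior draws
m = 200000
evidence_mc = sum(likelihood(random.gauss(0.0, tau)) for _ in range(m)) / m

# closed-form evidence for this conjugate toy problem
s2 = sigma ** 2 + tau ** 2
evidence_exact = math.exp(-y ** 2 / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
```

This estimator is unbiased but its variance explodes when the prior is diffuse relative to the likelihood, which is precisely why the abstract's posterior-integration estimator (and alternatives like Laplace-Metropolis) exist.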
Adrian, Molly; Kiff, Cara; Glazner, Chris; Kohen, Ruth; Tracy, Julia Helen; Zhou, Chuan; McCauley, Elizabeth; Stoep, Ann Vander
2015-01-01
Objective: The objective of this study was to apply a Bayesian statistical analytic approach that minimizes multiple-testing problems to explore the combined effects of chronic low familial support and variants in 12 candidate genes on risk for a common and debilitating childhood mental health condition. Method: Bayesian mixture modeling was used to examine gene-by-environment interactions among genetic variants and environmental factors (family support) associated in previous studies with the occurrence of comorbid depression and disruptive behavior disorders in youth, using a sample of 255 children. Results: One main effect emerged: a variant in the oxytocin receptor (OXTR, rs53576) was associated with increased risk for comorbid disorders. Two significant gene × environment interactions and one significant gene × gene interaction also emerged. Variants in the nicotinic acetylcholine receptor α5 subunit (CHRNA5, rs16969968) and in the glucocorticoid receptor chaperone protein FK506 binding protein 5 (FKBP5, rs4713902) interacted with chronic low family support in association with child mental health status. In the gene × gene interaction, the 5-HTTLPR variant of the serotonin transporter (SERT/SLC6A4) in combination with the μ opioid receptor (OPRM1, rs1799971) was associated with comorbid depression and conduct problems. Conclusions: Results indicate that Bayesian modeling is a feasible strategy for conducting behavioral genetics research. This approach, combined with an optimized genetic selection strategy (Vrieze, Iacono, & McGue, 2012), revealed genetic variants involved in stress regulation (FKBP5, SERT × OPRM1), social bonding (OXTR), and nicotine responsivity (CHRNA5) in predicting comorbid status. PMID:26228411
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sun, Yu; Hou, Zhangshuan; Huang, Maoyi
2013-12-10
This study demonstrates the possibility of inverting hydrologic parameters using surface flux and runoff observations in version 4 of the Community Land Model (CLM4). Previous studies showed that surface flux and runoff calculations are sensitive to major hydrologic parameters in CLM4 over different watersheds, and illustrated the necessity and possibility of parameter calibration. Two inversion strategies, deterministic least-square fitting and stochastic Markov chain Monte Carlo (MCMC) Bayesian inversion, are evaluated by applying them to CLM4 at selected sites. The unknowns to be estimated include surface and subsurface runoff generation parameters and vadose zone soil water parameters. We find that using model parameters calibrated by least-square fitting provides little improvement in the model simulations, but the sampling-based stochastic inversion approaches are consistent: as more information comes in, the predictive intervals of the calibrated parameters become narrower and the misfits between the calculated and observed responses decrease. In general, parameters that are identified to be significant through sensitivity analyses and statistical tests are better calibrated than those with weak or nonlinear impacts on flux or runoff observations. Temporal resolution of observations has larger impacts on the results of inverse modeling using heat flux data than runoff data. Soil and vegetation cover have important impacts on parameter sensitivities, leading to the different patterns of posterior distributions of parameters at different sites. Overall, the MCMC-Bayesian inversion approach effectively and reliably improves the simulation of CLM under different climates and environmental conditions. 
Bayesian model averaging of the posterior estimates with different reference acceptance probabilities can smooth the posterior distribution and provide more reliable parameter estimates, but at the expense of wider uncertainty bounds.
NASA Astrophysics Data System (ADS)
Sun, Y.; Hou, Z.; Huang, M.; Tian, F.; Leung, L. Ruby
2013-12-01
This study demonstrates the possibility of inverting hydrologic parameters using surface flux and runoff observations in version 4 of the Community Land Model (CLM4). Previous studies showed that surface flux and runoff calculations are sensitive to major hydrologic parameters in CLM4 over different watersheds, and illustrated the necessity and possibility of parameter calibration. Both deterministic least-square fitting and stochastic Markov-chain Monte Carlo (MCMC)-Bayesian inversion approaches are evaluated by applying them to CLM4 at selected sites with different climate and soil conditions. The unknowns to be estimated include surface and subsurface runoff generation parameters and vadose zone soil water parameters. We find that using model parameters calibrated by the sampling-based stochastic inversion approaches provides significant improvements in the model simulations compared to using default CLM4 parameter values, and that as more information comes in, the predictive intervals (ranges of posterior distributions) of the calibrated parameters become narrower. In general, parameters that are identified to be significant through sensitivity analyses and statistical tests are better calibrated than those with weak or nonlinear impacts on flux or runoff observations. Temporal resolution of observations has larger impacts on the results of inverse modeling using heat flux data than runoff data. Soil and vegetation cover have important impacts on parameter sensitivities, leading to different patterns of posterior distributions of parameters at different sites. Overall, the MCMC-Bayesian inversion approach effectively and reliably improves the simulation of CLM under different climates and environmental conditions. Bayesian model averaging of the posterior estimates with different reference acceptance probabilities can smooth the posterior distribution and provide more reliable parameter estimates, but at the expense of wider uncertainty bounds.
NASA Astrophysics Data System (ADS)
Licquia, Timothy C.; Newman, Jeffrey A.
2016-11-01
The exponential scale length (L_d) of the Milky Way's (MW's) disk is a critical parameter for describing the global physical size of our Galaxy, important both for interpreting other Galactic measurements and helping us to understand how our Galaxy fits into extragalactic contexts. Unfortunately, current estimates span a wide range of values and are often statistically incompatible with one another. Here, we perform a Bayesian meta-analysis to determine an improved, aggregate estimate for L_d, utilizing a mixture-model approach to account for the possibility that any one measurement has not properly accounted for all statistical or systematic errors. Within this machinery, we explore a variety of ways of modeling the nature of problematic measurements, and then employ a Bayesian model averaging technique to derive net posterior distributions that incorporate any model-selection uncertainty. Our meta-analysis combines 29 different (15 visible and 14 infrared) photometric measurements of L_d available in the literature; these involve a broad assortment of observational data sets, MW models and assumptions, and methodologies, all tabulated herein. Analyzing the visible and infrared measurements separately yields estimates for L_d of 2.71 +0.22/-0.20 kpc and 2.51 +0.15/-0.13 kpc, respectively, whereas considering them all combined yields 2.64 ± 0.13 kpc. The ratio between the visible and infrared scale lengths determined here is very similar to that measured in external spiral galaxies. We use these results to update the model of the Galactic disk from our previous work, constraining its stellar mass to be (4.8 +1.5/-1.1) × 10^10 M⊙, and the MW's total stellar mass to be (5.7 +1.5/-1.1) × 10^10 M⊙.
Craig, Marlies H; Sharp, Brian L; Mabaso, Musawenkosi LH; Kleinschmidt, Immo
2007-01-01
Background Several malaria risk maps have been developed in recent years, many from the prevalence of infection data collated by the MARA (Mapping Malaria Risk in Africa) project, and using various environmental data sets as predictors. Variable selection is a major obstacle due to analytical problems caused by over-fitting, confounding and non-independence in the data. Testing and comparing every combination of explanatory variables in a Bayesian spatial framework remains unfeasible for most researchers. The aim of this study was to develop a malaria risk map using a systematic and practicable variable selection process for spatial analysis and mapping of historical malaria risk in Botswana. Results Of 50 potential explanatory variables from eight environmental data themes, 42 were significantly associated with malaria prevalence in univariate logistic regression and were ranked by the Akaike Information Criterion. Variables correlated with higher-ranking relatives of the same environmental theme were temporarily excluded. The remaining 14 candidates were ranked by selection frequency after running automated step-wise selection procedures on 1000 bootstrap samples drawn from the data. A non-spatial multiple-variable model was developed through step-wise inclusion in order of selection frequency. Previously excluded variables were then re-evaluated for inclusion, using further step-wise bootstrap procedures, resulting in the exclusion of another variable. Finally, a Bayesian geo-statistical model using Markov Chain Monte Carlo simulation was fitted to the data, resulting in a final model of three predictor variables, namely summer rainfall, mean annual temperature and altitude. Each was independently and significantly associated with malaria prevalence after allowing for spatial correlation. This model was used to predict malaria prevalence at unobserved locations, producing a smooth risk map for the whole country.
Conclusion We have produced a highly plausible and parsimonious model of historical malaria risk for Botswana from point-referenced data from a 1961/2 prevalence survey of malaria infection in 1–14 year old children. After starting with a list of 50 potential variables we ended with three highly plausible predictors, by applying a systematic and repeatable staged variable selection procedure that included a spatial analysis, which has application for other environmentally determined infectious diseases. All this was accomplished using general-purpose statistical software. PMID:17892584
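The bootstrap selection-frequency idea can be sketched in miniature. This is a drastically simplified stand-in for the staged stepwise procedure above: two synthetic candidate predictors, and "selection" reduced to which variable gives the better univariate least-squares fit on each resample.

```python
import random

random.seed(1)

# Toy data: the response is driven by x1; x2 is pure noise.
n = 40
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * a + random.gauss(0, 1) for a in x1]

def rss_univariate(x, y):
    # Least-squares slope and intercept, then residual sum of squares.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    return sum((c - (a0 + b * a)) ** 2 for a, c in zip(x, y))

counts = {"x1": 0, "x2": 0}
for _ in range(200):                        # bootstrap resamples
    idx = [random.randrange(n) for _ in range(n)]
    ys = [y[i] for i in idx]
    r1 = rss_univariate([x1[i] for i in idx], ys)
    r2 = rss_univariate([x2[i] for i in idx], ys)
    counts["x1" if r1 < r2 else "x2"] += 1

print(counts)   # x1 should win nearly every resample
```

Ranking variables by how often they survive selection across resamples, rather than by a single fit, is what stabilizes the choice against sampling noise.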
A Bayesian Synthesis of Predictions from Different Models for Setting Water Quality Criteria
NASA Astrophysics Data System (ADS)
Arhonditsis, G. B.; Ecological Modelling Laboratory
2011-12-01
Skeptical views of the scientific value of modelling argue that there is no true model of an ecological system, but rather several adequate descriptions with different conceptual bases and structures. In this regard, rather than picking the single "best-fit" model to predict future system responses, we can use Bayesian model averaging to synthesize the forecasts from different models. Hence, by acknowledging that models from different areas of the complexity spectrum have different strengths and weaknesses, Bayesian model averaging is an appealing approach for improving predictive capacity and for overcoming the ambiguity surrounding model selection, or the risk of basing ecological forecasts on a single model. Our study addresses this question using a complex ecological model, developed by Ramin et al. (2011; Environ Modell Softw 26, 337-353) to guide the water quality criteria setting process in Hamilton Harbour (Ontario, Canada), along with a simpler plankton model that considers the interplay among phosphate, detritus, and generic phytoplankton and zooplankton state variables. This simpler approach is more easily subjected to detailed sensitivity analysis and also has the advantage of fewer unconstrained parameters. Using Markov Chain Monte Carlo simulations, we calculate the relative mean standard error to assess the posterior support of the two models from the existing data. Predictions from the two models are then combined using the respective standard error estimates as weights in a weighted model average. The model averaging approach is used to examine the robustness of predictive statements made in our earlier work regarding the response of Hamilton Harbour to different nutrient loading reduction strategies. The two eutrophication models are then used in conjunction with the SPAtially Referenced Regressions On Watershed attributes (SPARROW) watershed model.
The Bayesian nature of our work is used: (i) to alleviate problems of spatiotemporal resolution mismatch between watershed and receiving waterbody models; and (ii) to overcome the conceptual or scale misalignment between processes of interest and supporting information. The proposed Bayesian approach provides an effective means of empirically estimating the relation between in-stream measurements of nutrient fluxes and the sources/sinks of nutrients within the watershed, while explicitly accounting for the uncertainty associated with the existing knowledge from the system along with the different types of spatial correlation typically underlying the parameter estimation of watershed models. Our modelling exercise offers the first estimates of the export coefficients and the delivery rates from the different subcatchments and thus generates testable hypotheses regarding the nutrient export "hot spots" in the studied watershed. Finally, we conduct modeling experiments that evaluate the potential improvement of the model parameter estimates and the decrease of the predictive uncertainty, if the uncertainty associated with the contemporary nutrient loading estimates is reduced. The lessons learned from this study will contribute towards the development of integrated modelling frameworks.
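The error-weighted model average described above can be sketched as follows. The predictions and standard errors are hypothetical placeholders, and the inverse-squared-error weights are a simple stand-in for fully Bayesian posterior model weights.

```python
import math

# Hypothetical predictions from two eutrophication models and their errors.
pred = {"complex": 3.2, "simple": 2.6}
se = {"complex": 0.4, "simple": 0.6}

# Normalized inverse-squared-error weights.
w = {m: 1.0 / se[m] ** 2 for m in pred}
z = sum(w.values())
w = {m: wi / z for m, wi in w.items()}
avg = sum(w[m] * pred[m] for m in pred)

# The averaged predictive variance includes the between-model spread,
# so disagreement between models widens the forecast uncertainty.
var = sum(w[m] * (se[m] ** 2 + (pred[m] - avg) ** 2) for m in pred)
print(round(avg, 2), round(math.sqrt(var), 2))   # 3.02 0.55
```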
Classifying emotion in Twitter using Bayesian network
NASA Astrophysics Data System (ADS)
Surya Asriadie, Muhammad; Syahrul Mubarok, Mohamad; Adiwijaya
2018-03-01
Language is used to express not only facts but also emotions. Emotions are noticeable in behavior and even in the social media statuses a person writes. Analysis of emotions in text is done in a variety of media, such as Twitter. This paper studies classification of emotions on Twitter using a Bayesian network, because of its ability to model uncertainty and relationships between features. The result is two models based on Bayesian networks: a Full Bayesian Network (FBN) and a Bayesian Network with Mood Indicator (BNM). FBN is a massive Bayesian network in which each word is treated as a node. The study shows that the method used to train FBN is not very effective at creating the best model, and FBN performs worse than Naive Bayes: the F1-score for FBN is 53.71%, while for Naive Bayes it is 54.07%. BNM is proposed as an alternative method based on an improvement of Multinomial Naive Bayes, with much lower computational complexity than FBN. Although BNM does not outperform FBN, it successfully improves the performance of Multinomial Naive Bayes: the F1-score for the Multinomial Naive Bayes model is 51.49%, while for BNM it is 52.14%.
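The Multinomial Naive Bayes baseline that BNM builds on can be sketched in a few lines. The tiny labelled corpus is hypothetical, and this is the plain baseline, not the mood-indicator extension.

```python
import math
from collections import Counter

# Tiny hypothetical corpus of labelled tweets.
train = [
    ("happy", "what a great wonderful day"),
    ("happy", "love this great song"),
    ("sad", "terrible awful day today"),
    ("sad", "feeling so awful and alone"),
]

# Count words per class for the multinomial likelihood.
word_counts = {c: Counter() for c, _ in train}
class_counts = Counter()
for c, text in train:
    class_counts[c] += 1
    word_counts[c].update(text.split())

vocab = {w for cnt in word_counts.values() for w in cnt}

def predict(text):
    scores = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        score = math.log(class_counts[c] / sum(class_counts.values()))
        for w in text.split():
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

print(predict("great day"))   # happy
```

A Bayesian-network variant replaces the class-conditional independence assumption here with explicit dependence edges between word features.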
Maximizing the information learned from finite data selects a simple model
NASA Astrophysics Data System (ADS)
Mattingly, Henry H.; Transtrum, Mark K.; Abbott, Michael C.; Machta, Benjamin B.
2018-02-01
We use the language of uninformative Bayesian prior choice to study the selection of appropriately simple effective models. We advocate for the prior which maximizes the mutual information between parameters and predictions, learning as much as possible from limited data. When many parameters are poorly constrained by the available data, we find that this prior puts weight only on boundaries of the parameter space. Thus, it selects a lower-dimensional effective theory in a principled way, ignoring irrelevant parameter directions. In the limit where there are sufficient data to tightly constrain any number of parameters, this reduces to the Jeffreys prior. However, we argue that this limit is pathological when applied to the hyperribbon parameter manifolds generic in science, because it leads to dramatic dependence on effects invisible to experiment.
Bayesian Evaluation of Dynamical Soil Carbon Models Using Soil Carbon Flux Data
NASA Astrophysics Data System (ADS)
Xie, H. W.; Romero-Olivares, A.; Guindani, M.; Allison, S. D.
2017-12-01
2016 was Earth's hottest year in the modern temperature record and the third consecutive record-breaking year. As the planet continues to warm, temperature-induced changes in the respiration rates of soil microbes could reduce the amount of carbon sequestered in the soil organic carbon (SOC) pool, one of the largest terrestrial stores of carbon. This would accelerate temperature increases. In order to predict the future size of the SOC pool, mathematical soil carbon models (SCMs) describing interactions between the biosphere and atmosphere are needed. SCMs must be validated before they can be chosen for predictive use. In this study, we check two SCMs, called CON and AWB, for consistency with observed data using Bayesian goodness-of-fit testing that can be used in the future to compare other models. We compare the fit of the models to longitudinal soil respiration data from a meta-analysis of soil heating experiments using a family of Bayesian goodness-of-fit metrics called information criteria (ICs), including the Widely Applicable Information Criterion (WAIC), the Leave-One-Out Information Criterion (LOOIC), and the Log Pseudo Marginal Likelihood (LPML). These ICs take the entire posterior distribution into account, rather than just a single output model-fit line. A lower WAIC and LOOIC and a larger LPML indicate a better fit. We compare AWB and CON with fixed steady-state model pool sizes. At equivalent SOC, dissolved organic carbon, and microbial pool sizes, CON always outperforms AWB quantitatively by all three ICs used. AWB monotonically improves in fit as we reduce the SOC steady-state pool size while fixing all other pool sizes, and the same is almost true for CON. The AWB model with the lowest SOC is the best-performing AWB model, while the CON model with the second-lowest SOC is the best-performing model.
We observe that AWB displays more changes in slope sign and qualitatively displays more adaptive dynamics, which prevents AWB from being fully ruled out for predictive use, but based on the ICs, CON is clearly the superior model for fitting the data. Hence, we demonstrate that Bayesian goodness-of-fit testing with information criteria helps us rigorously determine the consistency of models with data. Models that demonstrate their consistency with multiple data sets under our approach can then be selected for further refinement.
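The WAIC computation used above can be sketched from a matrix of pointwise log-likelihoods evaluated over posterior draws. The draws here are synthetic, not output from the soil carbon models, and this uses the common deviance-scale convention WAIC = −2(lppd − p_WAIC).

```python
import math, random, statistics

random.seed(2)

# Synthetic pointwise log-likelihoods: rows = posterior draws, cols = data points.
S, N = 1000, 20
loglik = [[-0.5 * random.gauss(0, 1) ** 2 - 0.9 for _ in range(N)] for _ in range(S)]

def waic(loglik):
    S = len(loglik)
    lppd, p_waic = 0.0, 0.0
    for i in range(len(loglik[0])):
        col = [row[i] for row in loglik]
        m = max(col)
        # Log of the mean pointwise likelihood (log-sum-exp for stability).
        lppd += m + math.log(sum(math.exp(l - m) for l in col) / S)
        # Effective number of parameters: posterior variance of the log-likelihood.
        p_waic += statistics.variance(col)
    return -2.0 * (lppd - p_waic)

print(round(waic(loglik), 1))
```

Because the whole posterior enters through `loglik`, WAIC penalizes models whose fit is unstable across draws, not just models with many parameters.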
NASA Astrophysics Data System (ADS)
Krishnanathan, Kirubhakaran; Anderson, Sean R.; Billings, Stephen A.; Kadirkamanathan, Visakan
2016-11-01
In this paper, we derive a system identification framework for continuous-time nonlinear systems, for the first time using a simulation-focused computational Bayesian approach. Simulation approaches to nonlinear system identification have been shown to outperform regression methods under certain conditions, such as non-persistently exciting inputs and fast-sampling. We use the approximate Bayesian computation (ABC) algorithm to perform simulation-based inference of model parameters. The framework has the following main advantages: (1) parameter distributions are intrinsically generated, giving the user a clear description of uncertainty, (2) the simulation approach avoids the difficult problem of estimating signal derivatives as is common with other continuous-time methods, and (3) as noted above, the simulation approach improves identification under conditions of non-persistently exciting inputs and fast-sampling. Term selection is performed by judging parameter significance using parameter distributions that are intrinsically generated as part of the ABC procedure. The results from a numerical example demonstrate that the method performs well in noisy scenarios, especially in comparison to competing techniques that rely on signal derivative estimation.
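ABC in its simplest rejection form can be sketched as follows. This is a toy Gaussian-mean problem, not the continuous-time system identification setting: prior draws are kept whenever the summary statistic of simulated data lands within a tolerance of the observed summary.

```python
import random, statistics

random.seed(3)

# Observed data from a Gaussian with unknown mean (true mean 1.5, known sd 1).
obs = [random.gauss(1.5, 1.0) for _ in range(100)]
s_obs = statistics.mean(obs)            # summary statistic

# ABC rejection: simulate from the model at prior draws of theta and keep
# draws whose simulated summary is close to the observed one.
accepted = []
while len(accepted) < 200:
    theta = random.uniform(-5, 5)       # prior
    sim = [random.gauss(theta, 1.0) for _ in range(100)]
    if abs(statistics.mean(sim) - s_obs) < 0.2:   # tolerance
        accepted.append(theta)

print(round(statistics.mean(accepted), 1))   # near the true mean of 1.5
```

The accepted draws approximate the posterior without ever evaluating a likelihood, which is what makes the approach usable for simulation-only models; shrinking the tolerance tightens the approximation at the cost of more rejections.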
Killiches, Matthias; Czado, Claudia
2018-03-22
We propose a model for unbalanced longitudinal data, where the univariate margins can be selected arbitrarily and the dependence structure is described with the help of a D-vine copula. We show that our approach is an extremely flexible extension of the widely used linear mixed model if the correlation is homogeneous over the considered individuals. As an alternative to joint maximum-likelihood estimation, a sequential estimation approach for the D-vine copula is provided and validated in a simulation study. The model can handle missing values without being forced to discard data. Since conditional distributions are known analytically, we can easily make predictions for future events. For model selection, we adjust the Bayesian information criterion to our situation. In an application to heart surgery data our model performs clearly better than competing linear mixed models. © 2018, The International Biometric Society.
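The Bayesian information criterion underlying the model selection step has the standard form BIC = k·ln(n) − 2·ln(L̂). The log-likelihood values below are hypothetical; the paper's contribution is adjusting how k and n are counted for unbalanced longitudinal data, which this plain version does not capture.

```python
import math

# Standard BIC: penalizes parameters more heavily than AIC once n > e^2 (~7.4).
def bic(loglik_hat, k, n):
    return k * math.log(n) - 2.0 * loglik_hat

# A richer model gains one unit of log-likelihood but pays for three extra
# parameters, so BIC prefers the smaller model (lower is better).
print(round(bic(-120.0, 3, 200), 1))   # 255.9
print(round(bic(-119.0, 6, 200), 1))   # 269.8
```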
Bayesian quantitative precipitation forecasts in terms of quantiles
NASA Astrophysics Data System (ADS)
Bentzien, Sabrina; Friederichs, Petra
2014-05-01
Ensemble prediction systems (EPS) for numerical weather prediction on the mesoscale are developed particularly to obtain probabilistic guidance for high-impact weather. An EPS issues not only a deterministic future state of the atmosphere but a sample of possible future states. Ensemble postprocessing then translates such a sample of forecasts into probabilistic measures. This study focuses on probabilistic quantitative precipitation forecasts in terms of quantiles. Quantiles are particularly suitable for describing precipitation at various locations, since no assumption is required on the distribution of precipitation. The focus is on prediction during high-impact events and is related to the Volkswagen Stiftung funded project WEX-MOP (Mesoscale Weather Extremes - Theory, Spatial Modeling and Prediction). Quantile forecasts are derived from the raw ensemble and via quantile regression. Neighborhood methods and time-lagging are effective tools to inexpensively increase the ensemble spread, which results in more reliable forecasts, especially for extreme precipitation events. Since an EPS provides a large number of potentially informative predictors, variable selection is required in order to obtain a stable statistical model. A Bayesian formulation of quantile regression allows for inference about the selection of predictive covariates through the use of appropriate prior distributions. Moreover, the implementation of an additional process layer for the regression parameters accounts for spatial variations of the parameters. Bayesian quantile regression and its spatially adaptive extension are illustrated for the German-focused mesoscale weather prediction ensemble COSMO-DE-EPS, which has run (pre)operationally since December 2010 at the German Meteorological Service (DWD). Objective out-of-sample verification uses the quantile score (QS), a weighted absolute error between quantile forecasts and observations.
The QS is a proper scoring function and can be decomposed into reliability, resolution, and uncertainty parts. A quantile reliability plot gives detailed insight into the predictive performance of the quantile forecasts.
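The quantile score for a single forecast-observation pair is the pinball loss, which is minimized in expectation when the forecast equals the true tau-quantile. A minimal sketch with illustrative numbers:

```python
def quantile_score(tau, forecast, obs):
    # Pinball loss: asymmetric absolute error, proper for the tau-quantile.
    u = obs - forecast
    return tau * u if u >= 0 else (tau - 1.0) * u

# Under-forecasting the 0.9-quantile costs 9x more than over-forecasting
# by the same amount, which is what pushes the forecast toward the tail.
print(round(quantile_score(0.9, 10.0, 12.0), 3))   # 1.8
print(round(quantile_score(0.9, 10.0, 8.0), 3))    # 0.2
```

Averaging this loss over many cases gives the QS used for the out-of-sample verification described above.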
Essays on inference in economics, competition, and the rate of profit
NASA Astrophysics Data System (ADS)
Scharfenaker, Ellis S.
This dissertation comprises three papers that demonstrate the role of Bayesian methods of inference and Shannon's information theory in classical political economy. The first chapter explores the empirical distribution of profit rate data from North American firms from 1962-2012. This chapter addresses the fact that existing methods for sample selection from noisy profit rate data in the industrial organization field of economics tend to condition on a covariate's value, which risks discarding information. Conditioning sample selection instead on the profit rate data's structure, by means of a two-component (signal and noise) Bayesian mixture model, we find the profit rate sample to be time-stationary Laplace distributed, corroborating earlier estimates of cross-section distributions. The second chapter compares alternative probabilistic approaches to discrete (quantal) choice analysis and examines the various ways in which they overlap. In particular, the work on individual choice behavior by Duncan Luce, and the extension of this work to quantal response problems by game theoreticians, is shown to be related both to the rational inattention work of Christopher Sims through Shannon's information theory and to the maximum entropy principle of inference proposed by the physicist Edwin T. Jaynes. In the third chapter I propose a model of "classically" competitive firms facing informational entropy constraints in their decisions to enter or exit markets based on profit rate differentials. The result is a three-parameter logit quantal response distribution for firm entry and exit decisions. Bayesian methods are used for inference into the distribution of entry and exit decisions conditional on profit rate deviations, and firm-level data from Compustat is used to test these predictions.
Model selection for the North American Breeding Bird Survey: A comparison of methods
Link, William; Sauer, John; Niven, Daniel
2017-01-01
The North American Breeding Bird Survey (BBS) provides data for >420 bird species at multiple geographic scales over 5 decades. Modern computational methods have facilitated the fitting of complex hierarchical models to these data. It is easy to propose and fit new models, but little attention has been given to model selection. Here, we discuss and illustrate model selection using leave-one-out cross validation, and the Bayesian Predictive Information Criterion (BPIC). Cross-validation is enormously computationally intensive; we thus evaluate the performance of the Watanabe-Akaike Information Criterion (WAIC) as a computationally efficient approximation to the BPIC. Our evaluation is based on analyses of 4 models as applied to 20 species covered by the BBS. Model selection based on BPIC provided no strong evidence of one model being consistently superior to the others; for 14/20 species, none of the models emerged as superior. For the remaining 6 species, a first-difference model of population trajectory was always among the best fitting. Our results show that WAIC is not reliable as a surrogate for BPIC. Development of appropriate model sets and their evaluation using BPIC is an important innovation for the analysis of BBS data.
NASA Astrophysics Data System (ADS)
Plant, N. G.; Thieler, E. R.; Gutierrez, B.; Lentz, E. E.; Zeigler, S. L.; Van Dongeren, A.; Fienen, M. N.
2016-12-01
We evaluate the strengths and weaknesses of Bayesian networks that have been used to address scientific and decision-support questions related to coastal geomorphology. We will provide an overview of coastal geomorphology research that has used Bayesian networks and describe what this approach can do and when it works (or fails to work). Over the past decade, Bayesian networks have been formulated to analyze the multivariate structure and evolution of coastal morphology and associated human and ecological impacts. The approach relates observable system variables to each other by estimating discrete correlations. The resulting Bayesian networks make predictions that propagate errors, conduct inference via Bayes' rule, or both. In scientific applications, the model results are useful for hypothesis testing, using confidence estimates to gauge the strength of tests, while applications to coastal resource management are aimed at decision support, where the probabilities of desired ecosystem outcomes are evaluated. The range of Bayesian network applications to coastal morphology includes emulation of high-resolution wave transformation models to make oceanographic predictions, morphologic response to storms and/or sea-level rise, groundwater response to sea-level rise and morphologic variability, habitat suitability for endangered species, and assessment of monetary or human-life risk associated with storms. All of these examples are based on vast observational data sets, numerical model output, or both. We will discuss the progression of our experiments, which has included testing whether the Bayesian network approach can be implemented and is appropriate for addressing basic and applied scientific problems, and evaluating the hindcast and forecast skill of these implementations. We will present and discuss calibration/validation tests that are used to assess the robustness of Bayesian network models, and we will compare these results to tests of other models.
This will demonstrate how Bayesian networks are used to extract new insights about coastal morphologic behavior, assess impacts to societal and ecological systems, and communicate probabilistic predictions to decision makers.
NASA Astrophysics Data System (ADS)
Ha, Taesung
A probabilistic risk assessment (PRA) was conducted for a loss of coolant accident (LOCA) in the McMaster Nuclear Reactor (MNR). A level 1 PRA was completed, including event sequence modeling, system modeling, and quantification. To support the quantification of the accident sequences identified, data analysis using the Bayesian method and human reliability analysis (HRA) using the accident sequence evaluation procedure (ASEP) approach were performed. Since human performance in research reactors is significantly different from that in power reactors, a time-oriented HRA model (reliability physics model) was applied for estimating the human error probability (HEP) of the core relocation. This model is based on two competing random variables: phenomenological time and performance time. Response surface and direct Monte Carlo simulation with Latin hypercube sampling were applied for estimating the phenomenological time, whereas the performance time was obtained from interviews with operators. An appropriate probability distribution for the phenomenological time was assigned by statistical goodness-of-fit tests. The HEP for the core relocation was estimated from these two competing quantities, and the sensitivity of each probability distribution in the human reliability estimation was investigated. In order to quantify the uncertainty in the predicted HEPs, a Bayesian approach was selected due to its capability of incorporating uncertainties in the model itself and in its parameters. The HEP from the current time-oriented model was compared with that from the ASEP approach. Both results were used to evaluate the sensitivity of alternative human reliability modeling for the manual core relocation in the LOCA risk model. This exercise demonstrated the applicability of a reliability physics model supplemented with a Bayesian approach for modeling human reliability, and its potential usefulness for quantifying model uncertainty as sensitivity analysis in the PRA model.
A comment on priors for Bayesian occupancy models.
Northrup, Joseph M; Gerber, Brian D
2018-01-01
Understanding patterns of species occurrence and the processes underlying these patterns is fundamental to the study of ecology. One of the more commonly used approaches to investigate species occurrence patterns is occupancy modeling, which can account for imperfect detection of a species during surveys. In recent years, there has been a proliferation of Bayesian modeling in ecology, which includes fitting Bayesian occupancy models. The Bayesian framework is appealing to ecologists for many reasons, including the ability to incorporate prior information through the specification of prior distributions on parameters. While ecologists almost exclusively intend to choose priors so that they are "uninformative" or "vague", such priors can easily be unintentionally highly informative. Here we report on how the specification of a "vague" normally distributed (i.e., Gaussian) prior on coefficients in Bayesian occupancy models can unintentionally influence parameter estimation. Using both simulated data and empirical examples, we illustrate how this issue likely compromises inference about species-habitat relationships. While the extent to which these informative priors influence inference depends on the data set, researchers fitting Bayesian occupancy models should conduct sensitivity analyses to ensure intended inference, or employ less commonly used priors that are less informative (e.g., logistic or t prior distributions). We provide suggestions for addressing this issue in occupancy studies, and an online tool for exploring this issue under different contexts.
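The core pitfall described above is easy to demonstrate numerically: a "vague" wide normal prior on a logit-scale coefficient induces a sharply non-uniform prior on the probability scale. This sketch checks the induced prior by simulation; the sd of 10 is an illustrative choice of a prior many would call uninformative.

```python
import math, random

random.seed(4)

# A Normal(0, sd=10) "vague" prior on a logit-scale intercept...
draws = [random.gauss(0.0, 10.0) for _ in range(100000)]

# ...induces a prior on occupancy probability that piles up near 0 and 1.
probs = [1.0 / (1.0 + math.exp(-b)) for b in draws]
near_edges = sum(1 for p in probs if p < 0.1 or p > 0.9) / len(probs)
print(round(near_edges, 2))   # ~0.83: far from the uniform value of 0.2
```

This is exactly the kind of prior sensitivity check the authors recommend running before interpreting species-habitat coefficients.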
Detecting consistent patterns of directional adaptation using differential selection codon models.
Parto, Sahar; Lartillot, Nicolas
2017-06-23
Phylogenetic codon models are often used to characterize the selective regimes acting on protein-coding sequences. Recent methodological developments have led to models explicitly accounting for the interplay between mutation and selection, by modeling the amino acid fitness landscape along the sequence. However, thus far, most of these models have assumed that the fitness landscape is constant over time. Fluctuations of the fitness landscape may often be random or depend on complex and unknown factors. However, some organisms may be subject to systematic changes in selective pressure, resulting in reproducible molecular adaptations across independent lineages subject to similar conditions. Here, we introduce a codon-based differential selection model, which aims to detect and quantify the fine-grained consistent patterns of adaptation at the protein-coding level, as a function of external conditions experienced by the organism under investigation. The model parameterizes the global mutational pressure, as well as the site- and condition-specific amino acid selective preferences. This phylogenetic model is implemented in a Bayesian MCMC framework. After validation with simulations, we applied our method to a dataset of HIV sequences from patients with known HLA genetic background. Our differential selection model detects and characterizes differentially selected coding positions specifically associated with two different HLA alleles. Our differential selection model is able to identify consistent molecular adaptations as a function of repeated changes in the environment of the organism. These models can be applied to many other problems, ranging from viral adaptation to evolution of life-history strategies in plants or animals.
Bayesian estimation inherent in a Mexican-hat-type neural network
NASA Astrophysics Data System (ADS)
Takiyama, Ken
2016-05-01
Brain functions, such as perception, motor control and learning, and decision making, have been explained based on a Bayesian framework, i.e., to decrease the effects of noise inherent in the human nervous system or external environment, our brain integrates sensory and a priori information in a Bayesian optimal manner. However, it remains unclear how Bayesian computations are implemented in the brain. Herein, I address this issue by analyzing a Mexican-hat-type neural network, which was used as a model of the visual cortex, motor cortex, and prefrontal cortex. I analytically demonstrate that the dynamics of an order parameter in the model corresponds exactly to a variational inference of a linear Gaussian state-space model, a Bayesian estimation, when the strength of recurrent synaptic connectivity is appropriately stronger than that of an external stimulus, a plausible condition in the brain. This exact correspondence can reveal the relationship between the parameters in the Bayesian estimation and those in the neural network, providing insight for understanding brain functions.
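The Bayesian integration at the heart of this account is precision-weighted combination of a Gaussian prior with a Gaussian sensory likelihood. A minimal sketch with illustrative numbers (not the neural network model itself):

```python
# Bayesian cue combination: prior and likelihood combine by precision weighting.
def combine(mu_prior, var_prior, mu_obs, var_obs):
    w = (1.0 / var_prior) / (1.0 / var_prior + 1.0 / var_obs)
    mu_post = w * mu_prior + (1.0 - w) * mu_obs
    var_post = 1.0 / (1.0 / var_prior + 1.0 / var_obs)
    return mu_post, var_post

# A noisy observation (variance 4) pulls a tight prior belief at 0
# (variance 1) only part of the way toward it.
mu, var = combine(0.0, 1.0, 5.0, 4.0)
print(round(mu, 3), round(var, 3))   # 1.0 0.8
```

The noisier the sensory input, the more the posterior stays with the prior, which is the hallmark behavior the neural-network dynamics are shown to reproduce.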
Bayesian adaptive phase II screening design for combination trials
Cai, Chunyan; Yuan, Ying; Johnson, Valen E
2013-01-01
Background Trials of combination therapies for the treatment of cancer are playing an increasingly important role in the battle against this disease. To more efficiently handle the large number of combination therapies that must be tested, we propose a novel Bayesian phase II adaptive screening design to simultaneously select among possible treatment combinations involving multiple agents. Methods Our design is based on formulating the selection procedure as a Bayesian hypothesis testing problem in which the superiority of each treatment combination is equated to a single hypothesis. During the trial conduct, we use the current values of the posterior probabilities of all hypotheses to adaptively allocate patients to treatment combinations. Results Simulation studies show that the proposed design substantially outperforms the conventional multiarm balanced factorial trial design. The proposed design yields a significantly higher probability for selecting the best treatment while allocating substantially more patients to efficacious treatments. Limitations The proposed design is most appropriate for the trials combining multiple agents and screening out the efficacious combination to be further investigated. Conclusions The proposed Bayesian adaptive phase II screening design substantially outperformed the conventional complete factorial design. Our design allocates more patients to better treatments while providing higher power to identify the best treatment at the end of the trial. PMID:23359875
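The adaptive allocation step can be sketched with binary responses under conjugate Beta(1, 1) priors: the posterior probability that each combination is best is estimated by Monte Carlo over posterior draws. The arm labels and counts are hypothetical, and the actual design uses a formal hypothesis-testing formulation rather than this simple best-arm probability.

```python
import random

random.seed(5)

# Hypothetical interim responses per treatment combination.
successes = {"A+B": 8, "A+C": 12, "B+C": 5}
patients = {"A+B": 20, "A+C": 20, "B+C": 20}

# Monte Carlo estimate of P(arm is best) from Beta posterior draws.
wins = {a: 0 for a in successes}
for _ in range(10000):
    draw = {a: random.betavariate(1 + successes[a],
                                  1 + patients[a] - successes[a])
            for a in successes}
    wins[max(draw, key=draw.get)] += 1

p_best = {a: w / 10000 for a, w in wins.items()}
print(max(p_best, key=p_best.get))   # A+C
```

In the trial, probabilities like `p_best` are recomputed after each cohort and used to allocate more patients to the combinations most likely to be efficacious.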
NASA Astrophysics Data System (ADS)
An, M.; Assumpcao, M.
2003-12-01
The joint inversion of receiver functions and surface waves is an effective way to diminish the influence of the strong tradeoff among parameters and the different sensitivities to the model parameters in their respective inversions, but the inversion problem becomes more complex. Multi-objective problems can be much more complicated than single-objective inversions in model selection and optimization. If several conflicting objectives are involved, models can be ordered only partially. In this case, Pareto-optimal preference should be used to select solutions. On the other hand, an inversion that yields only a few optimal solutions cannot deal properly with the strong tradeoff between parameters, the uncertainties in the observations, the geophysical complexities, and even the incompetency of the inversion technique. The effective way is to retrieve the geophysical information statistically from many acceptable solutions, which requires more competent global algorithms. Recently proposed competent genetic algorithms are far superior to the conventional genetic algorithm and can solve hard problems quickly, reliably, and accurately. In this work we used one such competent genetic algorithm, the Bayesian Optimization Algorithm, as the main inverse procedure. This algorithm uses Bayesian networks to draw out inherited information and can use Pareto-optimal preference in the inversion. With this algorithm, the lithospheric structure of the Paraná basin is inverted to fit both the observations of inter-station surface wave dispersion and receiver functions.
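The Pareto-optimal preference used for partially ordered models can be sketched with a dominance filter over two misfits. The misfit pairs below are hypothetical; in the study they would be receiver-function and dispersion misfits for candidate lithospheric models.

```python
# Hypothetical (misfit_1, misfit_2) pairs for candidate models; lower is better.
models = [(0.9, 0.2), (0.5, 0.5), (0.3, 0.8), (0.6, 0.6), (0.2, 1.1)]

def dominates(a, b):
    # a dominates b if it is no worse on every objective and better on one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# The Pareto front: models not dominated by any other candidate.
front = [m for m in models if not any(dominates(o, m) for o in models)]
print(front)   # (0.6, 0.6) is dominated by (0.5, 0.5) and drops out
```

No single "best" model exists on the front: each member trades fit to one data type against the other, which is why the statistics are drawn from many acceptable solutions rather than one optimum.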
VizieR Online Data Catalog: Bayesian method for detecting stellar flares (Pitkin+, 2014)
NASA Astrophysics Data System (ADS)
Pitkin, M.; Williams, D.; Fletcher, L.; Grant, S. D. T.
2015-05-01
We present a Bayesian-odds-ratio-based algorithm for detecting stellar flares in light-curve data. We assume flares are described by a model in which there is a rapid rise with a half-Gaussian profile, followed by an exponential decay. Our signal model also contains a polynomial background model required to fit underlying light-curve variations in the data, which could otherwise partially mimic a flare. We characterize the false alarm probability and efficiency of this method under the assumption that any unmodelled noise in the data is Gaussian, and compare it with a simpler thresholding method based on that used in Walkowicz et al. We find our method has a significant increase in detection efficiency for low signal-to-noise ratio (S/N) flares. For a conservative false alarm probability our method can detect 95 per cent of flares with S/N less than 20, as compared to S/N of 25 for the simpler method. We also test how well the assumption of Gaussian noise holds by applying the method to a selection of 'quiet' Kepler stars. As an example we have applied our method to a selection of stars in Kepler Quarter 1 data. The method finds 687 flaring stars with a total of 1873 flares after vetoes have been applied. For these flares we have made preliminary characterizations of their durations and S/N. (1 data file).
A Bayesian method for detecting stellar flares
NASA Astrophysics Data System (ADS)
Pitkin, M.; Williams, D.; Fletcher, L.; Grant, S. D. T.
2014-12-01
Hierarchical modeling of bycatch rates of sea turtles in the western North Atlantic
Gardner, B.; Sullivan, P.J.; Epperly, S.; Morreale, S.J.
2008-01-01
Previous studies indicate that the locations of the endangered loggerhead Caretta caretta and critically endangered leatherback Dermochelys coriacea sea turtles are influenced by water temperatures, and that incidental catch rates in the pelagic longline fishery vary by region. We present a Bayesian hierarchical model to examine the effects of environmental variables, including water temperature, on the number of sea turtles captured in the US pelagic longline fishery in the western North Atlantic. The modeling structure is highly flexible, utilizes a Bayesian model selection technique, and is fully implemented in the software program WinBUGS. The number of sea turtles captured is modeled as a zero-inflated Poisson distribution and the model incorporates fixed effects to examine region-specific differences in the parameter estimates. Results indicate that water temperature, region, bottom depth, and target species are all significant predictors of the number of loggerhead sea turtles captured. For leatherback sea turtles, the model with only target species had the most posterior model weight, though a re-parameterization of the model indicates that temperature influences the zero-inflation parameter. The relationship between the number of sea turtles captured and the variables of interest all varied by region. This suggests that management decisions aimed at reducing sea turtle bycatch may be more effective if they are spatially explicit. ?? Inter-Research 2008.
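The zero-inflated Poisson likelihood at the core of the bycatch model above can be written compactly. This is a generic textbook sketch of the distribution, not the study's WinBUGS model; variable names are illustrative.

```python
import math

def zip_loglik(y, lam, pi):
    """Log-likelihood of one count y under a zero-inflated Poisson:
    with probability pi the observation is a 'structural' zero (e.g. no
    turtles present), otherwise y ~ Poisson(lam). Generic sketch only."""
    if y == 0:
        # A zero can arise either structurally or from the Poisson itself.
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    # Nonzero counts can only come from the Poisson component.
    return math.log(1.0 - pi) + y * math.log(lam) - lam - math.lgamma(y + 1)
```

In the hierarchical model, both `lam` and `pi` would be linked to covariates such as water temperature, region, and bottom depth.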
2016-10-01
…and implementation of embedded, adaptive feedback and performance assessment. The investigators also initiated work designing a Bayesian belief network. Keywords: trauma teams; team training; teamwork; adaptability; adaptive performance; leadership; simulation; modeling; Bayesian belief networks (BBN).
ERIC Educational Resources Information Center
West, Patti; Rutstein, Daisy Wise; Mislevy, Robert J.; Liu, Junhui; Choi, Younyoung; Levy, Roy; Crawford, Aaron; DiCerbo, Kristen E.; Chappel, Kristina; Behrens, John T.
2010-01-01
A major issue in the study of learning progressions (LPs) is linking student performance on assessment tasks to the progressions. This report describes the challenges faced in making this linkage using Bayesian networks to model LPs in the field of computer networking. The ideas are illustrated with exemplar Bayesian networks built on Cisco…
Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model
Ellefsen, Karl J.; Smith, David
2016-01-01
Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.
Nonparametric Bayesian Modeling for Automated Database Schema Matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferragut, Erik M; Laska, Jason A
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
Development of dynamic Bayesian models for web application test management
NASA Astrophysics Data System (ADS)
Azarnova, T. V.; Polukhin, P. V.; Bondarenko, Yu V.; Kashirina, I. L.
2018-03-01
The mathematical apparatus of dynamic Bayesian networks is an effective and technically proven tool for modelling complex stochastic dynamic processes. According to the results of the research, the mathematical models and methods of dynamic Bayesian networks provide high coverage of the stochastic tasks associated with error testing in multiuser software products operated in a dynamically changing environment. Formalized representation of the discrete test process as a dynamic Bayesian model allows the logical connections between individual test assets to be organized across multiple time slices. This approach makes it possible to present testing as a discrete process with defined structural components responsible for the generation of test assets. Dynamic Bayesian network-based models make it possible to combine, within one management area, individual units and testing components with different functionalities that directly influence each other during comprehensive testing for various classes of software bugs. The application of the proposed models provides a consistent approach to formalizing test principles and procedures, methods used to treat situational error signs, and methods used to produce analytical conclusions based on test results.
NASA Astrophysics Data System (ADS)
Kim, Seongryong; Tkalčić, Hrvoje; Mustać, Marija; Rhie, Junkee; Ford, Sean
2016-04-01
A framework is presented within which we provide rigorous estimates for seismic sources and structures in Northeast Asia. We use Bayesian inversion methods, which enable statistical estimation of models and their uncertainties based on the information in the data. Ambiguities in error statistics and model parameterizations are addressed by hierarchical and trans-dimensional (trans-D) techniques, which can be inherently implemented in the Bayesian inversions. Hence reliable estimation of model parameters and their uncertainties is possible, thus avoiding arbitrary regularizations and parameterizations. Hierarchical and trans-D inversions are performed to develop a three-dimensional velocity model using ambient noise data. To further improve the model, we perform joint inversions with receiver function data using a newly developed Bayesian method. For the source estimation, a novel moment tensor inversion method is presented and applied to regional waveform data of the North Korean nuclear explosion tests. By combining the new Bayesian techniques and the structural model, coupled with meaningful uncertainties related to each of the processes, more quantitative monitoring and discrimination of seismic events is possible.
A fast Bayesian approach to discrete object detection in astronomical data sets - PowellSnakes I
NASA Astrophysics Data System (ADS)
Carvalho, Pedro; Rocha, Graça; Hobson, M. P.
2009-03-01
A new fast Bayesian approach is introduced for the detection of discrete objects immersed in a diffuse background. This new method, called PowellSnakes, speeds up traditional Bayesian techniques by (i) replacing the standard form of the likelihood for the parameters characterizing the discrete objects by an alternative exact form that is much quicker to evaluate; (ii) using a simultaneous multiple minimization code based on Powell's direction set algorithm to locate rapidly the local maxima in the posterior and (iii) deciding whether each located posterior peak corresponds to a real object by performing a Bayesian model selection using an approximate evidence value based on a local Gaussian approximation to the peak. The construction of this Gaussian approximation also provides the covariance matrix of the uncertainties in the derived parameter values for the object in question. This new approach provides a speed-up in performance by a factor of ~100 as compared to existing Bayesian source extraction methods that use Markov chain Monte Carlo to explore the parameter space, such as that presented by Hobson & McLachlan. The method can be implemented in either real or Fourier space. In the case of objects embedded in a homogeneous random field, working in Fourier space provides a further speed-up that takes advantage of the fact that the correlation matrix of the background is circulant. We illustrate the capabilities of the method by applying it to some simplified toy models. Furthermore, PowellSnakes has the advantage of consistently defining the threshold for acceptance/rejection based on priors, which cannot be said of the frequentist methods. We present here the first implementation of this technique (version I). Further improvements to this implementation are currently under investigation and will be published shortly. The application of the method to realistic simulated Planck observations will be presented in a forthcoming publication.
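The "local Gaussian approximation to the peak" used to approximate the evidence in step (iii) is the standard Laplace approximation, which can be sketched as follows. This is a textbook formula under my reading of the abstract, not the PowellSnakes source; argument names are assumptions.

```python
import math

def laplace_log_evidence(log_post_max, neg_hessian_det, dim):
    """Laplace (local Gaussian) approximation to the log-evidence at a
    posterior peak:
        log Z ~= log p(peak) + (d/2) log(2*pi) - (1/2) log |H|,
    where H is the Hessian of the negative log-posterior at the peak and
    its inverse gives the parameter covariance mentioned in the abstract.
    A generic sketch; names are illustrative."""
    return (log_post_max
            + 0.5 * dim * math.log(2.0 * math.pi)
            - 0.5 * math.log(neg_hessian_det))
```

For a one-dimensional unnormalized Gaussian exp(-x²/2) the approximation is exact, returning log √(2π).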
Bayesian Models Leveraging Bioactivity and Cytotoxicity Information for Drug Discovery
Ekins, Sean; Reynolds, Robert C.; Kim, Hiyun; Koo, Mi-Sun; Ekonomidis, Marilyn; Talaue, Meliza; Paget, Steve D.; Woolhiser, Lisa K.; Lenaerts, Anne J.; Bunin, Barry A.; Connell, Nancy; Freundlich, Joel S.
2013-01-01
Identification of unique leads represents a significant challenge in drug discovery. This hurdle is magnified in neglected diseases such as tuberculosis. We have leveraged public high-throughput screening (HTS) data to experimentally validate a virtual screening approach that employs Bayesian models built with bioactivity information (single-event model) as well as bioactivity and cytotoxicity information (dual-event model). We virtually screen a commercial library and experimentally confirm actives with hit rates exceeding typical HTS results by 1-2 orders of magnitude. The first dual-event Bayesian model identified compounds with antitubercular whole-cell activity and low mammalian cell cytotoxicity from a published set of antimalarials. The most potent hit exhibits the in vitro activity and in vitro/in vivo safety profile of a drug lead. These Bayesian models offer significant economies in time and cost to drug discovery. PMID:23521795
Bayesian data analysis for newcomers.
Kruschke, John K; Liddell, Torrin M
2018-02-01
This article explains the foundational concepts of Bayesian data analysis using virtually no mathematical notation. Bayesian ideas already match your intuitions from everyday reasoning and from traditional data analysis. Simple examples of Bayesian data analysis are presented that illustrate how the information delivered by a Bayesian analysis can be directly interpreted. Bayesian approaches to null-value assessment are discussed. The article clarifies misconceptions about Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions and explain how they are not a liability but an important asset. We discuss the relation of Bayesian data analysis to Bayesian models of mind, and we briefly discuss what methodological problems Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear sense of how Bayesian data analysis works and the sort of information it delivers, and why that information is so intuitive and useful for drawing conclusions from data.
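The kind of directly interpretable Bayesian result the article above describes can be shown with the simplest possible example: a conjugate Beta-Binomial update for an unknown proportion. This is my own minimal illustration, not code from the article.

```python
def beta_posterior(successes, failures, a_prior=1.0, b_prior=1.0):
    """Conjugate Beta-Binomial update: starting from a Beta(a, b) prior
    on a proportion, observing the data simply adds the counts to the
    prior parameters. Returns the posterior parameters and posterior
    mean, which can be read off directly. Defaults give a uniform prior."""
    a = a_prior + successes
    b = b_prior + failures
    return a, b, a / (a + b)
```

With a uniform prior, 7 successes and 3 failures give a Beta(8, 4) posterior, whose mean 8/12 is an immediately interpretable estimate of the proportion.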
Conroy, M.J.; Runge, J.P.; Barker, R.J.; Schofield, M.R.; Fonnesbeck, C.J.
2008-01-01
Many organisms are patchily distributed, with some patches occupied at high density, others at lower densities, and others not occupied. Estimation of overall abundance can be difficult and is inefficient via intensive approaches such as capture-mark-recapture (CMR) or distance sampling. We propose a two-phase sampling scheme and model in a Bayesian framework to estimate abundance for patchily distributed populations. In the first phase, occupancy is estimated by binomial detection samples taken on all selected sites, where selection may be of all sites available or a random sample of sites. Detection can be by visual surveys, detection of sign, physical captures, or another approach. In the second phase, if a detection threshold is achieved, CMR or other intensive sampling is conducted via standard procedures (grids or webs) to estimate abundance. Detection and CMR data are then used in a joint likelihood to model probability of detection in the occupancy sample via an abundance-detection model. CMR modeling is used to estimate abundance for the abundance-detection relationship, which in turn is used to predict abundance at the remaining sites, where only detection data are collected. We present a full Bayesian modeling treatment of this problem, in which posterior inference on abundance and other parameters (detection, capture probability) is obtained under a variety of assumptions about spatial and individual sources of heterogeneity. We apply the approach to abundance estimation for two species of voles (Microtus spp.) in Montana, USA. We also use a simulation study to evaluate the frequentist properties of our procedure given known patterns in abundance and detection among sites as well as design criteria. For most population characteristics and designs considered, bias and mean-square error (MSE) were low, and coverage of true parameter values by Bayesian credibility intervals was near nominal.
Our two-phase, adaptive approach allows efficient estimation of abundance of rare and patchily distributed species and is particularly appropriate when sampling in all patches is impossible, but a global estimate of abundance is required.
Evaluation of calibration efficacy under different levels of uncertainty
Heo, Yeonsook; Graziano, Diane J.; Guzowski, Leah; ...
2014-06-10
This study examines how calibration performs under different levels of uncertainty in model input data. It specifically assesses the efficacy of Bayesian calibration to enhance the reliability of EnergyPlus model predictions. A Bayesian approach can be used to update uncertain values of parameters, given measured energy-use data, and to quantify the associated uncertainty. We assess the efficacy of Bayesian calibration under a controlled virtual-reality setup, which enables rigorous validation of the accuracy of calibration results in terms of both calibrated parameter values and model predictions. Case studies demonstrate the performance of Bayesian calibration of base models developed from audit data with differing levels of detail in building design, usage, and operation.
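The core of the Bayesian calibration idea above is updating a prior over uncertain model parameters with the likelihood of measured data. A minimal grid-based sketch of that update, entirely generic and not the study's EnergyPlus setup, looks like this:

```python
import math

def grid_posterior(param_grid, prior, loglik):
    """Generic grid-based Bayesian update: posterior is proportional to
    prior times the likelihood of the measured data at each candidate
    parameter value. All names here are illustrative assumptions."""
    unnorm = [p * math.exp(loglik(theta)) for theta, p in zip(param_grid, prior)]
    z = sum(unnorm)  # normalizing constant
    return [u / z for u in unnorm]
```

Given a uniform prior over a few candidate parameter values and a log-likelihood peaked at the value best matching measured energy use, the posterior concentrates on that value while still quantifying the residual uncertainty.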
Geostatistical models are appropriate for spatially distributed data measured at irregularly spaced locations. We propose an efficient Markov chain Monte Carlo (MCMC) algorithm for fitting Bayesian geostatistical models with substantial numbers of unknown parameters to sizable...
A Bayesian hierarchical diffusion model decomposition of performance in Approach–Avoidance Tasks
Krypotos, Angelos-Miltiadis; Beckers, Tom; Kindt, Merel; Wagenmakers, Eric-Jan
2015-01-01
Common methods for analysing response time (RT) tasks, frequently used across different disciplines of psychology, suffer from a number of limitations such as the failure to directly measure the underlying latent processes of interest and the inability to take into account the uncertainty associated with each individual's point estimate of performance. Here, we discuss a Bayesian hierarchical diffusion model and apply it to RT data. This model allows researchers to decompose performance into meaningful psychological processes and to account optimally for individual differences and commonalities, even with relatively sparse data. We highlight the advantages of the Bayesian hierarchical diffusion model decomposition by applying it to performance on Approach–Avoidance Tasks, widely used in the emotion and psychopathology literature. Model fits for two experimental data-sets demonstrate that the model performs well. The Bayesian hierarchical diffusion model overcomes important limitations of current analysis procedures and provides deeper insight in latent psychological processes of interest. PMID:25491372
Shen, Yanna; Cooper, Gregory F
2012-09-01
This paper investigates Bayesian modeling of known and unknown causes of events in the context of disease-outbreak detection. We introduce a multivariate Bayesian approach that models multiple evidential features of every person in the population. This approach models and detects (1) known diseases (e.g., influenza and anthrax) by using informative prior probabilities and (2) unknown diseases (e.g., a new, highly contagious respiratory virus that has never been seen before) by using relatively non-informative prior probabilities. We report the results of simulation experiments which support that this modeling method can improve the detection of new disease outbreaks in a population. A contribution of this paper is that it introduces a multivariate Bayesian approach for jointly modeling both known and unknown causes of events. Such modeling has general applicability in domains where the space of known causes is incomplete. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Isotropy of low redshift type Ia supernovae: A Bayesian analysis
NASA Astrophysics Data System (ADS)
Andrade, U.; Bengaly, C. A. P.; Alcaniz, J. S.; Santos, B.
2018-04-01
The standard cosmology strongly relies upon the cosmological principle, which rests on the hypotheses of large-scale isotropy and homogeneity of the Universe. Testing these assumptions is, therefore, crucial to determining whether there are deviations from the standard cosmological paradigm. In this paper, we use the latest type Ia supernova compilations, namely JLA and Union2.1, to test cosmological isotropy at low redshift (z < 0.1). This is performed through a Bayesian selection analysis, in which we compare the standard, isotropic model with another one including a dipole correction due to peculiar velocities. The full covariance matrix of SN distance uncertainties is taken into account. We find that the JLA sample favors the standard model, whilst the Union2.1 results are inconclusive, yet the constraints from both compilations are in agreement with previous analyses. We conclude that there is no evidence for a dipole anisotropy from nearby supernova compilations, although this test should be greatly improved with the much-improved data sets from upcoming cosmological surveys.
An objective Bayesian analysis of a crossover design via model selection and model averaging.
Li, Dandan; Sivaganesan, Siva
2016-11-10
Inference about the treatment effect in a crossover design has received much attention over time owing to the uncertainty in the existence of the carryover effect and its impact on the estimation of the treatment effect. Adding to this uncertainty is that the existence of the carryover effect and its size may depend on the presence of the treatment effect and its size. We consider estimation and hypothesis testing about the treatment effect in a two-period crossover design, assuming a normally distributed response variable, and use an objective Bayesian approach to test the hypothesis about the treatment effect and to estimate its size when it exists, while accounting for the uncertainty about the presence of the carryover effect as well as the treatment and period effects. We evaluate and compare the performance of the proposed approach with a standard frequentist approach using simulated and real data. Copyright © 2016 John Wiley & Sons, Ltd.
Buddhavarapu, Prasad; Smit, Andre F; Prozzi, Jorge A
2015-07-01
Permeable friction course (PFC), a porous hot-mix asphalt, is typically applied to improve wet-weather safety on high-speed roadways in Texas. In order to warrant expensive PFC construction, a statistical evaluation of its safety benefits is essential. Generally, the literature on the effectiveness of porous mixes in reducing wet-weather crashes is limited and often inconclusive. In this study, the safety effectiveness of PFC was evaluated using a fully Bayesian before-after safety analysis. First, two groups of road segments overlaid with PFC and non-PFC material were identified across Texas; the non-PFC or reference road segments selected were similar to their PFC counterparts in terms of site-specific features. Second, a negative binomial data-generating process was assumed to model the underlying distribution of crash counts of PFC and reference road segments to perform Bayesian inference on the safety effectiveness. A data-augmentation-based, computationally efficient algorithm was employed for a fully Bayesian estimation. The statistical analysis shows that PFC is not effective in reducing wet-weather crashes. It should be noted that the findings of this study are in agreement with the existing literature, although these studies were not based on a fully Bayesian statistical analysis. Our study suggests that the safety effectiveness of PFC road surfaces, or any other safety infrastructure, largely relies on its interrelationship with the road user. The results suggest that the safety infrastructure must be properly used to reap the benefits of the substantial investments. Copyright © 2015 Elsevier Ltd. All rights reserved.
A Bayesian model averaging method for the derivation of reservoir operating rules
NASA Astrophysics Data System (ADS)
Zhang, Jingwen; Liu, Pan; Wang, Hao; Lei, Xiaohui; Zhou, Yanlai
2015-09-01
Because the intrinsic dynamics among optimal decision making, inflow processes and reservoir characteristics are complex, the functional forms of reservoir operating rules are always determined subjectively. As a result, the uncertainty of selecting the form and/or model involved in reservoir operating rules must be analyzed and evaluated. In this study, we analyze the uncertainty of reservoir operating rules using the Bayesian model averaging (BMA) model. Three popular operating rules, namely piecewise linear regression, surface fitting and a least-squares support vector machine, are established based on the optimal deterministic reservoir operation. These individual models provide three-member decisions for the BMA combination, enabling the 90% release interval to be estimated by Markov chain Monte Carlo simulation. A case study of China's Baise reservoir shows that: (1) the optimal deterministic reservoir operation, superior to any reservoir operating rules, provides the samples used to derive the rules; (2) the least-squares support vector machine model is more effective than both piecewise linear regression and surface fitting; (3) BMA outperforms any individual model of operating rules based on the optimal trajectories. It is revealed that the proposed model can reduce the uncertainty of operating rules, which is of great potential benefit in evaluating the confidence interval of decisions.
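The BMA combination step described above, where three member rules each contribute a decision, reduces in its simplest form to a posterior-weighted average of the member predictions. This is a minimal generic sketch, not the study's implementation; the weights are assumed to be given (in practice they would come from each model's posterior probability, e.g. fitted by EM).

```python
def bma_mean(predictions, weights):
    """Bayesian model averaging of member point predictions: the combined
    forecast is the weighted average of the member forecasts, with
    weights equal to the posterior model probabilities (assumed given
    here and required to sum to 1). Illustrative sketch only."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * p for w, p in zip(weights, predictions))
```

The full BMA predictive distribution is likewise a weighted mixture of the member predictive distributions, which is what allows a 90% release interval to be drawn from it.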
Modeling the Evolution of Beliefs Using an Attentional Focus Mechanism
Marković, Dimitrije; Gläscher, Jan; Bossaerts, Peter; O’Doherty, John; Kiebel, Stefan J.
2015-01-01
For making decisions in everyday life we often have first to infer the set of environmental features that are relevant for the current task. Here we investigated the computational mechanisms underlying the evolution of beliefs about the relevance of environmental features in a dynamical and noisy environment. For this purpose we designed a probabilistic Wisconsin card sorting task (WCST) with belief solicitation, in which subjects were presented with stimuli composed of multiple visual features. At each moment in time a particular feature was relevant for obtaining reward, and participants had to infer which feature was relevant and report their beliefs accordingly. To test the hypothesis that attentional focus modulates the belief update process, we derived and fitted several probabilistic and non-probabilistic behavioral models, which either incorporate a dynamical model of attentional focus, in the form of a hierarchical winner-take-all neuronal network, or a diffusive model, without attention-like features. We used Bayesian model selection to identify the most likely generative model of subjects’ behavior and found that attention-like features in the behavioral model are essential for explaining subjects’ responses. Furthermore, we demonstrate a method for integrating both connectionist and Bayesian models of decision making within a single framework that allowed us to infer hidden belief processes of human subjects. PMID:26495984
White, Nicole; Benton, Miles; Kennedy, Daniel; Fox, Andrew; Griffiths, Lyn; Lea, Rodney; Mengersen, Kerrie
2017-01-01
Cell- and sex-specific differences in DNA methylation are major sources of epigenetic variation in whole blood. Heterogeneity attributable to cell type has motivated the identification of cell-specific methylation at the CpG level; however, statistical methods for this purpose have been limited to pairwise comparisons between cell types or between the cell type of interest and whole blood. We developed a Bayesian model selection algorithm for the identification of cell-specific methylation profiles that incorporates knowledge of shared cell lineage and allows for the identification of differential methylation profiles in one or more cell types simultaneously. Under the proposed methodology, sex-specific differences in methylation by cell type are also assessed. Using publicly available, cell-sorted methylation data, we show that 51.3% of female CpG markers and 61.4% of male CpG markers identified were associated with differential methylation in more than one cell type. The impact of cell lineage on differential methylation was also highlighted. An evaluation of sex-specific differences revealed differences in CD56+NK methylation, within both single- and multi-cell dependent methylation patterns. Our findings demonstrate the need to account for cell lineage in studies of differential methylation and associated sex effects.
Bayesian Optimal Interval Design: A Simple and Well-Performing Design for Phase I Oncology Trials
Yuan, Ying; Hess, Kenneth R.; Hilsenbeck, Susan G.; Gilbert, Mark R.
2016-01-01
Despite more than two decades of publications that offer more innovative model-based designs, the classical 3+3 design remains the most dominant phase I trial design in practice. In this article, we introduce a new trial design, the Bayesian optimal interval (BOIN) design. The BOIN design is easy to implement in a way similar to the 3+3 design, but is more flexible for choosing the target toxicity rate and cohort size and yields a substantially better performance that is comparable to that of more complex model-based designs. The BOIN design contains the 3+3 design and the accelerated titration design as special cases, thus linking it to established phase I approaches. A numerical study shows that the BOIN design generally outperforms the 3+3 design and the modified toxicity probability interval (mTPI) design. The BOIN design is more likely than the 3+3 design to correctly select the maximum tolerated dose (MTD) and allocate more patients to the MTD. Compared to the mTPI design, the BOIN design has a substantially lower risk of overdosing patients and generally a higher probability of correctly selecting the MTD. User-friendly software is freely available to facilitate the application of the BOIN design. PMID:27407096
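The "interval" in the BOIN design above is a pair of escalation/de-escalation boundaries on the observed toxicity rate at the current dose. The sketch below follows my reading of the published BOIN formulas, with the authors' recommended defaults phi1 = 0.6×target and phi2 = 1.4×target; treat it as a hedged reconstruction, not the official software.

```python
import math

def boin_boundaries(target, phi1=None, phi2=None):
    """Escalation/de-escalation boundaries of the BOIN design (my reading
    of the published formulas). Decision rule: escalate if the observed
    toxicity rate at the current dose is <= lam_e, de-escalate if it is
    >= lam_d, otherwise stay at the current dose."""
    phi1 = 0.6 * target if phi1 is None else phi1  # highest 'sub-therapeutic' rate
    phi2 = 1.4 * target if phi2 is None else phi2  # lowest 'overly toxic' rate
    lam_e = (math.log((1 - phi1) / (1 - target))
             / math.log(target * (1 - phi1) / (phi1 * (1 - target))))
    lam_d = (math.log((1 - target) / (1 - phi2))
             / math.log(phi2 * (1 - target) / (target * (1 - phi2))))
    return lam_e, lam_d
```

The boundaries necessarily fall between phi1 and the target (for lam_e) and between the target and phi2 (for lam_d), which is what makes the design as easy to run as 3+3: a single table of boundaries covers the whole trial.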
Model Comparison of Bayesian Semiparametric and Parametric Structural Equation Models
ERIC Educational Resources Information Center
Song, Xin-Yuan; Xia, Ye-Mao; Pan, Jun-Hao; Lee, Sik-Yum
2011-01-01
Structural equation models have wide applications. One of the most important issues in analyzing structural equation models is model comparison. This article proposes a Bayesian model comparison statistic, namely the L_ν-measure, for both semiparametric and parametric structural equation models. For illustration purposes, we consider…
Bayesian methods for estimating GEBVs of threshold traits
Wang, C-L; Ding, X-D; Wang, J-Y; Liu, J-F; Fu, W-X; Zhang, Z; Yin, Z-J; Zhang, Q
2013-01-01
Estimation of genomic breeding values is the key step in genomic selection (GS). Many methods have been proposed for continuous traits, but methods for threshold traits are still scarce. Here we introduce the threshold model into the framework of GS; specifically, we extend the three Bayesian methods BayesA, BayesB and BayesCπ on the basis of the threshold model for estimating genomic breeding values of threshold traits, and the extended methods are correspondingly termed BayesTA, BayesTB and BayesTCπ. Computing procedures for the three BayesT methods using the Markov chain Monte Carlo algorithm were derived. A simulation study was performed to investigate the accuracy benefit of the presented methods for genomic estimated breeding values (GEBVs) of threshold traits. Factors affecting the performance of the three BayesT methods were addressed. As expected, the three BayesT methods generally performed better than the corresponding normal Bayesian methods, in particular when the number of phenotypic categories was small. In the standard scenario (number of categories = 2, incidence = 30%, number of quantitative trait loci = 50, h2 = 0.3), the accuracies were improved by 30.4, 2.4, and 5.7 percentage points, respectively. In most scenarios, BayesTB and BayesTCπ generated similar accuracies and both performed better than BayesTA. In conclusion, our work shows that the threshold model is well suited to predicting GEBVs of threshold traits, and BayesTCπ is the method of choice for GS of threshold traits. PMID:23149458
Scale Mixture Models with Applications to Bayesian Inference
NASA Astrophysics Data System (ADS)
Qin, Zhaohui S.; Damien, Paul; Walker, Stephen
2003-11-01
Scale mixtures of uniform distributions are used to model non-normal data in time series and econometrics in a Bayesian framework. Heteroscedastic and skewed data models are also tackled using scale mixtures of uniform distributions.
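The representation behind this approach can be checked numerically: a standard normal is recoverable as a scale mixture of uniforms, with X | V ~ Uniform(-sqrt(V), sqrt(V)) and V ~ Gamma(shape 3/2, rate 1/2), a standard identity in this literature. The Monte Carlo check below is a sketch of that identity only, not of the authors' time-series models:

```python
import numpy as np

rng = np.random.default_rng(0)

# If V ~ Gamma(shape=3/2, rate=1/2) and X | V ~ Uniform(-sqrt(V), sqrt(V)),
# then marginally X ~ N(0, 1).
N = 200_000
v = rng.gamma(shape=1.5, scale=2.0, size=N)      # scale=2 corresponds to rate=1/2
x = rng.uniform(-1.0, 1.0, size=N) * np.sqrt(v)  # uniform on (-sqrt(v), sqrt(v))

var = x.var()                                    # should be near 1
kurt = np.mean((x - x.mean()) ** 4) / var**2     # should be near 3 (Gaussian)
```

Matching the variance alone would not be convincing; the kurtosis check (3 for a normal) is what shows the mixture reproduces the normal shape rather than merely its spread.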
Using Latent Class Analysis to Model Temperament Types.
Loken, Eric
2004-10-01
Mixture models are appropriate for data that arise from a set of qualitatively different subpopulations. In this study, latent class analysis was applied to observational data from a laboratory assessment of infant temperament at four months of age. The EM algorithm was used to fit the models, and the Bayesian method of posterior predictive checks was used for model selection. Results show at least three types of infant temperament, with patterns consistent with those identified by previous researchers who classified the infants using a theoretically based system. Multiple imputation of group memberships is proposed as an alternative to assigning subjects to the latent class with maximum posterior probability in order to reflect variance due to uncertainty in the parameter estimation. Latent class membership at four months of age predicted longitudinal outcomes at four years of age. The example illustrates issues relevant to all mixture models, including estimation, multi-modality, model selection, and comparisons based on the latent group indicators.
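A minimal sketch of the latent class machinery the abstract relies on: fitting a two-class model to binary items with the EM algorithm. The data are simulated, not the infant temperament observations, and model selection and posterior predictive checks are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate binary responses from two latent classes with distinct item profiles
n, J = 500, 6
z = rng.binomial(1, 0.4, n)                       # true class labels
p_item = np.where(z[:, None] == 1, 0.8, 0.2)      # item-endorsement prob per class
Y = rng.binomial(1, p_item, size=(n, J))

def lca_em(Y, K=2, iters=100):
    """EM for a latent class model with conditionally independent binary items."""
    n, J = Y.shape
    pi = np.full(K, 1 / K)                        # class weights
    theta = np.tile(np.linspace(0.3, 0.7, K)[:, None], (1, J))  # item probs
    for _ in range(iters):
        # E-step: posterior class responsibilities for each subject
        logp = np.log(pi)[None, :] + Y @ np.log(theta.T) + (1 - Y) @ np.log(1 - theta.T)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update class weights and item probabilities
        pi = r.mean(axis=0)
        theta = ((r.T @ Y) / r.sum(axis=0)[:, None]).clip(1e-6, 1 - 1e-6)
    return pi, theta, r

pi_hat, theta_hat, resp = lca_em(Y)
```

The responsibility matrix `resp` is exactly the quantity the abstract proposes to treat via multiple imputation: assigning each subject to its maximum-probability class discards the uncertainty that `resp` encodes.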
Merging information from multi-model flood projections in a hierarchical Bayesian framework
NASA Astrophysics Data System (ADS)
Le Vine, Nataliya
2016-04-01
Multi-model ensembles are becoming widely accepted for flood frequency change analysis. The use of multiple models results in large uncertainty around estimates of flood magnitudes, due to both uncertainty in model selection and natural variability of river flow. The challenge is therefore to extract the most meaningful signal from the multi-model predictions, accounting for both model quality and uncertainties in individual model estimates. The study demonstrates the potential of a recently proposed hierarchical Bayesian approach to combine information from multiple models. The approach facilitates explicit treatment of shared multi-model discrepancy as well as the probabilistic nature of the flood estimates, by treating the available models as a sample from a hypothetical complete (but unobserved) set of models. The advantages of the approach are: 1) to ensure adequate 'baseline' conditions with which to compare future changes; 2) to reduce flood estimate uncertainty; 3) to maximize use of statistical information in circumstances where multiple weak predictions individually lack power, but collectively provide meaningful information; 4) to adjust multi-model consistency criteria when model biases are large; and 5) to explicitly consider the influence of the (model performance) stationarity assumption. Moreover, the analysis indicates that reducing shared model discrepancy is the key to further reduction of uncertainty in the flood frequency analysis. The findings are of value regarding how conclusions about changing exposure to flooding are drawn, and to flood frequency change attribution studies.
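One simple way to realize the "models as a sample from a hypothetical complete set" idea is random-effects pooling: each model m reports an estimate y_m with standard error s_m, modeled as y_m ~ N(mu + delta_m, s_m^2) with a shared discrepancy scale delta_m ~ N(0, tau^2). The sketch below integrates mu out analytically and averages over a grid on tau. The numbers are hypothetical and this is not the paper's exact formulation:

```python
import numpy as np

# Hypothetical 100-year-flood estimates (m^3/s) from four models, with their
# standard errors
y = np.array([410.0, 455.0, 430.0, 520.0])
s = np.array([25.0, 30.0, 20.0, 45.0])

# Marginalizing delta_m gives y_m ~ N(mu, s_m^2 + tau^2); average over a grid
# on tau, weighting by the marginal likelihood with mu integrated out (flat prior)
taus = np.linspace(0.0, 100.0, 201)
mu_hats, var_mus, lls = [], [], []
for tau in taus:
    v = s**2 + tau**2
    w = 1.0 / v
    mu_hat = np.sum(w * y) / np.sum(w)            # precision-weighted mean
    var_mu = 1.0 / np.sum(w)
    ll = (-0.5 * np.sum(np.log(v))
          - 0.5 * np.sum(w * (y - mu_hat) ** 2)
          + 0.5 * np.log(var_mu))
    mu_hats.append(mu_hat); var_mus.append(var_mu); lls.append(ll)

mu_hats, var_mus, lls = map(np.array, (mu_hats, var_mus, lls))
wts = np.exp(lls - lls.max())
wts /= wts.sum()
mu_post = np.sum(wts * mu_hats)                   # posterior mean of shared signal
sd_post = np.sqrt(np.sum(wts * (var_mus + (mu_hats - mu_post) ** 2)))
```

The role of tau here mirrors the shared multi-model discrepancy in the abstract: when tau is forced to zero the ensemble is overconfident, while letting the data inform tau widens `sd_post` to reflect between-model disagreement.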
Semiparametric Thurstonian Models for Recurrent Choices: A Bayesian Analysis
ERIC Educational Resources Information Center
Ansari, Asim; Iyengar, Raghuram
2006-01-01
We develop semiparametric Bayesian Thurstonian models for analyzing repeated choice decisions involving multinomial, multivariate binary or multivariate ordinal data. Our modeling framework has multiple components that together yield considerable flexibility in modeling preference utilities, cross-sectional heterogeneity and parameter-driven…
Optimizing Experimental Design for Comparing Models of Brain Function
Daunizeau, Jean; Preuschoff, Kerstin; Friston, Karl; Stephan, Klaas
2011-01-01
This article presents the first attempt to formalize the optimization of experimental design with the aim of comparing models of brain function based on neuroimaging data. We demonstrate our approach in the context of Dynamic Causal Modelling (DCM), which relates experimental manipulations to observed network dynamics (via hidden neuronal states) and provides an inference framework for selecting among candidate models. Here, we show how to optimize the sensitivity of model selection by choosing among experimental designs according to their respective model selection accuracy. Using Bayesian decision theory, we (i) derive the Laplace-Chernoff risk for model selection, (ii) disclose its relationship with classical design optimality criteria and (iii) assess its sensitivity to basic modelling assumptions. We then evaluate the approach when identifying brain networks using DCM. Monte-Carlo simulations and empirical analyses of fMRI data from a simple bimanual motor task in humans serve to demonstrate the relationship between network identification and the optimal experimental design. For example, we show that deciding whether there is a feedback connection requires shorter epoch durations, relative to asking whether there is experimentally induced change in a connection that is known to be present. Finally, we discuss limitations and potential extensions of this work. PMID:22125485
ERIC Educational Resources Information Center
Rindskopf, David
2012-01-01
Muthen and Asparouhov (2012) made a strong case for the advantages of Bayesian methodology in factor analysis and structural equation models. I show additional extensions and adaptations of their methods and show how non-Bayesians can take advantage of many (though not all) of these advantages by using interval restrictions on parameters. By…
ERIC Educational Resources Information Center
Marcoulides, Katerina M.
2018-01-01
This study examined the use of Bayesian analysis methods for the estimation of item parameters in a two-parameter logistic item response theory model. Using simulated data under various design conditions with both informative and non-informative priors, the parameter recovery of Bayesian analysis methods were examined. Overall results showed that…
A Bayesian Approach to Person Fit Analysis in Item Response Theory Models. Research Report.
ERIC Educational Resources Information Center
Glas, Cees A. W.; Meijer, Rob R.
A Bayesian approach to the evaluation of person fit in item response theory (IRT) models is presented. In a posterior predictive check, the observed value on a discrepancy variable is positioned in its posterior distribution. In a Bayesian framework, a Markov Chain Monte Carlo procedure can be used to generate samples of the posterior distribution…
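The posterior predictive check at the heart of this approach can be illustrated outside IRT with a deliberately misspecified normal model and sample skewness as the discrepancy variable. This is a generic sketch of the technique, not the person-fit statistics of the report:

```python
import numpy as np

rng = np.random.default_rng(7)

# Observed data from a skewed distribution, to be checked against a
# (misspecified) N(mu, 1) model
y = rng.exponential(1.0, size=100)
n, ybar = y.size, y.mean()

def skew(x):
    """Sample skewness, used as the discrepancy variable."""
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

# With a flat prior and known unit variance, mu | y ~ N(ybar, 1/n).
# Position the observed discrepancy in its posterior predictive distribution.
T_obs = skew(y)
T_rep = []
for _ in range(2000):
    mu = rng.normal(ybar, 1 / np.sqrt(n))        # draw from the posterior
    y_rep = rng.normal(mu, 1.0, size=n)          # replicate data under the model
    T_rep.append(skew(y_rep))
ppp = np.mean(np.array(T_rep) >= T_obs)          # posterior predictive p-value
```

A `ppp` near 0 or 1 flags that the model cannot reproduce the observed value of the discrepancy; here the normal model cannot generate the skewness of the exponential data, so the check fails as it should.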
A Tutorial Introduction to Bayesian Models of Cognitive Development
ERIC Educational Resources Information Center
Perfors, Amy; Tenenbaum, Joshua B.; Griffiths, Thomas L.; Xu, Fei
2011-01-01
We present an introduction to Bayesian inference as it is used in probabilistic models of cognitive development. Our goal is to provide an intuitive and accessible guide to the "what", the "how", and the "why" of the Bayesian approach: what sorts of problems and data the framework is most relevant for, and how and why it may be useful for…
ERIC Educational Resources Information Center
Wang, Lijuan; McArdle, John J.
2008-01-01
The main purpose of this research is to evaluate the performance of a Bayesian approach for estimating unknown change points using Monte Carlo simulations. The univariate and bivariate unknown change point mixed models were presented and the basic idea of the Bayesian approach for estimating the models was discussed. The performance of Bayesian…
Zonta, Zivko J; Flotats, Xavier; Magrí, Albert
2014-08-01
The procedure commonly used for the assessment of the parameters included in activated sludge models (ASMs) relies on the estimation of their optimal values within a confidence region (i.e. frequentist inference). Once optimal values are estimated, parameter uncertainty is computed through the covariance matrix. However, alternative approaches that consider the model parameters as probability distributions (i.e. Bayesian inference) may be of interest. The aim of this work is to apply (and compare) both Bayesian and frequentist inference methods when assessing uncertainty for an ASM-type model that considers intracellular storage and biomass growth simultaneously. Practical identifiability was addressed exclusively by considering respirometric profiles based on the oxygen uptake rate, with the aid of probabilistic global sensitivity analysis. Parameter uncertainty was then estimated according to both the Bayesian and frequentist inferential procedures. Results were compared in order to highlight the strengths and weaknesses of both approaches. Since it was demonstrated that Bayesian inference reduces to the frequentist approach under particular hypotheses, the former can be considered the more general methodology. Hence, the use of Bayesian inference is encouraged for tackling inferential issues in ASM environments.
Makowsky, Robert; Cox, Christian L; Roelke, Corey; Chippindale, Paul T
2010-11-01
Determining the appropriate gene for phylogeny reconstruction can be a difficult process. Rapidly evolving genes tend to resolve recent relationships, but suffer from alignment issues and increased homoplasy among distantly related species. Conversely, slowly evolving genes generally perform best for deeper relationships, but lack sufficient variation to resolve recent relationships. We determine the relationship between sequence divergence and Bayesian phylogenetic reconstruction ability using both natural and simulated datasets. The natural data are based on 28 well-supported relationships within the subphylum Vertebrata. Sequences of 12 genes were acquired and Bayesian analyses were used to determine phylogenetic support for correct relationships. Simulated datasets were designed to determine whether an optimal range of sequence divergence exists across extreme phylogenetic conditions. Across all genes we found that an optimal range of divergence for resolving the correct relationships does exist, although, as expected, this level of divergence depends on the distance metric. Simulated datasets show that an optimal range of sequence divergence exists across diverse topologies and models of evolution. We determine that a simple-to-measure property of genetic sequences (genetic distance) is related to phylogenetic reconstruction ability in Bayesian analyses. This information should be useful for selecting the most informative gene for resolving relationships, especially those that are difficult to resolve, as well as for minimizing both cost and confounding information during project design. Copyright © 2010. Published by Elsevier Inc.
Bayesian estimation of differential transcript usage from RNA-seq data.
Papastamoulis, Panagiotis; Rattray, Magnus
2017-11-27
Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU), which targets changes in the relative within-gene expression of a transcript. The contribution of this paper is to: (a) extend to the DTU context cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read-based model and performs fully Bayesian inference by MCMC sampling on the space of latent states of each transcript per gene. BayesDRIMSeq is a count-based model and estimates the Bayes factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit performance similar to DRIMSeq in terms of precision/recall but offer better calibration of the false discovery rate.
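The Laplace approximation used for the Bayes factor can be illustrated on a toy problem where the marginal likelihood is known exactly: a binomial likelihood with a uniform prior integrates to exactly 1/(n+1). This is a sketch of the technique only, not BayesDRIMSeq's actual model:

```python
import math

def log_marglik_laplace(y, n):
    """Laplace approximation to log of the integral of Binomial(y | n, theta)
    over theta in (0, 1) with a uniform prior."""
    theta = y / n                                    # posterior mode (uniform prior)
    loglik = (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
              + y * math.log(theta) + (n - y) * math.log(1 - theta))
    hess = y / theta**2 + (n - y) / (1 - theta)**2   # negative 2nd deriv at the mode
    return loglik + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(hess)

y, n = 35, 100
exact = -math.log(n + 1)                             # the integral is exactly 1/(n+1)
approx = log_marglik_laplace(y, n)

# Bayes factor of the free-theta model against the point null theta = 0.5
log_null = (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + n * math.log(0.5))
logBF = approx - log_null
```

The appeal of the approximation, as in the count-based model of the abstract, is that it needs only the likelihood mode and its curvature, avoiding the MCMC sampling that the read-based model requires.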
Development of uncertainty-based work injury model using Bayesian structural equation modelling.
Chatterjee, Snehamoy
2014-01-01
This paper proposed a Bayesian method-based structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of the SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distributions for the factor loading and structural parameters of the SEM. In the first approach, the prior distributions were taken as fixed distribution functions with specific parameter values, whereas in the second approach, prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayes' rule. Markov chain Monte Carlo sampling in the form of Gibbs sampling was applied to sample from the posterior distribution. The results revealed that all coefficients of the structural and measurement model parameters are statistically significant under the experts' opinion-based priors, whereas two coefficients are not statistically significant when the fixed priors are applied. The error statistics reveal that the Bayesian structural model provides a reasonably good fit for work injury, with a high coefficient of determination (0.91) and a smaller mean squared error than the traditional SEM.
BUMPER: the Bayesian User-friendly Model for Palaeo-Environmental Reconstruction
NASA Astrophysics Data System (ADS)
Holden, Phil; Birks, John; Brooks, Steve; Bush, Mark; Hwang, Grace; Matthews-Bird, Frazer; Valencia, Bryan; van Woesik, Robert
2017-04-01
We describe the Bayesian User-friendly Model for Palaeo-Environmental Reconstruction (BUMPER), a Bayesian transfer function for inferring past climate and other environmental variables from microfossil assemblages. The principal motivation for a Bayesian approach is that the palaeoenvironment is treated probabilistically, and can be updated as additional data become available. Bayesian approaches therefore provide a reconstruction-specific quantification of the uncertainty in the data and in the model parameters. BUMPER is fully self-calibrating, straightforward to apply, and computationally fast, requiring 2 seconds to build a 100-taxon model from a 100-site training-set on a standard personal computer. We apply the model's probabilistic framework to generate thousands of artificial training-sets under ideal assumptions. We then use these to demonstrate both the general applicability of the model and the sensitivity of reconstructions to the characteristics of the training-set, considering assemblage richness, taxon tolerances, and the number of training sites. We demonstrate general applicability to real data, considering three different organism types (chironomids, diatoms, pollen) and different reconstructed variables. In all of these applications an identically configured model is used, the only change being the input files that provide the training-set environment and taxon-count data.
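A transfer function of this general kind can be sketched as a grid posterior over the environmental variable, with Gaussian taxon response curves along the gradient and Poisson-distributed counts. All response-curve parameters and counts below are hypothetical, and this is not BUMPER's actual formulation or calibration procedure:

```python
import numpy as np

# Gaussian (unimodal) response curves for each taxon along an environmental
# gradient; optima u, tolerances t, and expected peak counts a are hypothetical
u = np.array([12.0, 15.0, 18.0, 21.0])   # taxon optima (e.g. deg C)
t = np.array([2.0, 2.5, 2.0, 3.0])       # taxon tolerances
a = np.array([40.0, 60.0, 50.0, 30.0])   # expected count at each optimum

def posterior_env(counts, grid):
    """Grid posterior over the environmental variable given taxon counts,
    assuming Poisson counts around the response curves and a flat prior."""
    lam = a * np.exp(-0.5 * ((grid[:, None] - u) / t) ** 2)   # (grid, taxa)
    loglik = np.sum(counts * np.log(lam) - lam, axis=1)       # Poisson kernel
    w = np.exp(loglik - loglik.max())
    return w / w.sum()

grid = np.linspace(8.0, 25.0, 341)
counts = np.array([5, 48, 55, 9])        # an observed fossil assemblage
post = posterior_env(counts, grid)
mean_rec = np.sum(grid * post)           # posterior mean reconstruction
sd_rec = np.sqrt(np.sum(grid**2 * post) - mean_rec**2)
```

Because the output is a full posterior rather than a point estimate, `sd_rec` gives the reconstruction-specific uncertainty the abstract emphasizes, and the posterior can be updated as additional assemblage data become available.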