Model-based Bayesian inference for ROC data analysis
NASA Astrophysics Data System (ADS)
Lei, Tianhu; Bae, K. Ty
2013-03-01
This paper presents a study of model-based Bayesian inference to Receiver Operating Characteristics (ROC) data. The model is a simple version of general non-linear regression model. Different from Dorfman model, it uses a probit link function with a covariate variable having zero-one two values to express binormal distributions in a single formula. Model also includes a scale parameter. Bayesian inference is implemented by Markov Chain Monte Carlo (MCMC) method carried out by Bayesian analysis Using Gibbs Sampling (BUGS). Contrast to the classical statistical theory, Bayesian approach considers model parameters as random variables characterized by prior distributions. With substantial amount of simulated samples generated by sampling algorithm, posterior distributions of parameters as well as parameters themselves can be accurately estimated. MCMC-based BUGS adopts Adaptive Rejection Sampling (ARS) protocol which requires the probability density function (pdf) which samples are drawing from be log concave with respect to the targeted parameters. Our study corrects a common misconception and proves that pdf of this regression model is log concave with respect to its scale parameter. Therefore, ARS's requirement is satisfied and a Gaussian prior which is conjugate and possesses many analytic and computational advantages is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for MCMC method are assessed by CODA package. Models and methods by using continuous Gaussian prior and discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using MCMC method.
Groth, Katrina M.; Smith, Curtis L.; Swiler, Laura P.
2014-04-05
In the past several years, several international agencies have begun to collect data on human performance in nuclear power plant simulators [1]. This data provides a valuable opportunity to improve human reliability analysis (HRA), but there improvements will not be realized without implementation of Bayesian methods. Bayesian methods are widely used in to incorporate sparse data into models in many parts of probabilistic risk assessment (PRA), but Bayesian methods have not been adopted by the HRA community. In this article, we provide a Bayesian methodology to formally use simulator data to refine the human error probabilities (HEPs) assigned by existingmore » HRA methods. We demonstrate the methodology with a case study, wherein we use simulator data from the Halden Reactor Project to update the probability assignments from the SPAR-H method. The case study demonstrates the ability to use performance data, even sparse data, to improve existing HRA methods. Furthermore, this paper also serves as a demonstration of the value of Bayesian methods to improve the technical basis of HRA.« less
Bayesian logistic regression approaches to predict incorrect DRG assignment.
Suleiman, Mani; Demirhan, Haydar; Boyd, Leanne; Girosi, Federico; Aksakalli, Vural
2018-05-07
Episodes of care involving similar diagnoses and treatments and requiring similar levels of resource utilisation are grouped to the same Diagnosis-Related Group (DRG). In jurisdictions which implement DRG based payment systems, DRGs are a major determinant of funding for inpatient care. Hence, service providers often dedicate auditing staff to the task of checking that episodes have been coded to the correct DRG. The use of statistical models to estimate an episode's probability of DRG error can significantly improve the efficiency of clinical coding audits. This study implements Bayesian logistic regression models with weakly informative prior distributions to estimate the likelihood that episodes require a DRG revision, comparing these models with each other and to classical maximum likelihood estimates. All Bayesian approaches had more stable model parameters than maximum likelihood. The best performing Bayesian model improved overall classification per- formance by 6% compared to maximum likelihood, with a 34% gain compared to random classification, respectively. We found that the original DRG, coder and the day of coding all have a significant effect on the likelihood of DRG error. Use of Bayesian approaches has improved model parameter stability and classification accuracy. This method has already lead to improved audit efficiency in an operational capacity.
Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.
2015-01-01
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways, multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The later three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances. PMID:26689369
Tian, Ting; McLachlan, Geoffrey J; Dieters, Mark J; Basford, Kaye E
2015-01-01
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways, multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The later three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances.
Precise Network Modeling of Systems Genetics Data Using the Bayesian Network Webserver.
Ziebarth, Jesse D; Cui, Yan
2017-01-01
The Bayesian Network Webserver (BNW, http://compbio.uthsc.edu/BNW ) is an integrated platform for Bayesian network modeling of biological datasets. It provides a web-based network modeling environment that seamlessly integrates advanced algorithms for probabilistic causal modeling and reasoning with Bayesian networks. BNW is designed for precise modeling of relatively small networks that contain less than 20 nodes. The structure learning algorithms used by BNW guarantee the discovery of the best (most probable) network structure given the data. To facilitate network modeling across multiple biological levels, BNW provides a very flexible interface that allows users to assign network nodes into different tiers and define the relationships between and within the tiers. This function is particularly useful for modeling systems genetics datasets that often consist of multiscalar heterogeneous genotype-to-phenotype data. BNW enables users to, within seconds or minutes, go from having a simply formatted input file containing a dataset to using a network model to make predictions about the interactions between variables and the potential effects of experimental interventions. In this chapter, we will introduce the functions of BNW and show how to model systems genetics datasets with BNW.
Cai, C; Rodet, T; Legoupil, S; Mohammad-Djafari, A
2013-11-01
Dual-energy computed tomography (DECT) makes it possible to get two fractions of basis materials without segmentation. One is the soft-tissue equivalent water fraction and the other is the hard-matter equivalent bone fraction. Practical DECT measurements are usually obtained with polychromatic x-ray beams. Existing reconstruction approaches based on linear forward models without counting the beam polychromaticity fail to estimate the correct decomposition fractions and result in beam-hardening artifacts (BHA). The existing BHA correction approaches either need to refer to calibration measurements or suffer from the noise amplification caused by the negative-log preprocessing and the ill-conditioned water and bone separation problem. To overcome these problems, statistical DECT reconstruction approaches based on nonlinear forward models counting the beam polychromaticity show great potential for giving accurate fraction images. This work proposes a full-spectral Bayesian reconstruction approach which allows the reconstruction of high quality fraction images from ordinary polychromatic measurements. This approach is based on a Gaussian noise model with unknown variance assigned directly to the projections without taking negative-log. Referring to Bayesian inferences, the decomposition fractions and observation variance are estimated by using the joint maximum a posteriori (MAP) estimation method. Subject to an adaptive prior model assigned to the variance, the joint estimation problem is then simplified into a single estimation problem. It transforms the joint MAP estimation problem into a minimization problem with a nonquadratic cost function. To solve it, the use of a monotone conjugate gradient algorithm with suboptimal descent steps is proposed. The performance of the proposed approach is analyzed with both simulated and experimental data. The results show that the proposed Bayesian approach is robust to noise and materials. It is also necessary to have the accurate spectrum information about the source-detector system. When dealing with experimental data, the spectrum can be predicted by a Monte Carlo simulator. For the materials between water and bone, less than 5% separation errors are observed on the estimated decomposition fractions. The proposed approach is a statistical reconstruction approach based on a nonlinear forward model counting the full beam polychromaticity and applied directly to the projections without taking negative-log. Compared to the approaches based on linear forward models and the BHA correction approaches, it has advantages in noise robustness and reconstruction accuracy.
Bayesian model selection: Evidence estimation based on DREAM simulation and bridge sampling
NASA Astrophysics Data System (ADS)
Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.
2017-04-01
Bayesian inference has found widespread application in Earth and Environmental Systems Modeling, providing an effective tool for prediction, data assimilation, parameter estimation, uncertainty analysis and hypothesis testing. Under multiple competing hypotheses, the Bayesian approach also provides an attractive alternative to traditional information criteria (e.g. AIC, BIC) for model selection. The key variable for Bayesian model selection is the evidence (or marginal likelihood) that is the normalizing constant in the denominator of Bayes theorem; while it is fundamental for model selection, the evidence is not required for Bayesian inference. It is computed for each hypothesis (model) by averaging the likelihood function over the prior parameter distribution, rather than maximizing it as by information criteria; the larger a model evidence the more support it receives among a collection of hypothesis as the simulated values assign relatively high probability density to the observed data. Hence, the evidence naturally acts as an Occam's razor, preferring simpler and more constrained models against the selection of over-fitted ones by information criteria that incorporate only the likelihood maximum. Since it is not particularly easy to estimate the evidence in practice, Bayesian model selection via the marginal likelihood has not yet found mainstream use. We illustrate here the properties of a new estimator of the Bayesian model evidence, which provides robust and unbiased estimates of the marginal likelihood; the method is coined Gaussian Mixture Importance Sampling (GMIS). GMIS uses multidimensional numerical integration of the posterior parameter distribution via bridge sampling (a generalization of importance sampling) of a mixture distribution fitted to samples of the posterior distribution derived from the DREAM algorithm (Vrugt et al., 2008; 2009). Some illustrative examples are presented to show the robustness and superiority of the GMIS estimator with respect to other commonly used approaches in the literature.
NASA Astrophysics Data System (ADS)
Arnst, M.; Abello Álvarez, B.; Ponthot, J.-P.; Boman, R.
2017-11-01
This paper is concerned with the characterization and the propagation of errors associated with data limitations in polynomial-chaos-based stochastic methods for uncertainty quantification. Such an issue can arise in uncertainty quantification when only a limited amount of data is available. When the available information does not suffice to accurately determine the probability distributions that must be assigned to the uncertain variables, the Bayesian method for assigning these probability distributions becomes attractive because it allows the stochastic model to account explicitly for insufficiency of the available information. In previous work, such applications of the Bayesian method had already been implemented by using the Metropolis-Hastings and Gibbs Markov Chain Monte Carlo (MCMC) methods. In this paper, we present an alternative implementation, which uses an alternative MCMC method built around an Itô stochastic differential equation (SDE) that is ergodic for the Bayesian posterior. We draw together from the mathematics literature a number of formal properties of this Itô SDE that lend support to its use in the implementation of the Bayesian method, and we describe its discretization, including the choice of the free parameters, by using the implicit Euler method. We demonstrate the proposed methodology on a problem of uncertainty quantification in a complex nonlinear engineering application relevant to metal forming.
A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2013-01-01
The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…
A combined Fuzzy and Naive Bayesian strategy can be used to assign event codes to injury narratives.
Marucci-Wellman, H; Lehto, M; Corns, H
2011-12-01
Bayesian methods show promise for classifying injury narratives from large administrative datasets into cause groups. This study examined a combined approach where two Bayesian models (Fuzzy and Naïve) were used to either classify a narrative or select it for manual review. Injury narratives were extracted from claims filed with a worker's compensation insurance provider between January 2002 and December 2004. Narratives were separated into a training set (n=11,000) and prediction set (n=3,000). Expert coders assigned two-digit Bureau of Labor Statistics Occupational Injury and Illness Classification event codes to each narrative. Fuzzy and Naïve Bayesian models were developed using manually classified cases in the training set. Two semi-automatic machine coding strategies were evaluated. The first strategy assigned cases for manual review if the Fuzzy and Naïve models disagreed on the classification. The second strategy selected additional cases for manual review from the Agree dataset using prediction strength to reach a level of 50% computer coding and 50% manual coding. When agreement alone was used as the filtering strategy, the majority were coded by the computer (n=1,928, 64%) leaving 36% for manual review. The overall combined (human plus computer) sensitivity was 0.90 and positive predictive value (PPV) was >0.90 for 11 of 18 2-digit event categories. Implementing the 2nd strategy improved results with an overall sensitivity of 0.95 and PPV >0.90 for 17 of 18 categories. A combined Naïve-Fuzzy Bayesian approach can classify some narratives with high accuracy and identify others most beneficial for manual review, reducing the burden on human coders.
NASA Astrophysics Data System (ADS)
Olson, R.; An, S. I.
2016-12-01
Atlantic Meridional Overturning Circulation (AMOC) in the ocean might slow down in the future, which can lead to a host of climatic effects in North Atlantic and throughout the world. Despite improvements in climate models and availability of new observations, AMOC projections remain uncertain. Here we constrain CMIP5 multi-model ensemble output with observations of a recently developed AMOC index to provide improved Bayesian predictions of future AMOC. Specifically, we first calculate yearly AMOC index loosely based on Rahmstorf et al. (2015) for years 1880—2004 for both observations, and the CMIP5 models for which relevant output is available. We then assign a weight to each model based on a Bayesian Model Averaging method that accounts for differential model skill in terms of both mean state and variability. We include the temporal autocorrelation in climate model errors, and account for the uncertainty in the parameters of our statistical model. We use the weights to provide future weighted projections of AMOC, and compare them to un-weighted ones. Our projections use bootstrapping to account for uncertainty in internal AMOC variability. We also perform spectral and other statistical analyses to show that AMOC index variability, both in models and in observations, is consistent with red noise. Our results improve on and complement previous work by using a new ensemble of climate models, a different observational metric, and an improved Bayesian weighting method that accounts for differential model skill at reproducing internal variability. Reference: Rahmstorf, S., Box, J. E., Feulner, G., Mann, M. E., Robinson, A., Rutherford, S., & Schaffernicht, E. J. (2015). Exceptional twentieth-century slowdown in atlantic ocean overturning circulation. Nature Climate Change, 5(5), 475-480. doi:10.1038/nclimate2554
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
Liu, Zhihong; Zheng, Minghao; Yan, Xin; Gu, Qiong; Gasteiger, Johann; Tijhuis, Johan; Maas, Peter; Li, Jiabo; Xu, Jun
2014-09-01
Predicting compound chemical stability is important because unstable compounds can lead to either false positive or to false negative conclusions in bioassays. Experimental data (COMDECOM) measured from DMSO/H2O solutions stored at 50 °C for 105 days were used to predicted stability by applying rule-embedded naïve Bayesian learning, based upon atom center fragment (ACF) features. To build the naïve Bayesian classifier, we derived ACF features from 9,746 compounds in the COMDECOM dataset. By recursively applying naïve Bayesian learning from the data set, each ACF is assigned with an expected stable probability (p(s)) and an unstable probability (p(uns)). 13,340 ACFs, together with their p(s) and p(uns) data, were stored in a knowledge base for use by the Bayesian classifier. For a given compound, its ACFs were derived from its structure connection table with the same protocol used to drive ACFs from the training data. Then, the Bayesian classifier assigned p(s) and p(uns) values to the compound ACFs by a structural pattern recognition algorithm, which was implemented in-house. Compound instability is calculated, with Bayes' theorem, based upon the p(s) and p(uns) values of the compound ACFs. We were able to achieve performance with an AUC value of 84% and a tenfold cross validation accuracy of 76.5%. To reduce false negatives, a rule-based approach has been embedded in the classifier. The rule-based module allows the program to improve its predictivity by expanding its compound instability knowledge base, thus further reducing the possibility of false negatives. To our knowledge, this is the first in silico prediction service for the prediction of the stabilities of organic compounds.
None of the above: A Bayesian account of the detection of novel categories.
Navarro, Daniel J; Kemp, Charles
2017-10-01
Every time we encounter a new object, action, or event, there is some chance that we will need to assign it to a novel category. We describe and evaluate a class of probabilistic models that detect when an object belongs to a category that has not previously been encountered. The models incorporate a prior distribution that is influenced by the distribution of previous objects among categories, and we present 2 experiments that demonstrate that people are also sensitive to this distributional information. Two additional experiments confirm that distributional information is combined with similarity when both sources of information are available. We compare our approach to previous models of unsupervised categorization and to several heuristic-based models, and find that a hierarchical Bayesian approach provides the best account of our data. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Holm Hansen, Christian; Warner, Pamela; Parker, Richard A; Walker, Brian R; Critchley, Hilary Od; Weir, Christopher J
2017-12-01
It is often unclear what specific adaptive trial design features lead to an efficient design which is also feasible to implement. This article describes the preparatory simulation study for a Bayesian response-adaptive dose-finding trial design. Dexamethasone for Excessive Menstruation aims to assess the efficacy of Dexamethasone in reducing excessive menstrual bleeding and to determine the best dose for further study. To maximise learning about the dose response, patients receive placebo or an active dose with randomisation probabilities adapting based on evidence from patients already recruited. The dose-response relationship is estimated using a flexible Bayesian Normal Dynamic Linear Model. Several competing design options were considered including: number of doses, proportion assigned to placebo, adaptation criterion, and number and timing of adaptations. We performed a fractional factorial study using SAS software to simulate virtual trial data for candidate adaptive designs under a variety of scenarios and to invoke WinBUGS for Bayesian model estimation. We analysed the simulated trial results using Normal linear models to estimate the effects of each design feature on empirical type I error and statistical power. Our readily-implemented approach using widely available statistical software identified a final design which performed robustly across a range of potential trial scenarios.
Hao, Jie; Astle, William; De Iorio, Maria; Ebbels, Timothy M D
2012-08-01
Nuclear Magnetic Resonance (NMR) spectra are widely used in metabolomics to obtain metabolite profiles in complex biological mixtures. Common methods used to assign and estimate concentrations of metabolites involve either an expert manual peak fitting or extra pre-processing steps, such as peak alignment and binning. Peak fitting is very time consuming and is subject to human error. Conversely, alignment and binning can introduce artefacts and limit immediate biological interpretation of models. We present the Bayesian automated metabolite analyser for NMR spectra (BATMAN), an R package that deconvolutes peaks from one-dimensional NMR spectra, automatically assigns them to specific metabolites from a target list and obtains concentration estimates. The Bayesian model incorporates information on characteristic peak patterns of metabolites and is able to account for shifts in the position of peaks commonly seen in NMR spectra of biological samples. It applies a Markov chain Monte Carlo algorithm to sample from a joint posterior distribution of the model parameters and obtains concentration estimates with reduced error compared with conventional numerical integration and comparable to manual deconvolution by experienced spectroscopists. http://www1.imperial.ac.uk/medicine/people/t.ebbels/ t.ebbels@imperial.ac.uk.
Moran, Paul; Bromaghin, Jeffrey F.; Masuda, Michele
2014-01-01
Many applications in ecological genetics involve sampling individuals from a mixture of multiple biological populations and subsequently associating those individuals with the populations from which they arose. Analytical methods that assign individuals to their putative population of origin have utility in both basic and applied research, providing information about population-specific life history and habitat use, ecotoxins, pathogen and parasite loads, and many other non-genetic ecological, or phenotypic traits. Although the question is initially directed at the origin of individuals, in most cases the ultimate desire is to investigate the distribution of some trait among populations. Current practice is to assign individuals to a population of origin and study properties of the trait among individuals within population strata as if they constituted independent samples. It seemed that approach might bias population-specific trait inference. In this study we made trait inferences directly through modeling, bypassing individual assignment. We extended a Bayesian model for population mixture analysis to incorporate parameters for the phenotypic trait and compared its performance to that of individual assignment with a minimum probability threshold for assignment. The Bayesian mixture model outperformed individual assignment under some trait inference conditions. However, by discarding individuals whose origins are most uncertain, the individual assignment method provided a less complex analytical technique whose performance may be adequate for some common trait inference problems. Our results provide specific guidance for method selection under various genetic relationships among populations with different trait distributions.
Moran, Paul; Bromaghin, Jeffrey F.; Masuda, Michele
2014-01-01
Many applications in ecological genetics involve sampling individuals from a mixture of multiple biological populations and subsequently associating those individuals with the populations from which they arose. Analytical methods that assign individuals to their putative population of origin have utility in both basic and applied research, providing information about population-specific life history and habitat use, ecotoxins, pathogen and parasite loads, and many other non-genetic ecological, or phenotypic traits. Although the question is initially directed at the origin of individuals, in most cases the ultimate desire is to investigate the distribution of some trait among populations. Current practice is to assign individuals to a population of origin and study properties of the trait among individuals within population strata as if they constituted independent samples. It seemed that approach might bias population-specific trait inference. In this study we made trait inferences directly through modeling, bypassing individual assignment. We extended a Bayesian model for population mixture analysis to incorporate parameters for the phenotypic trait and compared its performance to that of individual assignment with a minimum probability threshold for assignment. The Bayesian mixture model outperformed individual assignment under some trait inference conditions. However, by discarding individuals whose origins are most uncertain, the individual assignment method provided a less complex analytical technique whose performance may be adequate for some common trait inference problems. Our results provide specific guidance for method selection under various genetic relationships among populations with different trait distributions. PMID:24905464
NASA Astrophysics Data System (ADS)
von der Linden, Wolfgang; Dose, Volker; von Toussaint, Udo
2014-06-01
Preface; Part I. Introduction: 1. The meaning of probability; 2. Basic definitions; 3. Bayesian inference; 4. Combinatrics; 5. Random walks; 6. Limit theorems; 7. Continuous distributions; 8. The central limit theorem; 9. Poisson processes and waiting times; Part II. Assigning Probabilities: 10. Transformation invariance; 11. Maximum entropy; 12. Qualified maximum entropy; 13. Global smoothness; Part III. Parameter Estimation: 14. Bayesian parameter estimation; 15. Frequentist parameter estimation; 16. The Cramer-Rao inequality; Part IV. Testing Hypotheses: 17. The Bayesian way; 18. The frequentist way; 19. Sampling distributions; 20. Bayesian vs frequentist hypothesis tests; Part V. Real World Applications: 21. Regression; 22. Inconsistent data; 23. Unrecognized signal contributions; 24. Change point problems; 25. Function estimation; 26. Integral equations; 27. Model selection; 28. Bayesian experimental design; Part VI. Probabilistic Numerical Techniques: 29. Numerical integration; 30. Monte Carlo methods; 31. Nested sampling; Appendixes; References; Index.
Jakob, Sabine S.; Rödder, Dennis; Engler, Jan O.; Shaaf, Salar; Özkan, Hakan; Blattner, Frank R.; Kilian, Benjamin
2014-01-01
Studies of Hordeum vulgare subsp. spontaneum, the wild progenitor of cultivated barley, have mostly relied on materials collected decades ago and maintained since then ex situ in germplasm repositories. We analyzed spatial genetic variation in wild barley populations collected rather recently, exploring sequence variations at seven single-copy nuclear loci, and inferred the relationships among these populations and toward the genepool of the crop. The wild barley collection covers the whole natural distribution area from the Mediterranean to Middle Asia. In contrast to earlier studies, Bayesian assignment analyses revealed three population clusters, in the Levant, Turkey, and east of Turkey, respectively. Genetic diversity was exceptionally high in the Levant, while eastern populations were depleted of private alleles. Species distribution modeling based on climate parameters and extant occurrence points of the taxon inferred suitable habitat conditions during the ice-age, particularly in the Levant and Turkey. Together with the ecologically wide range of habitats, they might contribute to structured but long-term stable populations in this region and their high genetic diversity. For recently collected individuals, Bayesian assignment to geographic clusters was generally unambiguous, but materials from genebanks often showed accessions that were not placed according to their assumed geographic origin or showed traces of introgression from cultivated barley. We assign this to gene flow among accessions during ex situ maintenance. Evolutionary studies based on such materials might therefore result in wrong conclusions regarding the history of the species or the origin and mode of domestication of the crop, depending on the accessions included. PMID:24586028
Bayesian data analysis tools for atomic physics
NASA Astrophysics Data System (ADS)
Trassinelli, Martino
2017-10-01
We present an introduction to some concepts of Bayesian data analysis in the context of atomic physics. Starting from basic rules of probability, we present the Bayes' theorem and its applications. In particular we discuss about how to calculate simple and joint probability distributions and the Bayesian evidence, a model dependent quantity that allows to assign probabilities to different hypotheses from the analysis of a same data set. To give some practical examples, these methods are applied to two concrete cases. In the first example, the presence or not of a satellite line in an atomic spectrum is investigated. In the second example, we determine the most probable model among a set of possible profiles from the analysis of a statistically poor spectrum. We show also how to calculate the probability distribution of the main spectral component without having to determine uniquely the spectrum modeling. For these two studies, we implement the program Nested_fit to calculate the different probability distributions and other related quantities. Nested_fit is a Fortran90/Python code developed during the last years for analysis of atomic spectra. As indicated by the name, it is based on the nested algorithm, which is presented in details together with the program itself.
Bayesian Model Comparison for the Order Restricted RC Association Model
ERIC Educational Resources Information Center
Iliopoulos, G.; Kateri, M.; Ntzoufras, I.
2009-01-01
Association models constitute an attractive alternative to the usual log-linear models for modeling the dependence between classification variables. They impose special structure on the underlying association by assigning scores on the levels of each classification variable, which can be fixed or parametric. Under the general row-column (RC)…
A fast combination method in DSmT and its application to recommender system
Liu, Yihai
2018-01-01
In many applications involving epistemic uncertainties usually modeled by belief functions, it is often necessary to approximate general (non-Bayesian) basic belief assignments (BBAs) to subjective probabilities (called Bayesian BBAs). This necessity occurs if one needs to embed the fusion result in a system based on the probabilistic framework and Bayesian inference (e.g. tracking systems), or if one needs to make a decision in the decision making problems. In this paper, we present a new fast combination method, called modified rigid coarsening (MRC), to obtain the final Bayesian BBAs based on hierarchical decomposition (coarsening) of the frame of discernment. Regarding this method, focal elements with probabilities are coarsened efficiently to reduce computational complexity in the process of combination by using disagreement vector and a simple dichotomous approach. In order to prove the practicality of our approach, this new approach is applied to combine users’ soft preferences in recommender systems (RSs). Additionally, in order to make a comprehensive performance comparison, the proportional conflict redistribution rule #6 (PCR6) is regarded as a baseline in a range of experiments. According to the results of experiments, MRC is more effective in accuracy of recommendations compared to original Rigid Coarsening (RC) method and comparable in computational time. PMID:29351297
A two step Bayesian approach for genomic prediction of breeding values.
Shariati, Mohammad M; Sørensen, Peter; Janss, Luc
2012-05-21
In genomic models that assign an individual variance to each marker, the contribution of one marker to the posterior distribution of the marker variance is only one degree of freedom (df), which introduces many variance parameters with only little information per variance parameter. A better alternative could be to form clusters of markers with similar effects where markers in a cluster have a common variance. Therefore, the influence of each marker group of size p on the posterior distribution of the marker variances will be p df. The simulated data from the 15th QTL-MAS workshop were analyzed such that SNP markers were ranked based on their effects and markers with similar estimated effects were grouped together. In step 1, all markers with minor allele frequency more than 0.01 were included in a SNP-BLUP prediction model. In step 2, markers were ranked based on their estimated variance on the trait in step 1 and each 150 markers were assigned to one group with a common variance. In further analyses, subsets of 1500 and 450 markers with largest effects in step 2 were kept in the prediction model. Grouping markers outperformed SNP-BLUP model in terms of accuracy of predicted breeding values. However, the accuracies of predicted breeding values were lower than Bayesian methods with marker specific variances. Grouping markers is less flexible than allowing each marker to have a specific marker variance but, by grouping, the power to estimate marker variances increases. A prior knowledge of the genetic architecture of the trait is necessary for clustering markers and appropriate prior parameterization.
Bayesian Inference for Functional Dynamics Exploring in fMRI Data.
Guo, Xuan; Liu, Bing; Chen, Le; Chen, Guantao; Pan, Yi; Zhang, Jing
2016-01-01
This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM), Bayesian Connectivity Change Point Model (BCCPM), and Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.
Huang, Jen-Pan; Knowles, L Lacey
2016-07-01
With the recent attention and focus on quantitative methods for species delimitation, an overlooked but equally important issue regards what has actually been delimited. This study investigates the apparent arbitrariness of some taxonomic distinctions, and in particular how species and subspecies are assigned. Specifically, we use a recently developed Bayesian model-based approach to show that in the Hercules beetles (genus Dynastes) there is no statistical difference in the probability that putative taxa represent different species, irrespective of whether they were given species or subspecies designations. By considering multiple data types, as opposed to relying exclusively on genetic data alone, we also show that both previously recognized species and subspecies represent a variety of points along the speciation spectrum (i.e., previously recognized species are not systematically further along the continuum than subspecies). For example, based on evolutionary models of divergence, some taxa are statistically distinguishable on more than one axis of differentiation (e.g., along both phenotypic and genetic dimensions), whereas other taxa can only be delimited statistically from a single data type. Because both phenotypic and genetic data are analyzed in a common Bayesian framework, our study provides a framework for investigating whether disagreements in species boundaries among data types reflect (i) actual discordance with the actual history of lineage splitting, or instead (ii) differences among data types in the amount of time required for differentiation to become apparent among the delimited taxa. We discuss what the answers to these questions imply about what characters are used to delimit species, as well as the diverse processes involved in the origin and maintenance of species boundaries. With this in mind, we then reflect more generally on how quantitative methods for species delimitation are used to assign taxonomic status. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Bayesian Framework for Water Quality Model Uncertainty Estimation and Risk Management
A formal Bayesian methodology is presented for integrated model calibration and risk-based water quality management using Bayesian Monte Carlo simulation and maximum likelihood estimation (BMCML). The primary focus is on lucid integration of model calibration with risk-based wat...
Spertus, Jacob V; Normand, Sharon-Lise T
2018-04-23
High-dimensional data provide many potential confounders that may bolster the plausibility of the ignorability assumption in causal inference problems. Propensity score methods are powerful causal inference tools, which are popular in health care research and are particularly useful for high-dimensional data. Recent interest has surrounded a Bayesian treatment of propensity scores in order to flexibly model the treatment assignment mechanism and summarize posterior quantities while incorporating variance from the treatment model. We discuss methods for Bayesian propensity score analysis of binary treatments, focusing on modern methods for high-dimensional Bayesian regression and the propagation of uncertainty. We introduce a novel and simple estimator for the average treatment effect that capitalizes on conjugacy of the beta and binomial distributions. Through simulations, we show the utility of horseshoe priors and Bayesian additive regression trees paired with our new estimator, while demonstrating the importance of including variance from the treatment regression model. An application to cardiac stent data with almost 500 confounders and 9000 patients illustrates approaches and facilitates comparison with existing alternatives. As measured by a falsifiability endpoint, we improved confounder adjustment compared with past observational research of the same problem. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Bayesian modeling of consumer behavior in the presence of anonymous visits
NASA Astrophysics Data System (ADS)
Novak, Julie Esther
Tailoring content to consumers has become a hallmark of marketing and digital media, particularly as it has become easier to identify customers across usage or purchase occasions. However, across a wide variety of contexts, companies find that customers do not consistently identify themselves, leaving a substantial fraction of anonymous visits. We develop a Bayesian hierarchical model that allows us to probabilistically assign anonymous sessions to users. These probabilistic assignments take into account a customer's demographic information, frequency of visitation, activities taken when visiting, and times of arrival. We present two studies, one with synthetic and one with real data, where we demonstrate improved performance over two popular practices (nearest-neighbor matching and deleting the anonymous visits) due to increased efficiency and reduced bias driven by the non-ignorability of which types of events are more likely to be anonymous. Using our proposed model, we avoid potential bias in understanding the effect of a firm's marketing on its customers, improve inference about the total number of customers in the dataset, and provide more precise targeted marketing to both previously observed and unobserved customers.
Drake, Brandon Lee; Wills, Wirt H.; Hamilton, Marian I.; Dorshow, Wetherbee
2014-01-01
Strontium isotope sourcing has become a common and useful method for assigning sources to archaeological artifacts. In Chaco Canyon, an Ancestral Pueblo regional center in New Mexico, previous studies using these methods have suggested that significant portion of maize and wood originate in the Chuska Mountains region, 75 km to the East. In the present manuscript, these results were tested using both frequentist methods (to determine if geochemical sources can truly be differentiated) and Bayesian methods (to address uncertainty in geochemical source attribution). It was found that Chaco Canyon and the Chuska Mountain region are not easily distinguishable based on radiogenic strontium isotope values. The strontium profiles of many geochemical sources in the region overlap, making it difficult to definitively identify any one particular geochemical source for the canyon's pre-historic maize. Bayesian mixing models support the argument that some spruce and fir wood originated in the San Mateo Mountains, but that this cannot explain all 87Sr/86Sr values in Chaco timber. Overall radiogenic strontium isotope data do not clearly identify a single major geochemical source for maize, ponderosa, and most spruce/fir timber. As such, the degree to which Chaco Canyon relied upon outside support for both food and construction material is still ambiguous. PMID:24854352
Bayesian inference based on dual generalized order statistics from the exponentiated Weibull model
NASA Astrophysics Data System (ADS)
Al Sobhi, Mashail M.
2015-02-01
Bayesian estimation for the two parameters and the reliability function of the exponentiated Weibull model are obtained based on dual generalized order statistics (DGOS). Also, Bayesian prediction bounds for future DGOS from exponentiated Weibull model are obtained. The symmetric and asymmetric loss functions are considered for Bayesian computations. The Markov chain Monte Carlo (MCMC) methods are used for computing the Bayes estimates and prediction bounds. The results have been specialized to the lower record values. Comparisons are made between Bayesian and maximum likelihood estimators via Monte Carlo simulation.
New insights into the classification and nomenclature of cortical GABAergic interneurons.
DeFelipe, Javier; López-Cruz, Pedro L; Benavides-Piccione, Ruth; Bielza, Concha; Larrañaga, Pedro; Anderson, Stewart; Burkhalter, Andreas; Cauli, Bruno; Fairén, Alfonso; Feldmeyer, Dirk; Fishell, Gord; Fitzpatrick, David; Freund, Tamás F; González-Burgos, Guillermo; Hestrin, Shaul; Hill, Sean; Hof, Patrick R; Huang, Josh; Jones, Edward G; Kawaguchi, Yasuo; Kisvárday, Zoltán; Kubota, Yoshiyuki; Lewis, David A; Marín, Oscar; Markram, Henry; McBain, Chris J; Meyer, Hanno S; Monyer, Hannah; Nelson, Sacha B; Rockland, Kathleen; Rossier, Jean; Rubenstein, John L R; Rudy, Bernardo; Scanziani, Massimo; Shepherd, Gordon M; Sherwood, Chet C; Staiger, Jochen F; Tamás, Gábor; Thomson, Alex; Wang, Yun; Yuste, Rafael; Ascoli, Giorgio A
2013-03-01
A systematic classification and accepted nomenclature of neuron types is much needed but is currently lacking. This article describes a possible taxonomical solution for classifying GABAergic interneurons of the cerebral cortex based on a novel, web-based interactive system that allows experts to classify neurons with pre-determined criteria. Using Bayesian analysis and clustering algorithms on the resulting data, we investigated the suitability of several anatomical terms and neuron names for cortical GABAergic interneurons. Moreover, we show that supervised classification models could automatically categorize interneurons in agreement with experts' assignments. These results demonstrate a practical and objective approach to the naming, characterization and classification of neurons based on community consensus.
New insights into the classification and nomenclature of cortical GABAergic interneurons
DeFelipe, Javier; López-Cruz, Pedro L.; Benavides-Piccione, Ruth; Bielza, Concha; Larrañaga, Pedro; Anderson, Stewart; Burkhalter, Andreas; Cauli, Bruno; Fairén, Alfonso; Feldmeyer, Dirk; Fishell, Gord; Fitzpatrick, David; Freund, Tamás F.; González-Burgos, Guillermo; Hestrin, Shaul; Hill, Sean; Hof, Patrick R.; Huang, Josh; Jones, Edward G.; Kawaguchi, Yasuo; Kisvárday, Zoltán; Kubota, Yoshiyuki; Lewis, David A.; Marín, Oscar; Markram, Henry; McBain, Chris J.; Meyer, Hanno S.; Monyer, Hannah; Nelson, Sacha B.; Rockland, Kathleen; Rossier, Jean; Rubenstein, John L. R.; Rudy, Bernardo; Scanziani, Massimo; Shepherd, Gordon M.; Sherwood, Chet C.; Staiger, Jochen F.; Tamás, Gábor; Thomson, Alex; Wang, Yun; Yuste, Rafael; Ascoli, Giorgio A.
2013-01-01
A systematic classification and accepted nomenclature of neuron types is much needed but is currently lacking. This article describes a possible taxonomical solution for classifying GABAergic interneurons of the cerebral cortex based on a novel, web-based interactive system that allows experts to classify neurons with pre-determined criteria. Using Bayesian analysis and clustering algorithms on the resulting data, we investigated the suitability of several anatomical terms and neuron names for cortical GABAergic interneurons. Moreover, we show that supervised classification models could automatically categorize interneurons in agreement with experts’ assignments. These results demonstrate a practical and objective approach to the naming, characterization and classification of neurons based on community consensus. PMID:23385869
Using Latent Class Analysis to Model Temperament Types.
Loken, Eric
2004-10-01
Mixture models are appropriate for data that arise from a set of qualitatively different subpopulations. In this study, latent class analysis was applied to observational data from a laboratory assessment of infant temperament at four months of age. The EM algorithm was used to fit the models, and the Bayesian method of posterior predictive checks was used for model selection. Results show at least three types of infant temperament, with patterns consistent with those identified by previous researchers who classified the infants using a theoretically based system. Multiple imputation of group memberships is proposed as an alternative to assigning subjects to the latent class with maximum posterior probability in order to reflect variance due to uncertainty in the parameter estimation. Latent class membership at four months of age predicted longitudinal outcomes at four years of age. The example illustrates issues relevant to all mixture models, including estimation, multi-modality, model selection, and comparisons based on the latent group indicators.
Borchani, Hanen; Bielza, Concha; Martı Nez-Martı N, Pablo; Larrañaga, Pedro
2012-12-01
Multi-dimensional Bayesian network classifiers (MBCs) are probabilistic graphical models recently proposed to deal with multi-dimensional classification problems, where each instance in the data set has to be assigned to more than one class variable. In this paper, we propose a Markov blanket-based approach for learning MBCs from data. Basically, it consists of determining the Markov blanket around each class variable using the HITON algorithm, then specifying the directionality over the MBC subgraphs. Our approach is applied to the prediction problem of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39) in order to estimate the health-related quality of life of Parkinson's patients. Fivefold cross-validation experiments were carried out on randomly generated synthetic data sets, Yeast data set, as well as on a real-world Parkinson's disease data set containing 488 patients. The experimental study, including comparison with additional Bayesian network-based approaches, back propagation for multi-label learning, multi-label k-nearest neighbor, multinomial logistic regression, ordinary least squares, and censored least absolute deviations, shows encouraging results in terms of predictive accuracy as well as the identification of dependence relationships among class and feature variables. Copyright © 2012 Elsevier Inc. All rights reserved.
Steingroever, Helen; Pachur, Thorsten; Šmíra, Martin; Lee, Michael D
2018-06-01
The Iowa Gambling Task (IGT) is one of the most popular experimental paradigms for comparing complex decision-making across groups. Most commonly, IGT behavior is analyzed using frequentist tests to compare performance across groups, and to compare inferred parameters of cognitive models developed for the IGT. Here, we present a Bayesian alternative based on Bayesian repeated-measures ANOVA for comparing performance, and a suite of three complementary model-based methods for assessing the cognitive processes underlying IGT performance. The three model-based methods involve Bayesian hierarchical parameter estimation, Bayes factor model comparison, and Bayesian latent-mixture modeling. We illustrate these Bayesian methods by applying them to test the extent to which differences in intuitive versus deliberate decision style are associated with differences in IGT performance. The results show that intuitive and deliberate decision-makers behave similarly on the IGT, and the modeling analyses consistently suggest that both groups of decision-makers rely on similar cognitive processes. Our results challenge the notion that individual differences in intuitive and deliberate decision styles have a broad impact on decision-making. They also highlight the advantages of Bayesian methods, especially their ability to quantify evidence in favor of the null hypothesis, and that they allow model-based analyses to incorporate hierarchical and latent-mixture structures.
Fundamentals and Recent Developments in Approximate Bayesian Computation
Lintusaari, Jarno; Gutmann, Michael U.; Dutta, Ritabrata; Kaski, Samuel; Corander, Jukka
2017-01-01
Abstract Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. [ABC; approximate Bayesian computation; Bayesian inference; likelihood-free inference; phylogenetics; simulator-based models; stochastic simulation models; tree-based models.] PMID:28175922
Log-Linear Models for Gene Association
Hu, Jianhua; Joshi, Adarsh; Johnson, Valen E.
2009-01-01
We describe a class of log-linear models for the detection of interactions in high-dimensional genomic data. This class of models leads to a Bayesian model selection algorithm that can be applied to data that have been reduced to contingency tables using ranks of observations within subjects, and discretization of these ranks within gene/network components. Many normalization issues associated with the analysis of genomic data are thereby avoided. A prior density based on Ewens’ sampling distribution is used to restrict the number of interacting components assigned high posterior probability, and the calculation of posterior model probabilities is expedited by approximations based on the likelihood ratio statistic. Simulation studies are used to evaluate the efficiency of the resulting algorithm for known interaction structures. Finally, the algorithm is validated in a microarray study for which it was possible to obtain biological confirmation of detected interactions. PMID:19655032
NASA Astrophysics Data System (ADS)
Arabzadeh, Vida; Niaki, S. T. A.; Arabzadeh, Vahid
2017-10-01
One of the most important processes in the early stages of construction projects is to estimate the cost involved. This process involves a wide range of uncertainties, which make it a challenging task. Because of unknown issues, using the experience of the experts or looking for similar cases are the conventional methods to deal with cost estimation. The current study presents data-driven methods for cost estimation based on the application of artificial neural network (ANN) and regression models. The learning algorithms of the ANN are the Levenberg-Marquardt and the Bayesian regulated. Moreover, regression models are hybridized with a genetic algorithm to obtain better estimates of the coefficients. The methods are applied in a real case, where the input parameters of the models are assigned based on the key issues involved in a spherical tank construction. The results reveal that while a high correlation between the estimated cost and the real cost exists; both ANNs could perform better than the hybridized regression models. In addition, the ANN with the Levenberg-Marquardt learning algorithm (LMNN) obtains a better estimation than the ANN with the Bayesian-regulated learning algorithm (BRNN). The correlation between real data and estimated values is over 90%, while the mean square error is achieved around 0.4. The proposed LMNN model can be effective to reduce uncertainty and complexity in the early stages of the construction project.
de Nazelle, Audrey; Arunachalam, Saravanan; Serre, Marc L
2010-08-01
States in the USA are required to demonstrate future compliance of criteria air pollutant standards by using both air quality monitors and model outputs. In the case of ozone, the demonstration tests aim at relying heavily on measured values, due to their perceived objectivity and enforceable quality. Weight given to numerical models is diminished by integrating them in the calculations only in a relative sense. For unmonitored locations, the EPA has suggested the use of a spatial interpolation technique to assign current values. We demonstrate that this approach may lead to erroneous assignments of nonattainment and may make it difficult for States to establish future compliance. We propose a method that combines different sources of information to map air pollution, using the Bayesian Maximum Entropy (BME) Framework. The approach gives precedence to measured values and integrates modeled data as a function of model performance. We demonstrate this approach in North Carolina, using the State's ozone monitoring network in combination with outputs from the Multiscale Air Quality Simulation Platform (MAQSIP) modeling system. We show that the BME data integration approach, compared to a spatial interpolation of measured data, improves the accuracy and the precision of ozone estimations across the state.
A Bayesian alternative for multi-objective ecohydrological model specification
NASA Astrophysics Data System (ADS)
Tang, Yating; Marshall, Lucy; Sharma, Ashish; Ajami, Hoori
2018-01-01
Recent studies have identified the importance of vegetation processes in terrestrial hydrologic systems. Process-based ecohydrological models combine hydrological, physical, biochemical and ecological processes of the catchments, and as such are generally more complex and parametric than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying the uncertainties in hydrological modeling with the development of Markov chain Monte Carlo (MCMC) techniques. The Bayesian approach offers an appealing alternative to traditional multi-objective hydrologic model calibrations by defining proper prior distributions that can be considered analogous to the ad-hoc weighting often prescribed in multi-objective calibration. Our study aims to develop appropriate prior distributions and likelihood functions that minimize the model uncertainties and bias within a Bayesian ecohydrological modeling framework based on a traditional Pareto-based model calibration technique. In our study, a Pareto-based multi-objective optimization and a formal Bayesian framework are implemented in a conceptual ecohydrological model that combines a hydrological model (HYMOD) and a modified Bucket Grassland Model (BGM). Simulations focused on one objective (streamflow/LAI) and multiple objectives (streamflow and LAI) with different emphasis defined via the prior distribution of the model error parameters. Results show more reliable outputs for both predicted streamflow and LAI using Bayesian multi-objective calibration with specified prior distributions for error parameters based on results from the Pareto front in the ecohydrological modeling. The methodology implemented here provides insight into the usefulness of multiobjective Bayesian calibration for ecohydrologic systems and the importance of appropriate prior distributions in such approaches.
Modeling Diagnostic Assessments with Bayesian Networks
ERIC Educational Resources Information Center
Almond, Russell G.; DiBello, Louis V.; Moulder, Brad; Zapata-Rivera, Juan-Diego
2007-01-01
This paper defines Bayesian network models and examines their applications to IRT-based cognitive diagnostic modeling. These models are especially suited to building inference engines designed to be synchronous with the finer grained student models that arise in skills diagnostic assessment. Aspects of the theory and use of Bayesian network models…
Moving beyond qualitative evaluations of Bayesian models of cognition.
Hemmer, Pernille; Tauber, Sean; Steyvers, Mark
2015-06-01
Bayesian models of cognition provide a powerful way to understand the behavior and goals of individuals from a computational point of view. Much of the focus in the Bayesian cognitive modeling approach has been on qualitative model evaluations, where predictions from the models are compared to data that is often averaged over individuals. In many cognitive tasks, however, there are pervasive individual differences. We introduce an approach to directly infer individual differences related to subjective mental representations within the framework of Bayesian models of cognition. In this approach, Bayesian data analysis methods are used to estimate cognitive parameters and motivate the inference process within a Bayesian cognitive model. We illustrate this integrative Bayesian approach on a model of memory. We apply the model to behavioral data from a memory experiment involving the recall of heights of people. A cross-validation analysis shows that the Bayesian memory model with inferred subjective priors predicts withheld data better than a Bayesian model where the priors are based on environmental statistics. In addition, the model with inferred priors at the individual subject level led to the best overall generalization performance, suggesting that individual differences are important to consider in Bayesian models of cognition.
Molitor, John
2012-03-01
Bayesian methods have seen an increase in popularity in a wide variety of scientific fields, including epidemiology. One of the main reasons for their widespread application is the power of the Markov chain Monte Carlo (MCMC) techniques generally used to fit these models. As a result, researchers often implicitly associate Bayesian models with MCMC estimation procedures. However, Bayesian models do not always require Markov-chain-based methods for parameter estimation. This is important, as MCMC estimation methods, while generally quite powerful, are complex and computationally expensive and suffer from convergence problems related to the manner in which they generate correlated samples used to estimate probability distributions for parameters of interest. In this issue of the Journal, Cole et al. (Am J Epidemiol. 2012;175(5):368-375) present an interesting paper that discusses non-Markov-chain-based approaches to fitting Bayesian models. These methods, though limited, can overcome some of the problems associated with MCMC techniques and promise to provide simpler approaches to fitting Bayesian models. Applied researchers will find these estimation approaches intuitively appealing and will gain a deeper understanding of Bayesian models through their use. However, readers should be aware that other non-Markov-chain-based methods are currently in active development and have been widely published in other fields.
Luce, Bryan R; Connor, Jason T; Broglio, Kristine R; Mullins, C Daniel; Ishak, K Jack; Saunders, Elijah; Davis, Barry R
2016-09-20
Bayesian and adaptive clinical trial designs offer the potential for more efficient processes that result in lower sample sizes and shorter trial durations than traditional designs. To explore the use and potential benefits of Bayesian adaptive clinical trial designs in comparative effectiveness research. Virtual execution of ALLHAT (Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial) as if it had been done according to a Bayesian adaptive trial design. Comparative effectiveness trial of antihypertensive medications. Patient data sampled from the more than 42 000 patients enrolled in ALLHAT with publicly available data. Number of patients randomly assigned between groups, trial duration, observed numbers of events, and overall trial results and conclusions. The Bayesian adaptive approach and original design yielded similar overall trial conclusions. The Bayesian adaptive trial randomly assigned more patients to the better-performing group and would probably have ended slightly earlier. This virtual trial execution required limited resampling of ALLHAT patients for inclusion in RE-ADAPT (REsearch in ADAptive methods for Pragmatic Trials). Involvement of a data monitoring committee and other trial logistics were not considered. In a comparative effectiveness research trial, Bayesian adaptive trial designs are a feasible approach and potentially generate earlier results and allocate more patients to better-performing groups. National Heart, Lung, and Blood Institute.
BiomeNet: A Bayesian Model for Inference of Metabolic Divergence among Microbial Communities
Chipman, Hugh; Gu, Hong; Bielawski, Joseph P.
2014-01-01
Metagenomics yields enormous numbers of microbial sequences that can be assigned a metabolic function. Using such data to infer community-level metabolic divergence is hindered by the lack of a suitable statistical framework. Here, we describe a novel hierarchical Bayesian model, called BiomeNet (Bayesian inference of metabolic networks), for inferring differential prevalence of metabolic subnetworks among microbial communities. To infer the structure of community-level metabolic interactions, BiomeNet applies a mixed-membership modelling framework to enzyme abundance information. The basic idea is that the mixture components of the model (metabolic reactions, subnetworks, and networks) are shared across all groups (microbiome samples), but the mixture proportions vary from group to group. Through this framework, the model can capture nested structures within the data. BiomeNet is unique in modeling each metagenome sample as a mixture of complex metabolic systems (metabosystems). The metabosystems are composed of mixtures of tightly connected metabolic subnetworks. BiomeNet differs from other unsupervised methods by allowing researchers to discriminate groups of samples through the metabolic patterns it discovers in the data, and by providing a framework for interpreting them. We describe a collapsed Gibbs sampler for inference of the mixture weights under BiomeNet, and we use simulation to validate the inference algorithm. Application of BiomeNet to human gut metagenomes revealed a metabosystem with greater prevalence among inflammatory bowel disease (IBD) patients. Based on the discriminatory subnetworks for this metabosystem, we inferred that the community is likely to be closely associated with the human gut epithelium, resistant to dietary interventions, and interfere with human uptake of an antioxidant connected to IBD. Because this metabosystem has a greater capacity to exploit host-associated glycans, we speculate that IBD-associated communities might arise from opportunist growth of bacteria that can circumvent the host's nutrient-based mechanism for bacterial partner selection. PMID:25412107
When mechanism matters: Bayesian forecasting using models of ecological diffusion
Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.
2017-01-01
Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.
Dynamic Bayesian Network Modeling of Game Based Diagnostic Assessments. CRESST Report 837
ERIC Educational Resources Information Center
Levy, Roy
2014-01-01
Digital games offer an appealing environment for assessing student proficiencies, including skills and misconceptions in a diagnostic setting. This paper proposes a dynamic Bayesian network modeling approach for observations of student performance from an educational video game. A Bayesian approach to model construction, calibration, and use in…
NASA Astrophysics Data System (ADS)
Ogorodnikov, Yuri; Khachay, Michael; Pljonkin, Anton
2018-04-01
We describe the possibility of employing the special case of the 3-SAT problem stemming from the well known integer factorization problem for the quantum cryptography. It is known, that for every instance of our 3-SAT setting the given 3-CNF is satisfiable by a unique truth assignment, and the goal is to find this assignment. Since the complexity status of the factorization problem is still undefined, development of approximation algorithms and heuristics adopts interest of numerous researchers. One of promising approaches to construction of approximation techniques is based on real-valued relaxation of the given 3-CNF followed by minimizing of the appropriate differentiable loss function, and subsequent rounding of the fractional minimizer obtained. Actually, algorithms developed this way differ by the rounding scheme applied on their final stage. We propose a new rounding scheme based on Bayesian learning. The article shows that the proposed method can be used to determine the security in quantum key distribution systems. In the quantum distribution the Shannon rules is applied and the factorization problem is paramount when decrypting secret keys.
Mathew, Boby; Holand, Anna Marie; Koistinen, Petri; Léon, Jens; Sillanpää, Mikko J
2016-02-01
A novel reparametrization-based INLA approach as a fast alternative to MCMC for the Bayesian estimation of genetic parameters in multivariate animal model is presented. Multi-trait genetic parameter estimation is a relevant topic in animal and plant breeding programs because multi-trait analysis can take into account the genetic correlation between different traits and that significantly improves the accuracy of the genetic parameter estimates. Generally, multi-trait analysis is computationally demanding and requires initial estimates of genetic and residual correlations among the traits, while those are difficult to obtain. In this study, we illustrate how to reparametrize covariance matrices of a multivariate animal model/animal models using modified Cholesky decompositions. This reparametrization-based approach is used in the Integrated Nested Laplace Approximation (INLA) methodology to estimate genetic parameters of multivariate animal model. Immediate benefits are: (1) to avoid difficulties of finding good starting values for analysis which can be a problem, for example in Restricted Maximum Likelihood (REML); (2) Bayesian estimation of (co)variance components using INLA is faster to execute than using Markov Chain Monte Carlo (MCMC) especially when realized relationship matrices are dense. The slight drawback is that priors for covariance matrices are assigned for elements of the Cholesky factor but not directly to the covariance matrix elements as in MCMC. Additionally, we illustrate the concordance of the INLA results with the traditional methods like MCMC and REML approaches. We also present results obtained from simulated data sets with replicates and field data in rice.
Bayesian multimodel inference for dose-response studies
Link, W.A.; Albers, P.H.
2007-01-01
Statistical inference in dose?response studies is model-based: The analyst posits a mathematical model of the relation between exposure and response, estimates parameters of the model, and reports conclusions conditional on the model. Such analyses rarely include any accounting for the uncertainties associated with model selection. The Bayesian inferential system provides a convenient framework for model selection and multimodel inference. In this paper we briefly describe the Bayesian paradigm and Bayesian multimodel inference. We then present a family of models for multinomial dose?response data and apply Bayesian multimodel inferential methods to the analysis of data on the reproductive success of American kestrels (Falco sparveriuss) exposed to various sublethal dietary concentrations of methylmercury.
NASA Astrophysics Data System (ADS)
Ha, Taesung
A probabilistic risk assessment (PRA) was conducted for a loss of coolant accident, (LOCA) in the McMaster Nuclear Reactor (MNR). A level 1 PRA was completed including event sequence modeling, system modeling, and quantification. To support the quantification of the accident sequence identified, data analysis using the Bayesian method and human reliability analysis (HRA) using the accident sequence evaluation procedure (ASEP) approach were performed. Since human performance in research reactors is significantly different from that in power reactors, a time-oriented HRA model (reliability physics model) was applied for the human error probability (HEP) estimation of the core relocation. This model is based on two competing random variables: phenomenological time and performance time. The response surface and direct Monte Carlo simulation with Latin Hypercube sampling were applied for estimating the phenomenological time, whereas the performance time was obtained from interviews with operators. An appropriate probability distribution for the phenomenological time was assigned by statistical goodness-of-fit tests. The human error probability (HEP) for the core relocation was estimated from these two competing quantities: phenomenological time and operators' performance time. The sensitivity of each probability distribution in human reliability estimation was investigated. In order to quantify the uncertainty in the predicted HEPs, a Bayesian approach was selected due to its capability of incorporating uncertainties in model itself and the parameters in that model. The HEP from the current time-oriented model was compared with that from the ASEP approach. Both results were used to evaluate the sensitivity of alternative huinan reliability modeling for the manual core relocation in the LOCA risk model. This exercise demonstrated the applicability of a reliability physics model supplemented with a. Bayesian approach for modeling human reliability and its potential usefulness of quantifying model uncertainty as sensitivity analysis in the PRA model.
Development of uncertainty-based work injury model using Bayesian structural equation modelling.
Chatterjee, Snehamoy
2014-01-01
This paper proposed a Bayesian method-based structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distribution for factor loading parameters and structural parameters of SEM. In the first approach, the prior distributions were considered as a fixed distribution function with specific parameter values, whereas, in the second approach, prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayesian rule. The Markov Chain Monte Carlo sampling in the form Gibbs sampling was applied for sampling from the posterior distribution. The results revealed that all coefficients of structural and measurement model parameters are statistically significant in experts' opinion-based priors, whereas, two coefficients are not statistically significant when fixed prior-based distributions are applied. The error statistics reveals that Bayesian structural model provides reasonably good fit of work injury with high coefficient of determination (0.91) and less mean squared error as compared to traditional SEM.
On the robustness of a Bayes estimate. [in reliability theory
NASA Technical Reports Server (NTRS)
Canavos, G. C.
1974-01-01
This paper examines the robustness of a Bayes estimator with respect to the assigned prior distribution. A Bayesian analysis for a stochastic scale parameter of a Weibull failure model is summarized in which the natural conjugate is assigned as the prior distribution of the random parameter. The sensitivity analysis is carried out by the Monte Carlo method in which, although an inverted gamma is the assigned prior, realizations are generated using distribution functions of varying shape. For several distributional forms and even for some fixed values of the parameter, simulated mean squared errors of Bayes and minimum variance unbiased estimators are determined and compared. Results indicate that the Bayes estimator remains squared-error superior and appears to be largely robust to the form of the assigned prior distribution.
A comprehensive probabilistic analysis model of oil pipelines network based on Bayesian network
NASA Astrophysics Data System (ADS)
Zhang, C.; Qin, T. X.; Jiang, B.; Huang, C.
2018-02-01
Oil pipelines network is one of the most important facilities of energy transportation. But oil pipelines network accident may result in serious disasters. Some analysis models for these accidents have been established mainly based on three methods, including event-tree, accident simulation and Bayesian network. Among these methods, Bayesian network is suitable for probabilistic analysis. But not all the important influencing factors are considered and the deployment rule of the factors has not been established. This paper proposed a probabilistic analysis model of oil pipelines network based on Bayesian network. Most of the important influencing factors, including the key environment condition and emergency response are considered in this model. Moreover, the paper also introduces a deployment rule for these factors. The model can be used in probabilistic analysis and sensitive analysis of oil pipelines network accident.
Using Bayesian Learning to Classify College Algebra Students by Understanding in Real-Time
ERIC Educational Resources Information Center
Cousino, Andrew
2013-01-01
The goal of this work is to provide instructors with detailed information about their classes at each assignment during the term. The information is both on an individual level and at the aggregate level. We used the large number of grades, which are available online these days, along with data-mining techniques to build our models. This enabled…
NASA Astrophysics Data System (ADS)
Chen, Mingjie; Izady, Azizallah; Abdalla, Osman A.; Amerjeed, Mansoor
2018-02-01
Bayesian inference using Markov Chain Monte Carlo (MCMC) provides an explicit framework for stochastic calibration of hydrogeologic models accounting for uncertainties; however, the MCMC sampling entails a large number of model calls, and could easily become computationally unwieldy if the high-fidelity hydrogeologic model simulation is time consuming. This study proposes a surrogate-based Bayesian framework to address this notorious issue, and illustrates the methodology by inverse modeling a regional MODFLOW model. The high-fidelity groundwater model is approximated by a fast statistical model using Bagging Multivariate Adaptive Regression Spline (BMARS) algorithm, and hence the MCMC sampling can be efficiently performed. In this study, the MODFLOW model is developed to simulate the groundwater flow in an arid region of Oman consisting of mountain-coast aquifers, and used to run representative simulations to generate training dataset for BMARS model construction. A BMARS-based Sobol' method is also employed to efficiently calculate input parameter sensitivities, which are used to evaluate and rank their importance for the groundwater flow model system. According to sensitivity analysis, insensitive parameters are screened out of Bayesian inversion of the MODFLOW model, further saving computing efforts. The posterior probability distribution of input parameters is efficiently inferred from the prescribed prior distribution using observed head data, demonstrating that the presented BMARS-based Bayesian framework is an efficient tool to reduce parameter uncertainties of a groundwater system.
Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad
2014-01-01
Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the blast-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
Estimating Tree Height-Diameter Models with the Bayesian Method
Duan, Aiguo; Zhang, Jianguo; Xiang, Congwei
2014-01-01
Six candidate height-diameter models were used to analyze the height-diameter relationships. The common methods for estimating the height-diameter models have taken the classical (frequentist) approach based on the frequency interpretation of probability, for example, the nonlinear least squares method (NLS) and the maximum likelihood method (ML). The Bayesian method has an exclusive advantage compared with classical method that the parameters to be estimated are regarded as random variables. In this study, the classical and Bayesian methods were used to estimate six height-diameter models, respectively. Both the classical method and Bayesian method showed that the Weibull model was the “best” model using data1. In addition, based on the Weibull model, data2 was used for comparing Bayesian method with informative priors with uninformative priors and classical method. The results showed that the improvement in prediction accuracy with Bayesian method led to narrower confidence bands of predicted value in comparison to that for the classical method, and the credible bands of parameters with informative priors were also narrower than uninformative priors and classical method. The estimated posterior distributions for parameters can be set as new priors in estimating the parameters using data2. PMID:24711733
Estimating tree height-diameter models with the Bayesian method.
Zhang, Xiongqing; Duan, Aiguo; Zhang, Jianguo; Xiang, Congwei
2014-01-01
Six candidate height-diameter models were used to analyze the height-diameter relationships. The common methods for estimating the height-diameter models have taken the classical (frequentist) approach based on the frequency interpretation of probability, for example, the nonlinear least squares method (NLS) and the maximum likelihood method (ML). The Bayesian method has an exclusive advantage compared with classical method that the parameters to be estimated are regarded as random variables. In this study, the classical and Bayesian methods were used to estimate six height-diameter models, respectively. Both the classical method and Bayesian method showed that the Weibull model was the "best" model using data1. In addition, based on the Weibull model, data2 was used for comparing Bayesian method with informative priors with uninformative priors and classical method. The results showed that the improvement in prediction accuracy with Bayesian method led to narrower confidence bands of predicted value in comparison to that for the classical method, and the credible bands of parameters with informative priors were also narrower than uninformative priors and classical method. The estimated posterior distributions for parameters can be set as new priors in estimating the parameters using data2.
BCM: toolkit for Bayesian analysis of Computational Models using samplers.
Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A
2016-10-21
Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics, however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.
Sparse Event Modeling with Hierarchical Bayesian Kernel Methods
2016-01-05
SECURITY CLASSIFICATION OF: The research objective of this proposal was to develop a predictive Bayesian kernel approach to model count data based on...several predictive variables. Such an approach, which we refer to as the Poisson Bayesian kernel model , is able to model the rate of occurrence of...which adds specificity to the model and can make nonlinear data more manageable. Early results show that the 1. REPORT DATE (DD-MM-YYYY) 4. TITLE
A voxel-based investigation for MRI-only radiotherapy of the brain using ultra short echo times
NASA Astrophysics Data System (ADS)
Edmund, Jens M.; Kjer, Hans M.; Van Leemput, Koen; Hansen, Rasmus H.; Andersen, Jon AL; Andreasen, Daniel
2014-12-01
Radiotherapy (RT) based on magnetic resonance imaging (MRI) as the only modality, so-called MRI-only RT, would remove the systematic registration error between MR and computed tomography (CT), and provide co-registered MRI for assessment of treatment response and adaptive RT. Electron densities, however, need to be assigned to the MRI images for dose calculation and patient setup based on digitally reconstructed radiographs (DRRs). Here, we investigate the geometric and dosimetric performance for a number of popular voxel-based methods to generate a so-called pseudo CT (pCT). Five patients receiving cranial irradiation, each containing a co-registered MRI and CT scan, were included. An ultra short echo time MRI sequence for bone visualization was used. Six methods were investigated for three popular types of voxel-based approaches; (1) threshold-based segmentation, (2) Bayesian segmentation and (3) statistical regression. Each approach contained two methods. Approach 1 used bulk density assignment of MRI voxels into air, soft tissue and bone based on logical masks and the transverse relaxation time T2 of the bone. Approach 2 used similar bulk density assignments with Bayesian statistics including or excluding additional spatial information. Approach 3 used a statistical regression correlating MRI voxels with their corresponding CT voxels. A similar photon and proton treatment plan was generated for a target positioned between the nasal cavity and the brainstem for all patients. The CT agreement with the pCT of each method was quantified and compared with the other methods geometrically and dosimetrically using both a number of reported metrics and introducing some novel metrics. The best geometrical agreement with CT was obtained with the statistical regression methods which performed significantly better than the threshold and Bayesian segmentation methods (excluding spatial information). All methods agreed significantly better with CT than a reference water MRI comparison. The mean dosimetric deviation for photons and protons compared to the CT was about 2% and highest in the gradient dose region of the brainstem. Both the threshold based method and the statistical regression methods showed the highest dosimetrical agreement. Generation of pCTs using statistical regression seems to be the most promising candidate for MRI-only RT of the brain. Further, the total amount of different tissues needs to be taken into account for dosimetric considerations regardless of their correct geometrical position.
Community Detection Algorithm Combining Stochastic Block Model and Attribute Data Clustering
NASA Astrophysics Data System (ADS)
Kataoka, Shun; Kobayashi, Takuto; Yasuda, Muneki; Tanaka, Kazuyuki
2016-11-01
We propose a new algorithm to detect the community structure in a network that utilizes both the network structure and vertex attribute data. Suppose we have the network structure together with the vertex attribute data, that is, the information assigned to each vertex associated with the community to which it belongs. The problem addressed this paper is the detection of the community structure from the information of both the network structure and the vertex attribute data. Our approach is based on the Bayesian approach that models the posterior probability distribution of the community labels. The detection of the community structure in our method is achieved by using belief propagation and an EM algorithm. We numerically verified the performance of our method using computer-generated networks and real-world networks.
ERIC Educational Resources Information Center
Aslan, Burak Galip; Öztürk, Özlem; Inceoglu, Mustafa Murat
2014-01-01
Considering the increasing importance of adaptive approaches in CALL systems, this study implemented a machine learning based student modeling middleware with Bayesian networks. The profiling approach of the student modeling system is based on Felder and Silverman's Learning Styles Model and Felder and Soloman's Index of Learning Styles…
NASA Astrophysics Data System (ADS)
Li, Lu; Xu, Chong-Yu; Engeland, Kolbjørn
2013-04-01
SummaryWith respect to model calibration, parameter estimation and analysis of uncertainty sources, various regression and probabilistic approaches are used in hydrological modeling. A family of Bayesian methods, which incorporates different sources of information into a single analysis through Bayes' theorem, is widely used for uncertainty assessment. However, none of these approaches can well treat the impact of high flows in hydrological modeling. This study proposes a Bayesian modularization uncertainty assessment approach in which the highest streamflow observations are treated as suspect information that should not influence the inference of the main bulk of the model parameters. This study includes a comprehensive comparison and evaluation of uncertainty assessments by our new Bayesian modularization method and standard Bayesian methods using the Metropolis-Hastings (MH) algorithm with the daily hydrological model WASMOD. Three likelihood functions were used in combination with standard Bayesian method: the AR(1) plus Normal model independent of time (Model 1), the AR(1) plus Normal model dependent on time (Model 2) and the AR(1) plus Multi-normal model (Model 3). The results reveal that the Bayesian modularization method provides the most accurate streamflow estimates measured by the Nash-Sutcliffe efficiency and provide the best in uncertainty estimates for low, medium and entire flows compared to standard Bayesian methods. The study thus provides a new approach for reducing the impact of high flows on the discharge uncertainty assessment of hydrological models via Bayesian method.
Fragment virtual screening based on Bayesian categorization for discovering novel VEGFR-2 scaffolds.
Zhang, Yanmin; Jiao, Yu; Xiong, Xiao; Liu, Haichun; Ran, Ting; Xu, Jinxing; Lu, Shuai; Xu, Anyang; Pan, Jing; Qiao, Xin; Shi, Zhihao; Lu, Tao; Chen, Yadong
2015-11-01
The discovery of novel scaffolds against a specific target has long been one of the most significant but challengeable goals in discovering lead compounds. A scaffold that binds in important regions of the active pocket is more favorable as a starting point because scaffolds generally possess greater optimization possibilities. However, due to the lack of sufficient chemical space diversity of the databases and the ineffectiveness of the screening methods, it still remains a great challenge to discover novel active scaffolds. Since the strengths and weaknesses of both fragment-based drug design and traditional virtual screening (VS), we proposed a fragment VS concept based on Bayesian categorization for the discovery of novel scaffolds. This work investigated the proposal through an application on VEGFR-2 target. Firstly, scaffold and structural diversity of chemical space for 10 compound databases were explicitly evaluated. Simultaneously, a robust Bayesian classification model was constructed for screening not only compound databases but also their corresponding fragment databases. Although analysis of the scaffold diversity demonstrated a very unevenly distribution of scaffolds over molecules, results showed that our Bayesian model behaved better in screening fragments than molecules. Through a literature retrospective research, several generated fragments with relatively high Bayesian scores indeed exhibit VEGFR-2 biological activity, which strongly proved the effectiveness of fragment VS based on Bayesian categorization models. This investigation of Bayesian-based fragment VS can further emphasize the necessity for enrichment of compound databases employed in lead discovery by amplifying the diversity of databases with novel structures.
Uncertainty aggregation and reduction in structure-material performance prediction
NASA Astrophysics Data System (ADS)
Hu, Zhen; Mahadevan, Sankaran; Ao, Dan
2018-02-01
An uncertainty aggregation and reduction framework is presented for structure-material performance prediction. Different types of uncertainty sources, structural analysis model, and material performance prediction model are connected through a Bayesian network for systematic uncertainty aggregation analysis. To reduce the uncertainty in the computational structure-material performance prediction model, Bayesian updating using experimental observation data is investigated based on the Bayesian network. It is observed that the Bayesian updating results will have large error if the model cannot accurately represent the actual physics, and that this error will be propagated to the predicted performance distribution. To address this issue, this paper proposes a novel uncertainty reduction method by integrating Bayesian calibration with model validation adaptively. The observation domain of the quantity of interest is first discretized into multiple segments. An adaptive algorithm is then developed to perform model validation and Bayesian updating over these observation segments sequentially. Only information from observation segments where the model prediction is highly reliable is used for Bayesian updating; this is found to increase the effectiveness and efficiency of uncertainty reduction. A composite rotorcraft hub component fatigue life prediction model, which combines a finite element structural analysis model and a material damage model, is used to demonstrate the proposed method.
Hierarchical Bayesian Modeling of Fluid-Induced Seismicity
NASA Astrophysics Data System (ADS)
Broccardo, M.; Mignan, A.; Wiemer, S.; Stojadinovic, B.; Giardini, D.
2017-11-01
In this study, we present a Bayesian hierarchical framework to model fluid-induced seismicity. The framework is based on a nonhomogeneous Poisson process with a fluid-induced seismicity rate proportional to the rate of injected fluid. The fluid-induced seismicity rate model depends upon a set of physically meaningful parameters and has been validated for six fluid-induced case studies. In line with the vision of hierarchical Bayesian modeling, the rate parameters are considered as random variables. We develop both the Bayesian inference and updating rules, which are used to develop a probabilistic forecasting model. We tested the Basel 2006 fluid-induced seismic case study to prove that the hierarchical Bayesian model offers a suitable framework to coherently encode both epistemic uncertainty and aleatory variability. Moreover, it provides a robust and consistent short-term seismic forecasting model suitable for online risk quantification and mitigation.
Classifying emotion in Twitter using Bayesian network
NASA Astrophysics Data System (ADS)
Surya Asriadie, Muhammad; Syahrul Mubarok, Mohamad; Adiwijaya
2018-03-01
Language is used to express not only facts, but also emotions. Emotions are noticeable from behavior up to the social media statuses written by a person. Analysis of emotions in a text is done in a variety of media such as Twitter. This paper studies classification of emotions on twitter using Bayesian network because of its ability to model uncertainty and relationships between features. The result is two models based on Bayesian network which are Full Bayesian Network (FBN) and Bayesian Network with Mood Indicator (BNM). FBN is a massive Bayesian network where each word is treated as a node. The study shows the method used to train FBN is not very effective to create the best model and performs worse compared to Naive Bayes. F1-score for FBN is 53.71%, while for Naive Bayes is 54.07%. BNM is proposed as an alternative method which is based on the improvement of Multinomial Naive Bayes and has much lower computational complexity compared to FBN. Even though it’s not better compared to FBN, the resulting model successfully improves the performance of Multinomial Naive Bayes. F1-Score for Multinomial Naive Bayes model is 51.49%, while for BNM is 52.14%.
ERIC Educational Resources Information Center
Zhang, Zhidong
2016-01-01
This study explored an alternative assessment procedure to examine learning trajectories of matrix multiplication. It took rule-based analytical and cognitive task analysis methods specifically to break down operation rules for a given matrix multiplication. Based on the analysis results, a hierarchical Bayesian network, an assessment model,…
A Gibbs sampler for Bayesian analysis of site-occupancy data
Dorazio, Robert M.; Rodriguez, Daniel Taylor
2012-01-01
1. A Bayesian analysis of site-occupancy data containing covariates of species occurrence and species detection probabilities is usually completed using Markov chain Monte Carlo methods in conjunction with software programs that can implement those methods for any statistical model, not just site-occupancy models. Although these software programs are quite flexible, considerable experience is often required to specify a model and to initialize the Markov chain so that summaries of the posterior distribution can be estimated efficiently and accurately. 2. As an alternative to these programs, we develop a Gibbs sampler for Bayesian analysis of site-occupancy data that include covariates of species occurrence and species detection probabilities. This Gibbs sampler is based on a class of site-occupancy models in which probabilities of species occurrence and detection are specified as probit-regression functions of site- and survey-specific covariate measurements. 3. To illustrate the Gibbs sampler, we analyse site-occupancy data of the blue hawker, Aeshna cyanea (Odonata, Aeshnidae), a common dragonfly species in Switzerland. Our analysis includes a comparison of results based on Bayesian and classical (non-Bayesian) methods of inference. We also provide code (based on the R software program) for conducting Bayesian and classical analyses of site-occupancy data.
Development of dynamic Bayesian models for web application test management
NASA Astrophysics Data System (ADS)
Azarnova, T. V.; Polukhin, P. V.; Bondarenko, Yu V.; Kashirina, I. L.
2018-03-01
The mathematical apparatus of dynamic Bayesian networks is an effective and technically proven tool that can be used to model complex stochastic dynamic processes. According to the results of the research, mathematical models and methods of dynamic Bayesian networks provide a high coverage of stochastic tasks associated with error testing in multiuser software products operated in a dynamically changing environment. Formalized representation of the discrete test process as a dynamic Bayesian model allows us to organize the logical connection between individual test assets for multiple time slices. This approach gives an opportunity to present testing as a discrete process with set structural components responsible for the generation of test assets. Dynamic Bayesian network-based models allow us to combine in one management area individual units and testing components with different functionalities and a direct influence on each other in the process of comprehensive testing of various groups of computer bugs. The application of the proposed models provides an opportunity to use a consistent approach to formalize test principles and procedures, methods used to treat situational error signs, and methods used to produce analytical conclusions based on test results.
2011-01-01
Background Molecular marker information is a common source to draw inferences about the relationship between genetic and phenotypic variation. Genetic effects are often modelled as additively acting marker allele effects. The true mode of biological action can, of course, be different from this plain assumption. One possibility to better understand the genetic architecture of complex traits is to include intra-locus (dominance) and inter-locus (epistasis) interaction of alleles as well as the additive genetic effects when fitting a model to a trait. Several Bayesian MCMC approaches exist for the genome-wide estimation of genetic effects with high accuracy of genetic value prediction. Including pairwise interaction for thousands of loci would probably go beyond the scope of such a sampling algorithm because then millions of effects are to be estimated simultaneously leading to months of computation time. Alternative solving strategies are required when epistasis is studied. Methods We extended a fast Bayesian method (fBayesB), which was previously proposed for a purely additive model, to include non-additive effects. The fBayesB approach was used to estimate genetic effects on the basis of simulated datasets. Different scenarios were simulated to study the loss of accuracy of prediction, if epistatic effects were not simulated but modelled and vice versa. Results If 23 QTL were simulated to cause additive and dominance effects, both fBayesB and a conventional MCMC sampler BayesB yielded similar results in terms of accuracy of genetic value prediction and bias of variance component estimation based on a model including additive and dominance effects. Applying fBayesB to data with epistasis, accuracy could be improved by 5% when all pairwise interactions were modelled as well. The accuracy decreased more than 20% if genetic variation was spread over 230 QTL. In this scenario, accuracy based on modelling only additive and dominance effects was generally superior to that of the complex model including epistatic effects. Conclusions This simulation study showed that the fBayesB approach is convenient for genetic value prediction. Jointly estimating additive and non-additive effects (especially dominance) has reasonable impact on the accuracy of prediction and the proportion of genetic variation assigned to the additive genetic source. PMID:21867519
Wang, Jiali; Zhang, Qingnian; Ji, Wenfeng
2014-01-01
A large number of data is needed by the computation of the objective Bayesian network, but the data is hard to get in actual computation. The calculation method of Bayesian network was improved in this paper, and the fuzzy-precise Bayesian network was obtained. Then, the fuzzy-precise Bayesian network was used to reason Bayesian network model when the data is limited. The security of passengers during shipping is affected by various factors, and it is hard to predict and control. The index system that has the impact on the passenger safety during shipping was established on basis of the multifield coupling theory in this paper. Meanwhile, the fuzzy-precise Bayesian network was applied to monitor the security of passengers in the shipping process. The model was applied to monitor the passenger safety during shipping of a shipping company in Hainan, and the effectiveness of this model was examined. This research work provides guidance for guaranteeing security of passengers during shipping.
Wang, Jiali; Zhang, Qingnian; Ji, Wenfeng
2014-01-01
A large number of data is needed by the computation of the objective Bayesian network, but the data is hard to get in actual computation. The calculation method of Bayesian network was improved in this paper, and the fuzzy-precise Bayesian network was obtained. Then, the fuzzy-precise Bayesian network was used to reason Bayesian network model when the data is limited. The security of passengers during shipping is affected by various factors, and it is hard to predict and control. The index system that has the impact on the passenger safety during shipping was established on basis of the multifield coupling theory in this paper. Meanwhile, the fuzzy-precise Bayesian network was applied to monitor the security of passengers in the shipping process. The model was applied to monitor the passenger safety during shipping of a shipping company in Hainan, and the effectiveness of this model was examined. This research work provides guidance for guaranteeing security of passengers during shipping. PMID:25254227
Lim, Cherry; Wannapinij, Prapass; White, Lisa; Day, Nicholas P J; Cooper, Ben S; Peacock, Sharon J; Limmathurotsakul, Direk
2013-01-01
Estimates of the sensitivity and specificity for new diagnostic tests based on evaluation against a known gold standard are imprecise when the accuracy of the gold standard is imperfect. Bayesian latent class models (LCMs) can be helpful under these circumstances, but the necessary analysis requires expertise in computational programming. Here, we describe open-access web-based applications that allow non-experts to apply Bayesian LCMs to their own data sets via a user-friendly interface. Applications for Bayesian LCMs were constructed on a web server using R and WinBUGS programs. The models provided (http://mice.tropmedres.ac) include two Bayesian LCMs: the two-tests in two-population model (Hui and Walter model) and the three-tests in one-population model (Walter and Irwig model). Both models are available with simplified and advanced interfaces. In the former, all settings for Bayesian statistics are fixed as defaults. Users input their data set into a table provided on the webpage. Disease prevalence and accuracy of diagnostic tests are then estimated using the Bayesian LCM, and provided on the web page within a few minutes. With the advanced interfaces, experienced researchers can modify all settings in the models as needed. These settings include correlation among diagnostic test results and prior distributions for all unknown parameters. The web pages provide worked examples with both models using the original data sets presented by Hui and Walter in 1980, and by Walter and Irwig in 1988. We also illustrate the utility of the advanced interface using the Walter and Irwig model on a data set from a recent melioidosis study. The results obtained from the web-based applications were comparable to those published previously. The newly developed web-based applications are open-access and provide an important new resource for researchers worldwide to evaluate new diagnostic tests.
Shankar, Vijay; Reo, Nicholas V; Paliy, Oleg
2015-12-09
We previously showed that stool samples of pre-adolescent and adolescent US children diagnosed with diarrhea-predominant IBS (IBS-D) had different compositions of microbiota and metabolites compared to healthy age-matched controls. Here we explored whether observed fecal microbiota and metabolite differences between these two adolescent populations can be used to discriminate between IBS and health. We constructed individual microbiota- and metabolite-based sample classification models based on the partial least squares multivariate analysis and then applied a Bayesian approach to integrate individual models into a single classifier. The resulting combined classification achieved 84 % accuracy of correct sample group assignment and 86 % prediction for IBS-D in cross-validation tests. The performance of the cumulative classification model was further validated by the de novo analysis of stool samples from a small independent IBS-D cohort. High-throughput microbial and metabolite profiling of subject stool samples can be used to facilitate IBS diagnosis.
Foster, Charles S P; Sauquet, Hervê; van der Merwe, Marlien; McPherson, Hannah; Rossetto, Maurizio; Ho, Simon Y W
2017-05-01
The evolutionary timescale of angiosperms has long been a key question in biology. Molecular estimates of this timescale have shown considerable variation, being influenced by differences in taxon sampling, gene sampling, fossil calibrations, evolutionary models, and choices of priors. Here, we analyze a data set comprising 76 protein-coding genes from the chloroplast genomes of 195 taxa spanning 86 families, including novel genome sequences for 11 taxa, to evaluate the impact of models, priors, and gene sampling on Bayesian estimates of the angiosperm evolutionary timescale. Using a Bayesian relaxed molecular-clock method, with a core set of 35 minimum and two maximum fossil constraints, we estimated that crown angiosperms arose 221 (251-192) Ma during the Triassic. Based on a range of additional sensitivity and subsampling analyses, we found that our date estimates were generally robust to large changes in the parameters of the birth-death tree prior and of the model of rate variation across branches. We found an exception to this when we implemented fossil calibrations in the form of highly informative gamma priors rather than as uniform priors on node ages. Under all other calibration schemes, including trials of seven maximum age constraints, we consistently found that the earliest divergences of angiosperm clades substantially predate the oldest fossils that can be assigned unequivocally to their crown group. Overall, our results and experiments with genome-scale data suggest that reliable estimates of the angiosperm crown age will require increased taxon sampling, significant methodological changes, and new information from the fossil record. [Angiospermae, chloroplast, genome, molecular dating, Triassic.]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Ellis, Ian O.; Green, Andrew R.; Hanka, Rudolf
2008-01-01
Background We consider the problem of assessing inter-rater agreement when there are missing data and a large number of raters. Previous studies have shown only ‘moderate’ agreement between pathologists in grading breast cancer tumour specimens. We analyse a large but incomplete data-set consisting of 24177 grades, on a discrete 1–3 scale, provided by 732 pathologists for 52 samples. Methodology/Principal Findings We review existing methods for analysing inter-rater agreement for multiple raters and demonstrate two further methods. Firstly, we examine a simple non-chance-corrected agreement score based on the observed proportion of agreements with the consensus for each sample, which makes no allowance for missing data. Secondly, treating grades as lying on a continuous scale representing tumour severity, we use a Bayesian latent trait method to model cumulative probabilities of assigning grade values as functions of the severity and clarity of the tumour and of rater-specific parameters representing boundaries between grades 1–2 and 2–3. We simulate from the fitted model to estimate, for each rater, the probability of agreement with the majority. Both methods suggest that there are differences between raters in terms of rating behaviour, most often caused by consistent over- or under-estimation of the grade boundaries, and also considerable variability in the distribution of grades assigned to many individual samples. The Bayesian model addresses the tendency of the agreement score to be biased upwards for raters who, by chance, see a relatively ‘easy’ set of samples. Conclusions/Significance Latent trait models can be adapted to provide novel information about the nature of inter-rater agreement when the number of raters is large and there are missing data. In this large study there is substantial variability between pathologists and uncertainty in the identity of the ‘true’ grade of many of the breast cancer tumours, a fact often ignored in clinical studies. PMID:18698346
Constructing a Bayesian network model for improving safety behavior of employees at workplaces.
Mohammadfam, Iraj; Ghasemi, Fakhradin; Kalatpour, Omid; Moghimbeigi, Abbas
2017-01-01
Unsafe behavior increases the risk of accident at workplaces and needs to be managed properly. The aim of the present study was to provide a model for managing and improving safety behavior of employees using the Bayesian networks approach. The study was conducted in several power plant construction projects in Iran. The data were collected using a questionnaire composed of nine factors, including management commitment, supporting environment, safety management system, employees' participation, safety knowledge, safety attitude, motivation, resource allocation, and work pressure. In order for measuring the score of each factor assigned by a responder, a measurement model was constructed for each of them. The Bayesian network was constructed using experts' opinions and Dempster-Shafer theory. Using belief updating, the best intervention strategies for improving safety behavior also were selected. The result of the present study demonstrated that the majority of employees do not tend to consider safety rules, regulation, procedures and norms in their behavior at the workplace. Safety attitude, safety knowledge, and supporting environment were the best predictor of safety behavior. Moreover, it was determined that instantaneous improvement of supporting environment and employee participation is the best strategy to reach a high proportion of safety behavior at the workplace. The lack of a comprehensive model that can be used for explaining safety behavior was one of the most problematic issues of the study. Furthermore, it can be concluded that belief updating is a unique feature of Bayesian networks that is very useful in comparing various intervention strategies and selecting the best one form them. Copyright © 2016 Elsevier Ltd. All rights reserved.
Tracing Asian Seabass Individuals to Single Fish Farms Using Microsatellites
Yue, Gen Hua; Xia, Jun Hong; Liu, Peng; Liu, Feng; Sun, Fei; Lin, Grace
2012-01-01
Traceability through physical labels is well established, but it is not highly reliable as physical labels can be easily changed or lost. Application of DNA markers to the traceability of food plays an increasingly important role for consumer protection and confidence building. In this study, we tested the efficiency of 16 polymorphic microsatellites and their combinations for tracing 368 fish to four populations where they originated. Using the maximum likelihood and Bayesian methods, three most efficient microsatellites were required to assign over 95% of fish to the correct populations. Selection of markers based on the assignment score estimated with the software WHICHLOCI was most effective in choosing markers for individual assignment, followed by the selection based on the allele number of individual markers. By combining rapid DNA extraction, and high-throughput genotyping of selected microsatellites, it is possible to conduct routine genetic traceability with high accuracy in Asian seabass. PMID:23285169
Zonta, Zivko J; Flotats, Xavier; Magrí, Albert
2014-08-01
The procedure commonly used for the assessment of the parameters included in activated sludge models (ASMs) relies on the estimation of their optimal value within a confidence region (i.e. frequentist inference). Once optimal values are estimated, parameter uncertainty is computed through the covariance matrix. However, alternative approaches based on the consideration of the model parameters as probability distributions (i.e. Bayesian inference), may be of interest. The aim of this work is to apply (and compare) both Bayesian and frequentist inference methods when assessing uncertainty for an ASM-type model, which considers intracellular storage and biomass growth, simultaneously. Practical identifiability was addressed exclusively considering respirometric profiles based on the oxygen uptake rate and with the aid of probabilistic global sensitivity analysis. Parameter uncertainty was thus estimated according to both the Bayesian and frequentist inferential procedures. Results were compared in order to evidence the strengths and weaknesses of both approaches. Since it was demonstrated that Bayesian inference could be reduced to a frequentist approach under particular hypotheses, the former can be considered as a more generalist methodology. Hence, the use of Bayesian inference is encouraged for tackling inferential issues in ASM environments.
Torruella, Guifré; Derelle, Romain; Paps, Jordi; Lang, B. Franz; Roger, Andrew J.; Shalchian-Tabrizi, Kamran; Ruiz-Trillo, Iñaki
2012-01-01
Many of the eukaryotic phylogenomic analyses published to date were based on alignments of hundreds to thousands of genes. Frequently, in such analyses, the most realistic evolutionary models currently available are often used to minimize the impact of systematic error. However, controversy remains over whether or not idiosyncratic gene family dynamics (i.e., gene duplications and losses) and incorrect orthology assignments are always appropriately taken into account. In this paper, we present an innovative strategy for overcoming orthology assignment problems. Rather than identifying and eliminating genes with paralogy problems, we have constructed a data set comprised exclusively of conserved single-copy protein domains that, unlike most of the commonly used phylogenomic data sets, should be less confounded by orthology miss-assignments. To evaluate the power of this approach, we performed maximum likelihood and Bayesian analyses to infer the evolutionary relationships within the opisthokonts (which includes Metazoa, Fungi, and related unicellular lineages). We used this approach to test 1) whether Filasterea and Ichthyosporea form a clade, 2) the interrelationships of early-branching metazoans, and 3) the relationships among early-branching fungi. We also assessed the impact of some methods that are known to minimize systematic error, including reducing the distance between the outgroup and ingroup taxa or using the CAT evolutionary model. Overall, our analyses support the Filozoa hypothesis in which Ichthyosporea are the first holozoan lineage to emerge followed by Filasterea, Choanoflagellata, and Metazoa. Blastocladiomycota appears as a lineage separate from Chytridiomycota, although this result is not strongly supported. These results represent independent tests of previous phylogenetic hypotheses, highlighting the importance of sophisticated approaches for orthology assignment in phylogenomic analyses. PMID:21771718
Zeng, Jianyang; Roberts, Kyle E.; Zhou, Pei
2011-01-01
Abstract A major bottleneck in protein structure determination via nuclear magnetic resonance (NMR) is the lengthy and laborious process of assigning resonances and nuclear Overhauser effect (NOE) cross peaks. Recent studies have shown that accurate backbone folds can be determined using sparse NMR data, such as residual dipolar couplings (RDCs) or backbone chemical shifts. This opens a question of whether we can also determine the accurate protein side-chain conformations using sparse or unassigned NMR data. We attack this question by using unassigned nuclear Overhauser effect spectroscopy (NOESY) data, which records the through-space dipolar interactions between protons nearby in three-dimensional (3D) space. We propose a Bayesian approach with a Markov random field (MRF) model to integrate the likelihood function derived from observed experimental data, with prior information (i.e., empirical molecular mechanics energies) about the protein structures. We unify the side-chain structure prediction problem with the side-chain structure determination problem using unassigned NMR data, and apply the deterministic dead-end elimination (DEE) and A* search algorithms to provably find the global optimum solution that maximizes the posterior probability. We employ a Hausdorff-based measure to derive the likelihood of a rotamer or a pairwise rotamer interaction from unassigned NOESY data. In addition, we apply a systematic and rigorous approach to estimate the experimental noise in NMR data, which also determines the weighting factor of the data term in the scoring function derived from the Bayesian framework. We tested our approach on real NMR data of three proteins: the FF Domain 2 of human transcription elongation factor CA150 (FF2), the B1 domain of Protein G (GB1), and human ubiquitin. The promising results indicate that our algorithm can be applied in high-resolution protein structure determination. Since our approach does not require any NOE assignment, it can accelerate the NMR structure determination process. PMID:21970619
Bayesian networks for maritime traffic accident prevention: benefits and challenges.
Hänninen, Maria
2014-12-01
Bayesian networks are quantitative modeling tools whose applications to the maritime traffic safety context are becoming more popular. This paper discusses the utilization of Bayesian networks in maritime safety modeling. Based on literature and the author's own experiences, the paper studies what Bayesian networks can offer to maritime accident prevention and safety modeling and discusses a few challenges in their application to this context. It is argued that the capability of representing rather complex, not necessarily causal but uncertain relationships makes Bayesian networks an attractive modeling tool for the maritime safety and accidents. Furthermore, as the maritime accident and safety data is still rather scarce and has some quality problems, the possibility to combine data with expert knowledge and the easy way of updating the model after acquiring more evidence further enhance their feasibility. However, eliciting the probabilities from the maritime experts might be challenging and the model validation can be tricky. It is concluded that with the utilization of several data sources, Bayesian updating, dynamic modeling, and hidden nodes for latent variables, Bayesian networks are rather well-suited tools for the maritime safety management and decision-making. Copyright © 2014 Elsevier Ltd. All rights reserved.
Bayesian estimation of differential transcript usage from RNA-seq data.
Papastamoulis, Panagiotis; Rattray, Magnus
2017-11-27
Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU) and targets changes in the relative within gene expression of a transcript. The contribution of this paper is to: (a) extend the use of cjBitSeq to the DTU context, a previously introduced Bayesian model which is originally designed for identifying changes in overall expression levels and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read based model and performs fully Bayesian inference by MCMC sampling on the space of latent state of each transcript per gene. BayesDRIMSeq is a count based model and estimates the Bayes Factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance with DRIMSeq in terms of precision/recall but offer better calibration of False Discovery Rate.
Curtis, Gary P.; Lu, Dan; Ye, Ming
2015-01-01
While Bayesian model averaging (BMA) has been widely used in groundwater modeling, it is infrequently applied to groundwater reactive transport modeling because of multiple sources of uncertainty in the coupled hydrogeochemical processes and because of the long execution time of each model run. To resolve these problems, this study analyzed different levels of uncertainty in a hierarchical way, and used the maximum likelihood version of BMA, i.e., MLBMA, to improve the computational efficiency. This study demonstrates the applicability of MLBMA to groundwater reactive transport modeling in a synthetic case in which twenty-seven reactive transport models were designed to predict the reactive transport of hexavalent uranium (U(VI)) based on observations at a former uranium mill site near Naturita, CO. These reactive transport models contain three uncertain model components, i.e., parameterization of hydraulic conductivity, configuration of model boundary, and surface complexation reactions that simulate U(VI) adsorption. These uncertain model components were aggregated into the alternative models by integrating a hierarchical structure into MLBMA. The modeling results of the individual models and MLBMA were analyzed to investigate their predictive performance. The predictive logscore results show that MLBMA generally outperforms the best model, suggesting that using MLBMA is a sound strategy to achieve more robust model predictions relative to a single model. MLBMA works best when the alternative models are structurally distinct and have diverse model predictions. When correlation in model structure exists, two strategies were used to improve predictive performance by retaining structurally distinct models or assigning smaller prior model probabilities to correlated models. Since the synthetic models were designed using data from the Naturita site, the results of this study are expected to provide guidance for real-world modeling. Limitations of applying MLBMA to the synthetic study and future real-world modeling are discussed.
Semisupervised learning using Bayesian interpretation: application to LS-SVM.
Adankon, Mathias M; Cheriet, Mohamed; Biem, Alain
2011-04-01
Bayesian reasoning provides an ideal basis for representing and manipulating uncertain knowledge, with the result that many interesting algorithms in machine learning are based on Bayesian inference. In this paper, we use the Bayesian approach with one and two levels of inference to model the semisupervised learning problem and give its application to the successful kernel classifier support vector machine (SVM) and its variant least-squares SVM (LS-SVM). Taking advantage of Bayesian interpretation of LS-SVM, we develop a semisupervised learning algorithm for Bayesian LS-SVM using our approach based on two levels of inference. Experimental results on both artificial and real pattern recognition problems show the utility of our method.
Bayesian Image Segmentations by Potts Prior and Loopy Belief Propagation
NASA Astrophysics Data System (ADS)
Tanaka, Kazuyuki; Kataoka, Shun; Yasuda, Muneki; Waizumi, Yuji; Hsu, Chiou-Ting
2014-12-01
This paper presents a Bayesian image segmentation model based on Potts prior and loopy belief propagation. The proposed Bayesian model involves several terms, including the pairwise interactions of Potts models, and the average vectors and covariant matrices of Gauss distributions in color image modeling. These terms are often referred to as hyperparameters in statistical machine learning theory. In order to determine these hyperparameters, we propose a new scheme for hyperparameter estimation based on conditional maximization of entropy in the Potts prior. The algorithm is given based on loopy belief propagation. In addition, we compare our conditional maximum entropy framework with the conventional maximum likelihood framework, and also clarify how the first order phase transitions in loopy belief propagations for Potts models influence our hyperparameter estimation procedures.
Adjaye-Gbewonyo, Dzifa; Bednarczyk, Robert A; Davis, Robert L; Omer, Saad B
2014-02-01
To validate classification of race/ethnicity based on the Bayesian Improved Surname Geocoding method (BISG) and assess variations in validity by gender and age. Secondary data on members of Kaiser Permanente Georgia, an integrated managed care organization, through 2010. For 191,494 members with self-reported race/ethnicity, probabilities for belonging to each of six race/ethnicity categories predicted from the BISG algorithm were used to assign individuals to a race/ethnicity category over a range of cutoffs greater than a probability of 0.50. Overall as well as gender- and age-stratified sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated. Receiver operating characteristic (ROC) curves were generated and used to identify optimal cutoffs for race/ethnicity assignment. The overall cutoffs for assignment that optimized sensitivity and specificity ranged from 0.50 to 0.57 for the four main racial/ethnic categories (White, Black, Asian/Pacific Islander, Hispanic). Corresponding sensitivity, specificity, PPV, and NPV ranged from 64.4 to 81.4 percent, 80.8 to 99.7 percent, 75.0 to 91.6 percent, and 79.4 to 98.0 percent, respectively. Accuracy of assignment was better among males and individuals of 65 years or older. BISG may be useful for classifying race/ethnicity of health plan members when needed for health care studies. © Health Research and Educational Trust.
Perceptual decision making: drift-diffusion model is equivalent to a Bayesian model
Bitzer, Sebastian; Park, Hame; Blankenburg, Felix; Kiebel, Stefan J.
2014-01-01
Behavioral data obtained with perceptual decision making experiments are typically analyzed with the drift-diffusion model. This parsimonious model accumulates noisy pieces of evidence toward a decision bound to explain the accuracy and reaction times of subjects. Recently, Bayesian models have been proposed to explain how the brain extracts information from noisy input as typically presented in perceptual decision making tasks. It has long been known that the drift-diffusion model is tightly linked with such functional Bayesian models but the precise relationship of the two mechanisms was never made explicit. Using a Bayesian model, we derived the equations which relate parameter values between these models. In practice we show that this equivalence is useful when fitting multi-subject data. We further show that the Bayesian model suggests different decision variables which all predict equal responses and discuss how these may be discriminated based on neural correlates of accumulated evidence. In addition, we discuss extensions to the Bayesian model which would be difficult to derive for the drift-diffusion model. We suggest that these and other extensions may be highly useful for deriving new experiments which test novel hypotheses. PMID:24616689
Nonparametric Bayesian Modeling for Automated Database Schema Matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferragut, Erik M; Laska, Jason A
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
Bayesian reconstruction and use of anatomical a priori information for emission tomography
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowsher, J.E.; Johnson, V.E.; Turkington, T.G.
1996-10-01
A Bayesian method is presented for simultaneously segmenting and reconstructing emission computed tomography (ECT) images and for incorporating high-resolution, anatomical information into those reconstructions. The anatomical information is often available from other imaging modalities such as computed tomography (CT) or magnetic resonance imaging (MRI). The Bayesian procedure models the ECT radiopharmaceutical distribution as consisting of regions, such that radiopharmaceutical activity is similar throughout each region. It estimates the number of regions, the mean activity of each region, and the region classification and mean activity of each voxel. Anatomical information is incorporated by assigning higher prior probabilities to ECT segmentations inmore » which each ECT region stays within a single anatomical region. This approach is effective because anatomical tissue type often strongly influences radiopharmaceutical uptake. The Bayesian procedure is evaluated using physically acquired single-photon emission computed tomography (SPECT) projection data and MRI for the three-dimensional (3-D) Hoffman brain phantom. A clinically realistic count level is used. A cold lesion within the brain phantom is created during the SPECT scan but not during the MRI to demonstrate that the estimation procedure can detect ECT structure that is not present anatomically.« less
Clearing Unexploded Ordnance: Bayesian Methodology for Assessing Success
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, K K.
2005-10-30
The Department of Defense has many Formerly Used Defense Sites (FUDS) that are slated for transfer for public use. Some sites have unexploded ordnance (UXO) that must be cleared prior to any land transfers. Sites are characterized using geophysical sensing devices and locations are identified where possible UXO may be located. In practice, based on the analysis of the geophysical surveys, a dig list of N suspect locations is created for a site that is possibly contaminated with UXO. The suspect locations on the dig list are often assigned into K bins ranging from ``most likely to contain UXO" tomore » ``least likely to be UXO" based on signal discrimination techniques and expert judgment. Usually all dig list locations are sampled to determine if UXO is present before the site is determined to be free of UXO. While this method is 100% certain to insure no UXO remains in the locations identified by the signal discrimination and expert judgment, it is very costly. This paper proposes a statistical Bayesian methodology that may result in digging less than 100% of the suspect locations to reach a pre-defined tolerable risk, where risk is defined in terms of a low probability that any UXO remains in the unsampled dig list locations. Two important features of a Bayesian approach are that it can account for uncertainties in model parameters and that it can handle data that becomes available in stages. The results from each stage of data can be used to direct the subsequent digs.« less
Orhan, U.; Erdogmus, D.; Roark, B.; Oken, B.; Purwar, S.; Hild, K. E.; Fowler, A.; Fried-Oken, M.
2013-01-01
RSVP Keyboard™ is an electroencephalography (EEG) based brain computer interface (BCI) typing system, designed as an assistive technology for the communication needs of people with locked-in syndrome (LIS). It relies on rapid serial visual presentation (RSVP) and does not require precise eye gaze control. Existing BCI typing systems which uses event related potentials (ERP) in EEG suffer from low accuracy due to low signal-to-noise ratio. Henceforth, RSVP Keyboard™ utilizes a context based decision making via incorporating a language model, to improve the accuracy of letter decisions. To further improve the contributions of the language model, we propose recursive Bayesian estimation, which relies on non-committing string decisions, and conduct an offline analysis, which compares it with the existing naïve Bayesian fusion approach. The results indicate the superiority of the recursive Bayesian fusion and in the next generation of RSVP Keyboard™ we plan to incorporate this new approach. PMID:23366432
2D Bayesian automated tilted-ring fitting of disc galaxies in large H I galaxy surveys: 2DBAT
NASA Astrophysics Data System (ADS)
Oh, Se-Heon; Staveley-Smith, Lister; Spekkens, Kristine; Kamphuis, Peter; Koribalski, Bärbel S.
2018-01-01
We present a novel algorithm based on a Bayesian method for 2D tilted-ring analysis of disc galaxy velocity fields. Compared to the conventional algorithms based on a chi-squared minimization procedure, this new Bayesian-based algorithm suffers less from local minima of the model parameters even with highly multimodal posterior distributions. Moreover, the Bayesian analysis, implemented via Markov Chain Monte Carlo sampling, only requires broad ranges of posterior distributions of the parameters, which makes the fitting procedure fully automated. This feature will be essential when performing kinematic analysis on the large number of resolved galaxies expected to be detected in neutral hydrogen (H I) surveys with the Square Kilometre Array and its pathfinders. The so-called 2D Bayesian Automated Tilted-ring fitter (2DBAT) implements Bayesian fits of 2D tilted-ring models in order to derive rotation curves of galaxies. We explore 2DBAT performance on (a) artificial H I data cubes built based on representative rotation curves of intermediate-mass and massive spiral galaxies, and (b) Australia Telescope Compact Array H I data from the Local Volume H I Survey. We find that 2DBAT works best for well-resolved galaxies with intermediate inclinations (20° < i < 70°), complementing 3D techniques better suited to modelling inclined galaxies.
A test of geographic assignment using isotope tracers in feathers of known origin
Wunder, Michael B.; Kester, C.L.; Knopf, F.L.; Rye, R.O.
2005-01-01
We used feathers of known origin collected from across the breeding range of a migratory shorebird to test the use of isotope tracers for assigning breeding origins. We analyzed δD, δ13C, and δ15N in feathers from 75 mountain plover (Charadrius montanus) chicks sampled in 2001 and from 119 chicks sampled in 2002. We estimated parameters for continuous-response inverse regression models and for discrete-response Bayesian probability models from data for each year independently. We evaluated model predictions with both the training data and by using the alternate year as an independent test dataset. Our results provide weak support for modeling latitude and isotope values as monotonic functions of one another, especially when data are pooled over known sources of variation such as sample year or location. We were unable to make even qualitative statements, such as north versus south, about the likely origin of birds using both δD and δ13C in inverse regression models; results were no better than random assignment. Probability models provided better results and a more natural framework for the problem. Correct assignment rates were highest when considering all three isotopes in the probability framework, but the use of even a single isotope was better than random assignment. The method appears relatively robust to temporal effects and is most sensitive to the isotope discrimination gradients over which samples are taken. We offer that the problem of using isotope tracers to infer geographic origin is best framed as one of assignment, rather than prediction.
Theory-based Bayesian Models of Inductive Inference
2010-07-19
Subjective randomness and natural scene statistics. Psychonomic Bulletin & Review . http://cocosci.berkeley.edu/tom/papers/randscenes.pdf Page 1...in press). Exemplar models as a mechanism for performing Bayesian inference. Psychonomic Bulletin & Review . http://cocosci.berkeley.edu/tom
Bayesian Modeling of a Human MMORPG Player
NASA Astrophysics Data System (ADS)
Synnaeve, Gabriel; Bessière, Pierre
2011-03-01
This paper describes an application of Bayesian programming to the control of an autonomous avatar in a multiplayer role-playing game (the example is based on World of Warcraft). We model a particular task, which consists of choosing what to do and to select which target in a situation where allies and foes are present. We explain the model in Bayesian programming and show how we could learn the conditional probabilities from data gathered during human-played sessions.
Working memory training in older adults: Bayesian evidence supporting the absence of transfer.
Guye, Sabrina; von Bastian, Claudia C
2017-12-01
The question of whether working memory training leads to generalized improvements in untrained cognitive abilities is a longstanding and heatedly debated one. Previous research provides mostly ambiguous evidence regarding the presence or absence of transfer effects in older adults. Thus, to draw decisive conclusions regarding the effectiveness of working memory training interventions, methodologically sound studies with larger sample sizes are needed. In this study, we investigated whether or not a computer-based working memory training intervention induced near and far transfer in a large sample of 142 healthy older adults (65 to 80 years). Therefore, we randomly assigned participants to either the experimental group, which completed 25 sessions of adaptive, process-based working memory training, or to the active, adaptive visual search control group. Bayesian linear mixed-effects models were used to estimate performance improvements on the level of abilities, using multiple indicator tasks for near (working memory) and far transfer (fluid intelligence, shifting, and inhibition). Our data provided consistent evidence supporting the absence of near transfer to untrained working memory tasks and the absence of far transfer effects to all of the assessed abilities. Our results suggest that working memory training is not an effective way to improve general cognitive functioning in old age. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Posterior Predictive Bayesian Phylogenetic Model Selection
Lewis, Paul O.; Xie, Wangang; Chen, Ming-Hui; Fan, Yu; Kuo, Lynn
2014-01-01
We present two distinctly different posterior predictive approaches to Bayesian phylogenetic model selection and illustrate these methods using examples from green algal protein-coding cpDNA sequences and flowering plant rDNA sequences. The Gelfand–Ghosh (GG) approach allows dissection of an overall measure of model fit into components due to posterior predictive variance (GGp) and goodness-of-fit (GGg), which distinguishes this method from the posterior predictive P-value approach. The conditional predictive ordinate (CPO) method provides a site-specific measure of model fit useful for exploratory analyses and can be combined over sites yielding the log pseudomarginal likelihood (LPML) which is useful as an overall measure of model fit. CPO provides a useful cross-validation approach that is computationally efficient, requiring only a sample from the posterior distribution (no additional simulation is required). Both GG and CPO add new perspectives to Bayesian phylogenetic model selection based on the predictive abilities of models and complement the perspective provided by the marginal likelihood (including Bayes Factor comparisons) based solely on the fit of competing models to observed data. [Bayesian; conditional predictive ordinate; CPO; L-measure; LPML; model selection; phylogenetics; posterior predictive.] PMID:24193892
Harsch, Tobias; Schneider, Philipp; Kieninger, Bärbel; Donaubauer, Harald; Kalbitzer, Hans Robert
2017-02-01
Side chain amide protons of asparagine and glutamine residues in random-coil peptides are characterized by large chemical shift differences and can be stereospecifically assigned on the basis of their chemical shift values only. The bimodal chemical shift distributions stored in the biological magnetic resonance data bank (BMRB) do not allow such an assignment. However, an analysis of the BMRB shows, that a substantial part of all stored stereospecific assignments is not correct. We show here that in most cases stereospecific assignment can also be done for folded proteins using an unbiased artificial chemical shift data base (UACSB). For a separation of the chemical shifts of the two amide resonance lines with differences ≥0.40 ppm for asparagine and differences ≥0.42 ppm for glutamine, the downfield shifted resonance lines can be assigned to H δ21 and H ε21 , respectively, at a confidence level >95%. A classifier derived from UASCB can also be used to correct the BMRB data. The program tool AssignmentChecker implemented in AUREMOL calculates the Bayesian probability for a given stereospecific assignment and automatically corrects the assignments for a given list of chemical shifts.
A local approach for focussed Bayesian fusion
NASA Astrophysics Data System (ADS)
Sander, Jennifer; Heizmann, Michael; Goussev, Igor; Beyerer, Jürgen
2009-04-01
Local Bayesian fusion approaches aim to reduce high storage and computational costs of Bayesian fusion which is separated from fixed modeling assumptions. Using the small world formalism, we argue why this proceeding is conform with Bayesian theory. Then, we concentrate on the realization of local Bayesian fusion by focussing the fusion process solely on local regions that are task relevant with a high probability. The resulting local models correspond then to restricted versions of the original one. In a previous publication, we used bounds for the probability of misleading evidence to show the validity of the pre-evaluation of task specific knowledge and prior information which we perform to build local models. In this paper, we prove the validity of this proceeding using information theoretic arguments. For additional efficiency, local Bayesian fusion can be realized in a distributed manner. Here, several local Bayesian fusion tasks are evaluated and unified after the actual fusion process. For the practical realization of distributed local Bayesian fusion, software agents are predestinated. There is a natural analogy between the resulting agent based architecture and criminal investigations in real life. We show how this analogy can be used to improve the efficiency of distributed local Bayesian fusion additionally. Using a landscape model, we present an experimental study of distributed local Bayesian fusion in the field of reconnaissance, which highlights its high potential.
An introduction to Bayesian statistics in health psychology.
Depaoli, Sarah; Rus, Holly M; Clifton, James P; van de Schoot, Rens; Tiemensma, Jitske
2017-09-01
The aim of the current article is to provide a brief introduction to Bayesian statistics within the field of health psychology. Bayesian methods are increasing in prevalence in applied fields, and they have been shown in simulation research to improve the estimation accuracy of structural equation models, latent growth curve (and mixture) models, and hierarchical linear models. Likewise, Bayesian methods can be used with small sample sizes since they do not rely on large sample theory. In this article, we discuss several important components of Bayesian statistics as they relate to health-based inquiries. We discuss the incorporation and impact of prior knowledge into the estimation process and the different components of the analysis that should be reported in an article. We present an example implementing Bayesian estimation in the context of blood pressure changes after participants experienced an acute stressor. We conclude with final thoughts on the implementation of Bayesian statistics in health psychology, including suggestions for reviewing Bayesian manuscripts and grant proposals. We have also included an extensive amount of online supplementary material to complement the content presented here, including Bayesian examples using many different software programmes and an extensive sensitivity analysis examining the impact of priors.
Bayesian estimation of seasonal course of canopy leaf area index from hyperspectral satellite data
NASA Astrophysics Data System (ADS)
Varvia, Petri; Rautiainen, Miina; Seppänen, Aku
2018-03-01
In this paper, Bayesian inversion of a physically-based forest reflectance model is investigated to estimate of boreal forest canopy leaf area index (LAI) from EO-1 Hyperion hyperspectral data. The data consist of multiple forest stands with different species compositions and structures, imaged in three phases of the growing season. The Bayesian estimates of canopy LAI are compared to reference estimates based on a spectral vegetation index. The forest reflectance model contains also other unknown variables in addition to LAI, for example leaf single scattering albedo and understory reflectance. In the Bayesian approach, these variables are estimated simultaneously with LAI. The feasibility and seasonal variation of these estimates is also examined. Credible intervals for the estimates are also calculated and evaluated. The results show that the Bayesian inversion approach is significantly better than using a comparable spectral vegetation index regression.
Tree Biomass Estimation of Chinese fir (Cunninghamia lanceolata) Based on Bayesian Method
Zhang, Jianguo
2013-01-01
Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) is the most important conifer species for timber production with huge distribution area in southern China. Accurate estimation of biomass is required for accounting and monitoring Chinese forest carbon stocking. In the study, allometric equation was used to analyze tree biomass of Chinese fir. The common methods for estimating allometric model have taken the classical approach based on the frequency interpretation of probability. However, many different biotic and abiotic factors introduce variability in Chinese fir biomass model, suggesting that parameters of biomass model are better represented by probability distributions rather than fixed values as classical method. To deal with the problem, Bayesian method was used for estimating Chinese fir biomass model. In the Bayesian framework, two priors were introduced: non-informative priors and informative priors. For informative priors, 32 biomass equations of Chinese fir were collected from published literature in the paper. The parameter distributions from published literature were regarded as prior distributions in Bayesian model for estimating Chinese fir biomass. Therefore, the Bayesian method with informative priors was better than non-informative priors and classical method, which provides a reasonable method for estimating Chinese fir biomass. PMID:24278198
Tree biomass estimation of Chinese fir (Cunninghamia lanceolata) based on Bayesian method.
Zhang, Xiongqing; Duan, Aiguo; Zhang, Jianguo
2013-01-01
Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) is the most important conifer species for timber production with huge distribution area in southern China. Accurate estimation of biomass is required for accounting and monitoring Chinese forest carbon stocking. In the study, allometric equation W = a(D2H)b was used to analyze tree biomass of Chinese fir. The common methods for estimating allometric model have taken the classical approach based on the frequency interpretation of probability. However, many different biotic and abiotic factors introduce variability in Chinese fir biomass model, suggesting that parameters of biomass model are better represented by probability distributions rather than fixed values as classical method. To deal with the problem, Bayesian method was used for estimating Chinese fir biomass model. In the Bayesian framework, two priors were introduced: non-informative priors and informative priors. For informative priors, 32 biomass equations of Chinese fir were collected from published literature in the paper. The parameter distributions from published literature were regarded as prior distributions in Bayesian model for estimating Chinese fir biomass. Therefore, the Bayesian method with informative priors was better than non-informative priors and classical method, which provides a reasonable method for estimating Chinese fir biomass.
A Hierarchical Bayesian Model for Calibrating Estimates of Species Divergence Times
Heath, Tracy A.
2012-01-01
In Bayesian divergence time estimation methods, incorporating calibrating information from the fossil record is commonly done by assigning prior densities to ancestral nodes in the tree. Calibration prior densities are typically parametric distributions offset by minimum age estimates provided by the fossil record. Specification of the parameters of calibration densities requires the user to quantify his or her prior knowledge of the age of the ancestral node relative to the age of its calibrating fossil. The values of these parameters can, potentially, result in biased estimates of node ages if they lead to overly informative prior distributions. Accordingly, determining parameter values that lead to adequate prior densities is not straightforward. In this study, I present a hierarchical Bayesian model for calibrating divergence time analyses with multiple fossil age constraints. This approach applies a Dirichlet process prior as a hyperprior on the parameters of calibration prior densities. Specifically, this model assumes that the rate parameters of exponential prior distributions on calibrated nodes are distributed according to a Dirichlet process, whereby the rate parameters are clustered into distinct parameter categories. Both simulated and biological data are analyzed to evaluate the performance of the Dirichlet process hyperprior. Compared with fixed exponential prior densities, the hierarchical Bayesian approach results in more accurate and precise estimates of internal node ages. When this hyperprior is applied using Markov chain Monte Carlo methods, the ages of calibrated nodes are sampled from mixtures of exponential distributions and uncertainty in the values of calibration density parameters is taken into account. PMID:22334343
Entanglement-enhanced Neyman-Pearson target detection using quantum illumination
NASA Astrophysics Data System (ADS)
Zhuang, Quntao; Zhang, Zheshen; Shapiro, Jeffrey H.
2017-08-01
Quantum illumination (QI) provides entanglement-based target detection---in an entanglement-breaking environment---whose performance is significantly better than that of optimum classical-illumination target detection. QI's performance advantage was established in a Bayesian setting with the target presumed equally likely to be absent or present and error probability employed as the performance metric. Radar theory, however, eschews that Bayesian approach, preferring the Neyman-Pearson performance criterion to avoid the difficulties of accurately assigning prior probabilities to target absence and presence and appropriate costs to false-alarm and miss errors. We have recently reported an architecture---based on sum-frequency generation (SFG) and feedforward (FF) processing---for minimum error-probability QI target detection with arbitrary prior probabilities for target absence and presence. In this paper, we use our results for FF-SFG reception to determine the receiver operating characteristic---detection probability versus false-alarm probability---for optimum QI target detection under the Neyman-Pearson criterion.
Neelon, Brian; Chang, Howard H; Ling, Qiang; Hastings, Nicole S
2016-12-01
Motivated by a study exploring spatiotemporal trends in emergency department use, we develop a class of two-part hurdle models for the analysis of zero-inflated areal count data. The models consist of two components-one for the probability of any emergency department use and one for the number of emergency department visits given use. Through a hierarchical structure, the models incorporate both patient- and region-level predictors, as well as spatially and temporally correlated random effects for each model component. The random effects are assigned multivariate conditionally autoregressive priors, which induce dependence between the components and provide spatial and temporal smoothing across adjacent spatial units and time periods, resulting in improved inferences. To accommodate potential overdispersion, we consider a range of parametric specifications for the positive counts, including truncated negative binomial and generalized Poisson distributions. We adopt a Bayesian inferential approach, and posterior computation is handled conveniently within standard Bayesian software. Our results indicate that the negative binomial and generalized Poisson hurdle models vastly outperform the Poisson hurdle model, demonstrating that overdispersed hurdle models provide a useful approach to analyzing zero-inflated spatiotemporal data. © The Author(s) 2014.
Boos, Moritz; Seer, Caroline; Lange, Florian; Kopp, Bruno
2016-01-01
Cognitive determinants of probabilistic inference were examined using hierarchical Bayesian modeling techniques. A classic urn-ball paradigm served as experimental strategy, involving a factorial two (prior probabilities) by two (likelihoods) design. Five computational models of cognitive processes were compared with the observed behavior. Parameter-free Bayesian posterior probabilities and parameter-free base rate neglect provided inadequate models of probabilistic inference. The introduction of distorted subjective probabilities yielded more robust and generalizable results. A general class of (inverted) S-shaped probability weighting functions had been proposed; however, the possibility of large differences in probability distortions not only across experimental conditions, but also across individuals, seems critical for the model's success. It also seems advantageous to consider individual differences in parameters of probability weighting as being sampled from weakly informative prior distributions of individual parameter values. Thus, the results from hierarchical Bayesian modeling converge with previous results in revealing that probability weighting parameters show considerable task dependency and individual differences. Methodologically, this work exemplifies the usefulness of hierarchical Bayesian modeling techniques for cognitive psychology. Theoretically, human probabilistic inference might be best described as the application of individualized strategic policies for Bayesian belief revision. PMID:27303323
NASA Astrophysics Data System (ADS)
Rajabi, Mohammad Mahdi; Ataie-Ashtiani, Behzad
2016-05-01
Bayesian inference has traditionally been conceived as the proper framework for the formal incorporation of expert knowledge in parameter estimation of groundwater models. However, conventional Bayesian inference is incapable of taking into account the imprecision essentially embedded in expert provided information. In order to solve this problem, a number of extensions to conventional Bayesian inference have been introduced in recent years. One of these extensions is 'fuzzy Bayesian inference' which is the result of integrating fuzzy techniques into Bayesian statistics. Fuzzy Bayesian inference has a number of desirable features which makes it an attractive approach for incorporating expert knowledge in the parameter estimation process of groundwater models: (1) it is well adapted to the nature of expert provided information, (2) it allows to distinguishably model both uncertainty and imprecision, and (3) it presents a framework for fusing expert provided information regarding the various inputs of the Bayesian inference algorithm. However an important obstacle in employing fuzzy Bayesian inference in groundwater numerical modeling applications is the computational burden, as the required number of numerical model simulations often becomes extremely exhaustive and often computationally infeasible. In this paper, a novel approach of accelerating the fuzzy Bayesian inference algorithm is proposed which is based on using approximate posterior distributions derived from surrogate modeling, as a screening tool in the computations. The proposed approach is first applied to a synthetic test case of seawater intrusion (SWI) in a coastal aquifer. It is shown that for this synthetic test case, the proposed approach decreases the number of required numerical simulations by an order of magnitude. Then the proposed approach is applied to a real-world test case involving three-dimensional numerical modeling of SWI in Kish Island, located in the Persian Gulf. An expert elicitation methodology is developed and applied to the real-world test case in order to provide a road map for the use of fuzzy Bayesian inference in groundwater modeling applications.
Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng
2017-05-10
Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
Evaluation of calibration efficacy under different levels of uncertainty
Heo, Yeonsook; Graziano, Diane J.; Guzowski, Leah; ...
2014-06-10
This study examines how calibration performs under different levels of uncertainty in model input data. It specifically assesses the efficacy of Bayesian calibration to enhance the reliability of EnergyPlus model predictions. A Bayesian approach can be used to update uncertain values of parameters, given measured energy-use data, and to quantify the associated uncertainty.We assess the efficacy of Bayesian calibration under a controlled virtual-reality setup, which enables rigorous validation of the accuracy of calibration results in terms of both calibrated parameter values and model predictions. Case studies demonstrate the performance of Bayesian calibration of base models developed from audit data withmore » differing levels of detail in building design, usage, and operation.« less
Archambeau, Cédric; Verleysen, Michel
2007-01-01
A new variational Bayesian learning algorithm for Student-t mixture models is introduced. This algorithm leads to (i) robust density estimation, (ii) robust clustering and (iii) robust automatic model selection. Gaussian mixture models are learning machines which are based on a divide-and-conquer approach. They are commonly used for density estimation and clustering tasks, but are sensitive to outliers. The Student-t distribution has heavier tails than the Gaussian distribution and is therefore less sensitive to any departure of the empirical distribution from Gaussianity. As a consequence, the Student-t distribution is suitable for constructing robust mixture models. In this work, we formalize the Bayesian Student-t mixture model as a latent variable model in a different way from Svensén and Bishop [Svensén, M., & Bishop, C. M. (2005). Robust Bayesian mixture modelling. Neurocomputing, 64, 235-252]. The main difference resides in the fact that it is not necessary to assume a factorized approximation of the posterior distribution on the latent indicator variables and the latent scale variables in order to obtain a tractable solution. Not neglecting the correlations between these unobserved random variables leads to a Bayesian model having an increased robustness. Furthermore, it is expected that the lower bound on the log-evidence is tighter. Based on this bound, the model complexity, i.e. the number of components in the mixture, can be inferred with a higher confidence.
Efficient Probabilistic Diagnostics for Electrical Power Systems
NASA Technical Reports Server (NTRS)
Mengshoel, Ole J.; Chavira, Mark; Cascio, Keith; Poll, Scott; Darwiche, Adnan; Uckun, Serdar
2008-01-01
We consider in this work the probabilistic approach to model-based diagnosis when applied to electrical power systems (EPSs). Our probabilistic approach is formally well-founded, as it based on Bayesian networks and arithmetic circuits. We investigate the diagnostic task known as fault isolation, and pay special attention to meeting two of the main challenges . model development and real-time reasoning . often associated with real-world application of model-based diagnosis technologies. To address the challenge of model development, we develop a systematic approach to representing electrical power systems as Bayesian networks, supported by an easy-to-use speci.cation language. To address the real-time reasoning challenge, we compile Bayesian networks into arithmetic circuits. Arithmetic circuit evaluation supports real-time diagnosis by being predictable and fast. In essence, we introduce a high-level EPS speci.cation language from which Bayesian networks that can diagnose multiple simultaneous failures are auto-generated, and we illustrate the feasibility of using arithmetic circuits, compiled from Bayesian networks, for real-time diagnosis on real-world EPSs of interest to NASA. The experimental system is a real-world EPS, namely the Advanced Diagnostic and Prognostic Testbed (ADAPT) located at the NASA Ames Research Center. In experiments with the ADAPT Bayesian network, which currently contains 503 discrete nodes and 579 edges, we .nd high diagnostic accuracy in scenarios where one to three faults, both in components and sensors, were inserted. The time taken to compute the most probable explanation using arithmetic circuits has a small mean of 0.2625 milliseconds and standard deviation of 0.2028 milliseconds. In experiments with data from ADAPT we also show that arithmetic circuit evaluation substantially outperforms joint tree propagation and variable elimination, two alternative algorithms for diagnosis using Bayesian network inference.
Bayesian data analysis in population ecology: motivations, methods, and benefits
Dorazio, Robert
2016-01-01
During the 20th century ecologists largely relied on the frequentist system of inference for the analysis of their data. However, in the past few decades ecologists have become increasingly interested in the use of Bayesian methods of data analysis. In this article I provide guidance to ecologists who would like to decide whether Bayesian methods can be used to improve their conclusions and predictions. I begin by providing a concise summary of Bayesian methods of analysis, including a comparison of differences between Bayesian and frequentist approaches to inference when using hierarchical models. Next I provide a list of problems where Bayesian methods of analysis may arguably be preferred over frequentist methods. These problems are usually encountered in analyses based on hierarchical models of data. I describe the essentials required for applying modern methods of Bayesian computation, and I use real-world examples to illustrate these methods. I conclude by summarizing what I perceive to be the main strengths and weaknesses of using Bayesian methods to solve ecological inference problems.
Liu, Fang; Eugenio, Evercita C
2018-04-01
Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review on beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing them with values close to zero/one via simulation studies; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.
Calibrating Bayesian Network Representations of Social-Behavioral Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whitney, Paul D.; Walsh, Stephen J.
2010-04-08
While human behavior has long been studied, recent and ongoing advances in computational modeling present opportunities for recasting research outcomes in human behavior. In this paper we describe how Bayesian networks can represent outcomes of human behavior research. We demonstrate a Bayesian network that represents political radicalization research – and show a corresponding visual representation of aspects of this research outcome. Since Bayesian networks can be quantitatively compared with external observations, the representation can also be used for empirical assessments of the research which the network summarizes. For a political radicalization model based on published research, we show this empiricalmore » comparison with data taken from the Minorities at Risk Organizational Behaviors database.« less
Costal vulnerability systems-network using Fuzzy and Bayesian approaches
NASA Astrophysics Data System (ADS)
Taramelli, A.; Valentini, E.; Filipponi, F.; Nguyen Xuan, A.; Arosio, M.
2016-12-01
Marine drivers such as surge in the context of SLR, are threatening low-lying coastal plains. In order to deal with disturbances a deeper understanding of benefits deriving from ecosystem services assesment, management and planning (e.g. the role of dune ridges in surge mitigation and climate adaptation) can enhance the resilience of coastal systems. In this frame assessing the vulnerability is a key concern of many SOS (social, ecological, institutional) that deals with several challenges like the definition of Essential Variables (EVs) able to synthesize the required information, the assignment of different weight to be attributed to each considered variable, the selection of method for combining the relevant variables, etc.. To this end it is unclear how SLR, subsidence and erosion might affect coastal subsistence resources because of highly complex interactions and because of the subjective system of weighting many variables and their interaction within the systems. In this contribution, making the best use of many EO products, in situ data and modelling, we propose a multidimensional surge vulnerability assessment that aims at combining together geophysical and socioeconomic variable on the base of different approaches: 1) Fuzzy Logic; 2) Bayesian approach. The final goal is providing insight in understanding how to quantify regulating ecosystem services.
Assessment of parametric uncertainty for groundwater reactive transport modeling,
Shi, Xiaoqing; Ye, Ming; Curtis, Gary P.; Miller, Geoffery L.; Meyer, Philip D.; Kohler, Matthias; Yabusaki, Steve; Wu, Jichun
2014-01-01
The validity of using Gaussian assumptions for model residuals in uncertainty quantification of a groundwater reactive transport model was evaluated in this study. Least squares regression methods explicitly assume Gaussian residuals, and the assumption leads to Gaussian likelihood functions, model parameters, and model predictions. While the Bayesian methods do not explicitly require the Gaussian assumption, Gaussian residuals are widely used. This paper shows that the residuals of the reactive transport model are non-Gaussian, heteroscedastic, and correlated in time; characterizing them requires using a generalized likelihood function such as the formal generalized likelihood function developed by Schoups and Vrugt (2010). For the surface complexation model considered in this study for simulating uranium reactive transport in groundwater, parametric uncertainty is quantified using the least squares regression methods and Bayesian methods with both Gaussian and formal generalized likelihood functions. While the least squares methods and Bayesian methods with Gaussian likelihood function produce similar Gaussian parameter distributions, the parameter distributions of Bayesian uncertainty quantification using the formal generalized likelihood function are non-Gaussian. In addition, predictive performance of formal generalized likelihood function is superior to that of least squares regression and Bayesian methods with Gaussian likelihood function. The Bayesian uncertainty quantification is conducted using the differential evolution adaptive metropolis (DREAM(zs)) algorithm; as a Markov chain Monte Carlo (MCMC) method, it is a robust tool for quantifying uncertainty in groundwater reactive transport models. For the surface complexation model, the regression-based local sensitivity analysis and Morris- and DREAM(ZS)-based global sensitivity analysis yield almost identical ranking of parameter importance. The uncertainty analysis may help select appropriate likelihood functions, improve model calibration, and reduce predictive uncertainty in other groundwater reactive transport and environmental modeling.
Using Bayesian Networks for Candidate Generation in Consistency-based Diagnosis
NASA Technical Reports Server (NTRS)
Narasimhan, Sriram; Mengshoel, Ole
2008-01-01
Consistency-based diagnosis relies heavily on the assumption that discrepancies between model predictions and sensor observations can be detected accurately. When sources of uncertainty like sensor noise and model abstraction exist robust schemes have to be designed to make a binary decision on whether predictions are consistent with observations. This risks the occurrence of false alarms and missed alarms when an erroneous decision is made. Moreover when multiple sensors (with differing sensing properties) are available the degree of match between predictions and observations can be used to guide the search for fault candidates. In this paper we propose a novel approach to handle this problem using Bayesian networks. In the consistency- based diagnosis formulation, automatically generated Bayesian networks are used to encode a probabilistic measure of fit between predictions and observations. A Bayesian network inference algorithm is used to compute most probable fault candidates.
Hu, Weiming; Tian, Guodong; Kang, Yongxin; Yuan, Chunfeng; Maybank, Stephen
2017-09-25
In this paper, a new nonparametric Bayesian model called the dual sticky hierarchical Dirichlet process hidden Markov model (HDP-HMM) is proposed for mining activities from a collection of time series data such as trajectories. All the time series data are clustered. Each cluster of time series data, corresponding to a motion pattern, is modeled by an HMM. Our model postulates a set of HMMs that share a common set of states (topics in an analogy with topic models for document processing), but have unique transition distributions. For the application to motion trajectory modeling, topics correspond to motion activities. The learnt topics are clustered into atomic activities which are assigned predicates. We propose a Bayesian inference method to decompose a given trajectory into a sequence of atomic activities. On combining the learnt sources and sinks, semantic motion regions, and the learnt sequence of atomic activities, the action represented by the trajectory can be described in natural language in as automatic a way as possible. The effectiveness of our dual sticky HDP-HMM is validated on several trajectory datasets. The effectiveness of the natural language descriptions for motions is demonstrated on the vehicle trajectories extracted from a traffic scene.
Ghosh, Sujit K
2010-01-01
Bayesian methods are rapidly becoming popular tools for making statistical inference in various fields of science including biology, engineering, finance, and genetics. One of the key aspects of Bayesian inferential method is its logical foundation that provides a coherent framework to utilize not only empirical but also scientific information available to a researcher. Prior knowledge arising from scientific background, expert judgment, or previously collected data is used to build a prior distribution which is then combined with current data via the likelihood function to characterize the current state of knowledge using the so-called posterior distribution. Bayesian methods allow the use of models of complex physical phenomena that were previously too difficult to estimate (e.g., using asymptotic approximations). Bayesian methods offer a means of more fully understanding issues that are central to many practical problems by allowing researchers to build integrated models based on hierarchical conditional distributions that can be estimated even with limited amounts of data. Furthermore, advances in numerical integration methods, particularly those based on Monte Carlo methods, have made it possible to compute the optimal Bayes estimators. However, there is a reasonably wide gap between the background of the empirically trained scientists and the full weight of Bayesian statistical inference. Hence, one of the goals of this chapter is to bridge the gap by offering elementary to advanced concepts that emphasize linkages between standard approaches and full probability modeling via Bayesian methods.
Bayesian models based on test statistics for multiple hypothesis testing problems.
Ji, Yuan; Lu, Yiling; Mills, Gordon B
2008-04-01
We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
A Bayesian Multi-Level Factor Analytic Model of Consumer Price Sensitivities across Categories
ERIC Educational Resources Information Center
Duvvuri, Sri Devi; Gruca, Thomas S.
2010-01-01
Identifying price sensitive consumers is an important problem in marketing. We develop a Bayesian multi-level factor analytic model of the covariation among household-level price sensitivities across product categories that are substitutes. Based on a multivariate probit model of category incidence, this framework also allows the researcher to…
Small Sample Properties of Bayesian Multivariate Autoregressive Time Series Models
ERIC Educational Resources Information Center
Price, Larry R.
2012-01-01
The aim of this study was to compare the small sample (N = 1, 3, 5, 10, 15) performance of a Bayesian multivariate vector autoregressive (BVAR-SEM) time series model relative to frequentist power and parameter estimation bias. A multivariate autoregressive model was developed based on correlated autoregressive time series vectors of varying…
A Sparse Bayesian Approach for Forward-Looking Superresolution Radar Imaging
Zhang, Yin; Zhang, Yongchao; Huang, Yulin; Yang, Jianyu
2017-01-01
This paper presents a sparse superresolution approach for high cross-range resolution imaging of forward-looking scanning radar based on the Bayesian criterion. First, a novel forward-looking signal model is established as the product of the measurement matrix and the cross-range target distribution, which is more accurate than the conventional convolution model. Then, based on the Bayesian criterion, the widely-used sparse regularization is considered as the penalty term to recover the target distribution. The derivation of the cost function is described, and finally, an iterative expression for minimizing this function is presented. Alternatively, this paper discusses how to estimate the single parameter of Gaussian noise. With the advantage of a more accurate model, the proposed sparse Bayesian approach enjoys a lower model error. Meanwhile, when compared with the conventional superresolution methods, the proposed approach shows high cross-range resolution and small location error. The superresolution results for the simulated point target, scene data, and real measured data are presented to demonstrate the superior performance of the proposed approach. PMID:28604583
Incorporating approximation error in surrogate based Bayesian inversion
NASA Astrophysics Data System (ADS)
Zhang, J.; Zeng, L.; Li, W.; Wu, L.
2015-12-01
There are increasing interests in applying surrogates for inverse Bayesian modeling to reduce repetitive evaluations of original model. In this way, the computational cost is expected to be saved. However, the approximation error of surrogate model is usually overlooked. This is partly because that it is difficult to evaluate the approximation error for many surrogates. Previous studies have shown that, the direct combination of surrogates and Bayesian methods (e.g., Markov Chain Monte Carlo, MCMC) may lead to biased estimations when the surrogate cannot emulate the highly nonlinear original system. This problem can be alleviated by implementing MCMC in a two-stage manner. However, the computational cost is still high since a relatively large number of original model simulations are required. In this study, we illustrate the importance of incorporating approximation error in inverse Bayesian modeling. Gaussian process (GP) is chosen to construct the surrogate for its convenience in approximation error evaluation. Numerical cases of Bayesian experimental design and parameter estimation for contaminant source identification are used to illustrate this idea. It is shown that, once the surrogate approximation error is well incorporated into Bayesian framework, promising results can be obtained even when the surrogate is directly used, and no further original model simulations are required.
Additive Genetic Variability and the Bayesian Alphabet
Gianola, Daniel; de los Campos, Gustavo; Hill, William G.; Manfredi, Eduardo; Fernando, Rohan
2009-01-01
The use of all available molecular markers in statistical models for prediction of quantitative traits has led to what could be termed a genomic-assisted selection paradigm in animal and plant breeding. This article provides a critical review of some theoretical and statistical concepts in the context of genomic-assisted genetic evaluation of animals and crops. First, relationships between the (Bayesian) variance of marker effects in some regression models and additive genetic variance are examined under standard assumptions. Second, the connection between marker genotypes and resemblance between relatives is explored, and linkages between a marker-based model and the infinitesimal model are reviewed. Third, issues associated with the use of Bayesian models for marker-assisted selection, with a focus on the role of the priors, are examined from a theoretical angle. The sensitivity of a Bayesian specification that has been proposed (called “Bayes A”) with respect to priors is illustrated with a simulation. Methods that can solve potential shortcomings of some of these Bayesian regression procedures are discussed briefly. PMID:19620397
Protein construct storage: Bayesian variable selection and prediction with mixtures.
Clyde, M A; Parmigiani, G
1998-07-01
Determining optimal conditions for protein storage while maintaining a high level of protein activity is an important question in pharmaceutical research. A designed experiment based on a space-filling design was conducted to understand the effects of factors affecting protein storage and to establish optimal storage conditions. Different model-selection strategies to identify important factors may lead to very different answers about optimal conditions. Uncertainty about which factors are important, or model uncertainty, can be a critical issue in decision-making. We use Bayesian variable selection methods for linear models to identify important variables in the protein storage data, while accounting for model uncertainty. We also use the Bayesian framework to build predictions based on a large family of models, rather than an individual model, and to evaluate the probability that certain candidate storage conditions are optimal.
Bayesian estimation inherent in a Mexican-hat-type neural network
NASA Astrophysics Data System (ADS)
Takiyama, Ken
2016-05-01
Brain functions, such as perception, motor control and learning, and decision making, have been explained based on a Bayesian framework, i.e., to decrease the effects of noise inherent in the human nervous system or external environment, our brain integrates sensory and a priori information in a Bayesian optimal manner. However, it remains unclear how Bayesian computations are implemented in the brain. Herein, I address this issue by analyzing a Mexican-hat-type neural network, which was used as a model of the visual cortex, motor cortex, and prefrontal cortex. I analytically demonstrate that the dynamics of an order parameter in the model corresponds exactly to a variational inference of a linear Gaussian state-space model, a Bayesian estimation, when the strength of recurrent synaptic connectivity is appropriately stronger than that of an external stimulus, a plausible condition in the brain. This exact correspondence can reveal the relationship between the parameters in the Bayesian estimation and those in the neural network, providing insight for understanding brain functions.
MapReduce Based Parallel Bayesian Network for Manufacturing Quality Control
NASA Astrophysics Data System (ADS)
Zheng, Mao-Kuan; Ming, Xin-Guo; Zhang, Xian-Yu; Li, Guo-Ming
2017-09-01
Increasing complexity of industrial products and manufacturing processes have challenged conventional statistics based quality management approaches in the circumstances of dynamic production. A Bayesian network and big data analytics integrated approach for manufacturing process quality analysis and control is proposed. Based on Hadoop distributed architecture and MapReduce parallel computing model, big volume and variety quality related data generated during the manufacturing process could be dealt with. Artificial intelligent algorithms, including Bayesian network learning, classification and reasoning, are embedded into the Reduce process. Relying on the ability of the Bayesian network in dealing with dynamic and uncertain problem and the parallel computing power of MapReduce, Bayesian network of impact factors on quality are built based on prior probability distribution and modified with posterior probability distribution. A case study on hull segment manufacturing precision management for ship and offshore platform building shows that computing speed accelerates almost directly proportionally to the increase of computing nodes. It is also proved that the proposed model is feasible for locating and reasoning of root causes, forecasting of manufacturing outcome, and intelligent decision for precision problem solving. The integration of bigdata analytics and BN method offers a whole new perspective in manufacturing quality control.
Computational statistics using the Bayesian Inference Engine
NASA Astrophysics Data System (ADS)
Weinberg, Martin D.
2013-09-01
This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.
Hippert, Henrique S; Taylor, James W
2010-04-01
Artificial neural networks have frequently been proposed for electricity load forecasting because of their capabilities for the nonlinear modelling of large multivariate data sets. Modelling with neural networks is not an easy task though; two of the main challenges are defining the appropriate level of model complexity, and choosing the input variables. This paper evaluates techniques for automatic neural network modelling within a Bayesian framework, as applied to six samples containing daily load and weather data for four different countries. We analyse input selection as carried out by the Bayesian 'automatic relevance determination', and the usefulness of the Bayesian 'evidence' for the selection of the best structure (in terms of number of neurones), as compared to methods based on cross-validation. Copyright 2009 Elsevier Ltd. All rights reserved.
Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.
Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu
2018-05-01
Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, that contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the art diarization algorithms.
Kimberley K. Ayre; Wayne G. Landis
2012-01-01
We present a Bayesian network model based on the ecological risk assessment framework to evaluate potential impacts to habitats and resources resulting from wildfire, grazing, forest management activities, and insect outbreaks in a forested landscape in northeastern Oregon. The Bayesian network structure consisted of three tiers of nodes: landscape disturbances,...
Hierarchical Bayesian Models of Subtask Learning
ERIC Educational Resources Information Center
Anglim, Jeromy; Wynton, Sarah K. A.
2015-01-01
The current study used Bayesian hierarchical methods to challenge and extend previous work on subtask learning consistency. A general model of individual-level subtask learning was proposed focusing on power and exponential functions with constraints to test for inconsistency. To study subtask learning, we developed a novel computer-based booking…
CDMBE: A Case Description Model Based on Evidence
Zhu, Jianlin; Yang, Xiaoping; Zhou, Jing
2015-01-01
By combining the advantages of argument map and Bayesian network, a case description model based on evidence (CDMBE), which is suitable to continental law system, is proposed to describe the criminal cases. The logic of the model adopts the credibility logical reason and gets evidence-based reasoning quantitatively based on evidences. In order to consist with practical inference rules, five types of relationship and a set of rules are defined to calculate the credibility of assumptions based on the credibility and supportability of the related evidences. Experiments show that the model can get users' ideas into a figure and the results calculated from CDMBE are in line with those from Bayesian model. PMID:26421006
Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel
2012-09-25
Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.
2012-01-01
Background Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Results Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Conclusions Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs. PMID:23009363
A Gaussian random field model for similarity-based smoothing in Bayesian disease mapping.
Baptista, Helena; Mendes, Jorge M; MacNab, Ying C; Xavier, Miguel; Caldas-de-Almeida, José
2016-08-01
Conditionally specified Gaussian Markov random field (GMRF) models with adjacency-based neighbourhood weight matrix, commonly known as neighbourhood-based GMRF models, have been the mainstream approach to spatial smoothing in Bayesian disease mapping. In the present paper, we propose a conditionally specified Gaussian random field (GRF) model with a similarity-based non-spatial weight matrix to facilitate non-spatial smoothing in Bayesian disease mapping. The model, named similarity-based GRF, is motivated for modelling disease mapping data in situations where the underlying small area relative risks and the associated determinant factors do not vary systematically in space, and the similarity is defined by "similarity" with respect to the associated disease determinant factors. The neighbourhood-based GMRF and the similarity-based GRF are compared and accessed via a simulation study and by two case studies, using new data on alcohol abuse in Portugal collected by the World Mental Health Survey Initiative and the well-known lip cancer data in Scotland. In the presence of disease data with no evidence of positive spatial correlation, the simulation study showed a consistent gain in efficiency from the similarity-based GRF, compared with the adjacency-based GMRF with the determinant risk factors as covariate. This new approach broadens the scope of the existing conditional autocorrelation models. © The Author(s) 2016.
NASA Astrophysics Data System (ADS)
Kim, Seongryong; Tkalčić, Hrvoje; Mustać, Marija; Rhie, Junkee; Ford, Sean
2016-04-01
A framework is presented within which we provide rigorous estimations for seismic sources and structures in the Northeast Asia. We use Bayesian inversion methods, which enable statistical estimations of models and their uncertainties based on data information. Ambiguities in error statistics and model parameterizations are addressed by hierarchical and trans-dimensional (trans-D) techniques, which can be inherently implemented in the Bayesian inversions. Hence reliable estimation of model parameters and their uncertainties is possible, thus avoiding arbitrary regularizations and parameterizations. Hierarchical and trans-D inversions are performed to develop a three-dimensional velocity model using ambient noise data. To further improve the model, we perform joint inversions with receiver function data using a newly developed Bayesian method. For the source estimation, a novel moment tensor inversion method is presented and applied to regional waveform data of the North Korean nuclear explosion tests. By the combination of new Bayesian techniques and the structural model, coupled with meaningful uncertainties related to each of the processes, more quantitative monitoring and discrimination of seismic events is possible.
ERIC Educational Resources Information Center
Vos, Hans J.
An approach to simultaneous optimization of assignments of subjects to treatments followed by an end-of-mastery test is presented using the framework of Bayesian decision theory. Focus is on demonstrating how rules for the simultaneous optimization of sequences of decisions can be found. The main advantages of the simultaneous approach, compared…
Yu, Rongjie; Abdel-Aty, Mohamed
2013-07-01
The Bayesian inference method has been frequently adopted to develop safety performance functions. One advantage of the Bayesian inference is that prior information for the independent variables can be included in the inference procedures. However, there are few studies that discussed how to formulate informative priors for the independent variables and evaluated the effects of incorporating informative priors in developing safety performance functions. This paper addresses this deficiency by introducing four approaches of developing informative priors for the independent variables based on historical data and expert experience. Merits of these informative priors have been tested along with two types of Bayesian hierarchical models (Poisson-gamma and Poisson-lognormal models). Deviance information criterion (DIC), R-square values, and coefficients of variance for the estimations were utilized as evaluation measures to select the best model(s). Comparison across the models indicated that the Poisson-gamma model is superior with a better model fit and it is much more robust with the informative priors. Moreover, the two-stage Bayesian updating informative priors provided the best goodness-of-fit and coefficient estimation accuracies. Furthermore, informative priors for the inverse dispersion parameter have also been introduced and tested. Different types of informative priors' effects on the model estimations and goodness-of-fit have been compared and concluded. Finally, based on the results, recommendations for future research topics and study applications have been made. Copyright © 2013 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jiangjiang; Li, Weixuan; Zeng, Lingzao
Surrogate models are commonly used in Bayesian approaches such as Markov Chain Monte Carlo (MCMC) to avoid repetitive CPU-demanding model evaluations. However, the approximation error of a surrogate may lead to biased estimations of the posterior distribution. This bias can be corrected by constructing a very accurate surrogate or implementing MCMC in a two-stage manner. Since the two-stage MCMC requires extra original model evaluations, the computational cost is still high. If the information of measurement is incorporated, a locally accurate approximation of the original model can be adaptively constructed with low computational cost. Based on this idea, we propose amore » Gaussian process (GP) surrogate-based Bayesian experimental design and parameter estimation approach for groundwater contaminant source identification problems. A major advantage of the GP surrogate is that it provides a convenient estimation of the approximation error, which can be incorporated in the Bayesian formula to avoid over-confident estimation of the posterior distribution. The proposed approach is tested with a numerical case study. Without sacrificing the estimation accuracy, the new approach achieves about 200 times of speed-up compared to our previous work using two-stage MCMC.« less
NASA Astrophysics Data System (ADS)
Melendez, Jordan; Wesolowski, Sarah; Furnstahl, Dick
2017-09-01
Chiral effective field theory (EFT) predictions are necessarily truncated at some order in the EFT expansion, which induces an error that must be quantified for robust statistical comparisons to experiment. A Bayesian model yields posterior probability distribution functions for these errors based on expectations of naturalness encoded in Bayesian priors and the observed order-by-order convergence pattern of the EFT. As a general example of a statistical approach to truncation errors, the model was applied to chiral EFT for neutron-proton scattering using various semi-local potentials of Epelbaum, Krebs, and Meißner (EKM). Here we discuss how our model can learn correlation information from the data and how to perform Bayesian model checking to validate that the EFT is working as advertised. Supported in part by NSF PHY-1614460 and DOE NUCLEI SciDAC DE-SC0008533.
Automated side-chain model building and sequence assignment by template matching.
Terwilliger, Thomas C
2003-01-01
An algorithm is described for automated building of side chains in an electron-density map once a main-chain model is built and for alignment of the protein sequence to the map. The procedure is based on a comparison of electron density at the expected side-chain positions with electron-density templates. The templates are constructed from average amino-acid side-chain densities in 574 refined protein structures. For each contiguous segment of main chain, a matrix with entries corresponding to an estimate of the probability that each of the 20 amino acids is located at each position of the main-chain model is obtained. The probability that this segment corresponds to each possible alignment with the sequence of the protein is estimated using a Bayesian approach and high-confidence matches are kept. Once side-chain identities are determined, the most probable rotamer for each side chain is built into the model. The automated procedure has been implemented in the RESOLVE software. Combined with automated main-chain model building, the procedure produces a preliminary model suitable for refinement and extension by an experienced crystallographer.
Traffic Video Image Segmentation Model Based on Bayesian and Spatio-Temporal Markov Random Field
NASA Astrophysics Data System (ADS)
Zhou, Jun; Bao, Xu; Li, Dawei; Yin, Yongwen
2017-10-01
Traffic video image is a kind of dynamic image and its background and foreground is changed at any time, which results in the occlusion. In this case, using the general method is more difficult to get accurate image segmentation. A segmentation algorithm based on Bayesian and Spatio-Temporal Markov Random Field is put forward, which respectively build the energy function model of observation field and label field to motion sequence image with Markov property, then according to Bayesian' rule, use the interaction of label field and observation field, that is the relationship of label field’s prior probability and observation field’s likelihood probability, get the maximum posterior probability of label field’s estimation parameter, use the ICM model to extract the motion object, consequently the process of segmentation is finished. Finally, the segmentation methods of ST - MRF and the Bayesian combined with ST - MRF were analyzed. Experimental results: the segmentation time in Bayesian combined with ST-MRF algorithm is shorter than in ST-MRF, and the computing workload is small, especially in the heavy traffic dynamic scenes the method also can achieve better segmentation effect.
A Dynamic Bayesian Network Based Structural Learning towards Automated Handwritten Digit Recognition
NASA Astrophysics Data System (ADS)
Pauplin, Olivier; Jiang, Jianmin
Pattern recognition using Dynamic Bayesian Networks (DBNs) is currently a growing area of study. In this paper, we present DBN models trained for classification of handwritten digit characters. The structure of these models is partly inferred from the training data of each class of digit before performing parameter learning. Classification results are presented for the four described models.
Hosseini, Marjan; Kerachian, Reza
2017-09-01
This paper presents a new methodology for analyzing the spatiotemporal variability of water table levels and redesigning a groundwater level monitoring network (GLMN) using the Bayesian Maximum Entropy (BME) technique and a multi-criteria decision-making approach based on ordered weighted averaging (OWA). The spatial sampling is determined using a hexagonal gridding pattern and a new method, which is proposed to assign a removal priority number to each pre-existing station. To design temporal sampling, a new approach is also applied to consider uncertainty caused by lack of information. In this approach, different time lag values are tested by regarding another source of information, which is simulation result of a numerical groundwater flow model. Furthermore, to incorporate the existing uncertainties in available monitoring data, the flexibility of the BME interpolation technique is taken into account in applying soft data and improving the accuracy of the calculations. To examine the methodology, it is applied to the Dehgolan plain in northwestern Iran. Based on the results, a configuration of 33 monitoring stations for a regular hexagonal grid of side length 3600 m is proposed, in which the time lag between samples is equal to 5 weeks. Since the variance estimation errors of the BME method are almost identical for redesigned and existing networks, the redesigned monitoring network is more cost-effective and efficient than the existing monitoring network with 52 stations and monthly sampling frequency.
Technical note: Bayesian calibration of dynamic ruminant nutrition models.
Reed, K F; Arhonditsis, G B; France, J; Kebreab, E
2016-08-01
Mechanistic models of ruminant digestion and metabolism have advanced our understanding of the processes underlying ruminant animal physiology. Deterministic modeling practices ignore the inherent variation within and among individual animals and thus have no way to assess how sources of error influence model outputs. We introduce Bayesian calibration of mathematical models to address the need for robust mechanistic modeling tools that can accommodate error analysis by remaining within the bounds of data-based parameter estimation. For the purpose of prediction, the Bayesian approach generates a posterior predictive distribution that represents the current estimate of the value of the response variable, taking into account both the uncertainty about the parameters and model residual variability. Predictions are expressed as probability distributions, thereby conveying significantly more information than point estimates in regard to uncertainty. Our study illustrates some of the technical advantages of Bayesian calibration and discusses the future perspectives in the context of animal nutrition modeling. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Wang, Shi-Heng; Chen, Wei J; Tsai, Yu-Chin; Huang, Yung-Hsiang; Hwu, Hai-Gwo; Hsiao, Chuhsing K
2013-01-01
The copy number variation (CNV) is a type of genetic variation in the genome. It is measured based on signal intensity measures and can be assessed repeatedly to reduce the uncertainty in PCR-based typing. Studies have shown that CNVs may lead to phenotypic variation and modification of disease expression. Various challenges exist, however, in the exploration of CNV-disease association. Here we construct latent variables to infer the discrete CNV values and to estimate the probability of mutations. In addition, we propose to pool rare variants to increase the statistical power and we conduct family studies to mitigate the computational burden in determining the composition of CNVs on each chromosome. To explore in a stochastic sense the association between the collapsing CNV variants and disease status, we utilize a Bayesian hierarchical model incorporating the mutation parameters. This model assigns integers in a probabilistic sense to the quantitatively measured copy numbers, and is able to test simultaneously the association for all variants of interest in a regression framework. This integrative model can account for the uncertainty in copy number assignment and differentiate if the variation was de novo or inherited on the basis of posterior probabilities. For family studies, this model can accommodate the dependence within family members and among repeated CNV data. Moreover, the Mendelian rule can be assumed under this model and yet the genetic variation, including de novo and inherited variation, can still be included and quantified directly for each individual. Finally, simulation studies show that this model has high true positive and low false positive rates in the detection of de novo mutation.
Technical Note: Approximate Bayesian parameterization of a process-based tropical forest model
NASA Astrophysics Data System (ADS)
Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.
2014-02-01
Inverse parameter estimation of process-based models is a long-standing problem in many scientific disciplines. A key question for inverse parameter estimation is how to define the metric that quantifies how well model predictions fit to the data. This metric can be expressed by general cost or objective functions, but statistical inversion methods require a particular metric, the probability of observing the data given the model parameters, known as the likelihood. For technical and computational reasons, likelihoods for process-based stochastic models are usually based on general assumptions about variability in the observed data, and not on the stochasticity generated by the model. Only in recent years have new methods become available that allow the generation of likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional Markov chain Monte Carlo (MCMC) sampler, performs well in retrieving known parameter values from virtual inventory data generated by the forest model. We analyze the results of the parameter estimation, examine its sensitivity to the choice and aggregation of model outputs and observed data (summary statistics), and demonstrate the application of this method by fitting the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss how this approach differs from approximate Bayesian computation (ABC), another method commonly used to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can be successfully applied to process-based models of high complexity. The methodology is particularly suitable for heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models.
Bayesian outcome-based strategy classification.
Lee, Michael D
2016-03-01
Hilbig and Moshagen (Psychonomic Bulletin & Review, 21, 1431-1443, 2014) recently developed a method for making inferences about the decision processes people use in multi-attribute forced choice tasks. Their paper makes a number of worthwhile theoretical and methodological contributions. Theoretically, they provide an insightful psychological motivation for a probabilistic extension of the widely-used "weighted additive" (WADD) model, and show how this model, as well as other important models like "take-the-best" (TTB), can and should be expressed in terms of meaningful priors. Methodologically, they develop an inference approach based on the Minimum Description Length (MDL) principles that balances both the goodness-of-fit and complexity of the decision models they consider. This paper aims to preserve these useful contributions, but provide a complementary Bayesian approach with some theoretical and methodological advantages. We develop a simple graphical model, implemented in JAGS, that allows for fully Bayesian inferences about which models people use to make decisions. To demonstrate the Bayesian approach, we apply it to the models and data considered by Hilbig and Moshagen (Psychonomic Bulletin & Review, 21, 1431-1443, 2014), showing how a prior predictive analysis of the models, and posterior inferences about which models people use and the parameter settings at which they use them, can contribute to our understanding of human decision making.
Optimal speech motor control and token-to-token variability: a Bayesian modeling approach.
Patri, Jean-François; Diard, Julien; Perrier, Pascal
2015-12-01
The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.
Bayesian Structural Equation Modeling: A More Flexible Representation of Substantive Theory
ERIC Educational Resources Information Center
Muthen, Bengt; Asparouhov, Tihomir
2012-01-01
This article proposes a new approach to factor analysis and structural equation modeling using Bayesian analysis. The new approach replaces parameter specifications of exact zeros with approximate zeros based on informative, small-variance priors. It is argued that this produces an analysis that better reflects substantive theories. The proposed…
Lu, Dan; Ye, Ming; Curtis, Gary P.
2015-08-01
While Bayesian model averaging (BMA) has been widely used in groundwater modeling, it is infrequently applied to groundwater reactive transport modeling because of multiple sources of uncertainty in the coupled hydrogeochemical processes and because of the long execution time of each model run. To resolve these problems, this study analyzed different levels of uncertainty in a hierarchical way, and used the maximum likelihood version of BMA, i.e., MLBMA, to improve the computational efficiency. Our study demonstrates the applicability of MLBMA to groundwater reactive transport modeling in a synthetic case in which twenty-seven reactive transport models were designed to predict themore » reactive transport of hexavalent uranium (U(VI)) based on observations at a former uranium mill site near Naturita, CO. Moreover, these reactive transport models contain three uncertain model components, i.e., parameterization of hydraulic conductivity, configuration of model boundary, and surface complexation reactions that simulate U(VI) adsorption. These uncertain model components were aggregated into the alternative models by integrating a hierarchical structure into MLBMA. The modeling results of the individual models and MLBMA were analyzed to investigate their predictive performance. The predictive logscore results show that MLBMA generally outperforms the best model, suggesting that using MLBMA is a sound strategy to achieve more robust model predictions relative to a single model. MLBMA works best when the alternative models are structurally distinct and have diverse model predictions. When correlation in model structure exists, two strategies were used to improve predictive performance by retaining structurally distinct models or assigning smaller prior model probabilities to correlated models. Since the synthetic models were designed using data from the Naturita site, the results of this study are expected to provide guidance for real-world modeling. Finally, limitations of applying MLBMA to the synthetic study and future real-world modeling are discussed.« less
Sequential Inverse Problems Bayesian Principles and the Logistic Map Example
NASA Astrophysics Data System (ADS)
Duan, Lian; Farmer, Chris L.; Moroz, Irene M.
2010-09-01
Bayesian statistics provides a general framework for solving inverse problems, but is not without interpretation and implementation problems. This paper discusses difficulties arising from the fact that forward models are always in error to some extent. Using a simple example based on the one-dimensional logistic map, we argue that, when implementation problems are minimal, the Bayesian framework is quite adequate. In this paper the Bayesian Filter is shown to be able to recover excellent state estimates in the perfect model scenario (PMS) and to distinguish the PMS from the imperfect model scenario (IMS). Through a quantitative comparison of the way in which the observations are assimilated in both the PMS and the IMS scenarios, we suggest that one can, sometimes, measure the degree of imperfection.
Bayesian module identification from multiple noisy networks.
Zamani Dadaneh, Siamak; Qian, Xiaoning
2016-12-01
Module identification has been studied extensively in order to gain deeper understanding of complex systems, such as social networks as well as biological networks. Modules are often defined as groups of vertices in these networks that are topologically cohesive with similar interaction patterns with the rest of the vertices. Most of the existing module identification algorithms assume that the given networks are faithfully measured without errors. However, in many real-world applications, for example, when analyzing protein-protein interaction networks from high-throughput profiling techniques, there is significant noise with both false positive and missing links between vertices. In this paper, we propose a new model for more robust module identification by taking advantage of multiple observed networks with significant noise so that signals in multiple networks can be strengthened and help improve the solution quality by combining information from various sources. We adopt a hierarchical Bayesian model to integrate multiple noisy snapshots that capture the underlying modular structure of the networks under study. By introducing a latent root assignment matrix and its relations to instantaneous module assignments in all the observed networks to capture the underlying modular structure and combine information across multiple networks, an efficient variational Bayes algorithm can be derived to accurately and robustly identify the underlying modules from multiple noisy networks. Experiments on synthetic and protein-protein interaction data sets show that our proposed model enhances both the accuracy and resolution in detecting cohesive modules, and it is less vulnerable to noise in the observed data. In addition, it shows higher power in predicting missing edges compared to individual-network methods.
Hierarchy Bayesian model based services awareness of high-speed optical access networks
NASA Astrophysics Data System (ADS)
Bai, Hui-feng
2018-03-01
As the speed of optical access networks soars with ever increasing multiple services, the service-supporting ability of optical access networks suffers greatly from the shortage of service awareness. Aiming to solve this problem, a hierarchy Bayesian model based services awareness mechanism is proposed for high-speed optical access networks. This approach builds a so-called hierarchy Bayesian model, according to the structure of typical optical access networks. Moreover, the proposed scheme is able to conduct simple services awareness operation in each optical network unit (ONU) and to perform complex services awareness from the whole view of system in optical line terminal (OLT). Simulation results show that the proposed scheme is able to achieve better quality of services (QoS), in terms of packet loss rate and time delay.
Gilet, Estelle; Diard, Julien; Bessière, Pierre
2011-01-01
In this paper, we study the collaboration of perception and action representations involved in cursive letter recognition and production. We propose a mathematical formulation for the whole perception–action loop, based on probabilistic modeling and Bayesian inference, which we call the Bayesian Action–Perception (BAP) model. Being a model of both perception and action processes, the purpose of this model is to study the interaction of these processes. More precisely, the model includes a feedback loop from motor production, which implements an internal simulation of movement. Motor knowledge can therefore be involved during perception tasks. In this paper, we formally define the BAP model and show how it solves the following six varied cognitive tasks using Bayesian inference: i) letter recognition (purely sensory), ii) writer recognition, iii) letter production (with different effectors), iv) copying of trajectories, v) copying of letters, and vi) letter recognition (with internal simulation of movements). We present computer simulations of each of these cognitive tasks, and discuss experimental predictions and theoretical developments. PMID:21674043
Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach.
Duarte, Belmiro P M; Wong, Weng Kee
2015-08-01
This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted.
Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach
Duarte, Belmiro P. M.; Wong, Weng Kee
2014-01-01
Summary This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted. PMID:26512159
Storey, Rebecca
2007-01-01
Comparison of different adult age estimation methods on the same skeletal sample with unknown ages could forward paleodemographic inference, while researchers sort out various controversies. The original aging method for the auricular surface (Lovejoy et al., 1985a) assigned an age estimation based on several separate characteristics. Researchers have found this original method hard to apply. It is usually forgotten that before assigning an age, there was a seriation, an ordering of all available individuals from youngest to oldest. Thus, age estimation reflected the place of an individual within its sample. A recent article (Buckberry and Chamberlain, 2002) proposed a revised method that scores theses various characteristics into age stages, which can then be used with a Bayesian method to estimate an adult age distribution for the sample. Both methods were applied to the adult auricular surfaces of a Pre-Columbian Maya skeletal population from Copan, Honduras and resulted in age distributions with significant numbers of older adults. However, contrary to the usual paleodemographic distribution, one Bayesian estimation based on uniform prior probabilities yielded a population with 57% of the ages at death over 65, while another based on a high mortality life table still had 12% of the individuals aged over 75 years. The seriation method yielded an age distribution more similar to that known from preindustrial historical situations, without excessive longevity of adults. Paleodemography must still wrestle with its elusive goal of accurate adult age estimation from skeletons, a necessary base for demographic study of past populations. (c) 2006 Wiley-Liss, Inc
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marzouk, Youssef
Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computional expense. This project intends to make rigorous predictive modeling *feasible* in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesianmore » inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale {\\em sequential} data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.« less
ERIC Educational Resources Information Center
Zwick, Rebecca; Lenaburg, Lubella
2009-01-01
In certain data analyses (e.g., multiple discriminant analysis and multinomial log-linear modeling), classification decisions are made based on the estimated posterior probabilities that individuals belong to each of several distinct categories. In the Bayesian network literature, this type of classification is often accomplished by assigning…
Incorporating Resilience into Dynamic Social Models
2016-07-20
solved by simply using the information provided by the scenario. Instead, additional knowledge is required from relevant fields that study these...resilience function by leveraging Bayesian Knowledge Bases (BKBs), a probabilistic reasoning network framework[5],[6]. BKBs allow for inferencing...reasoning network framework based on Bayesian Knowledge Bases (BKBs). BKBs are central to our social resilience framework as they are used to
Laminar fMRI and computational theories of brain function.
Stephan, K E; Petzschner, F H; Kasper, L; Bayer, J; Wellstein, K V; Stefanics, G; Pruessmann, K P; Heinzle, J
2017-11-02
Recently developed methods for functional MRI at the resolution of cortical layers (laminar fMRI) offer a novel window into neurophysiological mechanisms of cortical activity. Beyond physiology, laminar fMRI also offers an unprecedented opportunity to test influential theories of brain function. Specifically, hierarchical Bayesian theories of brain function, such as predictive coding, assign specific computational roles to different cortical layers. Combined with computational models, laminar fMRI offers a unique opportunity to test these proposals noninvasively in humans. This review provides a brief overview of predictive coding and related hierarchical Bayesian theories, summarises their predictions with regard to layered cortical computations, examines how these predictions could be tested by laminar fMRI, and considers methodological challenges. We conclude by discussing the potential of laminar fMRI for clinically useful computational assays of layer-specific information processing. Copyright © 2017 Elsevier Inc. All rights reserved.
Valence-Dependent Belief Updating: Computational Validation
Kuzmanovic, Bojana; Rigoux, Lionel
2017-01-01
People tend to update beliefs about their future outcomes in a valence-dependent way: they are likely to incorporate good news and to neglect bad news. However, belief formation is a complex process which depends not only on motivational factors such as the desire for favorable conclusions, but also on multiple cognitive variables such as prior beliefs, knowledge about personal vulnerabilities and resources, and the size of the probabilities and estimation errors. Thus, we applied computational modeling in order to test for valence-induced biases in updating while formally controlling for relevant cognitive factors. We compared biased and unbiased Bayesian models of belief updating, and specified alternative models based on reinforcement learning. The experiment consisted of 80 trials with 80 different adverse future life events. In each trial, participants estimated the base rate of one of these events and estimated their own risk of experiencing the event before and after being confronted with the actual base rate. Belief updates corresponded to the difference between the two self-risk estimates. Valence-dependent updating was assessed by comparing trials with good news (better-than-expected base rates) with trials with bad news (worse-than-expected base rates). After receiving bad relative to good news, participants' updates were smaller and deviated more strongly from rational Bayesian predictions, indicating a valence-induced bias. Model comparison revealed that the biased (i.e., optimistic) Bayesian model of belief updating better accounted for data than the unbiased (i.e., rational) Bayesian model, confirming that the valence of the new information influenced the amount of updating. Moreover, alternative computational modeling based on reinforcement learning demonstrated higher learning rates for good than for bad news, as well as a moderating role of personal knowledge. Finally, in this specific experimental context, the approach based on reinforcement learning was superior to the Bayesian approach. The computational validation of valence-dependent belief updating represents a novel support for a genuine optimism bias in human belief formation. Moreover, the precise control of relevant cognitive variables justifies the conclusion that the motivation to adopt the most favorable self-referential conclusions biases human judgments. PMID:28706499
Valence-Dependent Belief Updating: Computational Validation.
Kuzmanovic, Bojana; Rigoux, Lionel
2017-01-01
People tend to update beliefs about their future outcomes in a valence-dependent way: they are likely to incorporate good news and to neglect bad news. However, belief formation is a complex process which depends not only on motivational factors such as the desire for favorable conclusions, but also on multiple cognitive variables such as prior beliefs, knowledge about personal vulnerabilities and resources, and the size of the probabilities and estimation errors. Thus, we applied computational modeling in order to test for valence-induced biases in updating while formally controlling for relevant cognitive factors. We compared biased and unbiased Bayesian models of belief updating, and specified alternative models based on reinforcement learning. The experiment consisted of 80 trials with 80 different adverse future life events. In each trial, participants estimated the base rate of one of these events and estimated their own risk of experiencing the event before and after being confronted with the actual base rate. Belief updates corresponded to the difference between the two self-risk estimates. Valence-dependent updating was assessed by comparing trials with good news (better-than-expected base rates) with trials with bad news (worse-than-expected base rates). After receiving bad relative to good news, participants' updates were smaller and deviated more strongly from rational Bayesian predictions, indicating a valence-induced bias. Model comparison revealed that the biased (i.e., optimistic) Bayesian model of belief updating better accounted for data than the unbiased (i.e., rational) Bayesian model, confirming that the valence of the new information influenced the amount of updating. Moreover, alternative computational modeling based on reinforcement learning demonstrated higher learning rates for good than for bad news, as well as a moderating role of personal knowledge. Finally, in this specific experimental context, the approach based on reinforcement learning was superior to the Bayesian approach. The computational validation of valence-dependent belief updating represents a novel support for a genuine optimism bias in human belief formation. Moreover, the precise control of relevant cognitive variables justifies the conclusion that the motivation to adopt the most favorable self-referential conclusions biases human judgments.
Convergence analysis of surrogate-based methods for Bayesian inverse problems
NASA Astrophysics Data System (ADS)
Yan, Liang; Zhang, Yuan-Xiang
2017-12-01
The major challenges in the Bayesian inverse problems arise from the need for repeated evaluations of the forward model, as required by Markov chain Monte Carlo (MCMC) methods for posterior sampling. Many attempts at accelerating Bayesian inference have relied on surrogates for the forward model, typically constructed through repeated forward simulations that are performed in an offline phase. Although such approaches can be quite effective at reducing computation cost, there has been little analysis of the approximation on posterior inference. In this work, we prove error bounds on the Kullback-Leibler (KL) distance between the true posterior distribution and the approximation based on surrogate models. Our rigorous error analysis show that if the forward model approximation converges at certain rate in the prior-weighted L 2 norm, then the posterior distribution generated by the approximation converges to the true posterior at least two times faster in the KL sense. The error bound on the Hellinger distance is also provided. To provide concrete examples focusing on the use of the surrogate model based methods, we present an efficient technique for constructing stochastic surrogate models to accelerate the Bayesian inference approach. The Christoffel least squares algorithms, based on generalized polynomial chaos, are used to construct a polynomial approximation of the forward solution over the support of the prior distribution. The numerical strategy and the predicted convergence rates are then demonstrated on the nonlinear inverse problems, involving the inference of parameters appearing in partial differential equations.
Online Variational Bayesian Filtering-Based Mobile Target Tracking in Wireless Sensor Networks
Zhou, Bingpeng; Chen, Qingchun; Li, Tiffany Jing; Xiao, Pei
2014-01-01
The received signal strength (RSS)-based online tracking for a mobile node in wireless sensor networks (WSNs) is investigated in this paper. Firstly, a multi-layer dynamic Bayesian network (MDBN) is introduced to characterize the target mobility with either directional or undirected movement. In particular, it is proposed to employ the Wishart distribution to approximate the time-varying RSS measurement precision's randomness due to the target movement. It is shown that the proposed MDBN offers a more general analysis model via incorporating the underlying statistical information of both the target movement and observations, which can be utilized to improve the online tracking capability by exploiting the Bayesian statistics. Secondly, based on the MDBN model, a mean-field variational Bayesian filtering (VBF) algorithm is developed to realize the online tracking of a mobile target in the presence of nonlinear observations and time-varying RSS precision, wherein the traditional Bayesian filtering scheme cannot be directly employed. Thirdly, a joint optimization between the real-time velocity and its prior expectation is proposed to enable online velocity tracking in the proposed online tacking scheme. Finally, the associated Bayesian Cramer–Rao Lower Bound (BCRLB) analysis and numerical simulations are conducted. Our analysis unveils that, by exploiting the potential state information via the general MDBN model, the proposed VBF algorithm provides a promising solution to the online tracking of a mobile node in WSNs. In addition, it is shown that the final tracking accuracy linearly scales with its expectation when the RSS measurement precision is time-varying. PMID:25393784
Technical Note: Approximate Bayesian parameterization of a complex tropical forest model
NASA Astrophysics Data System (ADS)
Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.
2013-08-01
Inverse parameter estimation of process-based models is a long-standing problem in ecology and evolution. A key problem of inverse parameter estimation is to define a metric that quantifies how well model predictions fit to the data. Such a metric can be expressed by general cost or objective functions, but statistical inversion approaches are based on a particular metric, the probability of observing the data given the model, known as the likelihood. Deriving likelihoods for dynamic models requires making assumptions about the probability for observations to deviate from mean model predictions. For technical reasons, these assumptions are usually derived without explicit consideration of the processes in the simulation. Only in recent years have new methods become available that allow generating likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional MCMC, performs well in retrieving known parameter values from virtual field data generated by the forest model. We analyze the results of the parameter estimation, examine the sensitivity towards the choice and aggregation of model outputs and observed data (summary statistics), and show results from using this method to fit the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss differences of this approach to Approximate Bayesian Computing (ABC), another commonly used method to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can successfully be applied to process-based models of high complexity. The methodology is particularly suited to heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models in ecology and evolution.
Optimal Bayesian Adaptive Design for Test-Item Calibration.
van der Linden, Wim J; Ren, Hao
2015-06-01
An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
Bayesian Inference of High-Dimensional Dynamical Ocean Models
NASA Astrophysics Data System (ADS)
Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.
2015-12-01
This presentation addresses a holistic set of challenges in high-dimension ocean Bayesian nonlinear estimation: i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); ii) assimilate data using Bayes' law with these pdfs; iii) predict the future data that optimally reduce uncertainties; and (iv) rank the known and learn the new model formulations themselves. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.
A Bayesian Approach for Summarizing and Modeling Time-Series Exposure Data with Left Censoring.
Houseman, E Andres; Virji, M Abbas
2017-08-01
Direct reading instruments are valuable tools for measuring exposure as they provide real-time measurements for rapid decision making. However, their use is limited to general survey applications in part due to issues related to their performance. Moreover, statistical analysis of real-time data is complicated by autocorrelation among successive measurements, non-stationary time series, and the presence of left-censoring due to limit-of-detection (LOD). A Bayesian framework is proposed that accounts for non-stationary autocorrelation and LOD issues in exposure time-series data in order to model workplace factors that affect exposure and estimate summary statistics for tasks or other covariates of interest. A spline-based approach is used to model non-stationary autocorrelation with relatively few assumptions about autocorrelation structure. Left-censoring is addressed by integrating over the left tail of the distribution. The model is fit using Markov-Chain Monte Carlo within a Bayesian paradigm. The method can flexibly account for hierarchical relationships, random effects and fixed effects of covariates. The method is implemented using the rjags package in R, and is illustrated by applying it to real-time exposure data. Estimates for task means and covariates from the Bayesian model are compared to those from conventional frequentist models including linear regression, mixed-effects, and time-series models with different autocorrelation structures. Simulations studies are also conducted to evaluate method performance. Simulation studies with percent of measurements below the LOD ranging from 0 to 50% showed lowest root mean squared errors for task means and the least biased standard deviations from the Bayesian model compared to the frequentist models across all levels of LOD. In the application, task means from the Bayesian model were similar to means from the frequentist models, while the standard deviations were different. Parameter estimates for covariates were significant in some frequentist models, but in the Bayesian model their credible intervals contained zero; such discrepancies were observed in multiple datasets. Variance components from the Bayesian model reflected substantial autocorrelation, consistent with the frequentist models, except for the auto-regressive moving average model. Plots of means from the Bayesian model showed good fit to the observed data. The proposed Bayesian model provides an approach for modeling non-stationary autocorrelation in a hierarchical modeling framework to estimate task means, standard deviations, quantiles, and parameter estimates for covariates that are less biased and have better performance characteristics than some of the contemporary methods. Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2017.
SIG-VISA: Signal-based Vertically Integrated Seismic Monitoring
NASA Astrophysics Data System (ADS)
Moore, D.; Mayeda, K. M.; Myers, S. C.; Russell, S.
2013-12-01
Traditional seismic monitoring systems rely on discrete detections produced by station processing software; however, while such detections may constitute a useful summary of station activity, they discard large amounts of information present in the original recorded signal. We present SIG-VISA (Signal-based Vertically Integrated Seismic Analysis), a system for seismic monitoring through Bayesian inference on seismic signals. By directly modeling the recorded signal, our approach incorporates additional information unavailable to detection-based methods, enabling higher sensitivity and more accurate localization using techniques such as waveform matching. SIG-VISA's Bayesian forward model of seismic signal envelopes includes physically-derived models of travel times and source characteristics as well as Gaussian process (kriging) statistical models of signal properties that combine interpolation of historical data with extrapolation of learned physical trends. Applying Bayesian inference, we evaluate the model on earthquakes as well as the 2009 DPRK test event, demonstrating a waveform matching effect as part of the probabilistic inference, along with results on event localization and sensitivity. In particular, we demonstrate increased sensitivity from signal-based modeling, in which the SIGVISA signal model finds statistical evidence for arrivals even at stations for which the IMS station processing failed to register any detection.
NASA Astrophysics Data System (ADS)
He, Wei; Williard, Nicholas; Osterman, Michael; Pecht, Michael
A new method for state of health (SOH) and remaining useful life (RUL) estimations for lithium-ion batteries using Dempster-Shafer theory (DST) and the Bayesian Monte Carlo (BMC) method is proposed. In this work, an empirical model based on the physical degradation behavior of lithium-ion batteries is developed. Model parameters are initialized by combining sets of training data based on DST. BMC is then used to update the model parameters and predict the RUL based on available data through battery capacity monitoring. As more data become available, the accuracy of the model in predicting RUL improves. Two case studies demonstrating this approach are presented.
Bayesian B-spline mapping for dynamic quantitative traits.
Xing, Jun; Li, Jiahan; Yang, Runqing; Zhou, Xiaojing; Xu, Shizhong
2012-04-01
Owing to their ability and flexibility to describe individual gene expression at different time points, random regression (RR) analyses have become a popular procedure for the genetic analysis of dynamic traits whose phenotypes are collected over time. Specifically, when modelling the dynamic patterns of gene expressions in the RR framework, B-splines have been proved successful as an alternative to orthogonal polynomials. In the so-called Bayesian B-spline quantitative trait locus (QTL) mapping, B-splines are used to characterize the patterns of QTL effects and individual-specific time-dependent environmental errors over time, and the Bayesian shrinkage estimation method is employed to estimate model parameters. Extensive simulations demonstrate that (1) in terms of statistical power, Bayesian B-spline mapping outperforms the interval mapping based on the maximum likelihood; (2) for the simulated dataset with complicated growth curve simulated by B-splines, Legendre polynomial-based Bayesian mapping is not capable of identifying the designed QTLs accurately, even when higher-order Legendre polynomials are considered and (3) for the simulated dataset using Legendre polynomials, the Bayesian B-spline mapping can find the same QTLs as those identified by Legendre polynomial analysis. All simulation results support the necessity and flexibility of B-spline in Bayesian mapping of dynamic traits. The proposed method is also applied to a real dataset, where QTLs controlling the growth trajectory of stem diameters in Populus are located.
Analyzing chromatographic data using multilevel modeling.
Wiczling, Paweł
2018-06-01
It is relatively easy to collect chromatographic measurements for a large number of analytes, especially with gradient chromatographic methods coupled with mass spectrometry detection. Such data often have a hierarchical or clustered structure. For example, analytes with similar hydrophobicity and dissociation constant tend to be more alike in their retention than a randomly chosen set of analytes. Multilevel models recognize the existence of such data structures by assigning a model for each parameter, with its parameters also estimated from data. In this work, a multilevel model is proposed to describe retention time data obtained from a series of wide linear organic modifier gradients of different gradient duration and different mobile phase pH for a large set of acids and bases. The multilevel model consists of (1) the same deterministic equation describing the relationship between retention time and analyte-specific and instrument-specific parameters, (2) covariance relationships relating various physicochemical properties of the analyte to chromatographically specific parameters through quantitative structure-retention relationship based equations, and (3) stochastic components of intra-analyte and interanalyte variability. The model was implemented in Stan, which provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods. Graphical abstract Relationships between log k and MeOH content for acidic, basic, and neutral compounds with different log P. CI credible interval, PSA polar surface area.
NASA Astrophysics Data System (ADS)
Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.
2015-03-01
We present Π4U, an extensible framework, for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models, that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.
Genomic Model with Correlation Between Additive and Dominance Effects.
Xiang, Tao; Christensen, Ole Fredslund; Vitezica, Zulma Gladis; Legarra, Andres
2018-05-09
Dominance genetic effects are rarely included in pedigree-based genetic evaluation. With the availability of single nucleotide polymorphism markers and the development of genomic evaluation, estimates of dominance genetic effects have become feasible using genomic best linear unbiased prediction (GBLUP). Usually, studies involving additive and dominance genetic effects ignore possible relationships between them. It has been often suggested that the magnitude of functional additive and dominance effects at the quantitative trait loci are related, but there is no existing GBLUP-like approach accounting for such correlation. Wellmann and Bennewitz showed two ways of considering directional relationships between additive and dominance effects, which they estimated in a Bayesian framework. However, these relationships cannot be fitted at the level of individuals instead of loci in a mixed model and are not compatible with standard animal or plant breeding software. This comes from a fundamental ambiguity in assigning the reference allele at a given locus. We show that, if there has been selection, assigning the most frequent as the reference allele orients the correlation between functional additive and dominance effects. As a consequence, the most frequent reference allele is expected to have a positive value. We also demonstrate that selection creates negative covariance between genotypic additive and dominance genetic values. For parameter estimation, it is possible to use a combined additive and dominance relationship matrix computed from marker genotypes, and to use standard restricted maximum likelihood (REML) algorithms based on an equivalent model. Through a simulation study, we show that such correlations can easily be estimated by mixed model software and accuracy of prediction for genetic values is slightly improved if such correlations are used in GBLUP. However, a model assuming uncorrelated effects and fitting orthogonal breeding values and dominant deviations performed similarly for prediction. Copyright © 2018, Genetics.
A General and Flexible Approach to Estimating the Social Relations Model Using Bayesian Methods
ERIC Educational Resources Information Center
Ludtke, Oliver; Robitzsch, Alexander; Kenny, David A.; Trautwein, Ulrich
2013-01-01
The social relations model (SRM) is a conceptual, methodological, and analytical approach that is widely used to examine dyadic behaviors and interpersonal perception within groups. This article introduces a general and flexible approach to estimating the parameters of the SRM that is based on Bayesian methods using Markov chain Monte Carlo…
ERIC Educational Resources Information Center
Doskey, Steven Craig
2014-01-01
This research presents an innovative means of gauging Systems Engineering effectiveness through a Systems Engineering Relative Effectiveness Index (SE REI) model. The SE REI model uses a Bayesian Belief Network to map causal relationships in government acquisitions of Complex Information Systems (CIS), enabling practitioners to identify and…
Abanto-Valle, C. A.; Bandyopadhyay, D.; Lachos, V. H.; Enriquez, I.
2009-01-01
A Bayesian analysis of stochastic volatility (SV) models using the class of symmetric scale mixtures of normal (SMN) distributions is considered. In the face of non-normality, this provides an appealing robust alternative to the routine use of the normal distribution. Specific distributions examined include the normal, student-t, slash and the variance gamma distributions. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced for parameter estimation. Moreover, the mixing parameters obtained as a by-product of the scale mixture representation can be used to identify outliers. The methods developed are applied to analyze daily stock returns data on S&P500 index. Bayesian model selection criteria as well as out-of- sample forecasting results reveal that the SV models based on heavy-tailed SMN distributions provide significant improvement in model fit as well as prediction to the S&P500 index data over the usual normal model. PMID:20730043
Application of a predictive Bayesian model to environmental accounting.
Anex, R P; Englehardt, J D
2001-03-30
Environmental accounting techniques are intended to capture important environmental costs and benefits that are often overlooked in standard accounting practices. Environmental accounting methods themselves often ignore or inadequately represent large but highly uncertain environmental costs and costs conditioned by specific prior events. Use of a predictive Bayesian model is demonstrated for the assessment of such highly uncertain environmental and contingent costs. The predictive Bayesian approach presented generates probability distributions for the quantity of interest (rather than parameters thereof). A spreadsheet implementation of a previously proposed predictive Bayesian model, extended to represent contingent costs, is described and used to evaluate whether a firm should undertake an accelerated phase-out of its PCB containing transformers. Variability and uncertainty (due to lack of information) in transformer accident frequency and severity are assessed simultaneously using a combination of historical accident data, engineering model-based cost estimates, and subjective judgement. Model results are compared using several different risk measures. Use of the model for incorporation of environmental risk management into a company's overall risk management strategy is discussed.
Universal Darwinism As a Process of Bayesian Inference.
Campbell, John O
2016-01-01
Many of the mathematical frameworks describing natural selection are equivalent to Bayes' Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus, natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an "experiment" in the external world environment, and the results of that "experiment" or the "surprise" entailed by predicted and actual outcomes of the "experiment." Minimization of free energy implies that the implicit measure of "surprise" experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature.
Universal Darwinism As a Process of Bayesian Inference
Campbell, John O.
2016-01-01
Many of the mathematical frameworks describing natural selection are equivalent to Bayes' Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus, natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an “experiment” in the external world environment, and the results of that “experiment” or the “surprise” entailed by predicted and actual outcomes of the “experiment.” Minimization of free energy implies that the implicit measure of “surprise” experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature. PMID:27375438
[Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].
Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L
2017-03-10
To evaluate the estimation of prevalence ratio ( PR ) by using bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea in their infants by using bayesian log-binomial regression model in Openbugs software. The results showed that caregivers' recognition of infant' s risk signs of diarrhea was associated significantly with a 13% increase of medical care-seeking. Meanwhile, we compared the differences in PR 's point estimation and its interval estimation of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea and convergence of three models (model 1: not adjusting for the covariates; model 2: adjusting for duration of caregivers' education, model 3: adjusting for distance between village and township and child month-age based on model 2) between bayesian log-binomial regression model and conventional log-binomial regression model. The results showed that all three bayesian log-binomial regression models were convergence and the estimated PRs were 1.130(95 %CI : 1.005-1.265), 1.128(95 %CI : 1.001-1.264) and 1.132(95 %CI : 1.004-1.267), respectively. Conventional log-binomial regression model 1 and model 2 were convergence and their PRs were 1.130(95 % CI : 1.055-1.206) and 1.126(95 % CI : 1.051-1.203), respectively, but the model 3 was misconvergence, so COPY method was used to estimate PR , which was 1.125 (95 %CI : 1.051-1.200). In addition, the point estimation and interval estimation of PRs from three bayesian log-binomial regression models differed slightly from those of PRs from conventional log-binomial regression model, but they had a good consistency in estimating PR . Therefore, bayesian log-binomial regression model can effectively estimate PR with less misconvergence and have more advantages in application compared with conventional log-binomial regression model.
Ren, J; Jenkinson, I; Wang, J; Xu, D L; Yang, J B
2008-01-01
Focusing on people and organizations, this paper aims to contribute to offshore safety assessment by proposing a methodology to model causal relationships. The methodology is proposed in a general sense that it will be capable of accommodating modeling of multiple risk factors considered in offshore operations and will have the ability to deal with different types of data that may come from different resources. Reason's "Swiss cheese" model is used to form a generic offshore safety assessment framework, and Bayesian Network (BN) is tailored to fit into the framework to construct a causal relationship model. The proposed framework uses a five-level-structure model to address latent failures within the causal sequence of events. The five levels include Root causes level, Trigger events level, Incidents level, Accidents level, and Consequences level. To analyze and model a specified offshore installation safety, a BN model was established following the guideline of the proposed five-level framework. A range of events was specified, and the related prior and conditional probabilities regarding the BN model were assigned based on the inherent characteristics of each event. This paper shows that Reason's "Swiss cheese" model and BN can be jointly used in offshore safety assessment. On the one hand, the five-level conceptual model is enhanced by BNs that are capable of providing graphical demonstration of inter-relationships as well as calculating numerical values of occurrence likelihood for each failure event. Bayesian inference mechanism also makes it possible to monitor how a safety situation changes when information flow travel forwards and backwards within the networks. On the other hand, BN modeling relies heavily on experts' personal experiences and is therefore highly domain specific. "Swiss cheese" model is such a theoretic framework that it is based on solid behavioral theory and therefore can be used to provide industry with a roadmap for BN modeling and implications. A case study of the collision risk between a Floating Production, Storage and Offloading (FPSO) unit and authorized vessels caused by human and organizational factors (HOFs) during operations is used to illustrate an industrial application of the proposed methodology.
Laws of cognition and the cognition of law.
Kahan, Dan M
2015-02-01
This paper presents a compact synthesis of the study of cognition in legal decisionmaking. Featured dynamics include the story-telling model (Pennington & Hastie, 1986), lay prototypes (Smith, 1991), motivated cognition (Sood, 2012), and coherence-based reasoning (Simon, Pham, Le, & Holyoak, 2001). Unlike biases and heuristics understood to bound or constrain rationality, these dynamics identify how information shapes a variety of cognitive inputs-from prior beliefs to perceptions of events to the probative weight assigned new information-that rational decisionmaking presupposes. The operation of these mechanisms can be shown to radically alter the significance that jurors give to evidence, and hence the conclusions they reach, within a Bayesian framework of information processing. How these dynamics interact with the professional judgment of lawyers and judges, the paper notes, remains in need of investigation. Copyright © 2014 Elsevier B.V. All rights reserved.
A Bayesian sequential processor approach to spectroscopic portal system decisions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sale, K; Candy, J; Breitfeller, E
The development of faster more reliable techniques to detect radioactive contraband in a portal type scenario is an extremely important problem especially in this era of constant terrorist threats. Towards this goal the development of a model-based, Bayesian sequential data processor for the detection problem is discussed. In the sequential processor each datum (detector energy deposit and pulse arrival time) is used to update the posterior probability distribution over the space of model parameters. The nature of the sequential processor approach is that a detection is produced as soon as it is statistically justified by the data rather than waitingmore » for a fixed counting interval before any analysis is performed. In this paper the Bayesian model-based approach, physics and signal processing models and decision functions are discussed along with the first results of our research.« less
Engelhardt, Benjamin; Kschischo, Maik; Fröhlich, Holger
2017-06-01
Ordinary differential equations (ODEs) are a popular approach to quantitatively model molecular networks based on biological knowledge. However, such knowledge is typically restricted. Wrongly modelled biological mechanisms as well as relevant external influence factors that are not included into the model are likely to manifest in major discrepancies between model predictions and experimental data. Finding the exact reasons for such observed discrepancies can be quite challenging in practice. In order to address this issue, we suggest a Bayesian approach to estimate hidden influences in ODE-based models. The method can distinguish between exogenous and endogenous hidden influences. Thus, we can detect wrongly specified as well as missed molecular interactions in the model. We demonstrate the performance of our Bayesian dynamic elastic-net with several ordinary differential equation models from the literature, such as human JAK-STAT signalling, information processing at the erythropoietin receptor, isomerization of liquid α -Pinene, G protein cycling in yeast and UV-B triggered signalling in plants. Moreover, we investigate a set of commonly known network motifs and a gene-regulatory network. Altogether our method supports the modeller in an algorithmic manner to identify possible sources of errors in ODE-based models on the basis of experimental data. © 2017 The Author(s).
Impact assessment of extreme storm events using a Bayesian network
den Heijer, C.(Kees); Knipping, Dirk T.J.A.; Plant, Nathaniel G.; van Thiel de Vries, Jaap S. M.; Baart, Fedor; van Gelder, Pieter H. A. J. M.
2012-01-01
This paper describes an investigation on the usefulness of Bayesian Networks in the safety assessment of dune coasts. A network has been created that predicts the erosion volume based on hydraulic boundary conditions and a number of cross-shore profile indicators. Field measurement data along a large part of the Dutch coast has been used to train the network. Corresponding storm impact on the dunes was calculated with an empirical dune erosion model named duros+. Comparison between the Bayesian Network predictions and the original duros+ results, here considered as observations, results in a skill up to 0.88, provided that the training data covers the range of predictions. Hence, the predictions from a deterministic model (duros+) can be captured in a probabilistic model (Bayesian Network) such that both the process knowledge and uncertainties can be included in impact and vulnerability assessments.
Two-Stage Bayesian Model Averaging in Endogenous Variable Models*
Lenkoski, Alex; Eicher, Theo S.; Raftery, Adrian E.
2013-01-01
Economic modeling in the presence of endogeneity is subject to model uncertainty at both the instrument and covariate level. We propose a Two-Stage Bayesian Model Averaging (2SBMA) methodology that extends the Two-Stage Least Squares (2SLS) estimator. By constructing a Two-Stage Unit Information Prior in the endogenous variable model, we are able to efficiently combine established methods for addressing model uncertainty in regression models with the classic technique of 2SLS. To assess the validity of instruments in the 2SBMA context, we develop Bayesian tests of the identification restriction that are based on model averaged posterior predictive p-values. A simulation study showed that 2SBMA has the ability to recover structure in both the instrument and covariate set, and substantially improves the sharpness of resulting coefficient estimates in comparison to 2SLS using the full specification in an automatic fashion. Due to the increased parsimony of the 2SBMA estimate, the Bayesian Sargan test had a power of 50 percent in detecting a violation of the exogeneity assumption, while the method based on 2SLS using the full specification had negligible power. We apply our approach to the problem of development accounting, and find support not only for institutions, but also for geography and integration as development determinants, once both model uncertainty and endogeneity have been jointly addressed. PMID:24223471
NASA Astrophysics Data System (ADS)
Plant, N. G.; Thieler, E. R.; Gutierrez, B.; Lentz, E. E.; Zeigler, S. L.; Van Dongeren, A.; Fienen, M. N.
2016-12-01
We evaluate the strengths and weaknesses of Bayesian networks that have been used to address scientific and decision-support questions related to coastal geomorphology. We will provide an overview of coastal geomorphology research that has used Bayesian networks and describe what this approach can do and when it works (or fails to work). Over the past decade, Bayesian networks have been formulated to analyze the multi-variate structure and evolution of coastal morphology and associated human and ecological impacts. The approach relates observable system variables to each other by estimating discrete correlations. The resulting Bayesian-networks make predictions that propagate errors, conduct inference via Bayes rule, or both. In scientific applications, the model results are useful for hypothesis testing, using confidence estimates to gage the strength of tests while applications to coastal resource management are aimed at decision-support, where the probabilities of desired ecosystems outcomes are evaluated. The range of Bayesian-network applications to coastal morphology includes emulation of high-resolution wave transformation models to make oceanographic predictions, morphologic response to storms and/or sea-level rise, groundwater response to sea-level rise and morphologic variability, habitat suitability for endangered species, and assessment of monetary or human-life risk associated with storms. All of these examples are based on vast observational data sets, numerical model output, or both. We will discuss the progression of our experiments, which has included testing whether the Bayesian-network approach can be implemented and is appropriate for addressing basic and applied scientific problems and evaluating the hindcast and forecast skill of these implementations. We will present and discuss calibration/validation tests that are used to assess the robustness of Bayesian-network models and we will compare these results to tests of other models. This will demonstrate how Bayesian networks are used to extract new insights about coastal morphologic behavior, assess impacts to societal and ecological systems, and communicate probabilistic predictions to decision makers.
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
Zaikin, Alexey; Míguez, Joaquín
2017-01-01
We compare three state-of-the-art Bayesian inference methods for the estimation of the unknown parameters in a stochastic model of a genetic network. In particular, we introduce a stochastic version of the paradigmatic synthetic multicellular clock model proposed by Ullner et al., 2007. By introducing dynamical noise in the model and assuming that the partial observations of the system are contaminated by additive noise, we enable a principled mechanism to represent experimental uncertainties in the synthesis of the multicellular system and pave the way for the design of probabilistic methods for the estimation of any unknowns in the model. Within this setup, we tackle the Bayesian estimation of a subset of the model parameters. Specifically, we compare three Monte Carlo based numerical methods for the approximation of the posterior probability density function of the unknown parameters given a set of partial and noisy observations of the system. The schemes we assess are the particle Metropolis-Hastings (PMH) algorithm, the nonlinear population Monte Carlo (NPMC) method and the approximate Bayesian computation sequential Monte Carlo (ABC-SMC) scheme. We present an extensive numerical simulation study, which shows that while the three techniques can effectively solve the problem there are significant differences both in estimation accuracy and computational efficiency. PMID:28797087
Causal Analysis After Haavelmo
Heckman, James; Pinto, Rodrigo
2014-01-01
Haavelmo's seminal 1943 and 1944 papers are the first rigorous treatment of causality. In them, he distinguished the definition of causal parameters from their identification. He showed that causal parameters are defined using hypothetical models that assign variation to some of the inputs determining outcomes while holding all other inputs fixed. He thus formalized and made operational Marshall's (1890) ceteris paribus analysis. We embed Haavelmo's framework into the recursive framework of Directed Acyclic Graphs (DAGs) used in one influential recent approach to causality (Pearl, 2000) and in the related literature on Bayesian nets (Lauritzen, 1996). We compare the simplicity of an analysis of causality based on Haavelmo's methodology with the complex and nonintuitive approach used in the causal literature of DAGs—the “do-calculus” of Pearl (2009). We discuss the severe limitations of DAGs and in particular of the do-calculus of Pearl in securing identification of economic models. We extend our framework to consider models for simultaneous causality, a central contribution of Haavelmo. In general cases, DAGs cannot be used to analyze models for simultaneous causality, but Haavelmo's approach naturally generalizes to cover them. PMID:25729123
Application of bayesian networks to real-time flood risk estimation
NASA Astrophysics Data System (ADS)
Garrote, L.; Molina, M.; Blasco, G.
2003-04-01
This paper presents the application of a computational paradigm taken from the field of artificial intelligence - the bayesian network - to model the behaviour of hydrologic basins during floods. The final goal of this research is to develop representation techniques for hydrologic simulation models in order to define, develop and validate a mechanism, supported by a software environment, oriented to build decision models for the prediction and management of river floods in real time. The emphasis is placed on providing decision makers with tools to incorporate their knowledge of basin behaviour, usually formulated in terms of rainfall-runoff models, in the process of real-time decision making during floods. A rainfall-runoff model is only a step in the process of decision making. If a reliable rainfall forecast is available and the rainfall-runoff model is well calibrated, decisions can be based mainly on model results. However, in most practical situations, uncertainties in rainfall forecasts or model performance have to be incorporated in the decision process. The computation paradigm adopted for the simulation of hydrologic processes is the bayesian network. A bayesian network is a directed acyclic graph that represents causal influences between linked variables. Under this representation, uncertain qualitative variables are related through causal relations quantified with conditional probabilities. The solution algorithm allows the computation of the expected probability distribution of unknown variables conditioned to the observations. An approach to represent hydrologic processes by bayesian networks with temporal and spatial extensions is presented in this paper, together with a methodology for the development of bayesian models using results produced by deterministic hydrologic simulation models
Bayesian analysis of physiologically based toxicokinetic and toxicodynamic models.
Hack, C Eric
2006-04-17
Physiologically based toxicokinetic (PBTK) and toxicodynamic (TD) models of bromate in animals and humans would improve our ability to accurately estimate the toxic doses in humans based on available animal studies. These mathematical models are often highly parameterized and must be calibrated in order for the model predictions of internal dose to adequately fit the experimentally measured doses. Highly parameterized models are difficult to calibrate and it is difficult to obtain accurate estimates of uncertainty or variability in model parameters with commonly used frequentist calibration methods, such as maximum likelihood estimation (MLE) or least squared error approaches. The Bayesian approach called Markov chain Monte Carlo (MCMC) analysis can be used to successfully calibrate these complex models. Prior knowledge about the biological system and associated model parameters is easily incorporated in this approach in the form of prior parameter distributions, and the distributions are refined or updated using experimental data to generate posterior distributions of parameter estimates. The goal of this paper is to give the non-mathematician a brief description of the Bayesian approach and Markov chain Monte Carlo analysis, how this technique is used in risk assessment, and the issues associated with this approach.
NASA Astrophysics Data System (ADS)
Mustac, M.; Kim, S.; Tkalcic, H.; Rhie, J.; Chen, Y.; Ford, S. R.; Sebastian, N.
2015-12-01
Conventional approaches to inverse problems suffer from non-linearity and non-uniqueness in estimations of seismic structures and source properties. Estimated results and associated uncertainties are often biased by applied regularizations and additional constraints, which are commonly introduced to solve such problems. Bayesian methods, however, provide statistically meaningful estimations of models and their uncertainties constrained by data information. In addition, hierarchical and trans-dimensional (trans-D) techniques are inherently implemented in the Bayesian framework to account for involved error statistics and model parameterizations, and, in turn, allow more rigorous estimations of the same. Here, we apply Bayesian methods throughout the entire inference process to estimate seismic structures and source properties in Northeast Asia including east China, the Korean peninsula, and the Japanese islands. Ambient noise analysis is first performed to obtain a base three-dimensional (3-D) heterogeneity model using continuous broadband waveforms from more than 300 stations. As for the tomography of surface wave group and phase velocities in the 5-70 s band, we adopt a hierarchical and trans-D Bayesian inversion method using Voronoi partition. The 3-D heterogeneity model is further improved by joint inversions of teleseismic receiver functions and dispersion data using a newly developed high-efficiency Bayesian technique. The obtained model is subsequently used to prepare 3-D structural Green's functions for the source characterization. A hierarchical Bayesian method for point source inversion using regional complete waveform data is applied to selected events from the region. The seismic structure and source characteristics with rigorously estimated uncertainties from the novel Bayesian methods provide enhanced monitoring and discrimination of seismic events in northeast Asia.
Bayesian model checking: A comparison of tests
NASA Astrophysics Data System (ADS)
Lucy, L. B.
2018-06-01
Two procedures for checking Bayesian models are compared using a simple test problem based on the local Hubble expansion. Over four orders of magnitude, p-values derived from a global goodness-of-fit criterion for posterior probability density functions agree closely with posterior predictive p-values. The former can therefore serve as an effective proxy for the difficult-to-calculate posterior predictive p-values.
Instruction in information structuring improves Bayesian judgment in intelligence analysts.
Mandel, David R
2015-01-01
An experiment was conducted to test the effectiveness of brief instruction in information structuring (i.e., representing and integrating information) for improving the coherence of probability judgments and binary choices among intelligence analysts. Forty-three analysts were presented with comparable sets of Bayesian judgment problems before and immediately after instruction. After instruction, analysts' probability judgments were more coherent (i.e., more additive and compliant with Bayes theorem). Instruction also improved the coherence of binary choices regarding category membership: after instruction, subjects were more likely to invariably choose the category to which they assigned the higher probability of a target's membership. The research provides a rare example of evidence-based validation of effectiveness in instruction to improve the statistical assessment skills of intelligence analysts. Such instruction could also be used to improve the assessment quality of other types of experts who are required to integrate statistical information or make probabilistic assessments.
Assignment of a non-informative prior when using a calibration function
NASA Astrophysics Data System (ADS)
Lira, I.; Grientschnig, D.
2012-01-01
The evaluation of measurement uncertainty associated with the use of calibration functions was addressed in a talk at the 19th IMEKO World Congress 2009 in Lisbon (Proceedings, pp 2346-51). Therein, an example involving a cubic function was analysed by a Bayesian approach and by the Monte Carlo method described in Supplement 1 to the 'Guide to the Expression of Uncertainty in Measurement'. Results were found to be discrepant. In this paper we examine a simplified version of the example and show that the reported discrepancy is caused by the choice of the prior in the Bayesian analysis, which does not conform to formal rules for encoding the absence of prior knowledge. Two options for assigning a non-informative prior free from this shortcoming are considered; they are shown to be equivalent.
Bayesian Networks Improve Causal Environmental Assessments for Evidence-Based Policy.
Carriger, John F; Barron, Mace G; Newman, Michael C
2016-12-20
Rule-based weight of evidence approaches to ecological risk assessment may not account for uncertainties and generally lack probabilistic integration of lines of evidence. Bayesian networks allow causal inferences to be made from evidence by including causal knowledge about the problem, using this knowledge with probabilistic calculus to combine multiple lines of evidence, and minimizing biases in predicting or diagnosing causal relationships. Too often, sources of uncertainty in conventional weight of evidence approaches are ignored that can be accounted for with Bayesian networks. Specifying and propagating uncertainties improve the ability of models to incorporate strength of the evidence in the risk management phase of an assessment. Probabilistic inference from a Bayesian network allows evaluation of changes in uncertainty for variables from the evidence. The network structure and probabilistic framework of a Bayesian approach provide advantages over qualitative approaches in weight of evidence for capturing the impacts of multiple sources of quantifiable uncertainty on predictions of ecological risk. Bayesian networks can facilitate the development of evidence-based policy under conditions of uncertainty by incorporating analytical inaccuracies or the implications of imperfect information, structuring and communicating causal issues through qualitative directed graph formulations, and quantitatively comparing the causal power of multiple stressors on valued ecological resources. These aspects are demonstrated through hypothetical problem scenarios that explore some major benefits of using Bayesian networks for reasoning and making inferences in evidence-based policy.
A Fatigue Crack Size Evaluation Method Based on Lamb Wave Simulation and Limited Experimental Data
He, Jingjing; Ran, Yunmeng; Liu, Bin; Yang, Jinsong; Guan, Xuefei
2017-01-01
This paper presents a systematic and general method for Lamb wave-based crack size quantification using finite element simulations and Bayesian updating. The method consists of construction of a baseline quantification model using finite element simulation data and Bayesian updating with limited Lamb wave data from target structure. The baseline model correlates two proposed damage sensitive features, namely the normalized amplitude and phase change, with the crack length through a response surface model. The two damage sensitive features are extracted from the first received S0 mode wave package. The model parameters of the baseline model are estimated using finite element simulation data. To account for uncertainties from numerical modeling, geometry, material and manufacturing between the baseline model and the target model, Bayesian method is employed to update the baseline model with a few measurements acquired from the actual target structure. A rigorous validation is made using in-situ fatigue testing and Lamb wave data from coupon specimens and realistic lap-joint components. The effectiveness and accuracy of the proposed method is demonstrated under different loading and damage conditions. PMID:28902148
Predicting Quantitative Traits With Regression Models for Dense Molecular Markers and Pedigree
de los Campos, Gustavo; Naya, Hugo; Gianola, Daniel; Crossa, José; Legarra, Andrés; Manfredi, Eduardo; Weigel, Kent; Cotes, José Miguel
2009-01-01
The availability of genomewide dense markers brings opportunities and challenges to breeding programs. An important question concerns the ways in which dense markers and pedigrees, together with phenotypic records, should be used to arrive at predictions of genetic values for complex traits. If a large number of markers are included in a regression model, marker-specific shrinkage of regression coefficients may be needed. For this reason, the Bayesian least absolute shrinkage and selection operator (LASSO) (BL) appears to be an interesting approach for fitting marker effects in a regression model. This article adapts the BL to arrive at a regression model where markers, pedigrees, and covariates other than markers are considered jointly. Connections between BL and other marker-based regression models are discussed, and the sensitivity of BL with respect to the choice of prior distributions assigned to key parameters is evaluated using simulation. The proposed model was fitted to two data sets from wheat and mouse populations, and evaluated using cross-validation methods. Results indicate that inclusion of markers in the regression further improved the predictive ability of models. An R program that implements the proposed model is freely available. PMID:19293140
Climatic Models Ensemble-based Mid-21st Century Runoff Projections: A Bayesian Framework
NASA Astrophysics Data System (ADS)
Achieng, K. O.; Zhu, J.
2017-12-01
There are a number of North American Regional Climate Change Assessment Program (NARCCAP) climatic models that have been used to project surface runoff in the mid-21st century. Statistical model selection techniques are often used to select the model that best fits data. However, model selection techniques often lead to different conclusions. In this study, ten models are averaged in Bayesian paradigm to project runoff. Bayesian Model Averaging (BMA) is used to project and identify effect of model uncertainty on future runoff projections. Baseflow separation - a two-digital filter which is also called Eckhardt filter - is used to separate USGS streamflow (total runoff) into two components: baseflow and surface runoff. We use this surface runoff as the a priori runoff when conducting BMA of runoff simulated from the ten RCM models. The primary objective of this study is to evaluate how well RCM multi-model ensembles simulate surface runoff, in a Bayesian framework. Specifically, we investigate and discuss the following questions: How well do ten RCM models ensemble jointly simulate surface runoff by averaging over all the models using BMA, given a priori surface runoff? What are the effects of model uncertainty on surface runoff simulation?
NASA Astrophysics Data System (ADS)
van Rossum, Anne C.; Lin, Hai Xiang; Dubbeldam, Johan; van der Herik, H. Jaap
2018-04-01
In machine vision typical heuristic methods to extract parameterized objects out of raw data points are the Hough transform and RANSAC. Bayesian models carry the promise to optimally extract such parameterized objects given a correct definition of the model and the type of noise at hand. A category of solvers for Bayesian models are Markov chain Monte Carlo methods. Naive implementations of MCMC methods suffer from slow convergence in machine vision due to the complexity of the parameter space. Towards this blocked Gibbs and split-merge samplers have been developed that assign multiple data points to clusters at once. In this paper we introduce a new split-merge sampler, the triadic split-merge sampler, that perform steps between two and three randomly chosen clusters. This has two advantages. First, it reduces the asymmetry between the split and merge steps. Second, it is able to propose a new cluster that is composed out of data points from two different clusters. Both advantages speed up convergence which we demonstrate on a line extraction problem. We show that the triadic split-merge sampler outperforms the conventional split-merge sampler. Although this new MCMC sampler is demonstrated in this machine vision context, its application extend to the very general domain of statistical inference.
Nowakowska, Marzena
2017-04-01
The development of the Bayesian logistic regression model classifying the road accident severity is discussed. The already exploited informative priors (method of moments, maximum likelihood estimation, and two-stage Bayesian updating), along with the original idea of a Boot prior proposal, are investigated when no expert opinion has been available. In addition, two possible approaches to updating the priors, in the form of unbalanced and balanced training data sets, are presented. The obtained logistic Bayesian models are assessed on the basis of a deviance information criterion (DIC), highest probability density (HPD) intervals, and coefficients of variation estimated for the model parameters. The verification of the model accuracy has been based on sensitivity, specificity and the harmonic mean of sensitivity and specificity, all calculated from a test data set. The models obtained from the balanced training data set have a better classification quality than the ones obtained from the unbalanced training data set. The two-stage Bayesian updating prior model and the Boot prior model, both identified with the use of the balanced training data set, outperform the non-informative, method of moments, and maximum likelihood estimation prior models. It is important to note that one should be careful when interpreting the parameters since different priors can lead to different models. Copyright © 2017 Elsevier Ltd. All rights reserved.
Bayesian learning and the psychology of rule induction
Endress, Ansgar D.
2014-01-01
In recent years, Bayesian learning models have been applied to an increasing variety of domains. While such models have been criticized on theoretical grounds, the underlying assumptions and predictions are rarely made concrete and tested experimentally. Here, I use Frank and Tenenbaum's (2011) Bayesian model of rule-learning as a case study to spell out the underlying assumptions, and to confront them with the empirical results Frank and Tenenbaum (2011) propose to simulate, as well as with novel experiments. While rule-learning is arguably well suited to rational Bayesian approaches, I show that their models are neither psychologically plausible nor ideal observer models. Further, I show that their central assumption is unfounded: humans do not always preferentially learn more specific rules, but, at least in some situations, those rules that happen to be more salient. Even when granting the unsupported assumptions, I show that all of the experiments modeled by Frank and Tenenbaum (2011) either contradict their models, or have a large number of more plausible interpretations. I provide an alternative account of the experimental data based on simple psychological mechanisms, and show that this account both describes the data better, and is easier to falsify. I conclude that, despite the recent surge in Bayesian models of cognitive phenomena, psychological phenomena are best understood by developing and testing psychological theories rather than models that can be fit to virtually any data. PMID:23454791
A systematic review of Bayesian articles in psychology: The last 25 years.
van de Schoot, Rens; Winter, Sonja D; Ryan, Oisín; Zondervan-Zwijnenburg, Mariëlle; Depaoli, Sarah
2017-06-01
Although the statistical tools most often used by researchers in the field of psychology over the last 25 years are based on frequentist statistics, it is often claimed that the alternative Bayesian approach to statistics is gaining in popularity. In the current article, we investigated this claim by performing the very first systematic review of Bayesian psychological articles published between 1990 and 2015 (n = 1,579). We aim to provide a thorough presentation of the role Bayesian statistics plays in psychology. This historical assessment allows us to identify trends and see how Bayesian methods have been integrated into psychological research in the context of different statistical frameworks (e.g., hypothesis testing, cognitive models, IRT, SEM, etc.). We also describe take-home messages and provide "big-picture" recommendations to the field as Bayesian statistics becomes more popular. Our review indicated that Bayesian statistics is used in a variety of contexts across subfields of psychology and related disciplines. There are many different reasons why one might choose to use Bayes (e.g., the use of priors, estimating otherwise intractable models, modeling uncertainty, etc.). We found in this review that the use of Bayes has increased and broadened in the sense that this methodology can be used in a flexible manner to tackle many different forms of questions. We hope this presentation opens the door for a larger discussion regarding the current state of Bayesian statistics, as well as future trends. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Størset, Elisabet; Holford, Nick; Hennig, Stefanie; Bergmann, Troels K; Bergan, Stein; Bremer, Sara; Åsberg, Anders; Midtvedt, Karsten; Staatz, Christine E
2014-09-01
The aim was to develop a theory-based population pharmacokinetic model of tacrolimus in adult kidney transplant recipients and to externally evaluate this model and two previous empirical models. Data were obtained from 242 patients with 3100 tacrolimus whole blood concentrations. External evaluation was performed by examining model predictive performance using Bayesian forecasting. Pharmacokinetic disposition parameters were estimated based on tacrolimus plasma concentrations, predicted from whole blood concentrations, haematocrit and literature values for tacrolimus binding to red blood cells. Disposition parameters were allometrically scaled to fat free mass. Tacrolimus whole blood clearance/bioavailability standardized to haematocrit of 45% and fat free mass of 60 kg was estimated to be 16.1 l h−1 [95% CI 12.6, 18.0 l h−1]. Tacrolimus clearance was 30% higher (95% CI 13, 46%) and bioavailability 18% lower (95% CI 2, 29%) in CYP3A5 expressers compared with non-expressers. An Emax model described decreasing tacrolimus bioavailability with increasing prednisolone dose. The theory-based model was superior to the empirical models during external evaluation displaying a median prediction error of −1.2% (95% CI −3.0, 0.1%). Based on simulation, Bayesian forecasting led to 65% (95% CI 62, 68%) of patients achieving a tacrolimus average steady-state concentration within a suggested acceptable range. A theory-based population pharmacokinetic model was superior to two empirical models for prediction of tacrolimus concentrations and seemed suitable for Bayesian prediction of tacrolimus doses early after kidney transplantation.
Sarigiannis, Dimosthenis A; Karakitsios, Spyros P; Gotti, Alberto; Papaloukas, Costas L; Kassomenos, Pavlos A; Pilidis, Georgios A
2009-01-01
The objective of the current study was the development of a reliable modeling platform to calculate in real time the personal exposure and the associated health risk for filling station employees evaluating current environmental parameters (traffic, meteorological and amount of fuel traded) determined by the appropriate sensor network. A set of Artificial Neural Networks (ANNs) was developed to predict benzene exposure pattern for the filling station employees. Furthermore, a Physiology Based Pharmaco-Kinetic (PBPK) risk assessment model was developed in order to calculate the lifetime probability distribution of leukemia to the employees, fed by data obtained by the ANN model. Bayesian algorithm was involved in crucial points of both model sub compartments. The application was evaluated in two filling stations (one urban and one rural). Among several algorithms available for the development of the ANN exposure model, Bayesian regularization provided the best results and seemed to be a promising technique for prediction of the exposure pattern of that occupational population group. On assessing the estimated leukemia risk under the scope of providing a distribution curve based on the exposure levels and the different susceptibility of the population, the Bayesian algorithm was a prerequisite of the Monte Carlo approach, which is integrated in the PBPK-based risk model. In conclusion, the modeling system described herein is capable of exploiting the information collected by the environmental sensors in order to estimate in real time the personal exposure and the resulting health risk for employees of gasoline filling stations.
Sarigiannis, Dimosthenis A.; Karakitsios, Spyros P.; Gotti, Alberto; Papaloukas, Costas L.; Kassomenos, Pavlos A.; Pilidis, Georgios A.
2009-01-01
The objective of the current study was the development of a reliable modeling platform to calculate in real time the personal exposure and the associated health risk for filling station employees evaluating current environmental parameters (traffic, meteorological and amount of fuel traded) determined by the appropriate sensor network. A set of Artificial Neural Networks (ANNs) was developed to predict benzene exposure pattern for the filling station employees. Furthermore, a Physiology Based Pharmaco-Kinetic (PBPK) risk assessment model was developed in order to calculate the lifetime probability distribution of leukemia to the employees, fed by data obtained by the ANN model. Bayesian algorithm was involved in crucial points of both model sub compartments. The application was evaluated in two filling stations (one urban and one rural). Among several algorithms available for the development of the ANN exposure model, Bayesian regularization provided the best results and seemed to be a promising technique for prediction of the exposure pattern of that occupational population group. On assessing the estimated leukemia risk under the scope of providing a distribution curve based on the exposure levels and the different susceptibility of the population, the Bayesian algorithm was a prerequisite of the Monte Carlo approach, which is integrated in the PBPK-based risk model. In conclusion, the modeling system described herein is capable of exploiting the information collected by the environmental sensors in order to estimate in real time the personal exposure and the resulting health risk for employees of gasoline filling stations. PMID:22399936
NASA Astrophysics Data System (ADS)
Li, L.; Xu, C.-Y.; Engeland, K.
2012-04-01
With respect to model calibration, parameter estimation and analysis of uncertainty sources, different approaches have been used in hydrological models. Bayesian method is one of the most widely used methods for uncertainty assessment of hydrological models, which incorporates different sources of information into a single analysis through Bayesian theorem. However, none of these applications can well treat the uncertainty in extreme flows of hydrological models' simulations. This study proposes a Bayesian modularization method approach in uncertainty assessment of conceptual hydrological models by considering the extreme flows. It includes a comprehensive comparison and evaluation of uncertainty assessments by a new Bayesian modularization method approach and traditional Bayesian models using the Metropolis Hasting (MH) algorithm with the daily hydrological model WASMOD. Three likelihood functions are used in combination with traditional Bayesian: the AR (1) plus Normal and time period independent model (Model 1), the AR (1) plus Normal and time period dependent model (Model 2) and the AR (1) plus multi-normal model (Model 3). The results reveal that (1) the simulations derived from Bayesian modularization method are more accurate with the highest Nash-Sutcliffe efficiency value, and (2) the Bayesian modularization method performs best in uncertainty estimates of entire flows and in terms of the application and computational efficiency. The study thus introduces a new approach for reducing the extreme flow's effect on the discharge uncertainty assessment of hydrological models via Bayesian. Keywords: extreme flow, uncertainty assessment, Bayesian modularization, hydrological model, WASMOD
Espino-Hernandez, Gabriela; Gustafson, Paul; Burstyn, Igor
2011-05-14
In epidemiological studies explanatory variables are frequently subject to measurement error. The aim of this paper is to develop a Bayesian method to correct for measurement error in multiple continuous exposures in individually matched case-control studies. This is a topic that has not been widely investigated. The new method is illustrated using data from an individually matched case-control study of the association between thyroid hormone levels during pregnancy and exposure to perfluorinated acids. The objective of the motivating study was to examine the risk of maternal hypothyroxinemia due to exposure to three perfluorinated acids measured on a continuous scale. Results from the proposed method are compared with those obtained from a naive analysis. Using a Bayesian approach, the developed method considers a classical measurement error model for the exposures, as well as the conditional logistic regression likelihood as the disease model, together with a random-effect exposure model. Proper and diffuse prior distributions are assigned, and results from a quality control experiment are used to estimate the perfluorinated acids' measurement error variability. As a result, posterior distributions and 95% credible intervals of the odds ratios are computed. A sensitivity analysis of method's performance in this particular application with different measurement error variability was performed. The proposed Bayesian method to correct for measurement error is feasible and can be implemented using statistical software. For the study on perfluorinated acids, a comparison of the inferences which are corrected for measurement error to those which ignore it indicates that little adjustment is manifested for the level of measurement error actually exhibited in the exposures. Nevertheless, a sensitivity analysis shows that more substantial adjustments arise if larger measurement errors are assumed. In individually matched case-control studies, the use of conditional logistic regression likelihood as a disease model in the presence of measurement error in multiple continuous exposures can be justified by having a random-effect exposure model. The proposed method can be successfully implemented in WinBUGS to correct individually matched case-control studies for several mismeasured continuous exposures under a classical measurement error model.
The impossibility of probabilities
NASA Astrophysics Data System (ADS)
Zimmerman, Peter D.
2017-11-01
This paper discusses the problem of assigning probabilities to the likelihood of nuclear terrorism events, in particular examining the limitations of using Bayesian priors for this purpose. It suggests an alternate approach to analyzing the threat of nuclear terrorism.
Research on Bayes matting algorithm based on Gaussian mixture model
NASA Astrophysics Data System (ADS)
Quan, Wei; Jiang, Shan; Han, Cheng; Zhang, Chao; Jiang, Zhengang
2015-12-01
The digital matting problem is a classical problem of imaging. It aims at separating non-rectangular foreground objects from a background image, and compositing with a new background image. Accurate matting determines the quality of the compositing image. A Bayesian matting Algorithm Based on Gaussian Mixture Model is proposed to solve this matting problem. Firstly, the traditional Bayesian framework is improved by introducing Gaussian mixture model. Then, a weighting factor is added in order to suppress the noises of the compositing images. Finally, the effect is further improved by regulating the user's input. This algorithm is applied to matting jobs of classical images. The results are compared to the traditional Bayesian method. It is shown that our algorithm has better performance in detail such as hair. Our algorithm eliminates the noise well. And it is very effectively in dealing with the kind of work, such as interested objects with intricate boundaries.
Hierarchical Bayesian sparse image reconstruction with application to MRFM.
Dobigeon, Nicolas; Hero, Alfred O; Tourneret, Jean-Yves
2009-09-01
This paper presents a hierarchical Bayesian model to reconstruct sparse images when the observations are obtained from linear transformations and corrupted by an additive white Gaussian noise. Our hierarchical Bayes model is well suited to such naturally sparse image applications as it seamlessly accounts for properties such as sparsity and positivity of the image via appropriate Bayes priors. We propose a prior that is based on a weighted mixture of a positive exponential distribution and a mass at zero. The prior has hyperparameters that are tuned automatically by marginalization over the hierarchical Bayesian model. To overcome the complexity of the posterior distribution, a Gibbs sampling strategy is proposed. The Gibbs samples can be used to estimate the image to be recovered, e.g., by maximizing the estimated posterior distribution. In our fully Bayesian approach, the posteriors of all the parameters are available. Thus, our algorithm provides more information than other previously proposed sparse reconstruction methods that only give a point estimate. The performance of the proposed hierarchical Bayesian sparse reconstruction method is illustrated on synthetic data and real data collected from a tobacco virus sample using a prototype MRFM instrument.
A Bayesian Alternative for Multi-objective Ecohydrological Model Specification
NASA Astrophysics Data System (ADS)
Tang, Y.; Marshall, L. A.; Sharma, A.; Ajami, H.
2015-12-01
Process-based ecohydrological models combine the study of hydrological, physical, biogeochemical and ecological processes of the catchments, which are usually more complex and parametric than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying the uncertainties in hydrological modeling with the development of Markov Chain Monte Carlo (MCMC) techniques. Our study aims to develop appropriate prior distributions and likelihood functions that minimize the model uncertainties and bias within a Bayesian ecohydrological framework. In our study, a formal Bayesian approach is implemented in an ecohydrological model which combines a hydrological model (HyMOD) and a dynamic vegetation model (DVM). Simulations focused on one objective likelihood (Streamflow/LAI) and multi-objective likelihoods (Streamflow and LAI) with different weights are compared. Uniform, weakly informative and strongly informative prior distributions are used in different simulations. The Kullback-leibler divergence (KLD) is used to measure the dis(similarity) between different priors and corresponding posterior distributions to examine the parameter sensitivity. Results show that different prior distributions can strongly influence posterior distributions for parameters, especially when the available data is limited or parameters are insensitive to the available data. We demonstrate differences in optimized parameters and uncertainty limits in different cases based on multi-objective likelihoods vs. single objective likelihoods. We also demonstrate the importance of appropriately defining the weights of objectives in multi-objective calibration according to different data types.
Sparse Bayesian Learning for Identifying Imaging Biomarkers in AD Prediction
Shen, Li; Qi, Yuan; Kim, Sungeun; Nho, Kwangsik; Wan, Jing; Risacher, Shannon L.; Saykin, Andrew J.
2010-01-01
We apply sparse Bayesian learning methods, automatic relevance determination (ARD) and predictive ARD (PARD), to Alzheimer’s disease (AD) classification to make accurate prediction and identify critical imaging markers relevant to AD at the same time. ARD is one of the most successful Bayesian feature selection methods. PARD is a powerful Bayesian feature selection method, and provides sparse models that is easy to interpret. PARD selects the model with the best estimate of the predictive performance instead of choosing the one with the largest marginal model likelihood. Comparative study with support vector machine (SVM) shows that ARD/PARD in general outperform SVM in terms of prediction accuracy. Additional comparison with surface-based general linear model (GLM) analysis shows that regions with strongest signals are identified by both GLM and ARD/PARD. While GLM P-map returns significant regions all over the cortex, ARD/PARD provide a small number of relevant and meaningful imaging markers with predictive power, including both cortical and subcortical measures. PMID:20879451
Bayesian analysis of caustic-crossing microlensing events
NASA Astrophysics Data System (ADS)
Cassan, A.; Horne, K.; Kains, N.; Tsapras, Y.; Browne, P.
2010-06-01
Aims: Caustic-crossing binary-lens microlensing events are important anomalous events because they are capable of detecting an extrasolar planet companion orbiting the lens star. Fast and robust modelling methods are thus of prime interest in helping to decide whether a planet is detected by an event. Cassan introduced a new set of parameters to model binary-lens events, which are closely related to properties of the light curve. In this work, we explain how Bayesian priors can be added to this framework, and investigate on interesting options. Methods: We develop a mathematical formulation that allows us to compute analytically the priors on the new parameters, given some previous knowledge about other physical quantities. We explicitly compute the priors for a number of interesting cases, and show how this can be implemented in a fully Bayesian, Markov chain Monte Carlo algorithm. Results: Using Bayesian priors can accelerate microlens fitting codes by reducing the time spent considering physically implausible models, and helps us to discriminate between alternative models based on the physical plausibility of their parameters.
Bayesian Factor Analysis as a Variable Selection Problem: Alternative Priors and Consequences
Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric
2016-01-01
Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, a Bayesian structural equation modeling (BSEM) approach (Muthén & Asparouhov, 2012) has been proposed as a way to explore the presence of cross-loadings in CFA models. We show that the issue of determining factor loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov’s approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike and slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set (Byrne, 2012; Pettegrew & Wolf, 1982) is used to demonstrate our approach. PMID:27314566
Reliability modelling and analysis of a multi-state element based on a dynamic Bayesian network
NASA Astrophysics Data System (ADS)
Li, Zhiqiang; Xu, Tingxue; Gu, Junyuan; Dong, Qi; Fu, Linyu
2018-04-01
This paper presents a quantitative reliability modelling and analysis method for multi-state elements based on a combination of the Markov process and a dynamic Bayesian network (DBN), taking perfect repair, imperfect repair and condition-based maintenance (CBM) into consideration. The Markov models of elements without repair and under CBM are established, and an absorbing set is introduced to determine the reliability of the repairable element. According to the state-transition relations between the states determined by the Markov process, a DBN model is built. In addition, its parameters for series and parallel systems, namely, conditional probability tables, can be calculated by referring to the conditional degradation probabilities. Finally, the power of a control unit in a failure model is used as an example. A dynamic fault tree (DFT) is translated into a Bayesian network model, and subsequently extended to a DBN. The results show the state probabilities of an element and the system without repair, with perfect and imperfect repair, and under CBM, with an absorbing set plotted by differential equations and verified. Through referring forward, the reliability value of the control unit is determined in different kinds of modes. Finally, weak nodes are noted in the control unit.
Profile-Based LC-MS Data Alignment—A Bayesian Approach
Tsai, Tsung-Heng; Tadesse, Mahlet G.; Wang, Yue; Ressom, Habtom W.
2014-01-01
A Bayesian alignment model (BAM) is proposed for alignment of liquid chromatography-mass spectrometry (LC-MS) data. BAM belongs to the category of profile-based approaches, which are composed of two major components: a prototype function and a set of mapping functions. Appropriate estimation of these functions is crucial for good alignment results. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler and 2) an adaptive selection of knots. A block Metropolis-Hastings algorithm that mitigates the problem of the MCMC sampler getting stuck at local modes of the posterior distribution is used for the update of the mapping function coefficients. In addition, a stochastic search variable selection (SSVS) methodology is used to determine the number and positions of knots. We applied BAM to a simulated data set, an LC-MS proteomic data set, and two LC-MS metabolomic data sets, and compared its performance with the Bayesian hierarchical curve registration (BHCR) model, the dynamic time-warping (DTW) model, and the continuous profile model (CPM). The advantage of applying appropriate profile-based retention time correction prior to performing a feature-based approach is also demonstrated through the metabolomic data sets. PMID:23929872
NASA Astrophysics Data System (ADS)
Mazrou, H.; Bezoubiri, F.
2018-07-01
In this work, a new program developed under MATLAB environment and supported by the Bayesian software WinBUGS has been combined to the traditional unfolding codes namely MAXED and GRAVEL, to evaluate a neutron spectrum from the Bonner spheres measured counts obtained around a shielded 241AmBe based-neutron irradiator located at a Secondary Standards Dosimetry Laboratory (SSDL) at CRNA. In the first step, the results obtained by the standalone Bayesian program, using a parametric neutron spectrum model based on a linear superposition of three components namely: a thermal-Maxwellian distribution, an epithermal (1/E behavior) and a kind of a Watt fission and Evaporation models to represent the fast component, were compared to those issued from MAXED and GRAVEL assuming a Monte Carlo default spectrum. Through the selection of new upper limits for some free parameters, taking into account the physical characteristics of the irradiation source, of both considered models, good agreement was obtained for investigated integral quantities i.e. fluence rate and ambient dose equivalent rate compared to MAXED and GRAVEL results. The difference was generally below 4% for investigated parameters suggesting, thereby, the reliability of the proposed models. In the second step, the Bayesian results obtained from the previous calculations were used, as initial guess spectra, for the traditional unfolding codes, MAXED and GRAVEL to derive the solution spectra. Here again the results were in very good agreement, confirming the stability of the Bayesian solution.
A Bayesian approach to model structural error and input variability in groundwater modeling
NASA Astrophysics Data System (ADS)
Xu, T.; Valocchi, A. J.; Lin, Y. F. F.; Liang, F.
2015-12-01
Effective water resource management typically relies on numerical models to analyze groundwater flow and solute transport processes. Model structural error (due to simplification and/or misrepresentation of the "true" environmental system) and input forcing variability (which commonly arises since some inputs are uncontrolled or estimated with high uncertainty) are ubiquitous in groundwater models. Calibration that overlooks errors in model structure and input data can lead to biased parameter estimates and compromised predictions. We present a fully Bayesian approach for a complete assessment of uncertainty for spatially distributed groundwater models. The approach explicitly recognizes stochastic input and uses data-driven error models based on nonparametric kernel methods to account for model structural error. We employ exploratory data analysis to assist in specifying informative prior for error models to improve identifiability. The inference is facilitated by an efficient sampling algorithm based on DREAM-ZS and a parameter subspace multiple-try strategy to reduce the required number of forward simulations of the groundwater model. We demonstrate the Bayesian approach through a synthetic case study of surface-ground water interaction under changing pumping conditions. It is found that explicit treatment of errors in model structure and input data (groundwater pumping rate) has substantial impact on the posterior distribution of groundwater model parameters. Using error models reduces predictive bias caused by parameter compensation. In addition, input variability increases parametric and predictive uncertainty. The Bayesian approach allows for a comparison among the contributions from various error sources, which could inform future model improvement and data collection efforts on how to best direct resources towards reducing predictive uncertainty.
Predicting Mycobacterium tuberculosis Complex Clades Using Knowledge-Based Bayesian Networks
Bennett, Kristin P.
2014-01-01
We develop a novel approach for incorporating expert rules into Bayesian networks for classification of Mycobacterium tuberculosis complex (MTBC) clades. The proposed knowledge-based Bayesian network (KBBN) treats sets of expert rules as prior distributions on the classes. Unlike prior knowledge-based support vector machine approaches which require rules expressed as polyhedral sets, KBBN directly incorporates the rules without any modification. KBBN uses data to refine rule-based classifiers when the rule set is incomplete or ambiguous. We develop a predictive KBBN model for 69 MTBC clades found in the SITVIT international collection. We validate the approach using two testbeds that model knowledge of the MTBC obtained from two different experts and large DNA fingerprint databases to predict MTBC genetic clades and sublineages. These models represent strains of MTBC using high-throughput biomarkers called spacer oligonucleotide types (spoligotypes), since these are routinely gathered from MTBC isolates of tuberculosis (TB) patients. Results show that incorporating rules into problems can drastically increase classification accuracy if data alone are insufficient. The SITVIT KBBN is publicly available for use on the World Wide Web. PMID:24864238
A Bayesian approach for parameter estimation and prediction using a computationally intensive model
Higdon, Dave; McDonnell, Jordan D.; Schunck, Nicolas; ...
2015-02-05
Bayesian methods have been successful in quantifying uncertainty in physics-based problems in parameter estimation and prediction. In these cases, physical measurements y are modeled as the best fit of a physics-based modelmore » $$\\eta (\\theta )$$, where θ denotes the uncertain, best input setting. Hence the statistical model is of the form $$y=\\eta (\\theta )+\\epsilon ,$$ where $$\\epsilon $$ accounts for measurement, and possibly other, error sources. When nonlinearity is present in $$\\eta (\\cdot )$$, the resulting posterior distribution for the unknown parameters in the Bayesian formulation is typically complex and nonstandard, requiring computationally demanding computational approaches such as Markov chain Monte Carlo (MCMC) to produce multivariate draws from the posterior. Although generally applicable, MCMC requires thousands (or even millions) of evaluations of the physics model $$\\eta (\\cdot )$$. This requirement is problematic if the model takes hours or days to evaluate. To overcome this computational bottleneck, we present an approach adapted from Bayesian model calibration. This approach combines output from an ensemble of computational model runs with physical measurements, within a statistical formulation, to carry out inference. A key component of this approach is a statistical response surface, or emulator, estimated from the ensemble of model runs. We demonstrate this approach with a case study in estimating parameters for a density functional theory model, using experimental mass/binding energy measurements from a collection of atomic nuclei. Lastly, we also demonstrate how this approach produces uncertainties in predictions for recent mass measurements obtained at Argonne National Laboratory.« less
A Bayesian Nonparametric Approach to Test Equating
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2009-01-01
A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are…
Model Diagnostics for Bayesian Networks
ERIC Educational Resources Information Center
Sinharay, Sandip
2006-01-01
Bayesian networks are frequently used in educational assessments primarily for learning about students' knowledge and skills. There is a lack of works on assessing fit of Bayesian networks. This article employs the posterior predictive model checking method, a popular Bayesian model checking tool, to assess fit of simple Bayesian networks. A…
Validation of the thermal challenge problem using Bayesian Belief Networks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McFarland, John; Swiler, Laura Painton
The thermal challenge problem has been developed at Sandia National Laboratories as a testbed for demonstrating various types of validation approaches and prediction methods. This report discusses one particular methodology to assess the validity of a computational model given experimental data. This methodology is based on Bayesian Belief Networks (BBNs) and can incorporate uncertainty in experimental measurements, in physical quantities, and model uncertainties. The approach uses the prior and posterior distributions of model output to compute a validation metric based on Bayesian hypothesis testing (a Bayes' factor). This report discusses various aspects of the BBN, specifically in the context ofmore » the thermal challenge problem. A BBN is developed for a given set of experimental data in a particular experimental configuration. The development of the BBN and the method for ''solving'' the BBN to develop the posterior distribution of model output through Monte Carlo Markov Chain sampling is discussed in detail. The use of the BBN to compute a Bayes' factor is demonstrated.« less
Evaluating Great Lakes bald eagle nesting habitat with Bayesian inference
Teryl G. Grubb; William W. Bowerman; Allen J. Bath; John P. Giesy; D. V. Chip Weseloh
2003-01-01
Bayesian inference facilitated structured interpretation of a nonreplicated, experience-based survey of potential nesting habitat for bald eagles (Haliaeetus leucocephalus) along the five Great Lakes shorelines. We developed a pattern recognition (PATREC) model of our aerial search image with six habitat attributes: (a) tree cover, (b) proximity and...
Global Land Carbon Uptake from Trait Distributions
NASA Astrophysics Data System (ADS)
Butler, E. E.; Datta, A.; Flores-Moreno, H.; Fazayeli, F.; Chen, M.; Wythers, K. R.; Banerjee, A.; Atkin, O. K.; Kattge, J.; Reich, P. B.
2016-12-01
Historically, functional diversity in land surface models has been represented through a range of plant functional types (PFTs), each of which has a single value for all of its functional traits. Here we expand the diversity of the land surface by using a distribution of trait values for each PFT. The data for these trait distributions is from a sub-set of the global database of plant traits, TRY, and this analysis uses three leaf traits: mass based nitrogen and phosphorus content and specific leaf area, which influence both photosynthesis and respiration. The data are extrapolated into continuous surfaces through two methodologies. The first, a categorical method, classifies the species observed in TRY into satellite estimates of their plant functional type abundances - analogous to how traits are currently assigned to PFTs in land surface models. Second, a Bayesian spatial method which additionally estimates how the distribution of a trait changes in accord with both climate and soil covariates. These two methods produce distinct patterns of diversity which are incorporated into a land surface model to estimate how the range of trait values affects the global land carbon budget.
A Bayesian approach to meta-analysis of plant pathology studies.
Mila, A L; Ngugi, H K
2011-01-01
Bayesian statistical methods are used for meta-analysis in many disciplines, including medicine, molecular biology, and engineering, but have not yet been applied for quantitative synthesis of plant pathology studies. In this paper, we illustrate the key concepts of Bayesian statistics and outline the differences between Bayesian and classical (frequentist) methods in the way parameters describing population attributes are considered. We then describe a Bayesian approach to meta-analysis and present a plant pathological example based on studies evaluating the efficacy of plant protection products that induce systemic acquired resistance for the management of fire blight of apple. In a simple random-effects model assuming a normal distribution of effect sizes and no prior information (i.e., a noninformative prior), the results of the Bayesian meta-analysis are similar to those obtained with classical methods. Implementing the same model with a Student's t distribution and a noninformative prior for the effect sizes, instead of a normal distribution, yields similar results for all but acibenzolar-S-methyl (Actigard) which was evaluated only in seven studies in this example. Whereas both the classical (P = 0.28) and the Bayesian analysis with a noninformative prior (95% credibility interval [CRI] for the log response ratio: -0.63 to 0.08) indicate a nonsignificant effect for Actigard, specifying a t distribution resulted in a significant, albeit variable, effect for this product (CRI: -0.73 to -0.10). These results confirm the sensitivity of the analytical outcome (i.e., the posterior distribution) to the choice of prior in Bayesian meta-analyses involving a limited number of studies. We review some pertinent literature on more advanced topics, including modeling of among-study heterogeneity, publication bias, analyses involving a limited number of studies, and methods for dealing with missing data, and show how these issues can be approached in a Bayesian framework. Bayesian meta-analysis can readily include information not easily incorporated in classical methods, and allow for a full evaluation of competing models. Given the power and flexibility of Bayesian methods, we expect them to become widely adopted for meta-analysis of plant pathology studies.
Schöniger, Anneli; Wöhling, Thomas; Samaniego, Luis; Nowak, Wolfgang
2014-01-01
Bayesian model selection or averaging objectively ranks a number of plausible, competing conceptual models based on Bayes' theorem. It implicitly performs an optimal trade-off between performance in fitting available data and minimum model complexity. The procedure requires determining Bayesian model evidence (BME), which is the likelihood of the observed data integrated over each model's parameter space. The computation of this integral is highly challenging because it is as high-dimensional as the number of model parameters. Three classes of techniques to compute BME are available, each with its own challenges and limitations: (1) Exact and fast analytical solutions are limited by strong assumptions. (2) Numerical evaluation quickly becomes unfeasible for expensive models. (3) Approximations known as information criteria (ICs) such as the AIC, BIC, or KIC (Akaike, Bayesian, or Kashyap information criterion, respectively) yield contradicting results with regard to model ranking. Our study features a theory-based intercomparison of these techniques. We further assess their accuracy in a simplistic synthetic example where for some scenarios an exact analytical solution exists. In more challenging scenarios, we use a brute-force Monte Carlo integration method as reference. We continue this analysis with a real-world application of hydrological model selection. This is a first-time benchmarking of the various methods for BME evaluation against true solutions. Results show that BME values from ICs are often heavily biased and that the choice of approximation method substantially influences the accuracy of model ranking. For reliable model selection, bias-free numerical methods should be preferred over ICs whenever computationally feasible. PMID:25745272
Bayesian evidence computation for model selection in non-linear geoacoustic inference problems.
Dettmer, Jan; Dosso, Stan E; Osler, John C
2010-12-01
This paper applies a general Bayesian inference approach, based on Bayesian evidence computation, to geoacoustic inversion of interface-wave dispersion data. Quantitative model selection is carried out by computing the evidence (normalizing constants) for several model parameterizations using annealed importance sampling. The resulting posterior probability density estimate is compared to estimates obtained from Metropolis-Hastings sampling to ensure consistent results. The approach is applied to invert interface-wave dispersion data collected on the Scotian Shelf, off the east coast of Canada for the sediment shear-wave velocity profile. Results are consistent with previous work on these data but extend the analysis to a rigorous approach including model selection and uncertainty analysis. The results are also consistent with core samples and seismic reflection measurements carried out in the area.
Efficient Posterior Probability Mapping Using Savage-Dickey Ratios
Penny, William D.; Ridgway, Gerard R.
2013-01-01
Statistical Parametric Mapping (SPM) is the dominant paradigm for mass-univariate analysis of neuroimaging data. More recently, a Bayesian approach termed Posterior Probability Mapping (PPM) has been proposed as an alternative. PPM offers two advantages: (i) inferences can be made about effect size thus lending a precise physiological meaning to activated regions, (ii) regions can be declared inactive. This latter facility is most parsimoniously provided by PPMs based on Bayesian model comparisons. To date these comparisons have been implemented by an Independent Model Optimization (IMO) procedure which separately fits null and alternative models. This paper proposes a more computationally efficient procedure based on Savage-Dickey approximations to the Bayes factor, and Taylor-series approximations to the voxel-wise posterior covariance matrices. Simulations show the accuracy of this Savage-Dickey-Taylor (SDT) method to be comparable to that of IMO. Results on fMRI data show excellent agreement between SDT and IMO for second-level models, and reasonable agreement for first-level models. This Savage-Dickey test is a Bayesian analogue of the classical SPM-F and allows users to implement model comparison in a truly interactive manner. PMID:23533640
Kwon, Deukwoo; Hoffman, F Owen; Moroz, Brian E; Simon, Steven L
2016-02-10
Most conventional risk analysis methods rely on a single best estimate of exposure per person, which does not allow for adjustment for exposure-related uncertainty. Here, we propose a Bayesian model averaging method to properly quantify the relationship between radiation dose and disease outcomes by accounting for shared and unshared uncertainty in estimated dose. Our Bayesian risk analysis method utilizes multiple realizations of sets (vectors) of doses generated by a two-dimensional Monte Carlo simulation method that properly separates shared and unshared errors in dose estimation. The exposure model used in this work is taken from a study of the risk of thyroid nodules among a cohort of 2376 subjects who were exposed to fallout from nuclear testing in Kazakhstan. We assessed the performance of our method through an extensive series of simulations and comparisons against conventional regression risk analysis methods. When the estimated doses contain relatively small amounts of uncertainty, the Bayesian method using multiple a priori plausible draws of dose vectors gave similar results to the conventional regression-based methods of dose-response analysis. However, when large and complex mixtures of shared and unshared uncertainties are present, the Bayesian method using multiple dose vectors had significantly lower relative bias than conventional regression-based risk analysis methods and better coverage, that is, a markedly increased capability to include the true risk coefficient within the 95% credible interval of the Bayesian-based risk estimate. An evaluation of the dose-response using our method is presented for an epidemiological study of thyroid disease following radiation exposure. Copyright © 2015 John Wiley & Sons, Ltd.
Theory-based Bayesian models of inductive learning and reasoning.
Tenenbaum, Joshua B; Griffiths, Thomas L; Kemp, Charles
2006-07-01
Inductive inference allows humans to make powerful generalizations from sparse data when learning about word meanings, unobserved properties, causal relationships, and many other aspects of the world. Traditional accounts of induction emphasize either the power of statistical learning, or the importance of strong constraints from structured domain knowledge, intuitive theories or schemas. We argue that both components are necessary to explain the nature, use and acquisition of human knowledge, and we introduce a theory-based Bayesian framework for modeling inductive learning and reasoning as statistical inferences over structured knowledge representations.
SensibleSleep: A Bayesian Model for Learning Sleep Patterns from Smartphone Events
Sekara, Vedran; Jonsson, Håkan; Larsen, Jakob Eg; Lehmann, Sune
2017-01-01
We propose a Bayesian model for extracting sleep patterns from smartphone events. Our method is able to identify individuals’ daily sleep periods and their evolution over time, and provides an estimation of the probability of sleep and wake transitions. The model is fitted to more than 400 participants from two different datasets, and we verify the results against ground truth from dedicated armband sleep trackers. We show that the model is able to produce reliable sleep estimates with an accuracy of 0.89, both at the individual and at the collective level. Moreover the Bayesian model is able to quantify uncertainty and encode prior knowledge about sleep patterns. Compared with existing smartphone-based systems, our method requires only screen on/off events, and is therefore much less intrusive in terms of privacy and more battery-efficient. PMID:28076375
SensibleSleep: A Bayesian Model for Learning Sleep Patterns from Smartphone Events.
Cuttone, Andrea; Bækgaard, Per; Sekara, Vedran; Jonsson, Håkan; Larsen, Jakob Eg; Lehmann, Sune
2017-01-01
We propose a Bayesian model for extracting sleep patterns from smartphone events. Our method is able to identify individuals' daily sleep periods and their evolution over time, and provides an estimation of the probability of sleep and wake transitions. The model is fitted to more than 400 participants from two different datasets, and we verify the results against ground truth from dedicated armband sleep trackers. We show that the model is able to produce reliable sleep estimates with an accuracy of 0.89, both at the individual and at the collective level. Moreover the Bayesian model is able to quantify uncertainty and encode prior knowledge about sleep patterns. Compared with existing smartphone-based systems, our method requires only screen on/off events, and is therefore much less intrusive in terms of privacy and more battery-efficient.
NASA Astrophysics Data System (ADS)
Budi Astuti, Ani; Iriawan, Nur; Irhamah; Kuswanto, Heri; Sasiarini, Laksmi
2017-10-01
Bayesian statistics proposes an approach that is very flexible in the number of samples and distribution of data. Bayesian Mixture Model (BMM) is a Bayesian approach for multimodal models. Diabetes Mellitus (DM) is more commonly known in the Indonesian community as sweet pee. This disease is one type of chronic non-communicable diseases but it is very dangerous to humans because of the effects of other diseases complications caused. WHO reports in 2013 showed DM disease was ranked 6th in the world as the leading causes of human death. In Indonesia, DM disease continues to increase over time. These research would be studied patterns and would be built the BMM models of the DM data through simulation studies where the simulation data built on cases of blood sugar levels of DM patients in RSUD Saiful Anwar Malang. The results have been successfully demonstrated pattern of distribution of the DM data which has a normal mixture distribution. The BMM models have succeed to accommodate the real condition of the DM data based on the data driven concept.
Bayesian Model Averaging for Propensity Score Analysis
ERIC Educational Resources Information Center
Kaplan, David; Chen, Jianshen
2013-01-01
The purpose of this study is to explore Bayesian model averaging in the propensity score context. Previous research on Bayesian propensity score analysis does not take into account model uncertainty. In this regard, an internally consistent Bayesian framework for model building and estimation must also account for model uncertainty. The…
Inferring on the Intentions of Others by Hierarchical Bayesian Learning
Diaconescu, Andreea O.; Mathys, Christoph; Weber, Lilian A. E.; Daunizeau, Jean; Kasper, Lars; Lomakina, Ekaterina I.; Fehr, Ernst; Stephan, Klaas E.
2014-01-01
Inferring on others' (potentially time-varying) intentions is a fundamental problem during many social transactions. To investigate the underlying mechanisms, we applied computational modeling to behavioral data from an economic game in which 16 pairs of volunteers (randomly assigned to “player” or “adviser” roles) interacted. The player performed a probabilistic reinforcement learning task, receiving information about a binary lottery from a visual pie chart. The adviser, who received more predictive information, issued an additional recommendation. Critically, the game was structured such that the adviser's incentives to provide helpful or misleading information varied in time. Using a meta-Bayesian modeling framework, we found that the players' behavior was best explained by the deployment of hierarchical learning: they inferred upon the volatility of the advisers' intentions in order to optimize their predictions about the validity of their advice. Beyond learning, volatility estimates also affected the trial-by-trial variability of decisions: participants were more likely to rely on their estimates of advice accuracy for making choices when they believed that the adviser's intentions were presently stable. Finally, our model of the players' inference predicted the players' interpersonal reactivity index (IRI) scores, explicit ratings of the advisers' helpfulness and the advisers' self-reports on their chosen strategy. Overall, our results suggest that humans (i) employ hierarchical generative models to infer on the changing intentions of others, (ii) use volatility estimates to inform decision-making in social interactions, and (iii) integrate estimates of advice accuracy with non-social sources of information. The Bayesian framework presented here can quantify individual differences in these mechanisms from simple behavioral readouts and may prove useful in future clinical studies of maladaptive social cognition. PMID:25187943
Textual and visual content-based anti-phishing: a Bayesian approach.
Zhang, Haijun; Liu, Gang; Chow, Tommy W S; Liu, Wenyin
2011-10-01
A novel framework using a Bayesian approach for content-based phishing web page detection is presented. Our model takes into account textual and visual contents to measure the similarity between the protected web page and suspicious web pages. A text classifier, an image classifier, and an algorithm fusing the results from classifiers are introduced. An outstanding feature of this paper is the exploration of a Bayesian model to estimate the matching threshold. This is required in the classifier for determining the class of the web page and identifying whether the web page is phishing or not. In the text classifier, the naive Bayes rule is used to calculate the probability that a web page is phishing. In the image classifier, the earth mover's distance is employed to measure the visual similarity, and our Bayesian model is designed to determine the threshold. In the data fusion algorithm, the Bayes theory is used to synthesize the classification results from textual and visual content. The effectiveness of our proposed approach was examined in a large-scale dataset collected from real phishing cases. Experimental results demonstrated that the text classifier and the image classifier we designed deliver promising results, the fusion algorithm outperforms either of the individual classifiers, and our model can be adapted to different phishing cases. © 2011 IEEE
Bayesian experimental design for models with intractable likelihoods.
Drovandi, Christopher C; Pettitt, Anthony N
2013-12-01
In this paper we present a methodology for designing experiments for efficiently estimating the parameters of models with computationally intractable likelihoods. The approach combines a commonly used methodology for robust experimental design, based on Markov chain Monte Carlo sampling, with approximate Bayesian computation (ABC) to ensure that no likelihood evaluations are required. The utility function considered for precise parameter estimation is based upon the precision of the ABC posterior distribution, which we form efficiently via the ABC rejection algorithm based on pre-computed model simulations. Our focus is on stochastic models and, in particular, we investigate the methodology for Markov process models of epidemics and macroparasite population evolution. The macroparasite example involves a multivariate process and we assess the loss of information from not observing all variables. © 2013, The International Biometric Society.
Application of a data-mining method based on Bayesian networks to lesion-deficit analysis
NASA Technical Reports Server (NTRS)
Herskovits, Edward H.; Gerring, Joan P.
2003-01-01
Although lesion-deficit analysis (LDA) has provided extensive information about structure-function associations in the human brain, LDA has suffered from the difficulties inherent to the analysis of spatial data, i.e., there are many more variables than subjects, and data may be difficult to model using standard distributions, such as the normal distribution. We herein describe a Bayesian method for LDA; this method is based on data-mining techniques that employ Bayesian networks to represent structure-function associations. These methods are computationally tractable, and can represent complex, nonlinear structure-function associations. When applied to the evaluation of data obtained from a study of the psychiatric sequelae of traumatic brain injury in children, this method generates a Bayesian network that demonstrates complex, nonlinear associations among lesions in the left caudate, right globus pallidus, right side of the corpus callosum, right caudate, and left thalamus, and subsequent development of attention-deficit hyperactivity disorder, confirming and extending our previous statistical analysis of these data. Furthermore, analysis of simulated data indicates that methods based on Bayesian networks may be more sensitive and specific for detecting associations among categorical variables than methods based on chi-square and Fisher exact statistics.
Nonlinear and non-Gaussian Bayesian based handwriting beautification
NASA Astrophysics Data System (ADS)
Shi, Cao; Xiao, Jianguo; Xu, Canhui; Jia, Wenhua
2013-03-01
A framework is proposed in this paper to effectively and efficiently beautify handwriting by means of a novel nonlinear and non-Gaussian Bayesian algorithm. In the proposed framework, format and size of handwriting image are firstly normalized, and then typeface in computer system is applied to optimize vision effect of handwriting. The Bayesian statistics is exploited to characterize the handwriting beautification process as a Bayesian dynamic model. The model parameters to translate, rotate and scale typeface in computer system are controlled by state equation, and the matching optimization between handwriting and transformed typeface is employed by measurement equation. Finally, the new typeface, which is transformed from the original one and gains the best nonlinear and non-Gaussian optimization, is the beautification result of handwriting. Experimental results demonstrate the proposed framework provides a creative handwriting beautification methodology to improve visual acceptance.
Khana, Diba; Rossen, Lauren M; Hedegaard, Holly; Warner, Margaret
2018-01-01
Hierarchical Bayes models have been used in disease mapping to examine small scale geographic variation. State level geographic variation for less common causes of mortality outcomes have been reported however county level variation is rarely examined. Due to concerns about statistical reliability and confidentiality, county-level mortality rates based on fewer than 20 deaths are suppressed based on Division of Vital Statistics, National Center for Health Statistics (NCHS) statistical reliability criteria, precluding an examination of spatio-temporal variation in less common causes of mortality outcomes such as suicide rates (SRs) at the county level using direct estimates. Existing Bayesian spatio-temporal modeling strategies can be applied via Integrated Nested Laplace Approximation (INLA) in R to a large number of rare causes of mortality outcomes to enable examination of spatio-temporal variations on smaller geographic scales such as counties. This method allows examination of spatiotemporal variation across the entire U.S., even where the data are sparse. We used mortality data from 2005-2015 to explore spatiotemporal variation in SRs, as one particular application of the Bayesian spatio-temporal modeling strategy in R-INLA to predict year and county-specific SRs. Specifically, hierarchical Bayesian spatio-temporal models were implemented with spatially structured and unstructured random effects, correlated time effects, time varying confounders and space-time interaction terms in the software R-INLA, borrowing strength across both counties and years to produce smoothed county level SRs. Model-based estimates of SRs were mapped to explore geographic variation.
Comparing Families of Dynamic Causal Models
Penny, Will D.; Stephan, Klaas E.; Daunizeau, Jean; Rosa, Maria J.; Friston, Karl J.; Schofield, Thomas M.; Leff, Alex P.
2010-01-01
Mathematical models of scientific data can be formally compared using Bayesian model evidence. Previous applications in the biological sciences have mainly focussed on model selection in which one first selects the model with the highest evidence and then makes inferences based on the parameters of that model. This “best model” approach is very useful but can become brittle if there are a large number of models to compare, and if different subjects use different models. To overcome this shortcoming we propose the combination of two further approaches: (i) family level inference and (ii) Bayesian model averaging within families. Family level inference removes uncertainty about aspects of model structure other than the characteristic of interest. For example: What are the inputs to the system? Is processing serial or parallel? Is it linear or nonlinear? Is it mediated by a single, crucial connection? We apply Bayesian model averaging within families to provide inferences about parameters that are independent of further assumptions about model structure. We illustrate the methods using Dynamic Causal Models of brain imaging data. PMID:20300649
NASA Astrophysics Data System (ADS)
Alevizos, Evangelos; Snellen, Mirjam; Simons, Dick; Siemes, Kerstin; Greinert, Jens
2018-06-01
This study applies three classification methods exploiting the angular dependence of acoustic seafloor backscatter along with high resolution sub-bottom profiling for seafloor sediment characterization in the Eckernförde Bay, Baltic Sea Germany. This area is well suited for acoustic backscatter studies due to its shallowness, its smooth bathymetry and the presence of a wide range of sediment types. Backscatter data were acquired using a Seabeam1180 (180 kHz) multibeam echosounder and sub-bottom profiler data were recorded using a SES-2000 parametric sonar transmitting 6 and 12 kHz. The high density of seafloor soundings allowed extracting backscatter layers for five beam angles over a large part of the surveyed area. A Bayesian probability method was employed for sediment classification based on the backscatter variability at a single incidence angle, whereas Maximum Likelihood Classification (MLC) and Principal Components Analysis (PCA) were applied to the multi-angle layers. The Bayesian approach was used for identifying the optimum number of acoustic classes because cluster validation is carried out prior to class assignment and class outputs are ordinal categorical values. The method is based on the principle that backscatter values from a single incidence angle express a normal distribution for a particular sediment type. The resulting Bayesian classes were well correlated to median grain sizes and the percentage of coarse material. The MLC method uses angular response information from five layers of training areas extracted from the Bayesian classification map. The subsequent PCA analysis is based on the transformation of these five layers into two principal components that comprise most of the data variability. These principal components were clustered in five classes after running an external cluster validation test. In general both methods MLC and PCA, separated the various sediment types effectively, showing good agreement (kappa >0.7) with the Bayesian approach which also correlates well with ground truth data (r2 > 0.7). In addition, sub-bottom data were used in conjunction with the Bayesian classification results to characterize acoustic classes with respect to their geological and stratigraphic interpretation. The joined interpretation of seafloor and sub-seafloor data sets proved to be an efficient approach for a better understanding of seafloor backscatter patchiness and to discriminate acoustically similar classes in different geological/bathymetric settings.
Bayesian state space models for dynamic genetic network construction across multiple tissues.
Liang, Yulan; Kelemen, Arpad
2016-08-01
Construction of gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research for complex diseases while estimating the dynamic changes of the temporal correlations and non-stationarity are the keys in this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge for inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix time-variant and include temporal correlation structures in the covariance matrix estimations in the multivariate Bayesian state space models. The unevenly spaced short time courses with unseen time points are treated as hidden state variables. Hierarchical Bayesian approaches with various prior and hyper-prior models with Monte Carlo Markov Chain and Gibbs sampling algorithms are used to estimate the model parameters and the hidden state variables. We apply the proposed Hierarchical Bayesian state space models to multiple tissues (liver, skeletal muscle, and kidney) Affymetrix time course data sets following corticosteroid (CS) drug administration. Both simulation and real data analysis results show that the genomic changes over time and gene-gene interaction in response to CS treatment can be well captured by the proposed models. The proposed dynamic Hierarchical Bayesian state space modeling approaches could be expanded and applied to other large scale genomic data, such as next generation sequence (NGS) combined with real time and time varying electronic health record (EHR) for more comprehensive and robust systematic and network based analysis in order to transform big biomedical data into predictions and diagnostics for precision medicine and personalized healthcare with better decision making and patient outcomes.
Impact of censoring on learning Bayesian networks in survival modelling.
Stajduhar, Ivan; Dalbelo-Basić, Bojana; Bogunović, Nikola
2009-11-01
Bayesian networks are commonly used for presenting uncertainty and covariate interactions in an easily interpretable way. Because of their efficient inference and ability to represent causal relationships, they are an excellent choice for medical decision support systems in diagnosis, treatment, and prognosis. Although good procedures for learning Bayesian networks from data have been defined, their performance in learning from censored survival data has not been widely studied. In this paper, we explore how to use these procedures to learn about possible interactions between prognostic factors and their influence on the variate of interest. We study how censoring affects the probability of learning correct Bayesian network structures. Additionally, we analyse the potential usefulness of the learnt models for predicting the time-independent probability of an event of interest. We analysed the influence of censoring with a simulation on synthetic data sampled from randomly generated Bayesian networks. We used two well-known methods for learning Bayesian networks from data: a constraint-based method and a score-based method. We compared the performance of each method under different levels of censoring to those of the naive Bayes classifier and the proportional hazards model. We did additional experiments on several datasets from real-world medical domains. The machine-learning methods treated censored cases in the data as event-free. We report and compare results for several commonly used model evaluation metrics. On average, the proportional hazards method outperformed other methods in most censoring setups. As part of the simulation study, we also analysed structural similarities of the learnt networks. Heavy censoring, as opposed to no censoring, produces up to a 5% surplus and up to 10% missing total arcs. It also produces up to 50% missing arcs that should originally be connected to the variate of interest. Presented methods for learning Bayesian networks from data can be used to learn from censored survival data in the presence of light censoring (up to 20%) by treating censored cases as event-free. Given intermediate or heavy censoring, the learnt models become tuned to the majority class and would thus require a different approach.
Trust from the past: Bayesian Personalized Ranking based Link Prediction in Knowledge Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Baichuan; Choudhury, Sutanay; Al-Hasan, Mohammad
2016-02-01
Estimating the confidence for a link is a critical task for Knowledge Graph construction. Link prediction, or predicting the likelihood of a link in a knowledge graph based on prior state is a key research direction within this area. We propose a Latent Feature Embedding based link recommendation model for prediction task and utilize Bayesian Personalized Ranking based optimization technique for learning models for each predicate. Experimental results on large-scale knowledge bases such as YAGO2 show that our approach achieves substantially higher performance than several state-of-art approaches. Furthermore, we also study the performance of the link prediction algorithm in termsmore » of topological properties of the Knowledge Graph and present a linear regression model to reason about its expected level of accuracy.« less
Eberle, Jonas; Warnock, Rachel C M; Ahrens, Dirk
2016-05-05
Defining species units can be challenging, especially during the earliest stages of speciation, when phylogenetic inference and delimitation methods may be compromised by incomplete lineage sorting (ILS) or secondary gene flow. Integrative approaches to taxonomy, which combine molecular and morphological evidence, have the potential to be valuable in such cases. In this study we investigated the South African scarab beetle genus Pleophylla using data collected from 110 individuals of eight putative morphospecies. The dataset included four molecular markers (cox1, 16S, rrnL, ITS1) and morphometric data based on male genital morphology. We applied a suite of molecular and morphological approaches to species delimitation, and implemented a novel Bayesian approach in the software iBPP, which enables continuous morphological trait and molecular data to be combined. Traditional morphology-based species assignments were supported quantitatively by morphometric analyses of the male genitalia (eigenshape analysis, CVA, LDA). While the ITS1-based delineation was also broadly congruent with the morphospecies, the cox1 data resulted in over-splitting (GMYC modelling, haplotype networks, PTP, ABGD). In the most extreme case morphospecies shared identical haplotypes, which may be attributable to ILS based on statistical tests performed using the software JML. We found the strongest support for putative morphospecies based on phylogenetic evidence using the combined approach implemented in iBPP. However, support for putative species was sensitive to the use of alternative guide trees and alternative combinations of priors on the population size (θ) and rootage (τ 0 ) parameters, especially when the analysis was based on molecular or morphological data alone. We demonstrate that continuous morphological trait data can be extremely valuable in assessing competing hypotheses to species delimitation. In particular, we show that the inclusion of morphological data in an integrative Bayesian framework can improve the resolution of inferred species units. However, we also demonstrate that this approach is extremely sensitive to guide tree and prior parameter choice. These parameters should be chosen with caution - if possible - based on independent empirical evidence, or careful sensitivity analyses should be performed to assess the robustness of results. Young species provide exemplars for investigating the mechanisms of speciation and for assessing the performance of tools used to delimit species on the basis of molecular and/or morphological evidence.
Bayesian decision support for coding occupational injury data.
Nanda, Gaurav; Grattan, Kathleen M; Chu, MyDzung T; Davis, Letitia K; Lehto, Mark R
2016-06-01
Studies on autocoding injury data have found that machine learning algorithms perform well for categories that occur frequently but often struggle with rare categories. Therefore, manual coding, although resource-intensive, cannot be eliminated. We propose a Bayesian decision support system to autocode a large portion of the data, filter cases for manual review, and assist human coders by presenting them top k prediction choices and a confusion matrix of predictions from Bayesian models. We studied the prediction performance of Single-Word (SW) and Two-Word-Sequence (TW) Naïve Bayes models on a sample of data from the 2011 Survey of Occupational Injury and Illness (SOII). We used the agreement in prediction results of SW and TW models, and various prediction strength thresholds for autocoding and filtering cases for manual review. We also studied the sensitivity of the top k predictions of the SW model, TW model, and SW-TW combination, and then compared the accuracy of the manually assigned codes to SOII data with that of the proposed system. The accuracy of the proposed system, assuming well-trained coders reviewing a subset of only 26% of cases flagged for review, was estimated to be comparable (86.5%) to the accuracy of the original coding of the data set (range: 73%-86.8%). Overall, the TW model had higher sensitivity than the SW model, and the accuracy of the prediction results increased when the two models agreed, and for higher prediction strength thresholds. The sensitivity of the top five predictions was 93%. The proposed system seems promising for coding injury data as it offers comparable accuracy and less manual coding. Accurate and timely coded occupational injury data is useful for surveillance as well as prevention activities that aim to make workplaces safer. Copyright © 2016 Elsevier Ltd and National Safety Council. All rights reserved.
NASA Astrophysics Data System (ADS)
Skataric, Maja; Bose, Sandip; Zeroug, Smaine; Tilke, Peter
2017-02-01
It is not uncommon in the field of non-destructive evaluation that multiple measurements encompassing a variety of modalities are available for analysis and interpretation for determining the underlying states of nature of the materials or parts being tested. Despite and sometimes due to the richness of data, significant challenges arise in the interpretation manifested as ambiguities and inconsistencies due to various uncertain factors in the physical properties (inputs), environment, measurement device properties, human errors, and the measurement data (outputs). Most of these uncertainties cannot be described by any rigorous mathematical means, and modeling of all possibilities is usually infeasible for many real time applications. In this work, we will discuss an approach based on Hierarchical Bayesian Graphical Models (HBGM) for the improved interpretation of complex (multi-dimensional) problems with parametric uncertainties that lack usable physical models. In this setting, the input space of the physical properties is specified through prior distributions based on domain knowledge and expertise, which are represented as Gaussian mixtures to model the various possible scenarios of interest for non-destructive testing applications. Forward models are then used offline to generate the expected distribution of the proposed measurements which are used to train a hierarchical Bayesian network. In Bayesian analysis, all model parameters are treated as random variables, and inference of the parameters is made on the basis of posterior distribution given the observed data. Learned parameters of the posterior distribution obtained after the training can therefore be used to build an efficient classifier for differentiating new observed data in real time on the basis of pre-trained models. We will illustrate the implementation of the HBGM approach to ultrasonic measurements used for cement evaluation of cased wells in the oil industry.
BEASTling: A software tool for linguistic phylogenetics using BEAST 2
Forkel, Robert; Kaiping, Gereon A.; Atkinson, Quentin D.
2017-01-01
We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts. PMID:28796784
BEASTling: A software tool for linguistic phylogenetics using BEAST 2.
Maurits, Luke; Forkel, Robert; Kaiping, Gereon A; Atkinson, Quentin D
2017-01-01
We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts.
Learning time series for intelligent monitoring
NASA Technical Reports Server (NTRS)
Manganaris, Stefanos; Fisher, Doug
1994-01-01
We address the problem of classifying time series according to their morphological features in the time domain. In a supervised machine-learning framework, we induce a classification procedure from a set of preclassified examples. For each class, we infer a model that captures its morphological features using Bayesian model induction and the minimum message length approach to assign priors. In the performance task, we classify a time series in one of the learned classes when there is enough evidence to support that decision. Time series with sufficiently novel features, belonging to classes not present in the training set, are recognized as such. We report results from experiments in a monitoring domain of interest to NASA.
Bayesian-MCMC-based parameter estimation of stealth aircraft RCS models
NASA Astrophysics Data System (ADS)
Xia, Wei; Dai, Xiao-Xia; Feng, Yuan
2015-12-01
When modeling a stealth aircraft with low RCS (Radar Cross Section), conventional parameter estimation methods may cause a deviation from the actual distribution, owing to the fact that the characteristic parameters are estimated via directly calculating the statistics of RCS. The Bayesian-Markov Chain Monte Carlo (Bayesian-MCMC) method is introduced herein to estimate the parameters so as to improve the fitting accuracies of fluctuation models. The parameter estimations of the lognormal and the Legendre polynomial models are reformulated in the Bayesian framework. The MCMC algorithm is then adopted to calculate the parameter estimates. Numerical results show that the distribution curves obtained by the proposed method exhibit improved consistence with the actual ones, compared with those fitted by the conventional method. The fitting accuracy could be improved by no less than 25% for both fluctuation models, which implies that the Bayesian-MCMC method might be a good candidate among the optimal parameter estimation methods for stealth aircraft RCS models. Project supported by the National Natural Science Foundation of China (Grant No. 61101173), the National Basic Research Program of China (Grant No. 613206), the National High Technology Research and Development Program of China (Grant No. 2012AA01A308), the State Scholarship Fund by the China Scholarship Council (CSC), and the Oversea Academic Training Funds, and University of Electronic Science and Technology of China (UESTC).
Bayesian Peptide Peak Detection for High Resolution TOF Mass Spectrometry.
Zhang, Jianqiu; Zhou, Xiaobo; Wang, Honghui; Suffredini, Anthony; Zhang, Lin; Huang, Yufei; Wong, Stephen
2010-11-01
In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with resolution of 10 000-15 000 full width at half-maximum (FWHW). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with expert's visual inspection. Moreover, better detection consistency is achieved across MS datasets from patients with identical pathological condition.
Bayesian Peptide Peak Detection for High Resolution TOF Mass Spectrometry
Zhang, Jianqiu; Zhou, Xiaobo; Wang, Honghui; Suffredini, Anthony; Zhang, Lin; Huang, Yufei; Wong, Stephen
2011-01-01
In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with resolution of 10 000–15 000 full width at half-maximum (FWHW). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with expert’s visual inspection. Moreover, better detection consistency is achieved across MS datasets from patients with identical pathological condition. PMID:21544266
Time series forecasting using ERNN and QR based on Bayesian model averaging
NASA Astrophysics Data System (ADS)
Pwasong, Augustine; Sathasivam, Saratha
2017-08-01
The Bayesian model averaging technique is a multi-model combination technique. The technique was employed to amalgamate the Elman recurrent neural network (ERNN) technique with the quadratic regression (QR) technique. The amalgamation produced a hybrid technique known as the hybrid ERNN-QR technique. The potentials of forecasting with the hybrid technique are compared with the forecasting capabilities of individual techniques of ERNN and QR. The outcome revealed that the hybrid technique is superior to the individual techniques in the mean square error sense.
A new Bayesian recursive technique for parameter estimation
NASA Astrophysics Data System (ADS)
Kaheil, Yasir H.; Gill, M. Kashif; McKee, Mac; Bastidas, Luis
2006-08-01
The performance of any model depends on how well its associated parameters are estimated. In the current application, a localized Bayesian recursive estimation (LOBARE) approach is devised for parameter estimation. The LOBARE methodology is an extension of the Bayesian recursive estimation (BARE) method. It is applied in this paper on two different types of models: an artificial intelligence (AI) model in the form of a support vector machine (SVM) application for forecasting soil moisture and a conceptual rainfall-runoff (CRR) model represented by the Sacramento soil moisture accounting (SAC-SMA) model. Support vector machines, based on statistical learning theory (SLT), represent the modeling task as a quadratic optimization problem and have already been used in various applications in hydrology. They require estimation of three parameters. SAC-SMA is a very well known model that estimates runoff. It has a 13-dimensional parameter space. In the LOBARE approach presented here, Bayesian inference is used in an iterative fashion to estimate the parameter space that will most likely enclose a best parameter set. This is done by narrowing the sampling space through updating the "parent" bounds based on their fitness. These bounds are actually the parameter sets that were selected by BARE runs on subspaces of the initial parameter space. The new approach results in faster convergence toward the optimal parameter set using minimum training/calibration data and fewer sets of parameter values. The efficacy of the localized methodology is also compared with the previously used BARE algorithm.
Yu, Wenxi; Liu, Yang; Ma, Zongwei; Bi, Jun
2017-08-01
Using satellite-based aerosol optical depth (AOD) measurements and statistical models to estimate ground-level PM 2.5 is a promising way to fill the areas that are not covered by ground PM 2.5 monitors. The statistical models used in previous studies are primarily Linear Mixed Effects (LME) and Geographically Weighted Regression (GWR) models. In this study, we developed a new regression model between PM 2.5 and AOD using Gaussian processes in a Bayesian hierarchical setting. Gaussian processes model the stochastic nature of the spatial random effects, where the mean surface and the covariance function is specified. The spatial stochastic process is incorporated under the Bayesian hierarchical framework to explain the variation of PM 2.5 concentrations together with other factors, such as AOD, spatial and non-spatial random effects. We evaluate the results of our model and compare them with those of other, conventional statistical models (GWR and LME) by within-sample model fitting and out-of-sample validation (cross validation, CV). The results show that our model possesses a CV result (R 2 = 0.81) that reflects higher accuracy than that of GWR and LME (0.74 and 0.48, respectively). Our results indicate that Gaussian process models have the potential to improve the accuracy of satellite-based PM 2.5 estimates.
Yang, Jingjing; Cox, Dennis D; Lee, Jong Soo; Ren, Peng; Choi, Taeryon
2017-12-01
Functional data are defined as realizations of random functions (mostly smooth functions) varying over a continuum, which are usually collected on discretized grids with measurement errors. In order to accurately smooth noisy functional observations and deal with the issue of high-dimensional observation grids, we propose a novel Bayesian method based on the Bayesian hierarchical model with a Gaussian-Wishart process prior and basis function representations. We first derive an induced model for the basis-function coefficients of the functional data, and then use this model to conduct posterior inference through Markov chain Monte Carlo methods. Compared to the standard Bayesian inference that suffers serious computational burden and instability in analyzing high-dimensional functional data, our method greatly improves the computational scalability and stability, while inheriting the advantage of simultaneously smoothing raw observations and estimating the mean-covariance functions in a nonparametric way. In addition, our method can naturally handle functional data observed on random or uncommon grids. Simulation and real studies demonstrate that our method produces similar results to those obtainable by the standard Bayesian inference with low-dimensional common grids, while efficiently smoothing and estimating functional data with random and high-dimensional observation grids when the standard Bayesian inference fails. In conclusion, our method can efficiently smooth and estimate high-dimensional functional data, providing one way to resolve the curse of dimensionality for Bayesian functional data analysis with Gaussian-Wishart processes. © 2017, The International Biometric Society.
Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall
2016-01-01
Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis discusses also several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach is (1) much more flexible in terms of modeling effort, and (2) it offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.
Applications of Bayesian spectrum representation in acoustics
NASA Astrophysics Data System (ADS)
Botts, Jonathan M.
This dissertation utilizes a Bayesian inference framework to enhance the solution of inverse problems where the forward model maps to acoustic spectra. A Bayesian solution to filter design inverts a acoustic spectra to pole-zero locations of a discrete-time filter model. Spatial sound field analysis with a spherical microphone array is a data analysis problem that requires inversion of spatio-temporal spectra to directions of arrival. As with many inverse problems, a probabilistic analysis results in richer solutions than can be achieved with ad-hoc methods. In the filter design problem, the Bayesian inversion results in globally optimal coefficient estimates as well as an estimate the most concise filter capable of representing the given spectrum, within a single framework. This approach is demonstrated on synthetic spectra, head-related transfer function spectra, and measured acoustic reflection spectra. The Bayesian model-based analysis of spatial room impulse responses is presented as an analogous problem with equally rich solution. The model selection mechanism provides an estimate of the number of arrivals, which is necessary to properly infer the directions of simultaneous arrivals. Although, spectrum inversion problems are fairly ubiquitous, the scope of this dissertation has been limited to these two and derivative problems. The Bayesian approach to filter design is demonstrated on an artificial spectrum to illustrate the model comparison mechanism and then on measured head-related transfer functions to show the potential range of application. Coupled with sampling methods, the Bayesian approach is shown to outperform least-squares filter design methods commonly used in commercial software, confirming the need for a global search of the parameter space. The resulting designs are shown to be comparable to those that result from global optimization methods, but the Bayesian approach has the added advantage of a filter length estimate within the same unified framework. The application to reflection data is useful for representing frequency-dependent impedance boundaries in finite difference acoustic simulations. Furthermore, since the filter transfer function is a parametric model, it can be modified to incorporate arbitrary frequency weighting and account for the band-limited nature of measured reflection spectra. Finally, the model is modified to compensate for dispersive error in the finite difference simulation, from the filter design process. Stemming from the filter boundary problem, the implementation of pressure sources in finite difference simulation is addressed in order to assure that schemes properly converge. A class of parameterized source functions is proposed and shown to offer straightforward control of residual error in the simulation. Guided by the notion that the solution to be approximated affects the approximation error, sources are designed which reduce residual dispersive error to the size of round-off errors. The early part of a room impulse response can be characterized by a series of isolated plane waves. Measured with an array of microphones, plane waves map to a directional response of the array or spatial intensity map. Probabilistic inversion of this response results in estimates of the number and directions of image source arrivals. The model-based inversion is shown to avoid ambiguities associated with peak-finding or inspection of the spatial intensity map. For this problem, determining the number of arrivals in a given frame is critical for properly inferring the state of the sound field. This analysis is effectively compression of the spatial room response, which is useful for analysis or encoding of the spatial sound field. Parametric, model-based formulations of these problems enhance the solution in all cases, and a Bayesian interpretation provides a principled approach to model comparison and parameter estimation. v
A Comparison of Imputation Methods for Bayesian Factor Analysis Models
ERIC Educational Resources Information Center
Merkle, Edgar C.
2011-01-01
Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amiable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…
Bayesian transformation cure frailty models with multivariate failure time data.
Yin, Guosheng
2008-12-10
We propose a class of transformation cure frailty models to accommodate a survival fraction in multivariate failure time data. Established through a general power transformation, this family of cure frailty models includes the proportional hazards and the proportional odds modeling structures as two special cases. Within the Bayesian paradigm, we obtain the joint posterior distribution and the corresponding full conditional distributions of the model parameters for the implementation of Gibbs sampling. Model selection is based on the conditional predictive ordinate statistic and deviance information criterion. As an illustration, we apply the proposed method to a real data set from dentistry.
Time-varying nonstationary multivariate risk analysis using a dynamic Bayesian copula
NASA Astrophysics Data System (ADS)
Sarhadi, Ali; Burn, Donald H.; Concepción Ausín, María.; Wiper, Michael P.
2016-03-01
A time-varying risk analysis is proposed for an adaptive design framework in nonstationary conditions arising from climate change. A Bayesian, dynamic conditional copula is developed for modeling the time-varying dependence structure between mixed continuous and discrete multiattributes of multidimensional hydrometeorological phenomena. Joint Bayesian inference is carried out to fit the marginals and copula in an illustrative example using an adaptive, Gibbs Markov Chain Monte Carlo (MCMC) sampler. Posterior mean estimates and credible intervals are provided for the model parameters and the Deviance Information Criterion (DIC) is used to select the model that best captures different forms of nonstationarity over time. This study also introduces a fully Bayesian, time-varying joint return period for multivariate time-dependent risk analysis in nonstationary environments. The results demonstrate that the nature and the risk of extreme-climate multidimensional processes are changed over time under the impact of climate change, and accordingly the long-term decision making strategies should be updated based on the anomalies of the nonstationary environment.
Hefke, Gwynneth; Davison, Sean; D'Amato, Maria Eugenia
2015-12-01
The utilization of binary markers in human individual identification is gaining ground in forensic genetics. We analyzed the polymorphisms from the first commercial indel kit Investigator DIPplex (Qiagen) in 512 individuals from Afrikaner, Indian, admixed Cape Colored, and the native Bantu Xhosa and Zulu origin in South Africa and evaluated forensic and population genetics parameters for their forensic application in South Africa. The levels of genetic diversity in population and forensic parameters in South Africa are similar to other published data, with lower diversity values for the native Bantu. Departures from Hardy-Weinberg expectations were observed in HLD97 in Indians, Admixed and Bantus, along with 6.83% null homozygotes in the Bantu populations. Sequencing of the flanking regions showed a previously reported transition G>A in rs17245568. Strong population structure was detected with Fst, AMOVA, and the Bayesian unsupervised clustering method in STRUCTURE. Therefore we evaluated the efficiency of individual assignments to population groups using the ancestral membership proportions from STRUCTURE and the Bayesian classification algorithm in Snipper App Suite. Both methods showed low cross-assignment error (0-4%) between Bantus and either Afrikaners or Indians. The differentiation between populations seems to be driven by four loci under positive selection pressure. Based on these results, we draw recommendations for the application of this kit in SA. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Nonparametric Bayesian models through probit stick-breaking processes
Rodríguez, Abel; Dunson, David B.
2013-01-01
We describe a novel class of Bayesian nonparametric priors based on stick-breaking constructions where the weights of the process are constructed as probit transformations of normal random variables. We show that these priors are extremely flexible, allowing us to generate a great variety of models while preserving computational simplicity. Particular emphasis is placed on the construction of rich temporal and spatial processes, which are applied to two problems in finance and ecology. PMID:24358072
Nonparametric Bayesian models through probit stick-breaking processes.
Rodríguez, Abel; Dunson, David B
2011-03-01
We describe a novel class of Bayesian nonparametric priors based on stick-breaking constructions where the weights of the process are constructed as probit transformations of normal random variables. We show that these priors are extremely flexible, allowing us to generate a great variety of models while preserving computational simplicity. Particular emphasis is placed on the construction of rich temporal and spatial processes, which are applied to two problems in finance and ecology.
Bayesian calibration for forensic age estimation.
Ferrante, Luigi; Skrami, Edlira; Gesuita, Rosaria; Cameriere, Roberto
2015-05-10
Forensic medicine is increasingly called upon to assess the age of individuals. Forensic age estimation is mostly required in relation to illegal immigration and identification of bodies or skeletal remains. A variety of age estimation methods are based on dental samples and use of regression models, where the age of an individual is predicted by morphological tooth changes that take place over time. From the medico-legal point of view, regression models, with age as the dependent random variable entail that age tends to be overestimated in the young and underestimated in the old. To overcome this bias, we describe a new full Bayesian calibration method (asymmetric Laplace Bayesian calibration) for forensic age estimation that uses asymmetric Laplace distribution as the probability model. The method was compared with three existing approaches (two Bayesian and a classical method) using simulated data. Although its accuracy was comparable with that of the other methods, the asymmetric Laplace Bayesian calibration appears to be significantly more reliable and robust in case of misspecification of the probability model. The proposed method was also applied to a real dataset of values of the pulp chamber of the right lower premolar measured on x-ray scans of individuals of known age. Copyright © 2015 John Wiley & Sons, Ltd.
Fournier, Auriel M. V.; Sullivan, Alexis R.; Bump, Joseph K.; Perkins, Marie; Shieldcastle, Mark C.; King, Sammy L.
2017-01-01
Stable hydrogen isotope (δD) methods for tracking animal movement are widely used yet often produce low resolution assignments. Incorporating prior knowledge of abundance, distribution or movement patterns can ameliorate this limitation, but data are lacking for most species. We demonstrate how observations reported by citizen scientists can be used to develop robust estimates of species distributions and to constrain δD assignments.We developed a Bayesian framework to refine isotopic estimates of migrant animal origins conditional on species distribution models constructed from citizen scientist observations. To illustrate this approach, we analysed the migratory connectivity of the Virginia rail Rallus limicola, a secretive and declining migratory game bird in North America.Citizen science observations enabled both estimation of sampling bias and construction of bias-corrected species distribution models. Conditioning δD assignments on these species distribution models yielded comparably high-resolution assignments.Most Virginia rails wintering across five Gulf Coast sites spent the previous summer near the Great Lakes, although a considerable minority originated from the Chesapeake Bay watershed or Prairie Pothole region of North Dakota. Conversely, the majority of migrating Virginia rails from a site in the Great Lakes most likely spent the previous winter on the Gulf Coast between Texas and Louisiana.Synthesis and applications. In this analysis, Virginia rail migratory connectivity does not fully correspond to the administrative flyways used to manage migratory birds. This example demonstrates that with the increasing availability of citizen science data to create species distribution models, our framework can produce high-resolution estimates of migratory connectivity for many animals, including cryptic species. Empirical evidence of links between seasonal habitats will help enable effective habitat management, hunting quotas and population monitoring and also highlight critical knowledge gaps.
Hierarchical Bayesian spatial models for multispecies conservation planning and monitoring.
Carroll, Carlos; Johnson, Devin S; Dunk, Jeffrey R; Zielinski, William J
2010-12-01
Biologists who develop and apply habitat models are often familiar with the statistical challenges posed by their data's spatial structure but are unsure of whether the use of complex spatial models will increase the utility of model results in planning. We compared the relative performance of nonspatial and hierarchical Bayesian spatial models for three vertebrate and invertebrate taxa of conservation concern (Church's sideband snails [Monadenia churchi], red tree voles [Arborimus longicaudus], and Pacific fishers [Martes pennanti pacifica]) that provide examples of a range of distributional extents and dispersal abilities. We used presence-absence data derived from regional monitoring programs to develop models with both landscape and site-level environmental covariates. We used Markov chain Monte Carlo algorithms and a conditional autoregressive or intrinsic conditional autoregressive model framework to fit spatial models. The fit of Bayesian spatial models was between 35 and 55% better than the fit of nonspatial analogue models. Bayesian spatial models outperformed analogous models developed with maximum entropy (Maxent) methods. Although the best spatial and nonspatial models included similar environmental variables, spatial models provided estimates of residual spatial effects that suggested how ecological processes might structure distribution patterns. Spatial models built from presence-absence data improved fit most for localized endemic species with ranges constrained by poorly known biogeographic factors and for widely distributed species suspected to be strongly affected by unmeasured environmental variables or population processes. By treating spatial effects as a variable of interest rather than a nuisance, hierarchical Bayesian spatial models, especially when they are based on a common broad-scale spatial lattice (here the national Forest Inventory and Analysis grid of 24 km(2) hexagons), can increase the relevance of habitat models to multispecies conservation planning. Journal compilation © 2010 Society for Conservation Biology. No claim to original US government works.
Bayesian structural equation modeling in sport and exercise psychology.
Stenling, Andreas; Ivarsson, Andreas; Johnson, Urban; Lindwall, Magnus
2015-08-01
Bayesian statistics is on the rise in mainstream psychology, but applications in sport and exercise psychology research are scarce. In this article, the foundations of Bayesian analysis are introduced, and we will illustrate how to apply Bayesian structural equation modeling in a sport and exercise psychology setting. More specifically, we contrasted a confirmatory factor analysis on the Sport Motivation Scale II estimated with the most commonly used estimator, maximum likelihood, and a Bayesian approach with weakly informative priors for cross-loadings and correlated residuals. The results indicated that the model with Bayesian estimation and weakly informative priors provided a good fit to the data, whereas the model estimated with a maximum likelihood estimator did not produce a well-fitting model. The reasons for this discrepancy between maximum likelihood and Bayesian estimation are discussed as well as potential advantages and caveats with the Bayesian approach.
A simulation study on Bayesian Ridge regression models for several collinearity levels
NASA Astrophysics Data System (ADS)
Efendi, Achmad; Effrihan
2017-12-01
When analyzing data with multiple regression model if there are collinearities, then one or several predictor variables are usually omitted from the model. However, there sometimes some reasons, for instance medical or economic reasons, the predictors are all important and should be included in the model. Ridge regression model is not uncommon in some researches to use to cope with collinearity. Through this modeling, weights for predictor variables are used for estimating parameters. The next estimation process could follow the concept of likelihood. Furthermore, for the estimation nowadays the Bayesian version could be an alternative. This estimation method does not match likelihood one in terms of popularity due to some difficulties; computation and so forth. Nevertheless, with the growing improvement of computational methodology recently, this caveat should not at the moment become a problem. This paper discusses about simulation process for evaluating the characteristic of Bayesian Ridge regression parameter estimates. There are several simulation settings based on variety of collinearity levels and sample sizes. The results show that Bayesian method gives better performance for relatively small sample sizes, and for other settings the method does perform relatively similar to the likelihood method.
A 3D model retrieval approach based on Bayesian networks lightfield descriptor
NASA Astrophysics Data System (ADS)
Xiao, Qinhan; Li, Yanjun
2009-12-01
A new 3D model retrieval methodology is proposed by exploiting a novel Bayesian networks lightfield descriptor (BNLD). There are two key novelties in our approach: (1) a BN-based method for building lightfield descriptor; and (2) a 3D model retrieval scheme based on the proposed BNLD. To overcome the disadvantages of the existing 3D model retrieval methods, we explore BN for building a new lightfield descriptor. Firstly, 3D model is put into lightfield, about 300 binary-views can be obtained along a sphere, then Fourier descriptors and Zernike moments descriptors can be calculated out from binaryviews. Then shape feature sequence would be learned into a BN model based on BN learning algorithm; Secondly, we propose a new 3D model retrieval method by calculating Kullback-Leibler Divergence (KLD) between BNLDs. Beneficial from the statistical learning, our BNLD is noise robustness as compared to the existing methods. The comparison between our method and the lightfield descriptor-based approach is conducted to demonstrate the effectiveness of our proposed methodology.
Bayesian model reduction and empirical Bayes for group (DCM) studies
Friston, Karl J.; Litvak, Vladimir; Oswal, Ashwini; Razi, Adeel; Stephan, Klaas E.; van Wijk, Bernadette C.M.; Ziegler, Gabriel; Zeidman, Peter
2016-01-01
This technical note describes some Bayesian procedures for the analysis of group studies that use nonlinear models at the first (within-subject) level – e.g., dynamic causal models – and linear models at subsequent (between-subject) levels. Its focus is on using Bayesian model reduction to finesse the inversion of multiple models of a single dataset or a single (hierarchical or empirical Bayes) model of multiple datasets. These applications of Bayesian model reduction allow one to consider parametric random effects and make inferences about group effects very efficiently (in a few seconds). We provide the relatively straightforward theoretical background to these procedures and illustrate their application using a worked example. This example uses a simulated mismatch negativity study of schizophrenia. We illustrate the robustness of Bayesian model reduction to violations of the (commonly used) Laplace assumption in dynamic causal modelling and show how its recursive application can facilitate both classical and Bayesian inference about group differences. Finally, we consider the application of these empirical Bayesian procedures to classification and prediction. PMID:26569570
Bayesian effect estimation accounting for adjustment uncertainty.
Wang, Chi; Parmigiani, Giovanni; Dominici, Francesca
2012-09-01
Model-based estimation of the effect of an exposure on an outcome is generally sensitive to the choice of which confounding factors are included in the model. We propose a new approach, which we call Bayesian adjustment for confounding (BAC), to estimate the effect of an exposure of interest on the outcome, while accounting for the uncertainty in the choice of confounders. Our approach is based on specifying two models: (1) the outcome as a function of the exposure and the potential confounders (the outcome model); and (2) the exposure as a function of the potential confounders (the exposure model). We consider Bayesian variable selection on both models and link the two by introducing a dependence parameter, ω, denoting the prior odds of including a predictor in the outcome model, given that the same predictor is in the exposure model. In the absence of dependence (ω= 1), BAC reduces to traditional Bayesian model averaging (BMA). In simulation studies, we show that BAC, with ω > 1, estimates the exposure effect with smaller bias than traditional BMA, and improved coverage. We, then, compare BAC, a recent approach of Crainiceanu, Dominici, and Parmigiani (2008, Biometrika 95, 635-651), and traditional BMA in a time series data set of hospital admissions, air pollution levels, and weather variables in Nassau, NY for the period 1999-2005. Using each approach, we estimate the short-term effects of on emergency admissions for cardiovascular diseases, accounting for confounding. This application illustrates the potentially significant pitfalls of misusing variable selection methods in the context of adjustment uncertainty. © 2012, The International Biometric Society.
Asakura, Nobuhiko; Inui, Toshio
2016-01-01
Two apparently contrasting theories have been proposed to account for the development of children's theory of mind (ToM): theory-theory and simulation theory. We present a Bayesian framework that rationally integrates both theories for false belief reasoning. This framework exploits two internal models for predicting the belief states of others: one of self and one of others. These internal models are responsible for simulation-based and theory-based reasoning, respectively. The framework further takes into account empirical studies of a developmental ToM scale (e.g., Wellman and Liu, 2004): developmental progressions of various mental state understandings leading up to false belief understanding. By representing the internal models and their interactions as a causal Bayesian network, we formalize the model of children's false belief reasoning as probabilistic computations on the Bayesian network. This model probabilistically weighs and combines the two internal models and predicts children's false belief ability as a multiplicative effect of their early-developed abilities to understand the mental concepts of diverse beliefs and knowledge access. Specifically, the model predicts that children's proportion of correct responses on a false belief task can be closely approximated as the product of their proportions correct on the diverse belief and knowledge access tasks. To validate this prediction, we illustrate that our model provides good fits to a variety of ToM scale data for preschool children. We discuss the implications and extensions of our model for a deeper understanding of developmental progressions of children's ToM abilities. PMID:28082941
Asakura, Nobuhiko; Inui, Toshio
2016-01-01
Two apparently contrasting theories have been proposed to account for the development of children's theory of mind (ToM): theory-theory and simulation theory. We present a Bayesian framework that rationally integrates both theories for false belief reasoning. This framework exploits two internal models for predicting the belief states of others: one of self and one of others. These internal models are responsible for simulation-based and theory-based reasoning, respectively. The framework further takes into account empirical studies of a developmental ToM scale (e.g., Wellman and Liu, 2004): developmental progressions of various mental state understandings leading up to false belief understanding. By representing the internal models and their interactions as a causal Bayesian network, we formalize the model of children's false belief reasoning as probabilistic computations on the Bayesian network. This model probabilistically weighs and combines the two internal models and predicts children's false belief ability as a multiplicative effect of their early-developed abilities to understand the mental concepts of diverse beliefs and knowledge access. Specifically, the model predicts that children's proportion of correct responses on a false belief task can be closely approximated as the product of their proportions correct on the diverse belief and knowledge access tasks. To validate this prediction, we illustrate that our model provides good fits to a variety of ToM scale data for preschool children. We discuss the implications and extensions of our model for a deeper understanding of developmental progressions of children's ToM abilities.
Paz-Linares, Deirel; Vega-Hernández, Mayrim; Rojas-López, Pedro A.; Valdés-Hernández, Pedro A.; Martínez-Montes, Eduardo; Valdés-Sosa, Pedro A.
2017-01-01
The estimation of EEG generating sources constitutes an Inverse Problem (IP) in Neuroscience. This is an ill-posed problem due to the non-uniqueness of the solution and regularization or prior information is needed to undertake Electrophysiology Source Imaging. Structured Sparsity priors can be attained through combinations of (L1 norm-based) and (L2 norm-based) constraints such as the Elastic Net (ENET) and Elitist Lasso (ELASSO) models. The former model is used to find solutions with a small number of smooth nonzero patches, while the latter imposes different degrees of sparsity simultaneously along different dimensions of the spatio-temporal matrix solutions. Both models have been addressed within the penalized regression approach, where the regularization parameters are selected heuristically, leading usually to non-optimal and computationally expensive solutions. The existing Bayesian formulation of ENET allows hyperparameter learning, but using the computationally intensive Monte Carlo/Expectation Maximization methods, which makes impractical its application to the EEG IP. While the ELASSO have not been considered before into the Bayesian context. In this work, we attempt to solve the EEG IP using a Bayesian framework for ENET and ELASSO models. We propose a Structured Sparse Bayesian Learning algorithm based on combining the Empirical Bayes and the iterative coordinate descent procedures to estimate both the parameters and hyperparameters. Using realistic simulations and avoiding the inverse crime we illustrate that our methods are able to recover complicated source setups more accurately and with a more robust estimation of the hyperparameters and behavior under different sparsity scenarios than classical LORETA, ENET and LASSO Fusion solutions. We also solve the EEG IP using data from a visual attention experiment, finding more interpretable neurophysiological patterns with our methods. The Matlab codes used in this work, including Simulations, Methods, Quality Measures and Visualization Routines are freely available in a public website. PMID:29200994
Paz-Linares, Deirel; Vega-Hernández, Mayrim; Rojas-López, Pedro A; Valdés-Hernández, Pedro A; Martínez-Montes, Eduardo; Valdés-Sosa, Pedro A
2017-01-01
The estimation of EEG generating sources constitutes an Inverse Problem (IP) in Neuroscience. This is an ill-posed problem due to the non-uniqueness of the solution and regularization or prior information is needed to undertake Electrophysiology Source Imaging. Structured Sparsity priors can be attained through combinations of (L1 norm-based) and (L2 norm-based) constraints such as the Elastic Net (ENET) and Elitist Lasso (ELASSO) models. The former model is used to find solutions with a small number of smooth nonzero patches, while the latter imposes different degrees of sparsity simultaneously along different dimensions of the spatio-temporal matrix solutions. Both models have been addressed within the penalized regression approach, where the regularization parameters are selected heuristically, leading usually to non-optimal and computationally expensive solutions. The existing Bayesian formulation of ENET allows hyperparameter learning, but using the computationally intensive Monte Carlo/Expectation Maximization methods, which makes impractical its application to the EEG IP. While the ELASSO have not been considered before into the Bayesian context. In this work, we attempt to solve the EEG IP using a Bayesian framework for ENET and ELASSO models. We propose a Structured Sparse Bayesian Learning algorithm based on combining the Empirical Bayes and the iterative coordinate descent procedures to estimate both the parameters and hyperparameters. Using realistic simulations and avoiding the inverse crime we illustrate that our methods are able to recover complicated source setups more accurately and with a more robust estimation of the hyperparameters and behavior under different sparsity scenarios than classical LORETA, ENET and LASSO Fusion solutions. We also solve the EEG IP using data from a visual attention experiment, finding more interpretable neurophysiological patterns with our methods. The Matlab codes used in this work, including Simulations, Methods, Quality Measures and Visualization Routines are freely available in a public website.
An introduction to using Bayesian linear regression with clinical data.
Baldwin, Scott A; Larson, Michael J
2017-11-01
Statistical training psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods as well has how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses. Copyright © 2017 Elsevier Ltd. All rights reserved.
Reliability modelling and analysis of a multi-state element based on a dynamic Bayesian network
Xu, Tingxue; Gu, Junyuan; Dong, Qi; Fu, Linyu
2018-01-01
This paper presents a quantitative reliability modelling and analysis method for multi-state elements based on a combination of the Markov process and a dynamic Bayesian network (DBN), taking perfect repair, imperfect repair and condition-based maintenance (CBM) into consideration. The Markov models of elements without repair and under CBM are established, and an absorbing set is introduced to determine the reliability of the repairable element. According to the state-transition relations between the states determined by the Markov process, a DBN model is built. In addition, its parameters for series and parallel systems, namely, conditional probability tables, can be calculated by referring to the conditional degradation probabilities. Finally, the power of a control unit in a failure model is used as an example. A dynamic fault tree (DFT) is translated into a Bayesian network model, and subsequently extended to a DBN. The results show the state probabilities of an element and the system without repair, with perfect and imperfect repair, and under CBM, with an absorbing set plotted by differential equations and verified. Through referring forward, the reliability value of the control unit is determined in different kinds of modes. Finally, weak nodes are noted in the control unit. PMID:29765629
Boehm, Udo; Steingroever, Helen; Wagenmakers, Eric-Jan
2018-06-01
An important tool in the advancement of cognitive science are quantitative models that represent different cognitive variables in terms of model parameters. To evaluate such models, their parameters are typically tested for relationships with behavioral and physiological variables that are thought to reflect specific cognitive processes. However, many models do not come equipped with the statistical framework needed to relate model parameters to covariates. Instead, researchers often revert to classifying participants into groups depending on their values on the covariates, and subsequently comparing the estimated model parameters between these groups. Here we develop a comprehensive solution to the covariate problem in the form of a Bayesian regression framework. Our framework can be easily added to existing cognitive models and allows researchers to quantify the evidential support for relationships between covariates and model parameters using Bayes factors. Moreover, we present a simulation study that demonstrates the superiority of the Bayesian regression framework to the conventional classification-based approach.
Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.
Chatzis, Sotirios P; Andreou, Andreas S
2015-11-01
Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows of better handling uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.
Prediction and assimilation of surf-zone processes using a Bayesian network: Part I: Forward models
Plant, Nathaniel G.; Holland, K. Todd
2011-01-01
Prediction of coastal processes, including waves, currents, and sediment transport, can be obtained from a variety of detailed geophysical-process models with many simulations showing significant skill. This capability supports a wide range of research and applied efforts that can benefit from accurate numerical predictions. However, the predictions are only as accurate as the data used to drive the models and, given the large temporal and spatial variability of the surf zone, inaccuracies in data are unavoidable such that useful predictions require corresponding estimates of uncertainty. We demonstrate how a Bayesian-network model can be used to provide accurate predictions of wave-height evolution in the surf zone given very sparse and/or inaccurate boundary-condition data. The approach is based on a formal treatment of a data-assimilation problem that takes advantage of significant reduction of the dimensionality of the model system. We demonstrate that predictions of a detailed geophysical model of the wave evolution are reproduced accurately using a Bayesian approach. In this surf-zone application, forward prediction skill was 83%, and uncertainties in the model inputs were accurately transferred to uncertainty in output variables. We also demonstrate that if modeling uncertainties were not conveyed to the Bayesian network (i.e., perfect data or model were assumed), then overly optimistic prediction uncertainties were computed. More consistent predictions and uncertainties were obtained by including model-parameter errors as a source of input uncertainty. Improved predictions (skill of 90%) were achieved because the Bayesian network simultaneously estimated optimal parameters while predicting wave heights.
Restricted Gene Flow among Hospital Subpopulations of Enterococcus faecium
Willems, Rob J. L.; Top, Janetta; van Schaik, Willem; Leavis, Helen; Bonten, Marc; Sirén, Jukka; Hanage, William P.; Corander, Jukka
2012-01-01
ABSTRACT Enterococcus faecium has recently emerged as an important multiresistant nosocomial pathogen. Defining population structure in this species is required to provide insight into the existence, distribution, and dynamics of specific multiresistant or pathogenic lineages in particular environments, like the hospital. Here, we probe the population structure of E. faecium using Bayesian-based population genetic modeling implemented in Bayesian Analysis of Population Structure (BAPS) software. The analysis involved 1,720 isolates belonging to 519 sequence types (STs) (491 for E. faecium and 28 for Enterococcus faecalis). E. faecium isolates grouped into 13 BAPS (sub)groups, but the large majority (80%) of nosocomial isolates clustered in two subgroups (2-1 and 3-3). Phylogenetic and eBURST analysis of BAPS groups 2 and 3 confirmed the existence of three separate hospital lineages (17, 18, and 78), highlighting different evolutionary trajectories for BAPS 2-1 (lineage 78) and 3-3 (lineage 17 and lineage 18) isolates. Phylogenomic analysis of 29 E. faecium isolates showed agreement between BAPS assignment of STs and their relative positions in the phylogenetic tree. Odds ratio calculation confirmed the significant association between hospital isolates with BAPS 3-3 and lineages 17, 18, and 78. Admixture analysis showed a scarce number of recombination events between the different BAPS groups. For the E. faecium hospital population, we propose an evolutionary model in which strains with a high propensity to colonize and infect hospitalized patients arise through horizontal gene transfer. Once adapted to the distinct hospital niche, this subpopulation becomes isolated, and recombination with other populations declines. PMID:22807567
A probabilistic model framework for evaluating year-to-year variation in crop productivity
NASA Astrophysics Data System (ADS)
Yokozawa, M.; Iizumi, T.; Tao, F.
2008-12-01
Most models describing the relation between crop productivity and weather condition have so far been focused on mean changes of crop yield. For keeping stable food supply against abnormal weather as well as climate change, evaluating the year-to-year variations in crop productivity rather than the mean changes is more essential. We here propose a new framework of probabilistic model based on Bayesian inference and Monte Carlo simulation. As an example, we firstly introduce a model on paddy rice production in Japan. It is called PRYSBI (Process- based Regional rice Yield Simulator with Bayesian Inference; Iizumi et al., 2008). The model structure is the same as that of SIMRIW, which was developed and used widely in Japan. The model includes three sub- models describing phenological development, biomass accumulation and maturing of rice crop. These processes are formulated to include response nature of rice plant to weather condition. This model inherently was developed to predict rice growth and yield at plot paddy scale. We applied it to evaluate the large scale rice production with keeping the same model structure. Alternatively, we assumed the parameters as stochastic variables. In order to let the model catch up actual yield at larger scale, model parameters were determined based on agricultural statistical data of each prefecture of Japan together with weather data averaged over the region. The posterior probability distribution functions (PDFs) of parameters included in the model were obtained using Bayesian inference. The MCMC (Markov Chain Monte Carlo) algorithm was conducted to numerically solve the Bayesian theorem. For evaluating the year-to-year changes in rice growth/yield under this framework, we firstly iterate simulations with set of parameter values sampled from the estimated posterior PDF of each parameter and then take the ensemble mean weighted with the posterior PDFs. We will also present another example for maize productivity in China. The framework proposed here provides us information on uncertainties, possibilities and limitations on future improvements in crop model as well.
Learning Bayesian Networks from Correlated Data
NASA Astrophysics Data System (ADS)
Bae, Harold; Monti, Stefano; Montano, Monty; Steinberg, Martin H.; Perls, Thomas T.; Sebastiani, Paola
2016-05-01
Bayesian networks are probabilistic models that represent complex distributions in a modular way and have become very popular in many fields. There are many methods to build Bayesian networks from a random sample of independent and identically distributed observations. However, many observational studies are designed using some form of clustered sampling that introduces correlations between observations within the same cluster and ignoring this correlation typically inflates the rate of false positive associations. We describe a novel parameterization of Bayesian networks that uses random effects to model the correlation within sample units and can be used for structure and parameter learning from correlated data without inflating the Type I error rate. We compare different learning metrics using simulations and illustrate the method in two real examples: an analysis of genetic and non-genetic factors associated with human longevity from a family-based study, and an example of risk factors for complications of sickle cell anemia from a longitudinal study with repeated measures.
Uncertainty Management for Diagnostics and Prognostics of Batteries using Bayesian Techniques
NASA Technical Reports Server (NTRS)
Saha, Bhaskar; Goebel, kai
2007-01-01
Uncertainty management has always been the key hurdle faced by diagnostics and prognostics algorithms. A Bayesian treatment of this problem provides an elegant and theoretically sound approach to the modern Condition- Based Maintenance (CBM)/Prognostic Health Management (PHM) paradigm. The application of the Bayesian techniques to regression and classification in the form of Relevance Vector Machine (RVM), and to state estimation as in Particle Filters (PF), provides a powerful tool to integrate the diagnosis and prognosis of battery health. The RVM, which is a Bayesian treatment of the Support Vector Machine (SVM), is used for model identification, while the PF framework uses the learnt model, statistical estimates of noise and anticipated operational conditions to provide estimates of remaining useful life (RUL) in the form of a probability density function (PDF). This type of prognostics generates a significant value addition to the management of any operation involving electrical systems.
NASA Astrophysics Data System (ADS)
Kim, Jin-Young; Kwon, Hyun-Han; Kim, Hung-Soo
2015-04-01
The existing regional frequency analysis has disadvantages in that it is difficult to consider geographical characteristics in estimating areal rainfall. In this regard, this study aims to develop a hierarchical Bayesian model based nonstationary regional frequency analysis in that spatial patterns of the design rainfall with geographical information (e.g. latitude, longitude and altitude) are explicitly incorporated. This study assumes that the parameters of Gumbel (or GEV distribution) are a function of geographical characteristics within a general linear regression framework. Posterior distribution of the regression parameters are estimated by Bayesian Markov Chain Monte Carlo (MCMC) method, and the identified functional relationship is used to spatially interpolate the parameters of the distributions by using digital elevation models (DEM) as inputs. The proposed model is applied to derive design rainfalls over the entire Han-river watershed. It was found that the proposed Bayesian regional frequency analysis model showed similar results compared to L-moment based regional frequency analysis. In addition, the model showed an advantage in terms of quantifying uncertainty of the design rainfall and estimating the area rainfall considering geographical information. Finally, comprehensive discussion on design rainfall in the context of nonstationary will be presented. KEYWORDS: Regional frequency analysis, Nonstationary, Spatial information, Bayesian Acknowledgement This research was supported by a grant (14AWMP-B082564-01) from Advanced Water Management Research Program funded by Ministry of Land, Infrastructure and Transport of Korean government.
Cui, Jian; Liu, Jinghua; Li, Yuhua; Shi, Tieliu
2011-01-01
Mitochondria are major players on the production of energy, and host several key reactions involved in basic metabolism and biosynthesis of essential molecules. Currently, the majority of nucleus-encoded mitochondrial proteins are unknown even for model plant Arabidopsis. We reported a computational framework for predicting Arabidopsis mitochondrial proteins based on a probabilistic model, called Naive Bayesian Network, which integrates disparate genomic data generated from eight bioinformatics tools, multiple orthologous mappings, protein domain properties and co-expression patterns using 1,027 microarray profiles. Through this approach, we predicted 2,311 candidate mitochondrial proteins with 84.67% accuracy and 2.53% FPR performances. Together with those experimental confirmed proteins, 2,585 mitochondria proteins (named CoreMitoP) were identified, we explored those proteins with unknown functions based on protein-protein interaction network (PIN) and annotated novel functions for 26.65% CoreMitoP proteins. Moreover, we found newly predicted mitochondrial proteins embedded in particular subnetworks of the PIN, mainly functioning in response to diverse environmental stresses, like salt, draught, cold, and wound etc. Candidate mitochondrial proteins involved in those physiological acitivites provide useful targets for further investigation. Assigned functions also provide comprehensive information for Arabidopsis mitochondrial proteome. PMID:21297957
Maximum entropy perception-action space: a Bayesian model of eye movement selection
NASA Astrophysics Data System (ADS)
Colas, Francis; Bessière, Pierre; Girard, Benoît
2011-03-01
In this article, we investigate the issue of the selection of eye movements in a free-eye Multiple Object Tracking task. We propose a Bayesian model of retinotopic maps with a complex logarithmic mapping. This model is structured in two parts: a representation of the visual scene, and a decision model based on the representation. We compare different decision models based on different features of the representation and we show that taking into account uncertainty helps predict the eye movements of subjects recorded in a psychophysics experiment. Finally, based on experimental data, we postulate that the complex logarithmic mapping has a functional relevance, as the density of objects in this space in more uniform than expected. This may indicate that the representation space and control strategies are such that the object density is of maximum entropy.
Bayesian modelling of uncertainties of Monte Carlo radiative-transfer simulations
NASA Astrophysics Data System (ADS)
Beaujean, Frederik; Eggers, Hans C.; Kerzendorf, Wolfgang E.
2018-07-01
One of the big challenges in astrophysics is the comparison of complex simulations to observations. As many codes do not directly generate observables (e.g. hydrodynamic simulations), the last step in the modelling process is often a radiative-transfer treatment. For this step, the community relies increasingly on Monte Carlo radiative transfer due to the ease of implementation and scalability with computing power. We consider simulations in which the number of photon packets is Poisson distributed, while the weight assigned to a single photon packet follows any distribution of choice. We show how to estimate the statistical uncertainty of the sum of weights in each bin from the output of a single radiative-transfer simulation. Our Bayesian approach produces a posterior distribution that is valid for any number of packets in a bin, even zero packets, and is easy to implement in practice. Our analytic results for large number of packets show that we generalize existing methods that are valid only in limiting cases. The statistical problem considered here appears in identical form in a wide range of Monte Carlo simulations including particle physics and importance sampling. It is particularly powerful in extracting information when the available data are sparse or quantities are small.
Bayesian networks improve causal environmental ...
Rule-based weight of evidence approaches to ecological risk assessment may not account for uncertainties and generally lack probabilistic integration of lines of evidence. Bayesian networks allow causal inferences to be made from evidence by including causal knowledge about the problem, using this knowledge with probabilistic calculus to combine multiple lines of evidence, and minimizing biases in predicting or diagnosing causal relationships. Too often, sources of uncertainty in conventional weight of evidence approaches are ignored that can be accounted for with Bayesian networks. Specifying and propagating uncertainties improve the ability of models to incorporate strength of the evidence in the risk management phase of an assessment. Probabilistic inference from a Bayesian network allows evaluation of changes in uncertainty for variables from the evidence. The network structure and probabilistic framework of a Bayesian approach provide advantages over qualitative approaches in weight of evidence for capturing the impacts of multiple sources of quantifiable uncertainty on predictions of ecological risk. Bayesian networks can facilitate the development of evidence-based policy under conditions of uncertainty by incorporating analytical inaccuracies or the implications of imperfect information, structuring and communicating causal issues through qualitative directed graph formulations, and quantitatively comparing the causal power of multiple stressors on value
Zhao, Rui; Catalano, Paul; DeGruttola, Victor G.; Michor, Franziska
2017-01-01
The dynamics of tumor burden, secreted proteins or other biomarkers over time, is often used to evaluate the effectiveness of therapy and to predict outcomes for patients. Many methods have been proposed to investigate longitudinal trends to better characterize patients and to understand disease progression. However, most approaches assume a homogeneous patient population and a uniform response trajectory over time and across patients. Here, we present a mixture piecewise linear Bayesian hierarchical model, which takes into account both population heterogeneity and nonlinear relationships between biomarkers and time. Simulation results show that our method was able to classify subjects according to their patterns of treatment response with greater than 80% accuracy in the three scenarios tested. We then applied our model to a large randomized controlled phase III clinical trial of multiple myeloma patients. Analysis results suggest that the longitudinal tumor burden trajectories in multiple myeloma patients are heterogeneous and nonlinear, even among patients assigned to the same treatment cohort. In addition, between cohorts, there are distinct differences in terms of the regression parameters and the distributions among categories in the mixture. Those results imply that longitudinal data from clinical trials may harbor unobserved subgroups and nonlinear relationships; accounting for both may be important for analyzing longitudinal data. PMID:28723910
Modeling Area-Level Health Rankings.
Courtemanche, Charles; Soneji, Samir; Tchernis, Rusty
2015-10-01
Rank county health using a Bayesian factor analysis model. Secondary county data from the National Center for Health Statistics (through 2007) and Behavioral Risk Factor Surveillance System (through 2009). Our model builds on the existing county health rankings (CHRs) by using data-derived weights to compute ranks from mortality and morbidity variables, and by quantifying uncertainty based on population, spatial correlation, and missing data. We apply our model to Wisconsin, which has comprehensive data, and Texas, which has substantial missing information. The data were downloaded from www.countyhealthrankings.org. Our estimated rankings are more similar to the CHRs for Wisconsin than Texas, as the data-derived factor weights are closer to the assigned weights for Wisconsin. The correlations between the CHRs and our ranks are 0.89 for Wisconsin and 0.65 for Texas. Uncertainty is especially severe for Texas given the state's substantial missing data. The reliability of comprehensive CHRs varies from state to state. We advise focusing on the counties that remain among the least healthy after incorporating alternate weighting methods and accounting for uncertainty. Our results also highlight the need for broader geographic coverage in health data. © Health Research and Educational Trust.
NASA Astrophysics Data System (ADS)
Iskandar, Ismed; Satria Gondokaryono, Yudi
2016-02-01
In reliability theory, the most important problem is to determine the reliability of a complex system from the reliability of its components. The weakness of most reliability theories is that the systems are described and explained as simply functioning or failed. In many real situations, the failures may be from many causes depending upon the age and the environment of the system and its components. Another problem in reliability theory is one of estimating the parameters of the assumed failure models. The estimation may be based on data collected over censored or uncensored life tests. In many reliability problems, the failure data are simply quantitatively inadequate, especially in engineering design and maintenance system. The Bayesian analyses are more beneficial than the classical one in such cases. The Bayesian estimation analyses allow us to combine past knowledge or experience in the form of an apriori distribution with life test data to make inferences of the parameter of interest. In this paper, we have investigated the application of the Bayesian estimation analyses to competing risk systems. The cases are limited to the models with independent causes of failure by using the Weibull distribution as our model. A simulation is conducted for this distribution with the objectives of verifying the models and the estimators and investigating the performance of the estimators for varying sample size. The simulation data are analyzed by using Bayesian and the maximum likelihood analyses. The simulation results show that the change of the true of parameter relatively to another will change the value of standard deviation in an opposite direction. For a perfect information on the prior distribution, the estimation methods of the Bayesian analyses are better than those of the maximum likelihood. The sensitivity analyses show some amount of sensitivity over the shifts of the prior locations. They also show the robustness of the Bayesian analysis within the range between the true value and the maximum likelihood estimated value lines.
Bayesian median regression for temporal gene expression data
NASA Astrophysics Data System (ADS)
Yu, Keming; Vinciotti, Veronica; Liu, Xiaohui; 't Hoen, Peter A. C.
2007-09-01
Most of the existing methods for the identification of biologically interesting genes in a temporal expression profiling dataset do not fully exploit the temporal ordering in the dataset and are based on normality assumptions for the gene expression. In this paper, we introduce a Bayesian median regression model to detect genes whose temporal profile is significantly different across a number of biological conditions. The regression model is defined by a polynomial function where both time and condition effects as well as interactions between the two are included. MCMC-based inference returns the posterior distribution of the polynomial coefficients. From this a simple Bayes factor test is proposed to test for significance. The estimation of the median rather than the mean, and within a Bayesian framework, increases the robustness of the method compared to a Hotelling T2-test previously suggested. This is shown on simulated data and on muscular dystrophy gene expression data.
Hulin, Anne; Blanchet, Benoît; Audard, Vincent; Barau, Caroline; Furlan, Valérie; Durrbach, Antoine; Taïeb, Fabrice; Lang, Philippe; Grimbert, Philippe; Tod, Michel
2009-04-01
A significant relationship between mycophenolic acid (MPA) area under the plasma concentration-time curve (AUC) and the risk for rejection has been reported. Based on 3 concentration measurements, 3 approaches have been proposed for the estimation of MPA AUC, involving either a multilinear regression approach model (MLRA) or a Bayesian estimation using either gamma absorption or zero-order absorption population models. The aim of the study was to compare the 3 approaches for the estimation of MPA AUC in 150 renal transplant patients treated with mycophenolate mofetil and tacrolimus. The population parameters were determined in 77 patients (learning study). The AUC estimation methods were compared in the learning population and in 73 patients from another center (validation study). In the latter study, the reference AUCs were estimated by the trapezoidal rule on 8 measurements. MPA concentrations were measured by liquid chromatography. The gamma absorption model gave the best fit. In the learning study, the AUCs estimated by both Bayesian methods were very similar, whereas the multilinear approach was highly correlated but yielded estimates about 20% lower than Bayesian methods. This resulted in dosing recommendations differing by 250 mg/12 h or more in 27% of cases. In the validation study, AUC estimates based on the Bayesian method with gamma absorption model and multilinear regression approach model were, respectively, 12% higher and 7% lower than the reference values. To conclude, the bicompartmental model with gamma absorption rate gave the best fit. The 3 AUC estimation methods are highly correlated but not concordant. For a given patient, the same estimation method should always be used.
Bayesian models: A statistical primer for ecologists
Hobbs, N. Thompson; Hooten, Mevin B.
2015-01-01
Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods—in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach.Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals.This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management.Presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticiansCovers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and moreDeemphasizes computer coding in favor of basic principlesExplains how to write out properly factored statistical expressions representing Bayesian models
Ting, Chih-Chung; Yu, Chia-Chen; Maloney, Laurence T.
2015-01-01
In Bayesian decision theory, knowledge about the probabilities of possible outcomes is captured by a prior distribution and a likelihood function. The prior reflects past knowledge and the likelihood summarizes current sensory information. The two combined (integrated) form a posterior distribution that allows estimation of the probability of different possible outcomes. In this study, we investigated the neural mechanisms underlying Bayesian integration using a novel lottery decision task in which both prior knowledge and likelihood information about reward probability were systematically manipulated on a trial-by-trial basis. Consistent with Bayesian integration, as sample size increased, subjects tended to weigh likelihood information more compared with prior information. Using fMRI in humans, we found that the medial prefrontal cortex (mPFC) correlated with the mean of the posterior distribution, a statistic that reflects the integration of prior knowledge and likelihood of reward probability. Subsequent analysis revealed that both prior and likelihood information were represented in mPFC and that the neural representations of prior and likelihood in mPFC reflected changes in the behaviorally estimated weights assigned to these different sources of information in response to changes in the environment. Together, these results establish the role of mPFC in prior-likelihood integration and highlight its involvement in representing and integrating these distinct sources of information. PMID:25632152
Bayesian randomized clinical trials: From fixed to adaptive design.
Yin, Guosheng; Lam, Chi Kin; Shi, Haolun
2017-08-01
Randomized controlled studies are the gold standard for phase III clinical trials. Using α-spending functions to control the overall type I error rate, group sequential methods are well established and have been dominating phase III studies. Bayesian randomized design, on the other hand, can be viewed as a complement instead of competitive approach to the frequentist methods. For the fixed Bayesian design, the hypothesis testing can be cast in the posterior probability or Bayes factor framework, which has a direct link to the frequentist type I error rate. Bayesian group sequential design relies upon Bayesian decision-theoretic approaches based on backward induction, which is often computationally intensive. Compared with the frequentist approaches, Bayesian methods have several advantages. The posterior predictive probability serves as a useful and convenient tool for trial monitoring, and can be updated at any time as the data accrue during the trial. The Bayesian decision-theoretic framework possesses a direct link to the decision making in the practical setting, and can be modeled more realistically to reflect the actual cost-benefit analysis during the drug development process. Other merits include the possibility of hierarchical modeling and the use of informative priors, which would lead to a more comprehensive utilization of information from both historical and longitudinal data. From fixed to adaptive design, we focus on Bayesian randomized controlled clinical trials and make extensive comparisons with frequentist counterparts through numerical studies. Copyright © 2017 Elsevier Inc. All rights reserved.
Skrivanek, Zachary; Berry, Scott; Berry, Don; Chien, Jenny; Geiger, Mary Jane; Anderson, James H.; Gaydos, Brenda
2012-01-01
Background Dulaglutide (dula, LY2189265), a long-acting glucagon-like peptide-1 analog, is being developed to treat type 2 diabetes mellitus. Methods To foster the development of dula, we designed a two-stage adaptive, dose-finding, inferentially seamless phase 2/3 study. The Bayesian theoretical framework is used to adaptively randomize patients in stage 1 to 7 dula doses and, at the decision point, to either stop for futility or to select up to 2 dula doses for stage 2. After dose selection, patients continue to be randomized to the selected dula doses or comparator arms. Data from patients assigned the selected doses will be pooled across both stages and analyzed with an analysis of covariance model, using baseline hemoglobin A1c and country as covariates. The operating characteristics of the trial were assessed by extensive simulation studies. Results Simulations demonstrated that the adaptive design would identify the correct doses 88% of the time, compared to as low as 6% for a fixed-dose design (the latter value based on frequentist decision rules analogous to the Bayesian decision rules for adaptive design). Conclusions This article discusses the decision rules used to select the dula dose(s); the mathematical details of the adaptive algorithm—including a description of the clinical utility index used to mathematically quantify the desirability of a dose based on safety and efficacy measurements; and a description of the simulation process and results that quantify the operating characteristics of the design. PMID:23294775
Bayes' theorem application in the measure information diagnostic value assessment
NASA Astrophysics Data System (ADS)
Orzechowski, Piotr D.; Makal, Jaroslaw; Nazarkiewicz, Andrzej
2006-03-01
The paper presents Bayesian method application in the measure information diagnostic value assessment that is used in the computer-aided diagnosis system. The computer system described here has been created basing on the Bayesian Network and is used in Benign Prostatic Hyperplasia (BPH) diagnosis. The graphic diagnostic model enables to juxtapose experts' knowledge with data.
Pidlisecky, Adam; Haines, S.S.
2011-01-01
Conventional processing methods for seismic cone penetrometer data present several shortcomings, most notably the absence of a robust velocity model uncertainty estimate. We propose a new seismic cone penetrometer testing (SCPT) data-processing approach that employs Bayesian methods to map measured data errors into quantitative estimates of model uncertainty. We first calculate travel-time differences for all permutations of seismic trace pairs. That is, we cross-correlate each trace at each measurement location with every trace at every other measurement location to determine travel-time differences that are not biased by the choice of any particular reference trace and to thoroughly characterize data error. We calculate a forward operator that accounts for the different ray paths for each measurement location, including refraction at layer boundaries. We then use a Bayesian inversion scheme to obtain the most likely slowness (the reciprocal of velocity) and a distribution of probable slowness values for each model layer. The result is a velocity model that is based on correct ray paths, with uncertainty bounds that are based on the data error. ?? NRC Research Press 2011.
Wang, Ling; Abdel-Aty, Mohamed; Wang, Xuesong; Yu, Rongjie
2018-02-01
There have been plenty of traffic safety studies based on average daily traffic (ADT), average hourly traffic (AHT), or microscopic traffic at 5 min intervals. Nevertheless, not enough research has compared the performance of these three types of safety studies, and seldom of previous studies have intended to find whether the results of one type of study is transferable to the other two studies. First, this study built three models: a Bayesian Poisson-lognormal model to estimate the daily crash frequency using ADT, a Bayesian Poisson-lognormal model to estimate the hourly crash frequency using AHT, and a Bayesian logistic regression model for the real-time safety analysis using microscopic traffic. The model results showed that the crash contributing factors found by different models were comparable but not the same. Four variables, i.e., the logarithm of volume, the standard deviation of speed, the logarithm of segment length, and the existence of diverge segment, were positively significant in the three models. Additionally, weaving segments experienced higher daily and hourly crash frequencies than merge and basic segments. Then, each of the ADT-based, AHT-based, and real-time models was used to estimate safety conditions at different levels: daily and hourly, meanwhile, the real-time model was also used in 5 min intervals. The results uncovered that the ADT- and AHT-based safety models performed similar in predicting daily and hourly crash frequencies, and the real-time safety model was able to provide hourly crash frequency. Copyright © 2017 Elsevier Ltd. All rights reserved.
Stoffenmanager exposure model: company-specific exposure assessments using a Bayesian methodology.
van de Ven, Peter; Fransman, Wouter; Schinkel, Jody; Rubingh, Carina; Warren, Nicholas; Tielemans, Erik
2010-04-01
The web-based tool "Stoffenmanager" was initially developed to assist small- and medium-sized enterprises in the Netherlands to make qualitative risk assessments and to provide advice on control at the workplace. The tool uses a mechanistic model to arrive at a "Stoffenmanager score" for exposure. In a recent study it was shown that variability in exposure measurements given a certain Stoffenmanager score is still substantial. This article discusses an extension to the tool that uses a Bayesian methodology for quantitative workplace/scenario-specific exposure assessment. This methodology allows for real exposure data observed in the company of interest to be combined with the prior estimate (based on the Stoffenmanager model). The output of the tool is a company-specific assessment of exposure levels for a scenario for which data is available. The Bayesian approach provides a transparent way of synthesizing different types of information and is especially preferred in situations where available data is sparse, as is often the case in small- and medium sized-enterprises. Real-world examples as well as simulation studies were used to assess how different parameters such as sample size, difference between prior and data, uncertainty in prior, and variance in the data affect the eventual posterior distribution of a Bayesian exposure assessment.
ERIC Educational Resources Information Center
Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.
2008-01-01
Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…
NASA Astrophysics Data System (ADS)
Yasmirullah, Septia Devi Prihastuti; Iriawan, Nur; Sipayung, Feronika Rosalinda
2017-11-01
The success of regional economic establishment could be measured by economic growth. Since the Act No. 32 of 2004 has been implemented, unbalance economic among the regency in Indonesia is increasing. This condition is contrary different with the government goal to build society welfare through the economic activity development in each region. This research aims to examine economic growth through the distribution of bank credits to each Indonesia's regency. The data analyzed in this research is hierarchically structured data which follow normal distribution in first level. Two modeling approaches are employed in this research, a global-one level Bayesian approach and two-level hierarchical Bayesian approach. The result shows that hierarchical Bayesian has succeeded to demonstrate a better estimation than a global-one level Bayesian. It proves that the different economic growth in each province is significantly influenced by the variations of micro level characteristics in each province. These variations are significantly affected by cities and province characteristics in second level.
2018-01-01
We propose a novel approach to modelling rater effects in scoring-based assessment. The approach is based on a Bayesian hierarchical model and simulations from the posterior distribution. We apply it to large-scale essay assessment data over a period of 5 years. Empirical results suggest that the model provides a good fit for both the total scores and when applied to individual rubrics. We estimate the median impact of rater effects on the final grade to be ± 2 points on a 50 point scale, while 10% of essays would receive a score at least ± 5 different from their actual quality. Most of the impact is due to rater unreliability, not rater bias. PMID:29614129
Bayesian model reduction and empirical Bayes for group (DCM) studies.
Friston, Karl J; Litvak, Vladimir; Oswal, Ashwini; Razi, Adeel; Stephan, Klaas E; van Wijk, Bernadette C M; Ziegler, Gabriel; Zeidman, Peter
2016-03-01
This technical note describes some Bayesian procedures for the analysis of group studies that use nonlinear models at the first (within-subject) level - e.g., dynamic causal models - and linear models at subsequent (between-subject) levels. Its focus is on using Bayesian model reduction to finesse the inversion of multiple models of a single dataset or a single (hierarchical or empirical Bayes) model of multiple datasets. These applications of Bayesian model reduction allow one to consider parametric random effects and make inferences about group effects very efficiently (in a few seconds). We provide the relatively straightforward theoretical background to these procedures and illustrate their application using a worked example. This example uses a simulated mismatch negativity study of schizophrenia. We illustrate the robustness of Bayesian model reduction to violations of the (commonly used) Laplace assumption in dynamic causal modelling and show how its recursive application can facilitate both classical and Bayesian inference about group differences. Finally, we consider the application of these empirical Bayesian procedures to classification and prediction. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Ebrahimian, Hamed; Astroza, Rodrigo; Conte, Joel P.; de Callafon, Raymond A.
2017-02-01
This paper presents a framework for structural health monitoring (SHM) and damage identification of civil structures. This framework integrates advanced mechanics-based nonlinear finite element (FE) modeling and analysis techniques with a batch Bayesian estimation approach to estimate time-invariant model parameters used in the FE model of the structure of interest. The framework uses input excitation and dynamic response of the structure and updates a nonlinear FE model of the structure to minimize the discrepancies between predicted and measured response time histories. The updated FE model can then be interrogated to detect, localize, classify, and quantify the state of damage and predict the remaining useful life of the structure. As opposed to recursive estimation methods, in the batch Bayesian estimation approach, the entire time history of the input excitation and output response of the structure are used as a batch of data to estimate the FE model parameters through a number of iterations. In the case of non-informative prior, the batch Bayesian method leads to an extended maximum likelihood (ML) estimation method to estimate jointly time-invariant model parameters and the measurement noise amplitude. The extended ML estimation problem is solved efficiently using a gradient-based interior-point optimization algorithm. Gradient-based optimization algorithms require the FE response sensitivities with respect to the model parameters to be identified. The FE response sensitivities are computed accurately and efficiently using the direct differentiation method (DDM). The estimation uncertainties are evaluated based on the Cramer-Rao lower bound (CRLB) theorem by computing the exact Fisher Information matrix using the FE response sensitivities with respect to the model parameters. The accuracy of the proposed uncertainty quantification approach is verified using a sampling approach based on the unscented transformation. Two validation studies, based on realistic structural FE models of a bridge pier and a moment resisting steel frame, are performed to validate the performance and accuracy of the presented nonlinear FE model updating approach and demonstrate its application to SHM. These validation studies show the excellent performance of the proposed framework for SHM and damage identification even in the presence of high measurement noise and/or way-out initial estimates of the model parameters. Furthermore, the detrimental effects of the input measurement noise on the performance of the proposed framework are illustrated and quantified through one of the validation studies.
Baldacchino, Tara; Jacobs, William R; Anderson, Sean R; Worden, Keith; Rowson, Jennifer
2018-01-01
This contribution presents a novel methodology for myolectric-based control using surface electromyographic (sEMG) signals recorded during finger movements. A multivariate Bayesian mixture of experts (MoE) model is introduced which provides a powerful method for modeling force regression at the fingertips, while also performing finger movement classification as a by-product of the modeling algorithm. Bayesian inference of the model allows uncertainties to be naturally incorporated into the model structure. This method is tested using data from the publicly released NinaPro database which consists of sEMG recordings for 6 degree-of-freedom force activations for 40 intact subjects. The results demonstrate that the MoE model achieves similar performance compared to the benchmark set by the authors of NinaPro for finger force regression. Additionally, inherent to the Bayesian framework is the inclusion of uncertainty in the model parameters, naturally providing confidence bounds on the force regression predictions. Furthermore, the integrated clustering step allows a detailed investigation into classification of the finger movements, without incurring any extra computational effort. Subsequently, a systematic approach to assessing the importance of the number of electrodes needed for accurate control is performed via sensitivity analysis techniques. A slight degradation in regression performance is observed for a reduced number of electrodes, while classification performance is unaffected.
NASA Astrophysics Data System (ADS)
Gopalan, Giri; Hrafnkelsson, Birgir; Aðalgeirsdóttir, Guðfinna; Jarosch, Alexander H.; Pálsson, Finnur
2018-03-01
Bayesian hierarchical modeling can assist the study of glacial dynamics and ice flow properties. This approach will allow glaciologists to make fully probabilistic predictions for the thickness of a glacier at unobserved spatio-temporal coordinates, and it will also allow for the derivation of posterior probability distributions for key physical parameters such as ice viscosity and basal sliding. The goal of this paper is to develop a proof of concept for a Bayesian hierarchical model constructed, which uses exact analytical solutions for the shallow ice approximation (SIA) introduced by Bueler et al. (2005). A suite of test simulations utilizing these exact solutions suggests that this approach is able to adequately model numerical errors and produce useful physical parameter posterior distributions and predictions. A byproduct of the development of the Bayesian hierarchical model is the derivation of a novel finite difference method for solving the SIA partial differential equation (PDE). An additional novelty of this work is the correction of numerical errors induced through a numerical solution using a statistical model. This error correcting process models numerical errors that accumulate forward in time and spatial variation of numerical errors between the dome, interior, and margin of a glacier.
A Bayes linear Bayes method for estimation of correlated event rates.
Quigley, John; Wilson, Kevin J; Walls, Lesley; Bedford, Tim
2013-12-01
Typically, full Bayesian estimation of correlated event rates can be computationally challenging since estimators are intractable. When estimation of event rates represents one activity within a larger modeling process, there is an incentive to develop more efficient inference than provided by a full Bayesian model. We develop a new subjective inference method for correlated event rates based on a Bayes linear Bayes model under the assumption that events are generated from a homogeneous Poisson process. To reduce the elicitation burden we introduce homogenization factors to the model and, as an alternative to a subjective prior, an empirical method using the method of moments is developed. Inference under the new method is compared against estimates obtained under a full Bayesian model, which takes a multivariate gamma prior, where the predictive and posterior distributions are derived in terms of well-known functions. The mathematical properties of both models are presented. A simulation study shows that the Bayes linear Bayes inference method and the full Bayesian model provide equally reliable estimates. An illustrative example, motivated by a problem of estimating correlated event rates across different users in a simple supply chain, shows how ignoring the correlation leads to biased estimation of event rates. © 2013 Society for Risk Analysis.
Alderman, Phillip D.; Stanfill, Bryan
2016-10-06
Recent international efforts have brought renewed emphasis on the comparison of different agricultural systems models. Thus far, analysis of model-ensemble simulated results has not clearly differentiated between ensemble prediction uncertainties due to model structural differences per se and those due to parameter value uncertainties. Additionally, despite increasing use of Bayesian parameter estimation approaches with field-scale crop models, inadequate attention has been given to the full posterior distributions for estimated parameters. The objectives of this study were to quantify the impact of parameter value uncertainty on prediction uncertainty for modeling spring wheat phenology using Bayesian analysis and to assess the relativemore » contributions of model-structure-driven and parameter-value-driven uncertainty to overall prediction uncertainty. This study used a random walk Metropolis algorithm to estimate parameters for 30 spring wheat genotypes using nine phenology models based on multi-location trial data for days to heading and days to maturity. Across all cases, parameter-driven uncertainty accounted for between 19 and 52% of predictive uncertainty, while model-structure-driven uncertainty accounted for between 12 and 64%. Here, this study demonstrated the importance of quantifying both model-structure- and parameter-value-driven uncertainty when assessing overall prediction uncertainty in modeling spring wheat phenology. More generally, Bayesian parameter estimation provided a useful framework for quantifying and analyzing sources of prediction uncertainty.« less
HDDM: Hierarchical Bayesian estimation of the Drift-Diffusion Model in Python.
Wiecki, Thomas V; Sofer, Imri; Frank, Michael J
2013-01-01
The diffusion model is a commonly used tool to infer latent psychological processes underlying decision-making, and to link them to neural mechanisms based on response times. Although efficient open source software has been made available to quantitatively fit the model to data, current estimation methods require an abundance of response time measurements to recover meaningful parameters, and only provide point estimates of each parameter. In contrast, hierarchical Bayesian parameter estimation methods are useful for enhancing statistical power, allowing for simultaneous estimation of individual subject parameters and the group distribution that they are drawn from, while also providing measures of uncertainty in these parameters in the posterior distribution. Here, we present a novel Python-based toolbox called HDDM (hierarchical drift diffusion model), which allows fast and flexible estimation of the the drift-diffusion model and the related linear ballistic accumulator model. HDDM requires fewer data per subject/condition than non-hierarchical methods, allows for full Bayesian data analysis, and can handle outliers in the data. Finally, HDDM supports the estimation of how trial-by-trial measurements (e.g., fMRI) influence decision-making parameters. This paper will first describe the theoretical background of the drift diffusion model and Bayesian inference. We then illustrate usage of the toolbox on a real-world data set from our lab. Finally, parameter recovery studies show that HDDM beats alternative fitting methods like the χ(2)-quantile method as well as maximum likelihood estimation. The software and documentation can be downloaded at: http://ski.clps.brown.edu/hddm_docs/
NASA Astrophysics Data System (ADS)
Han, Feng; Zheng, Yi
2018-06-01
Significant Input uncertainty is a major source of error in watershed water quality (WWQ) modeling. It remains challenging to address the input uncertainty in a rigorous Bayesian framework. This study develops the Bayesian Analysis of Input and Parametric Uncertainties (BAIPU), an approach for the joint analysis of input and parametric uncertainties through a tight coupling of Markov Chain Monte Carlo (MCMC) analysis and Bayesian Model Averaging (BMA). The formal likelihood function for this approach is derived considering a lag-1 autocorrelated, heteroscedastic, and Skew Exponential Power (SEP) distributed error model. A series of numerical experiments were performed based on a synthetic nitrate pollution case and on a real study case in the Newport Bay Watershed, California. The Soil and Water Assessment Tool (SWAT) and Differential Evolution Adaptive Metropolis (DREAM(ZS)) were used as the representative WWQ model and MCMC algorithm, respectively. The major findings include the following: (1) the BAIPU can be implemented and used to appropriately identify the uncertain parameters and characterize the predictive uncertainty; (2) the compensation effect between the input and parametric uncertainties can seriously mislead the modeling based management decisions, if the input uncertainty is not explicitly accounted for; (3) the BAIPU accounts for the interaction between the input and parametric uncertainties and therefore provides more accurate calibration and uncertainty results than a sequential analysis of the uncertainties; and (4) the BAIPU quantifies the credibility of different input assumptions on a statistical basis and can be implemented as an effective inverse modeling approach to the joint inference of parameters and inputs.
Truth, models, model sets, AIC, and multimodel inference: a Bayesian perspective
Barker, Richard J.; Link, William A.
2015-01-01
Statistical inference begins with viewing data as realizations of stochastic processes. Mathematical models provide partial descriptions of these processes; inference is the process of using the data to obtain a more complete description of the stochastic processes. Wildlife and ecological scientists have become increasingly concerned with the conditional nature of model-based inference: what if the model is wrong? Over the last 2 decades, Akaike's Information Criterion (AIC) has been widely and increasingly used in wildlife statistics for 2 related purposes, first for model choice and second to quantify model uncertainty. We argue that for the second of these purposes, the Bayesian paradigm provides the natural framework for describing uncertainty associated with model choice and provides the most easily communicated basis for model weighting. Moreover, Bayesian arguments provide the sole justification for interpreting model weights (including AIC weights) as coherent (mathematically self consistent) model probabilities. This interpretation requires treating the model as an exact description of the data-generating mechanism. We discuss the implications of this assumption, and conclude that more emphasis is needed on model checking to provide confidence in the quality of inference.
Case studies in Bayesian microbial risk assessments.
Kennedy, Marc C; Clough, Helen E; Turner, Joanne
2009-12-21
The quantification of uncertainty and variability is a key component of quantitative risk analysis. Recent advances in Bayesian statistics make it ideal for integrating multiple sources of information, of different types and quality, and providing a realistic estimate of the combined uncertainty in the final risk estimates. We present two case studies related to foodborne microbial risks. In the first, we combine models to describe the sequence of events resulting in illness from consumption of milk contaminated with VTEC O157. We used Monte Carlo simulation to propagate uncertainty in some of the inputs to computer models describing the farm and pasteurisation process. Resulting simulated contamination levels were then assigned to consumption events from a dietary survey. Finally we accounted for uncertainty in the dose-response relationship and uncertainty due to limited incidence data to derive uncertainty about yearly incidences of illness in young children. Options for altering the risk were considered by running the model with different hypothetical policy-driven exposure scenarios. In the second case study we illustrate an efficient Bayesian sensitivity analysis for identifying the most important parameters of a complex computer code that simulated VTEC O157 prevalence within a managed dairy herd. This was carried out in 2 stages, first to screen out the unimportant inputs, then to perform a more detailed analysis on the remaining inputs. The method works by building a Bayesian statistical approximation to the computer code using a number of known code input/output pairs (training runs). We estimated that the expected total number of children aged 1.5-4.5 who become ill due to VTEC O157 in milk is 8.6 per year, with 95% uncertainty interval (0,11.5). The most extreme policy we considered was banning on-farm pasteurisation of milk, which reduced the estimate to 6.4 with 95% interval (0,11). In the second case study the effective number of inputs was reduced from 30 to 7 in the screening stage, and just 2 inputs were found to explain 82.8% of the output variance. A combined total of 500 runs of the computer code were used. These case studies illustrate the use of Bayesian statistics to perform detailed uncertainty and sensitivity analyses, integrating multiple information sources in a way that is both rigorous and efficient.
Nonparametric Bayesian Segmentation of a Multivariate Inhomogeneous Space-Time Poisson Process.
Ding, Mingtao; He, Lihan; Dunson, David; Carin, Lawrence
2012-12-01
A nonparametric Bayesian model is proposed for segmenting time-evolving multivariate spatial point process data. An inhomogeneous Poisson process is assumed, with a logistic stick-breaking process (LSBP) used to encourage piecewise-constant spatial Poisson intensities. The LSBP explicitly favors spatially contiguous segments, and infers the number of segments based on the observed data. The temporal dynamics of the segmentation and of the Poisson intensities are modeled with exponential correlation in time, implemented in the form of a first-order autoregressive model for uniformly sampled discrete data, and via a Gaussian process with an exponential kernel for general temporal sampling. We consider and compare two different inference techniques: a Markov chain Monte Carlo sampler, which has relatively high computational complexity; and an approximate and efficient variational Bayesian analysis. The model is demonstrated with a simulated example and a real example of space-time crime events in Cincinnati, Ohio, USA.
A study of finite mixture model: Bayesian approach on financial time series data
NASA Astrophysics Data System (ADS)
Phoong, Seuk-Yen; Ismail, Mohd Tahir
2014-07-01
Recently, statistician have emphasized on the fitting finite mixture model by using Bayesian method. Finite mixture model is a mixture of distributions in modeling a statistical distribution meanwhile Bayesian method is a statistical method that use to fit the mixture model. Bayesian method is being used widely because it has asymptotic properties which provide remarkable result. In addition, Bayesian method also shows consistency characteristic which means the parameter estimates are close to the predictive distributions. In the present paper, the number of components for mixture model is studied by using Bayesian Information Criterion. Identify the number of component is important because it may lead to an invalid result. Later, the Bayesian method is utilized to fit the k-component mixture model in order to explore the relationship between rubber price and stock market price for Malaysia, Thailand, Philippines and Indonesia. Lastly, the results showed that there is a negative effect among rubber price and stock market price for all selected countries.
Bayesian soft X-ray tomography using non-stationary Gaussian Processes
NASA Astrophysics Data System (ADS)
Li, Dong; Svensson, J.; Thomsen, H.; Medina, F.; Werner, A.; Wolf, R.
2013-08-01
In this study, a Bayesian based non-stationary Gaussian Process (GP) method for the inference of soft X-ray emissivity distribution along with its associated uncertainties has been developed. For the investigation of equilibrium condition and fast magnetohydrodynamic behaviors in nuclear fusion plasmas, it is of importance to infer, especially in the plasma center, spatially resolved soft X-ray profiles from a limited number of noisy line integral measurements. For this ill-posed inversion problem, Bayesian probability theory can provide a posterior probability distribution over all possible solutions under given model assumptions. Specifically, the use of a non-stationary GP to model the emission allows the model to adapt to the varying length scales of the underlying diffusion process. In contrast to other conventional methods, the prior regularization is realized in a probability form which enhances the capability of uncertainty analysis, in consequence, scientists who concern the reliability of their results will benefit from it. Under the assumption of normally distributed noise, the posterior distribution evaluated at a discrete number of points becomes a multivariate normal distribution whose mean and covariance are analytically available, making inversions and calculation of uncertainty fast. Additionally, the hyper-parameters embedded in the model assumption can be optimized through a Bayesian Occam's Razor formalism and thereby automatically adjust the model complexity. This method is shown to produce convincing reconstructions and good agreements with independently calculated results from the Maximum Entropy and Equilibrium-Based Iterative Tomography Algorithm methods.
Bayesian soft X-ray tomography using non-stationary Gaussian Processes.
Li, Dong; Svensson, J; Thomsen, H; Medina, F; Werner, A; Wolf, R
2013-08-01
In this study, a Bayesian based non-stationary Gaussian Process (GP) method for the inference of soft X-ray emissivity distribution along with its associated uncertainties has been developed. For the investigation of equilibrium condition and fast magnetohydrodynamic behaviors in nuclear fusion plasmas, it is of importance to infer, especially in the plasma center, spatially resolved soft X-ray profiles from a limited number of noisy line integral measurements. For this ill-posed inversion problem, Bayesian probability theory can provide a posterior probability distribution over all possible solutions under given model assumptions. Specifically, the use of a non-stationary GP to model the emission allows the model to adapt to the varying length scales of the underlying diffusion process. In contrast to other conventional methods, the prior regularization is realized in a probability form which enhances the capability of uncertainty analysis, in consequence, scientists who concern the reliability of their results will benefit from it. Under the assumption of normally distributed noise, the posterior distribution evaluated at a discrete number of points becomes a multivariate normal distribution whose mean and covariance are analytically available, making inversions and calculation of uncertainty fast. Additionally, the hyper-parameters embedded in the model assumption can be optimized through a Bayesian Occam's Razor formalism and thereby automatically adjust the model complexity. This method is shown to produce convincing reconstructions and good agreements with independently calculated results from the Maximum Entropy and Equilibrium-Based Iterative Tomography Algorithm methods.
Bayesian Inversion of 2D Models from Airborne Transient EM Data
NASA Astrophysics Data System (ADS)
Blatter, D. B.; Key, K.; Ray, A.
2016-12-01
The inherent non-uniqueness in most geophysical inverse problems leads to an infinite number of Earth models that fit observed data to within an adequate tolerance. To resolve this ambiguity, traditional inversion methods based on optimization techniques such as the Gauss-Newton and conjugate gradient methods rely on an additional regularization constraint on the properties that an acceptable model can possess, such as having minimal roughness. While allowing such an inversion scheme to converge on a solution, regularization makes it difficult to estimate the uncertainty associated with the model parameters. This is because regularization biases the inversion process toward certain models that satisfy the regularization constraint and away from others that don't, even when both may suitably fit the data. By contrast, a Bayesian inversion framework aims to produce not a single `most acceptable' model but an estimate of the posterior likelihood of the model parameters, given the observed data. In this work, we develop a 2D Bayesian framework for the inversion of transient electromagnetic (TEM) data. Our method relies on a reversible-jump Markov Chain Monte Carlo (RJ-MCMC) Bayesian inverse method with parallel tempering. Previous gradient-based inversion work in this area used a spatially constrained scheme wherein individual (1D) soundings were inverted together and non-uniqueness was tackled by using lateral and vertical smoothness constraints. By contrast, our work uses a 2D model space of Voronoi cells whose parameterization (including number of cells) is fully data-driven. To make the problem work practically, we approximate the forward solution for each TEM sounding using a local 1D approximation where the model is obtained from the 2D model by retrieving a vertical profile through the Voronoi cells. The implicit parsimony of the Bayesian inversion process leads to the simplest models that adequately explain the data, obviating the need for explicit smoothness constraints. In addition, credible intervals in model space are directly obtained, resolving some of the uncertainty introduced by regularization. An example application shows how the method can be used to quantify the uncertainty in airborne EM soundings for imaging subglacial brine channels and groundwater systems.
Bayesian Inference for Signal-Based Seismic Monitoring
NASA Astrophysics Data System (ADS)
Moore, D.
2015-12-01
Traditional seismic monitoring systems rely on discrete detections produced by station processing software, discarding significant information present in the original recorded signal. SIG-VISA (Signal-based Vertically Integrated Seismic Analysis) is a system for global seismic monitoring through Bayesian inference on seismic signals. By modeling signals directly, our forward model is able to incorporate a rich representation of the physics underlying the signal generation process, including source mechanisms, wave propagation, and station response. This allows inference in the model to recover the qualitative behavior of recent geophysical methods including waveform matching and double-differencing, all as part of a unified Bayesian monitoring system that simultaneously detects and locates events from a global network of stations. We demonstrate recent progress in scaling up SIG-VISA to efficiently process the data stream of global signals recorded by the International Monitoring System (IMS), including comparisons against existing processing methods that show increased sensitivity from our signal-based model and in particular the ability to locate events (including aftershock sequences that can tax analyst processing) precisely from waveform correlation effects. We also provide a Bayesian analysis of an alleged low-magnitude event near the DPRK test site in May 2010 [1] [2], investigating whether such an event could plausibly be detected through automated processing in a signal-based monitoring system. [1] Zhang, Miao and Wen, Lianxing. "Seismological Evidence for a Low-Yield Nuclear Test on 12 May 2010 in North Korea". Seismological Research Letters, January/February 2015. [2] Richards, Paul. "A Seismic Event in North Korea on 12 May 2010". CTBTO SnT 2015 oral presentation, video at https://video-archive.ctbto.org/index.php/kmc/preview/partner_id/103/uiconf_id/4421629/entry_id/0_ymmtpps0/delivery/http
Numerical study on the sequential Bayesian approach for radioactive materials detection
NASA Astrophysics Data System (ADS)
Qingpei, Xiang; Dongfeng, Tian; Jianyu, Zhu; Fanhua, Hao; Ge, Ding; Jun, Zeng
2013-01-01
A new detection method, based on the sequential Bayesian approach proposed by Candy et al., offers new horizons for the research of radioactive detection. Compared with the commonly adopted detection methods incorporated with statistical theory, the sequential Bayesian approach offers the advantages of shorter verification time during the analysis of spectra that contain low total counts, especially in complex radionuclide components. In this paper, a simulation experiment platform implanted with the methodology of sequential Bayesian approach was developed. Events sequences of γ-rays associating with the true parameters of a LaBr3(Ce) detector were obtained based on an events sequence generator using Monte Carlo sampling theory to study the performance of the sequential Bayesian approach. The numerical experimental results are in accordance with those of Candy. Moreover, the relationship between the detection model and the event generator, respectively represented by the expected detection rate (Am) and the tested detection rate (Gm) parameters, is investigated. To achieve an optimal performance for this processor, the interval of the tested detection rate as a function of the expected detection rate is also presented.
Bayesian Data-Model Fit Assessment for Structural Equation Modeling
ERIC Educational Resources Information Center
Levy, Roy
2011-01-01
Bayesian approaches to modeling are receiving an increasing amount of attention in the areas of model construction and estimation in factor analysis, structural equation modeling (SEM), and related latent variable models. However, model diagnostics and model criticism remain relatively understudied aspects of Bayesian SEM. This article describes…
Mycofier: a new machine learning-based classifier for fungal ITS sequences.
Delgado-Serrano, Luisa; Restrepo, Silvia; Bustos, Jose Ricardo; Zambrano, Maria Mercedes; Anzola, Juan Manuel
2016-08-11
The taxonomic and phylogenetic classification based on sequence analysis of the ITS1 genomic region has become a crucial component of fungal ecology and diversity studies. Nowadays, there is no accurate alignment-free classification tool for fungal ITS1 sequences for large environmental surveys. This study describes the development of a machine learning-based classifier for the taxonomical assignment of fungal ITS1 sequences at the genus level. A fungal ITS1 sequence database was built using curated data. Training and test sets were generated from it. A Naïve Bayesian classifier was built using features from the primary sequence with an accuracy of 87 % in the classification at the genus level. The final model was based on a Naïve Bayes algorithm using ITS1 sequences from 510 fungal genera. This classifier, denoted as Mycofier, provides similar classification accuracy compared to BLASTN, but the database used for the classification contains curated data and the tool, independent of alignment, is more efficient and contributes to the field, given the lack of an accurate classification tool for large data from fungal ITS1 sequences. The software and source code for Mycofier are freely available at https://github.com/ldelgado-serrano/mycofier.git .
NASA Astrophysics Data System (ADS)
Varouchakis, Emmanouil; Hristopulos, Dionissios
2015-04-01
Space-time geostatistical approaches can improve the reliability of dynamic groundwater level models in areas with limited spatial and temporal data. Space-time residual Kriging (STRK) is a reliable method for spatiotemporal interpolation that can incorporate auxiliary information. The method usually leads to an underestimation of the prediction uncertainty. The uncertainty of spatiotemporal models is usually estimated by determining the space-time Kriging variance or by means of cross validation analysis. For de-trended data the former is not usually applied when complex spatiotemporal trend functions are assigned. A Bayesian approach based on the bootstrap idea and sequential Gaussian simulation are employed to determine the uncertainty of the spatiotemporal model (trend and covariance) parameters. These stochastic modelling approaches produce multiple realizations, rank the prediction results on the basis of specified criteria and capture the range of the uncertainty. The correlation of the spatiotemporal residuals is modeled using a non-separable space-time variogram based on the Spartan covariance family (Hristopulos and Elogne 2007, Varouchakis and Hristopulos 2013). We apply these simulation methods to investigate the uncertainty of groundwater level variations. The available dataset consists of bi-annual (dry and wet hydrological period) groundwater level measurements in 15 monitoring locations for the time period 1981 to 2010. The space-time trend function is approximated using a physical law that governs the groundwater flow in the aquifer in the presence of pumping. The main objective of this research is to compare the performance of two simulation methods for prediction uncertainty estimation. In addition, we investigate the performance of the Spartan spatiotemporal covariance function for spatiotemporal geostatistical analysis. Hristopulos, D.T. and Elogne, S.N. 2007. Analytic properties and covariance functions for a new class of generalized Gibbs random fields. IΕΕΕ Transactions on Information Theory, 53:4667-4467. Varouchakis, E.A. and Hristopulos, D.T. 2013. Improvement of groundwater level prediction in sparsely gauged basins using physical laws and local geographic features as auxiliary variables. Advances in Water Resources, 52:34-49. Research supported by the project SPARTA 1591: "Development of Space-Time Random Fields based on Local Interaction Models and Applications in the Processing of Spatiotemporal Datasets". "SPARTA" is implemented under the "ARISTEIA" Action of the operational programme Education and Lifelong Learning and is co-funded by the European Social Fund (ESF) and National Resources.
Fast Low-Rank Bayesian Matrix Completion With Hierarchical Gaussian Prior Models
NASA Astrophysics Data System (ADS)
Yang, Linxiao; Fang, Jun; Duan, Huiping; Li, Hongbin; Zeng, Bing
2018-06-01
The problem of low rank matrix completion is considered in this paper. To exploit the underlying low-rank structure of the data matrix, we propose a hierarchical Gaussian prior model, where columns of the low-rank matrix are assumed to follow a Gaussian distribution with zero mean and a common precision matrix, and a Wishart distribution is specified as a hyperprior over the precision matrix. We show that such a hierarchical Gaussian prior has the potential to encourage a low-rank solution. Based on the proposed hierarchical prior model, a variational Bayesian method is developed for matrix completion, where the generalized approximate massage passing (GAMP) technique is embedded into the variational Bayesian inference in order to circumvent cumbersome matrix inverse operations. Simulation results show that our proposed method demonstrates superiority over existing state-of-the-art matrix completion methods.
Bayesian estimation of dynamic matching function for U-V analysis in Japan
NASA Astrophysics Data System (ADS)
Kyo, Koki; Noda, Hideo; Kitagawa, Genshiro
2012-05-01
In this paper we propose a Bayesian method for analyzing unemployment dynamics. We derive a Beveridge curve for unemployment and vacancy (U-V) analysis from a Bayesian model based on a labor market matching function. In our framework, the efficiency of matching and the elasticities of new hiring with respect to unemployment and vacancy are regarded as time varying parameters. To construct a flexible model and obtain reasonable estimates in an underdetermined estimation problem, we treat the time varying parameters as random variables and introduce smoothness priors. The model is then described in a state space representation, enabling the parameter estimation to be carried out using Kalman filter and fixed interval smoothing. In such a representation, dynamic features of the cyclic unemployment rate and the structural-frictional unemployment rate can be accurately captured.
Prediction and assimilation of surf-zone processes using a Bayesian network: Part II: Inverse models
Plant, Nathaniel G.; Holland, K. Todd
2011-01-01
A Bayesian network model has been developed to simulate a relatively simple problem of wave propagation in the surf zone (detailed in Part I). Here, we demonstrate that this Bayesian model can provide both inverse modeling and data-assimilation solutions for predicting offshore wave heights and depth estimates given limited wave-height and depth information from an onshore location. The inverse method is extended to allow data assimilation using observational inputs that are not compatible with deterministic solutions of the problem. These inputs include sand bar positions (instead of bathymetry) and estimates of the intensity of wave breaking (instead of wave-height observations). Our results indicate that wave breaking information is essential to reduce prediction errors. In many practical situations, this information could be provided from a shore-based observer or from remote-sensing systems. We show that various combinations of the assimilated inputs significantly reduce the uncertainty in the estimates of water depths and wave heights in the model domain. Application of the Bayesian network model to new field data demonstrated significant predictive skill (R2 = 0.7) for the inverse estimate of a month-long time series of offshore wave heights. The Bayesian inverse results include uncertainty estimates that were shown to be most accurate when given uncertainty in the inputs (e.g., depth and tuning parameters). Furthermore, the inverse modeling was extended to directly estimate tuning parameters associated with the underlying wave-process model. The inverse estimates of the model parameters not only showed an offshore wave height dependence consistent with results of previous studies but the uncertainty estimates of the tuning parameters also explain previously reported variations in the model parameters.
2012-01-01
Background A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. Methods We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). Results The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. Conclusions The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint. PMID:22962944
Adrion, Christine; Mansmann, Ulrich
2012-09-10
A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.
Bayesian survival analysis in clinical trials: What methods are used in practice?
Brard, Caroline; Le Teuff, Gwénaël; Le Deley, Marie-Cécile; Hampson, Lisa V
2017-02-01
Background Bayesian statistics are an appealing alternative to the traditional frequentist approach to designing, analysing, and reporting of clinical trials, especially in rare diseases. Time-to-event endpoints are widely used in many medical fields. There are additional complexities to designing Bayesian survival trials which arise from the need to specify a model for the survival distribution. The objective of this article was to critically review the use and reporting of Bayesian methods in survival trials. Methods A systematic review of clinical trials using Bayesian survival analyses was performed through PubMed and Web of Science databases. This was complemented by a full text search of the online repositories of pre-selected journals. Cost-effectiveness, dose-finding studies, meta-analyses, and methodological papers using clinical trials were excluded. Results In total, 28 articles met the inclusion criteria, 25 were original reports of clinical trials and 3 were re-analyses of a clinical trial. Most trials were in oncology (n = 25), were randomised controlled (n = 21) phase III trials (n = 13), and half considered a rare disease (n = 13). Bayesian approaches were used for monitoring in 14 trials and for the final analysis only in 14 trials. In the latter case, Bayesian survival analyses were used for the primary analysis in four cases, for the secondary analysis in seven cases, and for the trial re-analysis in three cases. Overall, 12 articles reported fitting Bayesian regression models (semi-parametric, n = 3; parametric, n = 9). Prior distributions were often incompletely reported: 20 articles did not define the prior distribution used for the parameter of interest. Over half of the trials used only non-informative priors for monitoring and the final analysis (n = 12) when it was specified. Indeed, no articles fitting Bayesian regression models placed informative priors on the parameter of interest. The prior for the treatment effect was based on historical data in only four trials. Decision rules were pre-defined in eight cases when trials used Bayesian monitoring, and in only one case when trials adopted a Bayesian approach to the final analysis. Conclusion Few trials implemented a Bayesian survival analysis and few incorporated external data into priors. There is scope to improve the quality of reporting of Bayesian methods in survival trials. Extension of the Consolidated Standards of Reporting Trials statement for reporting Bayesian clinical trials is recommended.
Statistical Bayesian method for reliability evaluation based on ADT data
NASA Astrophysics Data System (ADS)
Lu, Dawei; Wang, Lizhi; Sun, Yusheng; Wang, Xiaohong
2018-05-01
Accelerated degradation testing (ADT) is frequently conducted in the laboratory to predict the products’ reliability under normal operating conditions. Two kinds of methods, degradation path models and stochastic process models, are utilized to analyze degradation data and the latter one is the most popular method. However, some limitations like imprecise solution process and estimation result of degradation ratio still exist, which may affect the accuracy of the acceleration model and the extrapolation value. Moreover, the conducted solution of this problem, Bayesian method, lose key information when unifying the degradation data. In this paper, a new data processing and parameter inference method based on Bayesian method is proposed to handle degradation data and solve the problems above. First, Wiener process and acceleration model is chosen; Second, the initial values of degradation model and parameters of prior and posterior distribution under each level is calculated with updating and iteration of estimation values; Third, the lifetime and reliability values are estimated on the basis of the estimation parameters; Finally, a case study is provided to demonstrate the validity of the proposed method. The results illustrate that the proposed method is quite effective and accuracy in estimating the lifetime and reliability of a product.
A guide to Bayesian model selection for ecologists
Hooten, Mevin B.; Hobbs, N.T.
2015-01-01
The steady upward trend in the use of model selection and Bayesian methods in ecological research has made it clear that both approaches to inference are important for modern analysis of models and data. However, in teaching Bayesian methods and in working with our research colleagues, we have noticed a general dissatisfaction with the available literature on Bayesian model selection and multimodel inference. Students and researchers new to Bayesian methods quickly find that the published advice on model selection is often preferential in its treatment of options for analysis, frequently advocating one particular method above others. The recent appearance of many articles and textbooks on Bayesian modeling has provided welcome background on relevant approaches to model selection in the Bayesian framework, but most of these are either very narrowly focused in scope or inaccessible to ecologists. Moreover, the methodological details of Bayesian model selection approaches are spread thinly throughout the literature, appearing in journals from many different fields. Our aim with this guide is to condense the large body of literature on Bayesian approaches to model selection and multimodel inference and present it specifically for quantitative ecologists as neutrally as possible. We also bring to light a few important and fundamental concepts relating directly to model selection that seem to have gone unnoticed in the ecological literature. Throughout, we provide only a minimal discussion of philosophy, preferring instead to examine the breadth of approaches as well as their practical advantages and disadvantages. This guide serves as a reference for ecologists using Bayesian methods, so that they can better understand their options and can make an informed choice that is best aligned with their goals for inference.
Kurushima, J. D.; Lipinski, M. J.; Gandolfi, B.; Froenicke, L.; Grahn, J. C.; Grahn, R. A.; Lyons, L. A.
2012-01-01
Summary Both cat breeders and the lay public have interests in the origins of their pets, not only in the genetic identity of the purebred individuals, but also the historical origins of common household cats. The cat fancy is a relatively new institution with over 85% of its 40–50 breeds arising only in the past 75 years, primarily through selection on single-gene aesthetic traits. The short, yet intense cat breed history poses a significant challenge to the development of a genetic marker-based breed identification strategy. Using different breed assignment strategies and methods, 477 cats representing 29 fancy breeds were analysed with 38 short tandem repeats, 148 intergenic and five phenotypic single nucleotide polymorphisms. Results suggest the frequentist method of Paetkau (accuracy single nucleotide polymorphisms = 0.78, short tandem repeats = 0.88) surpasses the Bayesian method of Rannala and Mountain (single nucleotide polymorphisms = 0.56, short tandem repeats = 0.83) for accurate assignment of individuals to the correct breed. Additionally, a post-assignment verification step with the five phenotypic single nucleotide polymorphisms accurately identified between 0.31 and 0.58 of the mis-assigned individuals raising the sensitivity of assignment with the frequentist method to 0.89 and 0.92 single nucleotide polymorphisms and short tandem repeats respectively. This study provides a novel multi-step assignment strategy and suggests that, despite their short breed history and breed family groupings, a majority of cats can be assigned to their proper breed or population of origin, i.e. race. PMID:23171373
A Bayesian network model for predicting aquatic toxicity mode ...
The mode of toxic action (MoA) has been recognized as a key determinant of chemical toxicity, but development of predictive MoA classification models in aquatic toxicology has been limited. We developed a Bayesian network model to classify aquatic toxicity MoA using a recently published dataset containing over one thousand chemicals with MoA assignments for aquatic animal toxicity. Two dimensional theoretical chemical descriptors were generated for each chemical using the Toxicity Estimation Software Tool. The model was developed through augmented Markov blanket discovery from the dataset of 1098 chemicals with the MoA broad classifications as a target node. From cross validation, the overall precision for the model was 80.2%. The best precision was for the AChEI MoA (93.5%) where 257 chemicals out of 275 were correctly classified. Model precision was poorest for the reactivity MoA (48.5%) where 48 out of 99 reactive chemicals were correctly classified. Narcosis represented the largest class within the MoA dataset and had a precision and reliability of 80.0%, reflecting the global precision across all of the MoAs. False negatives for narcosis most often fell into electron transport inhibition, neurotoxicity or reactivity MoAs. False negatives for all other MoAs were most often narcosis. A probabilistic sensitivity analysis was undertaken for each MoA to examine the sensitivity to individual and multiple descriptor findings. The results show that the Markov blank
Upadhyay, S K; Mukherjee, Bhaswati; Gupta, Ashutosh
2009-09-01
Several models for studies related to tensile strength of materials are proposed in the literature where the size or length component has been taken to be an important factor for studying the specimens' failure behaviour. An important model, developed on the basis of cumulative damage approach, is the three-parameter extension of the Birnbaum-Saunders fatigue model that incorporates size of the specimen as an additional variable. This model is a strong competitor of the commonly used Weibull model and stands better than the traditional models, which do not incorporate the size effect. The paper considers two such cumulative damage models, checks their compatibility with a real dataset, compares them with some of the recent toolkits, and finally recommends a model, which appears an appropriate one. Throughout the study is Bayesian based on Markov chain Monte Carlo simulation.
Walsh, Daniel P.; Norton, Andrew S.; Storm, Daniel J.; Van Deelen, Timothy R.; Heisy, Dennis M.
2018-01-01
Implicit and explicit use of expert knowledge to inform ecological analyses is becoming increasingly common because it often represents the sole source of information in many circumstances. Thus, there is a need to develop statistical methods that explicitly incorporate expert knowledge, and can successfully leverage this information while properly accounting for associated uncertainty during analysis. Studies of cause-specific mortality provide an example of implicit use of expert knowledge when causes-of-death are uncertain and assigned based on the observer's knowledge of the most likely cause. To explicitly incorporate this use of expert knowledge and the associated uncertainty, we developed a statistical model for estimating cause-specific mortality using a data augmentation approach within a Bayesian hierarchical framework. Specifically, for each mortality event, we elicited the observer's belief of cause-of-death by having them specify the probability that the death was due to each potential cause. These probabilities were then used as prior predictive values within our framework. This hierarchical framework permitted a simple and rigorous estimation method that was easily modified to include covariate effects and regularizing terms. Although applied to survival analysis, this method can be extended to any event-time analysis with multiple event types, for which there is uncertainty regarding the true outcome. We conducted simulations to determine how our framework compared to traditional approaches that use expert knowledge implicitly and assume that cause-of-death is specified accurately. Simulation results supported the inclusion of observer uncertainty in cause-of-death assignment in modeling of cause-specific mortality to improve model performance and inference. Finally, we applied the statistical model we developed and a traditional method to cause-specific survival data for white-tailed deer, and compared results. We demonstrate that model selection results changed between the two approaches, and incorporating observer knowledge in cause-of-death increased the variability associated with parameter estimates when compared to the traditional approach. These differences between the two approaches can impact reported results, and therefore, it is critical to explicitly incorporate expert knowledge in statistical methods to ensure rigorous inference.
Multinomial Bayesian learning for modeling classical and nonclassical receptive field properties.
Hosoya, Haruo
2012-08-01
We study the interplay of Bayesian inference and natural image learning in a hierarchical vision system, in relation to the response properties of early visual cortex. We particularly focus on a Bayesian network with multinomial variables that can represent discrete feature spaces similar to hypercolumns combining minicolumns, enforce sparsity of activation to learn efficient representations, and explain divisive normalization. We demonstrate that maximal-likelihood learning using sampling-based Bayesian inference gives rise to classical receptive field properties similar to V1 simple cells and V2 cells, while inference performed on the trained network yields nonclassical context-dependent response properties such as cross-orientation suppression and filling in. Comparison with known physiological properties reveals some qualitative and quantitative similarities.
On the Adequacy of Bayesian Evaluations of Categorization Models: Reply to Vanpaemel and Lee (2012)
ERIC Educational Resources Information Center
Wills, Andy J.; Pothos, Emmanuel M.
2012-01-01
Vanpaemel and Lee (2012) argued, and we agree, that the comparison of formal models can be facilitated by Bayesian methods. However, Bayesian methods neither precede nor supplant our proposals (Wills & Pothos, 2012), as Bayesian methods can be applied both to our proposals and to their polar opposites. Furthermore, the use of Bayesian methods to…
Ferragina, A.; de los Campos, G.; Vazquez, A. I.; Cecchinato, A.; Bittante, G.
2017-01-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict “difficult-to-predict” dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm−1 were available and averaged before data analysis. Three Bayesian models: Bayesian ridge regression (Bayes RR), Bayes A, and Bayes B, and 2 reference models: PLS and modified PLS (MPLS) procedures, were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. Accuracy increased in moving from calibration to external validation methods, and in moving from PLS and MPLS to Bayesian methods, particularly Bayes A and Bayes B. The maximum R2 value of validation was obtained with Bayes B and Bayes A. For the FA, C10:0 (% of each FA on total FA basis) had the highest R2 (0.75, achieved with Bayes A and Bayes B), and among the technological traits, fresh cheese yield R2 of 0.82 (achieved with Bayes B). These 2 methods have proven to be useful instruments in shrinking and selecting very informative wavelengths and inferring the structure and functions of the analyzed traits. We conclude that Bayesian models are powerful tools for deriving calibration equations, and, importantly, these equations can be easily developed using existing open-source software. As part of our study, we provide scripts based on the open source R software BGLR, which can be used to train customized prediction equations for other traits or populations. PMID:26387015
A Bayesian approach to earthquake source studies
NASA Astrophysics Data System (ADS)
Minson, Sarah
Bayesian sampling has several advantages over conventional optimization approaches to solving inverse problems. It produces the distribution of all possible models sampled proportionally to how much each model is consistent with the data and the specified prior information, and thus images the entire solution space, revealing the uncertainties and trade-offs in the model. Bayesian sampling is applicable to both linear and non-linear modeling, and the values of the model parameters being sampled can be constrained based on the physics of the process being studied and do not have to be regularized. However, these methods are computationally challenging for high-dimensional problems. Until now the computational expense of Bayesian sampling has been too great for it to be practicable for most geophysical problems. I present a new parallel sampling algorithm called CATMIP for Cascading Adaptive Tempered Metropolis In Parallel. This technique, based on Transitional Markov chain Monte Carlo, makes it possible to sample distributions in many hundreds of dimensions, if the forward model is fast, or to sample computationally expensive forward models in smaller numbers of dimensions. The design of the algorithm is independent of the model being sampled, so CATMIP can be applied to many areas of research. I use CATMIP to produce a finite fault source model for the 2007 Mw 7.7 Tocopilla, Chile earthquake. Surface displacements from the earthquake were recorded by six interferograms and twelve local high-rate GPS stations. Because of the wealth of near-fault data, the source process is well-constrained. I find that the near-field high-rate GPS data have significant resolving power above and beyond the slip distribution determined from static displacements. The location and magnitude of the maximum displacement are resolved. The rupture almost certainly propagated at sub-shear velocities. The full posterior distribution can be used not only to calculate source parameters but also to determine their uncertainties. So while kinematic source modeling and the estimation of source parameters is not new, with CATMIP I am able to use Bayesian sampling to determine which parts of the source process are well-constrained and which are not.
Cross-validation to select Bayesian hierarchical models in phylogenetics.
Duchêne, Sebastián; Duchêne, David A; Di Giallonardo, Francesca; Eden, John-Sebastian; Geoghegan, Jemma L; Holt, Kathryn E; Ho, Simon Y W; Holmes, Edward C
2016-05-26
Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data. Accordingly, model selection has become an important component of phylogenetic analysis. Methods of model selection generally consider the likelihood of the data under the model in question. In the context of Bayesian phylogenetics, the most common approach involves estimating the marginal likelihood, which is typically done by integrating the likelihood across model parameters, weighted by the prior. Although this method is accurate, it is sensitive to the presence of improper priors. We explored an alternative approach based on cross-validation that is widely used in evolutionary analysis. This involves comparing models according to their predictive performance. We analysed simulated data and a range of viral and bacterial data sets using a cross-validation approach to compare a variety of molecular clock and demographic models. Our results show that cross-validation can be effective in distinguishing between strict- and relaxed-clock models and in identifying demographic models that allow growth in population size over time. In most of our empirical data analyses, the model selected using cross-validation was able to match that selected using marginal-likelihood estimation. The accuracy of cross-validation appears to improve with longer sequence data, particularly when distinguishing between relaxed-clock models. Cross-validation is a useful method for Bayesian phylogenetic model selection. This method can be readily implemented even when considering complex models where selecting an appropriate prior for all parameters may be difficult.
Genetic structure of cougar populations across the Wyoming basin: Metapopulation or megapopulation
Anderson, C.R.; Lindzey, F.G.; McDonald, D.B.
2004-01-01
We examined the genetic structure of 5 Wyoming cougar (Puma concolor) populations surrounding the Wyoming Basin, as well as a population from southwestern Colorado. When using 9 microsatellite DNA loci, observed heterozygosity was similar among populations (HO = 0.49-0.59) and intermediate to that of other large carnivores. Estimates of genetic structure (FST = 0.028, RST = 0.029) and number of migrants per generation (Nm) suggested high gene flow. Nm was lowest between distant populations and highest among adjacent populations. Examination of these data, plus Mantel test results of genetic versus geographic distance (P ??? 0.01), suggested both isolation by distance and an effect of habitat matrix. Bayesian assignment to population based on individual genotypes showed that cougars in this region were best described as a single panmictic population. Total effective population size for cougars in this region ranged from 1,797 to 4,532 depending on mutation model and analytical method used. Based on measures of gene flow, extinction risk in the near future appears low. We found no support for the existence of metapopulation structure among cougars in this region.
Yang, Ziheng; Zhu, Tianqi
2018-02-20
The Bayesian method is noted to produce spuriously high posterior probabilities for phylogenetic trees in analysis of large datasets, but the precise reasons for this overconfidence are unknown. In general, the performance of Bayesian selection of misspecified models is poorly understood, even though this is of great scientific interest since models are never true in real data analysis. Here we characterize the asymptotic behavior of Bayesian model selection and show that when the competing models are equally wrong, Bayesian model selection exhibits surprising and polarized behaviors in large datasets, supporting one model with full force while rejecting the others. If one model is slightly less wrong than the other, the less wrong model will eventually win when the amount of data increases, but the method may become overconfident before it becomes reliable. We suggest that this extreme behavior may be a major factor for the spuriously high posterior probabilities for evolutionary trees. The philosophical implications of our results to the application of Bayesian model selection to evaluate opposing scientific hypotheses are yet to be explored, as are the behaviors of non-Bayesian methods in similar situations.
Hierarchical models of animal abundance and occurrence
Royle, J. Andrew; Dorazio, R.M.
2006-01-01
Much of animal ecology is devoted to studies of abundance and occurrence of species, based on surveys of spatially referenced sample units. These surveys frequently yield sparse counts that are contaminated by imperfect detection, making direct inference about abundance or occurrence based on observational data infeasible. This article describes a flexible hierarchical modeling framework for estimation and inference about animal abundance and occurrence from survey data that are subject to imperfect detection. Within this framework, we specify models of abundance and detectability of animals at the level of the local populations defined by the sample units. Information at the level of the local population is aggregated by specifying models that describe variation in abundance and detection among sites. We describe likelihood-based and Bayesian methods for estimation and inference under the resulting hierarchical model. We provide two examples of the application of hierarchical models to animal survey data, the first based on removal counts of stream fish and the second based on avian quadrat counts. For both examples, we provide a Bayesian analysis of the models using the software WinBUGS.
Bayesian Modeling for Identification and Estimation of the Learning Effects of Pointing Tasks
NASA Astrophysics Data System (ADS)
Kyo, Koki
Recently, in the field of human-computer interaction, a model containing the systematic factor and human factor has been proposed to evaluate the performance of the input devices of a computer. This is called the SH-model. In this paper, in order to extend the range of application of the SH-model, we propose some new models based on the Box-Cox transformation and apply a Bayesian modeling method for identification and estimation of the learning effects of pointing tasks. We consider the parameters describing the learning effect as random variables and introduce smoothness priors for them. Illustrative results show that the newly-proposed models work well.
Fong, Ted C T; Ho, Rainbow T H
2015-01-01
The aim of this study was to reexamine the dimensionality of the widely used 9-item Utrecht Work Engagement Scale using the maximum likelihood (ML) approach and Bayesian structural equation modeling (BSEM) approach. Three measurement models (1-factor, 3-factor, and bi-factor models) were evaluated in two split samples of 1,112 health-care workers using confirmatory factor analysis and BSEM, which specified small-variance informative priors for cross-loadings and residual covariances. Model fit and comparisons were evaluated by posterior predictive p-value (PPP), deviance information criterion, and Bayesian information criterion (BIC). None of the three ML-based models showed an adequate fit to the data. The use of informative priors for cross-loadings did not improve the PPP for the models. The 1-factor BSEM model with approximately zero residual covariances displayed a good fit (PPP>0.10) to both samples and a substantially lower BIC than its 3-factor and bi-factor counterparts. The BSEM results demonstrate empirical support for the 1-factor model as a parsimonious and reasonable representation of work engagement.
Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data
Zhao, Xin; Cheung, Leo Wang-Kit
2007-01-01
Background Designing appropriate machine learning methods for identifying genes that have a significant discriminating power for disease outcomes has become more and more important for our understanding of diseases at genomic level. Although many machine learning methods have been developed and applied to the area of microarray gene expression data analysis, the majority of them are based on linear models, which however are not necessarily appropriate for the underlying connection between the target disease and its associated explanatory genes. Linear model based methods usually also bring in false positive significant features more easily. Furthermore, linear model based algorithms often involve calculating the inverse of a matrix that is possibly singular when the number of potentially important genes is relatively large. This leads to problems of numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. Many of the existing non-linear methods have a couple of critical problems, the model selection problem and the model parameter tuning problem, that remain unsolved or even untouched. In general, a unified framework that allows model parameters of both linear and non-linear models to be easily tuned is always preferred in real-world applications. Kernel-induced learning methods form a class of approaches that show promising potentials to achieve this goal. Results A hierarchical statistical model named kernel-imbedded Gaussian process (KIGP) is developed under a unified Bayesian framework for binary disease classification problems using microarray gene expression data. In particular, based on a probit regression setting, an adaptive algorithm with a cascading structure is designed to find the appropriate kernel, to discover the potentially significant genes, and to make the optimal class prediction accordingly. A Gibbs sampler is built as the core of the algorithm to make Bayesian inferences. Simulation studies showed that, even without any knowledge of the underlying generative model, the KIGP performed very close to the theoretical Bayesian bound not only in the case with a linear Bayesian classifier but also in the case with a very non-linear Bayesian classifier. This sheds light on its broader usability to microarray data analysis problems, especially to those that linear methods work awkwardly. The KIGP was also applied to four published microarray datasets, and the results showed that the KIGP performed better than or at least as well as any of the referred state-of-the-art methods did in all of these cases. Conclusion Mathematically built on the kernel-induced feature space concept under a Bayesian framework, the KIGP method presented in this paper provides a unified machine learning approach to explore both the linear and the possibly non-linear underlying relationship between the target features of a given binary disease classification problem and the related explanatory gene expression data. More importantly, it incorporates the model parameter tuning into the framework. The model selection problem is addressed in the form of selecting a proper kernel type. The KIGP method also gives Bayesian probabilistic predictions for disease classification. These properties and features are beneficial to most real-world applications. The algorithm is naturally robust in numerical computation. The simulation studies and the published data studies demonstrated that the proposed KIGP performs satisfactorily and consistently. PMID:17328811
Jones, Matt; Love, Bradley C
2011-08-01
The prominence of Bayesian modeling of cognition has increased recently largely because of mathematical advances in specifying and deriving predictions from complex probabilistic models. Much of this research aims to demonstrate that cognitive behavior can be explained from rational principles alone, without recourse to psychological or neurological processes and representations. We note commonalities between this rational approach and other movements in psychology - namely, Behaviorism and evolutionary psychology - that set aside mechanistic explanations or make use of optimality assumptions. Through these comparisons, we identify a number of challenges that limit the rational program's potential contribution to psychological theory. Specifically, rational Bayesian models are significantly unconstrained, both because they are uninformed by a wide range of process-level data and because their assumptions about the environment are generally not grounded in empirical measurement. The psychological implications of most Bayesian models are also unclear. Bayesian inference itself is conceptually trivial, but strong assumptions are often embedded in the hypothesis sets and the approximation algorithms used to derive model predictions, without a clear delineation between psychological commitments and implementational details. Comparing multiple Bayesian models of the same task is rare, as is the realization that many Bayesian models recapitulate existing (mechanistic level) theories. Despite the expressive power of current Bayesian models, we argue they must be developed in conjunction with mechanistic considerations to offer substantive explanations of cognition. We lay out several means for such an integration, which take into account the representations on which Bayesian inference operates, as well as the algorithms and heuristics that carry it out. We argue this unification will better facilitate lasting contributions to psychological theory, avoiding the pitfalls that have plagued previous theoretical movements.
Bayesian Model Selection under Time Constraints
NASA Astrophysics Data System (ADS)
Hoege, M.; Nowak, W.; Illman, W. A.
2017-12-01
Bayesian model selection (BMS) provides a consistent framework for rating and comparing models in multi-model inference. In cases where models of vastly different complexity compete with each other, we also face vastly different computational runtimes of such models. For instance, time series of a quantity of interest can be simulated by an autoregressive process model that takes even less than a second for one run, or by a partial differential equations-based model with runtimes up to several hours or even days. The classical BMS is based on a quantity called Bayesian model evidence (BME). It determines the model weights in the selection process and resembles a trade-off between bias of a model and its complexity. However, in practice, the runtime of models is another weight relevant factor for model selection. Hence, we believe that it should be included, leading to an overall trade-off problem between bias, variance and computing effort. We approach this triple trade-off from the viewpoint of our ability to generate realizations of the models under a given computational budget. One way to obtain BME values is through sampling-based integration techniques. We argue with the fact that more expensive models can be sampled much less under time constraints than faster models (in straight proportion to their runtime). The computed evidence in favor of a more expensive model is statistically less significant than the evidence computed in favor of a faster model, since sampling-based strategies are always subject to statistical sampling error. We present a straightforward way to include this misbalance into the model weights that are the basis for model selection. Our approach follows directly from the idea of insufficient significance. It is based on a computationally cheap bootstrapping error estimate of model evidence and is easy to implement. The approach is illustrated in a small synthetic modeling study.
ERIC Educational Resources Information Center
Griffiths, Thomas L.; Chater, Nick; Norris, Dennis; Pouget, Alexandre
2012-01-01
Bowers and Davis (2012) criticize Bayesian modelers for telling "just so" stories about cognition and neuroscience. Their criticisms are weakened by not giving an accurate characterization of the motivation behind Bayesian modeling or the ways in which Bayesian models are used and by not evaluating this theoretical framework against specific…
Motion transparency: making models of motion perception transparent.
Snowden; Verstraten
1999-10-01
In daily life our visual system is bombarded with motion information. We see cars driving by, flocks of birds flying in the sky, clouds passing behind trees that are dancing in the wind. Vision science has a good understanding of the first stage of visual motion processing, that is, the mechanism underlying the detection of local motions. Currently, research is focused on the processes that occur beyond the first stage. At this level, local motions have to be integrated to form objects, define the boundaries between them, construct surfaces and so on. An interesting, if complicated case is known as motion transparency: the situation in which two overlapping surfaces move transparently over each other. In that case two motions have to be assigned to the same retinal location. Several researchers have tried to solve this problem from a computational point of view, using physiological and psychophysical results as a guideline. We will discuss two models: one uses the traditional idea known as 'filter selection' and the other a relatively new approach based on Bayesian inference. Predictions from these models are compared with our own visual behaviour and that of the neural substrates that are presumed to underlie these perceptions.
Model Selection in Historical Research Using Approximate Bayesian Computation
Rubio-Campillo, Xavier
2016-01-01
Formal Models and History Computational models are increasingly being used to study historical dynamics. This new trend, which could be named Model-Based History, makes use of recently published datasets and innovative quantitative methods to improve our understanding of past societies based on their written sources. The extensive use of formal models allows historians to re-evaluate hypotheses formulated decades ago and still subject to debate due to the lack of an adequate quantitative framework. The initiative has the potential to transform the discipline if it solves the challenges posed by the study of historical dynamics. These difficulties are based on the complexities of modelling social interaction, and the methodological issues raised by the evaluation of formal models against data with low sample size, high variance and strong fragmentation. Case Study This work examines an alternate approach to this evaluation based on a Bayesian-inspired model selection method. The validity of the classical Lanchester’s laws of combat is examined against a dataset comprising over a thousand battles spanning 300 years. Four variations of the basic equations are discussed, including the three most common formulations (linear, squared, and logarithmic) and a new variant introducing fatigue. Approximate Bayesian Computation is then used to infer both parameter values and model selection via Bayes Factors. Impact Results indicate decisive evidence favouring the new fatigue model. The interpretation of both parameter estimations and model selection provides new insights into the factors guiding the evolution of warfare. At a methodological level, the case study shows how model selection methods can be used to guide historical research through the comparison between existing hypotheses and empirical evidence. PMID:26730953
NASA Technical Reports Server (NTRS)
He, Yuning
2015-01-01
The behavior of complex aerospace systems is governed by numerous parameters. For safety analysis it is important to understand how the system behaves with respect to these parameter values. In particular, understanding the boundaries between safe and unsafe regions is of major importance. In this paper, we describe a hierarchical Bayesian statistical modeling approach for the online detection and characterization of such boundaries. Our method for classification with active learning uses a particle filter-based model and a boundary-aware metric for best performance. From a library of candidate shapes incorporated with domain expert knowledge, the location and parameters of the boundaries are estimated using advanced Bayesian modeling techniques. The results of our boundary analysis are then provided in a form understandable by the domain expert. We illustrate our approach using a simulation model of a NASA neuro-adaptive flight control system, as well as a system for the detection of separation violations in the terminal airspace.
An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems
Dawson, Kevin J.; Belkhir, Khalid
2009-01-01
Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals, - the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we can not visualise this probability distribution in its entirety, unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input, and yields as output a rooted binary tree, - or more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306
An Anticipatory Model of Cavitation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allgood, G.O.; Dress, W.B., Jr.; Hylton, J.O.
1999-04-05
The Anticipatory System (AS) formalism developed by Robert Rosen provides some insight into the problem of embedding intelligent behavior in machines. AS emulates the anticipatory behavior of biological systems. AS bases its behavior on its expectations about the near future and those expectations are modified as the system gains experience. The expectation is based on an internal model that is drawn from an appeal to physical reality. To be adaptive, the model must be able to update itself. To be practical, the model must run faster than real-time. The need for a physical model and the requirement that the modelmore » execute at extreme speeds, has held back the application of AS to practical problems. Two recent advances make it possible to consider the use of AS for practical intelligent sensors. First, advances in transducer technology make it possible to obtain previously unavailable data from which a model can be derived. For example, acoustic emissions (AE) can be fed into a Bayesian system identifier that enables the separation of a weak characterizing signal, such as the signature of pump cavitation precursors, from a strong masking signal, such as a pump vibration feature. The second advance is the development of extremely fast, but inexpensive, digital signal processing hardware on which it is possible to run an adaptive Bayesian-derived model faster than real-time. This paper reports the investigation of an AS using a model of cavitation based on hydrodynamic principles and Bayesian analysis of data from high-performance AE sensors.« less
How Recent History Affects Perception: The Normative Approach and Its Heuristic Approximation
Raviv, Ofri; Ahissar, Merav; Loewenstein, Yonatan
2012-01-01
There is accumulating evidence that prior knowledge about expectations plays an important role in perception. The Bayesian framework is the standard computational approach to explain how prior knowledge about the distribution of expected stimuli is incorporated with noisy observations in order to improve performance. However, it is unclear what information about the prior distribution is acquired by the perceptual system over short periods of time and how this information is utilized in the process of perceptual decision making. Here we address this question using a simple two-tone discrimination task. We find that the “contraction bias”, in which small magnitudes are overestimated and large magnitudes are underestimated, dominates the pattern of responses of human participants. This contraction bias is consistent with the Bayesian hypothesis in which the true prior information is available to the decision-maker. However, a trial-by-trial analysis of the pattern of responses reveals that the contribution of most recent trials to performance is overweighted compared with the predictions of a standard Bayesian model. Moreover, we study participants' performance in a-typical distributions of stimuli and demonstrate substantial deviations from the ideal Bayesian detector, suggesting that the brain utilizes a heuristic approximation of the Bayesian inference. We propose a biologically plausible model, in which decision in the two-tone discrimination task is based on a comparison between the second tone and an exponentially-decaying average of the first tone and past tones. We show that this model accounts for both the contraction bias and the deviations from the ideal Bayesian detector hypothesis. These findings demonstrate the power of Bayesian-like heuristics in the brain, as well as their limitations in their failure to fully adapt to novel environments. PMID:23133343
Bayesian Regression with Network Prior: Optimal Bayesian Filtering Perspective
Qian, Xiaoning; Dougherty, Edward R.
2017-01-01
The recently introduced intrinsically Bayesian robust filter (IBRF) provides fully optimal filtering relative to a prior distribution over an uncertainty class ofjoint random process models, whereas formerly the theory was limited to model-constrained Bayesian robust filters, for which optimization was limited to the filters that are optimal for models in the uncertainty class. This paper extends the IBRF theory to the situation where there are both a prior on the uncertainty class and sample data. The result is optimal Bayesian filtering (OBF), where optimality is relative to the posterior distribution derived from the prior and the data. The IBRF theories for effective characteristics and canonical expansions extend to the OBF setting. A salient focus of the present work is to demonstrate the advantages of Bayesian regression within the OBF setting over the classical Bayesian approach in the context otlinear Gaussian models. PMID:28824268
Bayesian prediction of placebo analgesia in an instrumental learning model
Jung, Won-Mo; Lee, Ye-Seul; Wallraven, Christian; Chae, Younbyoung
2017-01-01
Placebo analgesia can be primarily explained by the Pavlovian conditioning paradigm in which a passively applied cue becomes associated with less pain. In contrast, instrumental conditioning employs an active paradigm that might be more similar to clinical settings. In the present study, an instrumental conditioning paradigm involving a modified trust game in a simulated clinical situation was used to induce placebo analgesia. Additionally, Bayesian modeling was applied to predict the placebo responses of individuals based on their choices. Twenty-four participants engaged in a medical trust game in which decisions to receive treatment from either a doctor (more effective with high cost) or a pharmacy (less effective with low cost) were made after receiving a reference pain stimulus. In the conditioning session, the participants received lower levels of pain following both choices, while high pain stimuli were administered in the test session even after making the decision. The choice-dependent pain in the conditioning session was modulated in terms of both intensity and uncertainty. Participants reported significantly less pain when they chose the doctor or the pharmacy for treatment compared to the control trials. The predicted pain ratings based on Bayesian modeling showed significant correlations with the actual reports from participants for both of the choice categories. The instrumental conditioning paradigm allowed for the active choice of optional cues and was able to induce the placebo analgesia effect. Additionally, Bayesian modeling successfully predicted pain ratings in a simulated clinical situation that fits well with placebo analgesia induced by instrumental conditioning. PMID:28225816
A Fault Diagnosis Methodology for Gear Pump Based on EEMD and Bayesian Network
Liu, Zengkai; Liu, Yonghong; Shan, Hongkai; Cai, Baoping; Huang, Qing
2015-01-01
This paper proposes a fault diagnosis methodology for a gear pump based on the ensemble empirical mode decomposition (EEMD) method and the Bayesian network. Essentially, the presented scheme is a multi-source information fusion based methodology. Compared with the conventional fault diagnosis with only EEMD, the proposed method is able to take advantage of all useful information besides sensor signals. The presented diagnostic Bayesian network consists of a fault layer, a fault feature layer and a multi-source information layer. Vibration signals from sensor measurement are decomposed by the EEMD method and the energy of intrinsic mode functions (IMFs) are calculated as fault features. These features are added into the fault feature layer in the Bayesian network. The other sources of useful information are added to the information layer. The generalized three-layer Bayesian network can be developed by fully incorporating faults and fault symptoms as well as other useful information such as naked eye inspection and maintenance records. Therefore, diagnostic accuracy and capacity can be improved. The proposed methodology is applied to the fault diagnosis of a gear pump and the structure and parameters of the Bayesian network is established. Compared with artificial neural network and support vector machine classification algorithms, the proposed model has the best diagnostic performance when sensor data is used only. A case study has demonstrated that some information from human observation or system repair records is very helpful to the fault diagnosis. It is effective and efficient in diagnosing faults based on uncertain, incomplete information. PMID:25938760
A Fault Diagnosis Methodology for Gear Pump Based on EEMD and Bayesian Network.
Liu, Zengkai; Liu, Yonghong; Shan, Hongkai; Cai, Baoping; Huang, Qing
2015-01-01
This paper proposes a fault diagnosis methodology for a gear pump based on the ensemble empirical mode decomposition (EEMD) method and the Bayesian network. Essentially, the presented scheme is a multi-source information fusion based methodology. Compared with the conventional fault diagnosis with only EEMD, the proposed method is able to take advantage of all useful information besides sensor signals. The presented diagnostic Bayesian network consists of a fault layer, a fault feature layer and a multi-source information layer. Vibration signals from sensor measurement are decomposed by the EEMD method and the energy of intrinsic mode functions (IMFs) are calculated as fault features. These features are added into the fault feature layer in the Bayesian network. The other sources of useful information are added to the information layer. The generalized three-layer Bayesian network can be developed by fully incorporating faults and fault symptoms as well as other useful information such as naked eye inspection and maintenance records. Therefore, diagnostic accuracy and capacity can be improved. The proposed methodology is applied to the fault diagnosis of a gear pump and the structure and parameters of the Bayesian network is established. Compared with artificial neural network and support vector machine classification algorithms, the proposed model has the best diagnostic performance when sensor data is used only. A case study has demonstrated that some information from human observation or system repair records is very helpful to the fault diagnosis. It is effective and efficient in diagnosing faults based on uncertain, incomplete information.
NASA Astrophysics Data System (ADS)
Wheeler, David C.; Waller, Lance A.
2009-03-01
In this paper, we compare and contrast a Bayesian spatially varying coefficient process (SVCP) model with a geographically weighted regression (GWR) model for the estimation of the potentially spatially varying regression effects of alcohol outlets and illegal drug activity on violent crime in Houston, Texas. In addition, we focus on the inherent coefficient shrinkage properties of the Bayesian SVCP model as a way to address increased coefficient variance that follows from collinearity in GWR models. We outline the advantages of the Bayesian model in terms of reducing inflated coefficient variance, enhanced model flexibility, and more formal measuring of model uncertainty for prediction. We find spatially varying effects for alcohol outlets and drug violations, but the amount of variation depends on the type of model used. For the Bayesian model, this variation is controllable through the amount of prior influence placed on the variance of the coefficients. For example, the spatial pattern of coefficients is similar for the GWR and Bayesian models when a relatively large prior variance is used in the Bayesian model.
Philosophy and the practice of Bayesian statistics
Gelman, Andrew; Shalizi, Cosma Rohilla
2015-01-01
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. PMID:22364575
Philosophy and the practice of Bayesian statistics.
Gelman, Andrew; Shalizi, Cosma Rohilla
2013-02-01
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. © 2012 The British Psychological Society.
Scheel, Ida; Ferkingstad, Egil; Frigessi, Arnoldo; Haug, Ola; Hinnerichsen, Mikkel; Meze-Hausken, Elisabeth
2013-01-01
Climate change will affect the insurance industry. We develop a Bayesian hierarchical statistical approach to explain and predict insurance losses due to weather events at a local geographic scale. The number of weather-related insurance claims is modelled by combining generalized linear models with spatially smoothed variable selection. Using Gibbs sampling and reversible jump Markov chain Monte Carlo methods, this model is fitted on daily weather and insurance data from each of the 319 municipalities which constitute southern and central Norway for the period 1997–2006. Precise out-of-sample predictions validate the model. Our results show interesting regional patterns in the effect of different weather covariates. In addition to being useful for insurance pricing, our model can be used for short-term predictions based on weather forecasts and for long-term predictions based on downscaled climate models. PMID:23396890
Paleoclimate reconstruction through Bayesian data assimilation
NASA Astrophysics Data System (ADS)
Fer, I.; Raiho, A.; Rollinson, C.; Dietze, M.
2017-12-01
Methods of paleoclimate reconstruction from plant-based proxy data rely on assumptions of static vegetation-climate link which is often established between modern climate and vegetation. This approach might result in biased climate constructions as it does not account for vegetation dynamics. Predictive tools such as process-based dynamic vegetation models (DVM) and their Bayesian inversion could be used to construct the link between plant-based proxy data and palaeoclimate more realistically. In other words, given the proxy data, it is possible to infer the climate that could result in that particular vegetation composition, by comparing the DVM outputs to the proxy data within a Bayesian state data assimilation framework. In this study, using fossil pollen data from five sites across the northern hardwood region of the US, we assimilate fractional composition and aboveground biomass into dynamic vegetation models, LINKAGES, LPJ-GUESS and ED2. To do this, starting from 4 Global Climate Model outputs, we generate an ensemble of downscaled meteorological drivers for the period 850-2015. Then, as a first pass, we weigh these ensembles based on their fidelity with independent paleoclimate proxies. Next, we run the models with this ensemble of drivers, and comparing the ensemble model output to the vegetation data, adjust the model state estimates towards the data. At each iteration, we also reweight the climate values that make the model and data consistent, producing a reconstructed climate time-series dataset. We validated the method using present-day datasets, as well as a synthetic dataset, and then assessed the consistency of results across ecosystem models. Our method allows the combination of multiple data types to reconstruct the paleoclimate, with associated uncertainty estimates, based on ecophysiological and ecological processes rather than phenomenological correlations with proxy data.
An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit
Wong, Rowena Syn Yin; Ismail, Noor Azina
2016-01-01
Background and Objectives There are not many studies that attempt to model intensive care unit (ICU) risk of death in developing countries, especially in South East Asia. The aim of this study was to propose and describe application of a Bayesian approach in modeling in-ICU deaths in a Malaysian ICU. Methods This was a prospective study in a mixed medical-surgery ICU in a multidisciplinary tertiary referral hospital in Malaysia. Data collection included variables that were defined in Acute Physiology and Chronic Health Evaluation IV (APACHE IV) model. Bayesian Markov Chain Monte Carlo (MCMC) simulation approach was applied in the development of four multivariate logistic regression predictive models for the ICU, where the main outcome measure was in-ICU mortality risk. The performance of the models were assessed through overall model fit, discrimination and calibration measures. Results from the Bayesian models were also compared against results obtained using frequentist maximum likelihood method. Results The study involved 1,286 consecutive ICU admissions between January 1, 2009 and June 30, 2010, of which 1,111 met the inclusion criteria. Patients who were admitted to the ICU were generally younger, predominantly male, with low co-morbidity load and mostly under mechanical ventilation. The overall in-ICU mortality rate was 18.5% and the overall mean Acute Physiology Score (APS) was 68.5. All four models exhibited good discrimination, with area under receiver operating characteristic curve (AUC) values approximately 0.8. Calibration was acceptable (Hosmer-Lemeshow p-values > 0.05) for all models, except for model M3. Model M1 was identified as the model with the best overall performance in this study. Conclusion Four prediction models were proposed, where the best model was chosen based on its overall performance in this study. This study has also demonstrated the promising potential of the Bayesian MCMC approach as an alternative in the analysis and modeling of in-ICU mortality outcomes. PMID:27007413
An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit.
Wong, Rowena Syn Yin; Ismail, Noor Azina
2016-01-01
There are not many studies that attempt to model intensive care unit (ICU) risk of death in developing countries, especially in South East Asia. The aim of this study was to propose and describe application of a Bayesian approach in modeling in-ICU deaths in a Malaysian ICU. This was a prospective study in a mixed medical-surgery ICU in a multidisciplinary tertiary referral hospital in Malaysia. Data collection included variables that were defined in Acute Physiology and Chronic Health Evaluation IV (APACHE IV) model. Bayesian Markov Chain Monte Carlo (MCMC) simulation approach was applied in the development of four multivariate logistic regression predictive models for the ICU, where the main outcome measure was in-ICU mortality risk. The performance of the models were assessed through overall model fit, discrimination and calibration measures. Results from the Bayesian models were also compared against results obtained using frequentist maximum likelihood method. The study involved 1,286 consecutive ICU admissions between January 1, 2009 and June 30, 2010, of which 1,111 met the inclusion criteria. Patients who were admitted to the ICU were generally younger, predominantly male, with low co-morbidity load and mostly under mechanical ventilation. The overall in-ICU mortality rate was 18.5% and the overall mean Acute Physiology Score (APS) was 68.5. All four models exhibited good discrimination, with area under receiver operating characteristic curve (AUC) values approximately 0.8. Calibration was acceptable (Hosmer-Lemeshow p-values > 0.05) for all models, except for model M3. Model M1 was identified as the model with the best overall performance in this study. Four prediction models were proposed, where the best model was chosen based on its overall performance in this study. This study has also demonstrated the promising potential of the Bayesian MCMC approach as an alternative in the analysis and modeling of in-ICU mortality outcomes.
Heuristic Bayesian segmentation for discovery of coexpressed genes within genomic regions.
Pehkonen, Petri; Wong, Garry; Törönen, Petri
2010-01-01
Segmentation aims to separate homogeneous areas from the sequential data, and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation in locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requirement for user-given parameters. In order to perform the segmentation within a reasonable time, we use heuristics. Most of the heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished by using asymptotic model selection methods like the Bayesian information criterion. Such methods are based on some simplification, which can limit their usage. In this paper, we propose a Bayesian model selection to choose the most proper result from heuristic segmentation. Our Bayesian model presents a simple prior for the segmentation solutions with various segment numbers and a modified Dirichlet prior for modeling multinomial data. We show with various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. The application of our method in yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.
BayeSED: A General Approach to Fitting the Spectral Energy Distribution of Galaxies
NASA Astrophysics Data System (ADS)
Han, Yunkun; Han, Zhanwen
2014-11-01
We present a newly developed version of BayeSED, a general Bayesian approach to the spectral energy distribution (SED) fitting of galaxies. The new BayeSED code has been systematically tested on a mock sample of galaxies. The comparison between the estimated and input values of the parameters shows that BayeSED can recover the physical parameters of galaxies reasonably well. We then applied BayeSED to interpret the SEDs of a large Ks -selected sample of galaxies in the COSMOS/UltraVISTA field with stellar population synthesis models. Using the new BayeSED code, a Bayesian model comparison of stellar population synthesis models has been performed for the first time. We found that the 2003 model by Bruzual & Charlot, statistically speaking, has greater Bayesian evidence than the 2005 model by Maraston for the Ks -selected sample. In addition, while setting the stellar metallicity as a free parameter obviously increases the Bayesian evidence of both models, varying the initial mass function has a notable effect only on the Maraston model. Meanwhile, the physical parameters estimated with BayeSED are found to be generally consistent with those obtained using the popular grid-based FAST code, while the former parameters exhibit more natural distributions. Based on the estimated physical parameters of the galaxies in the sample, we qualitatively classified the galaxies in the sample into five populations that may represent galaxies at different evolution stages or in different environments. We conclude that BayeSED could be a reliable and powerful tool for investigating the formation and evolution of galaxies from the rich multi-wavelength observations currently available. A binary version of the BayeSED code parallelized with Message Passing Interface is publicly available at https://bitbucket.org/hanyk/bayesed.
BayeSED: A GENERAL APPROACH TO FITTING THE SPECTRAL ENERGY DISTRIBUTION OF GALAXIES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Han, Yunkun; Han, Zhanwen, E-mail: hanyk@ynao.ac.cn, E-mail: zhanwenhan@ynao.ac.cn
2014-11-01
We present a newly developed version of BayeSED, a general Bayesian approach to the spectral energy distribution (SED) fitting of galaxies. The new BayeSED code has been systematically tested on a mock sample of galaxies. The comparison between the estimated and input values of the parameters shows that BayeSED can recover the physical parameters of galaxies reasonably well. We then applied BayeSED to interpret the SEDs of a large K{sub s} -selected sample of galaxies in the COSMOS/UltraVISTA field with stellar population synthesis models. Using the new BayeSED code, a Bayesian model comparison of stellar population synthesis models has beenmore » performed for the first time. We found that the 2003 model by Bruzual and Charlot, statistically speaking, has greater Bayesian evidence than the 2005 model by Maraston for the K{sub s} -selected sample. In addition, while setting the stellar metallicity as a free parameter obviously increases the Bayesian evidence of both models, varying the initial mass function has a notable effect only on the Maraston model. Meanwhile, the physical parameters estimated with BayeSED are found to be generally consistent with those obtained using the popular grid-based FAST code, while the former parameters exhibit more natural distributions. Based on the estimated physical parameters of the galaxies in the sample, we qualitatively classified the galaxies in the sample into five populations that may represent galaxies at different evolution stages or in different environments. We conclude that BayeSED could be a reliable and powerful tool for investigating the formation and evolution of galaxies from the rich multi-wavelength observations currently available. A binary version of the BayeSED code parallelized with Message Passing Interface is publicly available at https://bitbucket.org/hanyk/bayesed.« less
Specifying and Refining a Measurement Model for a Computer-Based Interactive Assessment
ERIC Educational Resources Information Center
Levy, Roy; Mislevy, Robert J.
2004-01-01
The challenges of modeling students' performance in computer-based interactive assessments include accounting for multiple aspects of knowledge and skill that arise in different situations and the conditional dependencies among multiple aspects of performance. This article describes a Bayesian approach to modeling and estimating cognitive models…
A Rational Analysis of Rule-Based Concept Learning
ERIC Educational Resources Information Center
Goodman, Noah D.; Tenenbaum, Joshua B.; Feldman, Jacob; Griffiths, Thomas L.
2008-01-01
This article proposes a new model of human concept learning that provides a rational analysis of learning feature-based concepts. This model is built upon Bayesian inference for a grammatically structured hypothesis space--a concept language of logical rules. This article compares the model predictions to human generalization judgments in several…
Bayesian network learning for natural hazard assessments
NASA Astrophysics Data System (ADS)
Vogel, Kristin
2016-04-01
Even though quite different in occurrence and consequences, from a modelling perspective many natural hazards share similar properties and challenges. Their complex nature as well as lacking knowledge about their driving forces and potential effects make their analysis demanding. On top of the uncertainty about the modelling framework, inaccurate or incomplete event observations and the intrinsic randomness of the natural phenomenon add up to different interacting layers of uncertainty, which require a careful handling. Thus, for reliable natural hazard assessments it is crucial not only to capture and quantify involved uncertainties, but also to express and communicate uncertainties in an intuitive way. Decision-makers, who often find it difficult to deal with uncertainties, might otherwise return to familiar (mostly deterministic) proceedings. In the scope of the DFG research training group „NatRiskChange" we apply the probabilistic framework of Bayesian networks for diverse natural hazard and vulnerability studies. The great potential of Bayesian networks was already shown in previous natural hazard assessments. Treating each model component as random variable, Bayesian networks aim at capturing the joint distribution of all considered variables. Hence, each conditional distribution of interest (e.g. the effect of precautionary measures on damage reduction) can be inferred. The (in-)dependencies between the considered variables can be learned purely data driven or be given by experts. Even a combination of both is possible. By translating the (in-)dependences into a graph structure, Bayesian networks provide direct insights into the workings of the system and allow to learn about the underlying processes. Besides numerous studies on the topic, learning Bayesian networks from real-world data remains challenging. In previous studies, e.g. on earthquake induced ground motion and flood damage assessments, we tackled the problems arising with continuous variables and incomplete observations. Further studies rise the challenge of relying on very small data sets. Since parameter estimates for complex models based on few observations are unreliable, it is necessary to focus on simplified, yet still meaningful models. A so called Markov Blanket approach is developed to identify the most relevant model components and to construct a simple Bayesian network based on those findings. Since the proceeding is completely data driven, it can easily be transferred to various applications in natural hazard domains. This study is funded by the Deutsche Forschungsgemeinschaft (DFG) within the research training programme GRK 2043/1 "NatRiskChange - Natural hazards and risks in a changing world" at Potsdam University.
A Bayesian network model for predicting aquatic toxicity mode ...
The mode of toxic action (MoA) has been recognized as a key determinant of chemical toxicity but MoA classification in aquatic toxicology has been limited. We developed a Bayesian network model to classify aquatic toxicity mode of action using a recently published dataset containing over one thousand chemicals with MoA assignments for aquatic animal toxicity. Two dimensional theoretical chemical descriptors were generated for each chemical using the Toxicity Estimation Software Tool. The model was developed through augmented Markov blanket discovery from the data set with the MoA broad classifications as a target node. From cross validation, the overall precision for the model was 80.2% with a R2 of 0.959. The best precision was for the AChEI MoA (93.5%) where 257 chemicals out of 275 were correctly classified. Model precision was poorest for the reactivity MoA (48.5%) where 48 out of 99 reactive chemicals were correctly classified. Narcosis represented the largest class within the MoA dataset and had a precision and reliability of 80.0%, reflecting the global precision across all of the MoAs. False negatives for narcosis most often fell into electron transport inhibition, neurotoxicity or reactivity MoAs. False negatives for all other MoAs were most often narcosis. A probabilistic sensitivity analysis was undertaken for each MoA to examine the sensitivity to individual and multiple descriptor findings. The results show that the Markov blanket of a structurally
Remaining useful life assessment of lithium-ion batteries in implantable medical devices
NASA Astrophysics Data System (ADS)
Hu, Chao; Ye, Hui; Jain, Gaurav; Schmidt, Craig
2018-01-01
This paper presents a prognostic study on lithium-ion batteries in implantable medical devices, in which a hybrid data-driven/model-based method is employed for remaining useful life assessment. The method is developed on and evaluated against data from two sets of lithium-ion prismatic cells used in implantable applications exhibiting distinct fade performance: 1) eight cells from Medtronic, PLC whose rates of capacity fade appear to be stable and gradually decrease over a 10-year test duration; and 2) eight cells from Manufacturer X whose rates appear to be greater and show sharp increase after some period over a 1.8-year test duration. The hybrid method enables online prediction of remaining useful life for predictive maintenance/control. It consists of two modules: 1) a sparse Bayesian learning module (data-driven) for inferring capacity from charge-related features; and 2) a recursive Bayesian filtering module (model-based) for updating empirical capacity fade models and predicting remaining useful life. A generic particle filter is adopted to implement recursive Bayesian filtering for the cells from the first set, whose capacity fade behavior can be represented by a single fade model; a multiple model particle filter with fixed-lag smoothing is proposed for the cells from the second data set, whose capacity fade behavior switches between multiple fade models.
High-throughput Bayesian Network Learning using Heterogeneous Multicore Computers
Linderman, Michael D.; Athalye, Vivek; Meng, Teresa H.; Asadi, Narges Bani; Bruggner, Robert; Nolan, Garry P.
2017-01-01
Aberrant intracellular signaling plays an important role in many diseases. The causal structure of signal transduction networks can be modeled as Bayesian Networks (BNs), and computationally learned from experimental data. However, learning the structure of Bayesian Networks (BNs) is an NP-hard problem that, even with fast heuristics, is too time consuming for large, clinically important networks (20–50 nodes). In this paper, we present a novel graphics processing unit (GPU)-accelerated implementation of a Monte Carlo Markov Chain-based algorithm for learning BNs that is up to 7.5-fold faster than current general-purpose processor (GPP)-based implementations. The GPU-based implementation is just one of several implementations within the larger application, each optimized for a different input or machine configuration. We describe the methodology we use to build an extensible application, assembled from these variants, that can target a broad range of heterogeneous systems, e.g., GPUs, multicore GPPs. Specifically we show how we use the Merge programming model to efficiently integrate, test and intelligently select among the different potential implementations. PMID:28819655
Ducrot, Virginie; Billoir, Elise; Péry, Alexandre R R; Garric, Jeanne; Charles, Sandrine
2010-05-01
Effects of zinc were studied in the freshwater worm Branchiura sowerbyi using partial and full life-cycle tests. Only newborn and juveniles were sensitive to zinc, displaying effects on survival, growth, and age at first brood at environmentally relevant concentrations. Threshold effect models were proposed to assess toxic effects on individuals. They were fitted to life-cycle test data using Bayesian inference and adequately described life-history trait data in exposed organisms. The daily asymptotic growth rate of theoretical populations was then simulated with a matrix population model, based upon individual-level outputs. Population-level outputs were in accordance with existing literature for controls. Working in a Bayesian framework allowed incorporating parameter uncertainty in the simulation of the population-level response to zinc exposure, thus increasing the relevance of test results in the context of ecological risk assessment.
Bayesian Computation for Log-Gaussian Cox Processes: A Comparative Analysis of Methods
Teng, Ming; Nathoo, Farouk S.; Johnson, Timothy D.
2017-01-01
The Log-Gaussian Cox Process is a commonly used model for the analysis of spatial point pattern data. Fitting this model is difficult because of its doubly-stochastic property, i.e., it is an hierarchical combination of a Poisson process at the first level and a Gaussian Process at the second level. Various methods have been proposed to estimate such a process, including traditional likelihood-based approaches as well as Bayesian methods. We focus here on Bayesian methods and several approaches that have been considered for model fitting within this framework, including Hamiltonian Monte Carlo, the Integrated nested Laplace approximation, and Variational Bayes. We consider these approaches and make comparisons with respect to statistical and computational efficiency. These comparisons are made through several simulation studies as well as through two applications, the first examining ecological data and the second involving neuroimaging data. PMID:29200537
A Bayesian Account of Vocal Adaptation to Pitch-Shifted Auditory Feedback
Hahnloser, Richard H. R.
2017-01-01
Motor systems are highly adaptive. Both birds and humans compensate for synthetically induced shifts in the pitch (fundamental frequency) of auditory feedback stemming from their vocalizations. Pitch-shift compensation is partial in the sense that large shifts lead to smaller relative compensatory adjustments of vocal pitch than small shifts. Also, compensation is larger in subjects with high motor variability. To formulate a mechanistic description of these findings, we adapt a Bayesian model of error relevance. We assume that vocal-auditory feedback loops in the brain cope optimally with known sensory and motor variability. Based on measurements of motor variability, optimal compensatory responses in our model provide accurate fits to published experimental data. Optimal compensation correctly predicts sensory acuity, which has been estimated in psychophysical experiments as just-noticeable pitch differences. Our model extends the utility of Bayesian approaches to adaptive vocal behaviors. PMID:28135267
Organism-level models: When mechanisms and statistics fail us
NASA Astrophysics Data System (ADS)
Phillips, M. H.; Meyer, J.; Smith, W. P.; Rockhill, J. K.
2014-03-01
Purpose: To describe the unique characteristics of models that represent the entire course of radiation therapy at the organism level and to highlight the uses to which such models can be put. Methods: At the level of an organism, traditional model-building runs into severe difficulties. We do not have sufficient knowledge to devise a complete biochemistry-based model. Statistical model-building fails due to the vast number of variables and the inability to control many of them in any meaningful way. Finally, building surrogate models, such as animal-based models, can result in excluding some of the most critical variables. Bayesian probabilistic models (Bayesian networks) provide a useful alternative that have the advantages of being mathematically rigorous, incorporating the knowledge that we do have, and being practical. Results: Bayesian networks representing radiation therapy pathways for prostate cancer and head & neck cancer were used to highlight the important aspects of such models and some techniques of model-building. A more specific model representing the treatment of occult lymph nodes in head & neck cancer were provided as an example of how such a model can inform clinical decisions. A model of the possible role of PET imaging in brain cancer was used to illustrate the means by which clinical trials can be modelled in order to come up with a trial design that will have meaningful outcomes. Conclusions: Probabilistic models are currently the most useful approach to representing the entire therapy outcome process.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karagiannis, Georgios, E-mail: georgios.karagiannis@pnnl.gov; Lin, Guang, E-mail: guang.lin@pnnl.gov
2014-02-15
Generalized polynomial chaos (gPC) expansions allow us to represent the solution of a stochastic system using a series of polynomial chaos basis functions. The number of gPC terms increases dramatically as the dimension of the random input variables increases. When the number of the gPC terms is larger than that of the available samples, a scenario that often occurs when the corresponding deterministic solver is computationally expensive, evaluation of the gPC expansion can be inaccurate due to over-fitting. We propose a fully Bayesian approach that allows for global recovery of the stochastic solutions, in both spatial and random domains, bymore » coupling Bayesian model uncertainty and regularization regression methods. It allows the evaluation of the PC coefficients on a grid of spatial points, via (1) the Bayesian model average (BMA) or (2) the median probability model, and their construction as spatial functions on the spatial domain via spline interpolation. The former accounts for the model uncertainty and provides Bayes-optimal predictions; while the latter provides a sparse representation of the stochastic solutions by evaluating the expansion on a subset of dominating gPC bases. Moreover, the proposed methods quantify the importance of the gPC bases in the probabilistic sense through inclusion probabilities. We design a Markov chain Monte Carlo (MCMC) sampler that evaluates all the unknown quantities without the need of ad-hoc techniques. The proposed methods are suitable for, but not restricted to, problems whose stochastic solutions are sparse in the stochastic space with respect to the gPC bases while the deterministic solver involved is expensive. We demonstrate the accuracy and performance of the proposed methods and make comparisons with other approaches on solving elliptic SPDEs with 1-, 14- and 40-random dimensions.« less
Leveraging Multiactions to Improve Medical Personalized Ranking for Collaborative Filtering.
Gao, Shan; Guo, Guibing; Li, Runzhi; Wang, Zongmin
2017-01-01
Nowadays, providing high-quality recommendation services to users is an essential component in web applications, including shopping, making friends, and healthcare. This can be regarded either as a problem of estimating users' preference by exploiting explicit feedbacks (numerical ratings), or as a problem of collaborative ranking with implicit feedback (e.g., purchases, views, and clicks). Previous works for solving this issue include pointwise regression methods and pairwise ranking methods. The emerging healthcare websites and online medical databases impose a new challenge for medical service recommendation. In this paper, we develop a model, MBPR (Medical Bayesian Personalized Ranking over multiple users' actions), based on the simple observation that users tend to assign higher ranks to some kind of healthcare services that are meanwhile preferred in users' other actions. Experimental results on the real-world datasets demonstrate that MBPR achieves more accurate recommendations than several state-of-the-art methods and shows its generality and scalability via experiments on the datasets from one mobile shopping app.
Leveraging Multiactions to Improve Medical Personalized Ranking for Collaborative Filtering
2017-01-01
Nowadays, providing high-quality recommendation services to users is an essential component in web applications, including shopping, making friends, and healthcare. This can be regarded either as a problem of estimating users' preference by exploiting explicit feedbacks (numerical ratings), or as a problem of collaborative ranking with implicit feedback (e.g., purchases, views, and clicks). Previous works for solving this issue include pointwise regression methods and pairwise ranking methods. The emerging healthcare websites and online medical databases impose a new challenge for medical service recommendation. In this paper, we develop a model, MBPR (Medical Bayesian Personalized Ranking over multiple users' actions), based on the simple observation that users tend to assign higher ranks to some kind of healthcare services that are meanwhile preferred in users' other actions. Experimental results on the real-world datasets demonstrate that MBPR achieves more accurate recommendations than several state-of-the-art methods and shows its generality and scalability via experiments on the datasets from one mobile shopping app. PMID:29118963
Bayesian methods for outliers detection in GNSS time series
NASA Astrophysics Data System (ADS)
Qianqian, Zhang; Qingming, Gui
2013-07-01
This article is concerned with the problem of detecting outliers in GNSS time series based on Bayesian statistical theory. Firstly, a new model is proposed to simultaneously detect different types of outliers based on the conception of introducing different types of classification variables corresponding to the different types of outliers; the problem of outlier detection is converted into the computation of the corresponding posterior probabilities, and the algorithm for computing the posterior probabilities based on standard Gibbs sampler is designed. Secondly, we analyze the reasons of masking and swamping about detecting patches of additive outliers intensively; an unmasking Bayesian method for detecting additive outlier patches is proposed based on an adaptive Gibbs sampler. Thirdly, the correctness of the theories and methods proposed above is illustrated by simulated data and then by analyzing real GNSS observations, such as cycle slips detection in carrier phase data. Examples illustrate that the Bayesian methods for outliers detection in GNSS time series proposed by this paper are not only capable of detecting isolated outliers but also capable of detecting additive outlier patches. Furthermore, it can be successfully used to process cycle slips in phase data, which solves the problem of small cycle slips.
NASA Astrophysics Data System (ADS)
Yu, Jianbo
2015-12-01
Prognostics is much efficient to achieve zero-downtime performance, maximum productivity and proactive maintenance of machines. Prognostics intends to assess and predict the time evolution of machine health degradation so that machine failures can be predicted and prevented. A novel prognostics system is developed based on the data-model-fusion scheme using the Bayesian inference-based self-organizing map (SOM) and an integration of logistic regression (LR) and high-order particle filtering (HOPF). In this prognostics system, a baseline SOM is constructed to model the data distribution space of healthy machine under an assumption that predictable fault patterns are not available. Bayesian inference-based probability (BIP) derived from the baseline SOM is developed as a quantification indication of machine health degradation. BIP is capable of offering failure probability for the monitored machine, which has intuitionist explanation related to health degradation state. Based on those historic BIPs, the constructed LR and its modeling noise constitute a high-order Markov process (HOMP) to describe machine health propagation. HOPF is used to solve the HOMP estimation to predict the evolution of the machine health in the form of a probability density function (PDF). An on-line model update scheme is developed to adapt the Markov process changes to machine health dynamics quickly. The experimental results on a bearing test-bed illustrate the potential applications of the proposed system as an effective and simple tool for machine health prognostics.
A Monte Carlo approach to the inverse problem of diffuse pollution risk in agricultural catchments
NASA Astrophysics Data System (ADS)
Milledge, D.; Lane, S. N.; Heathwaite, A. L.; Reaney, S.
2012-04-01
The hydrological and biogeochemical processes that operate in catchments influence the ecological quality of freshwater systems through delivery of fine sediment, nutrients and organic matter. As an alternative to the, often complex, reductionist models we outline a - data-driven - approach based on 'inverse modelling'. We invert SCIMAP, a parsimonious risk based model that has an explicit treatment of hydrological connectivity, and use a Bayesian approach to determine the risk that must be assigned to different land uses in a catchment in order to explain the spatial patterns of measured in-stream nutrient concentrations. First, we apply the model to a set of eleven UK catchments to show that: 1) some land use generates a consistently high or low risk of diffuse nitrate (N) and Phosphate (P) pollution; but 2) the risks associated with different land uses vary both between catchments and between P and N delivery; and 3) that the dominant sources of P and N risk in the catchment are often a function of the spatial configuration of land uses. These results suggest that on a case by case basis, inverse modelling may be used to help prioritise the focus of interventions to reduce diffuse pollution risk for freshwater ecosystems. However, a key uncertainty in this approach is the extent to which it can recover the 'true' risks associated with a land cover given error in both the input parameters and equifinality in model outcomes. We test this using a set of synthetic scenarios in which the true risks can be pre-assigned then compared with those recovered from the inverse model. We use these scenarios to identify the number of simulations and observations required to optimize recovery of the true weights, then explore the conditions under which the inverse model becomes equifinal (hampering recovery of the true weights) We find that this is strongly dependent on the covariance in land covers between subcatchments, introducing the possibility that instream sampling could be designed or subsampled to maximize identifiability of the risks associated with a given land cover.
Hot news recommendation system from heterogeneous websites based on bayesian model.
Xia, Zhengyou; Xu, Shengwu; Liu, Ningzhong; Zhao, Zhengkang
2014-01-01
The most current news recommendations are suitable for news which comes from a single news website, not for news from different heterogeneous news websites. Previous researches about news recommender systems based on different strategies have been proposed to provide news personalization services for online news readers. However, little research work has been reported on utilizing hundreds of heterogeneous news websites to provide top hot news services for group customers (e.g., government staffs). In this paper, we propose a hot news recommendation model based on Bayesian model, which is from hundreds of different news websites. In the model, we determine whether the news is hot news by calculating the joint probability of the news. We evaluate and compare our proposed recommendation model with the results of human experts on the real data sets. Experimental results demonstrate the reliability and effectiveness of our method. We also implement this model in hot news recommendation system of Hangzhou city government in year 2013, which achieves very good results.
Hot News Recommendation System from Heterogeneous Websites Based on Bayesian Model
Xia, Zhengyou; Xu, Shengwu; Liu, Ningzhong; Zhao, Zhengkang
2014-01-01
The most current news recommendations are suitable for news which comes from a single news website, not for news from different heterogeneous news websites. Previous researches about news recommender systems based on different strategies have been proposed to provide news personalization services for online news readers. However, little research work has been reported on utilizing hundreds of heterogeneous news websites to provide top hot news services for group customers (e.g., government staffs). In this paper, we propose a hot news recommendation model based on Bayesian model, which is from hundreds of different news websites. In the model, we determine whether the news is hot news by calculating the joint probability of the news. We evaluate and compare our proposed recommendation model with the results of human experts on the real data sets. Experimental results demonstrate the reliability and effectiveness of our method. We also implement this model in hot news recommendation system of Hangzhou city government in year 2013, which achieves very good results. PMID:25093207
Evaluating the uncertainty of input quantities in measurement models
NASA Astrophysics Data System (ADS)
Possolo, Antonio; Elster, Clemens
2014-06-01
The Guide to the Expression of Uncertainty in Measurement (GUM) gives guidance about how values and uncertainties should be assigned to the input quantities that appear in measurement models. This contribution offers a concrete proposal for how that guidance may be updated in light of the advances in the evaluation and expression of measurement uncertainty that were made in the course of the twenty years that have elapsed since the publication of the GUM, and also considering situations that the GUM does not yet contemplate. Our motivation is the ongoing conversation about a new edition of the GUM. While generally we favour a Bayesian approach to uncertainty evaluation, we also recognize the value that other approaches may bring to the problems considered here, and focus on methods for uncertainty evaluation and propagation that are widely applicable, including to cases that the GUM has not yet addressed. In addition to Bayesian methods, we discuss maximum-likelihood estimation, robust statistical methods, and measurement models where values of nominal properties play the same role that input quantities play in traditional models. We illustrate these general-purpose techniques in concrete examples, employing data sets that are realistic but that also are of conveniently small sizes. The supplementary material available online lists the R computer code that we have used to produce these examples (stacks.iop.org/Met/51/3/339/mmedia). Although we strive to stay close to clause 4 of the GUM, which addresses the evaluation of uncertainty for input quantities, we depart from it as we review the classes of measurement models that we believe are generally useful in contemporary measurement science. We also considerably expand and update the treatment that the GUM gives to Type B evaluations of uncertainty: reviewing the state-of-the-art, disciplined approach to the elicitation of expert knowledge, and its encapsulation in probability distributions that are usable in uncertainty propagation exercises. In this we deviate markedly and emphatically from the GUM Supplement 1, which gives pride of place to the Principle of Maximum Entropy as a means to assign probability distributions to input quantities.
Link, William; Sauer, John R.
2016-01-01
The analysis of ecological data has changed in two important ways over the last 15 years. The development and easy availability of Bayesian computational methods has allowed and encouraged the fitting of complex hierarchical models. At the same time, there has been increasing emphasis on acknowledging and accounting for model uncertainty. Unfortunately, the ability to fit complex models has outstripped the development of tools for model selection and model evaluation: familiar model selection tools such as Akaike's information criterion and the deviance information criterion are widely known to be inadequate for hierarchical models. In addition, little attention has been paid to the evaluation of model adequacy in context of hierarchical modeling, i.e., to the evaluation of fit for a single model. In this paper, we describe Bayesian cross-validation, which provides tools for model selection and evaluation. We describe the Bayesian predictive information criterion and a Bayesian approximation to the BPIC known as the Watanabe-Akaike information criterion. We illustrate the use of these tools for model selection, and the use of Bayesian cross-validation as a tool for model evaluation, using three large data sets from the North American Breeding Bird Survey.
THREAT ANTICIPATION AND DECEPTIVE REASONING USING BAYESIAN BELIEF NETWORKS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allgood, Glenn O; Olama, Mohammed M; Lake, Joe E
Recent events highlight the need for tools to anticipate threats posed by terrorists. Assessing these threats requires combining information from disparate data sources such as analytic models, simulations, historical data, sensor networks, and user judgments. These disparate data can be combined in a coherent, analytically defensible, and understandable manner using a Bayesian belief network (BBN). In this paper, we develop a BBN threat anticipatory model based on a deceptive reasoning algorithm using a network engineering process that treats the probability distributions of the BBN nodes within the broader context of the system development process.
Drug delivery optimization through Bayesian networks.
Bellazzi, R.
1992-01-01
This paper describes how Bayesian Networks can be used in combination with compartmental models to plan Recombinant Human Erythropoietin (r-HuEPO) delivery in the treatment of anemia of chronic uremic patients. Past measurements of hematocrit or hemoglobin concentration in a patient during the therapy can be exploited to adjust the parameters of a compartmental model of the erythropoiesis. This adaptive process allows more accurate patient-specific predictions, and hence a more rational dosage planning. We describe a drug delivery optimization protocol, based on our approach. Some results obtained on real data are presented. PMID:1482938
NASA Astrophysics Data System (ADS)
Shafii, M.; Tolson, B.; Matott, L. S.
2012-04-01
Hydrologic modeling has benefited from significant developments over the past two decades. This has resulted in building of higher levels of complexity into hydrologic models, which eventually makes the model evaluation process (parameter estimation via calibration and uncertainty analysis) more challenging. In order to avoid unreasonable parameter estimates, many researchers have suggested implementation of multi-criteria calibration schemes. Furthermore, for predictive hydrologic models to be useful, proper consideration of uncertainty is essential. Consequently, recent research has emphasized comprehensive model assessment procedures in which multi-criteria parameter estimation is combined with statistically-based uncertainty analysis routines such as Bayesian inference using Markov Chain Monte Carlo (MCMC) sampling. Such a procedure relies on the use of formal likelihood functions based on statistical assumptions, and moreover, the Bayesian inference structured on MCMC samplers requires a considerably large number of simulations. Due to these issues, especially in complex non-linear hydrological models, a variety of alternative informal approaches have been proposed for uncertainty analysis in the multi-criteria context. This study aims at exploring a number of such informal uncertainty analysis techniques in multi-criteria calibration of hydrological models. The informal methods addressed in this study are (i) Pareto optimality which quantifies the parameter uncertainty using the Pareto solutions, (ii) DDS-AU which uses the weighted sum of objective functions to derive the prediction limits, and (iii) GLUE which describes the total uncertainty through identification of behavioral solutions. The main objective is to compare such methods with MCMC-based Bayesian inference with respect to factors such as computational burden, and predictive capacity, which are evaluated based on multiple comparative measures. The measures for comparison are calculated both for calibration and evaluation periods. The uncertainty analysis methodologies are applied to a simple 5-parameter rainfall-runoff model, called HYMOD.
Stochastic Modeling of the Environmental Impacts of the Mingtang Tunneling Project
NASA Astrophysics Data System (ADS)
Li, Xiaojun; Li, Yandong; Chang, Ching-Fu; Chen, Ziyang; Tan, Benjamin Zhi Wen; Sege, Jon; Wang, Changhong; Rubin, Yoram
2017-04-01
This paper investigates the environmental impacts of a major tunneling project in China. Of particular interest is the drawdown of the water table, due to its potential impacts on ecosystem health and on agricultural activity. Due to scarcity of data, the study pursues a Bayesian stochastic approach, which is built around a numerical model. We adopted the Bayesian approach with the goal of deriving the posterior distributions of the dependent variables conditional on local data. The choice of the Bayesian approach for this study is somewhat non-trivial because of the scarcity of in-situ measurements. The thought guiding this selection is that prior distributions for the model input variables are valuable tools even if that all inputs are available, the Bayesian approach could provide a good starting point for further updates as and if additional data becomes available. To construct effective priors, a systematic approach was developed and implemented for constructing informative priors based on other, well-documented sites which bear geological and hydrological similarity to the target site (the Mingtang tunneling project). The approach is built around two classes of similarity criteria: a physically-based set of criteria and an additional set covering epistemic criteria. The prior construction strategy was implemented for the hydraulic conductivity of various types of rocks at the site (Granite and Gneiss) and for modeling the geometry and conductivity of the fault zones. Additional elements of our strategy include (1) modeling the water table through bounding surfaces representing upper and lower limits, and (2) modeling the effective conductivity as a random variable (varying between realizations, not in space). The approach was tested successfully against its ability to predict the tunnel infiltration fluxes and against observations of drying soils.
pyblocxs: Bayesian Low-Counts X-ray Spectral Analysis in Sherpa
NASA Astrophysics Data System (ADS)
Siemiginowska, A.; Kashyap, V.; Refsdal, B.; van Dyk, D.; Connors, A.; Park, T.
2011-07-01
Typical X-ray spectra have low counts and should be modeled using the Poisson distribution. However, χ2 statistic is often applied as an alternative and the data are assumed to follow the Gaussian distribution. A variety of weights to the statistic or a binning of the data is performed to overcome the low counts issues. However, such modifications introduce biases or/and a loss of information. Standard modeling packages such as XSPEC and Sherpa provide the Poisson likelihood and allow computation of rudimentary MCMC chains, but so far do not allow for setting a full Bayesian model. We have implemented a sophisticated Bayesian MCMC-based algorithm to carry out spectral fitting of low counts sources in the Sherpa environment. The code is a Python extension to Sherpa and allows to fit a predefined Sherpa model to high-energy X-ray spectral data and other generic data. We present the algorithm and discuss several issues related to the implementation, including flexible definition of priors and allowing for variations in the calibration information.
Bayesian structural equation modeling: a more flexible representation of substantive theory.
Muthén, Bengt; Asparouhov, Tihomir
2012-09-01
This article proposes a new approach to factor analysis and structural equation modeling using Bayesian analysis. The new approach replaces parameter specifications of exact zeros with approximate zeros based on informative, small-variance priors. It is argued that this produces an analysis that better reflects substantive theories. The proposed Bayesian approach is particularly beneficial in applications where parameters are added to a conventional model such that a nonidentified model is obtained if maximum-likelihood estimation is applied. This approach is useful for measurement aspects of latent variable modeling, such as with confirmatory factor analysis, and the measurement part of structural equation modeling. Two application areas are studied, cross-loadings and residual correlations in confirmatory factor analysis. An example using a full structural equation model is also presented, showing an efficient way to find model misspecification. The approach encompasses 3 elements: model testing using posterior predictive checking, model estimation, and model modification. Monte Carlo simulations and real data are analyzed using Mplus. The real-data analyses use data from Holzinger and Swineford's (1939) classic mental abilities study, Big Five personality factor data from a British survey, and science achievement data from the National Educational Longitudinal Study of 1988.
Bayesian analysis of rare events
NASA Astrophysics Data System (ADS)
Straub, Daniel; Papaioannou, Iason; Betz, Wolfgang
2016-06-01
In many areas of engineering and science there is an interest in predicting the probability of rare events, in particular in applications related to safety and security. Increasingly, such predictions are made through computer models of physical systems in an uncertainty quantification framework. Additionally, with advances in IT, monitoring and sensor technology, an increasing amount of data on the performance of the systems is collected. This data can be used to reduce uncertainty, improve the probability estimates and consequently enhance the management of rare events and associated risks. Bayesian analysis is the ideal method to include the data into the probabilistic model. It ensures a consistent probabilistic treatment of uncertainty, which is central in the prediction of rare events, where extrapolation from the domain of observation is common. We present a framework for performing Bayesian updating of rare event probabilities, termed BUS. It is based on a reinterpretation of the classical rejection-sampling approach to Bayesian analysis, which enables the use of established methods for estimating probabilities of rare events. By drawing upon these methods, the framework makes use of their computational efficiency. These methods include the First-Order Reliability Method (FORM), tailored importance sampling (IS) methods and Subset Simulation (SuS). In this contribution, we briefly review these methods in the context of the BUS framework and investigate their applicability to Bayesian analysis of rare events in different settings. We find that, for some applications, FORM can be highly efficient and is surprisingly accurate, enabling Bayesian analysis of rare events with just a few model evaluations. In a general setting, BUS implemented through IS and SuS is more robust and flexible.
Yu-Kang, Tu
2016-12-01
Network meta-analysis for multiple treatment comparisons has been a major development in evidence synthesis methodology. The validity of a network meta-analysis, however, can be threatened by inconsistency in evidence within the network. One particular issue of inconsistency is how to directly evaluate the inconsistency between direct and indirect evidence with regard to the effects difference between two treatments. A Bayesian node-splitting model was first proposed and a similar frequentist side-splitting model has been put forward recently. Yet, assigning the inconsistency parameter to one or the other of the two treatments or splitting the parameter symmetrically between the two treatments can yield different results when multi-arm trials are involved in the evaluation. We aimed to show that a side-splitting model can be viewed as a special case of design-by-treatment interaction model, and different parameterizations correspond to different design-by-treatment interactions. We demonstrated how to evaluate the side-splitting model using the arm-based generalized linear mixed model, and an example data set was used to compare results from the arm-based models with those from the contrast-based models. The three parameterizations of side-splitting make slightly different assumptions: the symmetrical method assumes that both treatments in a treatment contrast contribute to inconsistency between direct and indirect evidence, whereas the other two parameterizations assume that only one of the two treatments contributes to this inconsistency. With this understanding in mind, meta-analysts can then make a choice about how to implement the side-splitting method for their analysis. Copyright © 2016 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Ferragina, A; de los Campos, G; Vazquez, A I; Cecchinato, A; Bittante, G
2015-11-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict "difficult-to-predict" dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm(-1) were available and averaged before data analysis. Three Bayesian models: Bayesian ridge regression (Bayes RR), Bayes A, and Bayes B, and 2 reference models: PLS and modified PLS (MPLS) procedures, were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. Accuracy increased in moving from calibration to external validation methods, and in moving from PLS and MPLS to Bayesian methods, particularly Bayes A and Bayes B. The maximum R(2) value of validation was obtained with Bayes B and Bayes A. For the FA, C10:0 (% of each FA on total FA basis) had the highest R(2) (0.75, achieved with Bayes A and Bayes B), and among the technological traits, fresh cheese yield R(2) of 0.82 (achieved with Bayes B). These 2 methods have proven to be useful instruments in shrinking and selecting very informative wavelengths and inferring the structure and functions of the analyzed traits. We conclude that Bayesian models are powerful tools for deriving calibration equations, and, importantly, these equations can be easily developed using existing open-source software. As part of our study, we provide scripts based on the open source R software BGLR, which can be used to train customized prediction equations for other traits or populations. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Uncertainty Analysis and Parameter Estimation For Nearshore Hydrodynamic Models
NASA Astrophysics Data System (ADS)
Ardani, S.; Kaihatu, J. M.
2012-12-01
Numerical models represent deterministic approaches used for the relevant physical processes in the nearshore. Complexity of the physics of the model and uncertainty involved in the model inputs compel us to apply a stochastic approach to analyze the robustness of the model. The Bayesian inverse problem is one powerful way to estimate the important input model parameters (determined by apriori sensitivity analysis) and can be used for uncertainty analysis of the outputs. Bayesian techniques can be used to find the range of most probable parameters based on the probability of the observed data and the residual errors. In this study, the effect of input data involving lateral (Neumann) boundary conditions, bathymetry and off-shore wave conditions on nearshore numerical models are considered. Monte Carlo simulation is applied to a deterministic numerical model (the Delft3D modeling suite for coupled waves and flow) for the resulting uncertainty analysis of the outputs (wave height, flow velocity, mean sea level and etc.). Uncertainty analysis of outputs is performed by random sampling from the input probability distribution functions and running the model as required until convergence to the consistent results is achieved. The case study used in this analysis is the Duck94 experiment, which was conducted at the U.S. Army Field Research Facility at Duck, North Carolina, USA in the fall of 1994. The joint probability of model parameters relevant for the Duck94 experiments will be found using the Bayesian approach. We will further show that, by using Bayesian techniques to estimate the optimized model parameters as inputs and applying them for uncertainty analysis, we can obtain more consistent results than using the prior information for input data which means that the variation of the uncertain parameter will be decreased and the probability of the observed data will improve as well. Keywords: Monte Carlo Simulation, Delft3D, uncertainty analysis, Bayesian techniques, MCMC
Dembo, Mana; Radovčić, Davorka; Garvin, Heather M; Laird, Myra F; Schroeder, Lauren; Scott, Jill E; Brophy, Juliet; Ackermann, Rebecca R; Musiba, Chares M; de Ruiter, Darryl J; Mooers, Arne Ø; Collard, Mark
2016-08-01
Homo naledi is a recently discovered species of fossil hominin from South Africa. A considerable amount is already known about H. naledi but some important questions remain unanswered. Here we report a study that addressed two of them: "Where does H. naledi fit in the hominin evolutionary tree?" and "How old is it?" We used a large supermatrix of craniodental characters for both early and late hominin species and Bayesian phylogenetic techniques to carry out three analyses. First, we performed a dated Bayesian analysis to generate estimates of the evolutionary relationships of fossil hominins including H. naledi. Then we employed Bayes factor tests to compare the strength of support for hypotheses about the relationships of H. naledi suggested by the best-estimate trees. Lastly, we carried out a resampling analysis to assess the accuracy of the age estimate for H. naledi yielded by the dated Bayesian analysis. The analyses strongly supported the hypothesis that H. naledi forms a clade with the other Homo species and Australopithecus sediba. The analyses were more ambiguous regarding the position of H. naledi within the (Homo, Au. sediba) clade. A number of hypotheses were rejected, but several others were not. Based on the available craniodental data, Homo antecessor, Asian Homo erectus, Homo habilis, Homo floresiensis, Homo sapiens, and Au. sediba could all be the sister taxon of H. naledi. According to the dated Bayesian analysis, the most likely age for H. naledi is 912 ka. This age estimate was supported by the resampling analysis. Our findings have a number of implications. Most notably, they support the assignment of the new specimens to Homo, cast doubt on the claim that H. naledi is simply a variant of H. erectus, and suggest H. naledi is younger than has been previously proposed. Copyright © 2016 Elsevier Ltd. All rights reserved.
Abdul-Latiff, Muhammad Abu Bakar; Ruslin, Farhani; Fui, Vun Vui; Abu, Mohd-Hashim; Rovie-Ryan, Jeffrine Japning; Abdul-Patah, Pazil; Lakim, Maklarin; Roos, Christian; Yaakop, Salmah; Md-Zain, Badrul Munir
2014-01-01
Abstract Phylogenetic relationships among Malaysia’s long-tailed macaques have yet to be established, despite abundant genetic studies of the species worldwide. The aims of this study are to examine the phylogenetic relationships of Macaca fascicularis in Malaysia and to test its classification as a morphological subspecies. A total of 25 genetic samples of M. fascicularis yielding 383 bp of Cytochrome b (Cyt b) sequences were used in phylogenetic analysis along with one sample each of M. nemestrina and M. arctoides used as outgroups. Sequence character analysis reveals that Cyt b locus is a highly conserved region with only 23% parsimony informative character detected among ingroups. Further analysis indicates a clear separation between populations originating from different regions; the Malay Peninsula versus Borneo Insular, the East Coast versus West Coast of the Malay Peninsula, and the island versus mainland Malay Peninsula populations. Phylogenetic trees (NJ, MP and Bayesian) portray a consistent clustering paradigm as Borneo’s population was distinguished from Peninsula’s population (99% and 100% bootstrap value in NJ and MP respectively and 1.00 posterior probability in Bayesian trees). The East coast population was separated from other Peninsula populations (64% in NJ, 66% in MP and 0.53 posterior probability in Bayesian). West coast populations were divided into 2 clades: the North-South (47%/54% in NJ, 26/26% in MP and 1.00/0.80 posterior probability in Bayesian) and Island-Mainland (93% in NJ, 90% in MP and 1.00 posterior probability in Bayesian). The results confirm the previous morphological assignment of 2 subspecies, M. f. fascicularis and M. f. argentimembris, in the Malay Peninsula. These populations should be treated as separate genetic entities in order to conserve the genetic diversity of Malaysia’s M. fascicularis. These findings are crucial in aiding the conservation management and translocation process of M. fascicularis populations in Malaysia. PMID:24899832
Abdul-Latiff, Muhammad Abu Bakar; Ruslin, Farhani; Fui, Vun Vui; Abu, Mohd-Hashim; Rovie-Ryan, Jeffrine Japning; Abdul-Patah, Pazil; Lakim, Maklarin; Roos, Christian; Yaakop, Salmah; Md-Zain, Badrul Munir
2014-01-01
Phylogenetic relationships among Malaysia's long-tailed macaques have yet to be established, despite abundant genetic studies of the species worldwide. The aims of this study are to examine the phylogenetic relationships of Macaca fascicularis in Malaysia and to test its classification as a morphological subspecies. A total of 25 genetic samples of M. fascicularis yielding 383 bp of Cytochrome b (Cyt b) sequences were used in phylogenetic analysis along with one sample each of M. nemestrina and M. arctoides used as outgroups. Sequence character analysis reveals that Cyt b locus is a highly conserved region with only 23% parsimony informative character detected among ingroups. Further analysis indicates a clear separation between populations originating from different regions; the Malay Peninsula versus Borneo Insular, the East Coast versus West Coast of the Malay Peninsula, and the island versus mainland Malay Peninsula populations. Phylogenetic trees (NJ, MP and Bayesian) portray a consistent clustering paradigm as Borneo's population was distinguished from Peninsula's population (99% and 100% bootstrap value in NJ and MP respectively and 1.00 posterior probability in Bayesian trees). The East coast population was separated from other Peninsula populations (64% in NJ, 66% in MP and 0.53 posterior probability in Bayesian). West coast populations were divided into 2 clades: the North-South (47%/54% in NJ, 26/26% in MP and 1.00/0.80 posterior probability in Bayesian) and Island-Mainland (93% in NJ, 90% in MP and 1.00 posterior probability in Bayesian). The results confirm the previous morphological assignment of 2 subspecies, M. f. fascicularis and M. f. argentimembris, in the Malay Peninsula. These populations should be treated as separate genetic entities in order to conserve the genetic diversity of Malaysia's M. fascicularis. These findings are crucial in aiding the conservation management and translocation process of M. fascicularis populations in Malaysia.
ERIC Educational Resources Information Center
Wu, Haiyan
2013-01-01
General diagnostic models (GDMs) and Bayesian networks are mathematical frameworks that cover a wide variety of psychometric models. Both extend latent class models, and while GDMs also extend item response theory (IRT) models, Bayesian networks can be parameterized using discretized IRT. The purpose of this study is to examine similarities and…
NASA Astrophysics Data System (ADS)
Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan; Medeiros, Lia; Özel, Feryal; Psaltis, Dimitrios
2016-12-01
The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan
2016-12-01
The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore themore » robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.« less
Varughese, Eunice A.; Brinkman, Nichole E; Anneken, Emily M; Cashdollar, Jennifer S; Fout, G. Shay; Furlong, Edward T.; Kolpin, Dana W.; Glassmeyer, Susan T.; Keely, Scott P
2017-01-01
incorporated into a Bayesian model to more accurately determine viral load in both source and treated water. Results of the Bayesian model indicated that viruses are present in source water and treated water. By using a Bayesian framework that incorporates inhibition, as well as many other parameters that affect viral detection, this study offers an approach for more accurately estimating the occurrence of viral pathogens in environmental waters.
Specifying and Refining a Measurement Model for a Simulation-Based Assessment. CSE Report 619.
ERIC Educational Resources Information Center
Levy, Roy; Mislevy, Robert J.
2004-01-01
The challenges of modeling students' performance in simulation-based assessments include accounting for multiple aspects of knowledge and skill that arise in different situations and the conditional dependencies among multiple aspects of performance in a complex assessment. This paper describes a Bayesian approach to modeling and estimating…
NASA Astrophysics Data System (ADS)
Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.
2018-05-01
Bayesian method is a method that can be used to estimate the parameters of multivariate multiple regression model. Bayesian method has two distributions, there are prior and posterior distributions. Posterior distribution is influenced by the selection of prior distribution. Jeffreys’ prior distribution is a kind of Non-informative prior distribution. This prior is used when the information about parameter not available. Non-informative Jeffreys’ prior distribution is combined with the sample information resulting the posterior distribution. Posterior distribution is used to estimate the parameter. The purposes of this research is to estimate the parameters of multivariate regression model using Bayesian method with Non-informative Jeffreys’ prior distribution. Based on the results and discussion, parameter estimation of β and Σ which were obtained from expected value of random variable of marginal posterior distribution function. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart. However, in calculation of the expected value involving integral of a function which difficult to determine the value. Therefore, approach is needed by generating of random samples according to the posterior distribution characteristics of each parameter using Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vrugt, Jasper A; Robinson, Bruce A; Ter Braak, Cajo J F
In recent years, a strong debate has emerged in the hydrologic literature regarding what constitutes an appropriate framework for uncertainty estimation. Particularly, there is strong disagreement whether an uncertainty framework should have its roots within a proper statistical (Bayesian) context, or whether such a framework should be based on a different philosophy and implement informal measures and weaker inference to summarize parameter and predictive distributions. In this paper, we compare a formal Bayesian approach using Markov Chain Monte Carlo (MCMC) with generalized likelihood uncertainty estimation (GLUE) for assessing uncertainty in conceptual watershed modeling. Our formal Bayesian approach is implemented usingmore » the recently developed differential evolution adaptive metropolis (DREAM) MCMC scheme with a likelihood function that explicitly considers model structural, input and parameter uncertainty. Our results demonstrate that DREAM and GLUE can generate very similar estimates of total streamflow uncertainty. This suggests that formal and informal Bayesian approaches have more common ground than the hydrologic literature and ongoing debate might suggest. The main advantage of formal approaches is, however, that they attempt to disentangle the effect of forcing, parameter and model structural error on total predictive uncertainty. This is key to improving hydrologic theory and to better understand and predict the flow of water through catchments.« less
A Bayesian nonparametric approach to dynamical noise reduction
NASA Astrophysics Data System (ADS)
Kaloudis, Konstantinos; Hatjispyros, Spyridon J.
2018-06-01
We propose a Bayesian nonparametric approach for the noise reduction of a given chaotic time series contaminated by dynamical noise, based on Markov Chain Monte Carlo methods. The underlying unknown noise process (possibly) exhibits heavy tailed behavior. We introduce the Dynamic Noise Reduction Replicator model with which we reconstruct the unknown dynamic equations and in parallel we replicate the dynamics under reduced noise level dynamical perturbations. The dynamic noise reduction procedure is demonstrated specifically in the case of polynomial maps. Simulations based on synthetic time series are presented.
NASA Astrophysics Data System (ADS)
Blaauw, Maarten; Christen, J. Andrés; Bennett, K. D.; Reimer, Paula J.
2018-05-01
Reliable chronologies are essential for most Quaternary studies, but little is known about how age-depth model choice, as well as dating density and quality, affect the precision and accuracy of chronologies. A meta-analysis suggests that most existing late-Quaternary studies contain fewer than one date per millennium, and provide millennial-scale precision at best. We use existing and simulated sediment cores to estimate what dating density and quality are required to obtain accurate chronologies at a desired precision. For many sites, a doubling in dating density would significantly improve chronologies and thus their value for reconstructing and interpreting past environmental changes. Commonly used classical age-depth models stop becoming more precise after a minimum dating density is reached, but the precision of Bayesian age-depth models which take advantage of chronological ordering continues to improve with more dates. Our simulations show that classical age-depth models severely underestimate uncertainty and are inaccurate at low dating densities, and also perform poorly at high dating densities. On the other hand, Bayesian age-depth models provide more realistic precision estimates, including at low to average dating densities, and are much more robust against dating scatter and outliers. Indeed, Bayesian age-depth models outperform classical ones at all tested dating densities, qualities and time-scales. We recommend that chronologies should be produced using Bayesian age-depth models taking into account chronological ordering and based on a minimum of 2 dates per millennium.
Bayesian structured additive regression modeling of epidemic data: application to cholera
2012-01-01
Background A significant interest in spatial epidemiology lies in identifying associated risk factors which enhances the risk of infection. Most studies, however, make no, or limited use of the spatial structure of the data, as well as possible nonlinear effects of the risk factors. Methods We develop a Bayesian Structured Additive Regression model for cholera epidemic data. Model estimation and inference is based on fully Bayesian approach via Markov Chain Monte Carlo (MCMC) simulations. The model is applied to cholera epidemic data in the Kumasi Metropolis, Ghana. Proximity to refuse dumps, density of refuse dumps, and proximity to potential cholera reservoirs were modeled as continuous functions; presence of slum settlers and population density were modeled as fixed effects, whereas spatial references to the communities were modeled as structured and unstructured spatial effects. Results We observe that the risk of cholera is associated with slum settlements and high population density. The risk of cholera is equal and lower for communities with fewer refuse dumps, but variable and higher for communities with more refuse dumps. The risk is also lower for communities distant from refuse dumps and potential cholera reservoirs. The results also indicate distinct spatial variation in the risk of cholera infection. Conclusion The study highlights the usefulness of Bayesian semi-parametric regression model analyzing public health data. These findings could serve as novel information to help health planners and policy makers in making effective decisions to control or prevent cholera epidemics. PMID:22866662
Models and simulation of 3D neuronal dendritic trees using Bayesian networks.
López-Cruz, Pedro L; Bielza, Concha; Larrañaga, Pedro; Benavides-Piccione, Ruth; DeFelipe, Javier
2011-12-01
Neuron morphology is crucial for neuronal connectivity and brain information processing. Computational models are important tools for studying dendritic morphology and its role in brain function. We applied a class of probabilistic graphical models called Bayesian networks to generate virtual dendrites from layer III pyramidal neurons from three different regions of the neocortex of the mouse. A set of 41 morphological variables were measured from the 3D reconstructions of real dendrites and their probability distributions used in a machine learning algorithm to induce the model from the data. A simulation algorithm is also proposed to obtain new dendrites by sampling values from Bayesian networks. The main advantage of this approach is that it takes into account and automatically locates the relationships between variables in the data instead of using predefined dependencies. Therefore, the methodology can be applied to any neuronal class while at the same time exploiting class-specific properties. Also, a Bayesian network was defined for each part of the dendrite, allowing the relationships to change in the different sections and to model heterogeneous developmental factors or spatial influences. Several univariate statistical tests and a novel multivariate test based on Kullback-Leibler divergence estimation confirmed that virtual dendrites were similar to real ones. The analyses of the models showed relationships that conform to current neuroanatomical knowledge and support model correctness. At the same time, studying the relationships in the models can help to identify new interactions between variables related to dendritic morphology.
A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction
Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R.; Buenrostro-Mariscal, Raymundo
2017-01-01
There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. PMID:28391241
A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction.
Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R; Buenrostro-Mariscal, Raymundo
2017-06-07
There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. Copyright © 2017 Montesinos-López et al.
NASA Technical Reports Server (NTRS)
Kulkarni, Chetan S.; Celaya, Jose R.; Goebel, Kai; Biswas, Gautam
2012-01-01
Electrolytic capacitors are used in several applications ranging from power supplies for safety critical avionics equipment to power drivers for electro-mechanical actuator. Past experiences show that capacitors tend to degrade and fail faster when subjected to high electrical or thermal stress conditions during operations. This makes them good candidates for prognostics and health management. Model-based prognostics captures system knowledge in the form of physics-based models of components in order to obtain accurate predictions of end of life based on their current state of heal th and their anticipated future use and operational conditions. The focus of this paper is on deriving first principles degradation models for thermal stress conditions and implementing Bayesian framework for making remaining useful life predictions. Data collected from simultaneous experiments are used to validate the models. Our overall goal is to derive accurate models of capacitor degradation, and use them to remaining useful life in DC-DC converters.
Bayesian Models for Astrophysical Data Using R, JAGS, Python, and Stan
NASA Astrophysics Data System (ADS)
Hilbe, Joseph M.; de Souza, Rafael S.; Ishida, Emille E. O.
2017-05-01
This comprehensive guide to Bayesian methods in astronomy enables hands-on work by supplying complete R, JAGS, Python, and Stan code, to use directly or to adapt. It begins by examining the normal model from both frequentist and Bayesian perspectives and then progresses to a full range of Bayesian generalized linear and mixed or hierarchical models, as well as additional types of models such as ABC and INLA. The book provides code that is largely unavailable elsewhere and includes details on interpreting and evaluating Bayesian models. Initial discussions offer models in synthetic form so that readers can easily adapt them to their own data; later the models are applied to real astronomical data. The consistent focus is on hands-on modeling, analysis of data, and interpretations that address scientific questions. A must-have for astronomers, its concrete approach will also be attractive to researchers in the sciences more generally.
Awad, Lara; Fady, Bruno; Khater, Carla; Roig, Anne; Cheddadi, Rachid
2014-01-01
The threatened conifer Abies cilicica currently persists in Lebanon in geographically isolated forest patches. The impact of demographic and evolutionary processes on population genetic diversity and structure were assessed using 10 nuclear microsatellite loci. All remnant 15 local populations revealed a low genetic variation but a high recent effective population size. FST-based measures of population genetic differentiation revealed a low spatial genetic structure, but Bayesian analysis of population structure identified a significant Northeast-Southwest population structure. Populations showed significant but weak isolation-by-distance, indicating non-equilibrium conditions between dispersal and genetic drift. Bayesian assignment tests detected an asymmetric Northeast-Southwest migration involving some long-distance dispersal events. We suggest that the persistence and Northeast-Southwest geographic structure of Abies cilicica in Lebanon is the result of at least two demographic processes during its recent evolutionary history: (1) recent migration to currently marginal populations and (2) local persistence through altitudinal shifts along a mountainous topography. These results might help us better understand the mechanisms involved in the species response to expected climate change. PMID:24587219
The Dopaminergic Midbrain Encodes the Expected Certainty about Desired Outcomes.
Schwartenbeck, Philipp; FitzGerald, Thomas H B; Mathys, Christoph; Dolan, Ray; Friston, Karl
2015-10-01
Dopamine plays a key role in learning; however, its exact function in decision making and choice remains unclear. Recently, we proposed a generic model based on active (Bayesian) inference wherein dopamine encodes the precision of beliefs about optimal policies. Put simply, dopamine discharges reflect the confidence that a chosen policy will lead to desired outcomes. We designed a novel task to test this hypothesis, where subjects played a "limited offer" game in a functional magnetic resonance imaging experiment. Subjects had to decide how long to wait for a high offer before accepting a low offer, with the risk of losing everything if they waited too long. Bayesian model comparison showed that behavior strongly supported active inference, based on surprise minimization, over classical utility maximization schemes. Furthermore, midbrain activity, encompassing dopamine projection neurons, was accurately predicted by trial-by-trial variations in model-based estimates of precision. Our findings demonstrate that human subjects infer both optimal policies and the precision of those inferences, and thus support the notion that humans perform hierarchical probabilistic Bayesian inference. In other words, subjects have to infer both what they should do as well as how confident they are in their choices, where confidence may be encoded by dopaminergic firing. © The Author 2014. Published by Oxford University Press.
The Dopaminergic Midbrain Encodes the Expected Certainty about Desired Outcomes
Schwartenbeck, Philipp; FitzGerald, Thomas H. B.; Mathys, Christoph; Dolan, Ray; Friston, Karl
2015-01-01
Dopamine plays a key role in learning; however, its exact function in decision making and choice remains unclear. Recently, we proposed a generic model based on active (Bayesian) inference wherein dopamine encodes the precision of beliefs about optimal policies. Put simply, dopamine discharges reflect the confidence that a chosen policy will lead to desired outcomes. We designed a novel task to test this hypothesis, where subjects played a “limited offer” game in a functional magnetic resonance imaging experiment. Subjects had to decide how long to wait for a high offer before accepting a low offer, with the risk of losing everything if they waited too long. Bayesian model comparison showed that behavior strongly supported active inference, based on surprise minimization, over classical utility maximization schemes. Furthermore, midbrain activity, encompassing dopamine projection neurons, was accurately predicted by trial-by-trial variations in model-based estimates of precision. Our findings demonstrate that human subjects infer both optimal policies and the precision of those inferences, and thus support the notion that humans perform hierarchical probabilistic Bayesian inference. In other words, subjects have to infer both what they should do as well as how confident they are in their choices, where confidence may be encoded by dopaminergic firing. PMID:25056572
Zador, Zsolt; Sperrin, Matthew; King, Andrew T
2016-01-01
Traumatic brain injury remains a global health problem. Understanding the relative importance of outcome predictors helps optimize our treatment strategies by informing assessment protocols, clinical decisions and trial designs. In this study we establish importance ranking for outcome predictors based on receiver operating indices to identify key predictors of outcome and create simple predictive models. We then explore the associations between key outcome predictors using Bayesian networks to gain further insight into predictor importance. We analyzed the corticosteroid randomization after significant head injury (CRASH) trial database of 10008 patients and included patients for whom demographics, injury characteristics, computer tomography (CT) findings and Glasgow Outcome Scale (GCS) were recorded (total of 13 predictors, which would be available to clinicians within a few hours following the injury in 6945 patients). Predictions of clinical outcome (death or severe disability at 6 months) were performed using logistic regression models with 5-fold cross validation. Predictive performance was measured using standardized partial area (pAUC) under the receiver operating curve (ROC) and we used Delong test for comparisons. Variable importance ranking was based on pAUC targeted at specificity (pAUCSP) and sensitivity (pAUCSE) intervals of 90-100%. Probabilistic associations were depicted using Bayesian networks. Complete AUC analysis showed very good predictive power (AUC = 0.8237, 95% CI: 0.8138-0.8336) for the complete model. Specificity focused importance ranking highlighted age, pupillary, motor responses, obliteration of basal cisterns/3rd ventricle and midline shift. Interestingly when targeting model sensitivity, the highest-ranking variables were age, severe extracranial injury, verbal response, hematoma on CT and motor response. Simplified models, which included only these key predictors, had similar performance (pAUCSP = 0.6523, 95% CI: 0.6402-0.6641 and pAUCSE = 0.6332, 95% CI: 0.62-0.6477) compared to the complete models (pAUCSP = 0.6664, 95% CI: 0.6543-0.679, pAUCSE = 0.6436, 95% CI: 0.6289-0.6585, de Long p value 0.1165 and 0.3448 respectively). Bayesian networks showed the predictors that did not feature in the simplified models were associated with those that did. We demonstrate that importance based variable selection allows simplified predictive models to be created while maintaining prediction accuracy. Variable selection targeting specificity confirmed key components of clinical assessment in TBI whereas sensitivity based ranking suggested extracranial injury as one of the important predictors. These results help refine our approach to head injury assessment, decision-making and outcome prediction targeted at model sensitivity and specificity. Bayesian networks proved to be a comprehensive tool for depicting probabilistic associations for key predictors giving insight into why the simplified model has maintained accuracy.
Integrated Software Health Management for Aircraft GN and C
NASA Technical Reports Server (NTRS)
Schumann, Johann; Mengshoel, Ole
2011-01-01
Modern aircraft rely heavily on dependable operation of many safety-critical software components. Despite careful design, verification and validation (V&V), on-board software can fail with disastrous consequences if it encounters problematic software/hardware interaction or must operate in an unexpected environment. We are using a Bayesian approach to monitor the software and its behavior during operation and provide up-to-date information about the health of the software and its components. The powerful reasoning mechanism provided by our model-based Bayesian approach makes reliable diagnosis of the root causes possible and minimizes the number of false alarms. Compilation of the Bayesian model into compact arithmetic circuits makes SWHM feasible even on platforms with limited CPU power. We show initial results of SWHM on a small simulator of an embedded aircraft software system, where software and sensor faults can be injected.
Bockman, Alexander; Fackler, Cameron; Xiang, Ning
2015-04-01
Acoustic performance for an interior requires an accurate description of the boundary materials' surface acoustic impedance. Analytical methods may be applied to a small class of test geometries, but inverse numerical methods provide greater flexibility. The parameter estimation problem requires minimizing prediction vice observed acoustic field pressure. The Bayesian-network sampling approach presented here mitigates other methods' susceptibility to noise inherent to the experiment, model, and numerics. A geometry agnostic method is developed here and its parameter estimation performance is demonstrated for an air-backed micro-perforated panel in an impedance tube. Good agreement is found with predictions from the ISO standard two-microphone, impedance-tube method, and a theoretical model for the material. Data by-products exclusive to a Bayesian approach are analyzed to assess sensitivity of the method to nuisance parameters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
La Russa, D
Purpose: The purpose of this project is to develop a robust method of parameter estimation for a Poisson-based TCP model using Bayesian inference. Methods: Bayesian inference was performed using the PyMC3 probabilistic programming framework written in Python. A Poisson-based TCP regression model that accounts for clonogen proliferation was fit to observed rates of local relapse as a function of equivalent dose in 2 Gy fractions for a population of 623 stage-I non-small-cell lung cancer patients. The Slice Markov Chain Monte Carlo sampling algorithm was used to sample the posterior distributions, and was initiated using the maximum of the posterior distributionsmore » found by optimization. The calculation of TCP with each sample step required integration over the free parameter α, which was performed using an adaptive 24-point Gauss-Legendre quadrature. Convergence was verified via inspection of the trace plot and posterior distribution for each of the fit parameters, as well as with comparisons of the most probable parameter values with their respective maximum likelihood estimates. Results: Posterior distributions for α, the standard deviation of α (σ), the average tumour cell-doubling time (Td), and the repopulation delay time (Tk), were generated assuming α/β = 10 Gy, and a fixed clonogen density of 10{sup 7} cm−{sup 3}. Posterior predictive plots generated from samples from these posterior distributions are in excellent agreement with the observed rates of local relapse used in the Bayesian inference. The most probable values of the model parameters also agree well with maximum likelihood estimates. Conclusion: A robust method of performing Bayesian inference of TCP data using a complex TCP model has been established.« less
Page, Lindsay C
2012-04-01
Results from MDRC's longitudinal, random-assignment evaluation of career-academy high schools reveal that several years after high-school completion, those randomized to receive the academy opportunity realized a $175 (11%) increase in monthly earnings, on average. In this paper, I investigate the impact of duration of actual academy enrollment, as nearly half of treatment group students either never enrolled or participated for only a portion of high school. I capitalize on data from this experimental evaluation and utilize a principal stratification framework and Bayesian inference to investigate the causal impact of academy participation. This analysis focuses on a sample of 1,306 students across seven sites in the MDRC evaluation. Participation is measured by number of years of academy enrollment, and the outcome of interest is average monthly earnings in the period of four to eight years after high school graduation. I estimate an average causal effect of treatment assignment on subsequent monthly earnings of approximately $588 among males who remained enrolled in an academy throughout high school and more modest impacts among those who participated only partially. Different from an instrumental variables approach to treatment non-compliance, which allows for the estimation of linear returns to treatment take-up, the more general framework of principal stratification allows for the consideration of non-linear returns, although at the expense of additional model-based assumptions.
Sironi, Emanuele; Taroni, Franco; Baldinotti, Claudio; Nardi, Cosimo; Norelli, Gian-Aristide; Gallidabino, Matteo; Pinchi, Vilma
2017-11-14
The present study aimed to investigate the performance of a Bayesian method in the evaluation of dental age-related evidence collected by means of a geometrical approximation procedure of the pulp chamber volume. Measurement of this volume was based on three-dimensional cone beam computed tomography images. The Bayesian method was applied by means of a probabilistic graphical model, namely a Bayesian network. Performance of that method was investigated in terms of accuracy and bias of the decisional outcomes. Influence of an informed elicitation of the prior belief of chronological age was also studied by means of a sensitivity analysis. Outcomes in terms of accuracy were adequate with standard requirements for forensic adult age estimation. Findings also indicated that the Bayesian method does not show a particular tendency towards under- or overestimation of the age variable. Outcomes of the sensitivity analysis showed that results on estimation are improved with a ration elicitation of the prior probabilities of age.
Artificial and Bayesian Neural Networks
Korhani Kangi, Azam; Bahrampour, Abbas
2018-02-26
Introduction and purpose: In recent years the use of neural networks without any premises for investigation of prognosis in analyzing survival data has increased. Artificial neural networks (ANN) use small processors with a continuous network to solve problems inspired by the human brain. Bayesian neural networks (BNN) constitute a neural-based approach to modeling and non-linearization of complex issues using special algorithms and statistical methods. Gastric cancer incidence is the first and third ranking for men and women in Iran, respectively. The aim of the present study was to assess the value of an artificial neural network and a Bayesian neural network for modeling and predicting of probability of gastric cancer patient death. Materials and Methods: In this study, we used information on 339 patients aged from 20 to 90 years old with positive gastric cancer, referred to Afzalipoor and Shahid Bahonar Hospitals in Kerman City from 2001 to 2015. The three layers perceptron neural network (ANN) and the Bayesian neural network (BNN) were used for predicting the probability of mortality using the available data. To investigate differences between the models, sensitivity, specificity, accuracy and the area under receiver operating characteristic curves (AUROCs) were generated. Results: In this study, the sensitivity and specificity of the artificial neural network and Bayesian neural network models were 0.882, 0.903 and 0.954, 0.909, respectively. Prediction accuracy and the area under curve ROC for the two models were 0.891, 0.944 and 0.935, 0.961. The age at diagnosis of gastric cancer was most important for predicting survival, followed by tumor grade, morphology, gender, smoking history, opium consumption, receiving chemotherapy, presence of metastasis, tumor stage, receiving radiotherapy, and being resident in a village. Conclusion: The findings of the present study indicated that the Bayesian neural network is preferable to an artificial neural network for predicting survival of gastric cancer patients in Iran. Creative Commons Attribution License
Bayesian Forecasting Tool to Predict the Need for Antidote in Acute Acetaminophen Overdose.
Desrochers, Julie; Wojciechowski, Jessica; Klein-Schwartz, Wendy; Gobburu, Jogarao V S; Gopalakrishnan, Mathangi
2017-08-01
Acetaminophen (APAP) overdose is the leading cause of acute liver injury in the United States. Patients with elevated plasma acetaminophen concentrations (PACs) require hepatoprotective treatment with N-acetylcysteine (NAC). These patients have been primarily risk-stratified using the Rumack-Matthew nomogram. Previous studies of acute APAP overdoses found that the nomogram failed to accurately predict the need for the antidote. The objectives of this study were to develop a population pharmacokinetic (PK) model for APAP following acute overdose and evaluate the utility of population PK model-based Bayesian forecasting in NAC administration decisions. Limited APAP concentrations from a retrospective cohort of acute overdosed subjects from the Maryland Poison Center were used to develop the population PK model and to investigate the effect of type of APAP products and other prognostic factors. The externally validated population PK model was used a prior for Bayesian forecasting to predict the individual PK profile when one or two observed PACs were available. The utility of Bayesian forecasted APAP concentration-time profiles inferred from one (first) or two (first and second) PAC observations were also tested in their ability to predict the observed NAC decisions. A one-compartment model with first-order absorption and elimination adequately described the data with single activated charcoal and APAP products as significant covariates on absorption and bioavailability. The Bayesian forecasted individual concentration-time profiles had acceptable bias (6.2% and 9.8%) and accuracy (40.5% and 41.9%) when either one or two PACs were considered, respectively. The sensitivity and negative predictive value of the Bayesian forecasted NAC decisions using one PAC were 84% and 92.6%, respectively. The population PK analysis provided a platform for acceptably predicting an individual's concentration-time profile following acute APAP overdose with at least one PAC, and the individual's covariate profile, and can potentially be used for making early NAC administration decisions. © 2017 Pharmacotherapy Publications, Inc.
Bayesian Analysis of Nonlinear Structural Equation Models with Nonignorable Missing Data
ERIC Educational Resources Information Center
Lee, Sik-Yum
2006-01-01
A Bayesian approach is developed for analyzing nonlinear structural equation models with nonignorable missing data. The nonignorable missingness mechanism is specified by a logistic regression model. A hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm is used to produce the joint Bayesian estimates of…
The Bayesian reader: explaining word recognition as an optimal Bayesian decision process.
Norris, Dennis
2006-04-01
This article presents a theory of visual word recognition that assumes that, in the tasks of word identification, lexical decision, and semantic categorization, human readers behave as optimal Bayesian decision makers. This leads to the development of a computational model of word recognition, the Bayesian reader. The Bayesian reader successfully simulates some of the most significant data on human reading. The model accounts for the nature of the function relating word frequency to reaction time and identification threshold, the effects of neighborhood density and its interaction with frequency, and the variation in the pattern of neighborhood density effects seen in different experimental tasks. Both the general behavior of the model and the way the model predicts different patterns of results in different tasks follow entirely from the assumption that human readers approximate optimal Bayesian decision makers. ((c) 2006 APA, all rights reserved).
Stanley, Clayton; Byrne, Michael D
2016-12-01
The growth of social media and user-created content on online sites provides unique opportunities to study models of human declarative memory. By framing the task of choosing a hashtag for a tweet and tagging a post on Stack Overflow as a declarative memory retrieval problem, 2 cognitively plausible declarative memory models were applied to millions of posts and tweets and evaluated on how accurately they predict a user's chosen tags. An ACT-R based Bayesian model and a random permutation vector-based model were tested on the large data sets. The results show that past user behavior of tag use is a strong predictor of future behavior. Furthermore, past behavior was successfully incorporated into the random permutation model that previously used only context. Also, ACT-R's attentional weight term was linked to an entropy-weighting natural language processing method used to attenuate high-frequency words (e.g., articles and prepositions). Word order was not found to be a strong predictor of tag use, and the random permutation model performed comparably to the Bayesian model without including word order. This shows that the strength of the random permutation model is not in the ability to represent word order, but rather in the way in which context information is successfully compressed. The results of the large-scale exploration show how the architecture of the 2 memory models can be modified to significantly improve accuracy, and may suggest task-independent general modifications that can help improve model fit to human data in a much wider range of domains. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
NASA Astrophysics Data System (ADS)
Pérez, B.; Brouwer, R.; Beckers, J.; Paradis, D.; Balseiro, C.; Lyons, K.; Cure, M.; Sotillo, M. G.; Hackett, B.; Verlaan, M.; Fanjul, E. A.
2012-03-01
ENSURF (Ensemble SURge Forecast) is a multi-model application for sea level forecast that makes use of several storm surge or circulation models and near-real time tide gauge data in the region, with the following main goals: 1. providing easy access to existing forecasts, as well as to its performance and model validation, by means of an adequate visualization tool; 2. generation of better forecasts of sea level, including confidence intervals, by means of the Bayesian Model Average technique (BMA). The Bayesian Model Average technique generates an overall forecast probability density function (PDF) by making a weighted average of the individual forecasts PDF's; the weights represent the Bayesian likelihood that a model will give the correct forecast and are continuously updated based on the performance of the models during a recent training period. This implies the technique needs the availability of sea level data from tide gauges in near-real time. The system was implemented for the European Atlantic facade (IBIROOS region) and Western Mediterranean coast based on the MATROOS visualization tool developed by Deltares. Results of validation of the different models and BMA implementation for the main harbours are presented for these regions where this kind of activity is performed for the first time. The system is currently operational at Puertos del Estado and has proved to be useful in the detection of calibration problems in some of the circulation models, in the identification of the systematic differences between baroclinic and barotropic models for sea level forecasts and to demonstrate the feasibility of providing an overall probabilistic forecast, based on the BMA method.
Uncertainty plus prior equals rational bias: an intuitive Bayesian probability weighting function.
Fennell, John; Baddeley, Roland
2012-10-01
Empirical research has shown that when making choices based on probabilistic options, people behave as if they overestimate small probabilities, underestimate large probabilities, and treat positive and negative outcomes differently. These distortions have been modeled using a nonlinear probability weighting function, which is found in several nonexpected utility theories, including rank-dependent models and prospect theory; here, we propose a Bayesian approach to the probability weighting function and, with it, a psychological rationale. In the real world, uncertainty is ubiquitous and, accordingly, the optimal strategy is to combine probability statements with prior information using Bayes' rule. First, we show that any reasonable prior on probabilities leads to 2 of the observed effects; overweighting of low probabilities and underweighting of high probabilities. We then investigate 2 plausible kinds of priors: informative priors based on previous experience and uninformative priors of ignorance. Individually, these priors potentially lead to large problems of bias and inefficiency, respectively; however, when combined using Bayesian model comparison methods, both forms of prior can be applied adaptively, gaining the efficiency of empirical priors and the robustness of ignorance priors. We illustrate this for the simple case of generic good and bad options, using Internet blogs to estimate the relevant priors of inference. Given this combined ignorant/informative prior, the Bayesian probability weighting function is not only robust and efficient but also matches all of the major characteristics of the distortions found in empirical research. PsycINFO Database Record (c) 2012 APA, all rights reserved.
CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data.
Fidaner, Işık Barış; Cankorur-Cetinkaya, Ayca; Dikicioglu, Duygu; Kirdar, Betul; Cemgil, Ali Taylan; Oliver, Stephen G
2016-02-01
Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets. We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. This algorithm involves two components of operation. Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences) and Component 2, which applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology. It applies multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface. The latter has been developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications. The C++ and QT source codes, the GUI applications for Windows, OS X and Linux operating systems and user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG. sgo24@cam.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Vaughan, Adam S; Kramer, Michael R; Waller, Lance A; Schieb, Linda J; Greer, Sophia; Casper, Michele
2015-05-01
To demonstrate the implications of choosing analytical methods for quantifying spatiotemporal trends, we compare the assumptions, implementation, and outcomes of popular methods using county-level heart disease mortality in the United States between 1973 and 2010. We applied four regression-based approaches (joinpoint regression, both aspatial and spatial generalized linear mixed models, and Bayesian space-time model) and compared resulting inferences for geographic patterns of local estimates of annual percent change and associated uncertainty. The average local percent change in heart disease mortality from each method was -4.5%, with the Bayesian model having the smallest range of values. The associated uncertainty in percent change differed markedly across the methods, with the Bayesian space-time model producing the narrowest range of variance (0.0-0.8). The geographic pattern of percent change was consistent across methods with smaller declines in the South Central United States and larger declines in the Northeast and Midwest. However, the geographic patterns of uncertainty differed markedly between methods. The similarity of results, including geographic patterns, for magnitude of percent change across these methods validates the underlying spatial pattern of declines in heart disease mortality. However, marked differences in degree of uncertainty indicate that Bayesian modeling offers substantially more precise estimates. Copyright © 2015 Elsevier Inc. All rights reserved.
Ennouri, Karim; Ayed, Rayda Ben; Hassen, Hanen Ben; Mazzarello, Maura; Ottaviani, Ennio
2015-12-01
Bacillus thuringiensis (Bt) is a Gram-positive bacterium. The entomopathogenic activity of Bt is related to the existence of the crystal consisting of protoxins, also called delta-endotoxins. In order to optimize and explain the production of delta-endotoxins of Bacillus thuringiensis kurstaki, we studied seven medium components: soybean meal, starch, KH₂PO₄, K₂HPO₄, FeSO₄, MnSO₄, and MgSO₄and their relationships with the concentration of delta-endotoxins using an experimental design (Plackett-Burman design) and Bayesian networks modelling. The effects of the ingredients of the culture medium on delta-endotoxins production were estimated. The developed model showed that different medium components are important for the Bacillus thuringiensis fermentation. The most important factors influenced the production of delta-endotoxins are FeSO₄, K2HPO₄, starch and soybean meal. Indeed, it was found that soybean meal, K₂HPO₄, KH₂PO₄and starch also showed positive effect on the delta-endotoxins production. However, FeSO4 and MnSO4 expressed opposite effect. The developed model, based on Bayesian techniques, can automatically learn emerging models in data to serve in the prediction of delta-endotoxins concentrations. The constructed model in the present study implies that experimental design (Plackett-Burman design) joined with Bayesian networks method could be used for identification of effect variables on delta-endotoxins variation.
NASA Astrophysics Data System (ADS)
Baresel, Björn; Bucher, Hugo; Brosse, Morgane; Cordey, Fabrice; Guodun, Kuang; Schaltegger, Urs
2017-03-01
This study is based on zircon U-Pb ages of 12 volcanic ash layers and volcanogenic sandstones from two deep water sections with conformable and continuous formational Permian-Triassic boundaries (PTBs) in the Nanpanjiang Basin (South China). Our dates of single, thermally annealed and chemically abraded zircons bracket the PTB in Dongpan and Penglaitan and provide the basis for a first proof-of-concept study utilizing a Bayesian chronology model comparing the three sections of Dongpan, Penglaitan and the Global Stratotype Section and Point (GSSP) at Meishan. Our Bayesian modeling demonstrates that the formational boundaries in Dongpan (251.939 ± 0.030 Ma), Penglaitan (251.984 ± 0.031 Ma) and Meishan (251.956 ± 0.035 Ma) are synchronous within analytical uncertainty of ˜ 40 ka. It also provides quantitative evidence that the ages of the paleontologically defined boundaries, based on conodont unitary association zones in Meishan and on macrofaunas in Dongpan, are identical and coincide with the age of the formational boundaries. The age model also confirms the extreme condensation around the PTB in Meishan, which distorts the projection of any stratigraphic points or intervals onto other more expanded sections by means of Bayesian age-depth models. Dongpan and Penglaitan possess significantly higher sediment accumulation rates and thus offer a greater potential for high-resolution studies of environmental proxies and correlations around the PTB than Meishan. This study highlights the power of high-resolution radio-isotopic ages that allow a robust intercalibration of patterns of biotic changes and fluctuating environmental proxies and will help recognizing their global, regional or local significance.
Saha, Dibakar; Alluri, Priyanka; Gan, Albert
2017-01-01
The Highway Safety Manual (HSM) presents statistical models to quantitatively estimate an agency's safety performance. The models were developed using data from only a few U.S. states. To account for the effects of the local attributes and temporal factors on crash occurrence, agencies are required to calibrate the HSM-default models for crash predictions. The manual suggests updating calibration factors every two to three years, or preferably on an annual basis. Given that the calibration process involves substantial time, effort, and resources, a comprehensive analysis of the required calibration factor update frequency is valuable to the agencies. Accordingly, the objective of this study is to evaluate the HSM's recommendation and determine the required frequency of calibration factor updates. A robust Bayesian estimation procedure is used to assess the variation between calibration factors computed annually, biennially, and triennially using data collected from over 2400 miles of segments and over 700 intersections on urban and suburban facilities in Florida. Bayesian model yields a posterior distribution of the model parameters that give credible information to infer whether the difference between calibration factors computed at specified intervals is credibly different from the null value which represents unaltered calibration factors between the comparison years or in other words, zero difference. The concept of the null value is extended to include the range of values that are practically equivalent to zero. Bayesian inference shows that calibration factors based on total crash frequency are required to be updated every two years in cases where the variations between calibration factors are not greater than 0.01. When the variations are between 0.01 and 0.05, calibration factors based on total crash frequency could be updated every three years. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bayesian population receptive field modelling.
Zeidman, Peter; Silson, Edward Harry; Schwarzkopf, Dietrich Samuel; Baker, Chris Ian; Penny, Will
2017-09-08
We introduce a probabilistic (Bayesian) framework and associated software toolbox for mapping population receptive fields (pRFs) based on fMRI data. This generic approach is intended to work with stimuli of any dimension and is demonstrated and validated in the context of 2D retinotopic mapping. The framework enables the experimenter to specify generative (encoding) models of fMRI timeseries, in which experimental stimuli enter a pRF model of neural activity, which in turns drives a nonlinear model of neurovascular coupling and Blood Oxygenation Level Dependent (BOLD) response. The neuronal and haemodynamic parameters are estimated together on a voxel-by-voxel or region-of-interest basis using a Bayesian estimation algorithm (variational Laplace). This offers several novel contributions to receptive field modelling. The variance/covariance of parameters are estimated, enabling receptive fields to be plotted while properly representing uncertainty about pRF size and location. Variability in the haemodynamic response across the brain is accounted for. Furthermore, the framework introduces formal hypothesis testing to pRF analysis, enabling competing models to be evaluated based on their log model evidence (approximated by the variational free energy), which represents the optimal tradeoff between accuracy and complexity. Using simulations and empirical data, we found that parameters typically used to represent pRF size and neuronal scaling are strongly correlated, which is taken into account by the Bayesian methods we describe when making inferences. We used the framework to compare the evidence for six variants of pRF model using 7 T functional MRI data and we found a circular Difference of Gaussians (DoG) model to be the best explanation for our data overall. We hope this framework will prove useful for mapping stimulus spaces with any number of dimensions onto the anatomy of the brain. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Evaluating Mixture Modeling for Clustering: Recommendations and Cautions
ERIC Educational Resources Information Center
Steinley, Douglas; Brusco, Michael J.
2011-01-01
This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magdison,…
Evidence-based Controls for Epidemics Using Spatio-temporal Stochastic Model as a Bayesian Framwork
USDA-ARS?s Scientific Manuscript database
The control of highly infectious diseases of agricultural and plantation crops and livestock represents a key challenge in epidemiological and ecological modelling, with implemented control strategies often being controversial. Mathematical models, including the spatio-temporal stochastic models con...
A novel approach for pilot error detection using Dynamic Bayesian Networks.
Saada, Mohamad; Meng, Qinggang; Huang, Tingwen
2014-06-01
In the last decade Dynamic Bayesian Networks (DBNs) have become one type of the most attractive probabilistic modelling framework extensions of Bayesian Networks (BNs) for working under uncertainties from a temporal perspective. Despite this popularity not many researchers have attempted to study the use of these networks in anomaly detection or the implications of data anomalies on the outcome of such models. An abnormal change in the modelled environment's data at a given time, will cause a trailing chain effect on data of all related environment variables in current and consecutive time slices. Albeit this effect fades with time, it still can have an ill effect on the outcome of such models. In this paper we propose an algorithm for pilot error detection, using DBNs as the modelling framework for learning and detecting anomalous data. We base our experiments on the actions of an aircraft pilot, and a flight simulator is created for running the experiments. The proposed anomaly detection algorithm has achieved good results in detecting pilot errors and effects on the whole system.
Bayesian flood forecasting methods: A review
NASA Astrophysics Data System (ADS)
Han, Shasha; Coulibaly, Paulin
2017-08-01
Over the past few decades, floods have been seen as one of the most common and largely distributed natural disasters in the world. If floods could be accurately forecasted in advance, then their negative impacts could be greatly minimized. It is widely recognized that quantification and reduction of uncertainty associated with the hydrologic forecast is of great importance for flood estimation and rational decision making. Bayesian forecasting system (BFS) offers an ideal theoretic framework for uncertainty quantification that can be developed for probabilistic flood forecasting via any deterministic hydrologic model. It provides suitable theoretical structure, empirically validated models and reasonable analytic-numerical computation method, and can be developed into various Bayesian forecasting approaches. This paper presents a comprehensive review on Bayesian forecasting approaches applied in flood forecasting from 1999 till now. The review starts with an overview of fundamentals of BFS and recent advances in BFS, followed with BFS application in river stage forecasting and real-time flood forecasting, then move to a critical analysis by evaluating advantages and limitations of Bayesian forecasting methods and other predictive uncertainty assessment approaches in flood forecasting, and finally discusses the future research direction in Bayesian flood forecasting. Results show that the Bayesian flood forecasting approach is an effective and advanced way for flood estimation, it considers all sources of uncertainties and produces a predictive distribution of the river stage, river discharge or runoff, thus gives more accurate and reliable flood forecasts. Some emerging Bayesian forecasting methods (e.g. ensemble Bayesian forecasting system, Bayesian multi-model combination) were shown to overcome limitations of single model or fixed model weight and effectively reduce predictive uncertainty. In recent years, various Bayesian flood forecasting approaches have been developed and widely applied, but there is still room for improvements. Future research in the context of Bayesian flood forecasting should be on assimilation of various sources of newly available information and improvement of predictive performance assessment methods.
Bayesian modeling of flexible cognitive control
Jiang, Jiefeng; Heller, Katherine; Egner, Tobias
2014-01-01
“Cognitive control” describes endogenous guidance of behavior in situations where routine stimulus-response associations are suboptimal for achieving a desired goal. The computational and neural mechanisms underlying this capacity remain poorly understood. We examine recent advances stemming from the application of a Bayesian learner perspective that provides optimal prediction for control processes. In reviewing the application of Bayesian models to cognitive control, we note that an important limitation in current models is a lack of a plausible mechanism for the flexible adjustment of control over conflict levels changing at varying temporal scales. We then show that flexible cognitive control can be achieved by a Bayesian model with a volatility-driven learning mechanism that modulates dynamically the relative dependence on recent and remote experiences in its prediction of future control demand. We conclude that the emergent Bayesian perspective on computational mechanisms of cognitive control holds considerable promise, especially if future studies can identify neural substrates of the variables encoded by these models, and determine the nature (Bayesian or otherwise) of their neural implementation. PMID:24929218
Bayesian generalized linear mixed modeling of Tuberculosis using informative priors.
Ojo, Oluwatobi Blessing; Lougue, Siaka; Woldegerima, Woldegebriel Assefa
2017-01-01
TB is rated as one of the world's deadliest diseases and South Africa ranks 9th out of the 22 countries with hardest hit of TB. Although many pieces of research have been carried out on this subject, this paper steps further by inculcating past knowledge into the model, using Bayesian approach with informative prior. Bayesian statistics approach is getting popular in data analyses. But, most applications of Bayesian inference technique are limited to situations of non-informative prior, where there is no solid external information about the distribution of the parameter of interest. The main aim of this study is to profile people living with TB in South Africa. In this paper, identical regression models are fitted for classical and Bayesian approach both with non-informative and informative prior, using South Africa General Household Survey (GHS) data for the year 2014. For the Bayesian model with informative prior, South Africa General Household Survey dataset for the year 2011 to 2013 are used to set up priors for the model 2014.
Bayesian statistics in medicine: a 25 year review.
Ashby, Deborah
2006-11-15
This review examines the state of Bayesian thinking as Statistics in Medicine was launched in 1982, reflecting particularly on its applicability and uses in medical research. It then looks at each subsequent five-year epoch, with a focus on papers appearing in Statistics in Medicine, putting these in the context of major developments in Bayesian thinking and computation with reference to important books, landmark meetings and seminal papers. It charts the growth of Bayesian statistics as it is applied to medicine and makes predictions for the future. From sparse beginnings, where Bayesian statistics was barely mentioned, Bayesian statistics has now permeated all the major areas of medical statistics, including clinical trials, epidemiology, meta-analyses and evidence synthesis, spatial modelling, longitudinal modelling, survival modelling, molecular genetics and decision-making in respect of new technologies.
Probabilistic Damage Characterization Using the Computationally-Efficient Bayesian Approach
NASA Technical Reports Server (NTRS)
Warner, James E.; Hochhalter, Jacob D.
2016-01-01
This work presents a computationally-ecient approach for damage determination that quanti es uncertainty in the provided diagnosis. Given strain sensor data that are polluted with measurement errors, Bayesian inference is used to estimate the location, size, and orientation of damage. This approach uses Bayes' Theorem to combine any prior knowledge an analyst may have about the nature of the damage with information provided implicitly by the strain sensor data to form a posterior probability distribution over possible damage states. The unknown damage parameters are then estimated based on samples drawn numerically from this distribution using a Markov Chain Monte Carlo (MCMC) sampling algorithm. Several modi cations are made to the traditional Bayesian inference approach to provide signi cant computational speedup. First, an ecient surrogate model is constructed using sparse grid interpolation to replace a costly nite element model that must otherwise be evaluated for each sample drawn with MCMC. Next, the standard Bayesian posterior distribution is modi ed using a weighted likelihood formulation, which is shown to improve the convergence of the sampling process. Finally, a robust MCMC algorithm, Delayed Rejection Adaptive Metropolis (DRAM), is adopted to sample the probability distribution more eciently. Numerical examples demonstrate that the proposed framework e ectively provides damage estimates with uncertainty quanti cation and can yield orders of magnitude speedup over standard Bayesian approaches.
Dediu, Dan
2009-08-07
The recent Bayesian approaches to language evolution and change seem to suggest that genetic biases can impact on the characteristics of language, but, at the same time, that its cultural transmission can partially free it from these same genetic constraints. One of the current debates centres on the striking differences between sampling and a posteriori maximising Bayesian learners, with the first converging on the prior bias while the latter allows a certain freedom to language evolution. The present paper shows that this difference disappears if populations more complex than a single teacher and a single learner are considered, with the resulting behaviours more similar to the sampler. This suggests that generalisations based on the language produced by Bayesian agents in such homogeneous single agent chains are not warranted. It is not clear which of the assumptions in such models are responsible, but these findings seem to support the rising concerns on the validity of the "acquisitionist" assumption, whereby the locus of language change and evolution is taken to be the first language acquirers (children) as opposed to the competent language users (the adults).
Odegård, J; Jensen, J; Madsen, P; Gianola, D; Klemetsdal, G; Heringstad, B
2003-11-01
The distribution of somatic cell scores could be regarded as a mixture of at least two components depending on a cow's udder health status. A heteroscedastic two-component Bayesian normal mixture model with random effects was developed and implemented via Gibbs sampling. The model was evaluated using datasets consisting of simulated somatic cell score records. Somatic cell score was simulated as a mixture representing two alternative udder health statuses ("healthy" or "diseased"). Animals were assigned randomly to the two components according to the probability of group membership (Pm). Random effects (additive genetic and permanent environment), when included, had identical distributions across mixture components. Posterior probabilities of putative mastitis were estimated for all observations, and model adequacy was evaluated using measures of sensitivity, specificity, and posterior probability of misclassification. Fitting different residual variances in the two mixture components caused some bias in estimation of parameters. When the components were difficult to disentangle, so were their residual variances, causing bias in estimation of Pm and of location parameters of the two underlying distributions. When all variance components were identical across mixture components, the mixture model analyses returned parameter estimates essentially without bias and with a high degree of precision. Including random effects in the model increased the probability of correct classification substantially. No sizable differences in probability of correct classification were found between models in which a single cow effect (ignoring relationships) was fitted and models where this effect was split into genetic and permanent environmental components, utilizing relationship information. When genetic and permanent environmental effects were fitted, the between-replicate variance of estimates of posterior means was smaller because the model accounted for random genetic drift.
Optimal inference with suboptimal models: Addiction and active Bayesian inference
Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl
2015-01-01
When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321
Bayesian Parameter Inference and Model Selection by Population Annealing in Systems Biology
Murakami, Yohei
2014-01-01
Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. Especially, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods needs to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific value of parameter with high credibility as the representative value of the distribution. To overcome the problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that population annealing can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with un-identifiability of the representative values of parameters, we proposed to run the simulations with the parameter ensemble sampled from the posterior distribution, named “posterior parameter ensemble”. We showed that population annealing is an efficient and convenient algorithm to generate posterior parameter ensemble. We also showed that the simulations with the posterior parameter ensemble can, not only reproduce the data used for parameter inference, but also capture and predict the data which was not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and conduct model selection depending on the Bayes factor. PMID:25089832
Modelling maximum river flow by using Bayesian Markov Chain Monte Carlo
NASA Astrophysics Data System (ADS)
Cheong, R. Y.; Gabda, D.
2017-09-01
Analysis of flood trends is vital since flooding threatens human living in terms of financial, environment and security. The data of annual maximum river flows in Sabah were fitted into generalized extreme value (GEV) distribution. Maximum likelihood estimator (MLE) raised naturally when working with GEV distribution. However, previous researches showed that MLE provide unstable results especially in small sample size. In this study, we used different Bayesian Markov Chain Monte Carlo (MCMC) based on Metropolis-Hastings algorithm to estimate GEV parameters. Bayesian MCMC method is a statistical inference which studies the parameter estimation by using posterior distribution based on Bayes’ theorem. Metropolis-Hastings algorithm is used to overcome the high dimensional state space faced in Monte Carlo method. This approach also considers more uncertainty in parameter estimation which then presents a better prediction on maximum river flow in Sabah.
Learning Negotiation Policies Using IB3 and Bayesian Networks
NASA Astrophysics Data System (ADS)
Nalepa, Gislaine M.; Ávila, Bráulio C.; Enembreck, Fabrício; Scalabrin, Edson E.
This paper presents an intelligent offer policy in a negotiation environment, in which each agent involved learns the preferences of its opponent in order to improve its own performance. Each agent must also be able to detect drifts in the opponent's preferences so as to quickly adjust itself to their new offer policy. For this purpose, two simple learning techniques were first evaluated: (i) based on instances (IB3) and (ii) based on Bayesian Networks. Additionally, as its known that in theory group learning produces better results than individual/single learning, the efficiency of IB3 and Bayesian classifier groups were also analyzed. Finally, each decision model was evaluated in moments of concept drift, being the drift gradual, moderate or abrupt. Results showed that both groups of classifiers were able to effectively detect drifts in the opponent's preferences.
Dorazio, R.M.; Johnson, F.A.
2003-01-01
Bayesian inference and decision theory may be used in the solution of relatively complex problems of natural resource management, owing to recent advances in statistical theory and computing. In particular, Markov chain Monte Carlo algorithms provide a computational framework for fitting models of adequate complexity and for evaluating the expected consequences of alternative management actions. We illustrate these features using an example based on management of waterfowl habitat.
Degen, Bernd; Blanc-Jolivet, Céline; Stierand, Katrin; Gillet, Elizabeth
2017-03-01
During the past decade, the use of DNA for forensic applications has been extensively implemented for plant and animal species, as well as in humans. Tracing back the geographical origin of an individual usually requires genetic assignment analysis. These approaches are based on reference samples that are grouped into populations or other aggregates and intend to identify the most likely group of origin. Often this grouping does not have a biological but rather a historical or political justification, such as "country of origin". In this paper, we present a new nearest neighbour approach to individual assignment or classification within a given but potentially imperfect grouping of reference samples. This method, which is based on the genetic distance between individuals, functions better in many cases than commonly used methods. We demonstrate the operation of our assignment method using two data sets. One set is simulated for a large number of trees distributed in a 120km by 120km landscape with individual genotypes at 150 SNPs, and the other set comprises experimental data of 1221 individuals of the African tropical tree species Entandrophragma cylindricum (Sapelli) genotyped at 61 SNPs. Judging by the level of correct self-assignment, our approach outperformed the commonly used frequency and Bayesian approaches by 15% for the simulated data set and by 5-7% for the Sapelli data set. Our new approach is less sensitive to overlapping sources of genetic differentiation, such as genetic differences among closely-related species, phylogeographic lineages and isolation by distance, and thus operates better even for suboptimal grouping of individuals. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Model selection with multiple regression on distance matrices leads to incorrect inferences.
Franckowiak, Ryan P; Panasci, Michael; Jarvis, Karl J; Acuña-Rodriguez, Ian S; Landguth, Erin L; Fortin, Marie-Josée; Wagner, Helene H
2017-01-01
In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM) to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC), its small-sample correction (AICc), and the Bayesian information criterion (BIC) to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems effectively increased with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and different sum-of-squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.
Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara
2017-01-01
In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most of common models to analyze such complex longitudinal data are based on mean-regression, which fails to provide efficient estimates due to outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish a Bayesian joint models that accounts for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.
Hierarchical Bayesian Logistic Regression to forecast metabolic control in type 2 DM patients.
Dagliati, Arianna; Malovini, Alberto; Decata, Pasquale; Cogni, Giulia; Teliti, Marsida; Sacchi, Lucia; Cerra, Carlo; Chiovato, Luca; Bellazzi, Riccardo
2016-01-01
In this work we present our efforts in building a model able to forecast patients' changes in clinical conditions when repeated measurements are available. In this case the available risk calculators are typically not applicable. We propose a Hierarchical Bayesian Logistic Regression model, which allows taking into account individual and population variability in model parameters estimate. The model is used to predict metabolic control and its variation in type 2 diabetes mellitus. In particular we have analyzed a population of more than 1000 Italian type 2 diabetic patients, collected within the European project Mosaic. The results obtained in terms of Matthews Correlation Coefficient are significantly better than the ones gathered with standard logistic regression model, based on data pooling.
Kling, Daniel; Egeland, Thore; Mostad, Petter
2012-01-01
In a number of applications there is a need to determine the most likely pedigree for a group of persons based on genetic markers. Adequate models are needed to reach this goal. The markers used to perform the statistical calculations can be linked and there may also be linkage disequilibrium (LD) in the population. The purpose of this paper is to present a graphical Bayesian Network framework to deal with such data. Potential LD is normally ignored and it is important to verify that the resulting calculations are not biased. Even if linkage does not influence results for regular paternity cases, it may have substantial impact on likelihood ratios involving other, more extended pedigrees. Models for LD influence likelihoods for all pedigrees to some degree and an initial estimate of the impact of ignoring LD and/or linkage is desirable, going beyond mere rules of thumb based on marker distance. Furthermore, we show how one can readily include a mutation model in the Bayesian Network; extending other programs or formulas to include such models may require considerable amounts of work and will in many case not be practical. As an example, we consider the two STR markers vWa and D12S391. We estimate probabilities for population haplotypes to account for LD using a method based on data from trios, while an estimate for the degree of linkage is taken from the literature. The results show that accounting for haplotype frequencies is unnecessary in most cases for this specific pair of markers. When doing calculations on regular paternity cases, the markers can be considered statistically independent. In more complex cases of disputed relatedness, for instance cases involving siblings or so-called deficient cases, or when small differences in the LR matter, independence should not be assumed. (The networks are freely available at http://arken.umb.no/~dakl/BayesianNetworks.) PMID:22984448
Semiparametric modeling: Correcting low-dimensional model error in parametric models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berry, Tyrus, E-mail: thb11@psu.edu; Harlim, John, E-mail: jharlim@psu.edu; Department of Meteorology, the Pennsylvania State University, 503 Walker Building, University Park, PA 16802-5013
2016-03-01
In this paper, a semiparametric modeling approach is introduced as a paradigm for addressing model error arising from unresolved physical phenomena. Our approach compensates for model error by learning an auxiliary dynamical model for the unknown parameters. Practically, the proposed approach consists of the following steps. Given a physics-based model and a noisy data set of historical observations, a Bayesian filtering algorithm is used to extract a time-series of the parameter values. Subsequently, the diffusion forecast algorithm is applied to the retrieved time-series in order to construct the auxiliary model for the time evolving parameters. The semiparametric forecasting algorithm consistsmore » of integrating the existing physics-based model with an ensemble of parameters sampled from the probability density function of the diffusion forecast. To specify initial conditions for the diffusion forecast, a Bayesian semiparametric filtering method that extends the Kalman-based filtering framework is introduced. In difficult test examples, which introduce chaotically and stochastically evolving hidden parameters into the Lorenz-96 model, we show that our approach can effectively compensate for model error, with forecasting skill comparable to that of the perfect model.« less
Staatz, Christine E; Tett, Susan E
2011-12-01
This review seeks to summarize the available data about Bayesian estimation of area under the plasma concentration-time curve (AUC) and dosage prediction for mycophenolic acid (MPA) and evaluate whether sufficient evidence is available for routine use of Bayesian dosage prediction in clinical practice. A literature search identified 14 studies that assessed the predictive performance of maximum a posteriori Bayesian estimation of MPA AUC and one report that retrospectively evaluated how closely dosage recommendations based on Bayesian forecasting achieved targeted MPA exposure. Studies to date have mostly been undertaken in renal transplant recipients, with limited investigation in patients treated with MPA for autoimmune disease or haematopoietic stem cell transplantation. All of these studies have involved use of the mycophenolate mofetil (MMF) formulation of MPA, rather than the enteric-coated mycophenolate sodium (EC-MPS) formulation. Bias associated with estimation of MPA AUC using Bayesian forecasting was generally less than 10%. However some difficulties with imprecision was evident, with values ranging from 4% to 34% (based on estimation involving two or more concentration measurements). Evaluation of whether MPA dosing decisions based on Bayesian forecasting (by the free website service https://pharmaco.chu-limoges.fr) achieved target drug exposure has only been undertaken once. When MMF dosage recommendations were applied by clinicians, a higher proportion (72-80%) of subsequent estimated MPA AUC values were within the 30-60 mg · h/L target range, compared with when dosage recommendations were not followed (only 39-57% within target range). Such findings provide evidence that Bayesian dosage prediction is clinically useful for achieving target MPA AUC. This study, however, was retrospective and focussed only on adult renal transplant recipients. Furthermore, in this study, Bayesian-generated AUC estimations and dosage predictions were not compared with a later full measured AUC but rather with a further AUC estimate based on a second Bayesian analysis. This study also provided some evidence that a useful monitoring schedule for MPA AUC following adult renal transplant would be every 2 weeks during the first month post-transplant, every 1-3 months between months 1 and 12, and each year thereafter. It will be interesting to see further validations in different patient groups using the free website service. In summary, the predictive performance of Bayesian estimation of MPA, comparing estimated with measured AUC values, has been reported in several studies. However, the next step of predicting dosages based on these Bayesian-estimated AUCs, and prospectively determining how closely these predicted dosages give drug exposure matching targeted AUCs, remains largely unaddressed. Further prospective studies are required, particularly in non-renal transplant patients and with the EC-MPS formulation. Other important questions remain to be answered, such as: do Bayesian forecasting methods devised to date use the best population pharmacokinetic models or most accurate algorithms; are the methods simple to use for routine clinical practice; do the algorithms actually improve dosage estimations beyond empirical recommendations in all groups that receive MPA therapy; and, importantly, do the dosage predictions, when followed, improve patient health outcomes?
Law, Jane
2016-01-01
Intrinsic conditional autoregressive modeling in a Bayeisan hierarchical framework has been increasingly applied in small-area ecological studies. This study explores the specifications of spatial structure in this Bayesian framework in two aspects: adjacency, i.e., the set of neighbor(s) for each area; and (spatial) weight for each pair of neighbors. Our analysis was based on a small-area study of falling injuries among people age 65 and older in Ontario, Canada, that was aimed to estimate risks and identify risk factors of such falls. In the case study, we observed incorrect adjacencies information caused by deficiencies in the digital map itself. Further, when equal weights was replaced by weights based on a variable of expected count, the range of estimated risks increased, the number of areas with probability of estimated risk greater than one at different probability thresholds increased, and model fit improved. More importantly, significance of a risk factor diminished. Further research to thoroughly investigate different methods of variable weights; quantify the influence of specifications of spatial weights; and develop strategies for better defining spatial structure of a map in small-area analysis in Bayesian hierarchical spatial modeling is recommended. PMID:29546147
2017-09-01
efficacy of statistical post-processing methods downstream of these dynamical model components with a hierarchical multivariate Bayesian approach to...Bayesian hierarchical modeling, Markov chain Monte Carlo methods , Metropolis algorithm, machine learning, atmospheric prediction 15. NUMBER OF PAGES...scale processes. However, this dissertation explores the efficacy of statistical post-processing methods downstream of these dynamical model components
Bayesian Learning and the Psychology of Rule Induction
ERIC Educational Resources Information Center
Endress, Ansgar D.
2013-01-01
In recent years, Bayesian learning models have been applied to an increasing variety of domains. While such models have been criticized on theoretical grounds, the underlying assumptions and predictions are rarely made concrete and tested experimentally. Here, I use Frank and Tenenbaum's (2011) Bayesian model of rule-learning as a case study to…
Properties of the Bayesian Knowledge Tracing Model
ERIC Educational Resources Information Center
van de Sande, Brett
2013-01-01
Bayesian Knowledge Tracing is used very widely to model student learning. It comes in two different forms: The first form is the Bayesian Knowledge Tracing "hidden Markov model" which predicts the probability of correct application of a skill as a function of the number of previous opportunities to apply that skill and the model…
Bayesian Analysis of Longitudinal Data Using Growth Curve Models
ERIC Educational Resources Information Center
Zhang, Zhiyong; Hamagami, Fumiaki; Wang, Lijuan Lijuan; Nesselroade, John R.; Grimm, Kevin J.
2007-01-01
Bayesian methods for analyzing longitudinal data in social and behavioral research are recommended for their ability to incorporate prior information in estimating simple and complex models. We first summarize the basics of Bayesian methods before presenting an empirical example in which we fit a latent basis growth curve model to achievement data…
Testing students' e-learning via Facebook through Bayesian structural equation modeling.
Salarzadeh Jenatabadi, Hashem; Moghavvemi, Sedigheh; Wan Mohamed Radzi, Che Wan Jasimah Bt; Babashamsi, Parastoo; Arashi, Mohammad
2017-01-01
Learning is an intentional activity, with several factors affecting students' intention to use new learning technology. Researchers have investigated technology acceptance in different contexts by developing various theories/models and testing them by a number of means. Although most theories/models developed have been examined through regression or structural equation modeling, Bayesian analysis offers more accurate data analysis results. To address this gap, the unified theory of acceptance and technology use in the context of e-learning via Facebook are re-examined in this study using Bayesian analysis. The data (S1 Data) were collected from 170 students enrolled in a business statistics course at University of Malaya, Malaysia, and tested with the maximum likelihood and Bayesian approaches. The difference between the two methods' results indicates that performance expectancy and hedonic motivation are the strongest factors influencing the intention to use e-learning via Facebook. The Bayesian estimation model exhibited better data fit than the maximum likelihood estimator model. The results of the Bayesian and maximum likelihood estimator approaches are compared and the reasons for the result discrepancy are deliberated.
Testing students’ e-learning via Facebook through Bayesian structural equation modeling
Moghavvemi, Sedigheh; Wan Mohamed Radzi, Che Wan Jasimah Bt; Babashamsi, Parastoo; Arashi, Mohammad
2017-01-01
Learning is an intentional activity, with several factors affecting students’ intention to use new learning technology. Researchers have investigated technology acceptance in different contexts by developing various theories/models and testing them by a number of means. Although most theories/models developed have been examined through regression or structural equation modeling, Bayesian analysis offers more accurate data analysis results. To address this gap, the unified theory of acceptance and technology use in the context of e-learning via Facebook are re-examined in this study using Bayesian analysis. The data (S1 Data) were collected from 170 students enrolled in a business statistics course at University of Malaya, Malaysia, and tested with the maximum likelihood and Bayesian approaches. The difference between the two methods’ results indicates that performance expectancy and hedonic motivation are the strongest factors influencing the intention to use e-learning via Facebook. The Bayesian estimation model exhibited better data fit than the maximum likelihood estimator model. The results of the Bayesian and maximum likelihood estimator approaches are compared and the reasons for the result discrepancy are deliberated. PMID:28886019
Bayesian analysis of rare events
DOE Office of Scientific and Technical Information (OSTI.GOV)
Straub, Daniel, E-mail: straub@tum.de; Papaioannou, Iason; Betz, Wolfgang
2016-06-01
In many areas of engineering and science there is an interest in predicting the probability of rare events, in particular in applications related to safety and security. Increasingly, such predictions are made through computer models of physical systems in an uncertainty quantification framework. Additionally, with advances in IT, monitoring and sensor technology, an increasing amount of data on the performance of the systems is collected. This data can be used to reduce uncertainty, improve the probability estimates and consequently enhance the management of rare events and associated risks. Bayesian analysis is the ideal method to include the data into themore » probabilistic model. It ensures a consistent probabilistic treatment of uncertainty, which is central in the prediction of rare events, where extrapolation from the domain of observation is common. We present a framework for performing Bayesian updating of rare event probabilities, termed BUS. It is based on a reinterpretation of the classical rejection-sampling approach to Bayesian analysis, which enables the use of established methods for estimating probabilities of rare events. By drawing upon these methods, the framework makes use of their computational efficiency. These methods include the First-Order Reliability Method (FORM), tailored importance sampling (IS) methods and Subset Simulation (SuS). In this contribution, we briefly review these methods in the context of the BUS framework and investigate their applicability to Bayesian analysis of rare events in different settings. We find that, for some applications, FORM can be highly efficient and is surprisingly accurate, enabling Bayesian analysis of rare events with just a few model evaluations. In a general setting, BUS implemented through IS and SuS is more robust and flexible.« less
A Flexible Hierarchical Bayesian Modeling Technique for Risk Analysis of Major Accidents.
Yu, Hongyang; Khan, Faisal; Veitch, Brian
2017-09-01
Safety analysis of rare events with potentially catastrophic consequences is challenged by data scarcity and uncertainty. Traditional causation-based approaches, such as fault tree and event tree (used to model rare event), suffer from a number of weaknesses. These include the static structure of the event causation, lack of event occurrence data, and need for reliable prior information. In this study, a new hierarchical Bayesian modeling based technique is proposed to overcome these drawbacks. The proposed technique can be used as a flexible technique for risk analysis of major accidents. It enables both forward and backward analysis in quantitative reasoning and the treatment of interdependence among the model parameters. Source-to-source variability in data sources is also taken into account through a robust probabilistic safety analysis. The applicability of the proposed technique has been demonstrated through a case study in marine and offshore industry. © 2017 Society for Risk Analysis.
Bayesian Integration of Information in Hippocampal Place Cells
Madl, Tamas; Franklin, Stan; Chen, Ke; Montaldi, Daniela; Trappl, Robert
2014-01-01
Accurate spatial localization requires a mechanism that corrects for errors, which might arise from inaccurate sensory information or neuronal noise. In this paper, we propose that Hippocampal place cells might implement such an error correction mechanism by integrating different sources of information in an approximately Bayes-optimal fashion. We compare the predictions of our model with physiological data from rats. Our results suggest that useful predictions regarding the firing fields of place cells can be made based on a single underlying principle, Bayesian cue integration, and that such predictions are possible using a remarkably small number of model parameters. PMID:24603429
A Prior for Neural Networks utilizing Enclosing Spheres for Normalization
NASA Astrophysics Data System (ADS)
v. Toussaint, U.; Gori, S.; Dose, V.
2004-11-01
Neural Networks are famous for their advantageous flexibility for problems when there is insufficient knowledge to set up a proper model. On the other hand this flexibility can cause over-fitting and can hamper the generalization properties of neural networks. Many approaches to regularize NN have been suggested but most of them based on ad-hoc arguments. Employing the principle of transformation invariance we derive a general prior in accordance with the Bayesian probability theory for a class of feedforward networks. Optimal networks are determined by Bayesian model comparison verifying the applicability of this approach.
Zhang, J L; Li, Y P; Huang, G H; Baetz, B W; Liu, J
2017-06-01
In this study, a Bayesian estimation-based simulation-optimization modeling approach (BESMA) is developed for identifying effluent trading strategies. BESMA incorporates nutrient fate modeling with soil and water assessment tool (SWAT), Bayesian estimation, and probabilistic-possibilistic interval programming with fuzzy random coefficients (PPI-FRC) within a general framework. Based on the water quality protocols provided by SWAT, posterior distributions of parameters can be analyzed through Bayesian estimation; stochastic characteristic of nutrient loading can be investigated which provides the inputs for the decision making. PPI-FRC can address multiple uncertainties in the form of intervals with fuzzy random boundaries and the associated system risk through incorporating the concept of possibility and necessity measures. The possibility and necessity measures are suitable for optimistic and pessimistic decision making, respectively. BESMA is applied to a real case of effluent trading planning in the Xiangxihe watershed, China. A number of decision alternatives can be obtained under different trading ratios and treatment rates. The results can not only facilitate identification of optimal effluent-trading schemes, but also gain insight into the effects of trading ratio and treatment rate on decision making. The results also reveal that decision maker's preference towards risk would affect decision alternatives on trading scheme as well as system benefit. Compared with the conventional optimization methods, it is proved that BESMA is advantageous in (i) dealing with multiple uncertainties associated with randomness and fuzziness in effluent-trading planning within a multi-source, multi-reach and multi-period context; (ii) reflecting uncertainties existing in nutrient transport behaviors to improve the accuracy in water quality prediction; and (iii) supporting pessimistic and optimistic decision making for effluent trading as well as promoting diversity of decision alternatives. Copyright © 2017 Elsevier Ltd. All rights reserved.
Physiologically based pharmacokinetic (PBPK) models are compartmental models that describe the uptake and distribution of drugs and chemicals throughout the body. They can be structured so that model parameters (i.e., physiological and chemical-specific) reflect biological charac...
Learning abstract visual concepts via probabilistic program induction in a Language of Thought.
Overlan, Matthew C; Jacobs, Robert A; Piantadosi, Steven T
2017-11-01
The ability to learn abstract concepts is a powerful component of human cognition. It has been argued that variable binding is the key element enabling this ability, but the computational aspects of variable binding remain poorly understood. Here, we address this shortcoming by formalizing the Hierarchical Language of Thought (HLOT) model of rule learning. Given a set of data items, the model uses Bayesian inference to infer a probability distribution over stochastic programs that implement variable binding. Because the model makes use of symbolic variables as well as Bayesian inference and programs with stochastic primitives, it combines many of the advantages of both symbolic and statistical approaches to cognitive modeling. To evaluate the model, we conducted an experiment in which human subjects viewed training items and then judged which test items belong to the same concept as the training items. We found that the HLOT model provides a close match to human generalization patterns, significantly outperforming two variants of the Generalized Context Model, one variant based on string similarity and the other based on visual similarity using features from a deep convolutional neural network. Additional results suggest that variable binding happens automatically, implying that binding operations do not add complexity to peoples' hypothesized rules. Overall, this work demonstrates that a cognitive model combining symbolic variables with Bayesian inference and stochastic program primitives provides a new perspective for understanding people's patterns of generalization. Copyright © 2017 Elsevier B.V. All rights reserved.
Effects of Ventral Striatum Lesions on Stimulus-Based versus Action-Based Reinforcement Learning.
Rothenhoefer, Kathryn M; Costa, Vincent D; Bartolo, Ramón; Vicario-Feliciano, Raquel; Murray, Elisabeth A; Averbeck, Bruno B
2017-07-19
Learning the values of actions versus stimuli may depend on separable neural circuits. In the current study, we evaluated the performance of rhesus macaques with ventral striatum (VS) lesions on a two-arm bandit task that had randomly interleaved blocks of stimulus-based and action-based reinforcement learning (RL). Compared with controls, monkeys with VS lesions had deficits in learning to select rewarding images but not rewarding actions. We used a RL model to quantify learning and choice consistency and found that, in stimulus-based RL, the VS lesion monkeys were more influenced by negative feedback and had lower choice consistency than controls. Using a Bayesian model to parse the groups' learning strategies, we also found that VS lesion monkeys defaulted to an action-based choice strategy. Therefore, the VS is involved specifically in learning the value of stimuli, not actions. SIGNIFICANCE STATEMENT Reinforcement learning models of the ventral striatum (VS) often assume that it maintains an estimate of state value. This suggests that it plays a general role in learning whether rewards are assigned based on a chosen action or stimulus. In the present experiment, we examined the effects of VS lesions on monkeys' ability to learn that choosing a particular action or stimulus was more likely to lead to reward. We found that VS lesions caused a specific deficit in the monkeys' ability to discriminate between images with different values, whereas their ability to discriminate between actions with different values remained intact. Our results therefore suggest that the VS plays a specific role in learning to select rewarded stimuli. Copyright © 2017 the authors 0270-6474/17/376902-13$15.00/0.
Salvi, Daniele; Macali, Armando; Mariottini, Paolo
2014-01-01
The bivalve family Ostreidae has a worldwide distribution and includes species of high economic importance. Phylogenetics and systematic of oysters based on morphology have proved difficult because of their high phenotypic plasticity. In this study we explore the phylogenetic information of the DNA sequence and secondary structure of the nuclear, fast-evolving, ITS2 rRNA and the mitochondrial 16S rRNA genes from the Ostreidae and we implemented a multi-locus framework based on four loci for oyster phylogenetics and systematics. Sequence-structure rRNA models aid sequence alignment and improved accuracy and nodal support of phylogenetic trees. In agreement with previous molecular studies, our phylogenetic results indicate that none of the currently recognized subfamilies, Crassostreinae, Ostreinae, and Lophinae, is monophyletic. Single gene trees based on Maximum likelihood (ML) and Bayesian (BA) methods and on sequence-structure ML were congruent with multilocus trees based on a concatenated (ML and BA) and coalescent based (BA) approaches and consistently supported three main clades: (i) Crassostrea, (ii) Saccostrea, and (iii) an Ostreinae-Lophinae lineage. Therefore, the subfamily Crassotreinae (including Crassostrea), Saccostreinae subfam. nov. (including Saccostrea and tentatively Striostrea) and Ostreinae (including Ostreinae and Lophinae taxa) are recognized. Based on phylogenetic and biogeographical evidence the Asian species of Crassostrea from the Pacific Ocean are assigned to Magallana gen. nov., whereas an integrative taxonomic revision is required for the genera Ostrea and Dendostrea. This study pointed out the suitability of the ITS2 marker for DNA barcoding of oyster and the relevance of using sequence-structure rRNA models and features of the ITS2 folding in molecular phylogenetics and taxonomy. The multilocus approach allowed inferring a robust phylogeny of Ostreidae providing a broad molecular perspective on their systematics. PMID:25250663
Salvi, Daniele; Macali, Armando; Mariottini, Paolo
2014-01-01
The bivalve family Ostreidae has a worldwide distribution and includes species of high economic importance. Phylogenetics and systematic of oysters based on morphology have proved difficult because of their high phenotypic plasticity. In this study we explore the phylogenetic information of the DNA sequence and secondary structure of the nuclear, fast-evolving, ITS2 rRNA and the mitochondrial 16S rRNA genes from the Ostreidae and we implemented a multi-locus framework based on four loci for oyster phylogenetics and systematics. Sequence-structure rRNA models aid sequence alignment and improved accuracy and nodal support of phylogenetic trees. In agreement with previous molecular studies, our phylogenetic results indicate that none of the currently recognized subfamilies, Crassostreinae, Ostreinae, and Lophinae, is monophyletic. Single gene trees based on Maximum likelihood (ML) and Bayesian (BA) methods and on sequence-structure ML were congruent with multilocus trees based on a concatenated (ML and BA) and coalescent based (BA) approaches and consistently supported three main clades: (i) Crassostrea, (ii) Saccostrea, and (iii) an Ostreinae-Lophinae lineage. Therefore, the subfamily Crassostreinae (including Crassostrea), Saccostreinae subfam. nov. (including Saccostrea and tentatively Striostrea) and Ostreinae (including Ostreinae and Lophinae taxa) are recognized [corrected]. Based on phylogenetic and biogeographical evidence the Asian species of Crassostrea from the Pacific Ocean are assigned to Magallana gen. nov., whereas an integrative taxonomic revision is required for the genera Ostrea and Dendostrea. This study pointed out the suitability of the ITS2 marker for DNA barcoding of oyster and the relevance of using sequence-structure rRNA models and features of the ITS2 folding in molecular phylogenetics and taxonomy. The multilocus approach allowed inferring a robust phylogeny of Ostreidae providing a broad molecular perspective on their systematics.
Probabilistic models in human sensorimotor control
Wolpert, Daniel M.
2009-01-01
Sensory and motor uncertainty form a fundamental constraint on human sensorimotor control. Bayesian decision theory (BDT) has emerged as a unifying framework to understand how the central nervous system performs optimal estimation and control in the face of such uncertainty. BDT has two components: Bayesian statistics and decision theory. Here we review Bayesian statistics and show how it applies to estimating the state of the world and our own body. Recent results suggest that when learning novel tasks we are able to learn the statistical properties of both the world and our own sensory apparatus so as to perform estimation using Bayesian statistics. We review studies which suggest that humans can combine multiple sources of information to form maximum likelihood estimates, can incorporate prior beliefs about possible states of the world so as to generate maximum a posteriori estimates and can use Kalman filter-based processes to estimate time-varying states. Finally, we review Bayesian decision theory in motor control and how the central nervous system processes errors to determine loss functions and optimal actions. We review results that suggest we plan movements based on statistics of our actions that result from signal-dependent noise on our motor outputs. Taken together these studies provide a statistical framework for how the motor system performs in the presence of uncertainty. PMID:17628731
Inference of epidemiological parameters from household stratified data
Walker, James N.; Ross, Joshua V.
2017-01-01
We consider a continuous-time Markov chain model of SIR disease dynamics with two levels of mixing. For this so-called stochastic households model, we provide two methods for inferring the model parameters—governing within-household transmission, recovery, and between-household transmission—from data of the day upon which each individual became infectious and the household in which each infection occurred, as might be available from First Few Hundred studies. Each method is a form of Bayesian Markov Chain Monte Carlo that allows us to calculate a joint posterior distribution for all parameters and hence the household reproduction number and the early growth rate of the epidemic. The first method performs exact Bayesian inference using a standard data-augmentation approach; the second performs approximate Bayesian inference based on a likelihood approximation derived from branching processes. These methods are compared for computational efficiency and posteriors from each are compared. The branching process is shown to be a good approximation and remains computationally efficient as the amount of data is increased. PMID:29045456