Sample records for model-based bayesian clustering

  1. Evaluating Mixture Modeling for Clustering: Recommendations and Cautions

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2011-01-01

    This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magdison,…

  2. Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

    ERIC Educational Resources Information Center

    Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2008-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…

  3. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data

    PubMed Central

    Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways, multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The later three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances. PMID:26689369

  4. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data.

    PubMed

    Tian, Ting; McLachlan, Geoffrey J; Dieters, Mark J; Basford, Kaye E

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways, multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The later three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances.

  5. Robust Bayesian clustering.

    PubMed

    Archambeau, Cédric; Verleysen, Michel

    2007-01-01

    A new variational Bayesian learning algorithm for Student-t mixture models is introduced. This algorithm leads to (i) robust density estimation, (ii) robust clustering and (iii) robust automatic model selection. Gaussian mixture models are learning machines which are based on a divide-and-conquer approach. They are commonly used for density estimation and clustering tasks, but are sensitive to outliers. The Student-t distribution has heavier tails than the Gaussian distribution and is therefore less sensitive to any departure of the empirical distribution from Gaussianity. As a consequence, the Student-t distribution is suitable for constructing robust mixture models. In this work, we formalize the Bayesian Student-t mixture model as a latent variable model in a different way from Svensén and Bishop [Svensén, M., & Bishop, C. M. (2005). Robust Bayesian mixture modelling. Neurocomputing, 64, 235-252]. The main difference resides in the fact that it is not necessary to assume a factorized approximation of the posterior distribution on the latent indicator variables and the latent scale variables in order to obtain a tractable solution. Not neglecting the correlations between these unobserved random variables leads to a Bayesian model having an increased robustness. Furthermore, it is expected that the lower bound on the log-evidence is tighter. Based on this bound, the model complexity, i.e. the number of components in the mixture, can be inferred with a higher confidence.

  6. Model-based Clustering of Categorical Time Series with Multinomial Logit Classification

    NASA Astrophysics Data System (ADS)

    Frühwirth-Schnatter, Sylvia; Pamminger, Christoph; Winter-Ebmer, Rudolf; Weber, Andrea

    2010-09-01

    A common problem in many areas of applied statistics is to identify groups of similar time series in a panel of time series. However, distance-based clustering methods cannot easily be extended to time series data, where an appropriate distance-measure is rather difficult to define, particularly for discrete-valued time series. Markov chain clustering, proposed by Pamminger and Frühwirth-Schnatter [6], is an approach for clustering discrete-valued time series obtained by observing a categorical variable with several states. This model-based clustering method is based on finite mixtures of first-order time-homogeneous Markov chain models. In order to further explain group membership we present an extension to the approach of Pamminger and Frühwirth-Schnatter [6] by formulating a probabilistic model for the latent group indicators within the Bayesian classification rule by using a multinomial logit model. The parameters are estimated for a fixed number of clusters within a Bayesian framework using an Markov chain Monte Carlo (MCMC) sampling scheme representing a (full) Gibbs-type sampler which involves only draws from standard distributions. Finally, an application to a panel of Austrian wage mobility data is presented which leads to an interesting segmentation of the Austrian labour market.

  7. Comparison of Bayesian clustering and edge detection methods for inferring boundaries in landscape genetics

    USGS Publications Warehouse

    Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.

    2011-01-01

    Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated where a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness to detect genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. ?? 2011 by the authors; licensee MDPI, Basel, Switzerland.

  8. Learning Bayesian Networks from Correlated Data

    NASA Astrophysics Data System (ADS)

    Bae, Harold; Monti, Stefano; Montano, Monty; Steinberg, Martin H.; Perls, Thomas T.; Sebastiani, Paola

    2016-05-01

    Bayesian networks are probabilistic models that represent complex distributions in a modular way and have become very popular in many fields. There are many methods to build Bayesian networks from a random sample of independent and identically distributed observations. However, many observational studies are designed using some form of clustered sampling that introduces correlations between observations within the same cluster and ignoring this correlation typically inflates the rate of false positive associations. We describe a novel parameterization of Bayesian networks that uses random effects to model the correlation within sample units and can be used for structure and parameter learning from correlated data without inflating the Type I error rate. We compare different learning metrics using simulations and illustrate the method in two real examples: an analysis of genetic and non-genetic factors associated with human longevity from a family-based study, and an example of risk factors for complications of sickle cell anemia from a longitudinal study with repeated measures.

  9. Buried landmine detection using multivariate normal clustering

    NASA Astrophysics Data System (ADS)

    Duston, Brian M.

    2001-10-01

    A Bayesian classification algorithm is presented for discriminating buried land mines from buried and surface clutter in Ground Penetrating Radar (GPR) signals. This algorithm is based on multivariate normal (MVN) clustering, where feature vectors are used to identify populations (clusters) of mines and clutter objects. The features are extracted from two-dimensional images created from ground penetrating radar scans. MVN clustering is used to determine the number of clusters in the data and to create probability density models for target and clutter populations, producing the MVN clustering classifier (MVNCC). The Bayesian Information Criteria (BIC) is used to evaluate each model to determine the number of clusters in the data. An extension of the MVNCC allows the model to adapt to local clutter distributions by treating each of the MVN cluster components as a Poisson process and adaptively estimating the intensity parameters. The algorithm is developed using data collected by the Mine Hunter/Killer Close-In Detector (MH/K CID) at prepared mine lanes. The Mine Hunter/Killer is a prototype mine detecting and neutralizing vehicle developed for the U.S. Army to clear roads of anti-tank mines.

  10. Spatial cluster detection using dynamic programming.

    PubMed

    Sverchkov, Yuriy; Jiang, Xia; Cooper, Gregory F

    2012-03-25

    The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm.

  11. Spatial cluster detection using dynamic programming

    PubMed Central

    2012-01-01

    Background The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. Methods We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. Conclusions We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm. PMID:22443103

  12. On selecting a prior for the precision parameter of Dirichlet process mixture models

    USGS Publications Warehouse

    Dorazio, R.M.

    2009-01-01

    In hierarchical mixture models the Dirichlet process is used to specify latent patterns of heterogeneity, particularly when the distribution of latent parameters is thought to be clustered (multimodal). The parameters of a Dirichlet process include a precision parameter ?? and a base probability measure G0. In problems where ?? is unknown and must be estimated, inferences about the level of clustering can be sensitive to the choice of prior assumed for ??. In this paper an approach is developed for computing a prior for the precision parameter ?? that can be used in the presence or absence of prior information about the level of clustering. This approach is illustrated in an analysis of counts of stream fishes. The results of this fully Bayesian analysis are compared with an empirical Bayes analysis of the same data and with a Bayesian analysis based on an alternative commonly used prior.

  13. Species-richness of the Anopheles annulipes Complex (Diptera: Culicidae) Revealed by Tree and Model-Based Allozyme Clustering Analyses

    DTIC Science & Technology

    2007-01-01

    including tree- based methods such as the unweighted pair group method of analysis ( UPGMA ) and Neighbour-joining (NJ) (Saitou & Nei, 1987). By...based Bayesian approach and the tree-based UPGMA and NJ cluster- ing methods. The results obtained suggest that far more species occur in the An...unlikely that groups that differ by more than these levels are conspecific. Genetic distances were clustered using the UPGMA and NJ algorithms in MEGA

  14. Bayesian hierarchical models for cost-effectiveness analyses that use data from cluster randomized trials.

    PubMed

    Grieve, Richard; Nixon, Richard; Thompson, Simon G

    2010-01-01

    Cost-effectiveness analyses (CEA) may be undertaken alongside cluster randomized trials (CRTs) where randomization is at the level of the cluster (for example, the hospital or primary care provider) rather than the individual. Costs (and outcomes) within clusters may be correlated so that the assumption made by standard bivariate regression models, that observations are independent, is incorrect. This study develops a flexible modeling framework to acknowledge the clustering in CEA that use CRTs. The authors extend previous Bayesian bivariate models for CEA of multicenter trials to recognize the specific form of clustering in CRTs. They develop new Bayesian hierarchical models (BHMs) that allow mean costs and outcomes, and also variances, to differ across clusters. They illustrate how each model can be applied using data from a large (1732 cases, 70 primary care providers) CRT evaluating alternative interventions for reducing postnatal depression. The analyses compare cost-effectiveness estimates from BHMs with standard bivariate regression models that ignore the data hierarchy. The BHMs show high levels of cost heterogeneity across clusters (intracluster correlation coefficient, 0.17). Compared with standard regression models, the BHMs yield substantially increased uncertainty surrounding the cost-effectiveness estimates, and altered point estimates. The authors conclude that ignoring clustering can lead to incorrect inferences. The BHMs that they present offer a flexible modeling framework that can be applied more generally to CEA that use CRTs.

  15. Modular analysis of the probabilistic genetic interaction network.

    PubMed

    Hou, Lin; Wang, Lin; Qian, Minping; Li, Dong; Tang, Chao; Zhu, Yunping; Deng, Minghua; Li, Fangting

    2011-03-15

    Epistatic Miniarray Profiles (EMAP) has enabled the mapping of large-scale genetic interaction networks; however, the quantitative information gained from EMAP cannot be fully exploited since the data are usually interpreted as a discrete network based on an arbitrary hard threshold. To address such limitations, we adopted a mixture modeling procedure to construct a probabilistic genetic interaction network and then implemented a Bayesian approach to identify densely interacting modules in the probabilistic network. Mixture modeling has been demonstrated as an effective soft-threshold technique of EMAP measures. The Bayesian approach was applied to an EMAP dataset studying the early secretory pathway in Saccharomyces cerevisiae. Twenty-seven modules were identified, and 14 of those were enriched by gold standard functional gene sets. We also conducted a detailed comparison with state-of-the-art algorithms, hierarchical cluster and Markov clustering. The experimental results show that the Bayesian approach outperforms others in efficiently recovering biologically significant modules.

  16. Verification of Bayesian Clustering in Travel Behaviour Research – First Step to Macroanalysis of Travel Behaviour

    NASA Astrophysics Data System (ADS)

    Satra, P.; Carsky, J.

    2018-04-01

    Our research is looking at the travel behaviour from a macroscopic view, taking one municipality as a basic unit. The travel behaviour of one municipality as a whole is becoming one piece of a data in the research of travel behaviour of a larger area, perhaps a country. A data pre-processing is used to cluster the municipalities in groups, which show similarities in their travel behaviour. Such groups can be then researched for reasons of their prevailing pattern of travel behaviour without any distortion caused by municipalities with a different pattern. This paper deals with actual settings of the clustering process, which is based on Bayesian statistics, particularly the mixture model. An optimization of the settings parameters based on correlation of pointer model parameters and relative number of data in clusters is helpful, however not fully reliable method. Thus, method for graphic representation of clusters needs to be developed in order to check their quality. A training of the setting parameters in 2D has proven to be a beneficial method, because it allows visual control of the produced clusters. The clustering better be applied on separate groups of municipalities, where competition of only identical transport modes can be found.

  17. A Bayesian hierarchical model for mortality data from cluster-sampling household surveys in humanitarian crises.

    PubMed

    Heudtlass, Peter; Guha-Sapir, Debarati; Speybroeck, Niko

    2018-05-31

    The crude death rate (CDR) is one of the defining indicators of humanitarian emergencies. When data from vital registration systems are not available, it is common practice to estimate the CDR from household surveys with cluster-sampling design. However, sample sizes are often too small to compare mortality estimates to emergency thresholds, at least in a frequentist framework. Several authors have proposed Bayesian methods for health surveys in humanitarian crises. Here, we develop an approach specifically for mortality data and cluster-sampling surveys. We describe a Bayesian hierarchical Poisson-Gamma mixture model with generic (weakly informative) priors that could be used as default in absence of any specific prior knowledge, and compare Bayesian and frequentist CDR estimates using five different mortality datasets. We provide an interpretation of the Bayesian estimates in the context of an emergency threshold and demonstrate how to interpret parameters at the cluster level and ways in which informative priors can be introduced. With the same set of weakly informative priors, Bayesian CDR estimates are equivalent to frequentist estimates, for all practical purposes. The probability that the CDR surpasses the emergency threshold can be derived directly from the posterior of the mean of the mixing distribution. All observation in the datasets contribute to the estimation of cluster-level estimates, through the hierarchical structure of the model. In a context of sparse data, Bayesian mortality assessments have advantages over frequentist ones already when using only weakly informative priors. More informative priors offer a formal and transparent way of combining new data with existing data and expert knowledge and can help to improve decision-making in humanitarian crises by complementing frequentist estimates.

  18. Bayesian multivariate hierarchical transformation models for ROC analysis.

    PubMed

    O'Malley, A James; Zou, Kelly H

    2006-02-15

    A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box-Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.

  19. Bayesian multivariate hierarchical transformation models for ROC analysis

    PubMed Central

    O'Malley, A. James; Zou, Kelly H.

    2006-01-01

    SUMMARY A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial. PMID:16217836

  20. Π4U: A high performance computing framework for Bayesian uncertainty quantification of complex models

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.

    2015-03-01

    We present Π4U, an extensible framework, for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models, that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.

  1. Bayesian Modeling of Temporal Coherence in Videos for Entity Discovery and Summarization.

    PubMed

    Mitra, Adway; Biswas, Soma; Bhattacharyya, Chiranjib

    2017-03-01

    A video is understood by users in terms of entities present in it. Entity Discovery is the task of building appearance model for each entity (e.g., a person), and finding all its occurrences in the video. We represent a video as a sequence of tracklets, each spanning 10-20 frames, and associated with one entity. We pose Entity Discovery as tracklet clustering, and approach it by leveraging Temporal Coherence (TC): the property that temporally neighboring tracklets are likely to be associated with the same entity. Our major contributions are the first Bayesian nonparametric models for TC at tracklet-level. We extend Chinese Restaurant Process (CRP) to TC-CRP, and further to Temporally Coherent Chinese Restaurant Franchise (TC-CRF) to jointly model entities and temporal segments using mixture components and sparse distributions. For discovering persons in TV serial videos without meta-data like scripts, these methods show considerable improvement over state-of-the-art approaches to tracklet clustering in terms of clustering accuracy, cluster purity and entity coverage. The proposed methods can perform online tracklet clustering on streaming videos unlike existing approaches, and can automatically reject false tracklets. Finally we discuss entity-driven video summarization- where temporal segments of the video are selected based on the discovered entities, to create a semantically meaningful summary.

  2. Detecting cancer clusters in a regional population with local cluster tests and Bayesian smoothing methods: a simulation study

    PubMed Central

    2013-01-01

    Background There is a rising public and political demand for prospective cancer cluster monitoring. But there is little empirical evidence on the performance of established cluster detection tests under conditions of small and heterogeneous sample sizes and varying spatial scales, such as are the case for most existing population-based cancer registries. Therefore this simulation study aims to evaluate different cluster detection methods, implemented in the open soure environment R, in their ability to identify clusters of lung cancer using real-life data from an epidemiological cancer registry in Germany. Methods Risk surfaces were constructed with two different spatial cluster types, representing a relative risk of RR = 2.0 or of RR = 4.0, in relation to the overall background incidence of lung cancer, separately for men and women. Lung cancer cases were sampled from this risk surface as geocodes using an inhomogeneous Poisson process. The realisations of the cancer cases were analysed within small spatial (census tracts, N = 1983) and within aggregated large spatial scales (communities, N = 78). Subsequently, they were submitted to the cluster detection methods. The test accuracy for cluster location was determined in terms of detection rates (DR), false-positive (FP) rates and positive predictive values. The Bayesian smoothing models were evaluated using ROC curves. Results With moderate risk increase (RR = 2.0), local cluster tests showed better DR (for both spatial aggregation scales > 0.90) and lower FP rates (both < 0.05) than the Bayesian smoothing methods. When the cluster RR was raised four-fold, the local cluster tests showed better DR with lower FPs only for the small spatial scale. At a large spatial scale, the Bayesian smoothing methods, especially those implementing a spatial neighbourhood, showed a substantially lower FP rate than the cluster tests. However, the risk increases at this scale were mostly diluted by data aggregation. Conclusion High resolution spatial scales seem more appropriate as data base for cancer cluster testing and monitoring than the commonly used aggregated scales. We suggest the development of a two-stage approach that combines methods with high detection rates as a first-line screening with methods of higher predictive ability at the second stage. PMID:24314148

  3. Astrostatistical Analysis in Solar and Stellar Physics

    NASA Astrophysics Data System (ADS)

    Stenning, David Craig

    This dissertation focuses on developing statistical models and methods to address data-analytic challenges in astrostatistics---a growing interdisciplinary field fostering collaborations between statisticians and astrophysicists. The astrostatistics projects we tackle can be divided into two main categories: modeling solar activity and Bayesian analysis of stellar evolution. These categories from Part I and Part II of this dissertation, respectively. The first line of research we pursue involves classification and modeling of evolving solar features. Advances in space-based observatories are increasing both the quality and quantity of solar data, primarily in the form of high-resolution images. To analyze massive streams of solar image data, we develop a science-driven dimension reduction methodology to extract scientifically meaningful features from images. This methodology utilizes mathematical morphology to produce a concise numerical summary of the magnetic flux distribution in solar "active regions'' that (i) is far easier to work with than the source images, (ii) encapsulates scientifically relevant information in a more informative manner than existing schemes (i.e., manual classification schemes), and (iii) is amenable to sophisticated statistical analyses. In a related line of research, we perform a Bayesian analysis of the solar cycle using multiple proxy variables, such as sunspot numbers. We take advantage of patterns and correlations among the proxy variables to model solar activity using data from proxies that have become available more recently, while also taking advantage of the long history of observations of sunspot numbers. This model is an extension of the Yu et al. (2012) Bayesian hierarchical model for the solar cycle that used the sunspot numbers alone. Since proxies have different temporal coverage, we devise a multiple imputation scheme to account for missing data. We find that incorporating multiple proxies reveals important features of the solar cycle that are missed when the model is fit using only the sunspot numbers. In Part II of this dissertation we focus on two related lines of research involving Bayesian analysis of stellar evolution. We first focus on modeling multiple stellar populations in star clusters. It has long been assumed that all star clusters are comprised of single stellar populations---stars that formed at roughly the same time from a common molecular cloud. However, recent studies have produced evidence that some clusters host multiple populations, which has far-reaching scientific implications. We develop a Bayesian hierarchical model for multiple-population star clusters, extending earlier statistical models of stellar evolution (e.g., van Dyk et al. 2009, Stein et al. 2013). We also devise an adaptive Markov chain Monte Carlo algorithm to explore the complex posterior distribution. We use numerical studies to demonstrate that our method can recover parameters of multiple-population clusters, and also show how model misspecification can be diagnosed. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We also explore statistical properties of the estimators and determine that the influence of the prior distribution does not diminish with larger sample sizes, leading to non-standard asymptotics. In a final line of research, we present the first-ever attempt to estimate the carbon fraction of white dwarfs. This quantity has important implications for both astrophysics and fundamental nuclear physics, but is currently unknown. We use a numerical study to demonstrate that assuming an incorrect value for the carbon fraction leads to incorrect white-dwarf ages of star clusters. Finally, we present our attempt to estimate the carbon fraction of the white dwarfs in the well-studied star cluster 47 Tucanae.

  4. A review and comparison of Bayesian and likelihood-based inferences in beta regression and zero-or-one-inflated beta regression.

    PubMed

    Liu, Fang; Eugenio, Evercita C

    2018-04-01

    Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review on beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing them with values close to zero/one via simulation studies; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.

  5. Bayesian Multi-Trait Analysis Reveals a Useful Tool to Increase Oil Concentration and to Decrease Toxicity in Jatropha curcas L.

    PubMed Central

    Silva Junqueira, Vinícius; de Azevedo Peixoto, Leonardo; Galvêas Laviola, Bruno; Lopes Bhering, Leonardo; Mendonça, Simone; Agostini Costa, Tania da Silveira; Antoniassi, Rosemar

    2016-01-01

    The biggest challenge for jatropha breeding is to identify superior genotypes that present high seed yield and seed oil content with reduced toxicity levels. Therefore, the objective of this study was to estimate genetic parameters for three important traits (weight of 100 seed, oil seed content, and phorbol ester concentration), and to select superior genotypes to be used as progenitors in jatropha breeding. Additionally, the genotypic values and the genetic parameters estimated under the Bayesian multi-trait approach were used to evaluate different selection indices scenarios of 179 half-sib families. Three different scenarios and economic weights were considered. It was possible to simultaneously reduce toxicity and increase seed oil content and weight of 100 seed by using index selection based on genotypic value estimated by the Bayesian multi-trait approach. Indeed, we identified two families that present these characteristics by evaluating genetic diversity using the Ward clustering method, which suggested nine homogenous clusters. Future researches must integrate the Bayesian multi-trait methods with realized relationship matrix, aiming to build accurate selection indices models. PMID:27281340

  6. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

    PubMed Central

    Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

    2017-01-01

    We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392

  7. Combining Computational Methods for Hit to Lead Optimization in Mycobacterium tuberculosis Drug Discovery

    PubMed Central

    Ekins, Sean; Freundlich, Joel S.; Hobrath, Judith V.; White, E. Lucile; Reynolds, Robert C

    2013-01-01

    Purpose Tuberculosis treatments need to be shorter and overcome drug resistance. Our previous large scale phenotypic high-throughput screening against Mycobacterium tuberculosis (Mtb) has identified 737 active compounds and thousands that are inactive. We have used this data for building computational models as an approach to minimize the number of compounds tested. Methods A cheminformatics clustering approach followed by Bayesian machine learning models (based on publicly available Mtb screening data) was used to illustrate that application of these models for screening set selections can enrich the hit rate. Results In order to explore chemical diversity around active cluster scaffolds of the dose-response hits obtained from our previous Mtb screens a set of 1924 commercially available molecules have been selected and evaluated for antitubercular activity and cytotoxicity using Vero, THP-1 and HepG2 cell lines with 4.3%, 4.2% and 2.7% hit rates, respectively. We demonstrate that models incorporating antitubercular and cytotoxicity data in Vero cells can significantly enrich the selection of non-toxic actives compared to random selection. Across all cell lines, the Molecular Libraries Small Molecule Repository (MLSMR) and cytotoxicity model identified ~10% of the hits in the top 1% screened (>10 fold enrichment). We also showed that seven out of nine Mtb active compounds from different academic published studies and eight out of eleven Mtb active compounds from a pharmaceutical screen (GSK) would have been identified by these Bayesian models. Conclusion Combining clustering and Bayesian models represents a useful strategy for compound prioritization and hit-to lead optimization of antitubercular agents. PMID:24132686

  8. Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model

    USGS Publications Warehouse

    Ellefsen, Karl J.; Smith, David

    2016-01-01

    Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.

  9. A Bayesian cluster analysis method for single-molecule localization microscopy data.

    PubMed

    Griffié, Juliette; Shannon, Michael; Bromley, Claire L; Boelen, Lies; Burn, Garth L; Williamson, David J; Heard, Nicholas A; Cope, Andrew P; Owen, Dylan M; Rubin-Delanchy, Patrick

    2016-12-01

    Cell function is regulated by the spatiotemporal organization of the signaling machinery, and a key facet of this is molecular clustering. Here, we present a protocol for the analysis of clustering in data generated by 2D single-molecule localization microscopy (SMLM)-for example, photoactivated localization microscopy (PALM) or stochastic optical reconstruction microscopy (STORM). Three features of such data can cause standard cluster analysis approaches to be ineffective: (i) the data take the form of a list of points rather than a pixel array; (ii) there is a non-negligible unclustered background density of points that must be accounted for; and (iii) each localization has an associated uncertainty in regard to its position. These issues are overcome using a Bayesian, model-based approach. Many possible cluster configurations are proposed and scored against a generative model, which assumes Gaussian clusters overlaid on a completely spatially random (CSR) background, before every point is scrambled by its localization precision. We present the process of generating simulated and experimental data that are suitable to our algorithm, the analysis itself, and the extraction and interpretation of key cluster descriptors such as the number of clusters, cluster radii and the number of localizations per cluster. Variations in these descriptors can be interpreted as arising from changes in the organization of the cellular nanoarchitecture. The protocol requires no specific programming ability, and the processing time for one data set, typically containing 30 regions of interest, is ∼18 h; user input takes ∼1 h.

  10. Latent structure modeling underlying theophylline tablet formulations using a Bayesian network based on a self-organizing map clustering.

    PubMed

    Yasuda, Akihito; Onuki, Yoshinori; Obata, Yasuko; Takayama, Kozo

    2015-01-01

    The "quality by design" concept in pharmaceutical formulation development requires the establishment of a science-based rationale and design space. In this article, we integrate thin-plate spline (TPS) interpolation, Kohonen's self-organizing map (SOM) and a Bayesian network (BN) to visualize the latent structure underlying causal factors and pharmaceutical responses. As a model pharmaceutical product, theophylline tablets were prepared using a standard formulation. We measured the tensile strength and disintegration time as response variables and the compressibility, cohesion and dispersibility of the pretableting blend as latent variables. We predicted these variables quantitatively using nonlinear TPS, generated a large amount of data on pretableting blends and tablets and clustered these data into several clusters using a SOM. Our results show that we are able to predict the experimental values of the latent and response variables with a high degree of accuracy and are able to classify the tablet data into several distinct clusters. In addition, to visualize the latent structure between the causal and latent factors and the response variables, we applied a BN method to the SOM clustering results. We found that despite having inserted latent variables between the causal factors and response variables, their relation is equivalent to the results for the SOM clustering, and thus we are able to explain the underlying latent structure. Consequently, this technique provides a better understanding of the relationships between causal factors and pharmaceutical responses in theophylline tablet formulation.

  11. SOMBI: Bayesian identification of parameter relations in unstructured cosmological data

    NASA Astrophysics Data System (ADS)

    Frank, Philipp; Jasche, Jens; Enßlin, Torsten A.

    2016-11-01

    This work describes the implementation and application of a correlation determination method based on self organizing maps and Bayesian inference (SOMBI). SOMBI aims to automatically identify relations between different observed parameters in unstructured cosmological or astrophysical surveys by automatically identifying data clusters in high-dimensional datasets via the self organizing map neural network algorithm. Parameter relations are then revealed by means of a Bayesian inference within respective identified data clusters. Specifically such relations are assumed to be parametrized as a polynomial of unknown order. The Bayesian approach results in a posterior probability distribution function for respective polynomial coefficients. To decide which polynomial order suffices to describe correlation structures in data, we include a method for model selection, the Bayesian information criterion, to the analysis. The performance of the SOMBI algorithm is tested with mock data. As illustration we also provide applications of our method to cosmological data. In particular, we present results of a correlation analysis between galaxy and active galactic nucleus (AGN) properties provided by the SDSS catalog with the cosmic large-scale-structure (LSS). The results indicate that the combined galaxy and LSS dataset indeed is clustered into several sub-samples of data with different average properties (for example different stellar masses or web-type classifications). The majority of data clusters appear to have a similar correlation structure between galaxy properties and the LSS. In particular we revealed a positive and linear dependency between the stellar mass, the absolute magnitude and the color of a galaxy with the corresponding cosmic density field. A remaining subset of data shows inverted correlations, which might be an artifact of non-linear redshift distortions.

  12. Prediction of community prevalence of human onchocerciasis in the Amazonian onchocerciasis focus: Bayesian approach.

    PubMed Central

    Carabin, Hélène; Escalona, Marisela; Marshall, Clare; Vivas-Martínez, Sarai; Botto, Carlos; Joseph, Lawrence; Basáñez, María-Gloria

    2003-01-01

    OBJECTIVE: To develop a Bayesian hierarchical model for human onchocerciasis with which to explore the factors that influence prevalence of microfilariae in the Amazonian focus of onchocerciasis and predict the probability of any community being at least mesoendemic (>20% prevalence of microfilariae), and thus in need of priority ivermectin treatment. METHODS: Models were developed with data from 732 individuals aged > or =15 years who lived in 29 Yanomami communities along four rivers of the south Venezuelan Orinoco basin. The models' abilities to predict prevalences of microfilariae in communities were compared. The deviance information criterion, Bayesian P-values, and residual values were used to select the best model with an approximate cross-validation procedure. FINDINGS: A three-level model that acknowledged clustering of infection within communities performed best, with host age and sex included at the individual level, a river-dependent altitude effect at the community level, and additional clustering of communities along rivers. This model correctly classified 25/29 (86%) villages with respect to their need for priority ivermectin treatment. CONCLUSION: Bayesian methods are a flexible and useful approach for public health research and control planning. Our model acknowledges the clustering of infection within communities, allows investigation of links between individual- or community-specific characteristics and infection, incorporates additional uncertainty due to missing covariate data, and informs policy decisions by predicting the probability that a new community is at least mesoendemic. PMID:12973640

  13. Prediction of community prevalence of human onchocerciasis in the Amazonian onchocerciasis focus: Bayesian approach.

    PubMed

    Carabin, Hélène; Escalona, Marisela; Marshall, Clare; Vivas-Martínez, Sarai; Botto, Carlos; Joseph, Lawrence; Basáñez, María-Gloria

    2003-01-01

    To develop a Bayesian hierarchical model for human onchocerciasis with which to explore the factors that influence prevalence of microfilariae in the Amazonian focus of onchocerciasis and predict the probability of any community being at least mesoendemic (>20% prevalence of microfilariae), and thus in need of priority ivermectin treatment. Models were developed with data from 732 individuals aged > or =15 years who lived in 29 Yanomami communities along four rivers of the south Venezuelan Orinoco basin. The models' abilities to predict prevalences of microfilariae in communities were compared. The deviance information criterion, Bayesian P-values, and residual values were used to select the best model with an approximate cross-validation procedure. A three-level model that acknowledged clustering of infection within communities performed best, with host age and sex included at the individual level, a river-dependent altitude effect at the community level, and additional clustering of communities along rivers. This model correctly classified 25/29 (86%) villages with respect to their need for priority ivermectin treatment. Bayesian methods are a flexible and useful approach for public health research and control planning. Our model acknowledges the clustering of infection within communities, allows investigation of links between individual- or community-specific characteristics and infection, incorporates additional uncertainty due to missing covariate data, and informs policy decisions by predicting the probability that a new community is at least mesoendemic.

  14. Finding Groups Using Model-based Cluster Analysis: Heterogeneous Emotional Self-regulatory Processes and Heavy Alcohol Use Risk

    PubMed Central

    Mun, Eun-Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2010-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of non-nested models using the Bayesian Information Criterion (BIC) to compare multiple models and identify the optimum number of clusters. The current study clustered 36 young men and women based on their baseline heart rate (HR) and HR variability (HRV), chronic alcohol use, and reasons for drinking. Two cluster groups were identified and labeled High Alcohol Risk and Normative groups. Compared to the Normative group, individuals in the High Alcohol Risk group had higher levels of alcohol use and more strongly endorsed disinhibition and suppression reasons for use. The High Alcohol Risk group showed significant HRV changes in response to positive and negative emotional and appetitive picture cues, compared to neutral cues. In contrast, the Normative group showed a significant HRV change only to negative cues. Findings suggest that the individuals with autonomic self-regulatory difficulties may be more susceptible to heavy alcohol use and use alcohol for emotional regulation. PMID:18331138

  15. Bayesian Decision Theoretical Framework for Clustering

    ERIC Educational Resources Information Center

    Chen, Mo

    2011-01-01

    In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the…

  16. Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures.

    PubMed

    Austin, Peter C

    2010-04-22

    Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.

  17. Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering

    DTIC Science & Technology

    2005-08-04

    describe a four-band magnetic resonance image (MRI) consisting of 23,712 pixels of a brain with a tumor 2. Because of the size of the dataset, it is not...the Royal Statistical Society, Series B 56, 363–375. Figueiredo, M. A. T. and A. K. Jain (2002). Unsupervised learning of finite mixture models. IEEE...20 5.4 Brain MRI

  18. Hierarchical Bayesian modeling of heterogeneous variances in average daily weight gain of commercial feedlot cattle.

    PubMed

    Cernicchiaro, N; Renter, D G; Xiang, S; White, B J; Bello, N M

    2013-06-01

    Variability in ADG of feedlot cattle can affect profits, thus making overall returns more unstable. Hence, knowledge of the factors that contribute to heterogeneity of variances in animal performance can help feedlot managers evaluate risks and minimize profit volatility when making managerial and economic decisions in commercial feedlots. The objectives of the present study were to evaluate heteroskedasticity, defined as heterogeneity of variances, in ADG of cohorts of commercial feedlot cattle, and to identify cattle demographic factors at feedlot arrival as potential sources of variance heterogeneity, accounting for cohort- and feedlot-level information in the data structure. An operational dataset compiled from 24,050 cohorts from 25 U. S. commercial feedlots in 2005 and 2006 was used for this study. Inference was based on a hierarchical Bayesian model implemented with Markov chain Monte Carlo, whereby cohorts were modeled at the residual level and feedlot-year clusters were modeled as random effects. Forward model selection based on deviance information criteria was used to screen potentially important explanatory variables for heteroskedasticity at cohort- and feedlot-year levels. The Bayesian modeling framework was preferred as it naturally accommodates the inherently hierarchical structure of feedlot data whereby cohorts are nested within feedlot-year clusters. Evidence for heterogeneity of variance components of ADG was substantial and primarily concentrated at the cohort level. Feedlot-year specific effects were, by far, the greatest contributors to ADG heteroskedasticity among cohorts, with an estimated ∼12-fold change in dispersion between most and least extreme feedlot-year clusters. In addition, identifiable demographic factors associated with greater heterogeneity of cohort-level variance included smaller cohort sizes, fewer days on feed, and greater arrival BW, as well as feedlot arrival during summer months. These results support that heterogeneity of variances in ADG is prevalent in feedlot performance and indicate potential sources of heteroskedasticity. Further investigation of factors associated with heteroskedasticity in feedlot performance is warranted to increase consistency and uniformity in commercial beef cattle production and subsequent profitability.

  19. Bayesian analysis of volcanic eruptions

    NASA Astrophysics Data System (ADS)

    Ho, Chih-Hsiang

    1990-10-01

    The simple Poisson model generally gives a good fit to many volcanoes for volcanic eruption forecasting. Nonetheless, empirical evidence suggests that volcanic activity in successive equal time-periods tends to be more variable than a simple Poisson with constant eruptive rate. An alternative model is therefore examined in which eruptive rate(λ) for a given volcano or cluster(s) of volcanoes is described by a gamma distribution (prior) rather than treated as a constant value as in the assumptions of a simple Poisson model. Bayesian analysis is performed to link two distributions together to give the aggregate behavior of the volcanic activity. When the Poisson process is expanded to accomodate a gamma mixing distribution on λ, a consequence of this mixed (or compound) Poisson model is that the frequency distribution of eruptions in any given time-period of equal length follows the negative binomial distribution (NBD). Applications of the proposed model and comparisons between the generalized model and simple Poisson model are discussed based on the historical eruptive count data of volcanoes Mauna Loa (Hawaii) and Etna (Italy). Several relevant facts lead to the conclusion that the generalized model is preferable for practical use both in space and time.

  20. Decentralized cooperative TOA/AOA target tracking for hierarchical wireless sensor networks.

    PubMed

    Chen, Ying-Chih; Wen, Chih-Yu

    2012-11-08

    This paper proposes a distributed method for cooperative target tracking in hierarchical wireless sensor networks. The concept of leader-based information processing is conducted to achieve object positioning, considering a cluster-based network topology. Random timers and local information are applied to adaptively select a sub-cluster for the localization task. The proposed energy-efficient tracking algorithm allows each sub-cluster member to locally estimate the target position with a Bayesian filtering framework and a neural networking model, and further performs estimation fusion in the leader node with the covariance intersection algorithm. This paper evaluates the merits and trade-offs of the protocol design towards developing more efficient and practical algorithms for object position estimation.

  1. A practical Bayesian stepped wedge design for community-based cluster-randomized clinical trials: The British Columbia Telehealth Trial.

    PubMed

    Cunanan, Kristen M; Carlin, Bradley P; Peterson, Kevin A

    2016-12-01

    Many clinical trial designs are impractical for community-based clinical intervention trials. Stepped wedge trial designs provide practical advantages, but few descriptions exist of their clinical implementational features, statistical design efficiencies, and limitations. Enhance efficiency of stepped wedge trial designs by evaluating the impact of design characteristics on statistical power for the British Columbia Telehealth Trial. The British Columbia Telehealth Trial is a community-based, cluster-randomized, controlled clinical trial in rural and urban British Columbia. To determine the effect of an Internet-based telehealth intervention on healthcare utilization, 1000 subjects with an existing diagnosis of congestive heart failure or type 2 diabetes will be enrolled from 50 clinical practices. Hospital utilization is measured using a composite of disease-specific hospital admissions and emergency visits. The intervention comprises online telehealth data collection and counseling provided to support a disease-specific action plan developed by the primary care provider. The planned intervention is sequentially introduced across all participating practices. We adopt a fully Bayesian, Markov chain Monte Carlo-driven statistical approach, wherein we use simulation to determine the effect of cluster size, sample size, and crossover interval choice on type I error and power to evaluate differences in hospital utilization. For our Bayesian stepped wedge trial design, simulations suggest moderate decreases in power when crossover intervals from control to intervention are reduced from every 3 to 2 weeks, and dramatic decreases in power as the numbers of clusters decrease. Power and type I error performance were not notably affected by the addition of nonzero cluster effects or a temporal trend in hospitalization intensity. Stepped wedge trial designs that intervene in small clusters across longer periods can provide enhanced power to evaluate comparative effectiveness, while offering practical implementation advantages in geographic stratification, temporal change, use of existing data, and resource distribution. Current population estimates were used; however, models may not reflect actual event rates during the trial. In addition, temporal or spatial heterogeneity can bias treatment effect estimates. © The Author(s) 2016.

  2. Simultaneous Force Regression and Movement Classification of Fingers via Surface EMG within a Unified Bayesian Framework.

    PubMed

    Baldacchino, Tara; Jacobs, William R; Anderson, Sean R; Worden, Keith; Rowson, Jennifer

    2018-01-01

    This contribution presents a novel methodology for myolectric-based control using surface electromyographic (sEMG) signals recorded during finger movements. A multivariate Bayesian mixture of experts (MoE) model is introduced which provides a powerful method for modeling force regression at the fingertips, while also performing finger movement classification as a by-product of the modeling algorithm. Bayesian inference of the model allows uncertainties to be naturally incorporated into the model structure. This method is tested using data from the publicly released NinaPro database which consists of sEMG recordings for 6 degree-of-freedom force activations for 40 intact subjects. The results demonstrate that the MoE model achieves similar performance compared to the benchmark set by the authors of NinaPro for finger force regression. Additionally, inherent to the Bayesian framework is the inclusion of uncertainty in the model parameters, naturally providing confidence bounds on the force regression predictions. Furthermore, the integrated clustering step allows a detailed investigation into classification of the finger movements, without incurring any extra computational effort. Subsequently, a systematic approach to assessing the importance of the number of electrodes needed for accurate control is performed via sensitivity analysis techniques. A slight degradation in regression performance is observed for a reduced number of electrodes, while classification performance is unaffected.

  3. CLUSTERING SOUTH AFRICAN HOUSEHOLDS BASED ON THEIR ASSET STATUS USING LATENT VARIABLE MODELS

    PubMed Central

    McParland, Damien; Gormley, Isobel Claire; McCormick, Tyler H.; Clark, Samuel J.; Kabudula, Chodziwadziwa Whiteson; Collinson, Mark A.

    2014-01-01

    The Agincourt Health and Demographic Surveillance System has since 2001 conducted a biannual household asset survey in order to quantify household socio-economic status (SES) in a rural population living in northeast South Africa. The survey contains binary, ordinal and nominal items. In the absence of income or expenditure data, the SES landscape in the study population is explored and described by clustering the households into homogeneous groups based on their asset status. A model-based approach to clustering the Agincourt households, based on latent variable models, is proposed. In the case of modeling binary or ordinal items, item response theory models are employed. For nominal survey items, a factor analysis model, similar in nature to a multinomial probit model, is used. Both model types have an underlying latent variable structure—this similarity is exploited and the models are combined to produce a hybrid model capable of handling mixed data types. Further, a mixture of the hybrid models is considered to provide clustering capabilities within the context of mixed binary, ordinal and nominal response data. The proposed model is termed a mixture of factor analyzers for mixed data (MFA-MD). The MFA-MD model is applied to the survey data to cluster the Agincourt households into homogeneous groups. The model is estimated within the Bayesian paradigm, using a Markov chain Monte Carlo algorithm. Intuitive groupings result, providing insight to the different socio-economic strata within the Agincourt region. PMID:25485026

  4. The Gaia-ESO Survey: open clusters in Gaia-DR1 . A way forward to stellar age calibration

    NASA Astrophysics Data System (ADS)

    Randich, S.; Tognelli, E.; Jackson, R.; Jeffries, R. D.; Degl'Innocenti, S.; Pancino, E.; Re Fiorentin, P.; Spagna, A.; Sacco, G.; Bragaglia, A.; Magrini, L.; Prada Moroni, P. G.; Alfaro, E.; Franciosini, E.; Morbidelli, L.; Roccatagliata, V.; Bouy, H.; Bravi, L.; Jiménez-Esteban, F. M.; Jordi, C.; Zari, E.; Tautvaišiene, G.; Drazdauskas, A.; Mikolaitis, S.; Gilmore, G.; Feltzing, S.; Vallenari, A.; Bensby, T.; Koposov, S.; Korn, A.; Lanzafame, A.; Smiljanic, R.; Bayo, A.; Carraro, G.; Costado, M. T.; Heiter, U.; Hourihane, A.; Jofré, P.; Lewis, J.; Monaco, L.; Prisinzano, L.; Sbordone, L.; Sousa, S. G.; Worley, C. C.; Zaggia, S.

    2018-05-01

    Context. Determination and calibration of the ages of stars, which heavily rely on stellar evolutionary models, are very challenging, while representing a crucial aspect in many astrophysical areas. Aims: We describe the methodologies that, taking advantage of Gaia-DR1 and the Gaia-ESO Survey data, enable the comparison of observed open star cluster sequences with stellar evolutionary models. The final, long-term goal is the exploitation of open clusters as age calibrators. Methods: We perform a homogeneous analysis of eight open clusters using the Gaia-DR1 TGAS catalogue for bright members and information from the Gaia-ESO Survey for fainter stars. Cluster membership probabilities for the Gaia-ESO Survey targets are derived based on several spectroscopic tracers. The Gaia-ESO Survey also provides the cluster chemical composition. We obtain cluster parallaxes using two methods. The first one relies on the astrometric selection of a sample of bona fide members, while the other one fits the parallax distribution of a larger sample of TGAS sources. Ages and reddening values are recovered through a Bayesian analysis using the 2MASS magnitudes and three sets of standard models. Lithium depletion boundary (LDB) ages are also determined using literature observations and the same models employed for the Bayesian analysis. Results: For all but one cluster, parallaxes derived by us agree with those presented in Gaia Collaboration (2017, A&A, 601, A19), while a discrepancy is found for NGC 2516; we provide evidence supporting our own determination. Inferred cluster ages are robust against models and are generally consistent with literature values. Conclusions: The systematic parallax errors inherent in the Gaia DR1 data presently limit the precision of our results. Nevertheless, we have been able to place these eight clusters onto the same age scale for the first time, with good agreement between isochronal and LDB ages where there is overlap. Our approach appears promising and demonstrates the potential of combining Gaia and ground-based spectroscopic datasets. Based on observations collected with the FLAMES instrument at VLT/UT2 telescope (Paranal Observatory, ESO, Chile), for the Gaia-ESO Large Public Spectroscopic Survey (188.B-3002, 193.B-0936).Additional tables are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/612/A99

  5. A semiparametric Bayesian proportional hazards model for interval censored data with frailty effects.

    PubMed

    Henschel, Volkmar; Engel, Jutta; Hölzel, Dieter; Mansmann, Ulrich

    2009-02-10

    Multivariate analysis of interval censored event data based on classical likelihood methods is notoriously cumbersome. Likelihood inference for models which additionally include random effects are not available at all. Developed algorithms bear problems for practical users like: matrix inversion, slow convergence, no assessment of statistical uncertainty. MCMC procedures combined with imputation are used to implement hierarchical models for interval censored data within a Bayesian framework. Two examples from clinical practice demonstrate the handling of clustered interval censored event times as well as multilayer random effects for inter-institutional quality assessment. The software developed is called survBayes and is freely available at CRAN. The proposed software supports the solution of complex analyses in many fields of clinical epidemiology as well as health services research.

  6. Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.

    PubMed

    Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A

    2018-01-30

    Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  7. Poisson Mixture Regression Models for Heart Disease Prediction.

    PubMed

    Mufudza, Chipo; Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.

  8. Poisson Mixture Regression Models for Heart Disease Prediction

    PubMed Central

    Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611

  9. A two step Bayesian approach for genomic prediction of breeding values.

    PubMed

    Shariati, Mohammad M; Sørensen, Peter; Janss, Luc

    2012-05-21

    In genomic models that assign an individual variance to each marker, the contribution of one marker to the posterior distribution of the marker variance is only one degree of freedom (df), which introduces many variance parameters with only little information per variance parameter. A better alternative could be to form clusters of markers with similar effects where markers in a cluster have a common variance. Therefore, the influence of each marker group of size p on the posterior distribution of the marker variances will be p df. The simulated data from the 15th QTL-MAS workshop were analyzed such that SNP markers were ranked based on their effects and markers with similar estimated effects were grouped together. In step 1, all markers with minor allele frequency more than 0.01 were included in a SNP-BLUP prediction model. In step 2, markers were ranked based on their estimated variance on the trait in step 1 and each 150 markers were assigned to one group with a common variance. In further analyses, subsets of 1500 and 450 markers with largest effects in step 2 were kept in the prediction model. Grouping markers outperformed SNP-BLUP model in terms of accuracy of predicted breeding values. However, the accuracies of predicted breeding values were lower than Bayesian methods with marker specific variances. Grouping markers is less flexible than allowing each marker to have a specific marker variance but, by grouping, the power to estimate marker variances increases. A prior knowledge of the genetic architecture of the trait is necessary for clustering markers and appropriate prior parameterization.

  10. Joint model-based clustering of nonlinear longitudinal trajectories and associated time-to-event data analysis, linked by latent class membership: with application to AIDS clinical studies.

    PubMed

    Huang, Yangxin; Lu, Xiaosun; Chen, Jiaqing; Liang, Juan; Zangmeister, Miriam

    2017-10-27

    Longitudinal and time-to-event data are often observed together. Finite mixture models are currently used to analyze nonlinear heterogeneous longitudinal data, which, by releasing the homogeneity restriction of nonlinear mixed-effects (NLME) models, can cluster individuals into one of the pre-specified classes with class membership probabilities. This clustering may have clinical significance, and be associated with clinically important time-to-event data. This article develops a joint modeling approach to a finite mixture of NLME models for longitudinal data and proportional hazard Cox model for time-to-event data, linked by individual latent class indicators, under a Bayesian framework. The proposed joint models and method are applied to a real AIDS clinical trial data set, followed by simulation studies to assess the performance of the proposed joint model and a naive two-step model, in which finite mixture model and Cox model are fitted separately.

  11. Quantitative comparison of alternative methods for coarse-graining biological networks

    PubMed Central

    Bowman, Gregory R.; Meng, Luming; Huang, Xuhui

    2013-01-01

    Markov models and master equations are a powerful means of modeling dynamic processes like protein conformational changes. However, these models are often difficult to understand because of the enormous number of components and connections between them. Therefore, a variety of methods have been developed to facilitate understanding by coarse-graining these complex models. Here, we employ Bayesian model comparison to determine which of these coarse-graining methods provides the models that are most faithful to the original set of states. We find that the Bayesian agglomerative clustering engine and the hierarchical Nyström expansion graph (HNEG) typically provide the best performance. Surprisingly, the original Perron cluster cluster analysis (PCCA) method often provides the next best results, outperforming the newer PCCA+ method and the most probable paths algorithm. We also show that the differences between the models are qualitatively significant, rather than being minor shifts in the boundaries between states. The performance of the methods correlates well with the entropy of the resulting coarse-grainings, suggesting that finding states with more similar populations (i.e., avoiding low population states that may just be noise) gives better results. PMID:24089717

  12. Interactive classification and content-based retrieval of tissue images

    NASA Astrophysics Data System (ADS)

    Aksoy, Selim; Marchisio, Giovanni B.; Tusk, Carsten; Koperski, Krzysztof

    2002-11-01

    We describe a system for interactive classification and retrieval of microscopic tissue images. Our system models tissues in pixel, region and image levels. Pixel level features are generated using unsupervised clustering of color and texture values. Region level features include shape information and statistics of pixel level feature values. Image level features include statistics and spatial relationships of regions. To reduce the gap between low-level features and high-level expert knowledge, we define the concept of prototype regions. The system learns the prototype regions in an image collection using model-based clustering and density estimation. Different tissue types are modeled using spatial relationships of these regions. Spatial relationships are represented by fuzzy membership functions. The system automatically selects significant relationships from training data and builds models which can also be updated using user relevance feedback. A Bayesian framework is used to classify tissues based on these models. Preliminary experiments show that the spatial relationship models we developed provide a flexible and powerful framework for classification and retrieval of tissue images.

  13. Clustering and Bayesian hierarchical modeling for the definition of informative prior distributions in hydrogeology

    NASA Astrophysics Data System (ADS)

    Cucchi, K.; Kawa, N.; Hesse, F.; Rubin, Y.

    2017-12-01

    In order to reduce uncertainty in the prediction of subsurface flow and transport processes, practitioners should use all data available. However, classic inverse modeling frameworks typically only make use of information contained in in-situ field measurements to provide estimates of hydrogeological parameters. Such hydrogeological information about an aquifer is difficult and costly to acquire. In this data-scarce context, the transfer of ex-situ information coming from previously investigated sites can be critical for improving predictions by better constraining the estimation procedure. Bayesian inverse modeling provides a coherent framework to represent such ex-situ information by virtue of the prior distribution and combine them with in-situ information from the target site. In this study, we present an innovative data-driven approach for defining such informative priors for hydrogeological parameters at the target site. Our approach consists in two steps, both relying on statistical and machine learning methods. The first step is data selection; it consists in selecting sites similar to the target site. We use clustering methods for selecting similar sites based on observable hydrogeological features. The second step is data assimilation; it consists in assimilating data from the selected similar sites into the informative prior. We use a Bayesian hierarchical model to account for inter-site variability and to allow for the assimilation of multiple types of site-specific data. We present the application and validation of the presented methods on an established database of hydrogeological parameters. Data and methods are implemented in the form of an open-source R-package and therefore facilitate easy use by other practitioners.

  14. Evaluating Spatial Variability in Sediment and Phosphorus Concentration-Discharge Relationships Using Bayesian Inference and Self-Organizing Maps

    NASA Astrophysics Data System (ADS)

    Underwood, Kristen L.; Rizzo, Donna M.; Schroth, Andrew W.; Dewoolkar, Mandar M.

    2017-12-01

    Given the variable biogeochemical, physical, and hydrological processes driving fluvial sediment and nutrient export, the water science and management communities need data-driven methods to identify regions prone to production and transport under variable hydrometeorological conditions. We use Bayesian analysis to segment concentration-discharge linear regression models for total suspended solids (TSS) and particulate and dissolved phosphorus (PP, DP) using 22 years of monitoring data from 18 Lake Champlain watersheds. Bayesian inference was leveraged to estimate segmented regression model parameters and identify threshold position. The identified threshold positions demonstrated a considerable range below and above the median discharge—which has been used previously as the default breakpoint in segmented regression models to discern differences between pre and post-threshold export regimes. We then applied a Self-Organizing Map (SOM), which partitioned the watersheds into clusters of TSS, PP, and DP export regimes using watershed characteristics, as well as Bayesian regression intercepts and slopes. A SOM defined two clusters of high-flux basins, one where PP flux was predominantly episodic and hydrologically driven; and another in which the sediment and nutrient sourcing and mobilization were more bimodal, resulting from both hydrologic processes at post-threshold discharges and reactive processes (e.g., nutrient cycling or lateral/vertical exchanges of fine sediment) at prethreshold discharges. A separate DP SOM defined two high-flux clusters exhibiting a bimodal concentration-discharge response, but driven by differing land use. Our novel framework shows promise as a tool with broad management application that provides insights into landscape drivers of riverine solute and sediment export.

  15. Bayesian Nonparametric Inference – Why and How

    PubMed Central

    Müller, Peter; Mitra, Riten

    2013-01-01

    We review inference under models with nonparametric Bayesian (BNP) priors. The discussion follows a set of examples for some common inference problems. The examples are chosen to highlight problems that are challenging for standard parametric inference. We discuss inference for density estimation, clustering, regression and for mixed effects models with random effects distributions. While we focus on arguing for the need for the flexibility of BNP models, we also review some of the more commonly used BNP models, thus hopefully answering a bit of both questions, why and how to use BNP. PMID:24368932

  16. A Hierarchical Bayesian Procedure for Two-Mode Cluster Analysis

    ERIC Educational Resources Information Center

    DeSarbo, Wayne S.; Fong, Duncan K. H.; Liechty, John; Saxton, M. Kim

    2004-01-01

    This manuscript introduces a new Bayesian finite mixture methodology for the joint clustering of row and column stimuli/objects associated with two-mode asymmetric proximity, dominance, or profile data. That is, common clusters are derived which partition both the row and column stimuli/objects simultaneously into the same derived set of clusters.…

  17. Determining open cluster membership. A Bayesian framework for quantitative member classification

    NASA Astrophysics Data System (ADS)

    Stott, Jonathan J.

    2018-01-01

    Aims: My goal is to develop a quantitative algorithm for assessing open cluster membership probabilities. The algorithm is designed to work with single-epoch observations. In its simplest form, only one set of program images and one set of reference images are required. Methods: The algorithm is based on a two-stage joint astrometric and photometric assessment of cluster membership probabilities. The probabilities were computed within a Bayesian framework using any available prior information. Where possible, the algorithm emphasizes simplicity over mathematical sophistication. Results: The algorithm was implemented and tested against three observational fields using published survey data. M 67 and NGC 654 were selected as cluster examples while a third, cluster-free, field was used for the final test data set. The algorithm shows good quantitative agreement with the existing surveys and has a false-positive rate significantly lower than the astrometric or photometric methods used individually.

  18. Hierarchical imputation of systematically and sporadically missing data: An approximate Bayesian approach using chained equations.

    PubMed

    Jolani, Shahab

    2018-03-01

    In health and medical sciences, multiple imputation (MI) is now becoming popular to obtain valid inferences in the presence of missing data. However, MI of clustered data such as multicenter studies and individual participant data meta-analysis requires advanced imputation routines that preserve the hierarchical structure of data. In clustered data, a specific challenge is the presence of systematically missing data, when a variable is completely missing in some clusters, and sporadically missing data, when it is partly missing in some clusters. Unfortunately, little is known about how to perform MI when both types of missing data occur simultaneously. We develop a new class of hierarchical imputation approach based on chained equations methodology that simultaneously imputes systematically and sporadically missing data while allowing for arbitrary patterns of missingness among them. Here, we use a random effect imputation model and adopt a simplification over fully Bayesian techniques such as Gibbs sampler to directly obtain draws of parameters within each step of the chained equations. We justify through theoretical arguments and extensive simulation studies that the proposed imputation methodology has good statistical properties in terms of bias and coverage rates of parameter estimates. An illustration is given in a case study with eight individual participant datasets. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Efficient Matrix Models for Relational Learning

    DTIC Science & Technology

    2009-10-01

    74 4.5.3 Comparison to pLSI- pHITS . . . . . . . . . . . . . . . . . . . . 76 5 Hierarchical Bayesian Collective...Behaviour of Newton vs. Stochastic Newton on a three-factor model. 4.5.3 Comparison to pLSI- pHITS Caveat: Collective Matrix Factorization makes no guarantees...leads to better results; and another where a co-clustering model, pLSI- pHITS , has the advantage. pLSI- pHITS [24] is a relational clustering technique

  20. Quasi-Likelihood Techniques in a Logistic Regression Equation for Identifying Simulium damnosum s.l. Larval Habitats Intra-cluster Covariates in Togo.

    PubMed

    Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R

    2012-01-01

    The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l. a major black-fly vector of Onchoceriasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and, (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S.damnosum s.l. riverine larval habitat explanatory attributes regardless how they are treated (e.g., independent, autoregressive, Toeplitz, etc). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data was aggregated into proc genmod. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data was then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed also in ArcGIS using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61m wavbands ). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators, levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR- stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainity affects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l habitats based on spatiotemporal field-sampled count data.

  1. Dual Sticky Hierarchical Dirichlet Process Hidden Markov Model and Its Application to Natural Language Description of Motions.

    PubMed

    Hu, Weiming; Tian, Guodong; Kang, Yongxin; Yuan, Chunfeng; Maybank, Stephen

    2017-09-25

    In this paper, a new nonparametric Bayesian model called the dual sticky hierarchical Dirichlet process hidden Markov model (HDP-HMM) is proposed for mining activities from a collection of time series data such as trajectories. All the time series data are clustered. Each cluster of time series data, corresponding to a motion pattern, is modeled by an HMM. Our model postulates a set of HMMs that share a common set of states (topics in an analogy with topic models for document processing), but have unique transition distributions. For the application to motion trajectory modeling, topics correspond to motion activities. The learnt topics are clustered into atomic activities which are assigned predicates. We propose a Bayesian inference method to decompose a given trajectory into a sequence of atomic activities. On combining the learnt sources and sinks, semantic motion regions, and the learnt sequence of atomic activities, the action represented by the trajectory can be described in natural language in as automatic a way as possible. The effectiveness of our dual sticky HDP-HMM is validated on several trajectory datasets. The effectiveness of the natural language descriptions for motions is demonstrated on the vehicle trajectories extracted from a traffic scene.

  2. Triadic split-merge sampler

    NASA Astrophysics Data System (ADS)

    van Rossum, Anne C.; Lin, Hai Xiang; Dubbeldam, Johan; van der Herik, H. Jaap

    2018-04-01

    In machine vision typical heuristic methods to extract parameterized objects out of raw data points are the Hough transform and RANSAC. Bayesian models carry the promise to optimally extract such parameterized objects given a correct definition of the model and the type of noise at hand. A category of solvers for Bayesian models are Markov chain Monte Carlo methods. Naive implementations of MCMC methods suffer from slow convergence in machine vision due to the complexity of the parameter space. Towards this blocked Gibbs and split-merge samplers have been developed that assign multiple data points to clusters at once. In this paper we introduce a new split-merge sampler, the triadic split-merge sampler, that perform steps between two and three randomly chosen clusters. This has two advantages. First, it reduces the asymmetry between the split and merge steps. Second, it is able to propose a new cluster that is composed out of data points from two different clusters. Both advantages speed up convergence which we demonstrate on a line extraction problem. We show that the triadic split-merge sampler outperforms the conventional split-merge sampler. Although this new MCMC sampler is demonstrated in this machine vision context, its application extend to the very general domain of statistical inference.

  3. The Gaia-ESO Survey: dynamical models of flattened, rotating globular clusters

    NASA Astrophysics Data System (ADS)

    Jeffreson, S. M. R.; Sanders, J. L.; Evans, N. W.; Williams, A. A.; Gilmore, G. F.; Bayo, A.; Bragaglia, A.; Casey, A. R.; Flaccomio, E.; Franciosini, E.; Hourihane, A.; Jackson, R. J.; Jeffries, R. D.; Jofré, P.; Koposov, S.; Lardo, C.; Lewis, J.; Magrini, L.; Morbidelli, L.; Pancino, E.; Randich, S.; Sacco, G. G.; Worley, C. C.; Zaggia, S.

    2017-08-01

    We present a family of self-consistent axisymmetric rotating globular cluster models which are fitted to spectroscopic data for NGC 362, NGC 1851, NGC 2808, NGC 4372, NGC 5927 and NGC 6752 to provide constraints on their physical and kinematic properties, including their rotation signals. They are constructed by flattening Modified Plummer profiles, which have the same asymptotic behaviour as classical Plummer models, but can provide better fits to young clusters due to a slower turnover in the density profile. The models are in dynamical equilibrium as they depend solely on the action variables. We employ a fully Bayesian scheme to investigate the uncertainty in our model parameters (including mass-to-light ratios and inclination angles) and evaluate the Bayesian evidence ratio for rotating to non-rotating models. We find convincing levels of rotation only in NGC 2808. In the other clusters, there is just a hint of rotation (in particular, NGC 4372 and NGC 5927), as the data quality does not allow us to draw strong conclusions. Where rotation is present, we find that it is confined to the central regions, within radii of R ≤ 2rh. As part of this work, we have developed a novel q-Gaussian basis expansion of the line-of-sight velocity distributions, from which general models can be constructed via interpolation on the basis coefficients.

  4. Accuracy of latent-variable estimation in Bayesian semi-supervised learning.

    PubMed

    Yamazaki, Keisuke

    2015-09-01

    Hierarchical probabilistic models, such as Gaussian mixture models, are widely used for unsupervised learning tasks. These models consist of observable and latent variables, which represent the observable data and the underlying data-generation process, respectively. Unsupervised learning tasks, such as cluster analysis, are regarded as estimations of latent variables based on the observable ones. The estimation of latent variables in semi-supervised learning, where some labels are observed, will be more precise than that in unsupervised, and one of the concerns is to clarify the effect of the labeled data. However, there has not been sufficient theoretical analysis of the accuracy of the estimation of latent variables. In a previous study, a distribution-based error function was formulated, and its asymptotic form was calculated for unsupervised learning with generative models. It has been shown that, for the estimation of latent variables, the Bayes method is more accurate than the maximum-likelihood method. The present paper reveals the asymptotic forms of the error function in Bayesian semi-supervised learning for both discriminative and generative models. The results show that the generative model, which uses all of the given data, performs better when the model is well specified. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Bayesian Hierarchical Grouping: perceptual grouping as mixture estimation

    PubMed Central

    Froyen, Vicky; Feldman, Jacob; Singh, Manish

    2015-01-01

    We propose a novel framework for perceptual grouping based on the idea of mixture models, called Bayesian Hierarchical Grouping (BHG). In BHG we assume that the configuration of image elements is generated by a mixture of distinct objects, each of which generates image elements according to some generative assumptions. Grouping, in this framework, means estimating the number and the parameters of the mixture components that generated the image, including estimating which image elements are “owned” by which objects. We present a tractable implementation of the framework, based on the hierarchical clustering approach of Heller and Ghahramani (2005). We illustrate it with examples drawn from a number of classical perceptual grouping problems, including dot clustering, contour integration, and part decomposition. Our approach yields an intuitive hierarchical representation of image elements, giving an explicit decomposition of the image into mixture components, along with estimates of the probability of various candidate decompositions. We show that BHG accounts well for a diverse range of empirical data drawn from the literature. Because BHG provides a principled quantification of the plausibility of grouping interpretations over a wide range of grouping problems, we argue that it provides an appealing unifying account of the elusive Gestalt notion of Prägnanz. PMID:26322548

  6. SLUG - stochastically lighting up galaxies - III. A suite of tools for simulated photometry, spectroscopy, and Bayesian inference with stochastic stellar populations

    NASA Astrophysics Data System (ADS)

    Krumholz, Mark R.; Fumagalli, Michele; da Silva, Robert L.; Rendahl, Theodore; Parra, Jonathan

    2015-09-01

    Stellar population synthesis techniques for predicting the observable light emitted by a stellar population have extensive applications in numerous areas of astronomy. However, accurate predictions for small populations of young stars, such as those found in individual star clusters, star-forming dwarf galaxies, and small segments of spiral galaxies, require that the population be treated stochastically. Conversely, accurate deductions of the properties of such objects also require consideration of stochasticity. Here we describe a comprehensive suite of modular, open-source software tools for tackling these related problems. These include the following: a greatly-enhanced version of the SLUG code introduced by da Silva et al., which computes spectra and photometry for stochastically or deterministically sampled stellar populations with nearly arbitrary star formation histories, clustering properties, and initial mass functions; CLOUDY_SLUG, a tool that automatically couples SLUG-computed spectra with the CLOUDY radiative transfer code in order to predict stochastic nebular emission; BAYESPHOT, a general-purpose tool for performing Bayesian inference on the physical properties of stellar systems based on unresolved photometry; and CLUSTER_SLUG and SFR_SLUG, a pair of tools that use BAYESPHOT on a library of SLUG models to compute the mass, age, and extinction of mono-age star clusters, and the star formation rate of galaxies, respectively. The latter two tools make use of an extensive library of pre-computed stellar population models, which are included in the software. The complete package is available at http://www.slugsps.com.

  7. Conformational Transition Pathways of Epidermal Growth Factor Receptor Kinase Domain from Multiple Molecular Dynamics Simulations and Bayesian Clustering.

    PubMed

    Li, Yan; Li, Xiang; Ma, Weiya; Dong, Zigang

    2014-08-12

    The epidermal growth factor receptor (EGFR) is aberrantly activated in various cancer cells and an important target for cancer treatment. Deep understanding of EGFR conformational changes between the active and inactive states is of pharmaceutical interest. Here we present a strategy combining multiply targeted molecular dynamics simulations, unbiased molecular dynamics simulations, and Bayesian clustering to investigate transition pathways during the activation/inactivation process of EGFR kinase domain. Two distinct pathways between the active and inactive forms are designed, explored, and compared. Based on Bayesian clustering and rough two-dimensional free energy surfaces, the energy-favorable pathway is recognized, though DFG-flip happens in both pathways. In addition, another pathway with different intermediate states appears in our simulations. Comparison of distinct pathways also indicates that disruption of the Lys745-Glu762 interaction is critically important in DFG-flip while movement of the A-loop significantly facilitates the conformational change. Our simulations yield new insights into EGFR conformational transitions. Moreover, our results verify that this approach is valid and efficient in sampling of protein conformational changes and comparison of distinct pathways.

  8. The improved business valuation model for RFID company based on the community mining method.

    PubMed

    Li, Shugang; Yu, Zhaoxu

    2017-01-01

    Nowadays, the appetite for the investment and mergers and acquisitions (M&A) activity in RFID companies is growing rapidly. Although the huge number of papers have addressed the topic of business valuation models based on statistical methods or neural network methods, only a few are dedicated to constructing a general framework for business valuation that improves the performance with network graph (NG) and the corresponding community mining (CM) method. In this study, an NG based business valuation model is proposed, where real options approach (ROA) integrating CM method is designed to predict the company's net profit as well as estimate the company value. Three improvements are made in the proposed valuation model: Firstly, our model figures out the credibility of the node belonging to each community and clusters the network according to the evolutionary Bayesian method. Secondly, the improved bacterial foraging optimization algorithm (IBFOA) is adopted to calculate the optimized Bayesian posterior probability function. Finally, in IBFOA, bi-objective method is used to assess the accuracy of prediction, and these two objectives are combined into one objective function using a new Pareto boundary method. The proposed method returns lower forecasting error than 10 well-known forecasting models on 3 different time interval valuing tasks for the real-life simulation of RFID companies.

  9. The improved business valuation model for RFID company based on the community mining method

    PubMed Central

    Li, Shugang; Yu, Zhaoxu

    2017-01-01

    Nowadays, the appetite for the investment and mergers and acquisitions (M&A) activity in RFID companies is growing rapidly. Although the huge number of papers have addressed the topic of business valuation models based on statistical methods or neural network methods, only a few are dedicated to constructing a general framework for business valuation that improves the performance with network graph (NG) and the corresponding community mining (CM) method. In this study, an NG based business valuation model is proposed, where real options approach (ROA) integrating CM method is designed to predict the company’s net profit as well as estimate the company value. Three improvements are made in the proposed valuation model: Firstly, our model figures out the credibility of the node belonging to each community and clusters the network according to the evolutionary Bayesian method. Secondly, the improved bacterial foraging optimization algorithm (IBFOA) is adopted to calculate the optimized Bayesian posterior probability function. Finally, in IBFOA, bi-objective method is used to assess the accuracy of prediction, and these two objectives are combined into one objective function using a new Pareto boundary method. The proposed method returns lower forecasting error than 10 well-known forecasting models on 3 different time interval valuing tasks for the real-life simulation of RFID companies. PMID:28459815

  10. Slicing cluster mass functions with a Bayesian razor

    NASA Astrophysics Data System (ADS)

    Sealfon, C. D.

    2010-08-01

    We apply a Bayesian ``razor" to forecast Bayes factors between different parameterizations of the galaxy cluster mass function. To demonstrate this approach, we calculate the minimum size N-body simulation needed for strong evidence favoring a two-parameter mass function over one-parameter mass functions and visa versa, as a function of the minimum cluster mass.

  11. Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model.

    PubMed

    Jääskinen, Väinö; Parkkinen, Ville; Cheng, Lu; Corander, Jukka

    2014-02-01

    In many biological applications it is necessary to cluster DNA sequences into groups that represent underlying organismal units, such as named species or genera. In metagenomics this grouping needs typically to be achieved on the basis of relatively short sequences which contain different types of errors, making the use of a statistical modeling approach desirable. Here we introduce a novel method for this purpose by developing a stochastic partition model that clusters Markov chains of a given order. The model is based on a Dirichlet process prior and we use conjugate priors for the Markov chain parameters which enables an analytical expression for comparing the marginal likelihoods of any two partitions. To find a good candidate for the posterior mode in the partition space, we use a hybrid computational approach which combines the EM-algorithm with a greedy search. This is demonstrated to be faster and yield highly accurate results compared to earlier suggested clustering methods for the metagenomics application. Our model is fairly generic and could also be used for clustering of other types of sequence data for which Markov chains provide a reasonable way to compress information, as illustrated by experiments on shotgun sequence type data from an Escherichia coli strain.

  12. The effect of using genealogy-based haplotypes for genomic prediction

    PubMed Central

    2013-01-01

    Background Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. Results About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individuals markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Conclusions Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy. PMID:23496971

  13. The effect of using genealogy-based haplotypes for genomic prediction.

    PubMed

    Edriss, Vahid; Fernando, Rohan L; Su, Guosheng; Lund, Mogens S; Guldbrandtsen, Bernt

    2013-03-06

    Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individuals markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.

  14. Bayesian Inference for Functional Dynamics Exploring in fMRI Data.

    PubMed

    Guo, Xuan; Liu, Bing; Chen, Le; Chen, Guantao; Pan, Yi; Zhang, Jing

    2016-01-01

    This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM), Bayesian Connectivity Change Point Model (BCCPM), and Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.

  15. Weighted community detection and data clustering using message passing

    NASA Astrophysics Data System (ADS)

    Shi, Cheng; Liu, Yanchen; Zhang, Pan

    2018-03-01

    Grouping objects into clusters based on the similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message-passing algorithms and spectral algorithms proposed for an unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to the Potts model at the critical temperature of spin-glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, where the data were generated by mixture models in the sparse regime, we show that our method works all the way down to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which heavily reduce the computation complexity in dense networks but give almost the same performance as belief propagation.

  16. Bayesian investigation of isochrone consistency using the old open cluster NGC 188

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hills, Shane; Courteau, Stéphane; Von Hippel, Ted

    2015-03-01

    This paper provides a detailed comparison of the differences in parameters derived for a star cluster from its color–magnitude diagrams (CMDs) depending on the filters and models used. We examine the consistency and reliability of fitting three widely used stellar evolution models to 15 combinations of optical and near-IR photometry for the old open cluster NGC 188. The optical filter response curves match those of theoretical systems and are thus not the source of fit inconsistencies. NGC 188 is ideally suited to this study thanks to a wide variety of high-quality photometry and available proper motions and radial velocities thatmore » enable us to remove non-cluster members and many binaries. Our Bayesian fitting technique yields inferred values of age, metallicity, distance modulus, and absorption as a function of the photometric band combinations and stellar models. We show that the historically favored three-band combinations of UBV and VRI can be meaningfully inconsistent with each other and with longer baseline data sets such as UBVRIJHK{sub S}. Differences among model sets can also be substantial. For instance, fitting Yi et al. (2001) and Dotter et al. (2008) models to UBVRIJHK{sub S} photometry for NGC 188 yields the following cluster parameters: age = (5.78 ± 0.03, 6.45 ± 0.04) Gyr, [Fe/H] = (+0.125 ± 0.003, −0.077 ± 0.003) dex, (m−M){sub V} = (11.441 ± 0.007, 11.525 ± 0.005) mag, and A{sub V} = (0.162 ± 0.003, 0.236 ± 0.003) mag, respectively. Within the formal fitting errors, these two fits are substantially and statistically different. Such differences among fits using different filters and models are a cautionary tale regarding our current ability to fit star cluster CMDs. Additional modeling of this kind, with more models and star clusters, and future Gaia parallaxes are critical for isolating and quantifying the most relevant uncertainties in stellar evolutionary models.« less

  17. Developing a new Bayesian Risk Index for risk evaluation of soil contamination.

    PubMed

    Albuquerque, M T D; Gerassis, S; Sierra, C; Taboada, J; Martín, J E; Antunes, I M H R; Gallego, J R

    2017-12-15

    Industrial and agricultural activities heavily constrain soil quality. Potentially Toxic Elements (PTEs) are a threat to public health and the environment alike. In this regard, the identification of areas that require remediation is crucial. In the herein research a geochemical dataset (230 samples) comprising 14 elements (Cu, Pb, Zn, Ag, Ni, Mn, Fe, As, Cd, V, Cr, Ti, Al and S) was gathered throughout eight different zones distinguished by their main activity, namely, recreational, agriculture/livestock and heavy industry in the Avilés Estuary (North of Spain). Then a stratified systematic sampling method was used at short, medium, and long distances from each zone to obtain a representative picture of the total variability of the selected attributes. The information was then combined in four risk classes (Low, Moderate, High, Remediation) following reference values from several sediment quality guidelines (SQGs). A Bayesian analysis, inferred for each zone, allowed the characterization of PTEs correlations, the unsupervised learning network technique proving to be the best fit. Based on the Bayesian network structure obtained, Pb, As and Mn were selected as key contamination parameters. For these 3 elements, the conditional probability obtained was allocated to each observed point, and a simple, direct index (Bayesian Risk Index-BRI) was constructed as a linear rating of the pre-defined risk classes weighted by the previously obtained probability. Finally, the BRI underwent geostatistical modeling. One hundred Sequential Gaussian Simulations (SGS) were computed. The Mean Image and the Standard Deviation maps were obtained, allowing the definition of High/Low risk clusters (Local G clustering) and the computation of spatial uncertainty. High-risk clusters are mainly distributed within the area with the highest altitude (agriculture/livestock) showing an associated low spatial uncertainty, clearly indicating the need for remediation. Atmospheric emissions, mainly derived from the metallurgical industry, contribute to soil contamination by PTEs. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Quantile regression and Bayesian cluster detection to identify radon prone areas.

    PubMed

    Sarra, Annalina; Fontanella, Lara; Valentini, Pasquale; Palermi, Sergio

    2016-11-01

    Albeit the dominant source of radon in indoor environments is the geology of the territory, many studies have demonstrated that indoor radon concentrations also depend on dwelling-specific characteristics. Following a stepwise analysis, in this study we propose a combined approach to delineate radon prone areas. We first investigate the impact of various building covariates on indoor radon concentrations. To achieve a more complete picture of this association, we exploit the flexible formulation of a Bayesian spatial quantile regression, which is also equipped with parameters that controls the spatial dependence across data. The quantitative knowledge of the influence of each significant building-specific factor on the measured radon levels is employed to predict the radon concentrations that would have been found if the sampled buildings had possessed standard characteristics. Those normalised radon measures should reflect the geogenic radon potential of the underlying ground, which is a quantity directly related to the geological environment. The second stage of the analysis is aimed at identifying radon prone areas, and to this end, we adopt a Bayesian model for spatial cluster detection using as reference unit the building with standard characteristics. The case study is based on a data set of more than 2000 indoor radon measures, available for the Abruzzo region (Central Italy) and collected by the Agency of Environmental Protection of Abruzzo, during several indoor radon monitoring surveys. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. Spatial clustering of average risks and risk trends in Bayesian disease mapping.

    PubMed

    Anderson, Craig; Lee, Duncan; Dean, Nema

    2017-01-01

    Spatiotemporal disease mapping focuses on estimating the spatial pattern in disease risk across a set of nonoverlapping areal units over a fixed period of time. The key aim of such research is to identify areas that have a high average level of disease risk or where disease risk is increasing over time, thus allowing public health interventions to be focused on these areas. Such aims are well suited to the statistical approach of clustering, and while much research has been done in this area in a purely spatial setting, only a handful of approaches have focused on spatiotemporal clustering of disease risk. Therefore, this paper outlines a new modeling approach for clustering spatiotemporal disease risk data, by clustering areas based on both their mean risk levels and the behavior of their temporal trends. The efficacy of the methodology is established by a simulation study, and is illustrated by a study of respiratory disease risk in Glasgow, Scotland. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Framework for network modularization and Bayesian network analysis to investigate the perturbed metabolic network

    PubMed Central

    2011-01-01

    Background Genome-scale metabolic network models have contributed to elucidating biological phenomena, and predicting gene targets to engineer for biotechnological applications. With their increasing importance, their precise network characterization has also been crucial for better understanding of the cellular physiology. Results We herein introduce a framework for network modularization and Bayesian network analysis (FMB) to investigate organism’s metabolism under perturbation. FMB reveals direction of influences among metabolic modules, in which reactions with similar or positively correlated flux variation patterns are clustered, in response to specific perturbation using metabolic flux data. With metabolic flux data calculated by constraints-based flux analysis under both control and perturbation conditions, FMB, in essence, reveals the effects of specific perturbations on the biological system through network modularization and Bayesian network analysis at metabolic modular level. As a demonstration, this framework was applied to the genetically perturbed Escherichia coli metabolism, which is a lpdA gene knockout mutant, using its genome-scale metabolic network model. Conclusions After all, it provides alternative scenarios of metabolic flux distributions in response to the perturbation, which are complementary to the data obtained from conventionally available genome-wide high-throughput techniques or metabolic flux analysis. PMID:22784571

  1. Framework for network modularization and Bayesian network analysis to investigate the perturbed metabolic network.

    PubMed

    Kim, Hyun Uk; Kim, Tae Yong; Lee, Sang Yup

    2011-01-01

    Genome-scale metabolic network models have contributed to elucidating biological phenomena, and predicting gene targets to engineer for biotechnological applications. With their increasing importance, their precise network characterization has also been crucial for better understanding of the cellular physiology. We herein introduce a framework for network modularization and Bayesian network analysis (FMB) to investigate organism's metabolism under perturbation. FMB reveals direction of influences among metabolic modules, in which reactions with similar or positively correlated flux variation patterns are clustered, in response to specific perturbation using metabolic flux data. With metabolic flux data calculated by constraints-based flux analysis under both control and perturbation conditions, FMB, in essence, reveals the effects of specific perturbations on the biological system through network modularization and Bayesian network analysis at metabolic modular level. As a demonstration, this framework was applied to the genetically perturbed Escherichia coli metabolism, which is a lpdA gene knockout mutant, using its genome-scale metabolic network model. After all, it provides alternative scenarios of metabolic flux distributions in response to the perturbation, which are complementary to the data obtained from conventionally available genome-wide high-throughput techniques or metabolic flux analysis.

  2. A Probabilistic Model of Social Working Memory for Information Retrieval in Social Interactions.

    PubMed

    Li, Liyuan; Xu, Qianli; Gan, Tian; Tan, Cheston; Lim, Joo-Hwee

    2018-05-01

    Social working memory (SWM) plays an important role in navigating social interactions. Inspired by studies in psychology, neuroscience, cognitive science, and machine learning, we propose a probabilistic model of SWM to mimic human social intelligence for personal information retrieval (IR) in social interactions. First, we establish a semantic hierarchy as social long-term memory to encode personal information. Next, we propose a semantic Bayesian network as the SWM, which integrates the cognitive functions of accessibility and self-regulation. One subgraphical model implements the accessibility function to learn the social consensus about IR-based on social information concept, clustering, social context, and similarity between persons. Beyond accessibility, one more layer is added to simulate the function of self-regulation to perform the personal adaptation to the consensus based on human personality. Two learning algorithms are proposed to train the probabilistic SWM model on a raw dataset of high uncertainty and incompleteness. One is an efficient learning algorithm of Newton's method, and the other is a genetic algorithm. Systematic evaluations show that the proposed SWM model is able to learn human social intelligence effectively and outperforms the baseline Bayesian cognitive model. Toward real-world applications, we implement our model on Google Glass as a wearable assistant for social interaction.

  3. A Poisson nonnegative matrix factorization method with parameter subspace clustering constraint for endmember extraction in hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Sun, Weiwei; Ma, Jun; Yang, Gang; Du, Bo; Zhang, Liangpei

    2017-06-01

    A new Bayesian method named Poisson Nonnegative Matrix Factorization with Parameter Subspace Clustering Constraint (PNMF-PSCC) has been presented to extract endmembers from Hyperspectral Imagery (HSI). First, the method integrates the liner spectral mixture model with the Bayesian framework and it formulates endmember extraction into a Bayesian inference problem. Second, the Parameter Subspace Clustering Constraint (PSCC) is incorporated into the statistical program to consider the clustering of all pixels in the parameter subspace. The PSCC could enlarge differences among ground objects and helps finding endmembers with smaller spectrum divergences. Meanwhile, the PNMF-PSCC method utilizes the Poisson distribution as the prior knowledge of spectral signals to better explain the quantum nature of light in imaging spectrometer. Third, the optimization problem of PNMF-PSCC is formulated into maximizing the joint density via the Maximum A Posterior (MAP) estimator. The program is finally solved by iteratively optimizing two sub-problems via the Alternating Direction Method of Multipliers (ADMM) framework and the FURTHESTSUM initialization scheme. Five state-of-the art methods are implemented to make comparisons with the performance of PNMF-PSCC on both the synthetic and real HSI datasets. Experimental results show that the PNMF-PSCC outperforms all the five methods in Spectral Angle Distance (SAD) and Root-Mean-Square-Error (RMSE), and especially it could identify good endmembers for ground objects with smaller spectrum divergences.

  4. Evaluation of the procedure 1A component of the 1980 US/Canada wheat and barley exploratory experiment

    NASA Technical Reports Server (NTRS)

    Chapman, G. M. (Principal Investigator); Carnes, J. G.

    1981-01-01

    Several techniques which use clusters generated by a new clustering algorithm, CLASSY, are proposed as alternatives to random sampling to obtain greater precision in crop proportion estimation: (1) Proportional Allocation/relative count estimator (PA/RCE) uses proportional allocation of dots to clusters on the basis of cluster size and a relative count cluster level estimate; (2) Proportional Allocation/Bayes Estimator (PA/BE) uses proportional allocation of dots to clusters and a Bayesian cluster-level estimate; and (3) Bayes Sequential Allocation/Bayesian Estimator (BSA/BE) uses sequential allocation of dots to clusters and a Bayesian cluster level estimate. Clustering in an effective method in making proportion estimates. It is estimated that, to obtain the same precision with random sampling as obtained by the proportional sampling of 50 dots with an unbiased estimator, samples of 85 or 166 would need to be taken if dot sets with AI labels (integrated procedure) or ground truth labels, respectively were input. Dot reallocation provides dot sets that are unbiased. It is recommended that these proportion estimation techniques are maintained, particularly the PA/BE because it provides the greatest precision.

  5. Modelling Inter-relationships among water, governance, human development variables in developing countries with Bayesian networks.

    NASA Astrophysics Data System (ADS)

    Dondeynaz, C.; Lopez-Puga, J.; Carmona-Moreno, C.

    2012-04-01

    Improving Water and Sanitation Services (WSS), being a complex and interdisciplinary issue, passes through collaboration and coordination of different sectors (environment, health, economic activities, governance, and international cooperation). This inter-dependency has been recognised with the adoption of the "Integrated Water Resources Management" principles that push for the integration of these various dimensions involved in WSS delivery to ensure an efficient and sustainable management. The understanding of these interrelations appears as crucial for decision makers in the water sector in particular in developing countries where WSS still represent an important leverage for livelihood improvement. In this framework, the Joint Research Centre of the European Commission has developed a coherent database (WatSan4Dev database) containing 29 indicators from environmental, socio-economic, governance and financial aid flows data focusing on developing countries (Celine et al, 2011 under publication). The aim of this work is to model the WatSan4Dev dataset using probabilistic models to identify the key variables influencing or being influenced by the water supply and sanitation access levels. Bayesian Network Models are suitable to map the conditional dependencies between variables and also allows ordering variables by level of influence on the dependent variable. Separated models have been built for water supply and for sanitation because of different behaviour. The models are validated if complying with statistical criteria but either with scientific knowledge and literature. A two steps approach has been adopted to build the structure of the model; Bayesian network is first built for each thematic cluster of variables (e.g governance, agricultural pressure, or human development) keeping a detailed level for interpretation later one. A global model is then built based on significant indicators of each cluster being previously modelled. The structure of the relationships between variable are set a priori according to literature and/or experience in the field (expert knowledge). The statistical validation is verified according to error rate of classification, and the significance of the variables. Sensibility analysis has also been performed to characterise the relative influence of every single variable in the model. Once validated, the models allow the estimation of impact of each variable on the behaviour of the water supply or sanitation providing an interesting mean to test scenarios and predict variables behaviours. The choices made, methods and description of the various models, for each cluster as well as the global model for water supply and sanitation will be presented. Key results and interpretation of the relationships depicted by the models will be detailed during the conference.

  6. Bayesian Framework for Water Quality Model Uncertainty Estimation and Risk Management

    EPA Science Inventory

    A formal Bayesian methodology is presented for integrated model calibration and risk-based water quality management using Bayesian Monte Carlo simulation and maximum likelihood estimation (BMCML). The primary focus is on lucid integration of model calibration with risk-based wat...

  7. PyClone: statistical inference of clonal population structure in cancer.

    PubMed

    Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P

    2014-04-01

    We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.

  8. Analysis of genetic population structure in Acacia caven (Leguminosae, Mimosoideae), comparing one exploratory and two Bayesian-model-based methods.

    PubMed

    Pometti, Carolina L; Bessega, Cecilia F; Saidman, Beatriz O; Vilardi, Juan C

    2014-03-01

    Bayesian clustering as implemented in STRUCTURE or GENELAND software is widely used to form genetic groups of populations or individuals. On the other hand, in order to satisfy the need for less computer-intensive approaches, multivariate analyses are specifically devoted to extracting information from large datasets. In this paper, we report the use of a dataset of AFLP markers belonging to 15 sampling sites of Acacia caven for studying the genetic structure and comparing the consistency of three methods: STRUCTURE, GENELAND and DAPC. Of these methods, DAPC was the fastest one and showed accuracy in inferring the K number of populations (K = 12 using the find.clusters option and K = 15 with a priori information of populations). GENELAND in turn, provides information on the area of membership probabilities for individuals or populations in the space, when coordinates are specified (K = 12). STRUCTURE also inferred the number of K populations and the membership probabilities of individuals based on ancestry, presenting the result K = 11 without prior information of populations and K = 15 using the LOCPRIOR option. Finally, in this work all three methods showed high consistency in estimating the population structure, inferring similar numbers of populations and the membership probabilities of individuals to each group, with a high correlation between each other.

  9. Spatial heterogeneity and risk factors for stunting among children under age five in Ethiopia: A Bayesian geo-statistical model.

    PubMed

    Hagos, Seifu; Hailemariam, Damen; WoldeHanna, Tasew; Lindtjørn, Bernt

    2017-01-01

    Understanding the spatial distribution of stunting and underlying factors operating at meso-scale is of paramount importance for intervention designing and implementations. Yet, little is known about the spatial distribution of stunting and some discrepancies are documented on the relative importance of reported risk factors. Therefore, the present study aims at exploring the spatial distribution of stunting at meso- (district) scale, and evaluates the effect of spatial dependency on the identification of risk factors and their relative contribution to the occurrence of stunting and severe stunting in a rural area of Ethiopia. A community based cross sectional study was conducted to measure the occurrence of stunting and severe stunting among children aged 0-59 months. Additionally, we collected relevant information on anthropometric measures, dietary habits, parent and child-related demographic and socio-economic status. Latitude and longitude of surveyed households were also recorded. Local Anselin Moran's I was calculated to investigate the spatial variation of stunting prevalence and identify potential local pockets (hotspots) of high prevalence. Finally, we employed a Bayesian geo-statistical model, which accounted for spatial dependency structure in the data, to identify potential risk factors for stunting in the study area. Overall, the prevalence of stunting and severe stunting in the district was 43.7% [95%CI: 40.9, 46.4] and 21.3% [95%CI: 19.5, 23.3] respectively. We identified statistically significant clusters of high prevalence of stunting (hotspots) in the eastern part of the district and clusters of low prevalence (cold spots) in the western. We found out that the inclusion of spatial structure of the data into the Bayesian model has shown to improve the fit for stunting model. The Bayesian geo-statistical model indicated that the risk of stunting increased as the child's age increased (OR 4.74; 95% Bayesian credible interval [BCI]:3.35-6.58) and among boys (OR 1.28; 95%BCI; 1.12-1.45). However, maternal education and household food security were found to be protective against stunting and severe stunting. Stunting prevalence may vary across space at different scale. For this, it's important that nutrition studies and, more importantly, control interventions take into account this spatial heterogeneity in the distribution of nutritional deficits and their underlying associated factors. The findings of this study also indicated that interventions integrating household food insecurity in nutrition programs in the district might help to avert the burden of stunting.

  10. Multi-profile Bayesian alignment model for LC-MS data analysis with integration of internal standards

    PubMed Central

    Tsai, Tsung-Heng; Tadesse, Mahlet G.; Di Poto, Cristina; Pannell, Lewis K.; Mechref, Yehia; Wang, Yue; Ressom, Habtom W.

    2013-01-01

    Motivation: Liquid chromatography-mass spectrometry (LC-MS) has been widely used for profiling expression levels of biomolecules in various ‘-omic’ studies including proteomics, metabolomics and glycomics. Appropriate LC-MS data preprocessing steps are needed to detect true differences between biological groups. Retention time (RT) alignment, which is required to ensure that ion intensity measurements among multiple LC-MS runs are comparable, is one of the most important yet challenging preprocessing steps. Current alignment approaches estimate RT variability using either single chromatograms or detected peaks, but do not simultaneously take into account the complementary information embedded in the entire LC-MS data. Results: We propose a Bayesian alignment model for LC-MS data analysis. The alignment model provides estimates of the RT variability along with uncertainty measures. The model enables integration of multiple sources of information including internal standards and clustered chromatograms in a mathematically rigorous framework. We apply the model to LC-MS metabolomic, proteomic and glycomic data. The performance of the model is evaluated based on ground-truth data, by measuring correlation of variation, RT difference across runs and peak-matching performance. We demonstrate that Bayesian alignment model improves significantly the RT alignment performance through appropriate integration of relevant information. Availability and implementation: MATLAB code, raw and preprocessed LC-MS data are available at http://omics.georgetown.edu/alignLCMS.html Contact: hwr@georgetown.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24013927

  11. Multi-angle backscatter classification and sub-bottom profiling for improved seafloor characterization

    NASA Astrophysics Data System (ADS)

    Alevizos, Evangelos; Snellen, Mirjam; Simons, Dick; Siemes, Kerstin; Greinert, Jens

    2018-06-01

    This study applies three classification methods exploiting the angular dependence of acoustic seafloor backscatter along with high resolution sub-bottom profiling for seafloor sediment characterization in the Eckernförde Bay, Baltic Sea Germany. This area is well suited for acoustic backscatter studies due to its shallowness, its smooth bathymetry and the presence of a wide range of sediment types. Backscatter data were acquired using a Seabeam1180 (180 kHz) multibeam echosounder and sub-bottom profiler data were recorded using a SES-2000 parametric sonar transmitting 6 and 12 kHz. The high density of seafloor soundings allowed extracting backscatter layers for five beam angles over a large part of the surveyed area. A Bayesian probability method was employed for sediment classification based on the backscatter variability at a single incidence angle, whereas Maximum Likelihood Classification (MLC) and Principal Components Analysis (PCA) were applied to the multi-angle layers. The Bayesian approach was used for identifying the optimum number of acoustic classes because cluster validation is carried out prior to class assignment and class outputs are ordinal categorical values. The method is based on the principle that backscatter values from a single incidence angle express a normal distribution for a particular sediment type. The resulting Bayesian classes were well correlated to median grain sizes and the percentage of coarse material. The MLC method uses angular response information from five layers of training areas extracted from the Bayesian classification map. The subsequent PCA analysis is based on the transformation of these five layers into two principal components that comprise most of the data variability. These principal components were clustered in five classes after running an external cluster validation test. In general both methods MLC and PCA, separated the various sediment types effectively, showing good agreement (kappa >0.7) with the Bayesian approach which also correlates well with ground truth data (r2 > 0.7). In addition, sub-bottom data were used in conjunction with the Bayesian classification results to characterize acoustic classes with respect to their geological and stratigraphic interpretation. The joined interpretation of seafloor and sub-seafloor data sets proved to be an efficient approach for a better understanding of seafloor backscatter patchiness and to discriminate acoustically similar classes in different geological/bathymetric settings.

  12. Effect of Clustering Algorithm on Establishing Markov State Model for Molecular Dynamics Simulations.

    PubMed

    Li, Yan; Dong, Zigang

    2016-06-27

    Recently, the Markov state model has been applied for kinetic analysis of molecular dynamics simulations. However, discretization of the conformational space remains a primary challenge in model building, and it is not clear how the space decomposition by distinct clustering strategies exerts influence on the model output. In this work, different clustering algorithms are employed to partition the conformational space sampled in opening and closing of fatty acid binding protein 4 as well as inactivation and activation of the epidermal growth factor receptor. Various classifications are achieved, and Markov models are set up accordingly. On the basis of the models, the total net flux and transition rate are calculated between two distinct states. Our results indicate that geometric and kinetic clustering perform equally well. The construction and outcome of Markov models are heavily dependent on the data traits. Compared to other methods, a combination of Bayesian and hierarchical clustering is feasible in identification of metastable states.

  13. Bayesian inference based on dual generalized order statistics from the exponentiated Weibull model

    NASA Astrophysics Data System (ADS)

    Al Sobhi, Mashail M.

    2015-02-01

    Bayesian estimation for the two parameters and the reliability function of the exponentiated Weibull model are obtained based on dual generalized order statistics (DGOS). Also, Bayesian prediction bounds for future DGOS from exponentiated Weibull model are obtained. The symmetric and asymmetric loss functions are considered for Bayesian computations. The Markov chain Monte Carlo (MCMC) methods are used for computing the Bayes estimates and prediction bounds. The results have been specialized to the lower record values. Comparisons are made between Bayesian and maximum likelihood estimators via Monte Carlo simulation.

  14. Nonlinear inversion of electrical resistivity imaging using pruning Bayesian neural networks

    NASA Astrophysics Data System (ADS)

    Jiang, Fei-Bo; Dai, Qian-Wei; Dong, Li

    2016-06-01

    Conventional artificial neural networks used to solve electrical resistivity imaging (ERI) inversion problem suffer from overfitting and local minima. To solve these problems, we propose to use a pruning Bayesian neural network (PBNN) nonlinear inversion method and a sample design method based on the K-medoids clustering algorithm. In the sample design method, the training samples of the neural network are designed according to the prior information provided by the K-medoids clustering results; thus, the training process of the neural network is well guided. The proposed PBNN, based on Bayesian regularization, is used to select the hidden layer structure by assessing the effect of each hidden neuron to the inversion results. Then, the hyperparameter α k , which is based on the generalized mean, is chosen to guide the pruning process according to the prior distribution of the training samples under the small-sample condition. The proposed algorithm is more efficient than other common adaptive regularization methods in geophysics. The inversion of synthetic data and field data suggests that the proposed method suppresses the noise in the neural network training stage and enhances the generalization. The inversion results with the proposed method are better than those of the BPNN, RBFNN, and RRBFNN inversion methods as well as the conventional least squares inversion.

  15. Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap.

    PubMed

    Zhou, Hanzhi; Elliott, Michael R; Raghunathan, Trivellore E

    2016-06-01

    Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in "Delta-V," a key crash severity measure.

  16. Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap

    PubMed Central

    Zhou, Hanzhi; Elliott, Michael R.; Raghunathan, Trivellore E.

    2017-01-01

    Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in “Delta-V,” a key crash severity measure. PMID:29226161

  17. Estimating relative risks in multicenter studies with a small number of centers - which methods to use? A simulation study.

    PubMed

    Pedroza, Claudia; Truong, Van Thi Thanh

    2017-11-02

    Analyses of multicenter studies often need to account for center clustering to ensure valid inference. For binary outcomes, it is particularly challenging to properly adjust for center when the number of centers or total sample size is small, or when there are few events per center. Our objective was to evaluate the performance of generalized estimating equation (GEE) log-binomial and Poisson models, generalized linear mixed models (GLMMs) assuming binomial and Poisson distributions, and a Bayesian binomial GLMM to account for center effect in these scenarios. We conducted a simulation study with few centers (≤30) and 50 or fewer subjects per center, using both a randomized controlled trial and an observational study design to estimate relative risk. We compared the GEE and GLMM models with a log-binomial model without adjustment for clustering in terms of bias, root mean square error (RMSE), and coverage. For the Bayesian GLMM, we used informative neutral priors that are skeptical of large treatment effects that are almost never observed in studies of medical interventions. All frequentist methods exhibited little bias, and the RMSE was very similar across the models. The binomial GLMM had poor convergence rates, ranging from 27% to 85%, but performed well otherwise. The results show that both GEE models need to use small sample corrections for robust SEs to achieve proper coverage of 95% CIs. The Bayesian GLMM had similar convergence rates but resulted in slightly more biased estimates for the smallest sample sizes. However, it had the smallest RMSE and good coverage across all scenarios. These results were very similar for both study designs. For the analyses of multicenter studies with a binary outcome and few centers, we recommend adjustment for center with either a GEE log-binomial or Poisson model with appropriate small sample corrections or a Bayesian binomial GLMM with informative priors.

  18. Bayesian techniques for analyzing group differences in the Iowa Gambling Task: A case study of intuitive and deliberate decision-makers.

    PubMed

    Steingroever, Helen; Pachur, Thorsten; Šmíra, Martin; Lee, Michael D

    2018-06-01

    The Iowa Gambling Task (IGT) is one of the most popular experimental paradigms for comparing complex decision-making across groups. Most commonly, IGT behavior is analyzed using frequentist tests to compare performance across groups, and to compare inferred parameters of cognitive models developed for the IGT. Here, we present a Bayesian alternative based on Bayesian repeated-measures ANOVA for comparing performance, and a suite of three complementary model-based methods for assessing the cognitive processes underlying IGT performance. The three model-based methods involve Bayesian hierarchical parameter estimation, Bayes factor model comparison, and Bayesian latent-mixture modeling. We illustrate these Bayesian methods by applying them to test the extent to which differences in intuitive versus deliberate decision style are associated with differences in IGT performance. The results show that intuitive and deliberate decision-makers behave similarly on the IGT, and the modeling analyses consistently suggest that both groups of decision-makers rely on similar cognitive processes. Our results challenge the notion that individual differences in intuitive and deliberate decision styles have a broad impact on decision-making. They also highlight the advantages of Bayesian methods, especially their ability to quantify evidence in favor of the null hypothesis, and that they allow model-based analyses to incorporate hierarchical and latent-mixture structures.

  19. Automated flow cytometric analysis across large numbers of samples and cell types.

    PubMed

    Chen, Xiaoyi; Hasan, Milena; Libri, Valentina; Urrutia, Alejandra; Beitz, Benoît; Rouilly, Vincent; Duffy, Darragh; Patin, Étienne; Chalmond, Bernard; Rogge, Lars; Quintana-Murci, Lluis; Albert, Matthew L; Schwikowski, Benno

    2015-04-01

    Multi-parametric flow cytometry is a key technology for characterization of immune cell phenotypes. However, robust high-dimensional post-analytic strategies for automated data analysis in large numbers of donors are still lacking. Here, we report a computational pipeline, called FlowGM, which minimizes operator input, is insensitive to compensation settings, and can be adapted to different analytic panels. A Gaussian Mixture Model (GMM)-based approach was utilized for initial clustering, with the number of clusters determined using Bayesian Information Criterion. Meta-clustering in a reference donor permitted automated identification of 24 cell types across four panels. Cluster labels were integrated into FCS files, thus permitting comparisons to manual gating. Cell numbers and coefficient of variation (CV) were similar between FlowGM and conventional gating for lymphocyte populations, but notably FlowGM provided improved discrimination of "hard-to-gate" monocyte and dendritic cell (DC) subsets. FlowGM thus provides rapid high-dimensional analysis of cell phenotypes and is amenable to cohort studies. Copyright © 2015. Published by Elsevier Inc.

  20. A comprehensive comparative test of seven widely used spectral synthesis models against multi-band photometry of young massive-star clusters

    NASA Astrophysics Data System (ADS)

    Wofford, A.; Charlot, S.; Bruzual, G.; Eldridge, J. J.; Calzetti, D.; Adamo, A.; Cignoni, M.; de Mink, S. E.; Gouliermis, D. A.; Grasha, K.; Grebel, E. K.; Lee, J. C.; Östlin, G.; Smith, L. J.; Ubeda, L.; Zackrisson, E.

    2016-04-01

    We test the predictions of spectral synthesis models based on seven different massive-star prescriptions against Legacy ExtraGalactic UV Survey (LEGUS) observations of eight young massive clusters in two local galaxies, NGC 1566 and NGC 5253, chosen because predictions of all seven models are available at the published galactic metallicities. The high angular resolution, extensive cluster inventory, and full near-ultraviolet to near-infrared photometric coverage make the LEGUS data set excellent for this study. We account for both stellar and nebular emission in the models and try two different prescriptions for attenuation by dust. From Bayesian fits of model libraries to the observations, we find remarkably low dispersion in the median E(B - V) (˜0.03 mag), stellar masses (˜104 M⊙), and ages (˜1 Myr) derived for individual clusters using different models, although maximum discrepancies in these quantities can reach 0.09 mag and factors of 2.8 and 2.5, respectively. This is for ranges in median properties of 0.05-0.54 mag, 1.8-10 × 104 M⊙, and 1.6-40 Myr spanned by the clusters in our sample. In terms of best fit, the observations are slightly better reproduced by models with interacting binaries and least well reproduced by models with single rotating stars. Our study provides a first quantitative estimate of the accuracies and uncertainties of the most recent spectral synthesis models of young stellar populations, demonstrates the good progress of models in fitting high-quality observations, and highlights the needs for a larger cluster sample and more extensive tests of the model parameter space.

  1. Fundamentals and Recent Developments in Approximate Bayesian Computation

    PubMed Central

    Lintusaari, Jarno; Gutmann, Michael U.; Dutta, Ritabrata; Kaski, Samuel; Corander, Jukka

    2017-01-01

    Abstract Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. [ABC; approximate Bayesian computation; Bayesian inference; likelihood-free inference; phylogenetics; simulator-based models; stochastic simulation models; tree-based models.] PMID:28175922

  2. Predation-related costs and benefits of conspecific attraction in songbirds--an agent-based approach.

    PubMed

    Szymkowiak, Jakub; Kuczyński, Lechosław

    2015-01-01

    Songbirds that follow a conspecific attraction strategy in the habitat selection process prefer to settle in habitat patches already occupied by other individuals. This largely affects the patterns of their spatio-temporal distribution and leads to clustered breeding. Although making informed settlement decisions is expected to be beneficial for individuals, such territory clusters may potentially provide additional fitness benefits (e.g., through the dilution effect) or costs (e.g., possibly facilitating nest localization if predators respond functionally to prey distribution). Thus, we hypothesized that the fitness consequences of following a conspecific attraction strategy may largely depend on the composition of the predator community. We developed an agent-based model in which we simulated the settling behavior of birds that use a conspecific attraction strategy and breed in a multi-predator landscape with predators that exhibited different foraging strategies. Moreover, we investigated whether Bayesian updating of prior settlement decisions according to the perceived predation risk may improve the fitness of birds that rely on conspecific cues. Our results provide evidence that the fitness consequences of conspecific attraction are predation-related. We found that in landscapes dominated by predators able to respond functionally to prey distribution, clustered breeding led to fitness costs. However, this cost could be reduced if birds performed Bayesian updating of prior settlement decisions and perceived nesting with too many neighbors as a threat. Our results did not support the hypothesis that in landscapes dominated by incidental predators, clustered breeding as a byproduct of conspecific attraction provides fitness benefits through the dilution effect. We suggest that this may be due to the spatial scale of songbirds' aggregative behavior. In general, we provide evidence that when considering the fitness consequences of conspecific attraction for songbirds, one should expect a trade-off between the benefits of making informed decisions and the costs of clustering.

  3. Predation-Related Costs and Benefits of Conspecific Attraction in Songbirds—An Agent-Based Approach

    PubMed Central

    Szymkowiak, Jakub; Kuczyński, Lechosław

    2015-01-01

    Songbirds that follow a conspecific attraction strategy in the habitat selection process prefer to settle in habitat patches already occupied by other individuals. This largely affects the patterns of their spatio-temporal distribution and leads to clustered breeding. Although making informed settlement decisions is expected to be beneficial for individuals, such territory clusters may potentially provide additional fitness benefits (e.g., through the dilution effect) or costs (e.g., possibly facilitating nest localization if predators respond functionally to prey distribution). Thus, we hypothesized that the fitness consequences of following a conspecific attraction strategy may largely depend on the composition of the predator community. We developed an agent-based model in which we simulated the settling behavior of birds that use a conspecific attraction strategy and breed in a multi-predator landscape with predators that exhibited different foraging strategies. Moreover, we investigated whether Bayesian updating of prior settlement decisions according to the perceived predation risk may improve the fitness of birds that rely on conspecific cues. Our results provide evidence that the fitness consequences of conspecific attraction are predation-related. We found that in landscapes dominated by predators able to respond functionally to prey distribution, clustered breeding led to fitness costs. However, this cost could be reduced if birds performed Bayesian updating of prior settlement decisions and perceived nesting with too many neighbors as a threat. Our results did not support the hypothesis that in landscapes dominated by incidental predators, clustered breeding as a byproduct of conspecific attraction provides fitness benefits through the dilution effect. We suggest that this may be due to the spatial scale of songbirds’ aggregative behavior. In general, we provide evidence that when considering the fitness consequences of conspecific attraction for songbirds, one should expect a trade-off between the benefits of making informed decisions and the costs of clustering. PMID:25790479

  4. A Bayesian alternative for multi-objective ecohydrological model specification

    NASA Astrophysics Data System (ADS)

    Tang, Yating; Marshall, Lucy; Sharma, Ashish; Ajami, Hoori

    2018-01-01

    Recent studies have identified the importance of vegetation processes in terrestrial hydrologic systems. Process-based ecohydrological models combine hydrological, physical, biochemical and ecological processes of the catchments, and as such are generally more complex and parametric than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying the uncertainties in hydrological modeling with the development of Markov chain Monte Carlo (MCMC) techniques. The Bayesian approach offers an appealing alternative to traditional multi-objective hydrologic model calibrations by defining proper prior distributions that can be considered analogous to the ad-hoc weighting often prescribed in multi-objective calibration. Our study aims to develop appropriate prior distributions and likelihood functions that minimize the model uncertainties and bias within a Bayesian ecohydrological modeling framework based on a traditional Pareto-based model calibration technique. In our study, a Pareto-based multi-objective optimization and a formal Bayesian framework are implemented in a conceptual ecohydrological model that combines a hydrological model (HYMOD) and a modified Bucket Grassland Model (BGM). Simulations focused on one objective (streamflow/LAI) and multiple objectives (streamflow and LAI) with different emphasis defined via the prior distribution of the model error parameters. Results show more reliable outputs for both predicted streamflow and LAI using Bayesian multi-objective calibration with specified prior distributions for error parameters based on results from the Pareto front in the ecohydrological modeling. The methodology implemented here provides insight into the usefulness of multiobjective Bayesian calibration for ecohydrologic systems and the importance of appropriate prior distributions in such approaches.

  5. Calibrating the Planck cluster mass scale with CLASH

    NASA Astrophysics Data System (ADS)

    Penna-Lima, M.; Bartlett, J. G.; Rozo, E.; Melin, J.-B.; Merten, J.; Evrard, A. E.; Postman, M.; Rykoff, E.

    2017-08-01

    We determine the mass scale of Planck galaxy clusters using gravitational lensing mass measurements from the Cluster Lensing And Supernova survey with Hubble (CLASH). We have compared the lensing masses to the Planck Sunyaev-Zeldovich (SZ) mass proxy for 21 clusters in common, employing a Bayesian analysis to simultaneously fit an idealized CLASH selection function and the distribution between the measured observables and true cluster mass. We used a tiered analysis strategy to explicitly demonstrate the importance of priors on weak lensing mass accuracy. In the case of an assumed constant bias, bSZ, between true cluster mass, M500, and the Planck mass proxy, MPL, our analysis constrains 1-bSZ = 0.73 ± 0.10 when moderate priors on weak lensing accuracy are used, including a zero-mean Gaussian with standard deviation of 8% to account for possible bias in lensing mass estimations. Our analysis explicitly accounts for possible selection bias effects in this calibration sourced by the CLASH selection function. Our constraint on the cluster mass scale is consistent with recent results from the Weighing the Giants program and the Canadian Cluster Comparison Project. It is also consistent, at 1.34σ, with the value needed to reconcile the Planck SZ cluster counts with Planck's base ΛCDM model fit to the primary cosmic microwave background anisotropies.

  6. Analysis of genetic population structure in Acacia caven (Leguminosae, Mimosoideae), comparing one exploratory and two Bayesian-model-based methods

    PubMed Central

    Pometti, Carolina L.; Bessega, Cecilia F.; Saidman, Beatriz O.; Vilardi, Juan C.

    2014-01-01

    Bayesian clustering as implemented in STRUCTURE or GENELAND software is widely used to form genetic groups of populations or individuals. On the other hand, in order to satisfy the need for less computer-intensive approaches, multivariate analyses are specifically devoted to extracting information from large datasets. In this paper, we report the use of a dataset of AFLP markers belonging to 15 sampling sites of Acacia caven for studying the genetic structure and comparing the consistency of three methods: STRUCTURE, GENELAND and DAPC. Of these methods, DAPC was the fastest one and showed accuracy in inferring the K number of populations (K = 12 using the find.clusters option and K = 15 with a priori information of populations). GENELAND in turn, provides information on the area of membership probabilities for individuals or populations in the space, when coordinates are specified (K = 12). STRUCTURE also inferred the number of K populations and the membership probabilities of individuals based on ancestry, presenting the result K = 11 without prior information of populations and K = 15 using the LOCPRIOR option. Finally, in this work all three methods showed high consistency in estimating the population structure, inferring similar numbers of populations and the membership probabilities of individuals to each group, with a high correlation between each other. PMID:24688293

  7. Modeling Diagnostic Assessments with Bayesian Networks

    ERIC Educational Resources Information Center

    Almond, Russell G.; DiBello, Louis V.; Moulder, Brad; Zapata-Rivera, Juan-Diego

    2007-01-01

    This paper defines Bayesian network models and examines their applications to IRT-based cognitive diagnostic modeling. These models are especially suited to building inference engines designed to be synchronous with the finer grained student models that arise in skills diagnostic assessment. Aspects of the theory and use of Bayesian network models…

  8. Moving beyond qualitative evaluations of Bayesian models of cognition.

    PubMed

    Hemmer, Pernille; Tauber, Sean; Steyvers, Mark

    2015-06-01

    Bayesian models of cognition provide a powerful way to understand the behavior and goals of individuals from a computational point of view. Much of the focus in the Bayesian cognitive modeling approach has been on qualitative model evaluations, where predictions from the models are compared to data that is often averaged over individuals. In many cognitive tasks, however, there are pervasive individual differences. We introduce an approach to directly infer individual differences related to subjective mental representations within the framework of Bayesian models of cognition. In this approach, Bayesian data analysis methods are used to estimate cognitive parameters and motivate the inference process within a Bayesian cognitive model. We illustrate this integrative Bayesian approach on a model of memory. We apply the model to behavioral data from a memory experiment involving the recall of heights of people. A cross-validation analysis shows that the Bayesian memory model with inferred subjective priors predicts withheld data better than a Bayesian model where the priors are based on environmental statistics. In addition, the model with inferred priors at the individual subject level led to the best overall generalization performance, suggesting that individual differences are important to consider in Bayesian models of cognition.

  9. Invited commentary: Lost in estimation--searching for alternatives to markov chains to fit complex Bayesian models.

    PubMed

    Molitor, John

    2012-03-01

    Bayesian methods have seen an increase in popularity in a wide variety of scientific fields, including epidemiology. One of the main reasons for their widespread application is the power of the Markov chain Monte Carlo (MCMC) techniques generally used to fit these models. As a result, researchers often implicitly associate Bayesian models with MCMC estimation procedures. However, Bayesian models do not always require Markov-chain-based methods for parameter estimation. This is important, as MCMC estimation methods, while generally quite powerful, are complex and computationally expensive and suffer from convergence problems related to the manner in which they generate correlated samples used to estimate probability distributions for parameters of interest. In this issue of the Journal, Cole et al. (Am J Epidemiol. 2012;175(5):368-375) present an interesting paper that discusses non-Markov-chain-based approaches to fitting Bayesian models. These methods, though limited, can overcome some of the problems associated with MCMC techniques and promise to provide simpler approaches to fitting Bayesian models. Applied researchers will find these estimation approaches intuitively appealing and will gain a deeper understanding of Bayesian models through their use. However, readers should be aware that other non-Markov-chain-based methods are currently in active development and have been widely published in other fields.

  10. New insights into the classification and nomenclature of cortical GABAergic interneurons.

    PubMed

    DeFelipe, Javier; López-Cruz, Pedro L; Benavides-Piccione, Ruth; Bielza, Concha; Larrañaga, Pedro; Anderson, Stewart; Burkhalter, Andreas; Cauli, Bruno; Fairén, Alfonso; Feldmeyer, Dirk; Fishell, Gord; Fitzpatrick, David; Freund, Tamás F; González-Burgos, Guillermo; Hestrin, Shaul; Hill, Sean; Hof, Patrick R; Huang, Josh; Jones, Edward G; Kawaguchi, Yasuo; Kisvárday, Zoltán; Kubota, Yoshiyuki; Lewis, David A; Marín, Oscar; Markram, Henry; McBain, Chris J; Meyer, Hanno S; Monyer, Hannah; Nelson, Sacha B; Rockland, Kathleen; Rossier, Jean; Rubenstein, John L R; Rudy, Bernardo; Scanziani, Massimo; Shepherd, Gordon M; Sherwood, Chet C; Staiger, Jochen F; Tamás, Gábor; Thomson, Alex; Wang, Yun; Yuste, Rafael; Ascoli, Giorgio A

    2013-03-01

    A systematic classification and accepted nomenclature of neuron types is much needed but is currently lacking. This article describes a possible taxonomical solution for classifying GABAergic interneurons of the cerebral cortex based on a novel, web-based interactive system that allows experts to classify neurons with pre-determined criteria. Using Bayesian analysis and clustering algorithms on the resulting data, we investigated the suitability of several anatomical terms and neuron names for cortical GABAergic interneurons. Moreover, we show that supervised classification models could automatically categorize interneurons in agreement with experts' assignments. These results demonstrate a practical and objective approach to the naming, characterization and classification of neurons based on community consensus.

  11. New insights into the classification and nomenclature of cortical GABAergic interneurons

    PubMed Central

    DeFelipe, Javier; López-Cruz, Pedro L.; Benavides-Piccione, Ruth; Bielza, Concha; Larrañaga, Pedro; Anderson, Stewart; Burkhalter, Andreas; Cauli, Bruno; Fairén, Alfonso; Feldmeyer, Dirk; Fishell, Gord; Fitzpatrick, David; Freund, Tamás F.; González-Burgos, Guillermo; Hestrin, Shaul; Hill, Sean; Hof, Patrick R.; Huang, Josh; Jones, Edward G.; Kawaguchi, Yasuo; Kisvárday, Zoltán; Kubota, Yoshiyuki; Lewis, David A.; Marín, Oscar; Markram, Henry; McBain, Chris J.; Meyer, Hanno S.; Monyer, Hannah; Nelson, Sacha B.; Rockland, Kathleen; Rossier, Jean; Rubenstein, John L. R.; Rudy, Bernardo; Scanziani, Massimo; Shepherd, Gordon M.; Sherwood, Chet C.; Staiger, Jochen F.; Tamás, Gábor; Thomson, Alex; Wang, Yun; Yuste, Rafael; Ascoli, Giorgio A.

    2013-01-01

    A systematic classification and accepted nomenclature of neuron types is much needed but is currently lacking. This article describes a possible taxonomical solution for classifying GABAergic interneurons of the cerebral cortex based on a novel, web-based interactive system that allows experts to classify neurons with pre-determined criteria. Using Bayesian analysis and clustering algorithms on the resulting data, we investigated the suitability of several anatomical terms and neuron names for cortical GABAergic interneurons. Moreover, we show that supervised classification models could automatically categorize interneurons in agreement with experts’ assignments. These results demonstrate a practical and objective approach to the naming, characterization and classification of neurons based on community consensus. PMID:23385869

  12. Bayesian Analysis and Characterization of Multiple Populations in Galactic Globular Clusters

    NASA Astrophysics Data System (ADS)

    Wagner-Kaiser, Rachel A.; Stenning, David; Sarajedini, Ata; von Hippel, Ted; van Dyk, David A.; Robinson, Elliot; Stein, Nathan; Jefferys, William H.; BASE-9, HST UVIS Globular Cluster Treasury Program

    2017-01-01

    Globular clusters have long been important tools to unlock the early history of galaxies. Thus, it is crucial we understand the formation and characteristics of the globular clusters (GCs) themselves. Historically, GCs were thought to be simple and largely homogeneous populations, formed via collapse of a single molecular cloud. However, this classical view has been overwhelmingly invalidated by recent work. It is now clear that the vast majority of globular clusters in our Galaxy host two or more chemically distinct populations of stars, with variations in helium and light elements at discrete abundance levels. No coherent story has arisen that is able to fully explain the formation of multiple populations in globular clusters nor the mechanisms that drive stochastic variations from cluster to cluster.We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic Globular Clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences among the two populations in the clusters fall in the range of 0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster. We also find that the proportion of the first population of stars increases with mass. Our results are examined in the context of proposed globular cluster formation scenarios.

  13. Evolutionary History of Wild Barley (Hordeum vulgare subsp. spontaneum) Analyzed Using Multilocus Sequence Data and Paleodistribution Modeling

    PubMed Central

    Jakob, Sabine S.; Rödder, Dennis; Engler, Jan O.; Shaaf, Salar; Özkan, Hakan; Blattner, Frank R.; Kilian, Benjamin

    2014-01-01

    Studies of Hordeum vulgare subsp. spontaneum, the wild progenitor of cultivated barley, have mostly relied on materials collected decades ago and maintained since then ex situ in germplasm repositories. We analyzed spatial genetic variation in wild barley populations collected rather recently, exploring sequence variations at seven single-copy nuclear loci, and inferred the relationships among these populations and toward the genepool of the crop. The wild barley collection covers the whole natural distribution area from the Mediterranean to Middle Asia. In contrast to earlier studies, Bayesian assignment analyses revealed three population clusters, in the Levant, Turkey, and east of Turkey, respectively. Genetic diversity was exceptionally high in the Levant, while eastern populations were depleted of private alleles. Species distribution modeling based on climate parameters and extant occurrence points of the taxon inferred suitable habitat conditions during the ice-age, particularly in the Levant and Turkey. Together with the ecologically wide range of habitats, they might contribute to structured but long-term stable populations in this region and their high genetic diversity. For recently collected individuals, Bayesian assignment to geographic clusters was generally unambiguous, but materials from genebanks often showed accessions that were not placed according to their assumed geographic origin or showed traces of introgression from cultivated barley. We assign this to gene flow among accessions during ex situ maintenance. Evolutionary studies based on such materials might therefore result in wrong conclusions regarding the history of the species or the origin and mode of domestication of the crop, depending on the accessions included. PMID:24586028

  14. Effects of additional data on Bayesian clustering.

    PubMed

    Yamazaki, Keisuke

    2017-10-01

    Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum.

    PubMed

    Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu

    2016-12-01

    The data presented in this paper is supporting the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present phylogenetic tree showing dichotomy with two different clusters of SODs inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2], Ramachandran plots of G. raimondii and G. arboreum SODs, the protein sequence used to generate 3D sructure of proteins and the template accession via SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information." [3] and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].

  16. Inferring the Growth of Massive Galaxies Using Bayesian Spectral Synthesis Modeling

    NASA Astrophysics Data System (ADS)

    Stillman, Coley Michael; Poremba, Megan R.; Moustakas, John

    2018-01-01

    The most massive galaxies in the universe are typically found at the centers of massive galaxy clusters. Studying these galaxies can provide valuable insight into the hierarchical growth of massive dark matter halos. One of the key challenges of measuring the stellar mass growth of massive galaxies is converting the measured light profiles into stellar mass. We use Prospector, a state-of-the-art Bayesian spectral synthesis modeling code, to infer the total stellar masses of a pilot sample of massive central galaxies selected from the Sloan Digital Sky Survey. We compare our stellar mass estimates to previous measurements, and present some of the quantitative diagnostics provided by Prospector.

  17. Potential of SNP markers for the characterization of Brazilian cassava germplasm.

    PubMed

    de Oliveira, Eder Jorge; Ferreira, Cláudia Fortes; da Silva Santos, Vanderlei; de Jesus, Onildo Nunes; Oliveira, Gilmara Alvarenga Fachardo; da Silva, Maiane Suzarte

    2014-06-01

    High-throughput markers, such as SNPs, along with different methodologies were used to evaluate the applicability of the Bayesian approach and the multivariate analysis in structuring the genetic diversity in cassavas. The objective of the present work was to evaluate the diversity and genetic structure of the largest cassava germplasm bank in Brazil. Complementary methodological approaches such as discriminant analysis of principal components (DAPC), Bayesian analysis and molecular analysis of variance (AMOVA) were used to understand the structure and diversity of 1,280 accessions genotyped using 402 single nucleotide polymorphism markers. The genetic diversity (0.327) and the average observed heterozygosity (0.322) were high considering the bi-allelic markers. In terms of population, the presence of a complex genetic structure was observed indicating the formation of 30 clusters by DAPC and 34 clusters by Bayesian analysis. Both methodologies presented difficulties and controversies in terms of the allocation of some accessions to specific clusters. However, the clusters suggested by the DAPC analysis seemed to be more consistent for presenting higher probability of allocation of the accessions within the clusters. Prior information related to breeding patterns and geographic origins of the accessions were not sufficient for providing clear differentiation between the clusters according to the AMOVA analysis. In contrast, the F ST was maximized when considering the clusters suggested by the Bayesian and DAPC analyses. The high frequency of germplasm exchange between producers and the subsequent alteration of the name of the same material may be one of the causes of the low association between genetic diversity and geographic origin. The results of this study may benefit cassava germplasm conservation programs, and contribute to the maximization of genetic gains in breeding programs.

  18. When mechanism matters: Bayesian forecasting using models of ecological diffusion

    USGS Publications Warehouse

    Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.

    2017-01-01

    Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.

  19. Dynamic Bayesian Network Modeling of Game Based Diagnostic Assessments. CRESST Report 837

    ERIC Educational Resources Information Center

    Levy, Roy

    2014-01-01

    Digital games offer an appealing environment for assessing student proficiencies, including skills and misconceptions in a diagnostic setting. This paper proposes a dynamic Bayesian network modeling approach for observations of student performance from an educational video game. A Bayesian approach to model construction, calibration, and use in…

  20. Parallelized Bayesian inversion for three-dimensional dental X-ray imaging.

    PubMed

    Kolehmainen, Ville; Vanne, Antti; Siltanen, Samuli; Järvenpää, Seppo; Kaipio, Jari P; Lassas, Matti; Kalke, Martti

    2006-02-01

    Diagnostic and operational tasks based on dental radiology often require three-dimensional (3-D) information that is not available in a single X-ray projection image. Comprehensive 3-D information about tissues can be obtained by computerized tomography (CT) imaging. However, in dental imaging a conventional CT scan may not be available or practical because of high radiation dose, low-resolution or the cost of the CT scanner equipment. In this paper, we consider a novel type of 3-D imaging modality for dental radiology. We consider situations in which projection images of the teeth are taken from a few sparsely distributed projection directions using the dentist's regular (digital) X-ray equipment and the 3-D X-ray attenuation function is reconstructed. A complication in these experiments is that the reconstruction of the 3-D structure based on a few projection images becomes an ill-posed inverse problem. Bayesian inversion is a well suited framework for reconstruction from such incomplete data. In Bayesian inversion, the ill-posed reconstruction problem is formulated in a well-posed probabilistic form in which a priori information is used to compensate for the incomplete information of the projection data. In this paper we propose a Bayesian method for 3-D reconstruction in dental radiology. The method is partially based on Kolehmainen et al. 2003. The prior model for dental structures consist of a weighted l1 and total variation (TV)-prior together with the positivity prior. The inverse problem is stated as finding the maximum a posteriori (MAP) estimate. To make the 3-D reconstruction computationally feasible, a parallelized version of an optimization algorithm is implemented for a Beowulf cluster computer. The method is tested with projection data from dental specimens and patient data. Tomosynthetic reconstructions are given as reference for the proposed method.

  1. Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.

    PubMed

    Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo

    2017-01-01

    Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.

  2. Evolution of the Selfing Syndrome in Arabis alpina (Brassicaceae).

    PubMed

    Tedder, Andrew; Carleial, Samuel; Gołębiewska, Martyna; Kappel, Christian; Shimizu, Kentaro K; Stift, Marc

    2015-01-01

    The transition from cross-fertilisation (outcrossing) to self-fertilisation (selfing) frequently coincides with changes towards a floral morphology that optimises self-pollination, the selfing syndrome. Population genetic studies have reported the existence of both outcrossing and selfing populations in Arabis alpina (Brassicaceae), which is an emerging model species for studying the molecular basis of perenniality and local adaptation. It is unknown whether its selfing populations have evolved a selfing syndrome. Using macro-photography, microscopy and automated cell counting, we compared floral syndromes (size, herkogamy, pollen and ovule numbers) between three outcrossing populations from the Apuan Alps and three selfing populations from the Western and Central Alps (Maritime Alps and Dolomites). In addition, we genotyped the plants for 12 microsatellite loci to confirm previous measures of diversity and inbreeding coefficients based on allozymes, and performed Bayesian clustering. Plants from the three selfing populations had markedly smaller flowers, less herkogamy and lower pollen production than plants from the three outcrossing populations, whereas pistil length and ovule number have remained constant. Compared to allozymes, microsatellite variation was higher, but revealed similar patterns of low diversity and high Fis in selfing populations. Bayesian clustering revealed two clusters. The first cluster contained the three outcrossing populations from the Apuan Alps, the second contained the three selfing populations from the Maritime Alps and Dolomites. We conclude that in comparison to three outcrossing populations, three populations with high selfing rates are characterised by a flower morphology that is closer to the selfing syndrome. The presence of outcrossing and selfing floral syndromes within a single species will facilitate unravelling the genetic basis of the selfing syndrome, and addressing which selective forces drive its evolution.

  3. Star Cluster Properties in Two LEGUS Galaxies Computed with Stochastic Stellar Population Synthesis Models

    NASA Astrophysics Data System (ADS)

    Krumholz, Mark R.; Adamo, Angela; Fumagalli, Michele; Wofford, Aida; Calzetti, Daniela; Lee, Janice C.; Whitmore, Bradley C.; Bright, Stacey N.; Grasha, Kathryn; Gouliermis, Dimitrios A.; Kim, Hwihyun; Nair, Preethi; Ryon, Jenna E.; Smith, Linda J.; Thilker, David; Ubeda, Leonardo; Zackrisson, Erik

    2015-10-01

    We investigate a novel Bayesian analysis method, based on the Stochastically Lighting Up Galaxies (slug) code, to derive the masses, ages, and extinctions of star clusters from integrated light photometry. Unlike many analysis methods, slug correctly accounts for incomplete initial mass function (IMF) sampling, and returns full posterior probability distributions rather than simply probability maxima. We apply our technique to 621 visually confirmed clusters in two nearby galaxies, NGC 628 and NGC 7793, that are part of the Legacy Extragalactic UV Survey (LEGUS). LEGUS provides Hubble Space Telescope photometry in the NUV, U, B, V, and I bands. We analyze the sensitivity of the derived cluster properties to choices of prior probability distribution, evolutionary tracks, IMF, metallicity, treatment of nebular emission, and extinction curve. We find that slug's results for individual clusters are insensitive to most of these choices, but that the posterior probability distributions we derive are often quite broad, and sometimes multi-peaked and quite sensitive to the choice of priors. In contrast, the properties of the cluster population as a whole are relatively robust against all of these choices. We also compare our results from slug to those derived with a conventional non-stochastic fitting code, Yggdrasil. We show that slug's stochastic models are generally a better fit to the observations than the deterministic ones used by Yggdrasil. However, the overall properties of the cluster populations recovered by both codes are qualitatively similar.

  4. Self-optimized construction of transition rate matrices from accelerated atomistic simulations with Bayesian uncertainty quantification

    NASA Astrophysics Data System (ADS)

    Swinburne, Thomas D.; Perez, Danny

    2018-05-01

    A massively parallel method to build large transition rate matrices from temperature-accelerated molecular dynamics trajectories is presented. Bayesian Markov model analysis is used to estimate the expected residence time in the known state space, providing crucial uncertainty quantification for higher-scale simulation schemes such as kinetic Monte Carlo or cluster dynamics. The estimators are additionally used to optimize where exploration is performed and the degree of temperature acceleration on the fly, giving an autonomous, optimal procedure to explore the state space of complex systems. The method is tested against exactly solvable models and used to explore the dynamics of C15 interstitial defects in iron. Our uncertainty quantification scheme allows for accurate modeling of the evolution of these defects over timescales of several seconds.

  5. Bayesian multimodel inference for dose-response studies

    USGS Publications Warehouse

    Link, W.A.; Albers, P.H.

    2007-01-01

    Statistical inference in dose?response studies is model-based: The analyst posits a mathematical model of the relation between exposure and response, estimates parameters of the model, and reports conclusions conditional on the model. Such analyses rarely include any accounting for the uncertainties associated with model selection. The Bayesian inferential system provides a convenient framework for model selection and multimodel inference. In this paper we briefly describe the Bayesian paradigm and Bayesian multimodel inference. We then present a family of models for multinomial dose?response data and apply Bayesian multimodel inferential methods to the analysis of data on the reproductive success of American kestrels (Falco sparveriuss) exposed to various sublethal dietary concentrations of methylmercury.

  6. Development of uncertainty-based work injury model using Bayesian structural equation modelling.

    PubMed

    Chatterjee, Snehamoy

    2014-01-01

    This paper proposed a Bayesian method-based structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distribution for factor loading parameters and structural parameters of SEM. In the first approach, the prior distributions were considered as a fixed distribution function with specific parameter values, whereas, in the second approach, prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayesian rule. The Markov Chain Monte Carlo sampling in the form Gibbs sampling was applied for sampling from the posterior distribution. The results revealed that all coefficients of structural and measurement model parameters are statistically significant in experts' opinion-based priors, whereas, two coefficients are not statistically significant when fixed prior-based distributions are applied. The error statistics reveals that Bayesian structural model provides reasonably good fit of work injury with high coefficient of determination (0.91) and less mean squared error as compared to traditional SEM.

  7. Inductive Approaches to Improving Diagnosis and Design for Diagnosability

    NASA Technical Reports Server (NTRS)

    Fisher, Douglas H. (Principal Investigator)

    1995-01-01

    The first research area under this grant addresses the problem of classifying time series according to their morphological features in the time domain. A supervised learning system called CALCHAS, which induces a classification procedure for signatures from preclassified examples, was developed. For each of several signature classes, the system infers a model that captures the class's morphological features using Bayesian model induction and the minimum message length approach to assign priors. After induction, a time series (signature) is classified in one of the classes when there is enough evidence to support that decision. Time series with sufficiently novel features, belonging to classes not present in the training set, are recognized as such. A second area of research assumes two sources of information about a system: a model or domain theory that encodes aspects of the system under study and data from actual system operations over time. A model, when it exists, represents strong prior expectations about how a system will perform. Our work with a diagnostic model of the RCS (Reaction Control System) of the Space Shuttle motivated the development of SIG, a system which combines information from a model (or domain theory) and data. As it tracks RCS behavior, the model computes quantitative and qualitative values. Induction is then performed over the data represented by both the 'raw' features and the model-computed high-level features. Finally, work on clustering for operating mode discovery motivated some important extensions to the clustering strategy we had used. One modification appends an iterative optimization technique onto the clustering system; this optimization strategy appears to be novel in the clustering literature. A second modification improves the noise tolerance of the clustering system. In particular, we adapt resampling-based pruning strategies used by supervised learning systems to the task of simplifying hierarchical clusterings, thus making post-clustering analysis easier.

  8. Evaluating for a geospatial relationship between radon levels and thyroid cancer in Pennsylvania.

    PubMed

    Goyal, Neerav; Camacho, Fabian; Mangano, Joseph; Goldenberg, David

    2015-01-01

    To determine whether there is an association between radon levels and the rise in incidence of thyroid cancer in Pennsylvania. Epidemiological study of the state of Pennsylvania. We used information from the Pennsylvania Cancer Registry and the Pennsylvania Department of Energy. From the registry, information regarding thyroid incidence by county and zip code was recorded. Information regarding radon levels per county was recorded from the state. Poisson regression models were fit predicting county-level thyroid incidence and change as a function of radon/lagged radon levels. To account for measurement error in the radon levels, a Bayesian Model extending the Poisson models was fit. Geospatial clustering analysis was also performed. No association was noted between cumulative radon levels and thyroid incidence. In the Poisson modeling, no significant association was noted between county radon level and thyroid cancer incidence (P = .23). Looking for a lag between the radon level and its effect, no significant effect was seen with a lag of 0 to 6 years between exposure and effect (P = .063 to P = .59). The Bayesian models also failed to show a statistically significant association. A cluster of high thyroid cancer incidence was found in western Pennsylvania. Through a variety of models, no association was elicited between annual radon levels recorded in Pennsylvania and the rising incidence of thyroid cancer. However, a cluster of thyroid cancer incidence was found in western Pennsylvania. Further studies may be helpful in looking for other exposures or associations. © 2014 The American Laryngological, Rhinological and Otological Society, Inc.

  9. A comprehensive probabilistic analysis model of oil pipelines network based on Bayesian network

    NASA Astrophysics Data System (ADS)

    Zhang, C.; Qin, T. X.; Jiang, B.; Huang, C.

    2018-02-01

    Oil pipelines network is one of the most important facilities of energy transportation. But oil pipelines network accident may result in serious disasters. Some analysis models for these accidents have been established mainly based on three methods, including event-tree, accident simulation and Bayesian network. Among these methods, Bayesian network is suitable for probabilistic analysis. But not all the important influencing factors are considered and the deployment rule of the factors has not been established. This paper proposed a probabilistic analysis model of oil pipelines network based on Bayesian network. Most of the important influencing factors, including the key environment condition and emergency response are considered in this model. Moreover, the paper also introduces a deployment rule for these factors. The model can be used in probabilistic analysis and sensitive analysis of oil pipelines network accident.

  10. Bayesian seismic tomography by parallel interacting Markov chains

    NASA Astrophysics Data System (ADS)

    Gesret, Alexandrine; Bottero, Alexis; Romary, Thomas; Noble, Mark; Desassis, Nicolas

    2014-05-01

    The velocity field estimated by first arrival traveltime tomography is commonly used as a starting point for further seismological, mineralogical, tectonic or similar analysis. In order to interpret quantitatively the results, the tomography uncertainty values as well as their spatial distribution are required. The estimated velocity model is obtained through inverse modeling by minimizing an objective function that compares observed and computed traveltimes. This step is often performed by gradient-based optimization algorithms. The major drawback of such local optimization schemes, beyond the possibility of being trapped in a local minimum, is that they do not account for the multiple possible solutions of the inverse problem. They are therefore unable to assess the uncertainties linked to the solution. Within a Bayesian (probabilistic) framework, solving the tomography inverse problem aims at estimating the posterior probability density function of velocity model using a global sampling algorithm. Markov chains Monte-Carlo (MCMC) methods are known to produce samples of virtually any distribution. In such a Bayesian inversion, the total number of simulations we can afford is highly related to the computational cost of the forward model. Although fast algorithms have been recently developed for computing first arrival traveltimes of seismic waves, the complete browsing of the posterior distribution of velocity model is hardly performed, especially when it is high dimensional and/or multimodal. In the latter case, the chain may even stay stuck in one of the modes. In order to improve the mixing properties of classical single MCMC, we propose to make interact several Markov chains at different temperatures. This method can make efficient use of large CPU clusters, without increasing the global computational cost with respect to classical MCMC and is therefore particularly suited for Bayesian inversion. The exchanges between the chains allow a precise sampling of the high probability zones of the model space while avoiding the chains to end stuck in a probability maximum. This approach supplies thus a robust way to analyze the tomography imaging uncertainties. The interacting MCMC approach is illustrated on two synthetic examples of tomography of calibration shots such as encountered in induced microseismic studies. On the second application, a wavelet based model parameterization is presented that allows to significantly reduce the dimension of the problem, making thus the algorithm efficient even for a complex velocity model.

  11. Infinite von Mises-Fisher Mixture Modeling of Whole Brain fMRI Data.

    PubMed

    Røge, Rasmus E; Madsen, Kristoffer H; Schmidt, Mikkel N; Mørup, Morten

    2017-10-01

    Cluster analysis of functional magnetic resonance imaging (fMRI) data is often performed using gaussian mixture models, but when the time series are standardized such that the data reside on a hypersphere, this modeling assumption is questionable. The consequences of ignoring the underlying spherical manifold are rarely analyzed, in part due to the computational challenges imposed by directional statistics. In this letter, we discuss a Bayesian von Mises-Fisher (vMF) mixture model for data on the unit hypersphere and present an efficient inference procedure based on collapsed Markov chain Monte Carlo sampling. Comparing the vMF and gaussian mixture models on synthetic data, we demonstrate that the vMF model has a slight advantage inferring the true underlying clustering when compared to gaussian-based models on data generated from both a mixture of vMFs and a mixture of gaussians subsequently normalized. Thus, when performing model selection, the two models are not in agreement. Analyzing multisubject whole brain resting-state fMRI data from healthy adult subjects, we find that the vMF mixture model is considerably more reliable than the gaussian mixture model when comparing solutions across models trained on different groups of subjects, and again we find that the two models disagree on the optimal number of components. The analysis indicates that the fMRI data support more than a thousand clusters, and we confirm this is not a result of overfitting by demonstrating better prediction on data from held-out subjects. Our results highlight the utility of using directional statistics to model standardized fMRI data and demonstrate that whole brain segmentation of fMRI data requires a very large number of functional units in order to adequately account for the discernible statistical patterns in the data.

  12. A surrogate-based sensitivity quantification and Bayesian inversion of a regional groundwater flow model

    NASA Astrophysics Data System (ADS)

    Chen, Mingjie; Izady, Azizallah; Abdalla, Osman A.; Amerjeed, Mansoor

    2018-02-01

    Bayesian inference using Markov Chain Monte Carlo (MCMC) provides an explicit framework for stochastic calibration of hydrogeologic models accounting for uncertainties; however, the MCMC sampling entails a large number of model calls, and could easily become computationally unwieldy if the high-fidelity hydrogeologic model simulation is time consuming. This study proposes a surrogate-based Bayesian framework to address this notorious issue, and illustrates the methodology by inverse modeling a regional MODFLOW model. The high-fidelity groundwater model is approximated by a fast statistical model using Bagging Multivariate Adaptive Regression Spline (BMARS) algorithm, and hence the MCMC sampling can be efficiently performed. In this study, the MODFLOW model is developed to simulate the groundwater flow in an arid region of Oman consisting of mountain-coast aquifers, and used to run representative simulations to generate training dataset for BMARS model construction. A BMARS-based Sobol' method is also employed to efficiently calculate input parameter sensitivities, which are used to evaluate and rank their importance for the groundwater flow model system. According to sensitivity analysis, insensitive parameters are screened out of Bayesian inversion of the MODFLOW model, further saving computing efforts. The posterior probability distribution of input parameters is efficiently inferred from the prescribed prior distribution using observed head data, demonstrating that the presented BMARS-based Bayesian framework is an efficient tool to reduce parameter uncertainties of a groundwater system.

  13. Analysis of Genetic Diversity and Structure Pattern of Indigofera Pseudotinctoria in Karst Habitats of the Wushan Mountains Using AFLP Markers.

    PubMed

    Fan, Yan; Zhang, Chenglin; Wu, Wendan; He, Wei; Zhang, Li; Ma, Xiao

    2017-10-16

    Indigofera pseudotinctoria Mats is an agronomically and economically important perennial legume shrub with a high forage yield, protein content and strong adaptability, which is subject to natural habitat fragmentation and serious human disturbance. Until now, our knowledge of the genetic relationships and intraspecific genetic diversity for its wild collections is still poor, especially at small spatial scales. Here amplified fragment length polymorphism (AFLP) technology was employed for analysis of genetic diversity, differentiation, and structure of 364 genotypes of I. pseudotinctoria from 15 natural locations in Wushan Montain, a highly structured mountain with typical karst landforms in Southwest China. We also tested whether eco-climate factors has affected genetic structure by correlating genetic diversity with habitat features. A total of 515 distinctly scoreable bands were generated, and 324 of them were polymorphic. The polymorphic information content (PIC) ranged from 0.694 to 0.890 with an average of 0.789 per primer pair. On species level, Nei's gene diversity ( H j ), the Bayesian genetic diversity index ( H B ) and the Shannon information index ( I ) were 0.2465, 0.2363 and 0.3772, respectively. The high differentiation among all sampling sites was detected ( F ST = 0.2217, G ST = 0.1746, G' ST = 0.2060, θ B = 0.1844), and instead, gene flow among accessions ( N m = 1.1819) was restricted. The population genetic structure resolved by the UPGMA tree, principal coordinate analysis, and Bayesian-based cluster analyses irrefutably grouped all accessions into two distinct clusters, i.e., lowland and highland groups. The population genetic structure resolved by the UPGMA tree, principal coordinate analysis, and Bayesian-based cluster analyses irrefutably grouped all accessions into two distinct clusters, i.e., lowland and highland groups. This structure pattern may indicate joint effects by the neutral evolution and natural selection. Restricted N m was observed across all accessions, and genetic barriers were detected between adjacent accessions due to specifically geographical landform.

  14. Model-based Bayesian inference for ROC data analysis

    NASA Astrophysics Data System (ADS)

    Lei, Tianhu; Bae, K. Ty

    2013-03-01

    This paper presents a study of model-based Bayesian inference to Receiver Operating Characteristics (ROC) data. The model is a simple version of general non-linear regression model. Different from Dorfman model, it uses a probit link function with a covariate variable having zero-one two values to express binormal distributions in a single formula. Model also includes a scale parameter. Bayesian inference is implemented by Markov Chain Monte Carlo (MCMC) method carried out by Bayesian analysis Using Gibbs Sampling (BUGS). Contrast to the classical statistical theory, Bayesian approach considers model parameters as random variables characterized by prior distributions. With substantial amount of simulated samples generated by sampling algorithm, posterior distributions of parameters as well as parameters themselves can be accurately estimated. MCMC-based BUGS adopts Adaptive Rejection Sampling (ARS) protocol which requires the probability density function (pdf) which samples are drawing from be log concave with respect to the targeted parameters. Our study corrects a common misconception and proves that pdf of this regression model is log concave with respect to its scale parameter. Therefore, ARS's requirement is satisfied and a Gaussian prior which is conjugate and possesses many analytic and computational advantages is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for MCMC method are assessed by CODA package. Models and methods by using continuous Gaussian prior and discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using MCMC method.

  15. Estimating Tree Height-Diameter Models with the Bayesian Method

    PubMed Central

    Duan, Aiguo; Zhang, Jianguo; Xiang, Congwei

    2014-01-01

    Six candidate height-diameter models were used to analyze the height-diameter relationships. The common methods for estimating the height-diameter models have taken the classical (frequentist) approach based on the frequency interpretation of probability, for example, the nonlinear least squares method (NLS) and the maximum likelihood method (ML). The Bayesian method has an exclusive advantage compared with classical method that the parameters to be estimated are regarded as random variables. In this study, the classical and Bayesian methods were used to estimate six height-diameter models, respectively. Both the classical method and Bayesian method showed that the Weibull model was the “best” model using data1. In addition, based on the Weibull model, data2 was used for comparing Bayesian method with informative priors with uninformative priors and classical method. The results showed that the improvement in prediction accuracy with Bayesian method led to narrower confidence bands of predicted value in comparison to that for the classical method, and the credible bands of parameters with informative priors were also narrower than uninformative priors and classical method. The estimated posterior distributions for parameters can be set as new priors in estimating the parameters using data2. PMID:24711733

  16. Estimating tree height-diameter models with the Bayesian method.

    PubMed

    Zhang, Xiongqing; Duan, Aiguo; Zhang, Jianguo; Xiang, Congwei

    2014-01-01

    Six candidate height-diameter models were used to analyze the height-diameter relationships. The common methods for estimating the height-diameter models have taken the classical (frequentist) approach based on the frequency interpretation of probability, for example, the nonlinear least squares method (NLS) and the maximum likelihood method (ML). The Bayesian method has an exclusive advantage compared with classical method that the parameters to be estimated are regarded as random variables. In this study, the classical and Bayesian methods were used to estimate six height-diameter models, respectively. Both the classical method and Bayesian method showed that the Weibull model was the "best" model using data1. In addition, based on the Weibull model, data2 was used for comparing Bayesian method with informative priors with uninformative priors and classical method. The results showed that the improvement in prediction accuracy with Bayesian method led to narrower confidence bands of predicted value in comparison to that for the classical method, and the credible bands of parameters with informative priors were also narrower than uninformative priors and classical method. The estimated posterior distributions for parameters can be set as new priors in estimating the parameters using data2.

  17. Bayesian informative dropout model for longitudinal binary data with random effects using conditional and joint modeling approaches.

    PubMed

    Chan, Jennifer S K

    2016-05-01

    Dropouts are common in longitudinal study. If the dropout probability depends on the missing observations at or after dropout, this type of dropout is called informative (or nonignorable) dropout (ID). Failure to accommodate such dropout mechanism into the model will bias the parameter estimates. We propose a conditional autoregressive model for longitudinal binary data with an ID model such that the probabilities of positive outcomes as well as the drop-out indicator in each occasion are logit linear in some covariates and outcomes. This model adopting a marginal model for outcomes and a conditional model for dropouts is called a selection model. To allow for the heterogeneity and clustering effects, the outcome model is extended to incorporate mixture and random effects. Lastly, the model is further extended to a novel model that models the outcome and dropout jointly such that their dependency is formulated through an odds ratio function. Parameters are estimated by a Bayesian approach implemented using the user-friendly Bayesian software WinBUGS. A methadone clinic dataset is analyzed to illustrate the proposed models. Result shows that the treatment time effect is still significant but weaker after allowing for an ID process in the data. Finally the effect of drop-out on parameter estimates is evaluated through simulation studies. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. BCM: toolkit for Bayesian analysis of Computational Models using samplers.

    PubMed

    Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A

    2016-10-21

    Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics, however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.

  19. Glaucomatous patterns in Frequency Doubling Technology (FDT) perimetry data identified by unsupervised machine learning classifiers.

    PubMed

    Bowd, Christopher; Weinreb, Robert N; Balasubramanian, Madhusudhanan; Lee, Intae; Jang, Giljin; Yousefi, Siamak; Zangwill, Linda M; Medeiros, Felipe A; Girkin, Christopher A; Liebmann, Jeffrey M; Goldbaum, Michael H

    2014-01-01

    The variational Bayesian independent component analysis-mixture model (VIM), an unsupervised machine-learning classifier, was used to automatically separate Matrix Frequency Doubling Technology (FDT) perimetry data into clusters of healthy and glaucomatous eyes, and to identify axes representing statistically independent patterns of defect in the glaucoma clusters. FDT measurements were obtained from 1,190 eyes with normal FDT results and 786 eyes with abnormal FDT results from the UCSD-based Diagnostic Innovations in Glaucoma Study (DIGS) and African Descent and Glaucoma Evaluation Study (ADAGES). For all eyes, VIM input was 52 threshold test points from the 24-2 test pattern, plus age. FDT mean deviation was -1.00 dB (S.D. = 2.80 dB) and -5.57 dB (S.D. = 5.09 dB) in FDT-normal eyes and FDT-abnormal eyes, respectively (p<0.001). VIM identified meaningful clusters of FDT data and positioned a set of statistically independent axes through the mean of each cluster. The optimal VIM model separated the FDT fields into 3 clusters. Cluster N contained primarily normal fields (1109/1190, specificity 93.1%) and clusters G1 and G2 combined, contained primarily abnormal fields (651/786, sensitivity 82.8%). For clusters G1 and G2 the optimal number of axes were 2 and 5, respectively. Patterns automatically generated along axes within the glaucoma clusters were similar to those known to be indicative of glaucoma. Fields located farther from the normal mean on each glaucoma axis showed increasing field defect severity. VIM successfully separated FDT fields from healthy and glaucoma eyes without a priori information about class membership, and identified familiar glaucomatous patterns of loss.

  20. Sparse Event Modeling with Hierarchical Bayesian Kernel Methods

    DTIC Science & Technology

    2016-01-05

    SECURITY CLASSIFICATION OF: The research objective of this proposal was to develop a predictive Bayesian kernel approach to model count data based on...several predictive variables. Such an approach, which we refer to as the Poisson Bayesian kernel model , is able to model the rate of occurrence of...which adds specificity to the model and can make nonlinear data more manageable. Early results show that the 1. REPORT DATE (DD-MM-YYYY) 4. TITLE

  1. A Bayesian, generalized frailty model for comet assays.

    PubMed

    Ghebretinsae, Aklilu Habteab; Faes, Christel; Molenberghs, Geert; De Boeck, Marlies; Geys, Helena

    2013-05-01

    This paper proposes a flexible modeling approach for so-called comet assay data regularly encountered in preclinical research. While such data consist of non-Gaussian outcomes in a multilevel hierarchical structure, traditional analyses typically completely or partly ignore this hierarchical nature by summarizing measurements within a cluster. Non-Gaussian outcomes are often modeled using exponential family models. This is true not only for binary and count data, but also for, example, time-to-event outcomes. Two important reasons for extending this family are for (1) the possible occurrence of overdispersion, meaning that the variability in the data may not be adequately described by the models, which often exhibit a prescribed mean-variance link, and (2) the accommodation of a hierarchical structure in the data, owing to clustering in the data. The first issue is dealt with through so-called overdispersion models. Clustering is often accommodated through the inclusion of random subject-specific effects. Though not always, one conventionally assumes such random effects to be normally distributed. In the case of time-to-event data, one encounters, for example, the gamma frailty model (Duchateau and Janssen, 2007 ). While both of these issues may occur simultaneously, models combining both are uncommon. Molenberghs et al. ( 2010 ) proposed a broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects. Here, we use this method to model data from a comet assay with a three-level hierarchical structure. Although a conjugate gamma random effect is used for the overdispersion random effect, both gamma and normal random effects are considered for the hierarchical random effect. Apart from model formulation, we place emphasis on Bayesian estimation. Our proposed method has an upper hand over the traditional analysis in that it (1) uses the appropriate distribution stipulated in the literature; (2) deals with the complete hierarchical nature; and (3) uses all information instead of summary measures. The fit of the model to the comet assay is compared against the background of more conventional model fits. Results indicate the toxicity of 1,2-dimethylhydrazine dihydrochloride at different dose levels (low, medium, and high).

  2. MC 2: A Deeper Look at ZwCl 2341.1+0000 with Bayesian Galaxy Clustering and Weak Lensing Analyses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Benson, B.; Wittman, D. M.; Golovich, N.

    ZwCl 2341.1+0000, a merging galaxy cluster with disturbed X-ray morphology and widely separated (~3 Mpc) double radio relics, was thought to be an extremely massive (10 - 30 X 10 14M⊙) and complex system with little known about its merger history. We present JVLA 2-4 GHz observations of the cluster, along with new spectroscopy from our Keck/DEIMOS survey, and apply Gaussian Mixture Modeling to the three-dimensional distribution of 227 con rmed cluster galaxies. After adopting the Bayesian Information Criterion to avoid over tting, which we discover can bias total dynamical mass estimates high, we nd that a three-substructure model withmore » a total dynamical mass estimate of 9:39 ± 0:81 X 10 14M⊙ is favored. We also present deep Subaru imaging and perform the rst weak lensing analysis on this system, obtaining a weak lensing mass estimate of 5:57±2:47X10 14M⊙. This is a more robust estimate because it does not depend on the dynamical state of the system, which is disturbed due to the merger. Our results indicate that ZwCl 2341.1+0000 is a multiple merger system comprised of at least three substructures, with the main merger that produced the radio relics occurring near to the plane of the sky, and a younger merger in the North occurring closer to the line of sight. Dynamical modeling of the main merger reproduces observed quantities (relic positions and polarizations, subcluster separation and radial velocity difference), if the merger axis angle of ~10 +34 -6 degrees and the collision speed at pericenter is ~1900 +300 -200 km/s.« less

  3. MC 2: A Deeper Look at ZwCl 2341.1+0000 with Bayesian Galaxy Clustering and Weak Lensing Analyses

    DOE PAGES

    Benson, B.; Wittman, D. M.; Golovich, N.; ...

    2017-05-16

    ZwCl 2341.1+0000, a merging galaxy cluster with disturbed X-ray morphology and widely separated (~3 Mpc) double radio relics, was thought to be an extremely massive (10 - 30 X 10 14M⊙) and complex system with little known about its merger history. We present JVLA 2-4 GHz observations of the cluster, along with new spectroscopy from our Keck/DEIMOS survey, and apply Gaussian Mixture Modeling to the three-dimensional distribution of 227 con rmed cluster galaxies. After adopting the Bayesian Information Criterion to avoid over tting, which we discover can bias total dynamical mass estimates high, we nd that a three-substructure model withmore » a total dynamical mass estimate of 9:39 ± 0:81 X 10 14M⊙ is favored. We also present deep Subaru imaging and perform the rst weak lensing analysis on this system, obtaining a weak lensing mass estimate of 5:57±2:47X10 14M⊙. This is a more robust estimate because it does not depend on the dynamical state of the system, which is disturbed due to the merger. Our results indicate that ZwCl 2341.1+0000 is a multiple merger system comprised of at least three substructures, with the main merger that produced the radio relics occurring near to the plane of the sky, and a younger merger in the North occurring closer to the line of sight. Dynamical modeling of the main merger reproduces observed quantities (relic positions and polarizations, subcluster separation and radial velocity difference), if the merger axis angle of ~10 +34 -6 degrees and the collision speed at pericenter is ~1900 +300 -200 km/s.« less

  4. Exploring the Structure of Spatial Representations

    PubMed Central

    Madl, Tamas; Franklin, Stan; Chen, Ke; Trappl, Robert; Montaldi, Daniela

    2016-01-01

    It has been suggested that the map-like representations that support human spatial memory are fragmented into sub-maps with local reference frames, rather than being unitary and global. However, the principles underlying the structure of these ‘cognitive maps’ are not well understood. We propose that the structure of the representations of navigation space arises from clustering within individual psychological spaces, i.e. from a process that groups together objects that are close in these spaces. Building on the ideas of representational geometry and similarity-based representations in cognitive science, we formulate methods for learning dissimilarity functions (metrics) characterizing participants’ psychological spaces. We show that these learned metrics, together with a probabilistic model of clustering based on the Bayesian cognition paradigm, allow prediction of participants’ cognitive map structures in advance. Apart from insights into spatial representation learning in human cognition, these methods could facilitate novel computational tools capable of using human-like spatial concepts. We also compare several features influencing spatial memory structure, including spatial distance, visual similarity and functional similarity, and report strong correlations between these dimensions and the grouping probability in participants’ spatial representations, providing further support for clustering in spatial memory. PMID:27347681

  5. Effect of Bayesian Student Modeling on Academic Achievement in Foreign Language Teaching (University Level English Preparatory School Example)

    ERIC Educational Resources Information Center

    Aslan, Burak Galip; Öztürk, Özlem; Inceoglu, Mustafa Murat

    2014-01-01

    Considering the increasing importance of adaptive approaches in CALL systems, this study implemented a machine learning based student modeling middleware with Bayesian networks. The profiling approach of the student modeling system is based on Felder and Silverman's Learning Styles Model and Felder and Soloman's Index of Learning Styles…

  6. Development and comparison in uncertainty assessment based Bayesian modularization method in hydrological modeling

    NASA Astrophysics Data System (ADS)

    Li, Lu; Xu, Chong-Yu; Engeland, Kolbjørn

    2013-04-01

    SummaryWith respect to model calibration, parameter estimation and analysis of uncertainty sources, various regression and probabilistic approaches are used in hydrological modeling. A family of Bayesian methods, which incorporates different sources of information into a single analysis through Bayes' theorem, is widely used for uncertainty assessment. However, none of these approaches can well treat the impact of high flows in hydrological modeling. This study proposes a Bayesian modularization uncertainty assessment approach in which the highest streamflow observations are treated as suspect information that should not influence the inference of the main bulk of the model parameters. This study includes a comprehensive comparison and evaluation of uncertainty assessments by our new Bayesian modularization method and standard Bayesian methods using the Metropolis-Hastings (MH) algorithm with the daily hydrological model WASMOD. Three likelihood functions were used in combination with standard Bayesian method: the AR(1) plus Normal model independent of time (Model 1), the AR(1) plus Normal model dependent on time (Model 2) and the AR(1) plus Multi-normal model (Model 3). The results reveal that the Bayesian modularization method provides the most accurate streamflow estimates measured by the Nash-Sutcliffe efficiency and provide the best in uncertainty estimates for low, medium and entire flows compared to standard Bayesian methods. The study thus provides a new approach for reducing the impact of high flows on the discharge uncertainty assessment of hydrological models via Bayesian method.

  7. Fragment virtual screening based on Bayesian categorization for discovering novel VEGFR-2 scaffolds.

    PubMed

    Zhang, Yanmin; Jiao, Yu; Xiong, Xiao; Liu, Haichun; Ran, Ting; Xu, Jinxing; Lu, Shuai; Xu, Anyang; Pan, Jing; Qiao, Xin; Shi, Zhihao; Lu, Tao; Chen, Yadong

    2015-11-01

    The discovery of novel scaffolds against a specific target has long been one of the most significant but challengeable goals in discovering lead compounds. A scaffold that binds in important regions of the active pocket is more favorable as a starting point because scaffolds generally possess greater optimization possibilities. However, due to the lack of sufficient chemical space diversity of the databases and the ineffectiveness of the screening methods, it still remains a great challenge to discover novel active scaffolds. Since the strengths and weaknesses of both fragment-based drug design and traditional virtual screening (VS), we proposed a fragment VS concept based on Bayesian categorization for the discovery of novel scaffolds. This work investigated the proposal through an application on VEGFR-2 target. Firstly, scaffold and structural diversity of chemical space for 10 compound databases were explicitly evaluated. Simultaneously, a robust Bayesian classification model was constructed for screening not only compound databases but also their corresponding fragment databases. Although analysis of the scaffold diversity demonstrated a very unevenly distribution of scaffolds over molecules, results showed that our Bayesian model behaved better in screening fragments than molecules. Through a literature retrospective research, several generated fragments with relatively high Bayesian scores indeed exhibit VEGFR-2 biological activity, which strongly proved the effectiveness of fragment VS based on Bayesian categorization models. This investigation of Bayesian-based fragment VS can further emphasize the necessity for enrichment of compound databases employed in lead discovery by amplifying the diversity of databases with novel structures.

  8. Species delimitation in the Stenocereus griseus (Cactaceae) species complex reveals a new species, S. huastecorum

    PubMed Central

    Alvarado-Sizzo, Hernán; Parra, Fabiola; Arreola-Nava, Hilda Julieta; Terrazas, Teresa; Sánchez, Cristian

    2018-01-01

    The Stenocereus griseus species complex (SGSC) has long been considered taxonomically challenging because the number of taxa belonging to the complex and their geographical boundaries remain poorly understood. Bayesian clustering and genetic distance-based methods were used based on nine microsatellite loci in 377 individuals of three main putative species of the complex. The resulting genetic clusters were assessed for ecological niche divergence and areolar morphology, particularly spination patterns. We based our species boundaries on concordance between genetic, ecological, and morphological data, and were able to resolve four species, three of them corresponding to S. pruinosus from central Mexico, S. laevigatus from southern Mexico, and S. griseus from northern South America. A fourth species, previously considered to be S. griseus and commonly misidentified as S. pruinosus in northern Mexico showed significant genetic, ecological, and morphological differentiation suggesting that it should be considered a new species, S. huastecorum, which we describe here. We show that population genetic analyses, ecological niche modeling, and morphological studies are complementary approaches for delimiting species in taxonomically challenging plant groups such as the SGSC. PMID:29342184

  9. Species delimitation in the Stenocereus griseus (Cactaceae) species complex reveals a new species, S. huastecorum.

    PubMed

    Alvarado-Sizzo, Hernán; Casas, Alejandro; Parra, Fabiola; Arreola-Nava, Hilda Julieta; Terrazas, Teresa; Sánchez, Cristian

    2018-01-01

    The Stenocereus griseus species complex (SGSC) has long been considered taxonomically challenging because the number of taxa belonging to the complex and their geographical boundaries remain poorly understood. Bayesian clustering and genetic distance-based methods were used based on nine microsatellite loci in 377 individuals of three main putative species of the complex. The resulting genetic clusters were assessed for ecological niche divergence and areolar morphology, particularly spination patterns. We based our species boundaries on concordance between genetic, ecological, and morphological data, and were able to resolve four species, three of them corresponding to S. pruinosus from central Mexico, S. laevigatus from southern Mexico, and S. griseus from northern South America. A fourth species, previously considered to be S. griseus and commonly misidentified as S. pruinosus in northern Mexico showed significant genetic, ecological, and morphological differentiation suggesting that it should be considered a new species, S. huastecorum, which we describe here. We show that population genetic analyses, ecological niche modeling, and morphological studies are complementary approaches for delimiting species in taxonomically challenging plant groups such as the SGSC.

  10. Uncertainty aggregation and reduction in structure-material performance prediction

    NASA Astrophysics Data System (ADS)

    Hu, Zhen; Mahadevan, Sankaran; Ao, Dan

    2018-02-01

    An uncertainty aggregation and reduction framework is presented for structure-material performance prediction. Different types of uncertainty sources, structural analysis model, and material performance prediction model are connected through a Bayesian network for systematic uncertainty aggregation analysis. To reduce the uncertainty in the computational structure-material performance prediction model, Bayesian updating using experimental observation data is investigated based on the Bayesian network. It is observed that the Bayesian updating results will have large error if the model cannot accurately represent the actual physics, and that this error will be propagated to the predicted performance distribution. To address this issue, this paper proposes a novel uncertainty reduction method by integrating Bayesian calibration with model validation adaptively. The observation domain of the quantity of interest is first discretized into multiple segments. An adaptive algorithm is then developed to perform model validation and Bayesian updating over these observation segments sequentially. Only information from observation segments where the model prediction is highly reliable is used for Bayesian updating; this is found to increase the effectiveness and efficiency of uncertainty reduction. A composite rotorcraft hub component fatigue life prediction model, which combines a finite element structural analysis model and a material damage model, is used to demonstrate the proposed method.

  11. Hierarchical Bayesian Modeling of Fluid-Induced Seismicity

    NASA Astrophysics Data System (ADS)

    Broccardo, M.; Mignan, A.; Wiemer, S.; Stojadinovic, B.; Giardini, D.

    2017-11-01

    In this study, we present a Bayesian hierarchical framework to model fluid-induced seismicity. The framework is based on a nonhomogeneous Poisson process with a fluid-induced seismicity rate proportional to the rate of injected fluid. The fluid-induced seismicity rate model depends upon a set of physically meaningful parameters and has been validated for six fluid-induced case studies. In line with the vision of hierarchical Bayesian modeling, the rate parameters are considered as random variables. We develop both the Bayesian inference and updating rules, which are used to develop a probabilistic forecasting model. We tested the Basel 2006 fluid-induced seismic case study to prove that the hierarchical Bayesian model offers a suitable framework to coherently encode both epistemic uncertainty and aleatory variability. Moreover, it provides a robust and consistent short-term seismic forecasting model suitable for online risk quantification and mitigation.

  12. Classifying emotion in Twitter using Bayesian network

    NASA Astrophysics Data System (ADS)

    Surya Asriadie, Muhammad; Syahrul Mubarok, Mohamad; Adiwijaya

    2018-03-01

    Language is used to express not only facts, but also emotions. Emotions are noticeable from behavior up to the social media statuses written by a person. Analysis of emotions in a text is done in a variety of media such as Twitter. This paper studies classification of emotions on twitter using Bayesian network because of its ability to model uncertainty and relationships between features. The result is two models based on Bayesian network which are Full Bayesian Network (FBN) and Bayesian Network with Mood Indicator (BNM). FBN is a massive Bayesian network where each word is treated as a node. The study shows the method used to train FBN is not very effective to create the best model and performs worse compared to Naive Bayes. F1-score for FBN is 53.71%, while for Naive Bayes is 54.07%. BNM is proposed as an alternative method which is based on the improvement of Multinomial Naive Bayes and has much lower computational complexity compared to FBN. Even though it’s not better compared to FBN, the resulting model successfully improves the performance of Multinomial Naive Bayes. F1-Score for Multinomial Naive Bayes model is 51.49%, while for BNM is 52.14%.

  13. Assessment of Matrix Multiplication Learning with a Rule-Based Analytical Model--"A Bayesian Network Representation"

    ERIC Educational Resources Information Center

    Zhang, Zhidong

    2016-01-01

    This study explored an alternative assessment procedure to examine learning trajectories of matrix multiplication. It took rule-based analytical and cognitive task analysis methods specifically to break down operation rules for a given matrix multiplication. Based on the analysis results, a hierarchical Bayesian network, an assessment model,…

  14. A Gibbs sampler for Bayesian analysis of site-occupancy data

    USGS Publications Warehouse

    Dorazio, Robert M.; Rodriguez, Daniel Taylor

    2012-01-01

    1. A Bayesian analysis of site-occupancy data containing covariates of species occurrence and species detection probabilities is usually completed using Markov chain Monte Carlo methods in conjunction with software programs that can implement those methods for any statistical model, not just site-occupancy models. Although these software programs are quite flexible, considerable experience is often required to specify a model and to initialize the Markov chain so that summaries of the posterior distribution can be estimated efficiently and accurately. 2. As an alternative to these programs, we develop a Gibbs sampler for Bayesian analysis of site-occupancy data that include covariates of species occurrence and species detection probabilities. This Gibbs sampler is based on a class of site-occupancy models in which probabilities of species occurrence and detection are specified as probit-regression functions of site- and survey-specific covariate measurements. 3. To illustrate the Gibbs sampler, we analyse site-occupancy data of the blue hawker, Aeshna cyanea (Odonata, Aeshnidae), a common dragonfly species in Switzerland. Our analysis includes a comparison of results based on Bayesian and classical (non-Bayesian) methods of inference. We also provide code (based on the R software program) for conducting Bayesian and classical analyses of site-occupancy data.

  15. Bayesian Mass Estimates of the Milky Way: Including Measurement Uncertainties with Hierarchical Bayes

    NASA Astrophysics Data System (ADS)

    Eadie, Gwendolyn M.; Springford, Aaron; Harris, William E.

    2017-02-01

    We present a hierarchical Bayesian method for estimating the total mass and mass profile of the Milky Way Galaxy. The new hierarchical Bayesian approach further improves the framework presented by Eadie et al. and Eadie and Harris and builds upon the preliminary reports by Eadie et al. The method uses a distribution function f({ E },L) to model the Galaxy and kinematic data from satellite objects, such as globular clusters (GCs), to trace the Galaxy’s gravitational potential. A major advantage of the method is that it not only includes complete and incomplete data simultaneously in the analysis, but also incorporates measurement uncertainties in a coherent and meaningful way. We first test the hierarchical Bayesian framework, which includes measurement uncertainties, using the same data and power-law model assumed in Eadie and Harris and find the results are similar but more strongly constrained. Next, we take advantage of the new statistical framework and incorporate all possible GC data, finding a cumulative mass profile with Bayesian credible regions. This profile implies a mass within 125 kpc of 4.8× {10}11{M}⊙ with a 95% Bayesian credible region of (4.0{--}5.8)× {10}11{M}⊙ . Our results also provide estimates of the true specific energies of all the GCs. By comparing these estimated energies to the measured energies of GCs with complete velocity measurements, we observe that (the few) remote tracers with complete measurements may play a large role in determining a total mass estimate of the Galaxy. Thus, our study stresses the need for more remote tracers with complete velocity measurements.

  16. Development of dynamic Bayesian models for web application test management

    NASA Astrophysics Data System (ADS)

    Azarnova, T. V.; Polukhin, P. V.; Bondarenko, Yu V.; Kashirina, I. L.

    2018-03-01

    The mathematical apparatus of dynamic Bayesian networks is an effective and technically proven tool that can be used to model complex stochastic dynamic processes. According to the results of the research, mathematical models and methods of dynamic Bayesian networks provide a high coverage of stochastic tasks associated with error testing in multiuser software products operated in a dynamically changing environment. Formalized representation of the discrete test process as a dynamic Bayesian model allows us to organize the logical connection between individual test assets for multiple time slices. This approach gives an opportunity to present testing as a discrete process with set structural components responsible for the generation of test assets. Dynamic Bayesian network-based models allow us to combine in one management area individual units and testing components with different functionalities and a direct influence on each other in the process of comprehensive testing of various groups of computer bugs. The application of the proposed models provides an opportunity to use a consistent approach to formalize test principles and procedures, methods used to treat situational error signs, and methods used to produce analytical conclusions based on test results.

  17. Population Genetic Structure of the Tropical Two-Wing Flyingfish (Exocoetus volitans)

    PubMed Central

    Lewallen, Eric A.; Bohonak, Andrew J.; Bonin, Carolina A.; van Wijnen, Andre J.; Pitman, Robert L.; Lovejoy, Nathan R.

    2016-01-01

    Delineating populations of pantropical marine fish is a difficult process, due to widespread geographic ranges and complex life history traits in most species. Exocoetus volitans, a species of two-winged flyingfish, is a good model for understanding large-scale patterns of epipelagic fish population structure because it has a circumtropical geographic range and completes its entire life cycle in the epipelagic zone. Buoyant pelagic eggs should dictate high local dispersal capacity in this species, although a brief larval phase, small body size, and short lifespan may limit the dispersal of individuals over large spatial scales. Based on these biological features, we hypothesized that E. volitans would exhibit statistically and biologically significant population structure defined by recognized oceanographic barriers. We tested this hypothesis by analyzing cytochrome b mtDNA sequence data (1106 bps) from specimens collected in the Pacific, Atlantic and Indian oceans (n = 266). AMOVA, Bayesian, and coalescent analytical approaches were used to assess and interpret population-level genetic variability. A parsimony-based haplotype network did not reveal population subdivision among ocean basins, but AMOVA revealed limited, statistically significant population structure between the Pacific and Atlantic Oceans (ΦST = 0.035, p<0.001). A spatially-unbiased Bayesian approach identified two circumtropical population clusters north and south of the Equator (ΦST = 0.026, p<0.001), a previously unknown dispersal barrier for an epipelagic fish. Bayesian demographic modeling suggested the effective population size of this species increased by at least an order of magnitude ~150,000 years ago, to more than 1 billion individuals currently. Thus, high levels of genetic similarity observed in E. volitans can be explained by high rates of gene flow, a dramatic and recent population expansion, as well as extensive and consistent dispersal throughout the geographic range of the species. PMID:27736863

  18. Population Genetic Structure of the Tropical Two-Wing Flyingfish (Exocoetus volitans).

    PubMed

    Lewallen, Eric A; Bohonak, Andrew J; Bonin, Carolina A; van Wijnen, Andre J; Pitman, Robert L; Lovejoy, Nathan R

    2016-01-01

    Delineating populations of pantropical marine fish is a difficult process, due to widespread geographic ranges and complex life history traits in most species. Exocoetus volitans, a species of two-winged flyingfish, is a good model for understanding large-scale patterns of epipelagic fish population structure because it has a circumtropical geographic range and completes its entire life cycle in the epipelagic zone. Buoyant pelagic eggs should dictate high local dispersal capacity in this species, although a brief larval phase, small body size, and short lifespan may limit the dispersal of individuals over large spatial scales. Based on these biological features, we hypothesized that E. volitans would exhibit statistically and biologically significant population structure defined by recognized oceanographic barriers. We tested this hypothesis by analyzing cytochrome b mtDNA sequence data (1106 bps) from specimens collected in the Pacific, Atlantic and Indian oceans (n = 266). AMOVA, Bayesian, and coalescent analytical approaches were used to assess and interpret population-level genetic variability. A parsimony-based haplotype network did not reveal population subdivision among ocean basins, but AMOVA revealed limited, statistically significant population structure between the Pacific and Atlantic Oceans (ΦST = 0.035, p<0.001). A spatially-unbiased Bayesian approach identified two circumtropical population clusters north and south of the Equator (ΦST = 0.026, p<0.001), a previously unknown dispersal barrier for an epipelagic fish. Bayesian demographic modeling suggested the effective population size of this species increased by at least an order of magnitude ~150,000 years ago, to more than 1 billion individuals currently. Thus, high levels of genetic similarity observed in E. volitans can be explained by high rates of gene flow, a dramatic and recent population expansion, as well as extensive and consistent dispersal throughout the geographic range of the species.

  19. CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data.

    PubMed

    Fidaner, Işık Barış; Cankorur-Cetinkaya, Ayca; Dikicioglu, Duygu; Kirdar, Betul; Cemgil, Ali Taylan; Oliver, Stephen G

    2016-02-01

    Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets. We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. This algorithm involves two components of operation. Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences) and Component 2, which applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology. It applies multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface. The latter has been developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications. The C++ and QT source codes, the GUI applications for Windows, OS X and Linux operating systems and user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG. sgo24@cam.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  20. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    PubMed Central

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

    We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires much fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872

  1. Construction of monitoring model and algorithm design on passenger security during shipping based on improved Bayesian network.

    PubMed

    Wang, Jiali; Zhang, Qingnian; Ji, Wenfeng

    2014-01-01

    A large number of data is needed by the computation of the objective Bayesian network, but the data is hard to get in actual computation. The calculation method of Bayesian network was improved in this paper, and the fuzzy-precise Bayesian network was obtained. Then, the fuzzy-precise Bayesian network was used to reason Bayesian network model when the data is limited. The security of passengers during shipping is affected by various factors, and it is hard to predict and control. The index system that has the impact on the passenger safety during shipping was established on basis of the multifield coupling theory in this paper. Meanwhile, the fuzzy-precise Bayesian network was applied to monitor the security of passengers in the shipping process. The model was applied to monitor the passenger safety during shipping of a shipping company in Hainan, and the effectiveness of this model was examined. This research work provides guidance for guaranteeing security of passengers during shipping.

  2. Construction of Monitoring Model and Algorithm Design on Passenger Security during Shipping Based on Improved Bayesian Network

    PubMed Central

    Wang, Jiali; Zhang, Qingnian; Ji, Wenfeng

    2014-01-01

    A large number of data is needed by the computation of the objective Bayesian network, but the data is hard to get in actual computation. The calculation method of Bayesian network was improved in this paper, and the fuzzy-precise Bayesian network was obtained. Then, the fuzzy-precise Bayesian network was used to reason Bayesian network model when the data is limited. The security of passengers during shipping is affected by various factors, and it is hard to predict and control. The index system that has the impact on the passenger safety during shipping was established on basis of the multifield coupling theory in this paper. Meanwhile, the fuzzy-precise Bayesian network was applied to monitor the security of passengers in the shipping process. The model was applied to monitor the passenger safety during shipping of a shipping company in Hainan, and the effectiveness of this model was examined. This research work provides guidance for guaranteeing security of passengers during shipping. PMID:25254227

  3. Using a web-based application to define the accuracy of diagnostic tests when the gold standard is imperfect.

    PubMed

    Lim, Cherry; Wannapinij, Prapass; White, Lisa; Day, Nicholas P J; Cooper, Ben S; Peacock, Sharon J; Limmathurotsakul, Direk

    2013-01-01

    Estimates of the sensitivity and specificity for new diagnostic tests based on evaluation against a known gold standard are imprecise when the accuracy of the gold standard is imperfect. Bayesian latent class models (LCMs) can be helpful under these circumstances, but the necessary analysis requires expertise in computational programming. Here, we describe open-access web-based applications that allow non-experts to apply Bayesian LCMs to their own data sets via a user-friendly interface. Applications for Bayesian LCMs were constructed on a web server using R and WinBUGS programs. The models provided (http://mice.tropmedres.ac) include two Bayesian LCMs: the two-tests in two-population model (Hui and Walter model) and the three-tests in one-population model (Walter and Irwig model). Both models are available with simplified and advanced interfaces. In the former, all settings for Bayesian statistics are fixed as defaults. Users input their data set into a table provided on the webpage. Disease prevalence and accuracy of diagnostic tests are then estimated using the Bayesian LCM, and provided on the web page within a few minutes. With the advanced interfaces, experienced researchers can modify all settings in the models as needed. These settings include correlation among diagnostic test results and prior distributions for all unknown parameters. The web pages provide worked examples with both models using the original data sets presented by Hui and Walter in 1980, and by Walter and Irwig in 1988. We also illustrate the utility of the advanced interface using the Walter and Irwig model on a data set from a recent melioidosis study. The results obtained from the web-based applications were comparable to those published previously. The newly developed web-based applications are open-access and provide an important new resource for researchers worldwide to evaluate new diagnostic tests.

  4. Social deprivation and population density are not associated with small area risk of amyotrophic lateral sclerosis.

    PubMed

    Rooney, James P K; Tobin, Katy; Crampsie, Arlene; Vajda, Alice; Heverin, Mark; McLaughlin, Russell; Staines, Anthony; Hardiman, Orla

    2015-10-01

    Evidence of an association between areal ALS risk and population density has been previously reported. We aim to examine ALS spatial incidence in Ireland using small areas, to compare this analysis with our previous analysis of larger areas and to examine the associations between population density, social deprivation and ALS incidence. Residential area social deprivation has not been previously investigated as a risk factor for ALS. Using the Irish ALS register, we included all cases of ALS diagnosed in Ireland from 1995-2013. 2006 census data was used to calculate age and sex standardised expected cases per small area. Social deprivation was assessed using the pobalHP deprivation index. Bayesian smoothing was used to calculate small area relative risk for ALS, whilst cluster analysis was performed using SaTScan. The effects of population density and social deprivation were tested in two ways: (1) as covariates in the Bayesian spatial model; (2) via post-Bayesian regression. 1701 cases were included. Bayesian smoothed maps of relative risk at small area resolution matched closely to our previous analysis at a larger area resolution. Cluster analysis identified two areas of significant low risk. These areas did not correlate with population density or social deprivation indices. Two areas showing low frequency of ALS have been identified in the Republic of Ireland. These areas do not correlate with population density or residential area social deprivation, indicating that other reasons, such as genetic admixture may account for the observed findings. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm.

    PubMed

    Raykov, Yordan P; Boukouvalas, Alexis; Baig, Fahd; Little, Max A

    The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.

  6. What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm

    PubMed Central

    Baig, Fahd; Little, Max A.

    2016-01-01

    The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism. PMID:27669525

  7. Estimation of parameter uncertainty for an activated sludge model using Bayesian inference: a comparison with the frequentist method.

    PubMed

    Zonta, Zivko J; Flotats, Xavier; Magrí, Albert

    2014-08-01

    The procedure commonly used for the assessment of the parameters included in activated sludge models (ASMs) relies on the estimation of their optimal value within a confidence region (i.e. frequentist inference). Once optimal values are estimated, parameter uncertainty is computed through the covariance matrix. However, alternative approaches based on the consideration of the model parameters as probability distributions (i.e. Bayesian inference), may be of interest. The aim of this work is to apply (and compare) both Bayesian and frequentist inference methods when assessing uncertainty for an ASM-type model, which considers intracellular storage and biomass growth, simultaneously. Practical identifiability was addressed exclusively considering respirometric profiles based on the oxygen uptake rate and with the aid of probabilistic global sensitivity analysis. Parameter uncertainty was thus estimated according to both the Bayesian and frequentist inferential procedures. Results were compared in order to evidence the strengths and weaknesses of both approaches. Since it was demonstrated that Bayesian inference could be reduced to a frequentist approach under particular hypotheses, the former can be considered as a more generalist methodology. Hence, the use of Bayesian inference is encouraged for tackling inferential issues in ASM environments.

  8. Understanding the Scalability of Bayesian Network Inference using Clique Tree Growth Curves

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole Jakob

    2009-01-01

    Bayesian networks (BNs) are used to represent and efficiently compute with multi-variate probability distributions in a wide range of disciplines. One of the main approaches to perform computation in BNs is clique tree clustering and propagation. In this approach, BN computation consists of propagation in a clique tree compiled from a Bayesian network. There is a lack of understanding of how clique tree computation time, and BN computation time in more general, depends on variations in BN size and structure. On the one hand, complexity results tell us that many interesting BN queries are NP-hard or worse to answer, and it is not hard to find application BNs where the clique tree approach in practice cannot be used. On the other hand, it is well-known that tree-structured BNs can be used to answer probabilistic queries in polynomial time. In this article, we develop an approach to characterizing clique tree growth as a function of parameters that can be computed in polynomial time from BNs, specifically: (i) the ratio of the number of a BN's non-root nodes to the number of root nodes, or (ii) the expected number of moral edges in their moral graphs. Our approach is based on combining analytical and experimental results. Analytically, we partition the set of cliques in a clique tree into different sets, and introduce a growth curve for each set. For the special case of bipartite BNs, we consequently have two growth curves, a mixed clique growth curve and a root clique growth curve. In experiments, we systematically increase the degree of the root nodes in bipartite Bayesian networks, and find that root clique growth is well-approximated by Gompertz growth curves. It is believed that this research improves the understanding of the scaling behavior of clique tree clustering, provides a foundation for benchmarking and developing improved BN inference and machine learning algorithms, and presents an aid for analytical trade-off studies of clique tree clustering using growth curves.

  9. Bayesian networks for maritime traffic accident prevention: benefits and challenges.

    PubMed

    Hänninen, Maria

    2014-12-01

    Bayesian networks are quantitative modeling tools whose applications to the maritime traffic safety context are becoming more popular. This paper discusses the utilization of Bayesian networks in maritime safety modeling. Based on literature and the author's own experiences, the paper studies what Bayesian networks can offer to maritime accident prevention and safety modeling and discusses a few challenges in their application to this context. It is argued that the capability of representing rather complex, not necessarily causal but uncertain relationships makes Bayesian networks an attractive modeling tool for the maritime safety and accidents. Furthermore, as the maritime accident and safety data is still rather scarce and has some quality problems, the possibility to combine data with expert knowledge and the easy way of updating the model after acquiring more evidence further enhance their feasibility. However, eliciting the probabilities from the maritime experts might be challenging and the model validation can be tricky. It is concluded that with the utilization of several data sources, Bayesian updating, dynamic modeling, and hidden nodes for latent variables, Bayesian networks are rather well-suited tools for the maritime safety management and decision-making. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Bayesian estimation of differential transcript usage from RNA-seq data.

    PubMed

    Papastamoulis, Panagiotis; Rattray, Magnus

    2017-11-27

    Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU) and targets changes in the relative within gene expression of a transcript. The contribution of this paper is to: (a) extend the use of cjBitSeq to the DTU context, a previously introduced Bayesian model which is originally designed for identifying changes in overall expression levels and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read based model and performs fully Bayesian inference by MCMC sampling on the space of latent state of each transcript per gene. BayesDRIMSeq is a count based model and estimates the Bayes Factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance with DRIMSeq in terms of precision/recall but offer better calibration of False Discovery Rate.

  11. Spatial analysis of county-based gonorrhoea incidence in mainland China, from 2004 to 2009.

    PubMed

    Yin, Fei; Feng, Zijian; Li, Xiaosong

    2012-07-01

    Gonorrhoea is one of the most common sexually transmissible infections in mainland China. Effective spatial monitoring of gonorrhoea incidence is important for successful implementation of control and prevention programs. The county-level gonorrhoea incidence rates for all of mainland China was monitored through examining spatial patterns. County-level data on gonorrhoea cases between 2004 and 2009 were obtained from the China Information System for Disease Control and Prevention. Bayesian smoothing and exploratory spatial data analysis (ESDA) methods were used to characterise the spatial distribution pattern of gonorrhoea cases. During the 6-year study period, the average annual gonorrhoea incidence was 12.41 cases per 100000 people. Using empirical Bayes smoothed rates, the local Moran test identified one significant single-centre cluster and two significant multi-centre clusters of high gonorrhoea risk (all P-values <0.01). Bayesian smoothing and ESDA methods can assist public health officials in using gonorrhoea surveillance data to identify high risk areas. Allocating more resources to such areas could effectively reduce gonorrhoea incidence.

  12. Matched Filter Stochastic Background Characterization for Hyperspectral Target Detection

    DTIC Science & Technology

    2005-09-30

    and Pre- Clustering MVN Test.....................126 4.2.3 Pre- Clustering Detection Results.................................................130...4.2.4 Pre- Clustering Target Influence..................................................134 4.2.5 Statistical Distance Exclusion and Low Contrast...al, 2001] Figure 2.7 ROC Curve Comparison of RX, K-Means, and Bayesian Pre- Clustering Applied to Anomaly Detection [Ashton, 1998] Figure 2.8 ROC

  13. Semisupervised learning using Bayesian interpretation: application to LS-SVM.

    PubMed

    Adankon, Mathias M; Cheriet, Mohamed; Biem, Alain

    2011-04-01

    Bayesian reasoning provides an ideal basis for representing and manipulating uncertain knowledge, with the result that many interesting algorithms in machine learning are based on Bayesian inference. In this paper, we use the Bayesian approach with one and two levels of inference to model the semisupervised learning problem and give its application to the successful kernel classifier support vector machine (SVM) and its variant least-squares SVM (LS-SVM). Taking advantage of Bayesian interpretation of LS-SVM, we develop a semisupervised learning algorithm for Bayesian LS-SVM using our approach based on two levels of inference. Experimental results on both artificial and real pattern recognition problems show the utility of our method.

  14. Geographic Variation of Amyotrophic Lateral Sclerosis Incidence in New Jersey, 2009–2011

    PubMed Central

    Henry, Kevin A.; Fagliano, Jerald; Jordan, Heather M.; Rechtman, Lindsay; Kaye, Wendy E.

    2015-01-01

    Few analyses in the United States have examined geographic variation and socioeconomic disparities in amyotrophic lateral sclerosis (ALS) incidence, because of lack of population-based incidence data. In this analysis, we used population-based ALS data to identify whether ALS incidence clusters geographically and to determine whether ALS risk varies by area-based socioeconomic status (SES). This study included 493 incident ALS cases diagnosed (via El Escorial criteria) in New Jersey between 2009 and 2011. Geographic variation and clustering of ALS incidence was assessed using a spatial scan statistic and Bayesian geoadditive models. Poisson regression was used to estimate the associations between ALS risk and SES based on census-tract median income while controlling for age, sex, and race. ALS incidence varied across and within counties, but there were no statistically significant geographic clusters. SES was associated with ALS incidence. After adjustment for age, sex, and race, the relative risk of ALS was significantly higher (relative risk (RR) = 1.37, 95% confidence interval (CI): 1.02, 1.82) in the highest income quartile than in the lowest. The relative risk of ALS was significantly lower among blacks (RR = 0.57, 95% CI: 0.39, 0.83) and Asians (RR = 0.63, 95% CI: 0.41, 0.97) than among whites. Our findings suggest that ALS incidence in New Jersey appears to be associated with SES and race. PMID:26041711

  15. Resident Space Object Characterization and Behavior Understanding via Machine Learning and Ontology-based Bayesian Networks

    NASA Astrophysics Data System (ADS)

    Furfaro, R.; Linares, R.; Gaylor, D.; Jah, M.; Walls, R.

    2016-09-01

    In this paper, we present an end-to-end approach that employs machine learning techniques and Ontology-based Bayesian Networks (BN) to characterize the behavior of resident space objects. State-of-the-Art machine learning architectures (e.g. Extreme Learning Machines, Convolutional Deep Networks) are trained on physical models to learn the Resident Space Object (RSO) features in the vectorized energy and momentum states and parameters. The mapping from measurements to vectorized energy and momentum states and parameters enables behavior characterization via clustering in the features space and subsequent RSO classification. Additionally, Space Object Behavioral Ontologies (SOBO) are employed to define and capture the domain knowledge-base (KB) and BNs are constructed from the SOBO in a semi-automatic fashion to execute probabilistic reasoning over conclusions drawn from trained classifiers and/or directly from processed data. Such an approach enables integrating machine learning classifiers and probabilistic reasoning to support higher-level decision making for space domain awareness applications. The innovation here is to use these methods (which have enjoyed great success in other domains) in synergy so that it enables a "from data to discovery" paradigm by facilitating the linkage and fusion of large and disparate sources of information via a Big Data Science and Analytics framework.

  16. Bayesian Image Segmentations by Potts Prior and Loopy Belief Propagation

    NASA Astrophysics Data System (ADS)

    Tanaka, Kazuyuki; Kataoka, Shun; Yasuda, Muneki; Waizumi, Yuji; Hsu, Chiou-Ting

    2014-12-01

    This paper presents a Bayesian image segmentation model based on Potts prior and loopy belief propagation. The proposed Bayesian model involves several terms, including the pairwise interactions of Potts models, and the average vectors and covariant matrices of Gauss distributions in color image modeling. These terms are often referred to as hyperparameters in statistical machine learning theory. In order to determine these hyperparameters, we propose a new scheme for hyperparameter estimation based on conditional maximization of entropy in the Potts prior. The algorithm is given based on loopy belief propagation. In addition, we compare our conditional maximum entropy framework with the conventional maximum likelihood framework, and also clarify how the first order phase transitions in loopy belief propagations for Potts models influence our hyperparameter estimation procedures.

  17. Perceptual decision making: drift-diffusion model is equivalent to a Bayesian model

    PubMed Central

    Bitzer, Sebastian; Park, Hame; Blankenburg, Felix; Kiebel, Stefan J.

    2014-01-01

    Behavioral data obtained with perceptual decision making experiments are typically analyzed with the drift-diffusion model. This parsimonious model accumulates noisy pieces of evidence toward a decision bound to explain the accuracy and reaction times of subjects. Recently, Bayesian models have been proposed to explain how the brain extracts information from noisy input as typically presented in perceptual decision making tasks. It has long been known that the drift-diffusion model is tightly linked with such functional Bayesian models but the precise relationship of the two mechanisms was never made explicit. Using a Bayesian model, we derived the equations which relate parameter values between these models. In practice we show that this equivalence is useful when fitting multi-subject data. We further show that the Bayesian model suggests different decision variables which all predict equal responses and discuss how these may be discriminated based on neural correlates of accumulated evidence. In addition, we discuss extensions to the Bayesian model which would be difficult to derive for the drift-diffusion model. We suggest that these and other extensions may be highly useful for deriving new experiments which test novel hypotheses. PMID:24616689

  18. Strategies to reduce the complexity of hydrologic data assimilation for high-dimensional models

    NASA Astrophysics Data System (ADS)

    Hernandez, F.; Liang, X.

    2017-12-01

    Probabilistic forecasts in the geosciences offer invaluable information by allowing to estimate the uncertainty of predicted conditions (including threats like floods and droughts). However, while forecast systems based on modern data assimilation algorithms are capable of producing multi-variate probability distributions of future conditions, the computational resources required to fully characterize the dependencies between the model's state variables render their applicability impractical for high-resolution cases. This occurs because of the quadratic space complexity of storing the covariance matrices that encode these dependencies and the cubic time complexity of performing inference operations with them. In this work we introduce two complementary strategies to reduce the size of the covariance matrices that are at the heart of Bayesian assimilation methods—like some variants of (ensemble) Kalman filters and of particle filters—and variational methods. The first strategy involves the optimized grouping of state variables by clustering individual cells of the model into "super-cells." A dynamic fuzzy clustering approach is used to take into account the states (e.g., soil moisture) and forcings (e.g., precipitation) of each cell at each time step. The second strategy consists in finding a compressed representation of the covariance matrix that still encodes the most relevant information but that can be more efficiently stored and processed. A learning and a belief-propagation inference algorithm are developed to take advantage of this modified low-rank representation. The two proposed strategies are incorporated into OPTIMISTS, a state-of-the-art hybrid Bayesian/variational data assimilation algorithm, and comparative streamflow forecasting tests are performed using two watersheds modeled with the Distributed Hydrology Soil Vegetation Model (DHSVM). Contrasts are made between the efficiency gains and forecast accuracy losses of each strategy used in isolation, and of those achieved through their coupling. We expect these developments to help catalyze improvements in the predictive accuracy of large-scale forecasting operations by lowering the costs of deploying advanced data assimilation techniques.

  19. Nonparametric Bayesian Modeling for Automated Database Schema Matching

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ferragut, Erik M; Laska, Jason A

    2015-01-01

    The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.

  20. Improved Accuracy Using Recursive Bayesian Estimation Based Language Model Fusion in ERP-Based BCI Typing Systems

    PubMed Central

    Orhan, U.; Erdogmus, D.; Roark, B.; Oken, B.; Purwar, S.; Hild, K. E.; Fowler, A.; Fried-Oken, M.

    2013-01-01

    RSVP Keyboard™ is an electroencephalography (EEG) based brain computer interface (BCI) typing system, designed as an assistive technology for the communication needs of people with locked-in syndrome (LIS). It relies on rapid serial visual presentation (RSVP) and does not require precise eye gaze control. Existing BCI typing systems which uses event related potentials (ERP) in EEG suffer from low accuracy due to low signal-to-noise ratio. Henceforth, RSVP Keyboard™ utilizes a context based decision making via incorporating a language model, to improve the accuracy of letter decisions. To further improve the contributions of the language model, we propose recursive Bayesian estimation, which relies on non-committing string decisions, and conduct an offline analysis, which compares it with the existing naïve Bayesian fusion approach. The results indicate the superiority of the recursive Bayesian fusion and in the next generation of RSVP Keyboard™ we plan to incorporate this new approach. PMID:23366432

  1. Community Detection Algorithm Combining Stochastic Block Model and Attribute Data Clustering

    NASA Astrophysics Data System (ADS)

    Kataoka, Shun; Kobayashi, Takuto; Yasuda, Muneki; Tanaka, Kazuyuki

    2016-11-01

    We propose a new algorithm to detect the community structure in a network that utilizes both the network structure and vertex attribute data. Suppose we have the network structure together with the vertex attribute data, that is, the information assigned to each vertex associated with the community to which it belongs. The problem addressed this paper is the detection of the community structure from the information of both the network structure and the vertex attribute data. Our approach is based on the Bayesian approach that models the posterior probability distribution of the community labels. The detection of the community structure in our method is achieved by using belief propagation and an EM algorithm. We numerically verified the performance of our method using computer-generated networks and real-world networks.

  2. 2D Bayesian automated tilted-ring fitting of disc galaxies in large H I galaxy surveys: 2DBAT

    NASA Astrophysics Data System (ADS)

    Oh, Se-Heon; Staveley-Smith, Lister; Spekkens, Kristine; Kamphuis, Peter; Koribalski, Bärbel S.

    2018-01-01

    We present a novel algorithm based on a Bayesian method for 2D tilted-ring analysis of disc galaxy velocity fields. Compared to the conventional algorithms based on a chi-squared minimization procedure, this new Bayesian-based algorithm suffers less from local minima of the model parameters even with highly multimodal posterior distributions. Moreover, the Bayesian analysis, implemented via Markov Chain Monte Carlo sampling, only requires broad ranges of posterior distributions of the parameters, which makes the fitting procedure fully automated. This feature will be essential when performing kinematic analysis on the large number of resolved galaxies expected to be detected in neutral hydrogen (H I) surveys with the Square Kilometre Array and its pathfinders. The so-called 2D Bayesian Automated Tilted-ring fitter (2DBAT) implements Bayesian fits of 2D tilted-ring models in order to derive rotation curves of galaxies. We explore 2DBAT performance on (a) artificial H I data cubes built based on representative rotation curves of intermediate-mass and massive spiral galaxies, and (b) Australia Telescope Compact Array H I data from the Local Volume H I Survey. We find that 2DBAT works best for well-resolved galaxies with intermediate inclinations (20° < i < 70°), complementing 3D techniques better suited to modelling inclined galaxies.

  3. Theory-based Bayesian Models of Inductive Inference

    DTIC Science & Technology

    2010-07-19

    Subjective randomness and natural scene statistics. Psychonomic Bulletin & Review . http://cocosci.berkeley.edu/tom/papers/randscenes.pdf Page 1...in press). Exemplar models as a mechanism for performing Bayesian inference. Psychonomic Bulletin & Review . http://cocosci.berkeley.edu/tom

  4. Clinical Outcome Prediction in Aneurysmal Subarachnoid Hemorrhage Using Bayesian Neural Networks with Fuzzy Logic Inferences

    PubMed Central

    Lo, Benjamin W. Y.; Macdonald, R. Loch; Baker, Andrew; Levine, Mitchell A. H.

    2013-01-01

    Objective. The novel clinical prediction approach of Bayesian neural networks with fuzzy logic inferences is created and applied to derive prognostic decision rules in cerebral aneurysmal subarachnoid hemorrhage (aSAH). Methods. The approach of Bayesian neural networks with fuzzy logic inferences was applied to data from five trials of Tirilazad for aneurysmal subarachnoid hemorrhage (3551 patients). Results. Bayesian meta-analyses of observational studies on aSAH prognostic factors gave generalizable posterior distributions of population mean log odd ratios (ORs). Similar trends were noted in Bayesian and linear regression ORs. Significant outcome predictors include normal motor response, cerebral infarction, history of myocardial infarction, cerebral edema, history of diabetes mellitus, fever on day 8, prior subarachnoid hemorrhage, admission angiographic vasospasm, neurological grade, intraventricular hemorrhage, ruptured aneurysm size, history of hypertension, vasospasm day, age and mean arterial pressure. Heteroscedasticity was present in the nontransformed dataset. Artificial neural networks found nonlinear relationships with 11 hidden variables in 1 layer, using the multilayer perceptron model. Fuzzy logic decision rules (centroid defuzzification technique) denoted cut-off points for poor prognosis at greater than 2.5 clusters. Discussion. This aSAH prognostic system makes use of existing knowledge, recognizes unknown areas, incorporates one's clinical reasoning, and compensates for uncertainty in prognostication. PMID:23690884

  5. Bayesian Modeling of a Human MMORPG Player

    NASA Astrophysics Data System (ADS)

    Synnaeve, Gabriel; Bessière, Pierre

    2011-03-01

    This paper describes an application of Bayesian programming to the control of an autonomous avatar in a multiplayer role-playing game (the example is based on World of Warcraft). We model a particular task, which consists of choosing what to do and to select which target in a situation where allies and foes are present. We explain the model in Bayesian programming and show how we could learn the conditional probabilities from data gathered during human-played sessions.

  6. cosmoabc: Likelihood-free inference for cosmology

    NASA Astrophysics Data System (ADS)

    Ishida, Emille E. O.; Vitenti, Sandro D. P.; Penna-Lima, Mariana; Trindade, Arlindo M.; Cisewski, Jessi; M.; de Souza, Rafael; Cameron, Ewan; Busti, Vinicius C.

    2015-05-01

    Approximate Bayesian Computation (ABC) enables parameter inference for complex physical systems in cases where the true likelihood function is unknown, unavailable, or computationally too expensive. It relies on the forward simulation of mock data and comparison between observed and synthetic catalogs. cosmoabc is a Python Approximate Bayesian Computation (ABC) sampler featuring a Population Monte Carlo variation of the original ABC algorithm, which uses an adaptive importance sampling scheme. The code can be coupled to an external simulator to allow incorporation of arbitrary distance and prior functions. When coupled with the numcosmo library, it has been used to estimate posterior probability distributions over cosmological parameters based on measurements of galaxy clusters number counts without computing the likelihood function.

  7. Posterior Predictive Bayesian Phylogenetic Model Selection

    PubMed Central

    Lewis, Paul O.; Xie, Wangang; Chen, Ming-Hui; Fan, Yu; Kuo, Lynn

    2014-01-01

    We present two distinctly different posterior predictive approaches to Bayesian phylogenetic model selection and illustrate these methods using examples from green algal protein-coding cpDNA sequences and flowering plant rDNA sequences. The Gelfand–Ghosh (GG) approach allows dissection of an overall measure of model fit into components due to posterior predictive variance (GGp) and goodness-of-fit (GGg), which distinguishes this method from the posterior predictive P-value approach. The conditional predictive ordinate (CPO) method provides a site-specific measure of model fit useful for exploratory analyses and can be combined over sites yielding the log pseudomarginal likelihood (LPML) which is useful as an overall measure of model fit. CPO provides a useful cross-validation approach that is computationally efficient, requiring only a sample from the posterior distribution (no additional simulation is required). Both GG and CPO add new perspectives to Bayesian phylogenetic model selection based on the predictive abilities of models and complement the perspective provided by the marginal likelihood (including Bayes Factor comparisons) based solely on the fit of competing models to observed data. [Bayesian; conditional predictive ordinate; CPO; L-measure; LPML; model selection; phylogenetics; posterior predictive.] PMID:24193892

  8. A local approach for focussed Bayesian fusion

    NASA Astrophysics Data System (ADS)

    Sander, Jennifer; Heizmann, Michael; Goussev, Igor; Beyerer, Jürgen

    2009-04-01

    Local Bayesian fusion approaches aim to reduce high storage and computational costs of Bayesian fusion which is separated from fixed modeling assumptions. Using the small world formalism, we argue why this proceeding is conform with Bayesian theory. Then, we concentrate on the realization of local Bayesian fusion by focussing the fusion process solely on local regions that are task relevant with a high probability. The resulting local models correspond then to restricted versions of the original one. In a previous publication, we used bounds for the probability of misleading evidence to show the validity of the pre-evaluation of task specific knowledge and prior information which we perform to build local models. In this paper, we prove the validity of this proceeding using information theoretic arguments. For additional efficiency, local Bayesian fusion can be realized in a distributed manner. Here, several local Bayesian fusion tasks are evaluated and unified after the actual fusion process. For the practical realization of distributed local Bayesian fusion, software agents are predestinated. There is a natural analogy between the resulting agent based architecture and criminal investigations in real life. We show how this analogy can be used to improve the efficiency of distributed local Bayesian fusion additionally. Using a landscape model, we present an experimental study of distributed local Bayesian fusion in the field of reconnaissance, which highlights its high potential.

  9. Bayesian statistics as a new tool for spectral analysis - I. Application for the determination of basic parameters of massive stars

    NASA Astrophysics Data System (ADS)

    Mugnes, J.-M.; Robert, C.

    2015-11-01

    Spectral analysis is a powerful tool to investigate stellar properties and it has been widely used for decades now. However, the methods considered to perform this kind of analysis are mostly based on iteration among a few diagnostic lines to determine the stellar parameters. While these methods are often simple and fast, they can lead to errors and large uncertainties due to the required assumptions. Here, we present a method based on Bayesian statistics to find simultaneously the best combination of effective temperature, surface gravity, projected rotational velocity, and microturbulence velocity, using all the available spectral lines. Different tests are discussed to demonstrate the strength of our method, which we apply to 54 mid-resolution spectra of field and cluster B stars obtained at the Observatoire du Mont-Mégantic. We compare our results with those found in the literature. Differences are seen which are well explained by the different methods used. We conclude that the B-star microturbulence velocities are often underestimated. We also confirm the trend that B stars in clusters are on average faster rotators than field B stars.

  10. An introduction to Bayesian statistics in health psychology.

    PubMed

    Depaoli, Sarah; Rus, Holly M; Clifton, James P; van de Schoot, Rens; Tiemensma, Jitske

    2017-09-01

    The aim of the current article is to provide a brief introduction to Bayesian statistics within the field of health psychology. Bayesian methods are increasing in prevalence in applied fields, and they have been shown in simulation research to improve the estimation accuracy of structural equation models, latent growth curve (and mixture) models, and hierarchical linear models. Likewise, Bayesian methods can be used with small sample sizes since they do not rely on large sample theory. In this article, we discuss several important components of Bayesian statistics as they relate to health-based inquiries. We discuss the incorporation and impact of prior knowledge into the estimation process and the different components of the analysis that should be reported in an article. We present an example implementing Bayesian estimation in the context of blood pressure changes after participants experienced an acute stressor. We conclude with final thoughts on the implementation of Bayesian statistics in health psychology, including suggestions for reviewing Bayesian manuscripts and grant proposals. We have also included an extensive amount of online supplementary material to complement the content presented here, including Bayesian examples using many different software programmes and an extensive sensitivity analysis examining the impact of priors.

  11. Assessing Genetic Structure in Common but Ecologically Distinct Carnivores: The Stone Marten and Red Fox.

    PubMed

    Basto, Mafalda P; Santos-Reis, Margarida; Simões, Luciana; Grilo, Clara; Cardoso, Luís; Cortes, Helder; Bruford, Michael W; Fernandes, Carlos

    2016-01-01

    The identification of populations and spatial genetic patterns is important for ecological and conservation research, and spatially explicit individual-based methods have been recognised as powerful tools in this context. Mammalian carnivores are intrinsically vulnerable to habitat fragmentation but not much is known about the genetic consequences of fragmentation in common species. Stone martens (Martes foina) and red foxes (Vulpes vulpes) share a widespread Palearctic distribution and are considered habitat generalists, but in the Iberian Peninsula stone martens tend to occur in higher quality habitats. We compared their genetic structure in Portugal to see if they are consistent with their differences in ecological plasticity, and also to illustrate an approach to explicitly delineate the spatial boundaries of consistently identified genetic units. We analysed microsatellite data using spatial Bayesian clustering methods (implemented in the software BAPS, GENELAND and TESS), a progressive partitioning approach and a multivariate technique (Spatial Principal Components Analysis-sPCA). Three consensus Bayesian clusters were identified for the stone marten. No consensus was achieved for the red fox, but one cluster was the most probable clustering solution. Progressive partitioning and sPCA suggested additional clusters in the stone marten but they were not consistent among methods and were geographically incoherent. The contrasting results between the two species are consistent with the literature reporting stricter ecological requirements of the stone marten in the Iberian Peninsula. The observed genetic structure in the stone marten may have been influenced by landscape features, particularly rivers, and fragmentation. We suggest that an approach based on a consensus clustering solution of multiple different algorithms may provide an objective and effective means to delineate potential boundaries of inferred subpopulations. sPCA and progressive partitioning offer further verification of possible population structure and may be useful for revealing cryptic spatial genetic patterns worth further investigation.

  12. Genome-wide SNP discovery and population structure analysis in pepper (Capsicum annuum) using genotyping by sequencing.

    PubMed

    Taranto, F; D'Agostino, N; Greco, B; Cardi, T; Tripodi, P

    2016-11-21

    Knowledge on population structure and genetic diversity in vegetable crops is essential for association mapping studies and genomic selection. Genotyping by sequencing (GBS) represents an innovative method for large scale SNP detection and genotyping of genetic resources. Herein we used the GBS approach for the genome-wide identification of SNPs in a collection of Capsicum spp. accessions and for the assessment of the level of genetic diversity in a subset of 222 cultivated pepper (Capsicum annum) genotypes. GBS analysis generated a total of 7,568,894 master tags, of which 43.4% uniquely aligned to the reference genome CM334. A total of 108,591 SNP markers were identified, of which 105,184 were in C. annuum accessions. In order to explore the genetic diversity of C. annuum and to select a minimal core set representing most of the total genetic variation with minimum redundancy, a subset of 222 C. annuum accessions were analysed using 32,950 high quality SNPs. Based on Bayesian and Hierarchical clustering it was possible to divide the collection into three clusters. Cluster I had the majority of varieties and landraces mainly from Southern and Northern Italy, and from Eastern Europe, whereas clusters II and III comprised accessions of different geographical origins. Considering the genome-wide genetic variation among the accessions included in cluster I, a second round of Bayesian (K = 3) and Hierarchical (K = 2) clustering was performed. These analysis showed that genotypes were grouped not only based on geographical origin, but also on fruit-related features. GBS data has proven useful to assess the genetic diversity in a collection of C. annuum accessions. The high number of SNP markers, uniformly distributed on the 12 chromosomes, allowed the accessions to be distinguished according to geographical origin and fruit-related features. SNP markers and information on population structure developed in this study will undoubtedly support genome-wide association mapping studies and marker-assisted selection programs.

  13. Assessing Genetic Structure in Common but Ecologically Distinct Carnivores: The Stone Marten and Red Fox

    PubMed Central

    Basto, Mafalda P.; Santos-Reis, Margarida; Simões, Luciana; Grilo, Clara; Cardoso, Luís; Cortes, Helder; Bruford, Michael W.; Fernandes, Carlos

    2016-01-01

    The identification of populations and spatial genetic patterns is important for ecological and conservation research, and spatially explicit individual-based methods have been recognised as powerful tools in this context. Mammalian carnivores are intrinsically vulnerable to habitat fragmentation but not much is known about the genetic consequences of fragmentation in common species. Stone martens (Martes foina) and red foxes (Vulpes vulpes) share a widespread Palearctic distribution and are considered habitat generalists, but in the Iberian Peninsula stone martens tend to occur in higher quality habitats. We compared their genetic structure in Portugal to see if they are consistent with their differences in ecological plasticity, and also to illustrate an approach to explicitly delineate the spatial boundaries of consistently identified genetic units. We analysed microsatellite data using spatial Bayesian clustering methods (implemented in the software BAPS, GENELAND and TESS), a progressive partitioning approach and a multivariate technique (Spatial Principal Components Analysis-sPCA). Three consensus Bayesian clusters were identified for the stone marten. No consensus was achieved for the red fox, but one cluster was the most probable clustering solution. Progressive partitioning and sPCA suggested additional clusters in the stone marten but they were not consistent among methods and were geographically incoherent. The contrasting results between the two species are consistent with the literature reporting stricter ecological requirements of the stone marten in the Iberian Peninsula. The observed genetic structure in the stone marten may have been influenced by landscape features, particularly rivers, and fragmentation. We suggest that an approach based on a consensus clustering solution of multiple different algorithms may provide an objective and effective means to delineate potential boundaries of inferred subpopulations. sPCA and progressive partitioning offer further verification of possible population structure and may be useful for revealing cryptic spatial genetic patterns worth further investigation. PMID:26727497

  14. Bayesian estimation of seasonal course of canopy leaf area index from hyperspectral satellite data

    NASA Astrophysics Data System (ADS)

    Varvia, Petri; Rautiainen, Miina; Seppänen, Aku

    2018-03-01

    In this paper, Bayesian inversion of a physically-based forest reflectance model is investigated to estimate of boreal forest canopy leaf area index (LAI) from EO-1 Hyperion hyperspectral data. The data consist of multiple forest stands with different species compositions and structures, imaged in three phases of the growing season. The Bayesian estimates of canopy LAI are compared to reference estimates based on a spectral vegetation index. The forest reflectance model contains also other unknown variables in addition to LAI, for example leaf single scattering albedo and understory reflectance. In the Bayesian approach, these variables are estimated simultaneously with LAI. The feasibility and seasonal variation of these estimates is also examined. Credible intervals for the estimates are also calculated and evaluated. The results show that the Bayesian inversion approach is significantly better than using a comparable spectral vegetation index regression.

  15. Tree Biomass Estimation of Chinese fir (Cunninghamia lanceolata) Based on Bayesian Method

    PubMed Central

    Zhang, Jianguo

    2013-01-01

    Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) is the most important conifer species for timber production with huge distribution area in southern China. Accurate estimation of biomass is required for accounting and monitoring Chinese forest carbon stocking. In the study, allometric equation was used to analyze tree biomass of Chinese fir. The common methods for estimating allometric model have taken the classical approach based on the frequency interpretation of probability. However, many different biotic and abiotic factors introduce variability in Chinese fir biomass model, suggesting that parameters of biomass model are better represented by probability distributions rather than fixed values as classical method. To deal with the problem, Bayesian method was used for estimating Chinese fir biomass model. In the Bayesian framework, two priors were introduced: non-informative priors and informative priors. For informative priors, 32 biomass equations of Chinese fir were collected from published literature in the paper. The parameter distributions from published literature were regarded as prior distributions in Bayesian model for estimating Chinese fir biomass. Therefore, the Bayesian method with informative priors was better than non-informative priors and classical method, which provides a reasonable method for estimating Chinese fir biomass. PMID:24278198

  16. Tree biomass estimation of Chinese fir (Cunninghamia lanceolata) based on Bayesian method.

    PubMed

    Zhang, Xiongqing; Duan, Aiguo; Zhang, Jianguo

    2013-01-01

    Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) is the most important conifer species for timber production with huge distribution area in southern China. Accurate estimation of biomass is required for accounting and monitoring Chinese forest carbon stocking. In the study, allometric equation W = a(D2H)b was used to analyze tree biomass of Chinese fir. The common methods for estimating allometric model have taken the classical approach based on the frequency interpretation of probability. However, many different biotic and abiotic factors introduce variability in Chinese fir biomass model, suggesting that parameters of biomass model are better represented by probability distributions rather than fixed values as classical method. To deal with the problem, Bayesian method was used for estimating Chinese fir biomass model. In the Bayesian framework, two priors were introduced: non-informative priors and informative priors. For informative priors, 32 biomass equations of Chinese fir were collected from published literature in the paper. The parameter distributions from published literature were regarded as prior distributions in Bayesian model for estimating Chinese fir biomass. Therefore, the Bayesian method with informative priors was better than non-informative priors and classical method, which provides a reasonable method for estimating Chinese fir biomass.

  17. Resolution analysis of marine seismic full waveform data by Bayesian inversion

    NASA Astrophysics Data System (ADS)

    Ray, A.; Sekar, A.; Hoversten, G. M.; Albertin, U.

    2015-12-01

    The Bayesian posterior density function (PDF) of earth models that fit full waveform seismic data convey information on the uncertainty with which the elastic model parameters are resolved. In this work, we apply the trans-dimensional reversible jump Markov Chain Monte Carlo method (RJ-MCMC) for the 1D inversion of noisy synthetic full-waveform seismic data in the frequency-wavenumber domain. While seismic full waveform inversion (FWI) is a powerful method for characterizing subsurface elastic parameters, the uncertainty in the inverted models has remained poorly known, if at all and is highly initial model dependent. The Bayesian method we use is trans-dimensional in that the number of model layers is not fixed, and flexible such that the layer boundaries are free to move around. The resulting parameterization does not require regularization to stabilize the inversion. Depth resolution is traded off with the number of layers, providing an estimate of uncertainty in elastic parameters (compressional and shear velocities Vp and Vs as well as density) with depth. We find that in the absence of additional constraints, Bayesian inversion can result in a wide range of posterior PDFs on Vp, Vs and density. These PDFs range from being clustered around the true model, to those that contain little resolution of any particular features other than those in the near surface, depending on the particular data and target geometry. We present results for a suite of different frequencies and offset ranges, examining the differences in the posterior model densities thus derived. Though these results are for a 1D earth, they are applicable to areas with simple, layered geology and provide valuable insight into the resolving capabilities of FWI, as well as highlight the challenges in solving a highly non-linear problem. The RJ-MCMC method also presents a tantalizing possibility for extension to 2D and 3D Bayesian inversion of full waveform seismic data in the future, as it objectively tackles the problem of model selection (i.e., the number of layers or cells for parameterization), which could ease the computational burden of evaluating forward models with many parameters.

  18. Postglacial recolonization history of the European crabapple (Malus sylvestris Mill.), a wild contributor to the domesticated apple.

    PubMed

    Cornille, A; Giraud, T; Bellard, C; Tellier, A; Le Cam, B; Smulders, M J M; Kleinschmit, J; Roldan-Ruiz, I; Gladieux, P

    2013-04-01

    Understanding the way in which the climatic oscillations of the Quaternary Period have shaped the distribution and genetic structure of extant tree species provides insight into the processes driving species diversification, distribution and survival. Deciphering the genetic consequences of past climatic change is also critical for the conservation and sustainable management of forest and tree genetic resources, a timely endeavour as the Earth heads into a period of fast climate change. We used a combination of genetic data and ecological niche models to investigate the historical patterns of biogeographic range expansion of a wild fruit tree, the European crabapple (Malus sylvestris), a wild contributor to the domesticated apple. Both climatic predictions for the last glacial maximum and analyses of microsatellite variation indicated that M. sylvestris experienced range contraction and fragmentation. Bayesian clustering analyses revealed a clear pattern of genetic structure, with one genetic cluster spanning a large area in Western Europe and two other genetic clusters with a more limited distribution range in Eastern Europe, one around the Carpathian Mountains and the other restricted to the Balkan Peninsula. Approximate Bayesian computation appeared to be a powerful technique for inferring the history of these clusters, supporting a scenario of simultaneous differentiation of three separate glacial refugia. Admixture between these three populations was found in their suture zones. A weak isolation by distance pattern was detected within each population, indicating a high extent of historical gene flow for the European crabapple. © 2013 Blackwell Publishing Ltd.

  19. Probabilistic Inference: Task Dependency and Individual Differences of Probability Weighting Revealed by Hierarchical Bayesian Modeling

    PubMed Central

    Boos, Moritz; Seer, Caroline; Lange, Florian; Kopp, Bruno

    2016-01-01

    Cognitive determinants of probabilistic inference were examined using hierarchical Bayesian modeling techniques. A classic urn-ball paradigm served as experimental strategy, involving a factorial two (prior probabilities) by two (likelihoods) design. Five computational models of cognitive processes were compared with the observed behavior. Parameter-free Bayesian posterior probabilities and parameter-free base rate neglect provided inadequate models of probabilistic inference. The introduction of distorted subjective probabilities yielded more robust and generalizable results. A general class of (inverted) S-shaped probability weighting functions had been proposed; however, the possibility of large differences in probability distortions not only across experimental conditions, but also across individuals, seems critical for the model's success. It also seems advantageous to consider individual differences in parameters of probability weighting as being sampled from weakly informative prior distributions of individual parameter values. Thus, the results from hierarchical Bayesian modeling converge with previous results in revealing that probability weighting parameters show considerable task dependency and individual differences. Methodologically, this work exemplifies the usefulness of hierarchical Bayesian modeling techniques for cognitive psychology. Theoretically, human probabilistic inference might be best described as the application of individualized strategic policies for Bayesian belief revision. PMID:27303323

  20. Interactive Inverse Groundwater Modeling - Addressing User Fatigue

    NASA Astrophysics Data System (ADS)

    Singh, A.; Minsker, B. S.

    2006-12-01

    This paper builds on ongoing research on developing an interactive and multi-objective framework to solve the groundwater inverse problem. In this work we solve the classic groundwater inverse problem of estimating a spatially continuous conductivity field, given field measurements of hydraulic heads. The proposed framework is based on an interactive multi-objective genetic algorithm (IMOGA) that not only considers quantitative measures such as calibration error and degree of regularization, but also takes into account expert knowledge about the structure of the underlying conductivity field expressed as subjective rankings of potential conductivity fields by the expert. The IMOGA converges to the optimal Pareto front representing the best trade- off among the qualitative as well as quantitative objectives. However, since the IMOGA is a population-based iterative search it requires the user to evaluate hundreds of solutions. This leads to the problem of 'user fatigue'. We propose a two step methodology to combat user fatigue in such interactive systems. The first step is choosing only a few highly representative solutions to be shown to the expert for ranking. Spatial clustering is used to group the search space based on the similarity of the conductivity fields. Sampling is then carried out from different clusters to improve the diversity of solutions shown to the user. Once the expert has ranked representative solutions from each cluster a machine learning model is used to 'learn user preference' and extrapolate these for the solutions not ranked by the expert. We investigate different machine learning models such as Decision Trees, Bayesian learning model, and instance based weighting to model user preference. In addition, we also investigate ways to improve the performance of these models by providing information about the spatial structure of the conductivity fields (which is what the expert bases his or her rank on). Results are shown for each of these machine learning models and the advantages and disadvantages for each approach are discussed. These results indicate that using the proposed two-step methodology leads to significant reduction in user-fatigue without deteriorating the solution quality of the IMOGA.

  1. Efficient fuzzy Bayesian inference algorithms for incorporating expert knowledge in parameter estimation

    NASA Astrophysics Data System (ADS)

    Rajabi, Mohammad Mahdi; Ataie-Ashtiani, Behzad

    2016-05-01

    Bayesian inference has traditionally been conceived as the proper framework for the formal incorporation of expert knowledge in parameter estimation of groundwater models. However, conventional Bayesian inference is incapable of taking into account the imprecision essentially embedded in expert provided information. In order to solve this problem, a number of extensions to conventional Bayesian inference have been introduced in recent years. One of these extensions is 'fuzzy Bayesian inference' which is the result of integrating fuzzy techniques into Bayesian statistics. Fuzzy Bayesian inference has a number of desirable features which makes it an attractive approach for incorporating expert knowledge in the parameter estimation process of groundwater models: (1) it is well adapted to the nature of expert provided information, (2) it allows to distinguishably model both uncertainty and imprecision, and (3) it presents a framework for fusing expert provided information regarding the various inputs of the Bayesian inference algorithm. However an important obstacle in employing fuzzy Bayesian inference in groundwater numerical modeling applications is the computational burden, as the required number of numerical model simulations often becomes extremely exhaustive and often computationally infeasible. In this paper, a novel approach of accelerating the fuzzy Bayesian inference algorithm is proposed which is based on using approximate posterior distributions derived from surrogate modeling, as a screening tool in the computations. The proposed approach is first applied to a synthetic test case of seawater intrusion (SWI) in a coastal aquifer. It is shown that for this synthetic test case, the proposed approach decreases the number of required numerical simulations by an order of magnitude. Then the proposed approach is applied to a real-world test case involving three-dimensional numerical modeling of SWI in Kish Island, located in the Persian Gulf. An expert elicitation methodology is developed and applied to the real-world test case in order to provide a road map for the use of fuzzy Bayesian inference in groundwater modeling applications.

  2. Hierarchical structure of the Sicilian goats revealed by Bayesian analyses of microsatellite information.

    PubMed

    Siwek, M; Finocchiaro, R; Curik, I; Portolano, B

    2011-02-01

    Genetic structure and relationship amongst the main goat populations in Sicily (Girgentana, Derivata di Siria, Maltese and Messinese) were analysed using information from 19 microsatellite markers genotyped on 173 individuals. A posterior Bayesian approach implemented in the program STRUCTURE revealed a hierarchical structure with two clusters at the first level (Girgentana vs. Messinese, Derivata di Siria and Maltese), explaining 4.8% of variation (amovaФ(ST) estimate). Seven clusters nested within these first two clusters (further differentiations of Girgentana, Derivata di Siria and Maltese), explaining 8.5% of variation (amovaФ(SC) estimate). The analyses and methods applied in this study indicate their power to detect subtle population structure. © 2010 The Authors, Animal Genetics © 2010 Stichting International Foundation for Animal Genetics.

  3. Evaluation of calibration efficacy under different levels of uncertainty

    DOE PAGES

    Heo, Yeonsook; Graziano, Diane J.; Guzowski, Leah; ...

    2014-06-10

    This study examines how calibration performs under different levels of uncertainty in model input data. It specifically assesses the efficacy of Bayesian calibration to enhance the reliability of EnergyPlus model predictions. A Bayesian approach can be used to update uncertain values of parameters, given measured energy-use data, and to quantify the associated uncertainty.We assess the efficacy of Bayesian calibration under a controlled virtual-reality setup, which enables rigorous validation of the accuracy of calibration results in terms of both calibrated parameter values and model predictions. Case studies demonstrate the performance of Bayesian calibration of base models developed from audit data withmore » differing levels of detail in building design, usage, and operation.« less

  4. Predicting Flavonoid UGT Regioselectivity

    PubMed Central

    Jackson, Rhydon; Knisley, Debra; McIntosh, Cecilia; Pfeiffer, Phillip

    2011-01-01

    Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of avonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities. PMID:21747849

  5. Efficient Probabilistic Diagnostics for Electrical Power Systems

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole J.; Chavira, Mark; Cascio, Keith; Poll, Scott; Darwiche, Adnan; Uckun, Serdar

    2008-01-01

    We consider in this work the probabilistic approach to model-based diagnosis when applied to electrical power systems (EPSs). Our probabilistic approach is formally well-founded, as it based on Bayesian networks and arithmetic circuits. We investigate the diagnostic task known as fault isolation, and pay special attention to meeting two of the main challenges . model development and real-time reasoning . often associated with real-world application of model-based diagnosis technologies. To address the challenge of model development, we develop a systematic approach to representing electrical power systems as Bayesian networks, supported by an easy-to-use speci.cation language. To address the real-time reasoning challenge, we compile Bayesian networks into arithmetic circuits. Arithmetic circuit evaluation supports real-time diagnosis by being predictable and fast. In essence, we introduce a high-level EPS speci.cation language from which Bayesian networks that can diagnose multiple simultaneous failures are auto-generated, and we illustrate the feasibility of using arithmetic circuits, compiled from Bayesian networks, for real-time diagnosis on real-world EPSs of interest to NASA. The experimental system is a real-world EPS, namely the Advanced Diagnostic and Prognostic Testbed (ADAPT) located at the NASA Ames Research Center. In experiments with the ADAPT Bayesian network, which currently contains 503 discrete nodes and 579 edges, we .nd high diagnostic accuracy in scenarios where one to three faults, both in components and sensors, were inserted. The time taken to compute the most probable explanation using arithmetic circuits has a small mean of 0.2625 milliseconds and standard deviation of 0.2028 milliseconds. In experiments with data from ADAPT we also show that arithmetic circuit evaluation substantially outperforms joint tree propagation and variable elimination, two alternative algorithms for diagnosis using Bayesian network inference.

  6. Bayesian data analysis in population ecology: motivations, methods, and benefits

    USGS Publications Warehouse

    Dorazio, Robert

    2016-01-01

    During the 20th century ecologists largely relied on the frequentist system of inference for the analysis of their data. However, in the past few decades ecologists have become increasingly interested in the use of Bayesian methods of data analysis. In this article I provide guidance to ecologists who would like to decide whether Bayesian methods can be used to improve their conclusions and predictions. I begin by providing a concise summary of Bayesian methods of analysis, including a comparison of differences between Bayesian and frequentist approaches to inference when using hierarchical models. Next I provide a list of problems where Bayesian methods of analysis may arguably be preferred over frequentist methods. These problems are usually encountered in analyses based on hierarchical models of data. I describe the essentials required for applying modern methods of Bayesian computation, and I use real-world examples to illustrate these methods. I conclude by summarizing what I perceive to be the main strengths and weaknesses of using Bayesian methods to solve ecological inference problems.

  7. Assessing an ensemble docking-based virtual screening strategy for kinase targets by considering protein flexibility.

    PubMed

    Tian, Sheng; Sun, Huiyong; Pan, Peichen; Li, Dan; Zhen, Xuechu; Li, Youyong; Hou, Tingjun

    2014-10-27

    In this study, to accommodate receptor flexibility, based on multiple receptor conformations, a novel ensemble docking protocol was developed by using the naïve Bayesian classification technique, and it was evaluated in terms of the prediction accuracy of docking-based virtual screening (VS) of three important targets in the kinase family: ALK, CDK2, and VEGFR2. First, for each target, the representative crystal structures were selected by structural clustering, and the capability of molecular docking based on each representative structure to discriminate inhibitors from non-inhibitors was examined. Then, for each target, 50 ns molecular dynamics (MD) simulations were carried out to generate an ensemble of the conformations, and multiple representative structures/snapshots were extracted from each MD trajectory by structural clustering. On average, the representative crystal structures outperform the representative structures extracted from MD simulations in terms of the capabilities to separate inhibitors from non-inhibitors. Finally, by using the naïve Bayesian classification technique, an integrated VS strategy was developed to combine the prediction results of molecular docking based on different representative conformations chosen from crystal structures and MD trajectories. It was encouraging to observe that the integrated VS strategy yields better performance than the docking-based VS based on any single rigid conformation. This novel protocol may provide an improvement over existing strategies to search for more diverse and promising active compounds for a target of interest.

  8. Understanding the Scalability of Bayesian Network Inference Using Clique Tree Growth Curves

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole J.

    2010-01-01

    One of the main approaches to performing computation in Bayesian networks (BNs) is clique tree clustering and propagation. The clique tree approach consists of propagation in a clique tree compiled from a Bayesian network, and while it was introduced in the 1980s, there is still a lack of understanding of how clique tree computation time depends on variations in BN size and structure. In this article, we improve this understanding by developing an approach to characterizing clique tree growth as a function of parameters that can be computed in polynomial time from BNs, specifically: (i) the ratio of the number of a BN s non-root nodes to the number of root nodes, and (ii) the expected number of moral edges in their moral graphs. Analytically, we partition the set of cliques in a clique tree into different sets, and introduce a growth curve for the total size of each set. For the special case of bipartite BNs, there are two sets and two growth curves, a mixed clique growth curve and a root clique growth curve. In experiments, where random bipartite BNs generated using the BPART algorithm are studied, we systematically increase the out-degree of the root nodes in bipartite Bayesian networks, by increasing the number of leaf nodes. Surprisingly, root clique growth is well-approximated by Gompertz growth curves, an S-shaped family of curves that has previously been used to describe growth processes in biology, medicine, and neuroscience. We believe that this research improves the understanding of the scaling behavior of clique tree clustering for a certain class of Bayesian networks; presents an aid for trade-off studies of clique tree clustering using growth curves; and ultimately provides a foundation for benchmarking and developing improved BN inference and machine learning algorithms.

  9. Calibrating Bayesian Network Representations of Social-Behavioral Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whitney, Paul D.; Walsh, Stephen J.

    2010-04-08

    While human behavior has long been studied, recent and ongoing advances in computational modeling present opportunities for recasting research outcomes in human behavior. In this paper we describe how Bayesian networks can represent outcomes of human behavior research. We demonstrate a Bayesian network that represents political radicalization research – and show a corresponding visual representation of aspects of this research outcome. Since Bayesian networks can be quantitatively compared with external observations, the representation can also be used for empirical assessments of the research which the network summarizes. For a political radicalization model based on published research, we show this empiricalmore » comparison with data taken from the Minorities at Risk Organizational Behaviors database.« less

  10. Assessment of parametric uncertainty for groundwater reactive transport modeling,

    USGS Publications Warehouse

    Shi, Xiaoqing; Ye, Ming; Curtis, Gary P.; Miller, Geoffery L.; Meyer, Philip D.; Kohler, Matthias; Yabusaki, Steve; Wu, Jichun

    2014-01-01

    The validity of using Gaussian assumptions for model residuals in uncertainty quantification of a groundwater reactive transport model was evaluated in this study. Least squares regression methods explicitly assume Gaussian residuals, and the assumption leads to Gaussian likelihood functions, model parameters, and model predictions. While the Bayesian methods do not explicitly require the Gaussian assumption, Gaussian residuals are widely used. This paper shows that the residuals of the reactive transport model are non-Gaussian, heteroscedastic, and correlated in time; characterizing them requires using a generalized likelihood function such as the formal generalized likelihood function developed by Schoups and Vrugt (2010). For the surface complexation model considered in this study for simulating uranium reactive transport in groundwater, parametric uncertainty is quantified using the least squares regression methods and Bayesian methods with both Gaussian and formal generalized likelihood functions. While the least squares methods and Bayesian methods with Gaussian likelihood function produce similar Gaussian parameter distributions, the parameter distributions of Bayesian uncertainty quantification using the formal generalized likelihood function are non-Gaussian. In addition, predictive performance of formal generalized likelihood function is superior to that of least squares regression and Bayesian methods with Gaussian likelihood function. The Bayesian uncertainty quantification is conducted using the differential evolution adaptive metropolis (DREAM(zs)) algorithm; as a Markov chain Monte Carlo (MCMC) method, it is a robust tool for quantifying uncertainty in groundwater reactive transport models. For the surface complexation model, the regression-based local sensitivity analysis and Morris- and DREAM(ZS)-based global sensitivity analysis yield almost identical ranking of parameter importance. The uncertainty analysis may help select appropriate likelihood functions, improve model calibration, and reduce predictive uncertainty in other groundwater reactive transport and environmental modeling.

  11. Hierarchical Bayesian Spatio–Temporal Analysis of Climatic and Socio–Economic Determinants of Rocky Mountain Spotted Fever

    PubMed Central

    Raghavan, Ram K.; Goodin, Douglas G.; Neises, Daniel; Anderson, Gary A.; Ganta, Roman R.

    2016-01-01

    This study aims to examine the spatio-temporal dynamics of Rocky Mountain spotted fever (RMSF) prevalence in four contiguous states of Midwestern United States, and to determine the impact of environmental and socio–economic factors associated with this disease. Bayesian hierarchical models were used to quantify space and time only trends and spatio–temporal interaction effect in the case reports submitted to the state health departments in the region. Various socio–economic, environmental and climatic covariates screened a priori in a bivariate procedure were added to a main–effects Bayesian model in progressive steps to evaluate important drivers of RMSF space-time patterns in the region. Our results show a steady increase in RMSF incidence over the study period to newer geographic areas, and the posterior probabilities of county-specific trends indicate clustering of high risk counties in the central and southern parts of the study region. At the spatial scale of a county, the prevalence levels of RMSF is influenced by poverty status, average relative humidity, and average land surface temperature (>35°C) in the region, and the relevance of these factors in the context of climate–change impacts on tick–borne diseases are discussed. PMID:26942604

  12. Hierarchical Bayesian Spatio-Temporal Analysis of Climatic and Socio-Economic Determinants of Rocky Mountain Spotted Fever.

    PubMed

    Raghavan, Ram K; Goodin, Douglas G; Neises, Daniel; Anderson, Gary A; Ganta, Roman R

    2016-01-01

    This study aims to examine the spatio-temporal dynamics of Rocky Mountain spotted fever (RMSF) prevalence in four contiguous states of Midwestern United States, and to determine the impact of environmental and socio-economic factors associated with this disease. Bayesian hierarchical models were used to quantify space and time only trends and spatio-temporal interaction effect in the case reports submitted to the state health departments in the region. Various socio-economic, environmental and climatic covariates screened a priori in a bivariate procedure were added to a main-effects Bayesian model in progressive steps to evaluate important drivers of RMSF space-time patterns in the region. Our results show a steady increase in RMSF incidence over the study period to newer geographic areas, and the posterior probabilities of county-specific trends indicate clustering of high risk counties in the central and southern parts of the study region. At the spatial scale of a county, the prevalence levels of RMSF is influenced by poverty status, average relative humidity, and average land surface temperature (>35°C) in the region, and the relevance of these factors in the context of climate-change impacts on tick-borne diseases are discussed.

  13. Using Bayesian Networks for Candidate Generation in Consistency-based Diagnosis

    NASA Technical Reports Server (NTRS)

    Narasimhan, Sriram; Mengshoel, Ole

    2008-01-01

    Consistency-based diagnosis relies heavily on the assumption that discrepancies between model predictions and sensor observations can be detected accurately. When sources of uncertainty like sensor noise and model abstraction exist robust schemes have to be designed to make a binary decision on whether predictions are consistent with observations. This risks the occurrence of false alarms and missed alarms when an erroneous decision is made. Moreover when multiple sensors (with differing sensing properties) are available the degree of match between predictions and observations can be used to guide the search for fault candidates. In this paper we propose a novel approach to handle this problem using Bayesian networks. In the consistency- based diagnosis formulation, automatically generated Bayesian networks are used to encode a probabilistic measure of fit between predictions and observations. A Bayesian network inference algorithm is used to compute most probable fault candidates.

  14. Analysis of capture-recapture models with individual covariates using data augmentation

    USGS Publications Warehouse

    Royle, J. Andrew

    2009-01-01

    I consider the analysis of capture-recapture models with individual covariates that influence detection probability. Bayesian analysis of the joint likelihood is carried out using a flexible data augmentation scheme that facilitates analysis by Markov chain Monte Carlo methods, and a simple and straightforward implementation in freely available software. This approach is applied to a study of meadow voles (Microtus pennsylvanicus) in which auxiliary data on a continuous covariate (body mass) are recorded, and it is thought that detection probability is related to body mass. In a second example, the model is applied to an aerial waterfowl survey in which a double-observer protocol is used. The fundamental unit of observation is the cluster of individual birds, and the size of the cluster (a discrete covariate) is used as a covariate on detection probability.

  15. Basics of Bayesian methods.

    PubMed

    Ghosh, Sujit K

    2010-01-01

    Bayesian methods are rapidly becoming popular tools for making statistical inference in various fields of science including biology, engineering, finance, and genetics. One of the key aspects of Bayesian inferential method is its logical foundation that provides a coherent framework to utilize not only empirical but also scientific information available to a researcher. Prior knowledge arising from scientific background, expert judgment, or previously collected data is used to build a prior distribution which is then combined with current data via the likelihood function to characterize the current state of knowledge using the so-called posterior distribution. Bayesian methods allow the use of models of complex physical phenomena that were previously too difficult to estimate (e.g., using asymptotic approximations). Bayesian methods offer a means of more fully understanding issues that are central to many practical problems by allowing researchers to build integrated models based on hierarchical conditional distributions that can be estimated even with limited amounts of data. Furthermore, advances in numerical integration methods, particularly those based on Monte Carlo methods, have made it possible to compute the optimal Bayes estimators. However, there is a reasonably wide gap between the background of the empirically trained scientists and the full weight of Bayesian statistical inference. Hence, one of the goals of this chapter is to bridge the gap by offering elementary to advanced concepts that emphasize linkages between standard approaches and full probability modeling via Bayesian methods.

  16. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.

  17. A Bayesian Multi-Level Factor Analytic Model of Consumer Price Sensitivities across Categories

    ERIC Educational Resources Information Center

    Duvvuri, Sri Devi; Gruca, Thomas S.

    2010-01-01

    Identifying price sensitive consumers is an important problem in marketing. We develop a Bayesian multi-level factor analytic model of the covariation among household-level price sensitivities across product categories that are substitutes. Based on a multivariate probit model of category incidence, this framework also allows the researcher to…

  18. Small Sample Properties of Bayesian Multivariate Autoregressive Time Series Models

    ERIC Educational Resources Information Center

    Price, Larry R.

    2012-01-01

    The aim of this study was to compare the small sample (N = 1, 3, 5, 10, 15) performance of a Bayesian multivariate vector autoregressive (BVAR-SEM) time series model relative to frequentist power and parameter estimation bias. A multivariate autoregressive model was developed based on correlated autoregressive time series vectors of varying…

  19. Built environment and Property Crime in Seattle, 1998-2000: A Bayesian Analysis.

    PubMed

    Matthews, Stephen A; Yang, Tse-Chuan; Hayslett-McCall, Karen L; Ruback, R Barry

    2010-06-01

    The past decade has seen a rapid growth in the use of a spatial perspective in studies of crime. In part this growth has been driven by the availability of georeferenced data, and the tools to analyze and visualize them: geographic information systems (GIS), spatial analysis, and spatial statistics. In this paper we use exploratory spatial data analysis (ESDA) tools and Bayesian models to help better understand the spatial patterning and predictors of property crime in Seattle, Washington for 1998-2000, including a focus on built environment variables. We present results for aggregate property crime data as well as models for specific property crime types: residential burglary, nonresidential burglary, theft, auto theft, and arson. ESDA confirms the presence of spatial clustering of property crime and we seek to explain these patterns using spatial Poisson models implemented in WinBUGS. Our results indicate that built environment variables were significant predictors of property crime, especially the presence of a highway on auto theft and burglary.

  20. Built environment and Property Crime in Seattle, 1998–2000: A Bayesian Analysis

    PubMed Central

    Matthews, Stephen A.; Yang, Tse-chuan; Hayslett-McCall, Karen L.; Ruback, R. Barry

    2014-01-01

    The past decade has seen a rapid growth in the use of a spatial perspective in studies of crime. In part this growth has been driven by the availability of georeferenced data, and the tools to analyze and visualize them: geographic information systems (GIS), spatial analysis, and spatial statistics. In this paper we use exploratory spatial data analysis (ESDA) tools and Bayesian models to help better understand the spatial patterning and predictors of property crime in Seattle, Washington for 1998–2000, including a focus on built environment variables. We present results for aggregate property crime data as well as models for specific property crime types: residential burglary, nonresidential burglary, theft, auto theft, and arson. ESDA confirms the presence of spatial clustering of property crime and we seek to explain these patterns using spatial Poisson models implemented in WinBUGS. Our results indicate that built environment variables were significant predictors of property crime, especially the presence of a highway on auto theft and burglary. PMID:24737924

  1. A Sparse Bayesian Approach for Forward-Looking Superresolution Radar Imaging

    PubMed Central

    Zhang, Yin; Zhang, Yongchao; Huang, Yulin; Yang, Jianyu

    2017-01-01

    This paper presents a sparse superresolution approach for high cross-range resolution imaging of forward-looking scanning radar based on the Bayesian criterion. First, a novel forward-looking signal model is established as the product of the measurement matrix and the cross-range target distribution, which is more accurate than the conventional convolution model. Then, based on the Bayesian criterion, the widely-used sparse regularization is considered as the penalty term to recover the target distribution. The derivation of the cost function is described, and finally, an iterative expression for minimizing this function is presented. Alternatively, this paper discusses how to estimate the single parameter of Gaussian noise. With the advantage of a more accurate model, the proposed sparse Bayesian approach enjoys a lower model error. Meanwhile, when compared with the conventional superresolution methods, the proposed approach shows high cross-range resolution and small location error. The superresolution results for the simulated point target, scene data, and real measured data are presented to demonstrate the superior performance of the proposed approach. PMID:28604583

  2. Incorporating approximation error in surrogate based Bayesian inversion

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Zeng, L.; Li, W.; Wu, L.

    2015-12-01

    There are increasing interests in applying surrogates for inverse Bayesian modeling to reduce repetitive evaluations of original model. In this way, the computational cost is expected to be saved. However, the approximation error of surrogate model is usually overlooked. This is partly because that it is difficult to evaluate the approximation error for many surrogates. Previous studies have shown that, the direct combination of surrogates and Bayesian methods (e.g., Markov Chain Monte Carlo, MCMC) may lead to biased estimations when the surrogate cannot emulate the highly nonlinear original system. This problem can be alleviated by implementing MCMC in a two-stage manner. However, the computational cost is still high since a relatively large number of original model simulations are required. In this study, we illustrate the importance of incorporating approximation error in inverse Bayesian modeling. Gaussian process (GP) is chosen to construct the surrogate for its convenience in approximation error evaluation. Numerical cases of Bayesian experimental design and parameter estimation for contaminant source identification are used to illustrate this idea. It is shown that, once the surrogate approximation error is well incorporated into Bayesian framework, promising results can be obtained even when the surrogate is directly used, and no further original model simulations are required.

  3. Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters.

    PubMed

    Hensman, James; Lawrence, Neil D; Rattray, Magnus

    2013-08-20

    Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering.The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.

  4. Appraisal of jump distributions in ensemble-based sampling algorithms

    NASA Astrophysics Data System (ADS)

    Dejanic, Sanda; Scheidegger, Andreas; Rieckermann, Jörg; Albert, Carlo

    2017-04-01

    Sampling Bayesian posteriors of model parameters is often required for making model-based probabilistic predictions. For complex environmental models, standard Monte Carlo Markov Chain (MCMC) methods are often infeasible because they require too many sequential model runs. Therefore, we focused on ensemble methods that use many Markov chains in parallel, since they can be run on modern cluster architectures. Little is known about how to choose the best performing sampler, for a given application. A poor choice can lead to an inappropriate representation of posterior knowledge. We assessed two different jump moves, the stretch and the differential evolution move, underlying, respectively, the software packages EMCEE and DREAM, which are popular in different scientific communities. For the assessment, we used analytical posteriors with features as they often occur in real posteriors, namely high dimensionality, strong non-linear correlations or multimodality. For posteriors with non-linear features, standard convergence diagnostics based on sample means can be insufficient. Therefore, we resorted to an entropy-based convergence measure. We assessed the samplers by means of their convergence speed, robustness and effective sample sizes. For posteriors with strongly non-linear features, we found that the stretch move outperforms the differential evolution move, w.r.t. all three aspects.

  5. Additive Genetic Variability and the Bayesian Alphabet

    PubMed Central

    Gianola, Daniel; de los Campos, Gustavo; Hill, William G.; Manfredi, Eduardo; Fernando, Rohan

    2009-01-01

    The use of all available molecular markers in statistical models for prediction of quantitative traits has led to what could be termed a genomic-assisted selection paradigm in animal and plant breeding. This article provides a critical review of some theoretical and statistical concepts in the context of genomic-assisted genetic evaluation of animals and crops. First, relationships between the (Bayesian) variance of marker effects in some regression models and additive genetic variance are examined under standard assumptions. Second, the connection between marker genotypes and resemblance between relatives is explored, and linkages between a marker-based model and the infinitesimal model are reviewed. Third, issues associated with the use of Bayesian models for marker-assisted selection, with a focus on the role of the priors, are examined from a theoretical angle. The sensitivity of a Bayesian specification that has been proposed (called “Bayes A”) with respect to priors is illustrated with a simulation. Methods that can solve potential shortcomings of some of these Bayesian regression procedures are discussed briefly. PMID:19620397

  6. Protein construct storage: Bayesian variable selection and prediction with mixtures.

    PubMed

    Clyde, M A; Parmigiani, G

    1998-07-01

    Determining optimal conditions for protein storage while maintaining a high level of protein activity is an important question in pharmaceutical research. A designed experiment based on a space-filling design was conducted to understand the effects of factors affecting protein storage and to establish optimal storage conditions. Different model-selection strategies to identify important factors may lead to very different answers about optimal conditions. Uncertainty about which factors are important, or model uncertainty, can be a critical issue in decision-making. We use Bayesian variable selection methods for linear models to identify important variables in the protein storage data, while accounting for model uncertainty. We also use the Bayesian framework to build predictions based on a large family of models, rather than an individual model, and to evaluate the probability that certain candidate storage conditions are optimal.

  7. Bayesian estimation inherent in a Mexican-hat-type neural network

    NASA Astrophysics Data System (ADS)

    Takiyama, Ken

    2016-05-01

    Brain functions, such as perception, motor control and learning, and decision making, have been explained based on a Bayesian framework, i.e., to decrease the effects of noise inherent in the human nervous system or external environment, our brain integrates sensory and a priori information in a Bayesian optimal manner. However, it remains unclear how Bayesian computations are implemented in the brain. Herein, I address this issue by analyzing a Mexican-hat-type neural network, which was used as a model of the visual cortex, motor cortex, and prefrontal cortex. I analytically demonstrate that the dynamics of an order parameter in the model corresponds exactly to a variational inference of a linear Gaussian state-space model, a Bayesian estimation, when the strength of recurrent synaptic connectivity is appropriately stronger than that of an external stimulus, a plausible condition in the brain. This exact correspondence can reveal the relationship between the parameters in the Bayesian estimation and those in the neural network, providing insight for understanding brain functions.

  8. Socio-ecological factors and hand, foot and mouth disease in dry climate regions: a Bayesian spatial approach in Gansu, China

    NASA Astrophysics Data System (ADS)

    Gou, Faxiang; Liu, Xinfeng; Ren, Xiaowei; Liu, Dongpeng; Liu, Haixia; Wei, Kongfu; Yang, Xiaoting; Cheng, Yao; Zheng, Yunhe; Jiang, Xiaojuan; Li, Juansheng; Meng, Lei; Hu, Wenbiao

    2017-01-01

    The influence of socio-ecological factors on hand, foot and mouth disease (HFMD) were explored in this study using Bayesian spatial modeling and spatial patterns identified in dry regions of Gansu, China. Notified HFMD cases and socio-ecological data were obtained from the China Information System for Disease Control and Prevention, Gansu Yearbook and Gansu Meteorological Bureau. A Bayesian spatial conditional autoregressive model was used to quantify the effects of socio-ecological factors on the HFMD and explore spatial patterns, with the consideration of its socio-ecological effects. Our non-spatial model suggests temperature (relative risk (RR) 1.15, 95 % CI 1.01-1.31), GDP per capita (RR 1.19, 95 % CI 1.01-1.39) and population density (RR 1.98, 95 % CI 1.19-3.17) to have a significant effect on HFMD transmission. However, after controlling for spatial random effects, only temperature (RR 1.25, 95 % CI 1.04-1.53) showed significant association with HFMD. The spatial model demonstrates temperature to play a major role in the transmission of HFMD in dry regions. Estimated residual variation after taking into account the socio-ecological variables indicated that high incidences of HFMD were mainly clustered in the northwest of Gansu. And, spatial structure showed a unique distribution after taking account of socio-ecological effects.

  9. MapReduce Based Parallel Bayesian Network for Manufacturing Quality Control

    NASA Astrophysics Data System (ADS)

    Zheng, Mao-Kuan; Ming, Xin-Guo; Zhang, Xian-Yu; Li, Guo-Ming

    2017-09-01

    Increasing complexity of industrial products and manufacturing processes have challenged conventional statistics based quality management approaches in the circumstances of dynamic production. A Bayesian network and big data analytics integrated approach for manufacturing process quality analysis and control is proposed. Based on Hadoop distributed architecture and MapReduce parallel computing model, big volume and variety quality related data generated during the manufacturing process could be dealt with. Artificial intelligent algorithms, including Bayesian network learning, classification and reasoning, are embedded into the Reduce process. Relying on the ability of the Bayesian network in dealing with dynamic and uncertain problem and the parallel computing power of MapReduce, Bayesian network of impact factors on quality are built based on prior probability distribution and modified with posterior probability distribution. A case study on hull segment manufacturing precision management for ship and offshore platform building shows that computing speed accelerates almost directly proportionally to the increase of computing nodes. It is also proved that the proposed model is feasible for locating and reasoning of root causes, forecasting of manufacturing outcome, and intelligent decision for precision problem solving. The integration of bigdata analytics and BN method offers a whole new perspective in manufacturing quality control.

  10. Genetic variation and structure in remnant population of critically endangered Melicope zahlbruckneri

    USGS Publications Warehouse

    Raji, J. A.; Atkinson, Carter T.

    2016-01-01

    The distribution and amount of genetic variation within and between populations of plant species are important for their adaptability to future habitat changes and also critical for their restoration and overall management. This study was initiated to assess the genetic status of the remnant population of Melicope zahlbruckneri–a critically endangered species in Hawaii, and determine the extent of genetic variation and diversity in order to propose valuable conservation approaches. Estimated genetic structure of individuals based on molecular marker allele frequencies identified genetic groups with low overall differentiation but identified the most genetically diverse individuals within the population. Analysis of Amplified Fragment Length Polymorphic (AFLP) marker loci in the population based on Bayesian model and multivariate statistics classified the population into four subgroups. We inferred a mixed species population structure based on Bayesian clustering and frequency of unique alleles. The percentage of Polymorphic Fragment (PPF) ranged from 18.8 to 64.6% for all marker loci with an average of 54.9% within the population. Inclusion of all surviving M. zahlbruckneri trees in future restorative planting at new sites are suggested, and approaches for longer term maintenance of genetic variability are discussed. To our knowledge, this study represents the first report of molecular genetic analysis of the remaining population of M. zahlbruckneri and also illustrates the importance of genetic variability for conservation of a small endangered population.

  11. Computational statistics using the Bayesian Inference Engine

    NASA Astrophysics Data System (ADS)

    Weinberg, Martin D.

    2013-09-01

    This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.

  12. An evaluation of Bayesian techniques for controlling model complexity and selecting inputs in a neural network for short-term load forecasting.

    PubMed

    Hippert, Henrique S; Taylor, James W

    2010-04-01

    Artificial neural networks have frequently been proposed for electricity load forecasting because of their capabilities for the nonlinear modelling of large multivariate data sets. Modelling with neural networks is not an easy task though; two of the main challenges are defining the appropriate level of model complexity, and choosing the input variables. This paper evaluates techniques for automatic neural network modelling within a Bayesian framework, as applied to six samples containing daily load and weather data for four different countries. We analyse input selection as carried out by the Bayesian 'automatic relevance determination', and the usefulness of the Bayesian 'evidence' for the selection of the best structure (in terms of number of neurones), as compared to methods based on cross-validation. Copyright 2009 Elsevier Ltd. All rights reserved.

  13. Three sympatric clusters of the malaria vector Anopheles culicifacies E (Diptera: Culicidae) detected in Sri Lanka.

    PubMed

    Harischandra, Iresha Nilmini; Dassanayake, Ranil Samantha; De Silva, Bambaranda Gammacharige Don Nissanka Kolitha

    2016-01-04

    The disease re-emergence threat from the major malaria vector in Sri Lanka, Anopheles culicifacies, is currently increasing. To predict malaria vector dynamics, knowledge of population genetics and gene flow is required, but this information is unavailable for Sri Lanka. This study was carried out to determine the population structure of An. culicifacies E in Sri Lanka. Eight microsatellite markers were used to examine An. culicifacies E collected from six sites in Sri Lanka during 2010-2012. Standard population genetic tests and analyses, genetic differentiation, Hardy-Weinberg equilibrium, linkage disequilibrium, Bayesian cluster analysis, AMOVA, SAMOVA and isolation-by-distance were conducted using five polymorphic loci. Five microsatellite loci were highly polymorphic with high allelic richness. Hardy-Weinberg Equilibrium (HWE) was significantly rejected for four loci with positive F(IS) values in the pooled population (p < 0.0100). Three loci showed high deviations in all sites except Kataragama, which was in agreement with HWE for all loci except one locus (p < 0.0016). Observed heterozygosity was less than the expected values for all sites except Kataragama, where reported negative F(IS) values indicated a heterozygosity excess. Genetic differentiation was observed for all sampling site pairs and was not supported by the isolation by distance model. Bayesian clustering analysis identified the presence of three sympatric clusters (gene pools) in the studied population. Significant genetic differentiation was detected in cluster pairs with low gene flow and isolation by distance was not detected between clusters. Furthermore, the results suggested the presence of a barrier to gene flow that divided the populations into two parts with the central hill region of Sri Lanka as the dividing line. Three sympatric clusters were detected among An. culicifacies E specimens isolated in Sri Lanka. There was no effect of geographic distance on genetic differentiation and the central mountain ranges in Sri Lanka appeared to be a barrier to gene flow.

  14. A Hierarchical Bayesian Model for Calibrating Estimates of Species Divergence Times

    PubMed Central

    Heath, Tracy A.

    2012-01-01

    In Bayesian divergence time estimation methods, incorporating calibrating information from the fossil record is commonly done by assigning prior densities to ancestral nodes in the tree. Calibration prior densities are typically parametric distributions offset by minimum age estimates provided by the fossil record. Specification of the parameters of calibration densities requires the user to quantify his or her prior knowledge of the age of the ancestral node relative to the age of its calibrating fossil. The values of these parameters can, potentially, result in biased estimates of node ages if they lead to overly informative prior distributions. Accordingly, determining parameter values that lead to adequate prior densities is not straightforward. In this study, I present a hierarchical Bayesian model for calibrating divergence time analyses with multiple fossil age constraints. This approach applies a Dirichlet process prior as a hyperprior on the parameters of calibration prior densities. Specifically, this model assumes that the rate parameters of exponential prior distributions on calibrated nodes are distributed according to a Dirichlet process, whereby the rate parameters are clustered into distinct parameter categories. Both simulated and biological data are analyzed to evaluate the performance of the Dirichlet process hyperprior. Compared with fixed exponential prior densities, the hierarchical Bayesian approach results in more accurate and precise estimates of internal node ages. When this hyperprior is applied using Markov chain Monte Carlo methods, the ages of calibrated nodes are sampled from mixtures of exponential distributions and uncertainty in the values of calibration density parameters is taken into account. PMID:22334343

  15. Applications of Bayesian Procrustes shape analysis to ensemble radar reflectivity nowcast verification

    NASA Astrophysics Data System (ADS)

    Fox, Neil I.; Micheas, Athanasios C.; Peng, Yuqiang

    2016-07-01

    This paper introduces the use of Bayesian full Procrustes shape analysis in object-oriented meteorological applications. In particular, the Procrustes methodology is used to generate mean forecast precipitation fields from a set of ensemble forecasts. This approach has advantages over other ensemble averaging techniques in that it can produce a forecast that retains the morphological features of the precipitation structures and present the range of forecast outcomes represented by the ensemble. The production of the ensemble mean avoids the problems of smoothing that result from simple pixel or cell averaging, while producing credible sets that retain information on ensemble spread. Also in this paper, the full Bayesian Procrustes scheme is used as an object verification tool for precipitation forecasts. This is an extension of a previously presented Procrustes shape analysis based verification approach into a full Bayesian format designed to handle the verification of precipitation forecasts that match objects from an ensemble of forecast fields to a single truth image. The methodology is tested on radar reflectivity nowcasts produced in the Warning Decision Support System - Integrated Information (WDSS-II) by varying parameters in the K-means cluster tracking scheme.

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sen, Oishik, E-mail: oishik-sen@uiowa.edu; Gaul, Nicholas J., E-mail: nicholas-gaul@ramdosolutions.com; Choi, K.K., E-mail: kyung-choi@uiowa.edu

    Macro-scale computations of shocked particulate flows require closure laws that model the exchange of momentum/energy between the fluid and particle phases. Closure laws are constructed in this work in the form of surrogate models derived from highly resolved mesoscale computations of shock-particle interactions. The mesoscale computations are performed to calculate the drag force on a cluster of particles for different values of Mach Number and particle volume fraction. Two Kriging-based methods, viz. the Dynamic Kriging Method (DKG) and the Modified Bayesian Kriging Method (MBKG) are evaluated for their ability to construct surrogate models with sparse data; i.e. using the leastmore » number of mesoscale simulations. It is shown that if the input data is noise-free, the DKG method converges monotonically; convergence is less robust in the presence of noise. The MBKG method converges monotonically even with noisy input data and is therefore more suitable for surrogate model construction from numerical experiments. This work is the first step towards a full multiscale modeling of interaction of shocked particle laden flows.« less

  17. Inference on cancer screening exam accuracy using population-level administrative data.

    PubMed

    Jiang, H; Brown, P E; Walter, S D

    2016-01-15

    This paper develops a model for cancer screening and cancer incidence data, accommodating the partially unobserved disease status, clustered data structures, general covariate effects, and dependence between exams. The true unobserved cancer and detection status of screening participants are treated as latent variables, and a Markov Chain Monte Carlo algorithm is used to estimate the Bayesian posterior distributions of the diagnostic error rates and disease prevalence. We show how the Bayesian approach can be used to draw inferences about screening exam properties and disease prevalence while allowing for the possibility of conditional dependence between two exams. The techniques are applied to the estimation of the diagnostic accuracy of mammography and clinical breast examination using data from the Ontario Breast Screening Program in Canada. Copyright © 2015 John Wiley & Sons, Ltd.

  18. A Bayesian approach to landscape ecological risk assessment applied to the upper Grande Ronde watershed, Oregon

    Treesearch

    Kimberley K. Ayre; Wayne G. Landis

    2012-01-01

    We present a Bayesian network model based on the ecological risk assessment framework to evaluate potential impacts to habitats and resources resulting from wildfire, grazing, forest management activities, and insect outbreaks in a forested landscape in northeastern Oregon. The Bayesian network structure consisted of three tiers of nodes: landscape disturbances,...

  19. Spatial analysis of macro-level bicycle crashes using the class of conditional autoregressive models.

    PubMed

    Saha, Dibakar; Alluri, Priyanka; Gan, Albert; Wu, Wanyang

    2018-02-21

    The objective of this study was to investigate the relationship between bicycle crash frequency and their contributing factors at the census block group level in Florida, USA. Crashes aggregated over the census block groups tend to be clustered (i.e., spatially dependent) rather than randomly distributed. To account for the effect of spatial dependence across the census block groups, the class of conditional autoregressive (CAR) models were employed within the hierarchical Bayesian framework. Based on four years (2011-2014) of crash data, total and fatal-and-severe injury bicycle crash frequencies were modeled as a function of a large number of variables representing demographic and socio-economic characteristics, roadway infrastructure and traffic characteristics, and bicycle activity characteristics. This study explored and compared the performance of two CAR models, namely the Besag's model and the Leroux's model, in crash prediction. The Besag's models, which differ from the Leroux's models by the structure of how spatial autocorrelation are specified in the models, were found to fit the data better. A 95% Bayesian credible interval was selected to identify the variables that had credible impact on bicycle crashes. A total of 21 variables were found to be credible in the total crash model, while 18 variables were found to be credible in the fatal-and-severe injury crash model. Population, daily vehicle miles traveled, age cohorts, household automobile ownership, density of urban roads by functional class, bicycle trip miles, and bicycle trip intensity had positive effects in both the total and fatal-and-severe crash models. Educational attainment variables, truck percentage, and density of rural roads by functional class were found to be negatively associated with both total and fatal-and-severe bicycle crash frequencies. Published by Elsevier Ltd.

  20. Deep phylogeographic divergence and cytonuclear discordance in the grasshopper Oedaleus decorus.

    PubMed

    Kindler, Eveline; Arlettaz, Raphaël; Heckel, Gerald

    2012-11-01

    The grasshopper Oedaleus decorus is a thermophilic insect with a large, mostly south-Palaearctic distribution range, stretching from the Mediterranean regions in Europe to Central-Asia and China. In this study, we analyzed the extent of phylogenetic divergence and the recent evolutionary history of the species based on 274 specimens from 26 localities across the distribution range in Europe. Phylogenetic relationships were determined using sequences of two mitochondrial loci (ctr, ND2) with neighbour-joining and Bayesian methods. Additionally, genetic differentiation was analyzed based on mitochondrial DNA and 11 microsatellite markers using F-statistics, model-free multivariate and model-based Bayesian clustering approaches. Phylogenetic analyses detected consistently two highly divergent, allopatrically distributed lineages within O. decorus. The divergence among these Western and Eastern lineages meeting in the region of the Alps was similar to the divergence of each lineage to the sister species O. asiaticus. Genetic differentiation for ctr was extremely high between Western and Eastern grasshopper populations (F(ct)=0.95). Microsatellite markers detected much lower but nevertheless very significant genetic structure among population samples. The nuclear data also demonstrated a case of cytonuclear discordance because the affiliation with mitochondrial lineages was incongruent in Northern Italy. Taken together these results provide evidence of an ancient separation within Oedaleus and either historical introgression of mtDNA among lineages and/or ongoing sex-specific gene flow in this grasshopper. Our study stresses the importance of multilocus approaches for unravelling the history and status of taxa of uncertain evolutionary divergence. Copyright © 2012 Elsevier Inc. All rights reserved.

  1. Hierarchical Bayesian Models of Subtask Learning

    ERIC Educational Resources Information Center

    Anglim, Jeromy; Wynton, Sarah K. A.

    2015-01-01

    The current study used Bayesian hierarchical methods to challenge and extend previous work on subtask learning consistency. A general model of individual-level subtask learning was proposed focusing on power and exponential functions with constraints to test for inconsistency. To study subtask learning, we developed a novel computer-based booking…

  2. CDMBE: A Case Description Model Based on Evidence

    PubMed Central

    Zhu, Jianlin; Yang, Xiaoping; Zhou, Jing

    2015-01-01

    By combining the advantages of argument map and Bayesian network, a case description model based on evidence (CDMBE), which is suitable to continental law system, is proposed to describe the criminal cases. The logic of the model adopts the credibility logical reason and gets evidence-based reasoning quantitatively based on evidences. In order to consist with practical inference rules, five types of relationship and a set of rules are defined to calculate the credibility of assumptions based on the credibility and supportability of the related evidences. Experiments show that the model can get users' ideas into a figure and the results calculated from CDMBE are in line with those from Bayesian model. PMID:26421006

  3. The Impact of Environment on the Stellar Mass–Halo Mass Relation

    NASA Astrophysics Data System (ADS)

    Golden-Marx, Jesse B.; Miller, Christopher J.

    2018-06-01

    A large variance exists in the amplitude of the stellar mass–halo mass (SMHM) relation for group- and cluster-size halos. Using a sample of 254 clusters, we show that the magnitude gap between the brightest central galaxy (BCG) and its second or fourth brightest neighbor accounts for a significant portion of this variance. We find that at fixed halo mass, galaxy clusters with a larger magnitude gap have a higher BCG stellar mass. This relationship is also observed in semi-analytic representations of low-redshift galaxy clusters in simulations. This SMHM–magnitude gap stratification likely results from BCG growth via hierarchical mergers and may link the assembly of the halo with the growth of the BCG. Using a Bayesian model, we quantify the importance of the magnitude gap in the SMHM relation using a multiplicative stretch factor, which we find to be significantly non-zero. The inclusion of the magnitude gap in the SMHM relation results in a large reduction in the inferred intrinsic scatter in the BCG stellar mass at fixed halo mass. We discuss the ramifications of this result in the context of galaxy formation models of centrals in group- and cluster-size halos.

  4. Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics.

    PubMed

    Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel

    2012-09-25

    Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.

  5. Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics

    PubMed Central

    2012-01-01

    Background Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Results Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Conclusions Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs. PMID:23009363

  6. A Gaussian random field model for similarity-based smoothing in Bayesian disease mapping.

    PubMed

    Baptista, Helena; Mendes, Jorge M; MacNab, Ying C; Xavier, Miguel; Caldas-de-Almeida, José

    2016-08-01

    Conditionally specified Gaussian Markov random field (GMRF) models with adjacency-based neighbourhood weight matrix, commonly known as neighbourhood-based GMRF models, have been the mainstream approach to spatial smoothing in Bayesian disease mapping. In the present paper, we propose a conditionally specified Gaussian random field (GRF) model with a similarity-based non-spatial weight matrix to facilitate non-spatial smoothing in Bayesian disease mapping. The model, named similarity-based GRF, is motivated for modelling disease mapping data in situations where the underlying small area relative risks and the associated determinant factors do not vary systematically in space, and the similarity is defined by "similarity" with respect to the associated disease determinant factors. The neighbourhood-based GMRF and the similarity-based GRF are compared and accessed via a simulation study and by two case studies, using new data on alcohol abuse in Portugal collected by the World Mental Health Survey Initiative and the well-known lip cancer data in Scotland. In the presence of disease data with no evidence of positive spatial correlation, the simulation study showed a consistent gain in efficiency from the similarity-based GRF, compared with the adjacency-based GMRF with the determinant risk factors as covariate. This new approach broadens the scope of the existing conditional autocorrelation models. © The Author(s) 2016.

  7. Trans-dimensional and hierarchical Bayesian approaches toward rigorous estimation of seismic sources and structures in the Northeast Asia

    NASA Astrophysics Data System (ADS)

    Kim, Seongryong; Tkalčić, Hrvoje; Mustać, Marija; Rhie, Junkee; Ford, Sean

    2016-04-01

    A framework is presented within which we provide rigorous estimations for seismic sources and structures in the Northeast Asia. We use Bayesian inversion methods, which enable statistical estimations of models and their uncertainties based on data information. Ambiguities in error statistics and model parameterizations are addressed by hierarchical and trans-dimensional (trans-D) techniques, which can be inherently implemented in the Bayesian inversions. Hence reliable estimation of model parameters and their uncertainties is possible, thus avoiding arbitrary regularizations and parameterizations. Hierarchical and trans-D inversions are performed to develop a three-dimensional velocity model using ambient noise data. To further improve the model, we perform joint inversions with receiver function data using a newly developed Bayesian method. For the source estimation, a novel moment tensor inversion method is presented and applied to regional waveform data of the North Korean nuclear explosion tests. By the combination of new Bayesian techniques and the structural model, coupled with meaningful uncertainties related to each of the processes, more quantitative monitoring and discrimination of seismic events is possible.

  8. BANYAN. XI. The BANYAN Σ Multivariate Bayesian Algorithm to Identify Members of Young Associations with 150 pc

    NASA Astrophysics Data System (ADS)

    Gagné, Jonathan; Mamajek, Eric E.; Malo, Lison; Riedel, Adric; Rodriguez, David; Lafrenière, David; Faherty, Jacqueline K.; Roy-Loubier, Olivier; Pueyo, Laurent; Robin, Annie C.; Doyon, René

    2018-03-01

    BANYAN Σ is a new Bayesian algorithm to identify members of young stellar associations within 150 pc of the Sun. It includes 27 young associations with ages in the range ∼1–800 Myr, modeled with multivariate Gaussians in six-dimensional (6D) XYZUVW space. It is the first such multi-association classification tool to include the nearest sub-groups of the Sco-Cen OB star-forming region, the IC 2602, IC 2391, Pleiades and Platais 8 clusters, and the ρ Ophiuchi, Corona Australis, and Taurus star formation regions. A model of field stars is built from a mixture of multivariate Gaussians based on the Besançon Galactic model. The algorithm can derive membership probabilities for objects with only sky coordinates and proper motion, but can also include parallax and radial velocity measurements, as well as spectrophotometric distance constraints from sequences in color–magnitude or spectral type–magnitude diagrams. BANYAN Σ benefits from an analytical solution to the Bayesian marginalization integrals over unknown radial velocities and distances that makes it more accurate and significantly faster than its predecessor BANYAN II. A contamination versus hit rate analysis is presented and demonstrates that BANYAN Σ achieves a better classification performance than other moving group tools available in the literature, especially in terms of cross-contamination between young associations. An updated list of bona fide members in the 27 young associations, augmented by the Gaia-DR1 release, as well as all parameters for the 6D multivariate Gaussian models for each association and the Galactic field neighborhood within 300 pc are presented. This new tool will make it possible to analyze large data sets such as the upcoming Gaia-DR2 to identify new young stars. IDL and Python versions of BANYAN Σ are made available with this publication, and a more limited online web tool is available at http://www.exoplanetes.umontreal.ca/banyan/banyansigma.php.

  9. Investigating different approaches to develop informative priors in hierarchical Bayesian safety performance functions.

    PubMed

    Yu, Rongjie; Abdel-Aty, Mohamed

    2013-07-01

    The Bayesian inference method has been frequently adopted to develop safety performance functions. One advantage of the Bayesian inference is that prior information for the independent variables can be included in the inference procedures. However, there are few studies that discussed how to formulate informative priors for the independent variables and evaluated the effects of incorporating informative priors in developing safety performance functions. This paper addresses this deficiency by introducing four approaches of developing informative priors for the independent variables based on historical data and expert experience. Merits of these informative priors have been tested along with two types of Bayesian hierarchical models (Poisson-gamma and Poisson-lognormal models). Deviance information criterion (DIC), R-square values, and coefficients of variance for the estimations were utilized as evaluation measures to select the best model(s). Comparison across the models indicated that the Poisson-gamma model is superior with a better model fit and it is much more robust with the informative priors. Moreover, the two-stage Bayesian updating informative priors provided the best goodness-of-fit and coefficient estimation accuracies. Furthermore, informative priors for the inverse dispersion parameter have also been introduced and tested. Different types of informative priors' effects on the model estimations and goodness-of-fit have been compared and concluded. Finally, based on the results, recommendations for future research topics and study applications have been made. Copyright © 2013 Elsevier Ltd. All rights reserved.

  10. Design space construction of multiple dose-strength tablets utilizing bayesian estimation based on one set of design-of-experiments.

    PubMed

    Maeda, Jin; Suzuki, Tatsuya; Takayama, Kozo

    2012-01-01

    Design spaces for multiple dose strengths of tablets were constructed using a Bayesian estimation method with one set of design of experiments (DoE) of only the highest dose-strength tablet. The lubricant blending process for theophylline tablets with dose strengths of 100, 50, and 25 mg is used as a model manufacturing process in order to construct design spaces. The DoE was conducted using various Froude numbers (X(1)) and blending times (X(2)) for theophylline 100-mg tablet. The response surfaces, design space, and their reliability of the compression rate of the powder mixture (Y(1)), tablet hardness (Y(2)), and dissolution rate (Y(3)) of the 100-mg tablet were calculated using multivariate spline interpolation, a bootstrap resampling technique, and self-organizing map clustering. Three experiments under an optimal condition and two experiments under other conditions were performed using 50- and 25-mg tablets, respectively. The response surfaces of the highest-strength tablet were corrected to those of the lower-strength tablets by Bayesian estimation using the manufacturing data of the lower-strength tablets. Experiments under three additional sets of conditions of lower-strength tablets showed that the corrected design space made it possible to predict the quality of lower-strength tablets more precisely than the design space of the highest-strength tablet. This approach is useful for constructing design spaces of tablets with multiple strengths.

  11. An adaptive Gaussian process-based method for efficient Bayesian experimental design in groundwater contaminant source identification problems: ADAPTIVE GAUSSIAN PROCESS-BASED INVERSION

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Jiangjiang; Li, Weixuan; Zeng, Lingzao

    Surrogate models are commonly used in Bayesian approaches such as Markov Chain Monte Carlo (MCMC) to avoid repetitive CPU-demanding model evaluations. However, the approximation error of a surrogate may lead to biased estimations of the posterior distribution. This bias can be corrected by constructing a very accurate surrogate or implementing MCMC in a two-stage manner. Since the two-stage MCMC requires extra original model evaluations, the computational cost is still high. If the information of measurement is incorporated, a locally accurate approximation of the original model can be adaptively constructed with low computational cost. Based on this idea, we propose amore » Gaussian process (GP) surrogate-based Bayesian experimental design and parameter estimation approach for groundwater contaminant source identification problems. A major advantage of the GP surrogate is that it provides a convenient estimation of the approximation error, which can be incorporated in the Bayesian formula to avoid over-confident estimation of the posterior distribution. The proposed approach is tested with a numerical case study. Without sacrificing the estimation accuracy, the new approach achieves about 200 times of speed-up compared to our previous work using two-stage MCMC.« less

  12. Pan-genome and phylogeny of Bacillus cereus sensu lato.

    PubMed

    Bazinet, Adam L

    2017-08-02

    Bacillus cereus sensu lato (s. l.) is an ecologically diverse bacterial group of medical and agricultural significance. In this study, I use publicly available genomes and novel bioinformatic workflows to characterize the B. cereus s. l. pan-genome and perform the largest phylogenetic and population genetic analyses of this group to date in terms of the number of genes and taxa included. With these fundamental data in hand, I identify genes associated with particular phenotypic traits (i.e., "pan-GWAS" analysis), and quantify the degree to which taxa sharing common attributes are phylogenetically clustered. A rapid k-mer based approach (Mash) was used to create reduced representations of selected Bacillus genomes, and a fast distance-based phylogenetic analysis of this data (FastME) was performed to determine which species should be included in B. cereus s. l. The complete genomes of eight B. cereus s. l. species were annotated de novo with Prokka, and these annotations were used by Roary to produce the B. cereus s. l. pan-genome. Scoary was used to associate gene presence and absence patterns with various phenotypes. The orthologous protein sequence clusters produced by Roary were filtered and used to build HaMStR databases of gene models that were used in turn to construct phylogenetic data matrices. Phylogenetic analyses used RAxML, DendroPy, ClonalFrameML, PAUP*, and SplitsTree. Bayesian model-based population genetic analysis assigned taxa to clusters using hierBAPS. The genealogical sorting index was used to quantify the phylogenetic clustering of taxa sharing common attributes. The B. cereus s. l. pan-genome currently consists of ≈60,000 genes, ≈600 of which are "core" (common to at least 99% of taxa sampled). Pan-GWAS analysis revealed genes associated with phenotypes such as isolation source, oxygen requirement, and ability to cause diseases such as anthrax or food poisoning. Extensive phylogenetic analyses using an unprecedented amount of data produced phylogenies that were largely concordant with each other and with previous studies. Phylogenetic support as measured by bootstrap probabilities increased markedly when all suitable pan-genome data was included in phylogenetic analyses, as opposed to when only core genes were used. Bayesian population genetic analysis recommended subdividing the three major clades of B. cereus s. l. into nine clusters. Taxa sharing common traits and species designations exhibited varying degrees of phylogenetic clustering. All phylogenetic analyses recapitulated two previously used classification systems, and taxa were consistently assigned to the same major clade and group. By including accessory genes from the pan-genome in the phylogenetic analyses, I produced an exceptionally well-supported phylogeny of 114 complete B. cereus s. l. genomes. The best-performing methods were used to produce a phylogeny of all 498 publicly available B. cereus s. l. genomes, which was in turn used to compare three different classification systems and to test the monophyly status of various B. cereus s. l. species. The majority of the methodology used in this study is generic and could be leveraged to produce pan-genome estimates and similarly robust phylogenetic hypotheses for other bacterial groups.

  13. A latent class distance association model for cross-classified data with a categorical response variable.

    PubMed

    Vera, José Fernando; de Rooij, Mark; Heiser, Willem J

    2014-11-01

    In this paper we propose a latent class distance association model for clustering in the predictor space of large contingency tables with a categorical response variable. The rows of such a table are characterized as profiles of a set of explanatory variables, while the columns represent a single outcome variable. In many cases such tables are sparse, with many zero entries, which makes traditional models problematic. By clustering the row profiles into a few specific classes and representing these together with the categories of the response variable in a low-dimensional Euclidean space using a distance association model, a parsimonious prediction model can be obtained. A generalized EM algorithm is proposed to estimate the model parameters and the adjusted Bayesian information criterion statistic is employed to test the number of mixture components and the dimensionality of the representation. An empirical example highlighting the advantages of the new approach and comparing it with traditional approaches is presented. © 2014 The British Psychological Society.

  14. Mitochondrial DNA Reveals Genetic Structuring of Pinna nobilis across the Mediterranean Sea

    PubMed Central

    Sanna, Daria; Cossu, Piero; Dedola, Gian Luca; Scarpa, Fabio; Maltagliati, Ferruccio; Castelli, Alberto; Franzoi, Piero; Lai, Tiziana; Cristo, Benedetto; Curini-Galletti, Marco; Francalacci, Paolo; Casu, Marco

    2013-01-01

    Pinna nobilis is the largest endemic Mediterranean marine bivalve. During past centuries, various human activities have promoted the regression of its populations. As a consequence of stringent standards of protection, demographic expansions are currently reported in many sites. The aim of this study was to provide the first large broad-scale insight into the genetic variability of P. nobilis in the area that encompasses the western Mediterranean, Ionian Sea, and Adriatic Sea marine ecoregions. To accomplish this objective twenty-five populations from this area were surveyed using two mitochondrial DNA markers (COI and 16S). Our dataset was then merged with those obtained in other studies for the Aegean and Tunisian populations (eastern Mediterranean), and statistical analyses (Bayesian model-based clustering, median-joining network, AMOVA, mismatch distribution, Tajima’s and Fu’s neutrality tests and Bayesian skyline plots) were performed. The results revealed genetic divergence among three distinguishable areas: (1) western Mediterranean and Ionian Sea; (2) Adriatic Sea; and (3) Aegean Sea and Tunisian coastal areas. From a conservational point of view, populations from the three genetically divergent groups found may be considered as different management units. PMID:23840684

  15. Extracting recurrent scenarios from narrative texts using a Bayesian network: application to serious occupational accidents with movement disturbance.

    PubMed

    Abdat, F; Leclercq, S; Cuny, X; Tissot, C

    2014-09-01

    A probabilistic approach has been developed to extract recurrent serious Occupational Accident with Movement Disturbance (OAMD) scenarios from narrative texts within a prevention framework. Relevant data extracted from 143 accounts was initially coded as logical combinations of generic accident factors. A Bayesian Network (BN)-based model was then built for OAMDs using these data and expert knowledge. A data clustering process was subsequently performed to group the OAMDs into similar classes from generic factor occurrence and pattern standpoints. Finally, the Most Probable Explanation (MPE) was evaluated and identified as the associated recurrent scenario for each class. Using this approach, 8 scenarios were extracted to describe 143 OAMDs in the construction and metallurgy sectors. Their recurrent nature is discussed. Probable generic factor combinations provide a fair representation of particularly serious OAMDs, as described in narrative texts. This work represents a real contribution to raising company awareness of the variety of circumstances, in which these accidents occur, to progressing in the prevention of such accidents and to developing an analysis framework dedicated to this kind of accident. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. Bayesian truncation errors in chiral effective field theory: model checking and accounting for correlations

    NASA Astrophysics Data System (ADS)

    Melendez, Jordan; Wesolowski, Sarah; Furnstahl, Dick

    2017-09-01

    Chiral effective field theory (EFT) predictions are necessarily truncated at some order in the EFT expansion, which induces an error that must be quantified for robust statistical comparisons to experiment. A Bayesian model yields posterior probability distribution functions for these errors based on expectations of naturalness encoded in Bayesian priors and the observed order-by-order convergence pattern of the EFT. As a general example of a statistical approach to truncation errors, the model was applied to chiral EFT for neutron-proton scattering using various semi-local potentials of Epelbaum, Krebs, and Meißner (EKM). Here we discuss how our model can learn correlation information from the data and how to perform Bayesian model checking to validate that the EFT is working as advertised. Supported in part by NSF PHY-1614460 and DOE NUCLEI SciDAC DE-SC0008533.

  17. Phylodynamic Analysis Reveals CRF01_AE Dissemination between Japan and Neighboring Asian Countries and the Role of Intravenous Drug Use in Transmission

    PubMed Central

    Shiino, Teiichiro; Hattori, Junko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

    2014-01-01

    Background One major circulating HIV-1 subtype in Southeast Asian countries is CRF01_AE, but little is known about its epidemiology in Japan. We conducted a molecular phylodynamic study of patients newly diagnosed with CRF01_AE from 2003 to 2010. Methods Plasma samples from patients registered in Japanese Drug Resistance HIV-1 Surveillance Network were analyzed for protease-reverse transcriptase sequences; all sequences undergo subtyping and phylogenetic analysis using distance-matrix-based, maximum likelihood and Bayesian coalescent Markov Chain Monte Carlo (MCMC) phylogenetic inferences. Transmission clusters were identified using interior branch test and depth-first searches for sub-tree partitions. Times of most recent common ancestor (tMRCAs) of significant clusters were estimated using Bayesian MCMC analysis. Results Among 3618 patient registered in our network, 243 were infected with CRF01_AE. The majority of individuals with CRF01_AE were Japanese, predominantly male, and reported heterosexual contact as their risk factor. We found 5 large clusters with ≥5 members and 25 small clusters consisting of pairs of individuals with highly related CRF01_AE strains. The earliest cluster showed a tMRCA of 1996, and consisted of individuals with their known risk as heterosexual contacts. The other four large clusters showed later tMRCAs between 2000 and 2002 with members including intravenous drug users (IVDU) and non-Japanese, but not men who have sex with men (MSM). In contrast, small clusters included a high frequency of individuals reporting MSM risk factors. Phylogenetic analysis also showed that some individuals infected with HIV strains spread in East and South-eastern Asian countries. Conclusions Introduction of CRF01_AE viruses into Japan is estimated to have occurred in the 1990s. CFR01_AE spread via heterosexual behavior, then among persons connected with non-Japanese, IVDU, and MSM. Phylogenetic analysis demonstrated that some viral variants are largely restricted to Japan, while others have a broad geographic distribution. PMID:25025900

  18. Traffic Video Image Segmentation Model Based on Bayesian and Spatio-Temporal Markov Random Field

    NASA Astrophysics Data System (ADS)

    Zhou, Jun; Bao, Xu; Li, Dawei; Yin, Yongwen

    2017-10-01

    Traffic video image is a kind of dynamic image and its background and foreground is changed at any time, which results in the occlusion. In this case, using the general method is more difficult to get accurate image segmentation. A segmentation algorithm based on Bayesian and Spatio-Temporal Markov Random Field is put forward, which respectively build the energy function model of observation field and label field to motion sequence image with Markov property, then according to Bayesian' rule, use the interaction of label field and observation field, that is the relationship of label field’s prior probability and observation field’s likelihood probability, get the maximum posterior probability of label field’s estimation parameter, use the ICM model to extract the motion object, consequently the process of segmentation is finished. Finally, the segmentation methods of ST - MRF and the Bayesian combined with ST - MRF were analyzed. Experimental results: the segmentation time in Bayesian combined with ST-MRF algorithm is shorter than in ST-MRF, and the computing workload is small, especially in the heavy traffic dynamic scenes the method also can achieve better segmentation effect.

  19. Habitat fragmentation in coastal southern California disrupts genetic connectivity in the cactus wren (Campylorhynchus brunneicapillus)

    USGS Publications Warehouse

    Barr, Kelly R.; Kus, Barbara E.; Preston, Kristine; Howell, Scarlett; Perkins, Emily; Vandergast, Amy

    2015-01-01

    Achieving long-term persistence of species in urbanized landscapes requires characterizing population genetic structure to understand and manage the effects of anthropogenic disturbance on connectivity. Urbanization over the past century in coastal southern California has caused both precipitous loss of coastal sage scrub habitat and declines in populations of the cactus wren (Campylorhynchus brunneicapillus). Using 22 microsatellite loci, we found that remnant cactus wren aggregations in coastal southern California comprised 20 populations based on strict exact tests for population differentiation, and 12 genetic clusters with hierarchical Bayesian clustering analyses. Genetic structure patterns largely mirrored underlying habitat availability, with cluster and population boundaries coinciding with fragmentation caused primarily by urbanization. Using a habitat model we developed, we detected stronger associations between habitat-based distances and genetic distances than Euclidean geographic distance. Within populations, we detected a positive association between available local habitat and allelic richness and a negative association with relatedness. Isolation-by-distance patterns varied over the study area, which we attribute to temporal differences in anthropogenic landscape development. We also found that genetic bottleneck signals were associated with wildfire frequency. These results indicate that habitat fragmentation and alterations have reduced genetic connectivity and diversity of cactus wren populations in coastal southern California. Management efforts focused on improving connectivity among remaining populations may help to ensure population persistence.

  20. A Dynamic Bayesian Network Based Structural Learning towards Automated Handwritten Digit Recognition

    NASA Astrophysics Data System (ADS)

    Pauplin, Olivier; Jiang, Jianmin

    Pattern recognition using Dynamic Bayesian Networks (DBNs) is currently a growing area of study. In this paper, we present DBN models trained for classification of handwritten digit characters. The structure of these models is partly inferred from the training data of each class of digit before performing parameter learning. Classification results are presented for the four described models.

  1. Technical note: Bayesian calibration of dynamic ruminant nutrition models.

    PubMed

    Reed, K F; Arhonditsis, G B; France, J; Kebreab, E

    2016-08-01

    Mechanistic models of ruminant digestion and metabolism have advanced our understanding of the processes underlying ruminant animal physiology. Deterministic modeling practices ignore the inherent variation within and among individual animals and thus have no way to assess how sources of error influence model outputs. We introduce Bayesian calibration of mathematical models to address the need for robust mechanistic modeling tools that can accommodate error analysis by remaining within the bounds of data-based parameter estimation. For the purpose of prediction, the Bayesian approach generates a posterior predictive distribution that represents the current estimate of the value of the response variable, taking into account both the uncertainty about the parameters and model residual variability. Predictions are expressed as probability distributions, thereby conveying significantly more information than point estimates in regard to uncertainty. Our study illustrates some of the technical advantages of Bayesian calibration and discusses the future perspectives in the context of animal nutrition modeling. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  2. Bayesian inference of T Tauri star properties using multi-wavelength survey photometry

    NASA Astrophysics Data System (ADS)

    Barentsen, Geert; Vink, J. S.; Drew, J. E.; Sale, S. E.

    2013-03-01

    There are many pertinent open issues in the area of star and planet formation. Large statistical samples of young stars across star-forming regions are needed to trigger a breakthrough in our understanding, but most optical studies are based on a wide variety of spectrographs and analysis methods, which introduces large biases. Here we show how graphical Bayesian networks can be employed to construct a hierarchical probabilistic model which allows pre-main-sequence ages, masses, accretion rates and extinctions to be estimated using two widely available photometric survey data bases (Isaac Newton Telescope Photometric Hα Survey r'/Hα/i' and Two Micron All Sky Survey J-band magnitudes). Because our approach does not rely on spectroscopy, it can easily be applied to ho-mogeneously study the large number of clusters for which Gaia will yield membership lists. We explain how the analysis is carried out using the Markov chain Monte Carlo method and provide PYTHON source code. We then demonstrate its use on 587 known low-mass members of the star-forming region NGC 2264 (Cone Nebula), arriving at a median age of 3.0 Myr, an accretion fraction of 20 ± 2 per cent and a median accretion rate of 10-8.4 M⊙ yr-1. The Bayesian analysis formulated in this work delivers results which are in agreement with spectroscopic studies already in the literature, but achieves this with great efficiency by depending only on photometry. It is a significant step forward from previous photometric studies because the probabilistic approach ensures that nuisance parameters, such as extinction and distance, are fully included in the analysis with a clear picture on any degeneracies.

  3. Technical Note: Approximate Bayesian parameterization of a process-based tropical forest model

    NASA Astrophysics Data System (ADS)

    Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.

    2014-02-01

    Inverse parameter estimation of process-based models is a long-standing problem in many scientific disciplines. A key question for inverse parameter estimation is how to define the metric that quantifies how well model predictions fit to the data. This metric can be expressed by general cost or objective functions, but statistical inversion methods require a particular metric, the probability of observing the data given the model parameters, known as the likelihood. For technical and computational reasons, likelihoods for process-based stochastic models are usually based on general assumptions about variability in the observed data, and not on the stochasticity generated by the model. Only in recent years have new methods become available that allow the generation of likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional Markov chain Monte Carlo (MCMC) sampler, performs well in retrieving known parameter values from virtual inventory data generated by the forest model. We analyze the results of the parameter estimation, examine its sensitivity to the choice and aggregation of model outputs and observed data (summary statistics), and demonstrate the application of this method by fitting the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss how this approach differs from approximate Bayesian computation (ABC), another method commonly used to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can be successfully applied to process-based models of high complexity. The methodology is particularly suitable for heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models.

  4. Bayesian outcome-based strategy classification.

    PubMed

    Lee, Michael D

    2016-03-01

    Hilbig and Moshagen (Psychonomic Bulletin & Review, 21, 1431-1443, 2014) recently developed a method for making inferences about the decision processes people use in multi-attribute forced choice tasks. Their paper makes a number of worthwhile theoretical and methodological contributions. Theoretically, they provide an insightful psychological motivation for a probabilistic extension of the widely-used "weighted additive" (WADD) model, and show how this model, as well as other important models like "take-the-best" (TTB), can and should be expressed in terms of meaningful priors. Methodologically, they develop an inference approach based on the Minimum Description Length (MDL) principles that balances both the goodness-of-fit and complexity of the decision models they consider. This paper aims to preserve these useful contributions, but provide a complementary Bayesian approach with some theoretical and methodological advantages. We develop a simple graphical model, implemented in JAGS, that allows for fully Bayesian inferences about which models people use to make decisions. To demonstrate the Bayesian approach, we apply it to the models and data considered by Hilbig and Moshagen (Psychonomic Bulletin & Review, 21, 1431-1443, 2014), showing how a prior predictive analysis of the models, and posterior inferences about which models people use and the parameter settings at which they use them, can contribute to our understanding of human decision making.

  5. Optimal speech motor control and token-to-token variability: a Bayesian modeling approach.

    PubMed

    Patri, Jean-François; Diard, Julien; Perrier, Pascal

    2015-12-01

    The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.

  6. CODEX weak lensing: concentration of galaxy clusters at z ~ 0.5

    DOE PAGES

    Cibirka, N.; Cypriano, E. S.; Brimioulle, F.; ...

    2017-03-04

    Here, we present a stacked weak-lensing analysis of 27 richness selected galaxy clusters at 0.40 ≤ z ≤ 0.62 in the COnstrain Dark Energy with X-ray galaxy clusters (CODEX) survey. The fields were observed in five bands with the Canada–France–Hawaii Telescope (CFHT). We measure the stacked surface mass density profile with a 14σ significance in the radial range 0.1 < RMpch -1 < 2.5. The profile is well described by the halo model, with the main halo term following a Navarro–Frenk–White profile (NFW) profile and including the off-centring effect. We select the background sample using a conservative colour–magnitude method to reduce the potential systematic errors and contamination by cluster member galaxies. We perform a Bayesian analysis for the stacked profile and constrain the best-fitting NFW parameters M 200c=6.6more » $$+1.0\\atop{-0.8}$$×10 14h -1 M⊙ and c 200c=3.7$$+0.7\\atop{-0.6}$$. The off-centring effect was modelled based on previous observational results found for redMaPPer Sloan Digital Sky Survey clusters. Our constraints on M200c and c200c allow us to investigate the consistency with numerical predictions and select a concentration–mass relation to describe the high richness CODEX sample. Comparing our best-fitting values for M200c and c200c with other observational surveys at different redshifts, we find no evidence for evolution in the concentration–mass relation, though it could be mitigated by particular selection functions. Similar to previous studies investigating the X-ray luminosity–mass relation, our data suggest a lower evolution than expected from self-similarity.« less

  7. Bayesian Structural Equation Modeling: A More Flexible Representation of Substantive Theory

    ERIC Educational Resources Information Center

    Muthen, Bengt; Asparouhov, Tihomir

    2012-01-01

    This article proposes a new approach to factor analysis and structural equation modeling using Bayesian analysis. The new approach replaces parameter specifications of exact zeros with approximate zeros based on informative, small-variance priors. It is argued that this produces an analysis that better reflects substantive theories. The proposed…

  8. Advances in Significance Testing for Cluster Detection

    NASA Astrophysics Data System (ADS)

    Coleman, Deidra Andrea

    Over the past two decades, much attention has been given to data driven project goals such as the Human Genome Project and the development of syndromic surveillance systems. A major component of these types of projects is analyzing the abundance of data. Detecting clusters within the data can be beneficial as it can lead to the identification of specified sequences of DNA nucleotides that are related to important biological functions or the locations of epidemics such as disease outbreaks or bioterrorism attacks. Cluster detection techniques require efficient and accurate hypothesis testing procedures. In this dissertation, we improve upon the hypothesis testing procedures for cluster detection by enhancing distributional theory and providing an alternative method for spatial cluster detection using syndromic surveillance data. In Chapter 2, we provide an efficient method to compute the exact distribution of the number and coverage of h-clumps of a collection of words. This method involves defining a Markov chain using a minimal deterministic automaton to reduce the number of states needed for computation. We allow words of the collection to contain other words of the collection making the method more general. We use our method to compute the distributions of the number and coverage of h-clumps in the Chi motif of H. influenza.. In Chapter 3, we provide an efficient algorithm to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. This algorithm involves defining a Markov chain to efficiently keep track of probabilities needed to compute p-values of the statistic. We use our algorithm to identify cases where the available approximation does not perform well. We also use our algorithm to detect unusual clusters of made free throw shots by National Basketball Association players during the 2009-2010 regular season. In Chapter 4, we give a procedure to detect outbreaks using syndromic surveillance data while controlling the Bayesian False Discovery Rate (BFDR). The procedure entails choosing an appropriate Bayesian model that captures the spatial dependency inherent in epidemiological data and considers all days of interest, selecting a test statistic based on a chosen measure that provides the magnitude of the maximumal spatial cluster for each day, and identifying a cutoff value that controls the BFDR for rejecting the collective null hypothesis of no outbreak over a collection of days for a specified region.We use our procedure to analyze botulism-like syndrome data collected by the North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT).

  9. Probing dark energy via galaxy cluster outskirts

    NASA Astrophysics Data System (ADS)

    Morandi, Andrea; Sun, Ming

    2016-04-01

    We present a Bayesian approach to combine Planck data and the X-ray physical properties of the intracluster medium in the virialization region of a sample of 320 galaxy clusters (0.056 < z < 1.24, kT ≳ 3 keV) observed with Chandra. We exploited the high level of similarity of the emission measure in the cluster outskirts as cosmology proxy. The cosmological parameters are thus constrained assuming that the emission measure profiles at different redshift are weakly self-similar, that is their shape is universal, explicitly allowing for temperature and redshift dependence of the gas fraction. This cosmological test, in combination with Planck+SNIa data, allows us to put a tight constraint on the dark energy models. For a constant-w model, we have w = -1.010 ± 0.030 and Ωm = 0.311 ± 0.014, while for a time-evolving equation of state of dark energy w(z) we have Ωm = 0.308 ± 0.017, w0 = -0.993 ± 0.046 and wa = -0.123 ± 0.400. Constraints on the cosmology are further improved by adding priors on the gas fraction evolution from hydrodynamic simulations. Current data favour the cosmological constant with w ≡ -1, with no evidence for dynamic dark energy. We checked that our method is robust towards different sources of systematics, including background modelling, outlier measurements, selection effects, inhomogeneities of the gas distribution and cosmic filaments. We also provided for the first time constraints on which definition of cluster boundary radius is more tenable, namely based on a fixed overdensity with respect to the critical density of the Universe. This novel cosmological test has the capacity to provide a generational leap forward in our understanding of the equation of state of dark energy.

  10. Sequential Inverse Problems Bayesian Principles and the Logistic Map Example

    NASA Astrophysics Data System (ADS)

    Duan, Lian; Farmer, Chris L.; Moroz, Irene M.

    2010-09-01

    Bayesian statistics provides a general framework for solving inverse problems, but is not without interpretation and implementation problems. This paper discusses difficulties arising from the fact that forward models are always in error to some extent. Using a simple example based on the one-dimensional logistic map, we argue that, when implementation problems are minimal, the Bayesian framework is quite adequate. In this paper the Bayesian Filter is shown to be able to recover excellent state estimates in the perfect model scenario (PMS) and to distinguish the PMS from the imperfect model scenario (IMS). Through a quantitative comparison of the way in which the observations are assimilated in both the PMS and the IMS scenarios, we suggest that one can, sometimes, measure the degree of imperfection.

  11. Hierarchy Bayesian model based services awareness of high-speed optical access networks

    NASA Astrophysics Data System (ADS)

    Bai, Hui-feng

    2018-03-01

    As the speed of optical access networks soars with ever increasing multiple services, the service-supporting ability of optical access networks suffers greatly from the shortage of service awareness. Aiming to solve this problem, a hierarchy Bayesian model based services awareness mechanism is proposed for high-speed optical access networks. This approach builds a so-called hierarchy Bayesian model, according to the structure of typical optical access networks. Moreover, the proposed scheme is able to conduct simple services awareness operation in each optical network unit (ONU) and to perform complex services awareness from the whole view of system in optical line terminal (OLT). Simulation results show that the proposed scheme is able to achieve better quality of services (QoS), in terms of packet loss rate and time delay.

  12. Bayesian Action–Perception Computational Model: Interaction of Production and Recognition of Cursive Letters

    PubMed Central

    Gilet, Estelle; Diard, Julien; Bessière, Pierre

    2011-01-01

    In this paper, we study the collaboration of perception and action representations involved in cursive letter recognition and production. We propose a mathematical formulation for the whole perception–action loop, based on probabilistic modeling and Bayesian inference, which we call the Bayesian Action–Perception (BAP) model. Being a model of both perception and action processes, the purpose of this model is to study the interaction of these processes. More precisely, the model includes a feedback loop from motor production, which implements an internal simulation of movement. Motor knowledge can therefore be involved during perception tasks. In this paper, we formally define the BAP model and show how it solves the following six varied cognitive tasks using Bayesian inference: i) letter recognition (purely sensory), ii) writer recognition, iii) letter production (with different effectors), iv) copying of trajectories, v) copying of letters, and vi) letter recognition (with internal simulation of movements). We present computer simulations of each of these cognitive tasks, and discuss experimental predictions and theoretical developments. PMID:21674043

  13. Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach.

    PubMed

    Duarte, Belmiro P M; Wong, Weng Kee

    2015-08-01

    This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted.

  14. Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach

    PubMed Central

    Duarte, Belmiro P. M.; Wong, Weng Kee

    2014-01-01

    Summary This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted. PMID:26512159

  15. Final Report, DOE Early Career Award: Predictive modeling of complex physical systems: new tools for statistical inference, uncertainty quantification, and experimental design

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marzouk, Youssef

    Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computional expense. This project intends to make rigorous predictive modeling *feasible* in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesianmore » inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale {\\em sequential} data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.« less

  16. Using Discrete Loss Functions and Weighted Kappa for Classification: An Illustration Based on Bayesian Network Analysis

    ERIC Educational Resources Information Center

    Zwick, Rebecca; Lenaburg, Lubella

    2009-01-01

    In certain data analyses (e.g., multiple discriminant analysis and multinomial log-linear modeling), classification decisions are made based on the estimated posterior probabilities that individuals belong to each of several distinct categories. In the Bayesian network literature, this type of classification is often accomplished by assigning…

  17. Incorporating Resilience into Dynamic Social Models

    DTIC Science & Technology

    2016-07-20

    solved by simply using the information provided by the scenario. Instead, additional knowledge is required from relevant fields that study these...resilience function by leveraging Bayesian Knowledge Bases (BKBs), a probabilistic reasoning network framework[5],[6]. BKBs allow for inferencing...reasoning network framework based on Bayesian Knowledge Bases (BKBs). BKBs are central to our social resilience framework as they are used to

  18. Progressive colonization and restricted gene flow shape island-dependent population structure in Galápagos marine iguanas (Amblyrhynchus cristatus)

    PubMed Central

    2009-01-01

    Background Marine iguanas (Amblyrhynchus cristatus) inhabit the coastlines of large and small islands throughout the Galápagos archipelago, providing a rich system to study the spatial and temporal factors influencing the phylogeographic distribution and population structure of a species. Here, we analyze the microevolution of marine iguanas using the complete mitochondrial control region (CR) as well as 13 microsatellite loci representing more than 1200 individuals from 13 islands. Results CR data show that marine iguanas occupy three general clades: one that is widely distributed across the northern archipelago, and likely spread from east to west by way of the South Equatorial current, a second that is found mostly on the older eastern and central islands, and a third that is limited to the younger northern and western islands. Generally, the CR haplotype distribution pattern supports the colonization of the archipelago from the older, eastern islands to the younger, western islands. However, there are also signatures of recurrent, historical gene flow between islands after population establishment. Bayesian cluster analysis of microsatellite genotypes indicates the existence of twenty distinct genetic clusters generally following a one-cluster-per-island pattern. However, two well-differentiated clusters were found on the easternmost island of San Cristóbal, while nine distinct and highly intermixed clusters were found on youngest, westernmost islands of Isabela and Fernandina. High mtDNA and microsatellite genetic diversity were observed for populations on Isabela and Fernandina that may be the result of a recent population expansion and founder events from multiple sources. Conclusions While a past genetic study based on pure FST analysis suggested that marine iguana populations display high levels of nuclear (but not mitochondrial) gene flow due to male-biased dispersal, the results of our sex-biased dispersal tests and the finding of strong genetic differentiation between islands do not support this view. Therefore, our study is a nice example of how recently developed analytical tools such as Bayesian clustering analysis and DNA sequence-based demographic analyses can overcome potential biases introduced by simply relying on FST estimates from markers with different inheritance patterns. PMID:20028547

  19. Impact of socioeconomic inequalities on geographic disparities in cancer incidence: comparison of methods for spatial disease mapping.

    PubMed

    Goungounga, Juste Aristide; Gaudart, Jean; Colonna, Marc; Giorgi, Roch

    2016-10-12

    The reliability of spatial statistics is often put into question because real spatial variations may not be found, especially in heterogeneous areas. Our objective was to compare empirically different cluster detection methods. We assessed their ability to find spatial clusters of cancer cases and evaluated the impact of the socioeconomic status (e.g., the Townsend index) on cancer incidence. Moran's I, the empirical Bayes index (EBI), and Potthoff-Whittinghill test were used to investigate the general clustering. The local cluster detection methods were: i) the spatial oblique decision tree (SpODT); ii) the spatial scan statistic of Kulldorff (SaTScan); and, iii) the hierarchical Bayesian spatial modeling (HBSM) in a univariate and multivariate setting. These methods were used with and without introducing the Townsend index of socioeconomic deprivation known to be related to the distribution of cancer incidence. Incidence data stemmed from the Cancer Registry of Isère and were limited to prostate, lung, colon-rectum, and bladder cancers diagnosed between 1999 and 2007 in men only. The study found a spatial heterogeneity (p < 0.01) and an autocorrelation for prostate (EBI = 0.02; p = 0.001), lung (EBI = 0.01; p = 0.019) and bladder (EBI = 0.007; p = 0.05) cancers. After introduction of the Townsend index, SaTScan failed in finding cancers clusters. This introduction changed the results obtained with the other methods. SpODT identified five spatial classes (p < 0.05): four in the Western and one in the Northern parts of the study area (standardized incidence ratios: 1.68, 1.39, 1.14, 1.12, and 1.16, respectively). In the univariate setting, the Bayesian smoothing method found the same clusters as the two other methods (RR >1.2). The multivariate HBSM found a spatial correlation between lung and bladder cancers (r = 0.6). In spatial analysis of cancer incidence, SpODT and HBSM may be used not only for cluster detection but also for searching for confounding or etiological factors in small areas. Moreover, the multivariate HBSM offers a flexible and meaningful modeling of spatial variations; it shows plausible previously unknown associations between various cancers.

  20. Molecular characterization and population structure study of cambuci: strategy for conservation and genetic improvement.

    PubMed

    Santos, D N; Nunes, C F; Setotaw, T A; Pio, R; Pasqual, M; Cançado, G M A

    2016-12-19

    Cambuci (Campomanesia phaea) belongs to the Myrtaceae family and is native to the Atlantic Forest of Brazil. It has ecological and social appeal but is exposed to problems associated with environmental degradation and expansion of agricultural activities in the region. Comprehensive studies on this species are rare, making its conservation and genetic improvement difficult. Thus, it is important to develop research activities to understand the current situation of the species as well as to make recommendations for its conservation and use. This study was performed to characterize the cambuci accessions found in the germplasm bank of Coordenadoria de Assistência Técnica Integral using inter-simple sequence repeat markers, with the goal of understanding the plant's population structure. The results showed the existence of some level of genetic diversity among the cambuci accessions that could be exploited for the genetic improvement of the species. Principal coordinate analysis and discriminant analysis clustered the 80 accessions into three groups, whereas Bayesian model-based clustering analysis clustered them into two groups. The formation of two cluster groups and the high membership coefficients within the groups pointed out the importance of further collection to cover more areas and more genetic variability within the species. The study also showed the lack of conservation activities; therefore, more attention from the appropriate organizations is needed to plan and implement natural and ex situ conservation activities.

  1. Bayesian model selection: Evidence estimation based on DREAM simulation and bridge sampling

    NASA Astrophysics Data System (ADS)

    Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.

    2017-04-01

    Bayesian inference has found widespread application in Earth and Environmental Systems Modeling, providing an effective tool for prediction, data assimilation, parameter estimation, uncertainty analysis and hypothesis testing. Under multiple competing hypotheses, the Bayesian approach also provides an attractive alternative to traditional information criteria (e.g. AIC, BIC) for model selection. The key variable for Bayesian model selection is the evidence (or marginal likelihood) that is the normalizing constant in the denominator of Bayes theorem; while it is fundamental for model selection, the evidence is not required for Bayesian inference. It is computed for each hypothesis (model) by averaging the likelihood function over the prior parameter distribution, rather than maximizing it as by information criteria; the larger a model evidence the more support it receives among a collection of hypothesis as the simulated values assign relatively high probability density to the observed data. Hence, the evidence naturally acts as an Occam's razor, preferring simpler and more constrained models against the selection of over-fitted ones by information criteria that incorporate only the likelihood maximum. Since it is not particularly easy to estimate the evidence in practice, Bayesian model selection via the marginal likelihood has not yet found mainstream use. We illustrate here the properties of a new estimator of the Bayesian model evidence, which provides robust and unbiased estimates of the marginal likelihood; the method is coined Gaussian Mixture Importance Sampling (GMIS). GMIS uses multidimensional numerical integration of the posterior parameter distribution via bridge sampling (a generalization of importance sampling) of a mixture distribution fitted to samples of the posterior distribution derived from the DREAM algorithm (Vrugt et al., 2008; 2009). Some illustrative examples are presented to show the robustness and superiority of the GMIS estimator with respect to other commonly used approaches in the literature.

  2. Valence-Dependent Belief Updating: Computational Validation

    PubMed Central

    Kuzmanovic, Bojana; Rigoux, Lionel

    2017-01-01

    People tend to update beliefs about their future outcomes in a valence-dependent way: they are likely to incorporate good news and to neglect bad news. However, belief formation is a complex process which depends not only on motivational factors such as the desire for favorable conclusions, but also on multiple cognitive variables such as prior beliefs, knowledge about personal vulnerabilities and resources, and the size of the probabilities and estimation errors. Thus, we applied computational modeling in order to test for valence-induced biases in updating while formally controlling for relevant cognitive factors. We compared biased and unbiased Bayesian models of belief updating, and specified alternative models based on reinforcement learning. The experiment consisted of 80 trials with 80 different adverse future life events. In each trial, participants estimated the base rate of one of these events and estimated their own risk of experiencing the event before and after being confronted with the actual base rate. Belief updates corresponded to the difference between the two self-risk estimates. Valence-dependent updating was assessed by comparing trials with good news (better-than-expected base rates) with trials with bad news (worse-than-expected base rates). After receiving bad relative to good news, participants' updates were smaller and deviated more strongly from rational Bayesian predictions, indicating a valence-induced bias. Model comparison revealed that the biased (i.e., optimistic) Bayesian model of belief updating better accounted for data than the unbiased (i.e., rational) Bayesian model, confirming that the valence of the new information influenced the amount of updating. Moreover, alternative computational modeling based on reinforcement learning demonstrated higher learning rates for good than for bad news, as well as a moderating role of personal knowledge. Finally, in this specific experimental context, the approach based on reinforcement learning was superior to the Bayesian approach. The computational validation of valence-dependent belief updating represents a novel support for a genuine optimism bias in human belief formation. Moreover, the precise control of relevant cognitive variables justifies the conclusion that the motivation to adopt the most favorable self-referential conclusions biases human judgments. PMID:28706499

  3. Valence-Dependent Belief Updating: Computational Validation.

    PubMed

    Kuzmanovic, Bojana; Rigoux, Lionel

    2017-01-01

    People tend to update beliefs about their future outcomes in a valence-dependent way: they are likely to incorporate good news and to neglect bad news. However, belief formation is a complex process which depends not only on motivational factors such as the desire for favorable conclusions, but also on multiple cognitive variables such as prior beliefs, knowledge about personal vulnerabilities and resources, and the size of the probabilities and estimation errors. Thus, we applied computational modeling in order to test for valence-induced biases in updating while formally controlling for relevant cognitive factors. We compared biased and unbiased Bayesian models of belief updating, and specified alternative models based on reinforcement learning. The experiment consisted of 80 trials with 80 different adverse future life events. In each trial, participants estimated the base rate of one of these events and estimated their own risk of experiencing the event before and after being confronted with the actual base rate. Belief updates corresponded to the difference between the two self-risk estimates. Valence-dependent updating was assessed by comparing trials with good news (better-than-expected base rates) with trials with bad news (worse-than-expected base rates). After receiving bad relative to good news, participants' updates were smaller and deviated more strongly from rational Bayesian predictions, indicating a valence-induced bias. Model comparison revealed that the biased (i.e., optimistic) Bayesian model of belief updating better accounted for data than the unbiased (i.e., rational) Bayesian model, confirming that the valence of the new information influenced the amount of updating. Moreover, alternative computational modeling based on reinforcement learning demonstrated higher learning rates for good than for bad news, as well as a moderating role of personal knowledge. Finally, in this specific experimental context, the approach based on reinforcement learning was superior to the Bayesian approach. The computational validation of valence-dependent belief updating represents a novel support for a genuine optimism bias in human belief formation. Moreover, the precise control of relevant cognitive variables justifies the conclusion that the motivation to adopt the most favorable self-referential conclusions biases human judgments.

  4. Convergence analysis of surrogate-based methods for Bayesian inverse problems

    NASA Astrophysics Data System (ADS)

    Yan, Liang; Zhang, Yuan-Xiang

    2017-12-01

    The major challenges in the Bayesian inverse problems arise from the need for repeated evaluations of the forward model, as required by Markov chain Monte Carlo (MCMC) methods for posterior sampling. Many attempts at accelerating Bayesian inference have relied on surrogates for the forward model, typically constructed through repeated forward simulations that are performed in an offline phase. Although such approaches can be quite effective at reducing computation cost, there has been little analysis of the approximation on posterior inference. In this work, we prove error bounds on the Kullback-Leibler (KL) distance between the true posterior distribution and the approximation based on surrogate models. Our rigorous error analysis show that if the forward model approximation converges at certain rate in the prior-weighted L 2 norm, then the posterior distribution generated by the approximation converges to the true posterior at least two times faster in the KL sense. The error bound on the Hellinger distance is also provided. To provide concrete examples focusing on the use of the surrogate model based methods, we present an efficient technique for constructing stochastic surrogate models to accelerate the Bayesian inference approach. The Christoffel least squares algorithms, based on generalized polynomial chaos, are used to construct a polynomial approximation of the forward solution over the support of the prior distribution. The numerical strategy and the predicted convergence rates are then demonstrated on the nonlinear inverse problems, involving the inference of parameters appearing in partial differential equations.

  5. Online Variational Bayesian Filtering-Based Mobile Target Tracking in Wireless Sensor Networks

    PubMed Central

    Zhou, Bingpeng; Chen, Qingchun; Li, Tiffany Jing; Xiao, Pei

    2014-01-01

    The received signal strength (RSS)-based online tracking for a mobile node in wireless sensor networks (WSNs) is investigated in this paper. Firstly, a multi-layer dynamic Bayesian network (MDBN) is introduced to characterize the target mobility with either directional or undirected movement. In particular, it is proposed to employ the Wishart distribution to approximate the time-varying RSS measurement precision's randomness due to the target movement. It is shown that the proposed MDBN offers a more general analysis model via incorporating the underlying statistical information of both the target movement and observations, which can be utilized to improve the online tracking capability by exploiting the Bayesian statistics. Secondly, based on the MDBN model, a mean-field variational Bayesian filtering (VBF) algorithm is developed to realize the online tracking of a mobile target in the presence of nonlinear observations and time-varying RSS precision, wherein the traditional Bayesian filtering scheme cannot be directly employed. Thirdly, a joint optimization between the real-time velocity and its prior expectation is proposed to enable online velocity tracking in the proposed online tacking scheme. Finally, the associated Bayesian Cramer–Rao Lower Bound (BCRLB) analysis and numerical simulations are conducted. Our analysis unveils that, by exploiting the potential state information via the general MDBN model, the proposed VBF algorithm provides a promising solution to the online tracking of a mobile node in WSNs. In addition, it is shown that the final tracking accuracy linearly scales with its expectation when the RSS measurement precision is time-varying. PMID:25393784

  6. Bayesian performance metrics and small system integration in recent homeland security and defense applications

    NASA Astrophysics Data System (ADS)

    Jannson, Tomasz; Kostrzewski, Andrew; Patton, Edward; Pradhan, Ranjit; Shih, Min-Yi; Walter, Kevin; Savant, Gajendra; Shie, Rick; Forrester, Thomas

    2010-04-01

    In this paper, Bayesian inference is applied to performance metrics definition of the important class of recent Homeland Security and defense systems called binary sensors, including both (internal) system performance and (external) CONOPS. The medical analogy is used to define the PPV (Positive Predictive Value), the basic Bayesian metrics parameter of the binary sensors. Also, Small System Integration (SSI) is discussed in the context of recent Homeland Security and defense applications, emphasizing a highly multi-technological approach, within the broad range of clusters ("nexus") of electronics, optics, X-ray physics, γ-ray physics, and other disciplines.

  7. Technical Note: Approximate Bayesian parameterization of a complex tropical forest model

    NASA Astrophysics Data System (ADS)

    Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.

    2013-08-01

    Inverse parameter estimation of process-based models is a long-standing problem in ecology and evolution. A key problem of inverse parameter estimation is to define a metric that quantifies how well model predictions fit to the data. Such a metric can be expressed by general cost or objective functions, but statistical inversion approaches are based on a particular metric, the probability of observing the data given the model, known as the likelihood. Deriving likelihoods for dynamic models requires making assumptions about the probability for observations to deviate from mean model predictions. For technical reasons, these assumptions are usually derived without explicit consideration of the processes in the simulation. Only in recent years have new methods become available that allow generating likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional MCMC, performs well in retrieving known parameter values from virtual field data generated by the forest model. We analyze the results of the parameter estimation, examine the sensitivity towards the choice and aggregation of model outputs and observed data (summary statistics), and show results from using this method to fit the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss differences of this approach to Approximate Bayesian Computing (ABC), another commonly used method to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can successfully be applied to process-based models of high complexity. The methodology is particularly suited to heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models in ecology and evolution.

  8. Portfolios in Stochastic Local Search: Efficiently Computing Most Probable Explanations in Bayesian Networks

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole J.; Roth, Dan; Wilkins, David C.

    2001-01-01

    Portfolio methods support the combination of different algorithms and heuristics, including stochastic local search (SLS) heuristics, and have been identified as a promising approach to solve computationally hard problems. While successful in experiments, theoretical foundations and analytical results for portfolio-based SLS heuristics are less developed. This article aims to improve the understanding of the role of portfolios of heuristics in SLS. We emphasize the problem of computing most probable explanations (MPEs) in Bayesian networks (BNs). Algorithmically, we discuss a portfolio-based SLS algorithm for MPE computation, Stochastic Greedy Search (SGS). SGS supports the integration of different initialization operators (or initialization heuristics) and different search operators (greedy and noisy heuristics), thereby enabling new analytical and experimental results. Analytically, we introduce a novel Markov chain model tailored to portfolio-based SLS algorithms including SGS, thereby enabling us to analytically form expected hitting time results that explain empirical run time results. For a specific BN, we show the benefit of using a homogenous initialization portfolio. To further illustrate the portfolio approach, we consider novel additive search heuristics for handling determinism in the form of zero entries in conditional probability tables in BNs. Our additive approach adds rather than multiplies probabilities when computing the utility of an explanation. We motivate the additive measure by studying the dramatic impact of zero entries in conditional probability tables on the number of zero-probability explanations, which again complicates the search process. We consider the relationship between MAXSAT and MPE, and show that additive utility (or gain) is a generalization, to the probabilistic setting, of MAXSAT utility (or gain) used in the celebrated GSAT and WalkSAT algorithms and their descendants. Utilizing our Markov chain framework, we show that expected hitting time is a rational function - i.e. a ratio of two polynomials - of the probability of applying an additive search operator. Experimentally, we report on synthetically generated BNs as well as BNs from applications, and compare SGSs performance to that of Hugin, which performs BN inference by compilation to and propagation in clique trees. On synthetic networks, SGS speeds up computation by approximately two orders of magnitude compared to Hugin. In application networks, our approach is highly competitive in Bayesian networks with a high degree of determinism. In addition to showing that stochastic local search can be competitive with clique tree clustering, our empirical results provide an improved understanding of the circumstances under which portfolio-based SLS outperforms clique tree clustering and vice versa.

  9. Comprehension priming as rational expectation for repetition: Evidence from syntactic processing.

    PubMed

    Myslín, Mark; Levy, Roger

    2016-02-01

    Why do comprehenders process repeated stimuli more rapidly than novel stimuli? We consider an adaptive explanation for why such facilitation may be beneficial: priming is a consequence of expectation for repetition due to rational adaptation to the environment. If occurrences of a stimulus cluster in time, given one occurrence it is rational to expect a second occurrence closely following. Leveraging such knowledge may be particularly useful in online processing of language, where pervasive clustering may help comprehenders negotiate the considerable challenge of continual expectation update at multiple levels of linguistic structure and environmental variability. We test this account in the domain of structural priming in syntax, making use of the sentential complement-direct object (SC-DO) ambiguity. We first show that sentences containing SC continuations cluster in natural language, motivating an expectation for repetition of this structure. Second, we show that comprehenders are indeed sensitive to the syntactic clustering properties of their current environment. In a series of between-groups self-paced reading studies, we find that participants who are exposed to clusters of SC sentences subsequently process repetitions of SC structure more rapidly than participants who are exposed to the same number of SCs spaced in time, and attribute the difference to the learned degree of expectation for repetition. We model this behavior through Bayesian belief update, showing that (the optimal degree of) sensitivity to clustering properties of syntactic structures is indeed learnable through experience. Comprehension priming effects are thus consistent with rational expectation for repetition based on adaptation to the linguistic environment. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  10. Comprehension priming as rational expectation for repetition: Evidence from syntactic processing

    PubMed Central

    Levy, Roger

    2015-01-01

    Why do comprehenders process repeated stimuli more rapidly than novel stimuli? We consider an adaptive explanation for why such facilitation may be beneficial: priming is a consequence of expectation for repetition due to rational adaptation to the environment. If occurrences of a stimulus cluster in time, given one occurrence it is rational to expect a second occurrence closely following. Leveraging such knowledge may be particularly useful in online processing of language, where pervasive clustering may help comprehenders negotiate the considerable challenge of continual expectation update at multiple levels of linguistic structure and environmental variability. We test this account in the domain of structural priming in syntax, making use of the sentential complement-direct object (SC-DO) ambiguity. We first show that sentences containing SC continuations cluster in natural language, motivating an expectation for repetition of this structure. Second, we show that comprehenders are indeed sensitive to the syntactic clustering properties of their current environment. In a series of between-groups self-paced reading studies, we find that participants who are exposed to clusters of SC sentences subsequently process repetitions of SC structure more rapidly than participants who are exposed to the same number of SCs spaced in time, and attribute the difference to the learned degree of expectation for repetition. We model this behavior through Bayesian belief update, showing that (the optimal degree of) sensitivity to clustering properties of syntactic structures is indeed learnable through experience. Comprehension priming effects are thus consistent with rational expectation for repetition based on adaptation to the linguistic environment. PMID:26605963

  11. Bayesian Inference of High-Dimensional Dynamical Ocean Models

    NASA Astrophysics Data System (ADS)

    Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.

    2015-12-01

    This presentation addresses a holistic set of challenges in high-dimension ocean Bayesian nonlinear estimation: i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); ii) assimilate data using Bayes' law with these pdfs; iii) predict the future data that optimally reduce uncertainties; and (iv) rank the known and learn the new model formulations themselves. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.

  12. A Bayesian Approach for Summarizing and Modeling Time-Series Exposure Data with Left Censoring.

    PubMed

    Houseman, E Andres; Virji, M Abbas

    2017-08-01

    Direct reading instruments are valuable tools for measuring exposure as they provide real-time measurements for rapid decision making. However, their use is limited to general survey applications in part due to issues related to their performance. Moreover, statistical analysis of real-time data is complicated by autocorrelation among successive measurements, non-stationary time series, and the presence of left-censoring due to limit-of-detection (LOD). A Bayesian framework is proposed that accounts for non-stationary autocorrelation and LOD issues in exposure time-series data in order to model workplace factors that affect exposure and estimate summary statistics for tasks or other covariates of interest. A spline-based approach is used to model non-stationary autocorrelation with relatively few assumptions about autocorrelation structure. Left-censoring is addressed by integrating over the left tail of the distribution. The model is fit using Markov-Chain Monte Carlo within a Bayesian paradigm. The method can flexibly account for hierarchical relationships, random effects and fixed effects of covariates. The method is implemented using the rjags package in R, and is illustrated by applying it to real-time exposure data. Estimates for task means and covariates from the Bayesian model are compared to those from conventional frequentist models including linear regression, mixed-effects, and time-series models with different autocorrelation structures. Simulations studies are also conducted to evaluate method performance. Simulation studies with percent of measurements below the LOD ranging from 0 to 50% showed lowest root mean squared errors for task means and the least biased standard deviations from the Bayesian model compared to the frequentist models across all levels of LOD. In the application, task means from the Bayesian model were similar to means from the frequentist models, while the standard deviations were different. Parameter estimates for covariates were significant in some frequentist models, but in the Bayesian model their credible intervals contained zero; such discrepancies were observed in multiple datasets. Variance components from the Bayesian model reflected substantial autocorrelation, consistent with the frequentist models, except for the auto-regressive moving average model. Plots of means from the Bayesian model showed good fit to the observed data. The proposed Bayesian model provides an approach for modeling non-stationary autocorrelation in a hierarchical modeling framework to estimate task means, standard deviations, quantiles, and parameter estimates for covariates that are less biased and have better performance characteristics than some of the contemporary methods. Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2017.

  13. SIG-VISA: Signal-based Vertically Integrated Seismic Monitoring

    NASA Astrophysics Data System (ADS)

    Moore, D.; Mayeda, K. M.; Myers, S. C.; Russell, S.

    2013-12-01

    Traditional seismic monitoring systems rely on discrete detections produced by station processing software; however, while such detections may constitute a useful summary of station activity, they discard large amounts of information present in the original recorded signal. We present SIG-VISA (Signal-based Vertically Integrated Seismic Analysis), a system for seismic monitoring through Bayesian inference on seismic signals. By directly modeling the recorded signal, our approach incorporates additional information unavailable to detection-based methods, enabling higher sensitivity and more accurate localization using techniques such as waveform matching. SIG-VISA's Bayesian forward model of seismic signal envelopes includes physically-derived models of travel times and source characteristics as well as Gaussian process (kriging) statistical models of signal properties that combine interpolation of historical data with extrapolation of learned physical trends. Applying Bayesian inference, we evaluate the model on earthquakes as well as the 2009 DPRK test event, demonstrating a waveform matching effect as part of the probabilistic inference, along with results on event localization and sensitivity. In particular, we demonstrate increased sensitivity from signal-based modeling, in which the SIGVISA signal model finds statistical evidence for arrivals even at stations for which the IMS station processing failed to register any detection.

  14. Genetic and morphological characterisation of the Ankole Longhorn cattle in the African Great Lakes region.

    PubMed

    Ndumu, Deo B; Baumung, Roswitha; Hanotte, Olivier; Wurzinger, Maria; Okeyo, Mwai A; Jianlin, Han; Kibogo, Harrison; Sölkner, Johann

    2008-01-01

    The study investigated the population structure, diversity and differentiation of almost all of the ecotypes representing the African Ankole Longhorn cattle breed on the basis of morphometric (shape and size), genotypic and spatial distance data. Twentyone morphometric measurements were used to describe the morphology of 439 individuals from 11 sub-populations located in five countries around the Great Lakes region of central and eastern Africa. Additionally, 472 individuals were genotyped using 15 DNA microsatellites. Femoral length, horn length, horn circumference, rump height, body length and fore-limb circumference showed the largest differences between regions. An overall FST index indicated that 2.7% of the total genetic variation was present among sub-populations. The least differentiation was observed between the two sub-populations of Mbarara south and Luwero in Uganda, while the highest level of differentiation was observed between the Mugamba in Burundi and Malagarasi in Tanzania. An estimated membership of four for the inferred clusters from a model-based Bayesian approach was obtained. Both analyses on distance-based and model-based methods consistently isolated the Mugamba sub-population in Burundi from the others.

  15. A review on machine learning principles for multi-view biological data integration.

    PubMed

    Li, Yifeng; Wu, Fang-Xiang; Ngom, Alioune

    2018-03-01

    Driven by high-throughput sequencing techniques, modern genomic and clinical studies are in a strong need of integrative machine learning models for better use of vast volumes of heterogeneous information in the deep understanding of biological systems and the development of predictive models. How data from multiple sources (called multi-view data) are incorporated in a learning system is a key step for successful analysis. In this article, we provide a comprehensive review on omics and clinical data integration techniques, from a machine learning perspective, for various analyses such as prediction, clustering, dimension reduction and association. We shall show that Bayesian models are able to use prior information and model measurements with various distributions; tree-based methods can either build a tree with all features or collectively make a final decision based on trees learned from each view; kernel methods fuse the similarity matrices learned from individual views together for a final similarity matrix or learning model; network-based fusion methods are capable of inferring direct and indirect associations in a heterogeneous network; matrix factorization models have potential to learn interactions among features from different views; and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanism of biological systems.

  16. Prognostics of lithium-ion batteries based on Dempster-Shafer theory and the Bayesian Monte Carlo method

    NASA Astrophysics Data System (ADS)

    He, Wei; Williard, Nicholas; Osterman, Michael; Pecht, Michael

    A new method for state of health (SOH) and remaining useful life (RUL) estimations for lithium-ion batteries using Dempster-Shafer theory (DST) and the Bayesian Monte Carlo (BMC) method is proposed. In this work, an empirical model based on the physical degradation behavior of lithium-ion batteries is developed. Model parameters are initialized by combining sets of training data based on DST. BMC is then used to update the model parameters and predict the RUL based on available data through battery capacity monitoring. As more data become available, the accuracy of the model in predicting RUL improves. Two case studies demonstrating this approach are presented.

  17. Small area clustering of under-five children's mortality and associated factors using geo-additive Bayesian discrete-time survival model in Kersa HDSS, Ethiopia.

    PubMed

    Dedefo, Melkamu; Oljira, Lemessa; Assefa, Nega

    2016-02-01

    Child mortality reflects a country's level of socio-economic development and quality of life. In Ethiopia, limited studies were conducted on under-five mortality and almost none of them tried to identify the spatial effect on mortality. Thus, this study explored the small area clustering of under-five mortality and associated factors in Kersa HDSS, Eastern Ethiopia. The study population included all children under the age of five years during the time September, 2008-august 31, 2012 which are registered in Kersa Health and Demographic Surveillance System (Kersa HDSS). A flexible Bayesian geo-additive discrete-time survival mixed model was used. Some of the factors that are significantly associated with under-five mortality, with posterior odds ratio and 95% credible intervals, are maternal educational status 1.31(1.13,-1.49), place of delivery 1.016(1.013-1.12), no of live birth at a delivery 0.35(0.23,1.83), low household wealth index 1.26(1.10 1.43) middle level household wealth index 0.95 (0.84 1.07) pre-term duration of pregnancy 1.95(1.27,2.91), post-term duration of pregnancy 0.74(0.60,0.93) and antenatal visit 1.19(1.06, 1.35). Variation was noted in the risk of under-five mortality by the selected small administrative regions (kebeles). This study reveals geographic patterns in rates of under-five mortality in those selected small administrative regions and shows some important determinants of under-five mortality. More importantly, we observed clustering of under-five mortality, which indicates the importance of spatial effects and presentation of this clustering through maps that facilitates visuality and highlights differentials across geographical areas that would, otherwise, be overlooked in traditional data-analytic methods. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. Bayesian logistic regression approaches to predict incorrect DRG assignment.

    PubMed

    Suleiman, Mani; Demirhan, Haydar; Boyd, Leanne; Girosi, Federico; Aksakalli, Vural

    2018-05-07

    Episodes of care involving similar diagnoses and treatments and requiring similar levels of resource utilisation are grouped to the same Diagnosis-Related Group (DRG). In jurisdictions which implement DRG based payment systems, DRGs are a major determinant of funding for inpatient care. Hence, service providers often dedicate auditing staff to the task of checking that episodes have been coded to the correct DRG. The use of statistical models to estimate an episode's probability of DRG error can significantly improve the efficiency of clinical coding audits. This study implements Bayesian logistic regression models with weakly informative prior distributions to estimate the likelihood that episodes require a DRG revision, comparing these models with each other and to classical maximum likelihood estimates. All Bayesian approaches had more stable model parameters than maximum likelihood. The best performing Bayesian model improved overall classification per- formance by 6% compared to maximum likelihood, with a 34% gain compared to random classification, respectively. We found that the original DRG, coder and the day of coding all have a significant effect on the likelihood of DRG error. Use of Bayesian approaches has improved model parameter stability and classification accuracy. This method has already lead to improved audit efficiency in an operational capacity.

  19. U.S. consumer demand for restaurant calorie information: targeting demographic and behavioral segments in labeling initiatives.

    PubMed

    Kolodinsky, Jane; Reynolds, Travis William; Cannella, Mark; Timmons, David; Bromberg, Daniel

    2009-01-01

    To identify different segments of U.S. consumers based on food choices, exercise patterns, and desire for restaurant calorie labeling. Using a stratified (by region) random sample of the U.S. population, trained interviewers collected data for this cross-sectional study through telephone surveys. Center for Rural Studies U.S. national health survey. The final sample included 580 responses (22% response rate); data were weighted to be representative of age and gender characteristics of the U.S. population. Self-reported behaviors related to food choices, exercise patterns, desire for calorie information in restaurants, and sample demographics. Clusters were identified using Schwartz Bayesian criteria. Impacts of demographic characteristics on cluster membership were analyzed using bivariate tests of association and multinomial logit regression. Cluster analysis revealed three clusters based on respondents' food choices, activity levels, and desire for restaurant labeling. Two clusters, comprising three quarters of the sample, desired calorie labeling in restaurants. The remaining cluster opposed restaurant labeling. Demographic variables significantly predicting cluster membership included region of residence (p < .10), income (p < .05), gender (p < .01), and age (p < .10). Though limited by a low response and potential self-reporting bias in the phone survey, this study suggests that several groups are likely to benefit from restaurant calorie labeling. Specific demographic clusters could be targeted through labeling initiatives.

  20. Bayesian B-spline mapping for dynamic quantitative traits.

    PubMed

    Xing, Jun; Li, Jiahan; Yang, Runqing; Zhou, Xiaojing; Xu, Shizhong

    2012-04-01

    Owing to their ability and flexibility to describe individual gene expression at different time points, random regression (RR) analyses have become a popular procedure for the genetic analysis of dynamic traits whose phenotypes are collected over time. Specifically, when modelling the dynamic patterns of gene expressions in the RR framework, B-splines have been proved successful as an alternative to orthogonal polynomials. In the so-called Bayesian B-spline quantitative trait locus (QTL) mapping, B-splines are used to characterize the patterns of QTL effects and individual-specific time-dependent environmental errors over time, and the Bayesian shrinkage estimation method is employed to estimate model parameters. Extensive simulations demonstrate that (1) in terms of statistical power, Bayesian B-spline mapping outperforms the interval mapping based on the maximum likelihood; (2) for the simulated dataset with complicated growth curve simulated by B-splines, Legendre polynomial-based Bayesian mapping is not capable of identifying the designed QTLs accurately, even when higher-order Legendre polynomials are considered and (3) for the simulated dataset using Legendre polynomials, the Bayesian B-spline mapping can find the same QTLs as those identified by Legendre polynomial analysis. All simulation results support the necessity and flexibility of B-spline in Bayesian mapping of dynamic traits. The proposed method is also applied to a real dataset, where QTLs controlling the growth trajectory of stem diameters in Populus are located.

  1. A General and Flexible Approach to Estimating the Social Relations Model Using Bayesian Methods

    ERIC Educational Resources Information Center

    Ludtke, Oliver; Robitzsch, Alexander; Kenny, David A.; Trautwein, Ulrich

    2013-01-01

    The social relations model (SRM) is a conceptual, methodological, and analytical approach that is widely used to examine dyadic behaviors and interpersonal perception within groups. This article introduces a general and flexible approach to estimating the parameters of the SRM that is based on Bayesian methods using Markov chain Monte Carlo…

  2. A Measure of Systems Engineering Effectiveness in Government Acquisition of Complex Information Systems: A Bayesian Belief Network-Based Approach

    ERIC Educational Resources Information Center

    Doskey, Steven Craig

    2014-01-01

    This research presents an innovative means of gauging Systems Engineering effectiveness through a Systems Engineering Relative Effectiveness Index (SE REI) model. The SE REI model uses a Bayesian Belief Network to map causal relationships in government acquisitions of Complex Information Systems (CIS), enabling practitioners to identify and…

  3. Robust Bayesian Analysis of Heavy-tailed Stochastic Volatility Models using Scale Mixtures of Normal Distributions

    PubMed Central

    Abanto-Valle, C. A.; Bandyopadhyay, D.; Lachos, V. H.; Enriquez, I.

    2009-01-01

    A Bayesian analysis of stochastic volatility (SV) models using the class of symmetric scale mixtures of normal (SMN) distributions is considered. In the face of non-normality, this provides an appealing robust alternative to the routine use of the normal distribution. Specific distributions examined include the normal, student-t, slash and the variance gamma distributions. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced for parameter estimation. Moreover, the mixing parameters obtained as a by-product of the scale mixture representation can be used to identify outliers. The methods developed are applied to analyze daily stock returns data on S&P500 index. Bayesian model selection criteria as well as out-of- sample forecasting results reveal that the SV models based on heavy-tailed SMN distributions provide significant improvement in model fit as well as prediction to the S&P500 index data over the usual normal model. PMID:20730043

  4. Application of a predictive Bayesian model to environmental accounting.

    PubMed

    Anex, R P; Englehardt, J D

    2001-03-30

    Environmental accounting techniques are intended to capture important environmental costs and benefits that are often overlooked in standard accounting practices. Environmental accounting methods themselves often ignore or inadequately represent large but highly uncertain environmental costs and costs conditioned by specific prior events. Use of a predictive Bayesian model is demonstrated for the assessment of such highly uncertain environmental and contingent costs. The predictive Bayesian approach presented generates probability distributions for the quantity of interest (rather than parameters thereof). A spreadsheet implementation of a previously proposed predictive Bayesian model, extended to represent contingent costs, is described and used to evaluate whether a firm should undertake an accelerated phase-out of its PCB containing transformers. Variability and uncertainty (due to lack of information) in transformer accident frequency and severity are assessed simultaneously using a combination of historical accident data, engineering model-based cost estimates, and subjective judgement. Model results are compared using several different risk measures. Use of the model for incorporation of environmental risk management into a company's overall risk management strategy is discussed.

  5. Universal Darwinism As a Process of Bayesian Inference.

    PubMed

    Campbell, John O

    2016-01-01

    Many of the mathematical frameworks describing natural selection are equivalent to Bayes' Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus, natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an "experiment" in the external world environment, and the results of that "experiment" or the "surprise" entailed by predicted and actual outcomes of the "experiment." Minimization of free energy implies that the implicit measure of "surprise" experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature.

  6. Universal Darwinism As a Process of Bayesian Inference

    PubMed Central

    Campbell, John O.

    2016-01-01

    Many of the mathematical frameworks describing natural selection are equivalent to Bayes' Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus, natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an “experiment” in the external world environment, and the results of that “experiment” or the “surprise” entailed by predicted and actual outcomes of the “experiment.” Minimization of free energy implies that the implicit measure of “surprise” experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature. PMID:27375438

  7. [Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].

    PubMed

    Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L

    2017-03-10

    To evaluate the estimation of prevalence ratio ( PR ) by using bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea in their infants by using bayesian log-binomial regression model in Openbugs software. The results showed that caregivers' recognition of infant' s risk signs of diarrhea was associated significantly with a 13% increase of medical care-seeking. Meanwhile, we compared the differences in PR 's point estimation and its interval estimation of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea and convergence of three models (model 1: not adjusting for the covariates; model 2: adjusting for duration of caregivers' education, model 3: adjusting for distance between village and township and child month-age based on model 2) between bayesian log-binomial regression model and conventional log-binomial regression model. The results showed that all three bayesian log-binomial regression models were convergence and the estimated PRs were 1.130(95 %CI : 1.005-1.265), 1.128(95 %CI : 1.001-1.264) and 1.132(95 %CI : 1.004-1.267), respectively. Conventional log-binomial regression model 1 and model 2 were convergence and their PRs were 1.130(95 % CI : 1.055-1.206) and 1.126(95 % CI : 1.051-1.203), respectively, but the model 3 was misconvergence, so COPY method was used to estimate PR , which was 1.125 (95 %CI : 1.051-1.200). In addition, the point estimation and interval estimation of PRs from three bayesian log-binomial regression models differed slightly from those of PRs from conventional log-binomial regression model, but they had a good consistency in estimating PR . Therefore, bayesian log-binomial regression model can effectively estimate PR with less misconvergence and have more advantages in application compared with conventional log-binomial regression model.

  8. A Bayesian sequential processor approach to spectroscopic portal system decisions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sale, K; Candy, J; Breitfeller, E

    The development of faster more reliable techniques to detect radioactive contraband in a portal type scenario is an extremely important problem especially in this era of constant terrorist threats. Towards this goal the development of a model-based, Bayesian sequential data processor for the detection problem is discussed. In the sequential processor each datum (detector energy deposit and pulse arrival time) is used to update the posterior probability distribution over the space of model parameters. The nature of the sequential processor approach is that a detection is produced as soon as it is statistically justified by the data rather than waitingmore » for a fixed counting interval before any analysis is performed. In this paper the Bayesian model-based approach, physics and signal processing models and decision functions are discussed along with the first results of our research.« less

  9. Comparison of sampling techniques for Bayesian parameter estimation

    NASA Astrophysics Data System (ADS)

    Allison, Rupert; Dunkley, Joanna

    2014-02-01

    The posterior probability distribution for a set of model parameters encodes all that the data have to tell us in the context of a given model; it is the fundamental quantity for Bayesian parameter estimation. In order to infer the posterior probability distribution we have to decide how to explore parameter space. Here we compare three prescriptions for how parameter space is navigated, discussing their relative merits. We consider Metropolis-Hasting sampling, nested sampling and affine-invariant ensemble Markov chain Monte Carlo (MCMC) sampling. We focus on their performance on toy-model Gaussian likelihoods and on a real-world cosmological data set. We outline the sampling algorithms themselves and elaborate on performance diagnostics such as convergence time, scope for parallelization, dimensional scaling, requisite tunings and suitability for non-Gaussian distributions. We find that nested sampling delivers high-fidelity estimates for posterior statistics at low computational cost, and should be adopted in favour of Metropolis-Hastings in many cases. Affine-invariant MCMC is competitive when computing clusters can be utilized for massive parallelization. Affine-invariant MCMC and existing extensions to nested sampling naturally probe multimodal and curving distributions.

  10. Revisiting crash spatial heterogeneity: A Bayesian spatially varying coefficients approach.

    PubMed

    Xu, Pengpeng; Huang, Helai; Dong, Ni; Wong, S C

    2017-01-01

    This study was performed to investigate the spatially varying relationships between crash frequency and related risk factors. A Bayesian spatially varying coefficients model was elaborately introduced as a methodological alternative to simultaneously account for the unstructured and spatially structured heterogeneity of the regression coefficients in predicting crash frequencies. The proposed method was appealing in that the parameters were modeled via a conditional autoregressive prior distribution, which involved a single set of random effects and a spatial correlation parameter with extreme values corresponding to pure unstructured or pure spatially correlated random effects. A case study using a three-year crash dataset from the Hillsborough County, Florida, was conducted to illustrate the proposed model. Empirical analysis confirmed the presence of both unstructured and spatially correlated variations in the effects of contributory factors on severe crash occurrences. The findings also suggested that ignoring spatially structured heterogeneity may result in biased parameter estimates and incorrect inferences, while assuming the regression coefficients to be spatially clustered only is probably subject to the issue of over-smoothness. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. A Bayesian approach to estimating hidden variables as well as missing and wrong molecular interactions in ordinary differential equation-based mathematical models.

    PubMed

    Engelhardt, Benjamin; Kschischo, Maik; Fröhlich, Holger

    2017-06-01

    Ordinary differential equations (ODEs) are a popular approach to quantitatively model molecular networks based on biological knowledge. However, such knowledge is typically restricted. Wrongly modelled biological mechanisms as well as relevant external influence factors that are not included into the model are likely to manifest in major discrepancies between model predictions and experimental data. Finding the exact reasons for such observed discrepancies can be quite challenging in practice. In order to address this issue, we suggest a Bayesian approach to estimate hidden influences in ODE-based models. The method can distinguish between exogenous and endogenous hidden influences. Thus, we can detect wrongly specified as well as missed molecular interactions in the model. We demonstrate the performance of our Bayesian dynamic elastic-net with several ordinary differential equation models from the literature, such as human JAK-STAT signalling, information processing at the erythropoietin receptor, isomerization of liquid α -Pinene, G protein cycling in yeast and UV-B triggered signalling in plants. Moreover, we investigate a set of commonly known network motifs and a gene-regulatory network. Altogether our method supports the modeller in an algorithmic manner to identify possible sources of errors in ODE-based models on the basis of experimental data. © 2017 The Author(s).

  12. Impact assessment of extreme storm events using a Bayesian network

    USGS Publications Warehouse

    den Heijer, C.(Kees); Knipping, Dirk T.J.A.; Plant, Nathaniel G.; van Thiel de Vries, Jaap S. M.; Baart, Fedor; van Gelder, Pieter H. A. J. M.

    2012-01-01

    This paper describes an investigation on the usefulness of Bayesian Networks in the safety assessment of dune coasts. A network has been created that predicts the erosion volume based on hydraulic boundary conditions and a number of cross-shore profile indicators. Field measurement data along a large part of the Dutch coast has been used to train the network. Corresponding storm impact on the dunes was calculated with an empirical dune erosion model named duros+. Comparison between the Bayesian Network predictions and the original duros+ results, here considered as observations, results in a skill up to 0.88, provided that the training data covers the range of predictions. Hence, the predictions from a deterministic model (duros+) can be captured in a probabilistic model (Bayesian Network) such that both the process knowledge and uncertainties can be included in impact and vulnerability assessments.

  13. Two-Stage Bayesian Model Averaging in Endogenous Variable Models*

    PubMed Central

    Lenkoski, Alex; Eicher, Theo S.; Raftery, Adrian E.

    2013-01-01

    Economic modeling in the presence of endogeneity is subject to model uncertainty at both the instrument and covariate level. We propose a Two-Stage Bayesian Model Averaging (2SBMA) methodology that extends the Two-Stage Least Squares (2SLS) estimator. By constructing a Two-Stage Unit Information Prior in the endogenous variable model, we are able to efficiently combine established methods for addressing model uncertainty in regression models with the classic technique of 2SLS. To assess the validity of instruments in the 2SBMA context, we develop Bayesian tests of the identification restriction that are based on model averaged posterior predictive p-values. A simulation study showed that 2SBMA has the ability to recover structure in both the instrument and covariate set, and substantially improves the sharpness of resulting coefficient estimates in comparison to 2SLS using the full specification in an automatic fashion. Due to the increased parsimony of the 2SBMA estimate, the Bayesian Sargan test had a power of 50 percent in detecting a violation of the exogeneity assumption, while the method based on 2SLS using the full specification had negligible power. We apply our approach to the problem of development accounting, and find support not only for institutions, but also for geography and integration as development determinants, once both model uncertainty and endogeneity have been jointly addressed. PMID:24223471

  14. Constraining stellar binary black hole formation scenarios with eLISA eccentricity measurements

    NASA Astrophysics Data System (ADS)

    Nishizawa, Atsushi; Sesana, Alberto; Berti, Emanuele; Klein, Antoine

    2017-03-01

    A space-based interferometer such as the evolved Laser Interferometer Space Antenna (eLISA) could observe a few to a few thousands of progenitors of black hole binaries (BHBs) similar to those recently detected by Advanced LIGO. Gravitational radiation circularizes the orbit during inspiral, but some BHBs retain a measurable eccentricity at the low frequencies where eLISA is the most sensitive. The eccentricity of a BHB carries precious information about its formation channel: BHBs formed in the field, in globular clusters, or close to a massive black hole (MBH) have distinct eccentricity distributions in the eLISA band. We generate mock eLISA observations, folding in measurement errors, and using a Bayesian model selection, we study whether eLISA measurements can identify the BHB formation channel. We find that a handful of observations would suffice to tell whether BHBs were formed in the gravitational field of an MBH. Conversely, several tens of observations are needed to tell apart field formation from globular cluster formation. A 5-yr eLISA mission with the longest possible armlength is desirable to shed light on BHB formation scenarios.

  15. Constraining stellar binary black hole formation scenarios with LISA eccentricity measurements

    NASA Astrophysics Data System (ADS)

    Berti, Emanuele; Nishizawa, Atsushi; Sesana, Alberto; Klein, Antoine

    2017-01-01

    A space-based interferometer such as LISA could observe few to few thousands progenitors of black hole binaries (BHBs) similar to those recently detected by Advanced LIGO. Gravitational radiation circularizes the orbit during inspiral, but some BHBs retain a measurable eccentricity at the low frequencies where LISA is most sensitive. The eccentricity of a BHB carries precious information about its formation channel: BHBs formed in the field, in globular clusters, or close to a massive black hole (MBH) have distinct eccentricity distributions in the LISA band. We generate mock LISA observations, folding in measurement errors, and using Bayesian model selection we study whether LISA measurements can identify the BHB formation channel. We find that a handful of observations would suffice to tell whether BHBs were formed in the gravitational field of a MBH. Conversely, several tens of observations are needed to tell apart field formation from globular cluster formation. A five-year LISA mission with the longest possible armlength is desirable to shed light on BHB formation scenarios. NSF CAREER Grant No. PHY-1055103, NSF Grant No. PHY-1607130, FCT contract IF/00797/2014/CP1214/CT0012.

  16. Classification techniques on computerized systems to predict and/or to detect Apnea: A systematic review.

    PubMed

    Pombo, Nuno; Garcia, Nuno; Bousson, Kouamana

    2017-03-01

    Sleep apnea syndrome (SAS), which can significantly decrease the quality of life is associated with a major risk factor of health implications such as increased cardiovascular disease, sudden death, depression, irritability, hypertension, and learning difficulties. Thus, it is relevant and timely to present a systematic review describing significant applications in the framework of computational intelligence-based SAS, including its performance, beneficial and challenging effects, and modeling for the decision-making on multiple scenarios. This study aims to systematically review the literature on systems for the detection and/or prediction of apnea events using a classification model. Forty-five included studies revealed a combination of classification techniques for the diagnosis of apnea, such as threshold-based (14.75%) and machine learning (ML) models (85.25%). In addition, the ML models, were clustered in a mind map, include neural networks (44.26%), regression (4.91%), instance-based (11.47%), Bayesian algorithms (1.63%), reinforcement learning (4.91%), dimensionality reduction (8.19%), ensemble learning (6.55%), and decision trees (3.27%). A classification model should provide an auto-adaptive and no external-human action dependency. In addition, the accuracy of the classification models is related with the effective features selection. New high-quality studies based on randomized controlled trials and validation of models using a large and multiple sample of data are recommended. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.

  17. Bayesian network modeling applied to coastal geomorphology: lessons learned from a decade of experimentation and application

    NASA Astrophysics Data System (ADS)

    Plant, N. G.; Thieler, E. R.; Gutierrez, B.; Lentz, E. E.; Zeigler, S. L.; Van Dongeren, A.; Fienen, M. N.

    2016-12-01

    We evaluate the strengths and weaknesses of Bayesian networks that have been used to address scientific and decision-support questions related to coastal geomorphology. We will provide an overview of coastal geomorphology research that has used Bayesian networks and describe what this approach can do and when it works (or fails to work). Over the past decade, Bayesian networks have been formulated to analyze the multi-variate structure and evolution of coastal morphology and associated human and ecological impacts. The approach relates observable system variables to each other by estimating discrete correlations. The resulting Bayesian-networks make predictions that propagate errors, conduct inference via Bayes rule, or both. In scientific applications, the model results are useful for hypothesis testing, using confidence estimates to gage the strength of tests while applications to coastal resource management are aimed at decision-support, where the probabilities of desired ecosystems outcomes are evaluated. The range of Bayesian-network applications to coastal morphology includes emulation of high-resolution wave transformation models to make oceanographic predictions, morphologic response to storms and/or sea-level rise, groundwater response to sea-level rise and morphologic variability, habitat suitability for endangered species, and assessment of monetary or human-life risk associated with storms. All of these examples are based on vast observational data sets, numerical model output, or both. We will discuss the progression of our experiments, which has included testing whether the Bayesian-network approach can be implemented and is appropriate for addressing basic and applied scientific problems and evaluating the hindcast and forecast skill of these implementations. We will present and discuss calibration/validation tests that are used to assess the robustness of Bayesian-network models and we will compare these results to tests of other models. This will demonstrate how Bayesian networks are used to extract new insights about coastal morphologic behavior, assess impacts to societal and ecological systems, and communicate probabilistic predictions to decision makers.

  18. Fully Bayesian inference for structural MRI: application to segmentation and statistical analysis of T2-hypointensities.

    PubMed

    Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark

    2013-01-01

    Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.

  19. A comparison of Monte Carlo-based Bayesian parameter estimation methods for stochastic models of genetic networks

    PubMed Central

    Zaikin, Alexey; Míguez, Joaquín

    2017-01-01

    We compare three state-of-the-art Bayesian inference methods for the estimation of the unknown parameters in a stochastic model of a genetic network. In particular, we introduce a stochastic version of the paradigmatic synthetic multicellular clock model proposed by Ullner et al., 2007. By introducing dynamical noise in the model and assuming that the partial observations of the system are contaminated by additive noise, we enable a principled mechanism to represent experimental uncertainties in the synthesis of the multicellular system and pave the way for the design of probabilistic methods for the estimation of any unknowns in the model. Within this setup, we tackle the Bayesian estimation of a subset of the model parameters. Specifically, we compare three Monte Carlo based numerical methods for the approximation of the posterior probability density function of the unknown parameters given a set of partial and noisy observations of the system. The schemes we assess are the particle Metropolis-Hastings (PMH) algorithm, the nonlinear population Monte Carlo (NPMC) method and the approximate Bayesian computation sequential Monte Carlo (ABC-SMC) scheme. We present an extensive numerical simulation study, which shows that while the three techniques can effectively solve the problem there are significant differences both in estimation accuracy and computational efficiency. PMID:28797087

  20. Application of bayesian networks to real-time flood risk estimation

    NASA Astrophysics Data System (ADS)

    Garrote, L.; Molina, M.; Blasco, G.

    2003-04-01

    This paper presents the application of a computational paradigm taken from the field of artificial intelligence - the bayesian network - to model the behaviour of hydrologic basins during floods. The final goal of this research is to develop representation techniques for hydrologic simulation models in order to define, develop and validate a mechanism, supported by a software environment, oriented to build decision models for the prediction and management of river floods in real time. The emphasis is placed on providing decision makers with tools to incorporate their knowledge of basin behaviour, usually formulated in terms of rainfall-runoff models, in the process of real-time decision making during floods. A rainfall-runoff model is only a step in the process of decision making. If a reliable rainfall forecast is available and the rainfall-runoff model is well calibrated, decisions can be based mainly on model results. However, in most practical situations, uncertainties in rainfall forecasts or model performance have to be incorporated in the decision process. The computation paradigm adopted for the simulation of hydrologic processes is the bayesian network. A bayesian network is a directed acyclic graph that represents causal influences between linked variables. Under this representation, uncertain qualitative variables are related through causal relations quantified with conditional probabilities. The solution algorithm allows the computation of the expected probability distribution of unknown variables conditioned to the observations. An approach to represent hydrologic processes by bayesian networks with temporal and spatial extensions is presented in this paper, together with a methodology for the development of bayesian models using results produced by deterministic hydrologic simulation models

  1. Bayesian analysis of physiologically based toxicokinetic and toxicodynamic models.

    PubMed

    Hack, C Eric

    2006-04-17

    Physiologically based toxicokinetic (PBTK) and toxicodynamic (TD) models of bromate in animals and humans would improve our ability to accurately estimate the toxic doses in humans based on available animal studies. These mathematical models are often highly parameterized and must be calibrated in order for the model predictions of internal dose to adequately fit the experimentally measured doses. Highly parameterized models are difficult to calibrate and it is difficult to obtain accurate estimates of uncertainty or variability in model parameters with commonly used frequentist calibration methods, such as maximum likelihood estimation (MLE) or least squared error approaches. The Bayesian approach called Markov chain Monte Carlo (MCMC) analysis can be used to successfully calibrate these complex models. Prior knowledge about the biological system and associated model parameters is easily incorporated in this approach in the form of prior parameter distributions, and the distributions are refined or updated using experimental data to generate posterior distributions of parameter estimates. The goal of this paper is to give the non-mathematician a brief description of the Bayesian approach and Markov chain Monte Carlo analysis, how this technique is used in risk assessment, and the issues associated with this approach.

  2. Rigorous Approach in Investigation of Seismic Structure and Source Characteristicsin Northeast Asia: Hierarchical and Trans-dimensional Bayesian Inversion

    NASA Astrophysics Data System (ADS)

    Mustac, M.; Kim, S.; Tkalcic, H.; Rhie, J.; Chen, Y.; Ford, S. R.; Sebastian, N.

    2015-12-01

    Conventional approaches to inverse problems suffer from non-linearity and non-uniqueness in estimations of seismic structures and source properties. Estimated results and associated uncertainties are often biased by applied regularizations and additional constraints, which are commonly introduced to solve such problems. Bayesian methods, however, provide statistically meaningful estimations of models and their uncertainties constrained by data information. In addition, hierarchical and trans-dimensional (trans-D) techniques are inherently implemented in the Bayesian framework to account for involved error statistics and model parameterizations, and, in turn, allow more rigorous estimations of the same. Here, we apply Bayesian methods throughout the entire inference process to estimate seismic structures and source properties in Northeast Asia including east China, the Korean peninsula, and the Japanese islands. Ambient noise analysis is first performed to obtain a base three-dimensional (3-D) heterogeneity model using continuous broadband waveforms from more than 300 stations. As for the tomography of surface wave group and phase velocities in the 5-70 s band, we adopt a hierarchical and trans-D Bayesian inversion method using Voronoi partition. The 3-D heterogeneity model is further improved by joint inversions of teleseismic receiver functions and dispersion data using a newly developed high-efficiency Bayesian technique. The obtained model is subsequently used to prepare 3-D structural Green's functions for the source characterization. A hierarchical Bayesian method for point source inversion using regional complete waveform data is applied to selected events from the region. The seismic structure and source characteristics with rigorously estimated uncertainties from the novel Bayesian methods provide enhanced monitoring and discrimination of seismic events in northeast Asia.

  3. Bayesian model checking: A comparison of tests

    NASA Astrophysics Data System (ADS)

    Lucy, L. B.

    2018-06-01

    Two procedures for checking Bayesian models are compared using a simple test problem based on the local Hubble expansion. Over four orders of magnitude, p-values derived from a global goodness-of-fit criterion for posterior probability density functions agree closely with posterior predictive p-values. The former can therefore serve as an effective proxy for the difficult-to-calculate posterior predictive p-values.

  4. An improved approximate-Bayesian model-choice method for estimating shared evolutionary history

    PubMed Central

    2014-01-01

    Background To understand biological diversification, it is important to account for large-scale processes that affect the evolutionary history of groups of co-distributed populations of organisms. Such events predict temporally clustered divergences times, a pattern that can be estimated using genetic data from co-distributed species. I introduce a new approximate-Bayesian method for comparative phylogeographical model-choice that estimates the temporal distribution of divergences across taxa from multi-locus DNA sequence data. The model is an extension of that implemented in msBayes. Results By reparameterizing the model, introducing more flexible priors on demographic and divergence-time parameters, and implementing a non-parametric Dirichlet-process prior over divergence models, I improved the robustness, accuracy, and power of the method for estimating shared evolutionary history across taxa. Conclusions The results demonstrate the improved performance of the new method is due to (1) more appropriate priors on divergence-time and demographic parameters that avoid prohibitively small marginal likelihoods for models with more divergence events, and (2) the Dirichlet-process providing a flexible prior on divergence histories that does not strongly disfavor models with intermediate numbers of divergence events. The new method yields more robust estimates of posterior uncertainty, and thus greatly reduces the tendency to incorrectly estimate models of shared evolutionary history with strong support. PMID:24992937

  5. Bayesian block-diagonal variable selection and model averaging

    PubMed Central

    Papaspiliopoulos, O.; Rossell, D.

    2018-01-01

    Summary We propose a scalable algorithmic framework for exact Bayesian variable selection and model averaging in linear models under the assumption that the Gram matrix is block-diagonal, and as a heuristic for exploring the model space for general designs. In block-diagonal designs our approach returns the most probable model of any given size without resorting to numerical integration. The algorithm also provides a novel and efficient solution to the frequentist best subset selection problem for block-diagonal designs. Posterior probabilities for any number of models are obtained by evaluating a single one-dimensional integral, and other quantities of interest such as variable inclusion probabilities and model-averaged regression estimates are obtained by an adaptive, deterministic one-dimensional numerical integration. The overall computational cost scales linearly with the number of blocks, which can be processed in parallel, and exponentially with the block size, rendering it most adequate in situations where predictors are organized in many moderately-sized blocks. For general designs, we approximate the Gram matrix by a block-diagonal matrix using spectral clustering and propose an iterative algorithm that capitalizes on the block-diagonal algorithms to explore efficiently the model space. All methods proposed in this paper are implemented in the R library mombf. PMID:29861501

  6. Bayesian Networks Improve Causal Environmental Assessments for Evidence-Based Policy.

    PubMed

    Carriger, John F; Barron, Mace G; Newman, Michael C

    2016-12-20

    Rule-based weight of evidence approaches to ecological risk assessment may not account for uncertainties and generally lack probabilistic integration of lines of evidence. Bayesian networks allow causal inferences to be made from evidence by including causal knowledge about the problem, using this knowledge with probabilistic calculus to combine multiple lines of evidence, and minimizing biases in predicting or diagnosing causal relationships. Too often, sources of uncertainty in conventional weight of evidence approaches are ignored that can be accounted for with Bayesian networks. Specifying and propagating uncertainties improve the ability of models to incorporate strength of the evidence in the risk management phase of an assessment. Probabilistic inference from a Bayesian network allows evaluation of changes in uncertainty for variables from the evidence. The network structure and probabilistic framework of a Bayesian approach provide advantages over qualitative approaches in weight of evidence for capturing the impacts of multiple sources of quantifiable uncertainty on predictions of ecological risk. Bayesian networks can facilitate the development of evidence-based policy under conditions of uncertainty by incorporating analytical inaccuracies or the implications of imperfect information, structuring and communicating causal issues through qualitative directed graph formulations, and quantitatively comparing the causal power of multiple stressors on valued ecological resources. These aspects are demonstrated through hypothetical problem scenarios that explore some major benefits of using Bayesian networks for reasoning and making inferences in evidence-based policy.

  7. An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems

    PubMed Central

    Dawson, Kevin J.; Belkhir, Khalid

    2009-01-01

    Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals, - the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we can not visualise this probability distribution in its entirety, unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input, and yields as output a rooted binary tree, - or more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306

  8. A Fatigue Crack Size Evaluation Method Based on Lamb Wave Simulation and Limited Experimental Data

    PubMed Central

    He, Jingjing; Ran, Yunmeng; Liu, Bin; Yang, Jinsong; Guan, Xuefei

    2017-01-01

    This paper presents a systematic and general method for Lamb wave-based crack size quantification using finite element simulations and Bayesian updating. The method consists of construction of a baseline quantification model using finite element simulation data and Bayesian updating with limited Lamb wave data from target structure. The baseline model correlates two proposed damage sensitive features, namely the normalized amplitude and phase change, with the crack length through a response surface model. The two damage sensitive features are extracted from the first received S0 mode wave package. The model parameters of the baseline model are estimated using finite element simulation data. To account for uncertainties from numerical modeling, geometry, material and manufacturing between the baseline model and the target model, Bayesian method is employed to update the baseline model with a few measurements acquired from the actual target structure. A rigorous validation is made using in-situ fatigue testing and Lamb wave data from coupon specimens and realistic lap-joint components. The effectiveness and accuracy of the proposed method is demonstrated under different loading and damage conditions. PMID:28902148

  9. Climatic Models Ensemble-based Mid-21st Century Runoff Projections: A Bayesian Framework

    NASA Astrophysics Data System (ADS)

    Achieng, K. O.; Zhu, J.

    2017-12-01

    There are a number of North American Regional Climate Change Assessment Program (NARCCAP) climatic models that have been used to project surface runoff in the mid-21st century. Statistical model selection techniques are often used to select the model that best fits data. However, model selection techniques often lead to different conclusions. In this study, ten models are averaged in Bayesian paradigm to project runoff. Bayesian Model Averaging (BMA) is used to project and identify effect of model uncertainty on future runoff projections. Baseflow separation - a two-digital filter which is also called Eckhardt filter - is used to separate USGS streamflow (total runoff) into two components: baseflow and surface runoff. We use this surface runoff as the a priori runoff when conducting BMA of runoff simulated from the ten RCM models. The primary objective of this study is to evaluate how well RCM multi-model ensembles simulate surface runoff, in a Bayesian framework. Specifically, we investigate and discuss the following questions: How well do ten RCM models ensemble jointly simulate surface runoff by averaging over all the models using BMA, given a priori surface runoff? What are the effects of model uncertainty on surface runoff simulation?

  10. A school based cluster randomised health education intervention trial for improving knowledge and attitudes related to Taenia solium cysticercosis and taeniasis in Mbulu district, northern Tanzania.

    PubMed

    Mwidunda, Sylvester A; Carabin, Hélène; Matuja, William B M; Winkler, Andrea S; Ngowi, Helena A

    2015-01-01

    Taenia solium causes significant economic and public health impacts in endemic countries. This study determined effectiveness of a health education intervention at improving school children's knowledge and attitudes related to T. solium cysticercosis and taeniasis in Tanzania. A cluster randomised controlled health education intervention trial was conducted in 60 schools (30 primary, 30 secondary) in Mbulu district. Baseline data were collected using a structured questionnaire in the 60 schools and group discussions in three other schools. The 60 schools stratified by baseline knowledge were randomised to receive the intervention or serve as control. The health education consisted of an address by a trained teacher, a video show and a leaflet given to each pupil. Two post-intervention re-assessments (immediately and 6 months post-intervention) were conducted in all schools and the third (12 months post-intervention) was conducted in 28 secondary schools. Data were analysed using Bayesian hierarchical log-binomial models for individual knowledge and attitude questions and Bayesian hierarchical linear regression models for scores. The overall score (percentage of correct answers) improved by about 10% in all schools after 6 months, but was slightly lower among secondary schools. Monitoring alone was associated with improvement in scores by about 6%. The intervention was linked to improvements in knowledge regarding taeniasis, porcine cysticercosis, human cysticercosis, epilepsy, the attitude of condemning infected meat but it reduced the attitude of contacting a veterinarian if a pig was found to be infected with cysticercosis. Monitoring alone was linked to an improvement in how best to raise pigs. This study demonstrates the potential value of school children as targets for health messages to control T. solium cysticercosis and taeniasis in endemic areas. Studies are needed to assess effectiveness of message transmission from children to parents and the general community and their impacts in improving behaviours facilitating disease transmission.

  11. A School Based Cluster Randomised Health Education Intervention Trial for Improving Knowledge and Attitudes Related to Taenia solium Cysticercosis and Taeniasis in Mbulu District, Northern Tanzania

    PubMed Central

    Mwidunda, Sylvester A.; Carabin, Hélène; Matuja, William B. M.; Winkler, Andrea S.; Ngowi, Helena A.

    2015-01-01

    Taenia solium causes significant economic and public health impacts in endemic countries. This study determined effectiveness of a health education intervention at improving school children’s knowledge and attitudes related to T. solium cysticercosis and taeniasis in Tanzania. A cluster randomised controlled health education intervention trial was conducted in 60 schools (30 primary, 30 secondary) in Mbulu district. Baseline data were collected using a structured questionnaire in the 60 schools and group discussions in three other schools. The 60 schools stratified by baseline knowledge were randomised to receive the intervention or serve as control. The health education consisted of an address by a trained teacher, a video show and a leaflet given to each pupil. Two post-intervention re-assessments (immediately and 6 months post-intervention) were conducted in all schools and the third (12 months post-intervention) was conducted in 28 secondary schools. Data were analysed using Bayesian hierarchical log-binomial models for individual knowledge and attitude questions and Bayesian hierarchical linear regression models for scores. The overall score (percentage of correct answers) improved by about 10% in all schools after 6 months, but was slightly lower among secondary schools. Monitoring alone was associated with improvement in scores by about 6%. The intervention was linked to improvements in knowledge regarding taeniasis, porcine cysticercosis, human cysticercosis, epilepsy, the attitude of condemning infected meat but it reduced the attitude of contacting a veterinarian if a pig was found to be infected with cysticercosis. Monitoring alone was linked to an improvement in how best to raise pigs. This study demonstrates the potential value of school children as targets for health messages to control T. solium cysticercosis and taeniasis in endemic areas. Studies are needed to assess effectiveness of message transmission from children to parents and the general community and their impacts in improving behaviours facilitating disease transmission. PMID:25719902

  12. Phylogeography of speciation: allopatric divergence and secondary contact between outcrossing and selfing Clarkia.

    PubMed

    Pettengill, James B; Moeller, David A

    2012-09-01

    The origins of hybrid zones between parapatric taxa have been of particular interest for understanding the evolution of reproductive isolation and the geographic context of species divergence. One challenge has been to distinguish between allopatric divergence (followed by secondary contact) versus primary intergradation (parapatric speciation) as alternative divergence histories. Here, we use complementary phylogeographic and population genetic analyses to investigate the recent divergence of two subspecies of Clarkia xantiana and the formation of a hybrid zone within the narrow region of sympatry. We tested alternative phylogeographic models of divergence using approximate Bayesian computation (ABC) and found strong support for a secondary contact model and little support for a model allowing for gene flow throughout the divergence process (i.e. primary intergradation). Two independent methods for inferring the ancestral geography of each subspecies, one based on probabilistic character state reconstructions and the other on palaeo-distribution modelling, also support a model of divergence in allopatry and range expansion leading to secondary contact. The membership of individuals to genetic clusters suggests geographic substructure within each taxon where allopatric and sympatric samples are primarily found in separate clusters. We also observed coincidence and concordance of genetic clines across three types of molecular markers, which suggests that there is a strong barrier to gene flow. Taken together, our results provide evidence for allopatric divergence followed by range expansion leading to secondary contact. The location of refugial populations and the directionality of range expansion are consistent with expectations based on climate change since the last glacial maximum. Our approach also illustrates the utility of combining phylogeographic hypothesis testing with species distribution modelling and fine-scale population genetic analyses for inferring the geography of the divergence process. © 2012 Blackwell Publishing Ltd.

  13. Machine Learning Model Analysis and Data Visualization with Small Molecules Tested in a Mouse Model of Mycobacterium tuberculosis Infection (2014–2015)

    PubMed Central

    2016-01-01

    The renewed urgency to develop new treatments for Mycobacterium tuberculosis (Mtb) infection has resulted in large-scale phenotypic screening and thousands of new active compounds in vitro. The next challenge is to identify candidates to pursue in a mouse in vivo efficacy model as a step to predicting clinical efficacy. We previously analyzed over 70 years of this mouse in vivo efficacy data, which we used to generate and validate machine learning models. Curation of 60 additional small molecules with in vivo data published in 2014 and 2015 was undertaken to further test these models. This represents a much larger test set than for the previous models. Several computational approaches have now been applied to analyze these molecules and compare their molecular properties beyond those attempted previously. Our previous machine learning models have been updated, and a novel aspect has been added in the form of mouse liver microsomal half-life (MLM t1/2) and in vitro-based Mtb models incorporating cytotoxicity data that were used to predict in vivo activity for comparison. Our best Mtbin vivo models possess fivefold ROC values > 0.7, sensitivity > 80%, and concordance > 60%, while the best specificity value is >40%. Use of an MLM t1/2 Bayesian model affords comparable results for scoring the 60 compounds tested. Combining MLM stability and in vitroMtb models in a novel consensus workflow in the best cases has a positive predicted value (hit rate) > 77%. Our results indicate that Bayesian models constructed with literature in vivoMtb data generated by different laboratories in various mouse models can have predictive value and may be used alongside MLM t1/2 and in vitro-based Mtb models to assist in selecting antitubercular compounds with desirable in vivo efficacy. We demonstrate for the first time that consensus models of any kind can be used to predict in vivo activity for Mtb. In addition, we describe a new clustering method for data visualization and apply this to the in vivo training and test data, ultimately making the method accessible in a mobile app. PMID:27335215

  14. Selected aspects of prior and likelihood information for a Bayesian classifier in a road safety analysis.

    PubMed

    Nowakowska, Marzena

    2017-04-01

    The development of the Bayesian logistic regression model classifying the road accident severity is discussed. The already exploited informative priors (method of moments, maximum likelihood estimation, and two-stage Bayesian updating), along with the original idea of a Boot prior proposal, are investigated when no expert opinion has been available. In addition, two possible approaches to updating the priors, in the form of unbalanced and balanced training data sets, are presented. The obtained logistic Bayesian models are assessed on the basis of a deviance information criterion (DIC), highest probability density (HPD) intervals, and coefficients of variation estimated for the model parameters. The verification of the model accuracy has been based on sensitivity, specificity and the harmonic mean of sensitivity and specificity, all calculated from a test data set. The models obtained from the balanced training data set have a better classification quality than the ones obtained from the unbalanced training data set. The two-stage Bayesian updating prior model and the Boot prior model, both identified with the use of the balanced training data set, outperform the non-informative, method of moments, and maximum likelihood estimation prior models. It is important to note that one should be careful when interpreting the parameters since different priors can lead to different models. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Bayesian learning and the psychology of rule induction

    PubMed Central

    Endress, Ansgar D.

    2014-01-01

    In recent years, Bayesian learning models have been applied to an increasing variety of domains. While such models have been criticized on theoretical grounds, the underlying assumptions and predictions are rarely made concrete and tested experimentally. Here, I use Frank and Tenenbaum's (2011) Bayesian model of rule-learning as a case study to spell out the underlying assumptions, and to confront them with the empirical results Frank and Tenenbaum (2011) propose to simulate, as well as with novel experiments. While rule-learning is arguably well suited to rational Bayesian approaches, I show that their models are neither psychologically plausible nor ideal observer models. Further, I show that their central assumption is unfounded: humans do not always preferentially learn more specific rules, but, at least in some situations, those rules that happen to be more salient. Even when granting the unsupported assumptions, I show that all of the experiments modeled by Frank and Tenenbaum (2011) either contradict their models, or have a large number of more plausible interpretations. I provide an alternative account of the experimental data based on simple psychological mechanisms, and show that this account both describes the data better, and is easier to falsify. I conclude that, despite the recent surge in Bayesian models of cognitive phenomena, psychological phenomena are best understood by developing and testing psychological theories rather than models that can be fit to virtually any data. PMID:23454791

  16. Recurrent-neural-network-based Boolean factor analysis and its application to word clustering.

    PubMed

    Frolov, Alexander A; Husek, Dusan; Polyakov, Pavel Yu

    2009-07-01

    The objective of this paper is to introduce a neural-network-based algorithm for word clustering as an extension of the neural-network-based Boolean factor analysis algorithm (Frolov , 2007). It is shown that this extended algorithm supports even the more complex model of signals that are supposed to be related to textual documents. It is hypothesized that every topic in textual data is characterized by a set of words which coherently appear in documents dedicated to a given topic. The appearance of each word in a document is coded by the activity of a particular neuron. In accordance with the Hebbian learning rule implemented in the network, sets of coherently appearing words (treated as factors) create tightly connected groups of neurons, hence, revealing them as attractors of the network dynamics. The found factors are eliminated from the network memory by the Hebbian unlearning rule facilitating the search of other factors. Topics related to the found sets of words can be identified based on the words' semantics. To make the method complete, a special technique based on a Bayesian procedure has been developed for the following purposes: first, to provide a complete description of factors in terms of component probability, and second, to enhance the accuracy of classification of signals to determine whether it contains the factor. Since it is assumed that every word may possibly contribute to several topics, the proposed method might be related to the method of fuzzy clustering. In this paper, we show that the results of Boolean factor analysis and fuzzy clustering are not contradictory, but complementary. To demonstrate the capabilities of this attempt, the method is applied to two types of textual data on neural networks in two different languages. The obtained topics and corresponding words are at a good level of agreement despite the fact that identical topics in Russian and English conferences contain different sets of keywords.

  17. Bayesian network meta-analysis for cluster randomized trials with binary outcomes.

    PubMed

    Uhlmann, Lorenz; Jensen, Katrin; Kieser, Meinhard

    2017-06-01

    Network meta-analysis is becoming a common approach to combine direct and indirect comparisons of several treatment arms. In recent research, there have been various developments and extensions of the standard methodology. Simultaneously, cluster randomized trials are experiencing an increased popularity, especially in the field of health services research, where, for example, medical practices are the units of randomization but the outcome is measured at the patient level. Combination of the results of cluster randomized trials is challenging. In this tutorial, we examine and compare different approaches for the incorporation of cluster randomized trials in a (network) meta-analysis. Furthermore, we provide practical insight on the implementation of the models. In simulation studies, it is shown that some of the examined approaches lead to unsatisfying results. However, there are alternatives which are suitable to combine cluster randomized trials in a network meta-analysis as they are unbiased and reach accurate coverage rates. In conclusion, the methodology can be extended in such a way that an adequate inclusion of the results obtained in cluster randomized trials becomes feasible. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  18. A systematic review of Bayesian articles in psychology: The last 25 years.

    PubMed

    van de Schoot, Rens; Winter, Sonja D; Ryan, Oisín; Zondervan-Zwijnenburg, Mariëlle; Depaoli, Sarah

    2017-06-01

    Although the statistical tools most often used by researchers in the field of psychology over the last 25 years are based on frequentist statistics, it is often claimed that the alternative Bayesian approach to statistics is gaining in popularity. In the current article, we investigated this claim by performing the very first systematic review of Bayesian psychological articles published between 1990 and 2015 (n = 1,579). We aim to provide a thorough presentation of the role Bayesian statistics plays in psychology. This historical assessment allows us to identify trends and see how Bayesian methods have been integrated into psychological research in the context of different statistical frameworks (e.g., hypothesis testing, cognitive models, IRT, SEM, etc.). We also describe take-home messages and provide "big-picture" recommendations to the field as Bayesian statistics becomes more popular. Our review indicated that Bayesian statistics is used in a variety of contexts across subfields of psychology and related disciplines. There are many different reasons why one might choose to use Bayes (e.g., the use of priors, estimating otherwise intractable models, modeling uncertainty, etc.). We found in this review that the use of Bayes has increased and broadened in the sense that this methodology can be used in a flexible manner to tackle many different forms of questions. We hope this presentation opens the door for a larger discussion regarding the current state of Bayesian statistics, as well as future trends. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  19. Improved prediction of tacrolimus concentrations early after kidney transplantation using theory-based pharmacokinetic modelling.

    PubMed

    Størset, Elisabet; Holford, Nick; Hennig, Stefanie; Bergmann, Troels K; Bergan, Stein; Bremer, Sara; Åsberg, Anders; Midtvedt, Karsten; Staatz, Christine E

    2014-09-01

    The aim was to develop a theory-based population pharmacokinetic model of tacrolimus in adult kidney transplant recipients and to externally evaluate this model and two previous empirical models. Data were obtained from 242 patients with 3100 tacrolimus whole blood concentrations. External evaluation was performed by examining model predictive performance using Bayesian forecasting. Pharmacokinetic disposition parameters were estimated based on tacrolimus plasma concentrations, predicted from whole blood concentrations, haematocrit and literature values for tacrolimus binding to red blood cells. Disposition parameters were allometrically scaled to fat free mass. Tacrolimus whole blood clearance/bioavailability standardized to haematocrit of 45% and fat free mass of 60 kg was estimated to be 16.1 l h−1 [95% CI 12.6, 18.0 l h−1]. Tacrolimus clearance was 30% higher (95% CI 13, 46%) and bioavailability 18% lower (95% CI 2, 29%) in CYP3A5 expressers compared with non-expressers. An Emax model described decreasing tacrolimus bioavailability with increasing prednisolone dose. The theory-based model was superior to the empirical models during external evaluation displaying a median prediction error of −1.2% (95% CI −3.0, 0.1%). Based on simulation, Bayesian forecasting led to 65% (95% CI 62, 68%) of patients achieving a tacrolimus average steady-state concentration within a suggested acceptable range. A theory-based population pharmacokinetic model was superior to two empirical models for prediction of tacrolimus concentrations and seemed suitable for Bayesian prediction of tacrolimus doses early after kidney transplantation.

  20. Bayesian algorithm implementation in a real time exposure assessment model on benzene with calculation of associated cancer risks.

    PubMed

    Sarigiannis, Dimosthenis A; Karakitsios, Spyros P; Gotti, Alberto; Papaloukas, Costas L; Kassomenos, Pavlos A; Pilidis, Georgios A

    2009-01-01

    The objective of the current study was the development of a reliable modeling platform to calculate in real time the personal exposure and the associated health risk for filling station employees evaluating current environmental parameters (traffic, meteorological and amount of fuel traded) determined by the appropriate sensor network. A set of Artificial Neural Networks (ANNs) was developed to predict benzene exposure pattern for the filling station employees. Furthermore, a Physiology Based Pharmaco-Kinetic (PBPK) risk assessment model was developed in order to calculate the lifetime probability distribution of leukemia to the employees, fed by data obtained by the ANN model. Bayesian algorithm was involved in crucial points of both model sub compartments. The application was evaluated in two filling stations (one urban and one rural). Among several algorithms available for the development of the ANN exposure model, Bayesian regularization provided the best results and seemed to be a promising technique for prediction of the exposure pattern of that occupational population group. On assessing the estimated leukemia risk under the scope of providing a distribution curve based on the exposure levels and the different susceptibility of the population, the Bayesian algorithm was a prerequisite of the Monte Carlo approach, which is integrated in the PBPK-based risk model. In conclusion, the modeling system described herein is capable of exploiting the information collected by the environmental sensors in order to estimate in real time the personal exposure and the resulting health risk for employees of gasoline filling stations.

  1. Bayesian Algorithm Implementation in a Real Time Exposure Assessment Model on Benzene with Calculation of Associated Cancer Risks

    PubMed Central

    Sarigiannis, Dimosthenis A.; Karakitsios, Spyros P.; Gotti, Alberto; Papaloukas, Costas L.; Kassomenos, Pavlos A.; Pilidis, Georgios A.

    2009-01-01

    The objective of the current study was the development of a reliable modeling platform to calculate in real time the personal exposure and the associated health risk for filling station employees evaluating current environmental parameters (traffic, meteorological and amount of fuel traded) determined by the appropriate sensor network. A set of Artificial Neural Networks (ANNs) was developed to predict benzene exposure pattern for the filling station employees. Furthermore, a Physiology Based Pharmaco-Kinetic (PBPK) risk assessment model was developed in order to calculate the lifetime probability distribution of leukemia to the employees, fed by data obtained by the ANN model. Bayesian algorithm was involved in crucial points of both model sub compartments. The application was evaluated in two filling stations (one urban and one rural). Among several algorithms available for the development of the ANN exposure model, Bayesian regularization provided the best results and seemed to be a promising technique for prediction of the exposure pattern of that occupational population group. On assessing the estimated leukemia risk under the scope of providing a distribution curve based on the exposure levels and the different susceptibility of the population, the Bayesian algorithm was a prerequisite of the Monte Carlo approach, which is integrated in the PBPK-based risk model. In conclusion, the modeling system described herein is capable of exploiting the information collected by the environmental sensors in order to estimate in real time the personal exposure and the resulting health risk for employees of gasoline filling stations. PMID:22399936

  2. Development and comparison of Bayesian modularization method in uncertainty assessment of hydrological models

    NASA Astrophysics Data System (ADS)

    Li, L.; Xu, C.-Y.; Engeland, K.

    2012-04-01

    With respect to model calibration, parameter estimation and analysis of uncertainty sources, different approaches have been used in hydrological models. Bayesian method is one of the most widely used methods for uncertainty assessment of hydrological models, which incorporates different sources of information into a single analysis through Bayesian theorem. However, none of these applications can well treat the uncertainty in extreme flows of hydrological models' simulations. This study proposes a Bayesian modularization method approach in uncertainty assessment of conceptual hydrological models by considering the extreme flows. It includes a comprehensive comparison and evaluation of uncertainty assessments by a new Bayesian modularization method approach and traditional Bayesian models using the Metropolis Hasting (MH) algorithm with the daily hydrological model WASMOD. Three likelihood functions are used in combination with traditional Bayesian: the AR (1) plus Normal and time period independent model (Model 1), the AR (1) plus Normal and time period dependent model (Model 2) and the AR (1) plus multi-normal model (Model 3). The results reveal that (1) the simulations derived from Bayesian modularization method are more accurate with the highest Nash-Sutcliffe efficiency value, and (2) the Bayesian modularization method performs best in uncertainty estimates of entire flows and in terms of the application and computational efficiency. The study thus introduces a new approach for reducing the extreme flow's effect on the discharge uncertainty assessment of hydrological models via Bayesian. Keywords: extreme flow, uncertainty assessment, Bayesian modularization, hydrological model, WASMOD

  3. On the blind use of statistical tools in the analysis of globular cluster stars

    NASA Astrophysics Data System (ADS)

    D'Antona, Francesca; Caloi, Vittoria; Tailo, Marco

    2018-04-01

    As with most data analysis methods, the Bayesian method must be handled with care. We show that its application to determine stellar evolution parameters within globular clusters can lead to paradoxical results if used without the necessary precautions. This is a cautionary tale on the use of statistical tools for big data analysis.

  4. Habitat fragmentation in coastal southern California disrupts genetic connectivity in the cactus wren (Campylorhynchus brunneicapillus).

    PubMed

    Barr, Kelly R; Kus, Barbara E; Preston, Kristine L; Howell, Scarlett; Perkins, Emily; Vandergast, Amy G

    2015-05-01

    Achieving long-term persistence of species in urbanized landscapes requires characterizing population genetic structure to understand and manage the effects of anthropogenic disturbance on connectivity. Urbanization over the past century in coastal southern California has caused both precipitous loss of coastal sage scrub habitat and declines in populations of the cactus wren (Campylorhynchus brunneicapillus). Using 22 microsatellite loci, we found that remnant cactus wren aggregations in coastal southern California comprised 20 populations based on strict exact tests for population differentiation, and 12 genetic clusters with hierarchical Bayesian clustering analyses. Genetic structure patterns largely mirrored underlying habitat availability, with cluster and population boundaries coinciding with fragmentation caused primarily by urbanization. Using a habitat model we developed, we detected stronger associations between habitat-based distances and genetic distances than Euclidean geographic distance. Within populations, we detected a positive association between available local habitat and allelic richness and a negative association with relatedness. Isolation-by-distance patterns varied over the study area, which we attribute to temporal differences in anthropogenic landscape development. We also found that genetic bottleneck signals were associated with wildfire frequency. These results indicate that habitat fragmentation and alterations have reduced genetic connectivity and diversity of cactus wren populations in coastal southern California. Management efforts focused on improving connectivity among remaining populations may help to ensure population persistence. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.

  5. Analyzing chromatographic data using multilevel modeling.

    PubMed

    Wiczling, Paweł

    2018-06-01

    It is relatively easy to collect chromatographic measurements for a large number of analytes, especially with gradient chromatographic methods coupled with mass spectrometry detection. Such data often have a hierarchical or clustered structure. For example, analytes with similar hydrophobicity and dissociation constant tend to be more alike in their retention than a randomly chosen set of analytes. Multilevel models recognize the existence of such data structures by assigning a model for each parameter, with its parameters also estimated from data. In this work, a multilevel model is proposed to describe retention time data obtained from a series of wide linear organic modifier gradients of different gradient duration and different mobile phase pH for a large set of acids and bases. The multilevel model consists of (1) the same deterministic equation describing the relationship between retention time and analyte-specific and instrument-specific parameters, (2) covariance relationships relating various physicochemical properties of the analyte to chromatographically specific parameters through quantitative structure-retention relationship based equations, and (3) stochastic components of intra-analyte and interanalyte variability. The model was implemented in Stan, which provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods. Graphical abstract Relationships between log k and MeOH content for acidic, basic, and neutral compounds with different log P. CI credible interval, PSA polar surface area.

  6. Research on Bayes matting algorithm based on Gaussian mixture model

    NASA Astrophysics Data System (ADS)

    Quan, Wei; Jiang, Shan; Han, Cheng; Zhang, Chao; Jiang, Zhengang

    2015-12-01

    The digital matting problem is a classical problem of imaging. It aims at separating non-rectangular foreground objects from a background image, and compositing with a new background image. Accurate matting determines the quality of the compositing image. A Bayesian matting Algorithm Based on Gaussian Mixture Model is proposed to solve this matting problem. Firstly, the traditional Bayesian framework is improved by introducing Gaussian mixture model. Then, a weighting factor is added in order to suppress the noises of the compositing images. Finally, the effect is further improved by regulating the user's input. This algorithm is applied to matting jobs of classical images. The results are compared to the traditional Bayesian method. It is shown that our algorithm has better performance in detail such as hair. Our algorithm eliminates the noise well. And it is very effectively in dealing with the kind of work, such as interested objects with intricate boundaries.

  7. Limitations of cytochrome oxidase I for the barcoding of Neritidae (Mollusca: Gastropoda) as revealed by Bayesian analysis.

    PubMed

    Chee, S Y

    2015-05-25

    The mitochondrial DNA (mtDNA) cytochrome oxidase I (COI) gene has been universally and successfully utilized as a barcoding gene, mainly because it can be amplified easily, applied across a wide range of taxa, and results can be obtained cheaply and quickly. However, in rare cases, the gene can fail to distinguish between species, particularly when exposed to highly sensitive methods of data analysis, such as the Bayesian method, or when taxa have undergone introgressive hybridization, over-splitting, or incomplete lineage sorting. Such cases require the use of alternative markers, and nuclear DNA markers are commonly used. In this study, a dendrogram produced by Bayesian analysis of an mtDNA COI dataset was compared with that of a nuclear DNA ATPS-α dataset, in order to evaluate the efficiency of COI in barcoding Malaysian nerites (Neritidae). In the COI dendrogram, most of the species were in individual clusters, except for two species: Nerita chamaeleon and N. histrio. These two species were placed in the same subcluster, whereas in the ATPS-α dendrogram they were in their own subclusters. Analysis of the ATPS-α gene also placed the two genera of nerites (Nerita and Neritina) in separate clusters, whereas COI gene analysis placed both genera in the same cluster. Therefore, in the case of the Neritidae, the ATPS-α gene is a better barcoding gene than the COI gene.

  8. Hierarchical Bayesian sparse image reconstruction with application to MRFM.

    PubMed

    Dobigeon, Nicolas; Hero, Alfred O; Tourneret, Jean-Yves

    2009-09-01

    This paper presents a hierarchical Bayesian model to reconstruct sparse images when the observations are obtained from linear transformations and corrupted by an additive white Gaussian noise. Our hierarchical Bayes model is well suited to such naturally sparse image applications as it seamlessly accounts for properties such as sparsity and positivity of the image via appropriate Bayes priors. We propose a prior that is based on a weighted mixture of a positive exponential distribution and a mass at zero. The prior has hyperparameters that are tuned automatically by marginalization over the hierarchical Bayesian model. To overcome the complexity of the posterior distribution, a Gibbs sampling strategy is proposed. The Gibbs samples can be used to estimate the image to be recovered, e.g., by maximizing the estimated posterior distribution. In our fully Bayesian approach, the posteriors of all the parameters are available. Thus, our algorithm provides more information than other previously proposed sparse reconstruction methods that only give a point estimate. The performance of the proposed hierarchical Bayesian sparse reconstruction method is illustrated on synthetic data and real data collected from a tobacco virus sample using a prototype MRFM instrument.

  9. A Bayesian Alternative for Multi-objective Ecohydrological Model Specification

    NASA Astrophysics Data System (ADS)

    Tang, Y.; Marshall, L. A.; Sharma, A.; Ajami, H.

    2015-12-01

    Process-based ecohydrological models combine the study of hydrological, physical, biogeochemical and ecological processes of the catchments, which are usually more complex and parametric than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying the uncertainties in hydrological modeling with the development of Markov Chain Monte Carlo (MCMC) techniques. Our study aims to develop appropriate prior distributions and likelihood functions that minimize the model uncertainties and bias within a Bayesian ecohydrological framework. In our study, a formal Bayesian approach is implemented in an ecohydrological model which combines a hydrological model (HyMOD) and a dynamic vegetation model (DVM). Simulations focused on one objective likelihood (Streamflow/LAI) and multi-objective likelihoods (Streamflow and LAI) with different weights are compared. Uniform, weakly informative and strongly informative prior distributions are used in different simulations. The Kullback-leibler divergence (KLD) is used to measure the dis(similarity) between different priors and corresponding posterior distributions to examine the parameter sensitivity. Results show that different prior distributions can strongly influence posterior distributions for parameters, especially when the available data is limited or parameters are insensitive to the available data. We demonstrate differences in optimized parameters and uncertainty limits in different cases based on multi-objective likelihoods vs. single objective likelihoods. We also demonstrate the importance of appropriately defining the weights of objectives in multi-objective calibration according to different data types.

  10. Sparse Bayesian Learning for Identifying Imaging Biomarkers in AD Prediction

    PubMed Central

    Shen, Li; Qi, Yuan; Kim, Sungeun; Nho, Kwangsik; Wan, Jing; Risacher, Shannon L.; Saykin, Andrew J.

    2010-01-01

    We apply sparse Bayesian learning methods, automatic relevance determination (ARD) and predictive ARD (PARD), to Alzheimer’s disease (AD) classification to make accurate prediction and identify critical imaging markers relevant to AD at the same time. ARD is one of the most successful Bayesian feature selection methods. PARD is a powerful Bayesian feature selection method, and provides sparse models that is easy to interpret. PARD selects the model with the best estimate of the predictive performance instead of choosing the one with the largest marginal model likelihood. Comparative study with support vector machine (SVM) shows that ARD/PARD in general outperform SVM in terms of prediction accuracy. Additional comparison with surface-based general linear model (GLM) analysis shows that regions with strongest signals are identified by both GLM and ARD/PARD. While GLM P-map returns significant regions all over the cortex, ARD/PARD provide a small number of relevant and meaningful imaging markers with predictive power, including both cortical and subcortical measures. PMID:20879451

  11. Bayesian analysis of caustic-crossing microlensing events

    NASA Astrophysics Data System (ADS)

    Cassan, A.; Horne, K.; Kains, N.; Tsapras, Y.; Browne, P.

    2010-06-01

    Aims: Caustic-crossing binary-lens microlensing events are important anomalous events because they are capable of detecting an extrasolar planet companion orbiting the lens star. Fast and robust modelling methods are thus of prime interest in helping to decide whether a planet is detected by an event. Cassan introduced a new set of parameters to model binary-lens events, which are closely related to properties of the light curve. In this work, we explain how Bayesian priors can be added to this framework, and investigate on interesting options. Methods: We develop a mathematical formulation that allows us to compute analytically the priors on the new parameters, given some previous knowledge about other physical quantities. We explicitly compute the priors for a number of interesting cases, and show how this can be implemented in a fully Bayesian, Markov chain Monte Carlo algorithm. Results: Using Bayesian priors can accelerate microlens fitting codes by reducing the time spent considering physically implausible models, and helps us to discriminate between alternative models based on the physical plausibility of their parameters.

  12. Bayesian Factor Analysis as a Variable Selection Problem: Alternative Priors and Consequences

    PubMed Central

    Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric

    2016-01-01

    Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, a Bayesian structural equation modeling (BSEM) approach (Muthén & Asparouhov, 2012) has been proposed as a way to explore the presence of cross-loadings in CFA models. We show that the issue of determining factor loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov’s approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike and slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set (Byrne, 2012; Pettegrew & Wolf, 1982) is used to demonstrate our approach. PMID:27314566

  13. Reliability modelling and analysis of a multi-state element based on a dynamic Bayesian network

    NASA Astrophysics Data System (ADS)

    Li, Zhiqiang; Xu, Tingxue; Gu, Junyuan; Dong, Qi; Fu, Linyu

    2018-04-01

    This paper presents a quantitative reliability modelling and analysis method for multi-state elements based on a combination of the Markov process and a dynamic Bayesian network (DBN), taking perfect repair, imperfect repair and condition-based maintenance (CBM) into consideration. The Markov models of elements without repair and under CBM are established, and an absorbing set is introduced to determine the reliability of the repairable element. According to the state-transition relations between the states determined by the Markov process, a DBN model is built. In addition, its parameters for series and parallel systems, namely, conditional probability tables, can be calculated by referring to the conditional degradation probabilities. Finally, the power of a control unit in a failure model is used as an example. A dynamic fault tree (DFT) is translated into a Bayesian network model, and subsequently extended to a DBN. The results show the state probabilities of an element and the system without repair, with perfect and imperfect repair, and under CBM, with an absorbing set plotted by differential equations and verified. Through referring forward, the reliability value of the control unit is determined in different kinds of modes. Finally, weak nodes are noted in the control unit.

  14. Profile-Based LC-MS Data Alignment—A Bayesian Approach

    PubMed Central

    Tsai, Tsung-Heng; Tadesse, Mahlet G.; Wang, Yue; Ressom, Habtom W.

    2014-01-01

    A Bayesian alignment model (BAM) is proposed for alignment of liquid chromatography-mass spectrometry (LC-MS) data. BAM belongs to the category of profile-based approaches, which are composed of two major components: a prototype function and a set of mapping functions. Appropriate estimation of these functions is crucial for good alignment results. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler and 2) an adaptive selection of knots. A block Metropolis-Hastings algorithm that mitigates the problem of the MCMC sampler getting stuck at local modes of the posterior distribution is used for the update of the mapping function coefficients. In addition, a stochastic search variable selection (SSVS) methodology is used to determine the number and positions of knots. We applied BAM to a simulated data set, an LC-MS proteomic data set, and two LC-MS metabolomic data sets, and compared its performance with the Bayesian hierarchical curve registration (BHCR) model, the dynamic time-warping (DTW) model, and the continuous profile model (CPM). The advantage of applying appropriate profile-based retention time correction prior to performing a feature-based approach is also demonstrated through the metabolomic data sets. PMID:23929872

  15. Initialization and Restart in Stochastic Local Search: Computing a Most Probable Explanation in Bayesian Networks

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole J.; Wilkins, David C.; Roth, Dan

    2010-01-01

    For hard computational problems, stochastic local search has proven to be a competitive approach to finding optimal or approximately optimal problem solutions. Two key research questions for stochastic local search algorithms are: Which algorithms are effective for initialization? When should the search process be restarted? In the present work we investigate these research questions in the context of approximate computation of most probable explanations (MPEs) in Bayesian networks (BNs). We introduce a novel approach, based on the Viterbi algorithm, to explanation initialization in BNs. While the Viterbi algorithm works on sequences and trees, our approach works on BNs with arbitrary topologies. We also give a novel formalization of stochastic local search, with focus on initialization and restart, using probability theory and mixture models. Experimentally, we apply our methods to the problem of MPE computation, using a stochastic local search algorithm known as Stochastic Greedy Search. By carefully optimizing both initialization and restart, we reduce the MPE search time for application BNs by several orders of magnitude compared to using uniform at random initialization without restart. On several BNs from applications, the performance of Stochastic Greedy Search is competitive with clique tree clustering, a state-of-the-art exact algorithm used for MPE computation in BNs.

  16. Evaluation of a neutron spectrum from Bonner spheres measurements using a Bayesian parameter estimation combined with the traditional unfolding methods

    NASA Astrophysics Data System (ADS)

    Mazrou, H.; Bezoubiri, F.

    2018-07-01

    In this work, a new program developed under MATLAB environment and supported by the Bayesian software WinBUGS has been combined to the traditional unfolding codes namely MAXED and GRAVEL, to evaluate a neutron spectrum from the Bonner spheres measured counts obtained around a shielded 241AmBe based-neutron irradiator located at a Secondary Standards Dosimetry Laboratory (SSDL) at CRNA. In the first step, the results obtained by the standalone Bayesian program, using a parametric neutron spectrum model based on a linear superposition of three components namely: a thermal-Maxwellian distribution, an epithermal (1/E behavior) and a kind of a Watt fission and Evaporation models to represent the fast component, were compared to those issued from MAXED and GRAVEL assuming a Monte Carlo default spectrum. Through the selection of new upper limits for some free parameters, taking into account the physical characteristics of the irradiation source, of both considered models, good agreement was obtained for investigated integral quantities i.e. fluence rate and ambient dose equivalent rate compared to MAXED and GRAVEL results. The difference was generally below 4% for investigated parameters suggesting, thereby, the reliability of the proposed models. In the second step, the Bayesian results obtained from the previous calculations were used, as initial guess spectra, for the traditional unfolding codes, MAXED and GRAVEL to derive the solution spectra. Here again the results were in very good agreement, confirming the stability of the Bayesian solution.

  17. Spatiotemporal clusters of malaria cases at village level, northwest Ethiopia.

    PubMed

    Alemu, Kassahun; Worku, Alemayehu; Berhane, Yemane; Kumie, Abera

    2014-06-06

    Malaria attacks are not evenly distributed in space and time. In highland areas with low endemicity, malaria transmission is highly variable and malaria acquisition risk for individuals is unevenly distributed even within a neighbourhood. Characterizing the spatiotemporal distribution of malaria cases in high-altitude villages is necessary to prioritize the risk areas and facilitate interventions. Spatial scan statistics using the Bernoulli method were employed to identify spatial and temporal clusters of malaria in high-altitude villages. Daily malaria data were collected, using a passive surveillance system, from patients visiting local health facilities. Georeference data were collected at villages using hand-held global positioning system devices and linked to patient data. Bernoulli model using Bayesian approaches and Marcov Chain Monte Carlo (MCMC) methods were used to identify the effects of factors on spatial clusters of malaria cases. The deviance information criterion (DIC) was used to assess the goodness-of-fit of the different models. The smaller the DIC, the better the model fit. Malaria cases were clustered in both space and time in high-altitude villages. Spatial scan statistics identified a total of 56 spatial clusters of malaria in high-altitude villages. Of these, 39 were the most likely clusters (LLR = 15.62, p < 0.00001) and 17 were secondary clusters (LLR = 7.05, p < 0.03). The significant most likely temporal malaria clusters were detected between August and December (LLR = 17.87, p < 0.001). Travel away home, males and age above 15 years had statistically significant effect on malaria clusters at high-altitude villages. The study identified spatial clusters of malaria cases occurring at high elevation villages within the district. A patient who travelled away from home to a malaria-endemic area might be the most probable source of malaria infection in a high-altitude village. Malaria interventions in high altitude villages should address factors associated with malaria clustering.

  18. New Mycobacterium tuberculosis LAM sublineage with geographical specificity for the Old World revealed by phylogenetical and Bayesian analyses.

    PubMed

    Reynaud, Yann; Rastogi, Nalin

    2016-12-01

    We recently showed that the Mycobacterium tuberculosis sublineage LAM9 could be subdivided as two distinct subpopulations - each reflecting its unique biogeographical structure and evolutionary history. We subsequently attempted to verify if this genetic structuration could be traced in an enlarged global sample. For this purpose, we analyzed global evolutionary relationships of LAM strains in a large dataset (n = 1923 isolates from 35 countries worldwide) with concomitant spoligotyping and MIRU-VNTR data, followed by a deeper analysis of LAM9 sublineage (n = 851 isolates). Based on a combination of phylogenetical analysis and Bayesian statistics, a total of three different clusters, tentatively named LAM9C1, C2 and C3 were described in this dataset. Closer inspection of the phylogenetic tree with concomitant data on origin of isolates with genetic clusterization revealed LAM9C3 being the most tightly knit group exclusively found in the Old World as opposed to LAM9C2 being a loosely-knit group without any phylogeographical specificity; while LAM9C1 appeared with a majority of strains being well-clustered despite some isolates that intermixed with unrelated LAM clusters. Subsequently, we hereby describe a new M. tuberculosis LAM sublineage named LAM9C3 with phylogeographical specificity for the Old World. These findings open new perspectives to study respective migration histories and adaptation to human hosts of specific M. tuberculosis clones during the exploration and conquest of the New World. We therefore plan to reevaluate the nomenclature and evolutionary history of various LAM sublineages using Whole Genome Sequencing (WGS). Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. A Bayesian approach to model structural error and input variability in groundwater modeling

    NASA Astrophysics Data System (ADS)

    Xu, T.; Valocchi, A. J.; Lin, Y. F. F.; Liang, F.

    2015-12-01

    Effective water resource management typically relies on numerical models to analyze groundwater flow and solute transport processes. Model structural error (due to simplification and/or misrepresentation of the "true" environmental system) and input forcing variability (which commonly arises since some inputs are uncontrolled or estimated with high uncertainty) are ubiquitous in groundwater models. Calibration that overlooks errors in model structure and input data can lead to biased parameter estimates and compromised predictions. We present a fully Bayesian approach for a complete assessment of uncertainty for spatially distributed groundwater models. The approach explicitly recognizes stochastic input and uses data-driven error models based on nonparametric kernel methods to account for model structural error. We employ exploratory data analysis to assist in specifying informative prior for error models to improve identifiability. The inference is facilitated by an efficient sampling algorithm based on DREAM-ZS and a parameter subspace multiple-try strategy to reduce the required number of forward simulations of the groundwater model. We demonstrate the Bayesian approach through a synthetic case study of surface-ground water interaction under changing pumping conditions. It is found that explicit treatment of errors in model structure and input data (groundwater pumping rate) has substantial impact on the posterior distribution of groundwater model parameters. Using error models reduces predictive bias caused by parameter compensation. In addition, input variability increases parametric and predictive uncertainty. The Bayesian approach allows for a comparison among the contributions from various error sources, which could inform future model improvement and data collection efforts on how to best direct resources towards reducing predictive uncertainty.

  20. Precise Network Modeling of Systems Genetics Data Using the Bayesian Network Webserver.

    PubMed

    Ziebarth, Jesse D; Cui, Yan

    2017-01-01

    The Bayesian Network Webserver (BNW, http://compbio.uthsc.edu/BNW ) is an integrated platform for Bayesian network modeling of biological datasets. It provides a web-based network modeling environment that seamlessly integrates advanced algorithms for probabilistic causal modeling and reasoning with Bayesian networks. BNW is designed for precise modeling of relatively small networks that contain less than 20 nodes. The structure learning algorithms used by BNW guarantee the discovery of the best (most probable) network structure given the data. To facilitate network modeling across multiple biological levels, BNW provides a very flexible interface that allows users to assign network nodes into different tiers and define the relationships between and within the tiers. This function is particularly useful for modeling systems genetics datasets that often consist of multiscalar heterogeneous genotype-to-phenotype data. BNW enables users to, within seconds or minutes, go from having a simply formatted input file containing a dataset to using a network model to make predictions about the interactions between variables and the potential effects of experimental interventions. In this chapter, we will introduce the functions of BNW and show how to model systems genetics datasets with BNW.

  1. Predicting Mycobacterium tuberculosis Complex Clades Using Knowledge-Based Bayesian Networks

    PubMed Central

    Bennett, Kristin P.

    2014-01-01

    We develop a novel approach for incorporating expert rules into Bayesian networks for classification of Mycobacterium tuberculosis complex (MTBC) clades. The proposed knowledge-based Bayesian network (KBBN) treats sets of expert rules as prior distributions on the classes. Unlike prior knowledge-based support vector machine approaches which require rules expressed as polyhedral sets, KBBN directly incorporates the rules without any modification. KBBN uses data to refine rule-based classifiers when the rule set is incomplete or ambiguous. We develop a predictive KBBN model for 69 MTBC clades found in the SITVIT international collection. We validate the approach using two testbeds that model knowledge of the MTBC obtained from two different experts and large DNA fingerprint databases to predict MTBC genetic clades and sublineages. These models represent strains of MTBC using high-throughput biomarkers called spacer oligonucleotide types (spoligotypes), since these are routinely gathered from MTBC isolates of tuberculosis (TB) patients. Results show that incorporating rules into problems can drastically increase classification accuracy if data alone are insufficient. The SITVIT KBBN is publicly available for use on the World Wide Web. PMID:24864238

  2. A Bayesian approach for parameter estimation and prediction using a computationally intensive model

    DOE PAGES

    Higdon, Dave; McDonnell, Jordan D.; Schunck, Nicolas; ...

    2015-02-05

    Bayesian methods have been successful in quantifying uncertainty in physics-based problems in parameter estimation and prediction. In these cases, physical measurements y are modeled as the best fit of a physics-based modelmore » $$\\eta (\\theta )$$, where θ denotes the uncertain, best input setting. Hence the statistical model is of the form $$y=\\eta (\\theta )+\\epsilon ,$$ where $$\\epsilon $$ accounts for measurement, and possibly other, error sources. When nonlinearity is present in $$\\eta (\\cdot )$$, the resulting posterior distribution for the unknown parameters in the Bayesian formulation is typically complex and nonstandard, requiring computationally demanding computational approaches such as Markov chain Monte Carlo (MCMC) to produce multivariate draws from the posterior. Although generally applicable, MCMC requires thousands (or even millions) of evaluations of the physics model $$\\eta (\\cdot )$$. This requirement is problematic if the model takes hours or days to evaluate. To overcome this computational bottleneck, we present an approach adapted from Bayesian model calibration. This approach combines output from an ensemble of computational model runs with physical measurements, within a statistical formulation, to carry out inference. A key component of this approach is a statistical response surface, or emulator, estimated from the ensemble of model runs. We demonstrate this approach with a case study in estimating parameters for a density functional theory model, using experimental mass/binding energy measurements from a collection of atomic nuclei. Lastly, we also demonstrate how this approach produces uncertainties in predictions for recent mass measurements obtained at Argonne National Laboratory.« less

  3. A Bayesian Nonparametric Approach to Test Equating

    ERIC Educational Resources Information Center

    Karabatsos, George; Walker, Stephen G.

    2009-01-01

    A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are…

  4. Model Diagnostics for Bayesian Networks

    ERIC Educational Resources Information Center

    Sinharay, Sandip

    2006-01-01

    Bayesian networks are frequently used in educational assessments primarily for learning about students' knowledge and skills. There is a lack of works on assessing fit of Bayesian networks. This article employs the posterior predictive model checking method, a popular Bayesian model checking tool, to assess fit of simple Bayesian networks. A…

  5. Assessing population genetic structure via the maximisation of genetic distance

    PubMed Central

    2009-01-01

    Background The inference of the hidden structure of a population is an essential issue in population genetics. Recently, several methods have been proposed to infer population structure in population genetics. Methods In this study, a new method to infer the number of clusters and to assign individuals to the inferred populations is proposed. This approach does not make any assumption on Hardy-Weinberg and linkage equilibrium. The implemented criterion is the maximisation (via a simulated annealing algorithm) of the averaged genetic distance between a predefined number of clusters. The performance of this method is compared with two Bayesian approaches: STRUCTURE and BAPS, using simulated data and also a real human data set. Results The simulations show that with a reduced number of markers, BAPS overestimates the number of clusters and presents a reduced proportion of correct groupings. The accuracy of the new method is approximately the same as for STRUCTURE. Also, in Hardy-Weinberg and linkage disequilibrium cases, BAPS performs incorrectly. In these situations, STRUCTURE and the new method show an equivalent behaviour with respect to the number of inferred clusters, although the proportion of correct groupings is slightly better with the new method. Re-establishing equilibrium with the randomisation procedures improves the precision of the Bayesian approaches. All methods have a good precision for FST ≥ 0.03, but only STRUCTURE estimates the correct number of clusters for FST as low as 0.01. In situations with a high number of clusters or a more complex population structure, MGD performs better than STRUCTURE and BAPS. The results for a human data set analysed with the new method are congruent with the geographical regions previously found. Conclusion This new method used to infer the hidden structure in a population, based on the maximisation of the genetic distance and not taking into consideration any assumption about Hardy-Weinberg and linkage equilibrium, performs well under different simulated scenarios and with real data. Therefore, it could be a useful tool to determine genetically homogeneous groups, especially in those situations where the number of clusters is high, with complex population structure and where Hardy-Weinberg and/or linkage equilibrium are present. PMID:19900278

  6. Validation of the thermal challenge problem using Bayesian Belief Networks.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McFarland, John; Swiler, Laura Painton

    The thermal challenge problem has been developed at Sandia National Laboratories as a testbed for demonstrating various types of validation approaches and prediction methods. This report discusses one particular methodology to assess the validity of a computational model given experimental data. This methodology is based on Bayesian Belief Networks (BBNs) and can incorporate uncertainty in experimental measurements, in physical quantities, and model uncertainties. The approach uses the prior and posterior distributions of model output to compute a validation metric based on Bayesian hypothesis testing (a Bayes' factor). This report discusses various aspects of the BBN, specifically in the context ofmore » the thermal challenge problem. A BBN is developed for a given set of experimental data in a particular experimental configuration. The development of the BBN and the method for ''solving'' the BBN to develop the posterior distribution of model output through Monte Carlo Markov Chain sampling is discussed in detail. The use of the BBN to compute a Bayes' factor is demonstrated.« less

  7. Evaluating Great Lakes bald eagle nesting habitat with Bayesian inference

    Treesearch

    Teryl G. Grubb; William W. Bowerman; Allen J. Bath; John P. Giesy; D. V. Chip Weseloh

    2003-01-01

    Bayesian inference facilitated structured interpretation of a nonreplicated, experience-based survey of potential nesting habitat for bald eagles (Haliaeetus leucocephalus) along the five Great Lakes shorelines. We developed a pattern recognition (PATREC) model of our aerial search image with six habitat attributes: (a) tree cover, (b) proximity and...

  8. Genetic diversity and divergence at the Arbutus unedo L. (Ericaceae) westernmost distribution limit.

    PubMed

    Ribeiro, Maria Margarida; Piotti, Andrea; Ricardo, Alexandra; Gaspar, Daniel; Costa, Rita; Parducci, Laura; Vendramin, Giovanni Giuseppe

    2017-01-01

    Mediterranean forests are fragile ecosystems vulnerable to recent global warming and reduction of precipitation, and a long-term negative effect is expected on vegetation with increasing drought and in areas burnt by fires. We investigated the spatial distribution of genetic variation of Arbutus unedo in the western Iberia Peninsula, using plastid markers with conservation and provenance regions design purposes. This species is currently undergoing an intense domestication process in the region, and, like other species, is increasingly under the threat from climate change, habitat fragmentation and wildfires. We sampled 451 trees from 15 natural populations from different ecological conditions spanning the whole species' distribution range in the region. We applied Bayesian analysis and identified four clusters (north, centre, south, and a single-population cluster). Hierarchical AMOVA showed higher differentiation among clusters than among populations within clusters. The relatively low within-clusters differentiation can be explained by a common postglacial history of nearby populations. The genetic structure found, supported by the few available palaeobotanical records, cannot exclude the hypothesis of two independent A. unedo refugia in western Iberia Peninsula during the Last Glacial Maximum. Based on the results we recommend a conservation strategy by selecting populations for conservation based on their allelic richness and diversity and careful seed transfer consistent with current species' genetic structure.

  9. Genetic diversity and divergence at the Arbutus unedo L. (Ericaceae) westernmost distribution limit

    PubMed Central

    Ribeiro, Maria Margarida; Piotti, Andrea; Ricardo, Alexandra; Gaspar, Daniel; Costa, Rita; Parducci, Laura; Vendramin, Giovanni Giuseppe

    2017-01-01

    Mediterranean forests are fragile ecosystems vulnerable to recent global warming and reduction of precipitation, and a long-term negative effect is expected on vegetation with increasing drought and in areas burnt by fires. We investigated the spatial distribution of genetic variation of Arbutus unedo in the western Iberia Peninsula, using plastid markers with conservation and provenance regions design purposes. This species is currently undergoing an intense domestication process in the region, and, like other species, is increasingly under the threat from climate change, habitat fragmentation and wildfires. We sampled 451 trees from 15 natural populations from different ecological conditions spanning the whole species’ distribution range in the region. We applied Bayesian analysis and identified four clusters (north, centre, south, and a single-population cluster). Hierarchical AMOVA showed higher differentiation among clusters than among populations within clusters. The relatively low within-clusters differentiation can be explained by a common postglacial history of nearby populations. The genetic structure found, supported by the few available palaeobotanical records, cannot exclude the hypothesis of two independent A. unedo refugia in western Iberia Peninsula during the Last Glacial Maximum. Based on the results we recommend a conservation strategy by selecting populations for conservation based on their allelic richness and diversity and careful seed transfer consistent with current species’ genetic structure. PMID:28384294

  10. Compulsive buying disorder clustering based on sex, age, onset and personality traits.

    PubMed

    Granero, Roser; Fernández-Aranda, Fernando; Baño, Marta; Steward, Trevor; Mestre-Bach, Gemma; Del Pino-Gutiérrez, Amparo; Moragas, Laura; Mallorquí-Bagué, Núria; Aymamí, Neus; Goméz-Peña, Mónica; Tárrega, Salomé; Menchón, José M; Jiménez-Murcia, Susana

    2016-07-01

    In spite of the revived interest in compulsive buying disorder (CBD), its classification into the contemporary nosologic systems continues to be debated, and scarce studies have addressed heterogeneity in the clinical phenotype through methodologies based on a person-centered approach. To identify empirical clusters of CBD employing personality traits, as well as patients' sex, age and the age of CBD onset as indicators. An agglomerative hierarchical clustering method defining a combination of the Schwarz Bayesian Information Criterion and log-likelihood was used. Three clusters were identified in a sample of n=110 patients attending a specialized CBD unit a) "male compulsive buyers" reported the highest prevalence of comorbid gambling disorder and the lowest levels of reward dependence; b) "female low-dysfunctional" mainly included employed women, with the highest level of education, the oldest age of onset, the lowest scores in harm avoidance and the highest levels of persistence, self-directedness and cooperativeness; and c) "female highly-dysfunctional" with the youngest age of onset, the highest levels of comorbid psychopathology and harm avoidance, and the lowest score in self-directedness. Sociodemographic characteristics and personality traits can be used to determine CBD clusters which represent different clinical subtypes. These subtypes should be considered when developing assessment instruments, preventive programs and treatment interventions. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. MODEL-FREE MULTI-PROBE LENSING RECONSTRUCTION OF CLUSTER MASS PROFILES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Umetsu, Keiichi

    2013-05-20

    Lens magnification by galaxy clusters induces characteristic spatial variations in the number counts of background sources, amplifying their observed fluxes and expanding the area of sky, the net effect of which, known as magnification bias, depends on the intrinsic faint-end slope of the source luminosity function. The bias is strongly negative for red galaxies, dominated by the geometric area distortion, whereas it is mildly positive for blue galaxies, enhancing the blue counts toward the cluster center. We generalize the Bayesian approach of Umetsu et al. for reconstructing projected cluster mass profiles, by incorporating multiple populations of background sources for magnification-biasmore » measurements and combining them with complementary lens-distortion measurements, effectively breaking the mass-sheet degeneracy and improving the statistical precision of cluster mass measurements. The approach can be further extended to include strong-lensing projected mass estimates, thus allowing for non-parametric absolute mass determinations in both the weak and strong regimes. We apply this method to our recent CLASH lensing measurements of MACS J1206.2-0847, and demonstrate how combining multi-probe lensing constraints can improve the reconstruction of cluster mass profiles. This method will also be useful for a stacked lensing analysis, combining all lensing-related effects in the cluster regime, for a definitive determination of the averaged mass profile.« less

  12. A Bayesian approach to meta-analysis of plant pathology studies.

    PubMed

    Mila, A L; Ngugi, H K

    2011-01-01

    Bayesian statistical methods are used for meta-analysis in many disciplines, including medicine, molecular biology, and engineering, but have not yet been applied for quantitative synthesis of plant pathology studies. In this paper, we illustrate the key concepts of Bayesian statistics and outline the differences between Bayesian and classical (frequentist) methods in the way parameters describing population attributes are considered. We then describe a Bayesian approach to meta-analysis and present a plant pathological example based on studies evaluating the efficacy of plant protection products that induce systemic acquired resistance for the management of fire blight of apple. In a simple random-effects model assuming a normal distribution of effect sizes and no prior information (i.e., a noninformative prior), the results of the Bayesian meta-analysis are similar to those obtained with classical methods. Implementing the same model with a Student's t distribution and a noninformative prior for the effect sizes, instead of a normal distribution, yields similar results for all but acibenzolar-S-methyl (Actigard) which was evaluated only in seven studies in this example. Whereas both the classical (P = 0.28) and the Bayesian analysis with a noninformative prior (95% credibility interval [CRI] for the log response ratio: -0.63 to 0.08) indicate a nonsignificant effect for Actigard, specifying a t distribution resulted in a significant, albeit variable, effect for this product (CRI: -0.73 to -0.10). These results confirm the sensitivity of the analytical outcome (i.e., the posterior distribution) to the choice of prior in Bayesian meta-analyses involving a limited number of studies. We review some pertinent literature on more advanced topics, including modeling of among-study heterogeneity, publication bias, analyses involving a limited number of studies, and methods for dealing with missing data, and show how these issues can be approached in a Bayesian framework. Bayesian meta-analysis can readily include information not easily incorporated in classical methods, and allow for a full evaluation of competing models. Given the power and flexibility of Bayesian methods, we expect them to become widely adopted for meta-analysis of plant pathology studies.

  13. Model selection on solid ground: Rigorous comparison of nine ways to evaluate Bayesian model evidence

    PubMed Central

    Schöniger, Anneli; Wöhling, Thomas; Samaniego, Luis; Nowak, Wolfgang

    2014-01-01

    Bayesian model selection or averaging objectively ranks a number of plausible, competing conceptual models based on Bayes' theorem. It implicitly performs an optimal trade-off between performance in fitting available data and minimum model complexity. The procedure requires determining Bayesian model evidence (BME), which is the likelihood of the observed data integrated over each model's parameter space. The computation of this integral is highly challenging because it is as high-dimensional as the number of model parameters. Three classes of techniques to compute BME are available, each with its own challenges and limitations: (1) Exact and fast analytical solutions are limited by strong assumptions. (2) Numerical evaluation quickly becomes unfeasible for expensive models. (3) Approximations known as information criteria (ICs) such as the AIC, BIC, or KIC (Akaike, Bayesian, or Kashyap information criterion, respectively) yield contradicting results with regard to model ranking. Our study features a theory-based intercomparison of these techniques. We further assess their accuracy in a simplistic synthetic example where for some scenarios an exact analytical solution exists. In more challenging scenarios, we use a brute-force Monte Carlo integration method as reference. We continue this analysis with a real-world application of hydrological model selection. This is a first-time benchmarking of the various methods for BME evaluation against true solutions. Results show that BME values from ICs are often heavily biased and that the choice of approximation method substantially influences the accuracy of model ranking. For reliable model selection, bias-free numerical methods should be preferred over ICs whenever computationally feasible. PMID:25745272

  14. Bayesian evidence computation for model selection in non-linear geoacoustic inference problems.

    PubMed

    Dettmer, Jan; Dosso, Stan E; Osler, John C

    2010-12-01

    This paper applies a general Bayesian inference approach, based on Bayesian evidence computation, to geoacoustic inversion of interface-wave dispersion data. Quantitative model selection is carried out by computing the evidence (normalizing constants) for several model parameterizations using annealed importance sampling. The resulting posterior probability density estimate is compared to estimates obtained from Metropolis-Hastings sampling to ensure consistent results. The approach is applied to invert interface-wave dispersion data collected on the Scotian Shelf, off the east coast of Canada for the sediment shear-wave velocity profile. Results are consistent with previous work on these data but extend the analysis to a rigorous approach including model selection and uncertainty analysis. The results are also consistent with core samples and seismic reflection measurements carried out in the area.

  15. Efficient Posterior Probability Mapping Using Savage-Dickey Ratios

    PubMed Central

    Penny, William D.; Ridgway, Gerard R.

    2013-01-01

    Statistical Parametric Mapping (SPM) is the dominant paradigm for mass-univariate analysis of neuroimaging data. More recently, a Bayesian approach termed Posterior Probability Mapping (PPM) has been proposed as an alternative. PPM offers two advantages: (i) inferences can be made about effect size thus lending a precise physiological meaning to activated regions, (ii) regions can be declared inactive. This latter facility is most parsimoniously provided by PPMs based on Bayesian model comparisons. To date these comparisons have been implemented by an Independent Model Optimization (IMO) procedure which separately fits null and alternative models. This paper proposes a more computationally efficient procedure based on Savage-Dickey approximations to the Bayes factor, and Taylor-series approximations to the voxel-wise posterior covariance matrices. Simulations show the accuracy of this Savage-Dickey-Taylor (SDT) method to be comparable to that of IMO. Results on fMRI data show excellent agreement between SDT and IMO for second-level models, and reasonable agreement for first-level models. This Savage-Dickey test is a Bayesian analogue of the classical SPM-F and allows users to implement model comparison in a truly interactive manner. PMID:23533640

  16. Bayesian dose-response analysis for epidemiological studies with complex uncertainty in dose estimation.

    PubMed

    Kwon, Deukwoo; Hoffman, F Owen; Moroz, Brian E; Simon, Steven L

    2016-02-10

    Most conventional risk analysis methods rely on a single best estimate of exposure per person, which does not allow for adjustment for exposure-related uncertainty. Here, we propose a Bayesian model averaging method to properly quantify the relationship between radiation dose and disease outcomes by accounting for shared and unshared uncertainty in estimated dose. Our Bayesian risk analysis method utilizes multiple realizations of sets (vectors) of doses generated by a two-dimensional Monte Carlo simulation method that properly separates shared and unshared errors in dose estimation. The exposure model used in this work is taken from a study of the risk of thyroid nodules among a cohort of 2376 subjects who were exposed to fallout from nuclear testing in Kazakhstan. We assessed the performance of our method through an extensive series of simulations and comparisons against conventional regression risk analysis methods. When the estimated doses contain relatively small amounts of uncertainty, the Bayesian method using multiple a priori plausible draws of dose vectors gave similar results to the conventional regression-based methods of dose-response analysis. However, when large and complex mixtures of shared and unshared uncertainties are present, the Bayesian method using multiple dose vectors had significantly lower relative bias than conventional regression-based risk analysis methods and better coverage, that is, a markedly increased capability to include the true risk coefficient within the 95% credible interval of the Bayesian-based risk estimate. An evaluation of the dose-response using our method is presented for an epidemiological study of thyroid disease following radiation exposure. Copyright © 2015 John Wiley & Sons, Ltd.

  17. Theory-based Bayesian models of inductive learning and reasoning.

    PubMed

    Tenenbaum, Joshua B; Griffiths, Thomas L; Kemp, Charles

    2006-07-01

    Inductive inference allows humans to make powerful generalizations from sparse data when learning about word meanings, unobserved properties, causal relationships, and many other aspects of the world. Traditional accounts of induction emphasize either the power of statistical learning, or the importance of strong constraints from structured domain knowledge, intuitive theories or schemas. We argue that both components are necessary to explain the nature, use and acquisition of human knowledge, and we introduce a theory-based Bayesian framework for modeling inductive learning and reasoning as statistical inferences over structured knowledge representations.

  18. SensibleSleep: A Bayesian Model for Learning Sleep Patterns from Smartphone Events

    PubMed Central

    Sekara, Vedran; Jonsson, Håkan; Larsen, Jakob Eg; Lehmann, Sune

    2017-01-01

    We propose a Bayesian model for extracting sleep patterns from smartphone events. Our method is able to identify individuals’ daily sleep periods and their evolution over time, and provides an estimation of the probability of sleep and wake transitions. The model is fitted to more than 400 participants from two different datasets, and we verify the results against ground truth from dedicated armband sleep trackers. We show that the model is able to produce reliable sleep estimates with an accuracy of 0.89, both at the individual and at the collective level. Moreover the Bayesian model is able to quantify uncertainty and encode prior knowledge about sleep patterns. Compared with existing smartphone-based systems, our method requires only screen on/off events, and is therefore much less intrusive in terms of privacy and more battery-efficient. PMID:28076375

  19. SensibleSleep: A Bayesian Model for Learning Sleep Patterns from Smartphone Events.

    PubMed

    Cuttone, Andrea; Bækgaard, Per; Sekara, Vedran; Jonsson, Håkan; Larsen, Jakob Eg; Lehmann, Sune

    2017-01-01

    We propose a Bayesian model for extracting sleep patterns from smartphone events. Our method is able to identify individuals' daily sleep periods and their evolution over time, and provides an estimation of the probability of sleep and wake transitions. The model is fitted to more than 400 participants from two different datasets, and we verify the results against ground truth from dedicated armband sleep trackers. We show that the model is able to produce reliable sleep estimates with an accuracy of 0.89, both at the individual and at the collective level. Moreover the Bayesian model is able to quantify uncertainty and encode prior knowledge about sleep patterns. Compared with existing smartphone-based systems, our method requires only screen on/off events, and is therefore much less intrusive in terms of privacy and more battery-efficient.

  20. Bayesian mixture modeling for blood sugar levels of diabetes mellitus patients (case study in RSUD Saiful Anwar Malang Indonesia)

    NASA Astrophysics Data System (ADS)

    Budi Astuti, Ani; Iriawan, Nur; Irhamah; Kuswanto, Heri; Sasiarini, Laksmi

    2017-10-01

    Bayesian statistics proposes an approach that is very flexible in the number of samples and distribution of data. Bayesian Mixture Model (BMM) is a Bayesian approach for multimodal models. Diabetes Mellitus (DM) is more commonly known in the Indonesian community as sweet pee. This disease is one type of chronic non-communicable diseases but it is very dangerous to humans because of the effects of other diseases complications caused. WHO reports in 2013 showed DM disease was ranked 6th in the world as the leading causes of human death. In Indonesia, DM disease continues to increase over time. These research would be studied patterns and would be built the BMM models of the DM data through simulation studies where the simulation data built on cases of blood sugar levels of DM patients in RSUD Saiful Anwar Malang. The results have been successfully demonstrated pattern of distribution of the DM data which has a normal mixture distribution. The BMM models have succeed to accommodate the real condition of the DM data based on the data driven concept.

  1. Bayesian Model Averaging for Propensity Score Analysis

    ERIC Educational Resources Information Center

    Kaplan, David; Chen, Jianshen

    2013-01-01

    The purpose of this study is to explore Bayesian model averaging in the propensity score context. Previous research on Bayesian propensity score analysis does not take into account model uncertainty. In this regard, an internally consistent Bayesian framework for model building and estimation must also account for model uncertainty. The…

  2. Textual and visual content-based anti-phishing: a Bayesian approach.

    PubMed

    Zhang, Haijun; Liu, Gang; Chow, Tommy W S; Liu, Wenyin

    2011-10-01

    A novel framework using a Bayesian approach for content-based phishing web page detection is presented. Our model takes into account textual and visual contents to measure the similarity between the protected web page and suspicious web pages. A text classifier, an image classifier, and an algorithm fusing the results from classifiers are introduced. An outstanding feature of this paper is the exploration of a Bayesian model to estimate the matching threshold. This is required in the classifier for determining the class of the web page and identifying whether the web page is phishing or not. In the text classifier, the naive Bayes rule is used to calculate the probability that a web page is phishing. In the image classifier, the earth mover's distance is employed to measure the visual similarity, and our Bayesian model is designed to determine the threshold. In the data fusion algorithm, the Bayes theory is used to synthesize the classification results from textual and visual content. The effectiveness of our proposed approach was examined in a large-scale dataset collected from real phishing cases. Experimental results demonstrated that the text classifier and the image classifier we designed deliver promising results, the fusion algorithm outperforms either of the individual classifiers, and our model can be adapted to different phishing cases. © 2011 IEEE

  3. Bayesian experimental design for models with intractable likelihoods.

    PubMed

    Drovandi, Christopher C; Pettitt, Anthony N

    2013-12-01

    In this paper we present a methodology for designing experiments for efficiently estimating the parameters of models with computationally intractable likelihoods. The approach combines a commonly used methodology for robust experimental design, based on Markov chain Monte Carlo sampling, with approximate Bayesian computation (ABC) to ensure that no likelihood evaluations are required. The utility function considered for precise parameter estimation is based upon the precision of the ABC posterior distribution, which we form efficiently via the ABC rejection algorithm based on pre-computed model simulations. Our focus is on stochastic models and, in particular, we investigate the methodology for Markov process models of epidemics and macroparasite population evolution. The macroparasite example involves a multivariate process and we assess the loss of information from not observing all variables. © 2013, The International Biometric Society.

  4. Application of a data-mining method based on Bayesian networks to lesion-deficit analysis

    NASA Technical Reports Server (NTRS)

    Herskovits, Edward H.; Gerring, Joan P.

    2003-01-01

    Although lesion-deficit analysis (LDA) has provided extensive information about structure-function associations in the human brain, LDA has suffered from the difficulties inherent to the analysis of spatial data, i.e., there are many more variables than subjects, and data may be difficult to model using standard distributions, such as the normal distribution. We herein describe a Bayesian method for LDA; this method is based on data-mining techniques that employ Bayesian networks to represent structure-function associations. These methods are computationally tractable, and can represent complex, nonlinear structure-function associations. When applied to the evaluation of data obtained from a study of the psychiatric sequelae of traumatic brain injury in children, this method generates a Bayesian network that demonstrates complex, nonlinear associations among lesions in the left caudate, right globus pallidus, right side of the corpus callosum, right caudate, and left thalamus, and subsequent development of attention-deficit hyperactivity disorder, confirming and extending our previous statistical analysis of these data. Furthermore, analysis of simulated data indicates that methods based on Bayesian networks may be more sensitive and specific for detecting associations among categorical variables than methods based on chi-square and Fisher exact statistics.

  5. Nonlinear and non-Gaussian Bayesian based handwriting beautification

    NASA Astrophysics Data System (ADS)

    Shi, Cao; Xiao, Jianguo; Xu, Canhui; Jia, Wenhua

    2013-03-01

    A framework is proposed in this paper to effectively and efficiently beautify handwriting by means of a novel nonlinear and non-Gaussian Bayesian algorithm. In the proposed framework, format and size of handwriting image are firstly normalized, and then typeface in computer system is applied to optimize vision effect of handwriting. The Bayesian statistics is exploited to characterize the handwriting beautification process as a Bayesian dynamic model. The model parameters to translate, rotate and scale typeface in computer system are controlled by state equation, and the matching optimization between handwriting and transformed typeface is employed by measurement equation. Finally, the new typeface, which is transformed from the original one and gains the best nonlinear and non-Gaussian optimization, is the beautification result of handwriting. Experimental results demonstrate the proposed framework provides a creative handwriting beautification methodology to improve visual acceptance.

  6. Logistic Stick-Breaking Process

    PubMed Central

    Ren, Lu; Du, Lan; Carin, Lawrence; Dunson, David B.

    2013-01-01

    A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. PMID:25258593

  7. A BAYESIAN SPATIAL AND TEMPORAL MODELING APPROACH TO MAPPING GEOGRAPHIC VARIATION IN MORTALITY RATES FOR SUBNATIONAL AREAS WITH R-INLA.

    PubMed

    Khana, Diba; Rossen, Lauren M; Hedegaard, Holly; Warner, Margaret

    2018-01-01

    Hierarchical Bayes models have been used in disease mapping to examine small scale geographic variation. State level geographic variation for less common causes of mortality outcomes have been reported however county level variation is rarely examined. Due to concerns about statistical reliability and confidentiality, county-level mortality rates based on fewer than 20 deaths are suppressed based on Division of Vital Statistics, National Center for Health Statistics (NCHS) statistical reliability criteria, precluding an examination of spatio-temporal variation in less common causes of mortality outcomes such as suicide rates (SRs) at the county level using direct estimates. Existing Bayesian spatio-temporal modeling strategies can be applied via Integrated Nested Laplace Approximation (INLA) in R to a large number of rare causes of mortality outcomes to enable examination of spatio-temporal variations on smaller geographic scales such as counties. This method allows examination of spatiotemporal variation across the entire U.S., even where the data are sparse. We used mortality data from 2005-2015 to explore spatiotemporal variation in SRs, as one particular application of the Bayesian spatio-temporal modeling strategy in R-INLA to predict year and county-specific SRs. Specifically, hierarchical Bayesian spatio-temporal models were implemented with spatially structured and unstructured random effects, correlated time effects, time varying confounders and space-time interaction terms in the software R-INLA, borrowing strength across both counties and years to produce smoothed county level SRs. Model-based estimates of SRs were mapped to explore geographic variation.

  8. Comparing Families of Dynamic Causal Models

    PubMed Central

    Penny, Will D.; Stephan, Klaas E.; Daunizeau, Jean; Rosa, Maria J.; Friston, Karl J.; Schofield, Thomas M.; Leff, Alex P.

    2010-01-01

    Mathematical models of scientific data can be formally compared using Bayesian model evidence. Previous applications in the biological sciences have mainly focussed on model selection in which one first selects the model with the highest evidence and then makes inferences based on the parameters of that model. This “best model” approach is very useful but can become brittle if there are a large number of models to compare, and if different subjects use different models. To overcome this shortcoming we propose the combination of two further approaches: (i) family level inference and (ii) Bayesian model averaging within families. Family level inference removes uncertainty about aspects of model structure other than the characteristic of interest. For example: What are the inputs to the system? Is processing serial or parallel? Is it linear or nonlinear? Is it mediated by a single, crucial connection? We apply Bayesian model averaging within families to provide inferences about parameters that are independent of further assumptions about model structure. We illustrate the methods using Dynamic Causal Models of brain imaging data. PMID:20300649

  9. Dynamic Latent Trait Models with Mixed Hidden Markov Structure for Mixed Longitudinal Outcomes.

    PubMed

    Zhang, Yue; Berhane, Kiros

    2016-01-01

    We propose a general Bayesian joint modeling approach to model mixed longitudinal outcomes from the exponential family for taking into account any differential misclassification that may exist among categorical outcomes. Under this framework, outcomes observed without measurement error are related to latent trait variables through generalized linear mixed effect models. The misclassified outcomes are related to the latent class variables, which represent unobserved real states, using mixed hidden Markov models (MHMM). In addition to enabling the estimation of parameters in prevalence, transition and misclassification probabilities, MHMMs capture cluster level heterogeneity. A transition modeling structure allows the latent trait and latent class variables to depend on observed predictors at the same time period and also on latent trait and latent class variables at previous time periods for each individual. Simulation studies are conducted to make comparisons with traditional models in order to illustrate the gains from the proposed approach. The new approach is applied to data from the Southern California Children Health Study (CHS) to jointly model questionnaire based asthma state and multiple lung function measurements in order to gain better insight about the underlying biological mechanism that governs the inter-relationship between asthma state and lung function development.

  10. A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

    PubMed

    Karabatsos, George

    2017-02-01

    Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.

  11. Bayesian state space models for dynamic genetic network construction across multiple tissues.

    PubMed

    Liang, Yulan; Kelemen, Arpad

    2016-08-01

    Construction of gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research for complex diseases while estimating the dynamic changes of the temporal correlations and non-stationarity are the keys in this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge for inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix time-variant and include temporal correlation structures in the covariance matrix estimations in the multivariate Bayesian state space models. The unevenly spaced short time courses with unseen time points are treated as hidden state variables. Hierarchical Bayesian approaches with various prior and hyper-prior models with Monte Carlo Markov Chain and Gibbs sampling algorithms are used to estimate the model parameters and the hidden state variables. We apply the proposed Hierarchical Bayesian state space models to multiple tissues (liver, skeletal muscle, and kidney) Affymetrix time course data sets following corticosteroid (CS) drug administration. Both simulation and real data analysis results show that the genomic changes over time and gene-gene interaction in response to CS treatment can be well captured by the proposed models. The proposed dynamic Hierarchical Bayesian state space modeling approaches could be expanded and applied to other large scale genomic data, such as next generation sequence (NGS) combined with real time and time varying electronic health record (EHR) for more comprehensive and robust systematic and network based analysis in order to transform big biomedical data into predictions and diagnostics for precision medicine and personalized healthcare with better decision making and patient outcomes.

  12. Differential Gene Expression (DEX) and Alternative Splicing Events (ASE) for Temporal Dynamic Processes Using HMMs and Hierarchical Bayesian Modeling Approaches.

    PubMed

    Oh, Sunghee; Song, Seongho

    2017-01-01

    In gene expression profile, data analysis pipeline is categorized into four levels, major downstream tasks, i.e., (1) identification of differential expression; (2) clustering co-expression patterns; (3) classification of subtypes of samples; and (4) detection of genetic regulatory networks, are performed posterior to preprocessing procedure such as normalization techniques. To be more specific, temporal dynamic gene expression data has its inherent feature, namely, two neighboring time points (previous and current state) are highly correlated with each other, compared to static expression data which samples are assumed as independent individuals. In this chapter, we demonstrate how HMMs and hierarchical Bayesian modeling methods capture the horizontal time dependency structures in time series expression profiles by focusing on the identification of differential expression. In addition, those differential expression genes and transcript variant isoforms over time detected in core prerequisite steps can be generally further applied in detection of genetic regulatory networks to comprehensively uncover dynamic repertoires in the aspects of system biology as the coupled framework.

  13. Impact of censoring on learning Bayesian networks in survival modelling.

    PubMed

    Stajduhar, Ivan; Dalbelo-Basić, Bojana; Bogunović, Nikola

    2009-11-01

    Bayesian networks are commonly used for presenting uncertainty and covariate interactions in an easily interpretable way. Because of their efficient inference and ability to represent causal relationships, they are an excellent choice for medical decision support systems in diagnosis, treatment, and prognosis. Although good procedures for learning Bayesian networks from data have been defined, their performance in learning from censored survival data has not been widely studied. In this paper, we explore how to use these procedures to learn about possible interactions between prognostic factors and their influence on the variate of interest. We study how censoring affects the probability of learning correct Bayesian network structures. Additionally, we analyse the potential usefulness of the learnt models for predicting the time-independent probability of an event of interest. We analysed the influence of censoring with a simulation on synthetic data sampled from randomly generated Bayesian networks. We used two well-known methods for learning Bayesian networks from data: a constraint-based method and a score-based method. We compared the performance of each method under different levels of censoring to those of the naive Bayes classifier and the proportional hazards model. We did additional experiments on several datasets from real-world medical domains. The machine-learning methods treated censored cases in the data as event-free. We report and compare results for several commonly used model evaluation metrics. On average, the proportional hazards method outperformed other methods in most censoring setups. As part of the simulation study, we also analysed structural similarities of the learnt networks. Heavy censoring, as opposed to no censoring, produces up to a 5% surplus and up to 10% missing total arcs. It also produces up to 50% missing arcs that should originally be connected to the variate of interest. Presented methods for learning Bayesian networks from data can be used to learn from censored survival data in the presence of light censoring (up to 20%) by treating censored cases as event-free. Given intermediate or heavy censoring, the learnt models become tuned to the majority class and would thus require a different approach.

  14. Trust from the past: Bayesian Personalized Ranking based Link Prediction in Knowledge Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Baichuan; Choudhury, Sutanay; Al-Hasan, Mohammad

    2016-02-01

    Estimating the confidence for a link is a critical task for Knowledge Graph construction. Link prediction, or predicting the likelihood of a link in a knowledge graph based on prior state is a key research direction within this area. We propose a Latent Feature Embedding based link recommendation model for prediction task and utilize Bayesian Personalized Ranking based optimization technique for learning models for each predicate. Experimental results on large-scale knowledge bases such as YAGO2 show that our approach achieves substantially higher performance than several state-of-art approaches. Furthermore, we also study the performance of the link prediction algorithm in termsmore » of topological properties of the Knowledge Graph and present a linear regression model to reason about its expected level of accuracy.« less

  15. An approach based on Hierarchical Bayesian Graphical Models for measurement interpretation under uncertainty

    NASA Astrophysics Data System (ADS)

    Skataric, Maja; Bose, Sandip; Zeroug, Smaine; Tilke, Peter

    2017-02-01

    It is not uncommon in the field of non-destructive evaluation that multiple measurements encompassing a variety of modalities are available for analysis and interpretation for determining the underlying states of nature of the materials or parts being tested. Despite and sometimes due to the richness of data, significant challenges arise in the interpretation manifested as ambiguities and inconsistencies due to various uncertain factors in the physical properties (inputs), environment, measurement device properties, human errors, and the measurement data (outputs). Most of these uncertainties cannot be described by any rigorous mathematical means, and modeling of all possibilities is usually infeasible for many real time applications. In this work, we will discuss an approach based on Hierarchical Bayesian Graphical Models (HBGM) for the improved interpretation of complex (multi-dimensional) problems with parametric uncertainties that lack usable physical models. In this setting, the input space of the physical properties is specified through prior distributions based on domain knowledge and expertise, which are represented as Gaussian mixtures to model the various possible scenarios of interest for non-destructive testing applications. Forward models are then used offline to generate the expected distribution of the proposed measurements which are used to train a hierarchical Bayesian network. In Bayesian analysis, all model parameters are treated as random variables, and inference of the parameters is made on the basis of posterior distribution given the observed data. Learned parameters of the posterior distribution obtained after the training can therefore be used to build an efficient classifier for differentiating new observed data in real time on the basis of pre-trained models. We will illustrate the implementation of the HBGM approach to ultrasonic measurements used for cement evaluation of cased wells in the oil industry.

  16. Detecting introgressive hybridization between free-ranging domestic dogs and wild wolves (Canis lupus) by admixture linkage disequilibrium analysis.

    PubMed

    Verardi, A; Lucchini, V; Randi, E

    2006-09-01

    Occasional crossbreeding between free-ranging domestic dogs and wild wolves (Canis lupus) has been detected in some European countries by mitochondrial DNA sequencing and genotyping unlinked microsatellite loci. Maternal and unlinked genomic markers, however, might underestimate the extent of introgressive hybridization, and their impacts on the preservation of wild wolf gene pools. In this study, we genotyped 220 presumed Italian wolves, 85 dogs and 7 known hybrids at 16 microsatellites belonging to four different linkage groups (plus four unlinked microsatellites). Population clustering and individual assignments were performed using a Bayesian procedure implemented in structure 2.1, which models the gametic disequilibrium arising between linked loci during admixtures, aiming to trace hybridization events further back in time and infer the population of origin of chromosomal blocks. Results indicate that (i) linkage disequilibrium was higher in wolves than in dogs; (ii) 11 out of 220 wolves (5.0%) were likely admixed, a proportion that is significantly higher than one admixed genotype in 107 wolves found previously in a study using unlinked markers; (iii) posterior maximum-likelihood estimates of the recombination parameter r revealed that introgression in Italian wolves is not recent, but could have continued for the last 70 (+/- 20) generations, corresponding to approximately 140-210 years. Bayesian clustering showed that, despite some admixture, wolf and dog gene pools remain sharply distinct (the average proportions of membership to wolf and dog clusters were Q(w) = 0.95 and Q(d) = 0.98, respectively), suggesting that hybridization was not frequent, and that introgression in nature is counteracted by behavioural or selective constraints.

  17. Is the cluster environment quenching the Seyfert activity in elliptical and spiral galaxies?

    NASA Astrophysics Data System (ADS)

    de Souza, R. S.; Dantas, M. L. L.; Krone-Martins, A.; Cameron, E.; Coelho, P.; Hattab, M. W.; de Val-Borro, M.; Hilbe, J. M.; Elliott, J.; Hagen, A.; COIN Collaboration

    2016-09-01

    We developed a hierarchical Bayesian model (HBM) to investigate how the presence of Seyfert activity relates to their environment, herein represented by the galaxy cluster mass, M200, and the normalized cluster centric distance, r/r200. We achieved this by constructing an unbiased sample of galaxies from the Sloan Digital Sky Survey, with morphological classifications provided by the Galaxy Zoo Project. A propensity score matching approach is introduced to control the effects of confounding variables: stellar mass, galaxy colour, and star formation rate. The connection between Seyfert-activity and environmental properties in the de-biased sample is modelled within an HBM framework using the so-called logistic regression technique, suitable for the analysis of binary data (e.g. whether or not a galaxy hosts an AGN). Unlike standard ordinary least square fitting methods, our methodology naturally allows modelling the probability of Seyfert-AGN activity in galaxies on their natural scale, I.e. as a binary variable. Furthermore, we demonstrate how an HBM can incorporate information of each particular galaxy morphological type in an unified framework. In elliptical galaxies our analysis indicates a strong correlation of Seyfert-AGN activity with r/r200, and a weaker correlation with the mass of the host cluster. In spiral galaxies these trends do not appear, suggesting that the link between Seyfert activity and the properties of spiral galaxies are independent of the environment.

  18. Patterns of glaucomatous visual field loss in sita fields automatically identified using independent component analysis.

    PubMed

    Goldbaum, Michael H; Jang, Gil-Jin; Bowd, Chris; Hao, Jiucang; Zangwill, Linda M; Liebmann, Jeffrey; Girkin, Christopher; Jung, Tzyy-Ping; Weinreb, Robert N; Sample, Pamela A

    2009-12-01

    To determine if the patterns uncovered with variational Bayesian-independent component analysis-mixture model (VIM) applied to a large set of normal and glaucomatous fields obtained with the Swedish Interactive Thresholding Algorithm (SITA) are distinct, recognizable, and useful for modeling the severity of the field loss. SITA fields were obtained with the Humphrey Visual Field Analyzer (Carl Zeiss Meditec, Inc, Dublin, California) on 1,146 normal eyes and 939 glaucoma eyes from subjects followed by the Diagnostic Innovations in Glaucoma Study and the African Descent and Glaucoma Evaluation Study. VIM modifies independent component analysis (ICA) to develop separate sets of ICA axes in the cluster of normal fields and the 2 clusters of abnormal fields. Of 360 models, the model with the best separation of normal and glaucomatous fields was chosen for creating the maximally independent axes. Grayscale displays of fields generated by VIM on each axis were compared. SITA fields most closely associated with each axis and displayed in grayscale were evaluated for consistency of pattern at all severities. The best VIM model had 3 clusters. Cluster 1 (1,193) was mostly normal (1,089, 95% specificity) and had 2 axes. Cluster 2 (596) contained mildly abnormal fields (513) and 2 axes; cluster 3 (323) held mostly moderately to severely abnormal fields (322) and 5 axes. Sensitivity for clusters 2 and 3 combined was 88.9%. The VIM-generated field patterns differed from each other and resembled glaucomatous defects (eg, nasal step, arcuate, temporal wedge). SITA fields assigned to an axis resembled each other and the VIM-generated patterns for that axis. Pattern severity increased in the positive direction of each axis by expansion or deepening of the axis pattern. VIM worked well on SITA fields, separating them into distinctly different yet recognizable patterns of glaucomatous field defects. The axis and pattern properties make VIM a good candidate as a preliminary process for detecting progression.

  19. Photometrically-derived properties of massive-star clusters obtained with different massive-star evolution tracks and deterministic models

    NASA Astrophysics Data System (ADS)

    Wofford, Aida; Charlot, Stéphane; Eldridge, John

    2015-08-01

    We compute libraries of stellar + nebular spectra of populations of coeval stars with ages of <100 Myr and metallicities of Z=0.001 to 0.040, using different sets of massive-star evolution tracks, i.e., new Padova tracks for single non-rotating stars, the Geneva tracks for single non-rotating and rotating stars, and the Auckland tracks for single non-rotating and binary stars. For the stellar component, we use population synthesis codes galaxev, starburst99, and BPASS, depending on the set of tracks. For the nebular component we use photoionization code cloudy. From these spectra, we obtain magnitudes in filters F275W, F336W, F438W, F547M, F555W, F657N, and F814W of the Hubble Space Telescope (HST) Wide Field Camera Three. We use i) our computed magnitudes, ii) new multi-band photometry of massive-star clusters in nearby (<11 Mpc) galaxies spanning the metallicity range 12+log(O/H)=7.2-9.2, observed as part of HST programs 13364 (PI Calzetti) and 13773 (PI Chandar), and iii) Bayesian inference to a) establish how well the different models are able to constrain the metallicities, extinctions, ages, and masses of the star clusters, b) quantify differences in the cluster properties obtained with the different models, and c) assess how properties of lower-mass clusters are affected by the stochastic sampling of the IMF. In our models, the stellar evolution tracks, stellar atmospheres, and nebulae have similar chemical compositions. Different metallicities are available with different sets of tracks and we compare results from models of similar metallicities. Our results have implications for studies of the formation and evolution of star clusters, the cluster age and mass functions, and the star formation histories of galaxies.

  20. Bayesian-MCMC-based parameter estimation of stealth aircraft RCS models

    NASA Astrophysics Data System (ADS)

    Xia, Wei; Dai, Xiao-Xia; Feng, Yuan

    2015-12-01

    When modeling a stealth aircraft with low RCS (Radar Cross Section), conventional parameter estimation methods may cause a deviation from the actual distribution, owing to the fact that the characteristic parameters are estimated via directly calculating the statistics of RCS. The Bayesian-Markov Chain Monte Carlo (Bayesian-MCMC) method is introduced herein to estimate the parameters so as to improve the fitting accuracies of fluctuation models. The parameter estimations of the lognormal and the Legendre polynomial models are reformulated in the Bayesian framework. The MCMC algorithm is then adopted to calculate the parameter estimates. Numerical results show that the distribution curves obtained by the proposed method exhibit improved consistence with the actual ones, compared with those fitted by the conventional method. The fitting accuracy could be improved by no less than 25% for both fluctuation models, which implies that the Bayesian-MCMC method might be a good candidate among the optimal parameter estimation methods for stealth aircraft RCS models. Project supported by the National Natural Science Foundation of China (Grant No. 61101173), the National Basic Research Program of China (Grant No. 613206), the National High Technology Research and Development Program of China (Grant No. 2012AA01A308), the State Scholarship Fund by the China Scholarship Council (CSC), and the Oversea Academic Training Funds, and University of Electronic Science and Technology of China (UESTC).

  1. A new approach for handling longitudinal count data with zero-inflation and overdispersion: poisson geometric process model.

    PubMed

    Wan, Wai-Yin; Chan, Jennifer S K

    2009-08-01

    For time series of count data, correlated measurements, clustering as well as excessive zeros occur simultaneously in biomedical applications. Ignoring such effects might contribute to misleading treatment outcomes. A generalized mixture Poisson geometric process (GMPGP) model and a zero-altered mixture Poisson geometric process (ZMPGP) model are developed from the geometric process model, which was originally developed for modelling positive continuous data and was extended to handle count data. These models are motivated by evaluating the trend development of new tumour counts for bladder cancer patients as well as by identifying useful covariates which affect the count level. The models are implemented using Bayesian method with Markov chain Monte Carlo (MCMC) algorithms and are assessed using deviance information criterion (DIC).

  2. Bayesian Peptide Peak Detection for High Resolution TOF Mass Spectrometry.

    PubMed

    Zhang, Jianqiu; Zhou, Xiaobo; Wang, Honghui; Suffredini, Anthony; Zhang, Lin; Huang, Yufei; Wong, Stephen

    2010-11-01

    In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with resolution of 10 000-15 000 full width at half-maximum (FWHW). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with expert's visual inspection. Moreover, better detection consistency is achieved across MS datasets from patients with identical pathological condition.

  3. Bayesian Peptide Peak Detection for High Resolution TOF Mass Spectrometry

    PubMed Central

    Zhang, Jianqiu; Zhou, Xiaobo; Wang, Honghui; Suffredini, Anthony; Zhang, Lin; Huang, Yufei; Wong, Stephen

    2011-01-01

    In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with resolution of 10 000–15 000 full width at half-maximum (FWHW). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with expert’s visual inspection. Moreover, better detection consistency is achieved across MS datasets from patients with identical pathological condition. PMID:21544266

  4. Time series forecasting using ERNN and QR based on Bayesian model averaging

    NASA Astrophysics Data System (ADS)

    Pwasong, Augustine; Sathasivam, Saratha

    2017-08-01

    The Bayesian model averaging technique is a multi-model combination technique. The technique was employed to amalgamate the Elman recurrent neural network (ERNN) technique with the quadratic regression (QR) technique. The amalgamation produced a hybrid technique known as the hybrid ERNN-QR technique. The potentials of forecasting with the hybrid technique are compared with the forecasting capabilities of individual techniques of ERNN and QR. The outcome revealed that the hybrid technique is superior to the individual techniques in the mean square error sense.

  5. Congruent population structure inferred from dispersal behaviour and intensive genetic surveys of the threatened Florida scrub-jay (Aphelocoma cœrulescens)

    USGS Publications Warehouse

    Coulon, A.; Fitzpatrick, J.W.; Bowman, R.; Stith, B.M.; Makarewich, C.A.; Stenzler, L.M.; Lovette, I.J.

    2008-01-01

    The delimitation of populations, defined as groups of individuals linked by gene flow, is possible by the analysis of genetic markers and also by spatial models based on dispersal probabilities across a landscape. We combined these two complimentary methods to define the spatial pattern of genetic structure among remaining populations of the threatened Florida scrub-jay, a species for which dispersal ability is unusually well-characterized. The range-wide population was intensively censused in the 1990s, and a metapopulation model defined population boundaries based on predicted dispersal-mediated demographic connectivity. We subjected genotypes from more than 1000 individual jays screened at 20 microsatellite loci to two Bayesian clustering methods. We describe a consensus method for identifying common features across many replicated clustering runs. Ten genetically differentiated groups exist across the present-day range of the Florida scrub-jay. These groups are largely consistent with the dispersal-defined metapopulations, which assume very limited dispersal ability. Some genetic groups comprise more than one metapopulation, likely because these genetically similar metapopulations were sundered only recently by habitat alteration. The combined reconstructions of population structure based on genetics and dispersal-mediated demographic connectivity provide a robust depiction of the current genetic and demographic organization of this species, reflecting past and present levels of dispersal among occupied habitat patches. The differentiation of populations into 10 genetic groups adds urgency to management efforts aimed at preserving what remains of genetic variation in this dwindling species, by maintaining viable populations of all genetically differentiated and geographically isolated populations.

  6. A new Bayesian recursive technique for parameter estimation

    NASA Astrophysics Data System (ADS)

    Kaheil, Yasir H.; Gill, M. Kashif; McKee, Mac; Bastidas, Luis

    2006-08-01

    The performance of any model depends on how well its associated parameters are estimated. In the current application, a localized Bayesian recursive estimation (LOBARE) approach is devised for parameter estimation. The LOBARE methodology is an extension of the Bayesian recursive estimation (BARE) method. It is applied in this paper on two different types of models: an artificial intelligence (AI) model in the form of a support vector machine (SVM) application for forecasting soil moisture and a conceptual rainfall-runoff (CRR) model represented by the Sacramento soil moisture accounting (SAC-SMA) model. Support vector machines, based on statistical learning theory (SLT), represent the modeling task as a quadratic optimization problem and have already been used in various applications in hydrology. They require estimation of three parameters. SAC-SMA is a very well known model that estimates runoff. It has a 13-dimensional parameter space. In the LOBARE approach presented here, Bayesian inference is used in an iterative fashion to estimate the parameter space that will most likely enclose a best parameter set. This is done by narrowing the sampling space through updating the "parent" bounds based on their fitness. These bounds are actually the parameter sets that were selected by BARE runs on subspaces of the initial parameter space. The new approach results in faster convergence toward the optimal parameter set using minimum training/calibration data and fewer sets of parameter values. The efficacy of the localized methodology is also compared with the previously used BARE algorithm.

  7. Evidence of new species for malaria vector Anopheles nuneztovari sensu lato in the Brazilian Amazon region.

    PubMed

    Scarpassa, Vera Margarete; Cunha-Machado, Antonio Saulo; Saraiva, José Ferreira

    2016-04-12

    Anopheles nuneztovari sensu lato comprises cryptic species in northern South America, and the Brazilian populations encompass distinct genetic lineages within the Brazilian Amazon region. This study investigated, based on two molecular markers, whether these lineages might actually deserve species status. Specimens were collected in five localities of the Brazilian Amazon, including Manaus, Careiro Castanho and Autazes, in the State of Amazonas; Tucuruí, in the State of Pará; and Abacate da Pedreira, in the State of Amapá, and analysed for the COI gene (Barcode region) and 12 microsatellite loci. Phylogenetic analyses were performed using the maximum likelihood (ML) approach. Intra and inter samples genetic diversity were estimated using population genetics analyses, and the genetic groups were identified by means of the ML, Bayesian and factorial correspondence analyses and the Bayesian analysis of population structure. The Barcode region dataset (N = 103) generated 27 haplotypes. The haplotype network suggested three lineages. The ML tree retrieved five monophyletic groups. Group I clustered all specimens from Manaus and Careiro Castanho, the majority of Autazes and a few from Abacate da Pedreira. Group II clustered most of the specimens from Abacate da Pedreira and a few from Autazes and Tucuruí. Group III clustered only specimens from Tucuruí (lineage III), strongly supported (97 %). Groups IV and V clustered specimens of A. nuneztovari s.s. and A. dunhami, strongly (98 %) and weakly (70 %) supported, respectively. In the second phylogenetic analysis, the sequences from GenBank, identified as A. goeldii, clustered to groups I and II, but not to group III. Genetic distances (Kimura-2 parameters) among the groups ranged from 1.60 % (between I and II) to 2.32 % (between I and III). Microsatellite data revealed very high intra-population genetic variability. Genetic distances showed the highest and significant values (P = 0.005) between Tucuruí and all the other samples, and between Abacate da Pedreira and all the other samples. Genetic distances, Bayesian (Structure and BAPS) analyses and FCA suggested three distinct biological groups, supporting the barcode region results. The two markers revealed three genetic lineages for A. nuneztovari s.l. in the Brazilian Amazon region. Lineages I and II may represent genetically distinct groups or species within A. goeldii. Lineage III may represent a new species, distinct from the A. goeldii group, and may be the most ancestral in the Brazilian Amazon. They may have differences in Plasmodium susceptibility and should therefore be investigated further.

  8. Improving satellite-based PM2.5 estimates in China using Gaussian processes modeling in a Bayesian hierarchical setting.

    PubMed

    Yu, Wenxi; Liu, Yang; Ma, Zongwei; Bi, Jun

    2017-08-01

    Using satellite-based aerosol optical depth (AOD) measurements and statistical models to estimate ground-level PM 2.5 is a promising way to fill the areas that are not covered by ground PM 2.5 monitors. The statistical models used in previous studies are primarily Linear Mixed Effects (LME) and Geographically Weighted Regression (GWR) models. In this study, we developed a new regression model between PM 2.5 and AOD using Gaussian processes in a Bayesian hierarchical setting. Gaussian processes model the stochastic nature of the spatial random effects, where the mean surface and the covariance function is specified. The spatial stochastic process is incorporated under the Bayesian hierarchical framework to explain the variation of PM 2.5 concentrations together with other factors, such as AOD, spatial and non-spatial random effects. We evaluate the results of our model and compare them with those of other, conventional statistical models (GWR and LME) by within-sample model fitting and out-of-sample validation (cross validation, CV). The results show that our model possesses a CV result (R 2  = 0.81) that reflects higher accuracy than that of GWR and LME (0.74 and 0.48, respectively). Our results indicate that Gaussian process models have the potential to improve the accuracy of satellite-based PM 2.5 estimates.

  9. Neuron’s eye view: Inferring features of complex stimuli from neural responses

    PubMed Central

    Chen, Xin; Beck, Jeffrey M.

    2017-01-01

    Experiments that study neural encoding of stimuli at the level of individual neurons typically choose a small set of features present in the world—contrast and luminance for vision, pitch and intensity for sound—and assemble a stimulus set that systematically varies along these dimensions. Subsequent analysis of neural responses to these stimuli typically focuses on regression models, with experimenter-controlled features as predictors and spike counts or firing rates as responses. Unfortunately, this approach requires knowledge in advance about the relevant features coded by a given population of neurons. For domains as complex as social interaction or natural movement, however, the relevant feature space is poorly understood, and an arbitrary a priori choice of features may give rise to confirmation bias. Here, we present a Bayesian model for exploratory data analysis that is capable of automatically identifying the features present in unstructured stimuli based solely on neuronal responses. Our approach is unique within the class of latent state space models of neural activity in that it assumes that firing rates of neurons are sensitive to multiple discrete time-varying features tied to the stimulus, each of which has Markov (or semi-Markov) dynamics. That is, we are modeling neural activity as driven by multiple simultaneous stimulus features rather than intrinsic neural dynamics. We derive a fast variational Bayesian inference algorithm and show that it correctly recovers hidden features in synthetic data, as well as ground-truth stimulus features in a prototypical neural dataset. To demonstrate the utility of the algorithm, we also apply it to cluster neural responses and demonstrate successful recovery of features corresponding to monkeys and faces in the image set. PMID:28827790

  10. Genetic and morphological characterisation of the Ankole Longhorn cattle in the African Great Lakes region

    PubMed Central

    Ndumu, Deo B; Baumung, Roswitha; Hanotte, Olivier; Wurzinger, Maria; Okeyo, Mwai A; Jianlin, Han; Kibogo, Harrison; Sölkner, Johann

    2008-01-01

    The study investigated the population structure, diversity and differentiation of almost all of the ecotypes representing the African Ankole Longhorn cattle breed on the basis of morphometric (shape and size), genotypic and spatial distance data. Twentyone morphometric measurements were used to describe the morphology of 439 individuals from 11 sub-populations located in five countries around the Great Lakes region of central and eastern Africa. Additionally, 472 individuals were genotyped using 15 DNA microsatellites. Femoral length, horn length, horn circumference, rump height, body length and fore-limb circumference showed the largest differences between regions. An overall FST index indicated that 2.7% of the total genetic variation was present among sub-populations. The least differentiation was observed between the two sub-populations of Mbarara south and Luwero in Uganda, while the highest level of differentiation was observed between the Mugamba in Burundi and Malagarasi in Tanzania. An estimated membership of four for the inferred clusters from a model-based Bayesian approach was obtained. Both analyses on distance-based and model-based methods consistently isolated the Mugamba sub-population in Burundi from the others. PMID:18694545

  11. A Single-Cell Roadmap of Lineage Bifurcation in Human ESC Models of Embryonic Brain Development.

    PubMed

    Yao, Zizhen; Mich, John K; Ku, Sherman; Menon, Vilas; Krostag, Anne-Rachel; Martinez, Refugio A; Furchtgott, Leon; Mulholland, Heather; Bort, Susan; Fuqua, Margaret A; Gregor, Ben W; Hodge, Rebecca D; Jayabalu, Anu; May, Ryan C; Melton, Samuel; Nelson, Angelique M; Ngo, N Kiet; Shapovalova, Nadiya V; Shehata, Soraya I; Smith, Michael W; Tait, Leah J; Thompson, Carol L; Thomsen, Elliot R; Ye, Chaoyang; Glass, Ian A; Kaykas, Ajamete; Yao, Shuyuan; Phillips, John W; Grimley, Joshua S; Levi, Boaz P; Wang, Yanling; Ramanathan, Sharad

    2017-01-05

    During human brain development, multiple signaling pathways generate diverse cell types with varied regional identities. Here, we integrate single-cell RNA sequencing and clonal analyses to reveal lineage trees and molecular signals underlying early forebrain and mid/hindbrain cell differentiation from human embryonic stem cells (hESCs). Clustering single-cell transcriptomic data identified 41 distinct populations of progenitor, neuronal, and non-neural cells across our differentiation time course. Comparisons with primary mouse and human gene expression data demonstrated rostral and caudal progenitor and neuronal identities from early brain development. Bayesian analyses inferred a unified cell-type lineage tree that bifurcates between cortical and mid/hindbrain cell types. Two methods of clonal analyses confirmed these findings and further revealed the importance of Wnt/β-catenin signaling in controlling this lineage decision. Together, these findings provide a rich transcriptome-based lineage map for studying human brain development and modeling developmental disorders. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Genetic Structure of Bluefin Tuna in the Mediterranean Sea Correlates with Environmental Variables

    PubMed Central

    Riccioni, Giulia; Stagioni, Marco; Landi, Monica; Ferrara, Giorgia; Barbujani, Guido; Tinti, Fausto

    2013-01-01

    Background Atlantic Bluefin Tuna (ABFT) shows complex demography and ecological variation in the Mediterranean Sea. Genetic surveys have detected significant, although weak, signals of population structuring; catch series analyses and tagging programs identified complex ABFT spatial dynamics and migration patterns. Here, we tested the hypothesis that the genetic structure of the ABFT in the Mediterranean is correlated with mean surface temperature and salinity. Methodology We used six samples collected from Western and Central Mediterranean integrated with a new sample collected from the recently identified easternmost reproductive area of Levantine Sea. To assess population structure in the Mediterranean we used a multidisciplinary framework combining classical population genetics, spatial and Bayesian clustering methods and a multivariate approach based on factor analysis. Conclusions FST analysis and Bayesian clustering methods detected several subpopulations in the Mediterranean, a result also supported by multivariate analyses. In addition, we identified significant correlations of genetic diversity with mean salinity and surface temperature values revealing that ABFT is genetically structured along two environmental gradients. These results suggest that a preference for some spawning habitat conditions could contribute to shape ABFT genetic structuring in the Mediterranean. However, further studies should be performed to assess to what extent ABFT spawning behaviour in the Mediterranean Sea can be affected by environmental variation. PMID:24260341

  13. Efficient Bayesian hierarchical functional data analysis with basis function approximations using Gaussian-Wishart processes.

    PubMed

    Yang, Jingjing; Cox, Dennis D; Lee, Jong Soo; Ren, Peng; Choi, Taeryon

    2017-12-01

    Functional data are defined as realizations of random functions (mostly smooth functions) varying over a continuum, which are usually collected on discretized grids with measurement errors. In order to accurately smooth noisy functional observations and deal with the issue of high-dimensional observation grids, we propose a novel Bayesian method based on the Bayesian hierarchical model with a Gaussian-Wishart process prior and basis function representations. We first derive an induced model for the basis-function coefficients of the functional data, and then use this model to conduct posterior inference through Markov chain Monte Carlo methods. Compared to the standard Bayesian inference that suffers serious computational burden and instability in analyzing high-dimensional functional data, our method greatly improves the computational scalability and stability, while inheriting the advantage of simultaneously smoothing raw observations and estimating the mean-covariance functions in a nonparametric way. In addition, our method can naturally handle functional data observed on random or uncommon grids. Simulation and real studies demonstrate that our method produces similar results to those obtainable by the standard Bayesian inference with low-dimensional common grids, while efficiently smoothing and estimating functional data with random and high-dimensional observation grids when the standard Bayesian inference fails. In conclusion, our method can efficiently smooth and estimate high-dimensional functional data, providing one way to resolve the curse of dimensionality for Bayesian functional data analysis with Gaussian-Wishart processes. © 2017, The International Biometric Society.

  14. How to interpret the results of medical time series data analysis: Classical statistical approaches versus dynamic Bayesian network modeling.

    PubMed

    Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall

    2016-01-01

    Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis discusses also several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach is (1) much more flexible in terms of modeling effort, and (2) it offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.

  15. Predicting BCI subject performance using probabilistic spatio-temporal filters.

    PubMed

    Suk, Heung-Il; Fazli, Siamac; Mehnert, Jan; Müller, Klaus-Robert; Lee, Seong-Whan

    2014-01-01

    Recently, spatio-temporal filtering to enhance decoding for Brain-Computer-Interfacing (BCI) has become increasingly popular. In this work, we discuss a novel, fully Bayesian-and thereby probabilistic-framework, called Bayesian Spatio-Spectral Filter Optimization (BSSFO) and apply it to a large data set of 80 non-invasive EEG-based BCI experiments. Across the full frequency range, the BSSFO framework allows to analyze which spatio-spectral parameters are common and which ones differ across the subject population. As expected, large variability of brain rhythms is observed between subjects. We have clustered subjects according to similarities in their corresponding spectral characteristics from the BSSFO model, which is found to reflect their BCI performances well. In BCI, a considerable percentage of subjects is unable to use a BCI for communication, due to their missing ability to modulate their brain rhythms-a phenomenon sometimes denoted as BCI-illiteracy or inability. Predicting individual subjects' performance preceding the actual, time-consuming BCI-experiment enhances the usage of BCIs, e.g., by detecting users with BCI inability. This work additionally contributes by using the novel BSSFO method to predict the BCI-performance using only 2 minutes and 3 channels of resting-state EEG data recorded before the actual BCI-experiment. Specifically, by grouping the individual frequency characteristics we have nicely classified them into the subject 'prototypes' (like μ - or β -rhythm type subjects) or users without ability to communicate with a BCI, and then by further building a linear regression model based on the grouping we could predict subjects' performance with the maximum correlation coefficient of 0.581 with the performance later seen in the actual BCI session.

  16. Applications of Bayesian spectrum representation in acoustics

    NASA Astrophysics Data System (ADS)

    Botts, Jonathan M.

    This dissertation utilizes a Bayesian inference framework to enhance the solution of inverse problems where the forward model maps to acoustic spectra. A Bayesian solution to filter design inverts a acoustic spectra to pole-zero locations of a discrete-time filter model. Spatial sound field analysis with a spherical microphone array is a data analysis problem that requires inversion of spatio-temporal spectra to directions of arrival. As with many inverse problems, a probabilistic analysis results in richer solutions than can be achieved with ad-hoc methods. In the filter design problem, the Bayesian inversion results in globally optimal coefficient estimates as well as an estimate the most concise filter capable of representing the given spectrum, within a single framework. This approach is demonstrated on synthetic spectra, head-related transfer function spectra, and measured acoustic reflection spectra. The Bayesian model-based analysis of spatial room impulse responses is presented as an analogous problem with equally rich solution. The model selection mechanism provides an estimate of the number of arrivals, which is necessary to properly infer the directions of simultaneous arrivals. Although, spectrum inversion problems are fairly ubiquitous, the scope of this dissertation has been limited to these two and derivative problems. The Bayesian approach to filter design is demonstrated on an artificial spectrum to illustrate the model comparison mechanism and then on measured head-related transfer functions to show the potential range of application. Coupled with sampling methods, the Bayesian approach is shown to outperform least-squares filter design methods commonly used in commercial software, confirming the need for a global search of the parameter space. The resulting designs are shown to be comparable to those that result from global optimization methods, but the Bayesian approach has the added advantage of a filter length estimate within the same unified framework. The application to reflection data is useful for representing frequency-dependent impedance boundaries in finite difference acoustic simulations. Furthermore, since the filter transfer function is a parametric model, it can be modified to incorporate arbitrary frequency weighting and account for the band-limited nature of measured reflection spectra. Finally, the model is modified to compensate for dispersive error in the finite difference simulation, from the filter design process. Stemming from the filter boundary problem, the implementation of pressure sources in finite difference simulation is addressed in order to assure that schemes properly converge. A class of parameterized source functions is proposed and shown to offer straightforward control of residual error in the simulation. Guided by the notion that the solution to be approximated affects the approximation error, sources are designed which reduce residual dispersive error to the size of round-off errors. The early part of a room impulse response can be characterized by a series of isolated plane waves. Measured with an array of microphones, plane waves map to a directional response of the array or spatial intensity map. Probabilistic inversion of this response results in estimates of the number and directions of image source arrivals. The model-based inversion is shown to avoid ambiguities associated with peak-finding or inspection of the spatial intensity map. For this problem, determining the number of arrivals in a given frame is critical for properly inferring the state of the sound field. This analysis is effectively compression of the spatial room response, which is useful for analysis or encoding of the spatial sound field. Parametric, model-based formulations of these problems enhance the solution in all cases, and a Bayesian interpretation provides a principled approach to model comparison and parameter estimation. v

  17. A Comparison of Imputation Methods for Bayesian Factor Analysis Models

    ERIC Educational Resources Information Center

    Merkle, Edgar C.

    2011-01-01

    Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amiable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…

  18. Bayesian transformation cure frailty models with multivariate failure time data.

    PubMed

    Yin, Guosheng

    2008-12-10

    We propose a class of transformation cure frailty models to accommodate a survival fraction in multivariate failure time data. Established through a general power transformation, this family of cure frailty models includes the proportional hazards and the proportional odds modeling structures as two special cases. Within the Bayesian paradigm, we obtain the joint posterior distribution and the corresponding full conditional distributions of the model parameters for the implementation of Gibbs sampling. Model selection is based on the conditional predictive ordinate statistic and deviance information criterion. As an illustration, we apply the proposed method to a real data set from dentistry.

  19. Time-varying nonstationary multivariate risk analysis using a dynamic Bayesian copula

    NASA Astrophysics Data System (ADS)

    Sarhadi, Ali; Burn, Donald H.; Concepción Ausín, María.; Wiper, Michael P.

    2016-03-01

    A time-varying risk analysis is proposed for an adaptive design framework in nonstationary conditions arising from climate change. A Bayesian, dynamic conditional copula is developed for modeling the time-varying dependence structure between mixed continuous and discrete multiattributes of multidimensional hydrometeorological phenomena. Joint Bayesian inference is carried out to fit the marginals and copula in an illustrative example using an adaptive, Gibbs Markov Chain Monte Carlo (MCMC) sampler. Posterior mean estimates and credible intervals are provided for the model parameters and the Deviance Information Criterion (DIC) is used to select the model that best captures different forms of nonstationarity over time. This study also introduces a fully Bayesian, time-varying joint return period for multivariate time-dependent risk analysis in nonstationary environments. The results demonstrate that the nature and the risk of extreme-climate multidimensional processes are changed over time under the impact of climate change, and accordingly the long-term decision making strategies should be updated based on the anomalies of the nonstationary environment.

  20. Nonparametric Bayesian models through probit stick-breaking processes

    PubMed Central

    Rodríguez, Abel; Dunson, David B.

    2013-01-01

    We describe a novel class of Bayesian nonparametric priors based on stick-breaking constructions where the weights of the process are constructed as probit transformations of normal random variables. We show that these priors are extremely flexible, allowing us to generate a great variety of models while preserving computational simplicity. Particular emphasis is placed on the construction of rich temporal and spatial processes, which are applied to two problems in finance and ecology. PMID:24358072

  1. Nonparametric Bayesian models through probit stick-breaking processes.

    PubMed

    Rodríguez, Abel; Dunson, David B

    2011-03-01

    We describe a novel class of Bayesian nonparametric priors based on stick-breaking constructions where the weights of the process are constructed as probit transformations of normal random variables. We show that these priors are extremely flexible, allowing us to generate a great variety of models while preserving computational simplicity. Particular emphasis is placed on the construction of rich temporal and spatial processes, which are applied to two problems in finance and ecology.

  2. Bayesian calibration for forensic age estimation.

    PubMed

    Ferrante, Luigi; Skrami, Edlira; Gesuita, Rosaria; Cameriere, Roberto

    2015-05-10

    Forensic medicine is increasingly called upon to assess the age of individuals. Forensic age estimation is mostly required in relation to illegal immigration and identification of bodies or skeletal remains. A variety of age estimation methods are based on dental samples and use of regression models, where the age of an individual is predicted by morphological tooth changes that take place over time. From the medico-legal point of view, regression models, with age as the dependent random variable entail that age tends to be overestimated in the young and underestimated in the old. To overcome this bias, we describe a new full Bayesian calibration method (asymmetric Laplace Bayesian calibration) for forensic age estimation that uses asymmetric Laplace distribution as the probability model. The method was compared with three existing approaches (two Bayesian and a classical method) using simulated data. Although its accuracy was comparable with that of the other methods, the asymmetric Laplace Bayesian calibration appears to be significantly more reliable and robust in case of misspecification of the probability model. The proposed method was also applied to a real dataset of values of the pulp chamber of the right lower premolar measured on x-ray scans of individuals of known age. Copyright © 2015 John Wiley & Sons, Ltd.

  3. The First Comprehensive Phylogeny of Coptis (Ranunculaceae) and Its Implications for Character Evolution and Classification

    PubMed Central

    Xiang, Kun-Li; Wu, Sheng-Dan; Yu, Sheng-Xian; Liu, Yang; Jabbour, Florian; Erst, Andrey S.; Zhao, Liang; Wang, Wei; Chen, Zhi-Duan

    2016-01-01

    Coptis (Ranunculaceae) contains 15 species and is one of the pharmaceutically most important plant genera in eastern Asia. Understanding of the evolution of morphological characters and phylogenetic relationships within the genus is very limited. Here, we present the first comprehensive phylogenetic analysis of the genus based on two plastid and one nuclear markers. The phylogeny was reconstructed using Bayesian inference, as well as maximum parsimony and maximum likelihood methods. The Swofford-Olsen-Waddell-Hillis and Bayesian tests were used to assess the strength of the conflicts between traditional taxonomic units and those suggested by the phylogenetic inferences. Evolution of morphological characters was inferred using Bayesian method to identify synapomorphies for the infrageneric lineages. Our data recognize two strongly supported clades within Coptis. The first clade contains subgenus Coptis and section Japonocoptis of subgenus Metacoptis, supported by morphological characters, such as traits of the central leaflet base, petal color, and petal shape. The second clade consists of section Japonocoptis of subgenus Metacoptis. Coptis morii is not united with C. quinquefolia, in contrast with the view that C. morii is a synonym of C. quinquefolia. Two varieties of C. chinensis do not cluster together. Coptis groenlandica and C. lutescens are reduced to C. trifolia and C. japonica, respectively. Central leaflet base, sepal shape, and petal blade carry a strong phylogenetic signal in Coptis, while leaf type, sepal and petal color, and petal shape exhibit relatively higher levels of evolutionary flexibility. PMID:27044035

  4. Analysis of Spin Financial Market by GARCH Model

    NASA Astrophysics Data System (ADS)

    Takaishi, Tetsuya

    2013-08-01

    A spin model is used for simulations of financial markets. To determine return volatility in the spin financial market we use the GARCH model often used for volatility estimation in empirical finance. We apply the Bayesian inference performed by the Markov Chain Monte Carlo method to the parameter estimation of the GARCH model. It is found that volatility determined by the GARCH model exhibits "volatility clustering" also observed in the real financial markets. Using volatility determined by the GARCH model we examine the mixture-of-distribution hypothesis (MDH) suggested for the asset return dynamics. We find that the returns standardized by volatility are approximately standard normal random variables. Moreover we find that the absolute standardized returns show no significant autocorrelation. These findings are consistent with the view of the MDH for the return dynamics.

  5. Hierarchical Bayesian spatial models for multispecies conservation planning and monitoring.

    PubMed

    Carroll, Carlos; Johnson, Devin S; Dunk, Jeffrey R; Zielinski, William J

    2010-12-01

    Biologists who develop and apply habitat models are often familiar with the statistical challenges posed by their data's spatial structure but are unsure of whether the use of complex spatial models will increase the utility of model results in planning. We compared the relative performance of nonspatial and hierarchical Bayesian spatial models for three vertebrate and invertebrate taxa of conservation concern (Church's sideband snails [Monadenia churchi], red tree voles [Arborimus longicaudus], and Pacific fishers [Martes pennanti pacifica]) that provide examples of a range of distributional extents and dispersal abilities. We used presence-absence data derived from regional monitoring programs to develop models with both landscape and site-level environmental covariates. We used Markov chain Monte Carlo algorithms and a conditional autoregressive or intrinsic conditional autoregressive model framework to fit spatial models. The fit of Bayesian spatial models was between 35 and 55% better than the fit of nonspatial analogue models. Bayesian spatial models outperformed analogous models developed with maximum entropy (Maxent) methods. Although the best spatial and nonspatial models included similar environmental variables, spatial models provided estimates of residual spatial effects that suggested how ecological processes might structure distribution patterns. Spatial models built from presence-absence data improved fit most for localized endemic species with ranges constrained by poorly known biogeographic factors and for widely distributed species suspected to be strongly affected by unmeasured environmental variables or population processes. By treating spatial effects as a variable of interest rather than a nuisance, hierarchical Bayesian spatial models, especially when they are based on a common broad-scale spatial lattice (here the national Forest Inventory and Analysis grid of 24 km(2) hexagons), can increase the relevance of habitat models to multispecies conservation planning. Journal compilation © 2010 Society for Conservation Biology. No claim to original US government works.

  6. A label field fusion bayesian model and its penalized maximum rand estimator for image segmentation.

    PubMed

    Mignotte, Max

    2010-06-01

    This paper presents a novel segmentation approach based on a Markov random field (MRF) fusion model which aims at combining several segmentation results associated with simpler clustering models in order to achieve a more reliable and accurate segmentation result. The proposed fusion model is derived from the recently introduced probabilistic Rand measure for comparing one segmentation result to one or more manual segmentations of the same image. This non-parametric measure allows us to easily derive an appealing fusion model of label fields, easily expressed as a Gibbs distribution, or as a nonstationary MRF model defined on a complete graph. Concretely, this Gibbs energy model encodes the set of binary constraints, in terms of pairs of pixel labels, provided by each segmentation results to be fused. Combined with a prior distribution, this energy-based Gibbs model also allows for definition of an interesting penalized maximum probabilistic rand estimator with which the fusion of simple, quickly estimated, segmentation results appears as an interesting alternative to complex segmentation models existing in the literature. This fusion framework has been successfully applied on the Berkeley image database. The experiments reported in this paper demonstrate that the proposed method is efficient in terms of visual evaluation and quantitative performance measures and performs well compared to the best existing state-of-the-art segmentation methods recently proposed in the literature.

  7. Bayesian structural equation modeling in sport and exercise psychology.

    PubMed

    Stenling, Andreas; Ivarsson, Andreas; Johnson, Urban; Lindwall, Magnus

    2015-08-01

    Bayesian statistics is on the rise in mainstream psychology, but applications in sport and exercise psychology research are scarce. In this article, the foundations of Bayesian analysis are introduced, and we will illustrate how to apply Bayesian structural equation modeling in a sport and exercise psychology setting. More specifically, we contrasted a confirmatory factor analysis on the Sport Motivation Scale II estimated with the most commonly used estimator, maximum likelihood, and a Bayesian approach with weakly informative priors for cross-loadings and correlated residuals. The results indicated that the model with Bayesian estimation and weakly informative priors provided a good fit to the data, whereas the model estimated with a maximum likelihood estimator did not produce a well-fitting model. The reasons for this discrepancy between maximum likelihood and Bayesian estimation are discussed as well as potential advantages and caveats with the Bayesian approach.

  8. A simulation study on Bayesian Ridge regression models for several collinearity levels

    NASA Astrophysics Data System (ADS)

    Efendi, Achmad; Effrihan

    2017-12-01

    When analyzing data with multiple regression model if there are collinearities, then one or several predictor variables are usually omitted from the model. However, there sometimes some reasons, for instance medical or economic reasons, the predictors are all important and should be included in the model. Ridge regression model is not uncommon in some researches to use to cope with collinearity. Through this modeling, weights for predictor variables are used for estimating parameters. The next estimation process could follow the concept of likelihood. Furthermore, for the estimation nowadays the Bayesian version could be an alternative. This estimation method does not match likelihood one in terms of popularity due to some difficulties; computation and so forth. Nevertheless, with the growing improvement of computational methodology recently, this caveat should not at the moment become a problem. This paper discusses about simulation process for evaluating the characteristic of Bayesian Ridge regression parameter estimates. There are several simulation settings based on variety of collinearity levels and sample sizes. The results show that Bayesian method gives better performance for relatively small sample sizes, and for other settings the method does perform relatively similar to the likelihood method.

  9. A 3D model retrieval approach based on Bayesian networks lightfield descriptor

    NASA Astrophysics Data System (ADS)

    Xiao, Qinhan; Li, Yanjun

    2009-12-01

    A new 3D model retrieval methodology is proposed by exploiting a novel Bayesian networks lightfield descriptor (BNLD). There are two key novelties in our approach: (1) a BN-based method for building lightfield descriptor; and (2) a 3D model retrieval scheme based on the proposed BNLD. To overcome the disadvantages of the existing 3D model retrieval methods, we explore BN for building a new lightfield descriptor. Firstly, 3D model is put into lightfield, about 300 binary-views can be obtained along a sphere, then Fourier descriptors and Zernike moments descriptors can be calculated out from binaryviews. Then shape feature sequence would be learned into a BN model based on BN learning algorithm; Secondly, we propose a new 3D model retrieval method by calculating Kullback-Leibler Divergence (KLD) between BNLDs. Beneficial from the statistical learning, our BNLD is noise robustness as compared to the existing methods. The comparison between our method and the lightfield descriptor-based approach is conducted to demonstrate the effectiveness of our proposed methodology.

  10. Multilocus microsatellite typing shows three different genetic clusters of Leishmania major in Iran.

    PubMed

    Mahnaz, Tashakori; Al-Jawabreh, Amer; Kuhls, Katrin; Schönian, Gabriele

    2011-10-01

    Ten polymorphic microsatellite markers were used to analyse 25 strains of Leishmania major collected from cutaneous leishmaniasis cases in different endemic areas in Iran. Nine of the markers were polymorphic, revealing 21 different genotypes. The data displayed significant microsatellite polymorphism with rare allelic heterozygosity. Bayesian statistic and distance based analyses identified three genetic clusters among the 25 strains analysed. Cluster I represented mainly strains isolated in the west and south-west of Iran, with the exception of four strains originating from central Iran. Cluster II comprised strains from the central part of Iran, and cluster III included only strains from north Iran. The geographical distribution of L. major in Iran was supported by comparing the microsatellite profiles of the 25 Iranian strains to those of 105 strains collected in 19 Asian and African countries. The Iranian clusters I and II were separated from three previously described populations comprising strains from Africa, the Middle East and Central Asia whereas cluster III grouped together with the Central Asian population. The considerable genetic variability of L. major might be related to the existence of different populations of Phlebotomus papatasi and/or to differences in reservoir host abundance in different parts of Iran. Copyright © 2011 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  11. Bayesian model reduction and empirical Bayes for group (DCM) studies

    PubMed Central

    Friston, Karl J.; Litvak, Vladimir; Oswal, Ashwini; Razi, Adeel; Stephan, Klaas E.; van Wijk, Bernadette C.M.; Ziegler, Gabriel; Zeidman, Peter

    2016-01-01

    This technical note describes some Bayesian procedures for the analysis of group studies that use nonlinear models at the first (within-subject) level – e.g., dynamic causal models – and linear models at subsequent (between-subject) levels. Its focus is on using Bayesian model reduction to finesse the inversion of multiple models of a single dataset or a single (hierarchical or empirical Bayes) model of multiple datasets. These applications of Bayesian model reduction allow one to consider parametric random effects and make inferences about group effects very efficiently (in a few seconds). We provide the relatively straightforward theoretical background to these procedures and illustrate their application using a worked example. This example uses a simulated mismatch negativity study of schizophrenia. We illustrate the robustness of Bayesian model reduction to violations of the (commonly used) Laplace assumption in dynamic causal modelling and show how its recursive application can facilitate both classical and Bayesian inference about group differences. Finally, we consider the application of these empirical Bayesian procedures to classification and prediction. PMID:26569570

  12. A multimembership catalogue for 1876 open clusters using UCAC4 data

    NASA Astrophysics Data System (ADS)

    Sampedro, L.; Dias, W. S.; Alfaro, E. J.; Monteiro, H.; Molino, A.

    2017-10-01

    The main objective of this work is to determine the cluster members of 1876 open clusters, using positions and proper motions of the astrometric fourth United States Naval Observatory (USNO) CCD Astrograph Catalog (UCAC4). For this purpose, we apply three different methods, all based on a Bayesian approach, but with different formulations: a purely parametric method, another completely non-parametric algorithm and a third, recently developed by Sampedro & Alfaro, using both formulations at different steps of the whole process. The first and second statistical moments of the members' phase-space subspace, obtained after applying the three methods, are compared for every cluster. Although, on average, the three methods yield similar results, there are also specific differences between them, as well as for some particular clusters. The comparison with other published catalogues shows good agreement. We have also estimated, for the first time, the mean proper motion for a sample of 18 clusters. The results are organized in a single catalogue formed by two main files, one with the most relevant information for each cluster, partially including that in UCAC4, and the other showing the individual membership probabilities for each star in the cluster area. The final catalogue, with an interface design that enables an easy interaction with the user, is available in electronic format at the Stellar Systems Group (SSG-IAA) web site (http://ssg.iaa.es/en/content/sampedro-cluster-catalog).

  13. Bayesian effect estimation accounting for adjustment uncertainty.

    PubMed

    Wang, Chi; Parmigiani, Giovanni; Dominici, Francesca

    2012-09-01

    Model-based estimation of the effect of an exposure on an outcome is generally sensitive to the choice of which confounding factors are included in the model. We propose a new approach, which we call Bayesian adjustment for confounding (BAC), to estimate the effect of an exposure of interest on the outcome, while accounting for the uncertainty in the choice of confounders. Our approach is based on specifying two models: (1) the outcome as a function of the exposure and the potential confounders (the outcome model); and (2) the exposure as a function of the potential confounders (the exposure model). We consider Bayesian variable selection on both models and link the two by introducing a dependence parameter, ω, denoting the prior odds of including a predictor in the outcome model, given that the same predictor is in the exposure model. In the absence of dependence (ω= 1), BAC reduces to traditional Bayesian model averaging (BMA). In simulation studies, we show that BAC, with ω > 1, estimates the exposure effect with smaller bias than traditional BMA, and improved coverage. We, then, compare BAC, a recent approach of Crainiceanu, Dominici, and Parmigiani (2008, Biometrika 95, 635-651), and traditional BMA in a time series data set of hospital admissions, air pollution levels, and weather variables in Nassau, NY for the period 1999-2005. Using each approach, we estimate the short-term effects of on emergency admissions for cardiovascular diseases, accounting for confounding. This application illustrates the potentially significant pitfalls of misusing variable selection methods in the context of adjustment uncertainty. © 2012, The International Biometric Society.

  14. A Bayesian Framework for False Belief Reasoning in Children: A Rational Integration of Theory-Theory and Simulation Theory

    PubMed Central

    Asakura, Nobuhiko; Inui, Toshio

    2016-01-01

    Two apparently contrasting theories have been proposed to account for the development of children's theory of mind (ToM): theory-theory and simulation theory. We present a Bayesian framework that rationally integrates both theories for false belief reasoning. This framework exploits two internal models for predicting the belief states of others: one of self and one of others. These internal models are responsible for simulation-based and theory-based reasoning, respectively. The framework further takes into account empirical studies of a developmental ToM scale (e.g., Wellman and Liu, 2004): developmental progressions of various mental state understandings leading up to false belief understanding. By representing the internal models and their interactions as a causal Bayesian network, we formalize the model of children's false belief reasoning as probabilistic computations on the Bayesian network. This model probabilistically weighs and combines the two internal models and predicts children's false belief ability as a multiplicative effect of their early-developed abilities to understand the mental concepts of diverse beliefs and knowledge access. Specifically, the model predicts that children's proportion of correct responses on a false belief task can be closely approximated as the product of their proportions correct on the diverse belief and knowledge access tasks. To validate this prediction, we illustrate that our model provides good fits to a variety of ToM scale data for preschool children. We discuss the implications and extensions of our model for a deeper understanding of developmental progressions of children's ToM abilities. PMID:28082941

  15. A Bayesian Framework for False Belief Reasoning in Children: A Rational Integration of Theory-Theory and Simulation Theory.

    PubMed

    Asakura, Nobuhiko; Inui, Toshio

    2016-01-01

    Two apparently contrasting theories have been proposed to account for the development of children's theory of mind (ToM): theory-theory and simulation theory. We present a Bayesian framework that rationally integrates both theories for false belief reasoning. This framework exploits two internal models for predicting the belief states of others: one of self and one of others. These internal models are responsible for simulation-based and theory-based reasoning, respectively. The framework further takes into account empirical studies of a developmental ToM scale (e.g., Wellman and Liu, 2004): developmental progressions of various mental state understandings leading up to false belief understanding. By representing the internal models and their interactions as a causal Bayesian network, we formalize the model of children's false belief reasoning as probabilistic computations on the Bayesian network. This model probabilistically weighs and combines the two internal models and predicts children's false belief ability as a multiplicative effect of their early-developed abilities to understand the mental concepts of diverse beliefs and knowledge access. Specifically, the model predicts that children's proportion of correct responses on a false belief task can be closely approximated as the product of their proportions correct on the diverse belief and knowledge access tasks. To validate this prediction, we illustrate that our model provides good fits to a variety of ToM scale data for preschool children. We discuss the implications and extensions of our model for a deeper understanding of developmental progressions of children's ToM abilities.

  16. THE DYNAMICS OF MERGING CLUSTERS: A MONTE CARLO SOLUTION APPLIED TO THE BULLET AND MUSKET BALL CLUSTERS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dawson, William A., E-mail: wadawson@ucdavis.edu

    2013-08-01

    Merging galaxy clusters have become one of the most important probes of dark matter, providing evidence for dark matter over modified gravity and even constraints on the dark matter self-interaction cross-section. To properly constrain the dark matter cross-section it is necessary to understand the dynamics of the merger, as the inferred cross-section is a function of both the velocity of the collision and the observed time since collision. While the best understanding of merging system dynamics comes from N-body simulations, these are computationally intensive and often explore only a limited volume of the merger phase space allowed by observed parametermore » uncertainty. Simple analytic models exist but the assumptions of these methods invalidate their results near the collision time, plus error propagation of the highly correlated merger parameters is unfeasible. To address these weaknesses I develop a Monte Carlo method to discern the properties of dissociative mergers and propagate the uncertainty of the measured cluster parameters in an accurate and Bayesian manner. I introduce this method, verify it against an existing hydrodynamic N-body simulation, and apply it to two known dissociative mergers: 1ES 0657-558 (Bullet Cluster) and DLSCL J0916.2+2951 (Musket Ball Cluster). I find that this method surpasses existing analytic models-providing accurate (10% level) dynamic parameter and uncertainty estimates throughout the merger history. This, coupled with minimal required a priori information (subcluster mass, redshift, and projected separation) and relatively fast computation ({approx}6 CPU hours), makes this method ideal for large samples of dissociative merging clusters.« less

  17. Spatio Temporal EEG Source Imaging with the Hierarchical Bayesian Elastic Net and Elitist Lasso Models

    PubMed Central

    Paz-Linares, Deirel; Vega-Hernández, Mayrim; Rojas-López, Pedro A.; Valdés-Hernández, Pedro A.; Martínez-Montes, Eduardo; Valdés-Sosa, Pedro A.

    2017-01-01

    The estimation of EEG generating sources constitutes an Inverse Problem (IP) in Neuroscience. This is an ill-posed problem due to the non-uniqueness of the solution and regularization or prior information is needed to undertake Electrophysiology Source Imaging. Structured Sparsity priors can be attained through combinations of (L1 norm-based) and (L2 norm-based) constraints such as the Elastic Net (ENET) and Elitist Lasso (ELASSO) models. The former model is used to find solutions with a small number of smooth nonzero patches, while the latter imposes different degrees of sparsity simultaneously along different dimensions of the spatio-temporal matrix solutions. Both models have been addressed within the penalized regression approach, where the regularization parameters are selected heuristically, leading usually to non-optimal and computationally expensive solutions. The existing Bayesian formulation of ENET allows hyperparameter learning, but using the computationally intensive Monte Carlo/Expectation Maximization methods, which makes impractical its application to the EEG IP. While the ELASSO have not been considered before into the Bayesian context. In this work, we attempt to solve the EEG IP using a Bayesian framework for ENET and ELASSO models. We propose a Structured Sparse Bayesian Learning algorithm based on combining the Empirical Bayes and the iterative coordinate descent procedures to estimate both the parameters and hyperparameters. Using realistic simulations and avoiding the inverse crime we illustrate that our methods are able to recover complicated source setups more accurately and with a more robust estimation of the hyperparameters and behavior under different sparsity scenarios than classical LORETA, ENET and LASSO Fusion solutions. We also solve the EEG IP using data from a visual attention experiment, finding more interpretable neurophysiological patterns with our methods. The Matlab codes used in this work, including Simulations, Methods, Quality Measures and Visualization Routines are freely available in a public website. PMID:29200994

  18. Spatio Temporal EEG Source Imaging with the Hierarchical Bayesian Elastic Net and Elitist Lasso Models.

    PubMed

    Paz-Linares, Deirel; Vega-Hernández, Mayrim; Rojas-López, Pedro A; Valdés-Hernández, Pedro A; Martínez-Montes, Eduardo; Valdés-Sosa, Pedro A

    2017-01-01

    The estimation of EEG generating sources constitutes an Inverse Problem (IP) in Neuroscience. This is an ill-posed problem due to the non-uniqueness of the solution and regularization or prior information is needed to undertake Electrophysiology Source Imaging. Structured Sparsity priors can be attained through combinations of (L1 norm-based) and (L2 norm-based) constraints such as the Elastic Net (ENET) and Elitist Lasso (ELASSO) models. The former model is used to find solutions with a small number of smooth nonzero patches, while the latter imposes different degrees of sparsity simultaneously along different dimensions of the spatio-temporal matrix solutions. Both models have been addressed within the penalized regression approach, where the regularization parameters are selected heuristically, leading usually to non-optimal and computationally expensive solutions. The existing Bayesian formulation of ENET allows hyperparameter learning, but using the computationally intensive Monte Carlo/Expectation Maximization methods, which makes impractical its application to the EEG IP. While the ELASSO have not been considered before into the Bayesian context. In this work, we attempt to solve the EEG IP using a Bayesian framework for ENET and ELASSO models. We propose a Structured Sparse Bayesian Learning algorithm based on combining the Empirical Bayes and the iterative coordinate descent procedures to estimate both the parameters and hyperparameters. Using realistic simulations and avoiding the inverse crime we illustrate that our methods are able to recover complicated source setups more accurately and with a more robust estimation of the hyperparameters and behavior under different sparsity scenarios than classical LORETA, ENET and LASSO Fusion solutions. We also solve the EEG IP using data from a visual attention experiment, finding more interpretable neurophysiological patterns with our methods. The Matlab codes used in this work, including Simulations, Methods, Quality Measures and Visualization Routines are freely available in a public website.

  19. An introduction to using Bayesian linear regression with clinical data.

    PubMed

    Baldwin, Scott A; Larson, Michael J

    2017-11-01

    Statistical training psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods as well has how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Reliability modelling and analysis of a multi-state element based on a dynamic Bayesian network

    PubMed Central

    Xu, Tingxue; Gu, Junyuan; Dong, Qi; Fu, Linyu

    2018-01-01

    This paper presents a quantitative reliability modelling and analysis method for multi-state elements based on a combination of the Markov process and a dynamic Bayesian network (DBN), taking perfect repair, imperfect repair and condition-based maintenance (CBM) into consideration. The Markov models of elements without repair and under CBM are established, and an absorbing set is introduced to determine the reliability of the repairable element. According to the state-transition relations between the states determined by the Markov process, a DBN model is built. In addition, its parameters for series and parallel systems, namely, conditional probability tables, can be calculated by referring to the conditional degradation probabilities. Finally, the power of a control unit in a failure model is used as an example. A dynamic fault tree (DFT) is translated into a Bayesian network model, and subsequently extended to a DBN. The results show the state probabilities of an element and the system without repair, with perfect and imperfect repair, and under CBM, with an absorbing set plotted by differential equations and verified. Through referring forward, the reliability value of the control unit is determined in different kinds of modes. Finally, weak nodes are noted in the control unit. PMID:29765629

  1. Using Bayesian regression to test hypotheses about relationships between parameters and covariates in cognitive models.

    PubMed

    Boehm, Udo; Steingroever, Helen; Wagenmakers, Eric-Jan

    2018-06-01

    An important tool in the advancement of cognitive science are quantitative models that represent different cognitive variables in terms of model parameters. To evaluate such models, their parameters are typically tested for relationships with behavioral and physiological variables that are thought to reflect specific cognitive processes. However, many models do not come equipped with the statistical framework needed to relate model parameters to covariates. Instead, researchers often revert to classifying participants into groups depending on their values on the covariates, and subsequently comparing the estimated model parameters between these groups. Here we develop a comprehensive solution to the covariate problem in the form of a Bayesian regression framework. Our framework can be easily added to existing cognitive models and allows researchers to quantify the evidential support for relationships between covariates and model parameters using Bayes factors. Moreover, we present a simulation study that demonstrates the superiority of the Bayesian regression framework to the conventional classification-based approach.

  2. Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.

    PubMed

    Chatzis, Sotirios P; Andreou, Andreas S

    2015-11-01

    Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows of better handling uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.

  3. Prediction and assimilation of surf-zone processes using a Bayesian network: Part I: Forward models

    USGS Publications Warehouse

    Plant, Nathaniel G.; Holland, K. Todd

    2011-01-01

    Prediction of coastal processes, including waves, currents, and sediment transport, can be obtained from a variety of detailed geophysical-process models with many simulations showing significant skill. This capability supports a wide range of research and applied efforts that can benefit from accurate numerical predictions. However, the predictions are only as accurate as the data used to drive the models and, given the large temporal and spatial variability of the surf zone, inaccuracies in data are unavoidable such that useful predictions require corresponding estimates of uncertainty. We demonstrate how a Bayesian-network model can be used to provide accurate predictions of wave-height evolution in the surf zone given very sparse and/or inaccurate boundary-condition data. The approach is based on a formal treatment of a data-assimilation problem that takes advantage of significant reduction of the dimensionality of the model system. We demonstrate that predictions of a detailed geophysical model of the wave evolution are reproduced accurately using a Bayesian approach. In this surf-zone application, forward prediction skill was 83%, and uncertainties in the model inputs were accurately transferred to uncertainty in output variables. We also demonstrate that if modeling uncertainties were not conveyed to the Bayesian network (i.e., perfect data or model were assumed), then overly optimistic prediction uncertainties were computed. More consistent predictions and uncertainties were obtained by including model-parameter errors as a source of input uncertainty. Improved predictions (skill of 90%) were achieved because the Bayesian network simultaneously estimated optimal parameters while predicting wave heights.

  4. A probabilistic model framework for evaluating year-to-year variation in crop productivity

    NASA Astrophysics Data System (ADS)

    Yokozawa, M.; Iizumi, T.; Tao, F.

    2008-12-01

    Most models describing the relation between crop productivity and weather condition have so far been focused on mean changes of crop yield. For keeping stable food supply against abnormal weather as well as climate change, evaluating the year-to-year variations in crop productivity rather than the mean changes is more essential. We here propose a new framework of probabilistic model based on Bayesian inference and Monte Carlo simulation. As an example, we firstly introduce a model on paddy rice production in Japan. It is called PRYSBI (Process- based Regional rice Yield Simulator with Bayesian Inference; Iizumi et al., 2008). The model structure is the same as that of SIMRIW, which was developed and used widely in Japan. The model includes three sub- models describing phenological development, biomass accumulation and maturing of rice crop. These processes are formulated to include response nature of rice plant to weather condition. This model inherently was developed to predict rice growth and yield at plot paddy scale. We applied it to evaluate the large scale rice production with keeping the same model structure. Alternatively, we assumed the parameters as stochastic variables. In order to let the model catch up actual yield at larger scale, model parameters were determined based on agricultural statistical data of each prefecture of Japan together with weather data averaged over the region. The posterior probability distribution functions (PDFs) of parameters included in the model were obtained using Bayesian inference. The MCMC (Markov Chain Monte Carlo) algorithm was conducted to numerically solve the Bayesian theorem. For evaluating the year-to-year changes in rice growth/yield under this framework, we firstly iterate simulations with set of parameter values sampled from the estimated posterior PDF of each parameter and then take the ensemble mean weighted with the posterior PDFs. We will also present another example for maize productivity in China. The framework proposed here provides us information on uncertainties, possibilities and limitations on future improvements in crop model as well.

  5. Uncertainty Management for Diagnostics and Prognostics of Batteries using Bayesian Techniques

    NASA Technical Reports Server (NTRS)

    Saha, Bhaskar; Goebel, kai

    2007-01-01

    Uncertainty management has always been the key hurdle faced by diagnostics and prognostics algorithms. A Bayesian treatment of this problem provides an elegant and theoretically sound approach to the modern Condition- Based Maintenance (CBM)/Prognostic Health Management (PHM) paradigm. The application of the Bayesian techniques to regression and classification in the form of Relevance Vector Machine (RVM), and to state estimation as in Particle Filters (PF), provides a powerful tool to integrate the diagnosis and prognosis of battery health. The RVM, which is a Bayesian treatment of the Support Vector Machine (SVM), is used for model identification, while the PF framework uses the learnt model, statistical estimates of noise and anticipated operational conditions to provide estimates of remaining useful life (RUL) in the form of a probability density function (PDF). This type of prognostics generates a significant value addition to the management of any operation involving electrical systems.

  6. A Development of Nonstationary Regional Frequency Analysis Model with Large-scale Climate Information: Its Application to Korean Watershed

    NASA Astrophysics Data System (ADS)

    Kim, Jin-Young; Kwon, Hyun-Han; Kim, Hung-Soo

    2015-04-01

    The existing regional frequency analysis has disadvantages in that it is difficult to consider geographical characteristics in estimating areal rainfall. In this regard, this study aims to develop a hierarchical Bayesian model based nonstationary regional frequency analysis in that spatial patterns of the design rainfall with geographical information (e.g. latitude, longitude and altitude) are explicitly incorporated. This study assumes that the parameters of Gumbel (or GEV distribution) are a function of geographical characteristics within a general linear regression framework. Posterior distribution of the regression parameters are estimated by Bayesian Markov Chain Monte Carlo (MCMC) method, and the identified functional relationship is used to spatially interpolate the parameters of the distributions by using digital elevation models (DEM) as inputs. The proposed model is applied to derive design rainfalls over the entire Han-river watershed. It was found that the proposed Bayesian regional frequency analysis model showed similar results compared to L-moment based regional frequency analysis. In addition, the model showed an advantage in terms of quantifying uncertainty of the design rainfall and estimating the area rainfall considering geographical information. Finally, comprehensive discussion on design rainfall in the context of nonstationary will be presented. KEYWORDS: Regional frequency analysis, Nonstationary, Spatial information, Bayesian Acknowledgement This research was supported by a grant (14AWMP-B082564-01) from Advanced Water Management Research Program funded by Ministry of Land, Infrastructure and Transport of Korean government.

  7. Hot spots, cluster detection and spatial outlier analysis of teen birth rates in the U.S., 2003-2012.

    PubMed

    Khan, Diba; Rossen, Lauren M; Hamilton, Brady E; He, Yulei; Wei, Rong; Dienes, Erin

    2017-06-01

    Teen birth rates have evidenced a significant decline in the United States over the past few decades. Most of the states in the US have mirrored this national decline, though some reports have illustrated substantial variation in the magnitude of these decreases across the U.S. Importantly, geographic variation at the county level has largely not been explored. We used National Vital Statistics Births data and Hierarchical Bayesian space-time interaction models to produce smoothed estimates of teen birth rates at the county level from 2003-2012. Results indicate that teen birth rates show evidence of clustering, where hot and cold spots occur, and identify spatial outliers. Findings from this analysis may help inform efforts targeting the prevention efforts by illustrating how geographic patterns of teen birth rates have changed over the past decade and where clusters of high or low teen birth rates are evident. Published by Elsevier Ltd.

  8. Hot spots, cluster detection and spatial outlier analysis of teen birth rates in the U.S., 2003–2012

    PubMed Central

    Khan, Diba; Rossen, Lauren M.; Hamilton, Brady E.; He, Yulei; Wei, Rong; Dienes, Erin

    2017-01-01

    Teen birth rates have evidenced a significant decline in the United States over the past few decades. Most of the states in the US have mirrored this national decline, though some reports have illustrated substantial variation in the magnitude of these decreases across the U.S. Importantly, geographic variation at the county level has largely not been explored. We used National Vital Statistics Births data and Hierarchical Bayesian space-time interaction models to produce smoothed estimates of teen birth rates at the county level from 2003–2012. Results indicate that teen birth rates show evidence of clustering, where hot and cold spots occur, and identify spatial outliers. Findings from this analysis may help inform efforts targeting the prevention efforts by illustrating how geographic patterns of teen birth rates have changed over the past decade and where clusters of high or low teen birth rates are evident. PMID:28552189

  9. Maximum entropy perception-action space: a Bayesian model of eye movement selection

    NASA Astrophysics Data System (ADS)

    Colas, Francis; Bessière, Pierre; Girard, Benoît

    2011-03-01

    In this article, we investigate the issue of the selection of eye movements in a free-eye Multiple Object Tracking task. We propose a Bayesian model of retinotopic maps with a complex logarithmic mapping. This model is structured in two parts: a representation of the visual scene, and a decision model based on the representation. We compare different decision models based on different features of the representation and we show that taking into account uncertainty helps predict the eye movements of subjects recorded in a psychophysics experiment. Finally, based on experimental data, we postulate that the complex logarithmic mapping has a functional relevance, as the density of objects in this space in more uniform than expected. This may indicate that the representation space and control strategies are such that the object density is of maximum entropy.

  10. Bayesian networks improve causal environmental ...

    EPA Pesticide Factsheets

    Rule-based weight of evidence approaches to ecological risk assessment may not account for uncertainties and generally lack probabilistic integration of lines of evidence. Bayesian networks allow causal inferences to be made from evidence by including causal knowledge about the problem, using this knowledge with probabilistic calculus to combine multiple lines of evidence, and minimizing biases in predicting or diagnosing causal relationships. Too often, sources of uncertainty in conventional weight of evidence approaches are ignored that can be accounted for with Bayesian networks. Specifying and propagating uncertainties improve the ability of models to incorporate strength of the evidence in the risk management phase of an assessment. Probabilistic inference from a Bayesian network allows evaluation of changes in uncertainty for variables from the evidence. The network structure and probabilistic framework of a Bayesian approach provide advantages over qualitative approaches in weight of evidence for capturing the impacts of multiple sources of quantifiable uncertainty on predictions of ecological risk. Bayesian networks can facilitate the development of evidence-based policy under conditions of uncertainty by incorporating analytical inaccuracies or the implications of imperfect information, structuring and communicating causal issues through qualitative directed graph formulations, and quantitatively comparing the causal power of multiple stressors on value

  11. Reducing the Uncertainty in Atlantic Meridional Overturning Circulation Projections Using Bayesian Model Averaging

    NASA Astrophysics Data System (ADS)

    Olson, R.; An, S. I.

    2016-12-01

    Atlantic Meridional Overturning Circulation (AMOC) in the ocean might slow down in the future, which can lead to a host of climatic effects in North Atlantic and throughout the world. Despite improvements in climate models and availability of new observations, AMOC projections remain uncertain. Here we constrain CMIP5 multi-model ensemble output with observations of a recently developed AMOC index to provide improved Bayesian predictions of future AMOC. Specifically, we first calculate yearly AMOC index loosely based on Rahmstorf et al. (2015) for years 1880—2004 for both observations, and the CMIP5 models for which relevant output is available. We then assign a weight to each model based on a Bayesian Model Averaging method that accounts for differential model skill in terms of both mean state and variability. We include the temporal autocorrelation in climate model errors, and account for the uncertainty in the parameters of our statistical model. We use the weights to provide future weighted projections of AMOC, and compare them to un-weighted ones. Our projections use bootstrapping to account for uncertainty in internal AMOC variability. We also perform spectral and other statistical analyses to show that AMOC index variability, both in models and in observations, is consistent with red noise. Our results improve on and complement previous work by using a new ensemble of climate models, a different observational metric, and an improved Bayesian weighting method that accounts for differential model skill at reproducing internal variability. Reference: Rahmstorf, S., Box, J. E., Feulner, G., Mann, M. E., Robinson, A., Rutherford, S., & Schaffernicht, E. J. (2015). Exceptional twentieth-century slowdown in atlantic ocean overturning circulation. Nature Climate Change, 5(5), 475-480. doi:10.1038/nclimate2554

  12. Competing risk models in reliability systems, a weibull distribution model with bayesian analysis approach

    NASA Astrophysics Data System (ADS)

    Iskandar, Ismed; Satria Gondokaryono, Yudi

    2016-02-01

    In reliability theory, the most important problem is to determine the reliability of a complex system from the reliability of its components. The weakness of most reliability theories is that the systems are described and explained as simply functioning or failed. In many real situations, the failures may be from many causes depending upon the age and the environment of the system and its components. Another problem in reliability theory is one of estimating the parameters of the assumed failure models. The estimation may be based on data collected over censored or uncensored life tests. In many reliability problems, the failure data are simply quantitatively inadequate, especially in engineering design and maintenance system. The Bayesian analyses are more beneficial than the classical one in such cases. The Bayesian estimation analyses allow us to combine past knowledge or experience in the form of an apriori distribution with life test data to make inferences of the parameter of interest. In this paper, we have investigated the application of the Bayesian estimation analyses to competing risk systems. The cases are limited to the models with independent causes of failure by using the Weibull distribution as our model. A simulation is conducted for this distribution with the objectives of verifying the models and the estimators and investigating the performance of the estimators for varying sample size. The simulation data are analyzed by using Bayesian and the maximum likelihood analyses. The simulation results show that the change of the true of parameter relatively to another will change the value of standard deviation in an opposite direction. For a perfect information on the prior distribution, the estimation methods of the Bayesian analyses are better than those of the maximum likelihood. The sensitivity analyses show some amount of sensitivity over the shifts of the prior locations. They also show the robustness of the Bayesian analysis within the range between the true value and the maximum likelihood estimated value lines.

  13. Bayesian median regression for temporal gene expression data

    NASA Astrophysics Data System (ADS)

    Yu, Keming; Vinciotti, Veronica; Liu, Xiaohui; 't Hoen, Peter A. C.

    2007-09-01

    Most of the existing methods for the identification of biologically interesting genes in a temporal expression profiling dataset do not fully exploit the temporal ordering in the dataset and are based on normality assumptions for the gene expression. In this paper, we introduce a Bayesian median regression model to detect genes whose temporal profile is significantly different across a number of biological conditions. The regression model is defined by a polynomial function where both time and condition effects as well as interactions between the two are included. MCMC-based inference returns the posterior distribution of the polynomial coefficients. From this a simple Bayes factor test is proposed to test for significance. The estimation of the median rather than the mean, and within a Bayesian framework, increases the robustness of the method compared to a Hotelling T2-test previously suggested. This is shown on simulated data and on muscular dystrophy gene expression data.

  14. Comparison of 3 estimation methods of mycophenolic acid AUC based on a limited sampling strategy in renal transplant patients.

    PubMed

    Hulin, Anne; Blanchet, Benoît; Audard, Vincent; Barau, Caroline; Furlan, Valérie; Durrbach, Antoine; Taïeb, Fabrice; Lang, Philippe; Grimbert, Philippe; Tod, Michel

    2009-04-01

    A significant relationship between mycophenolic acid (MPA) area under the plasma concentration-time curve (AUC) and the risk for rejection has been reported. Based on 3 concentration measurements, 3 approaches have been proposed for the estimation of MPA AUC, involving either a multilinear regression approach model (MLRA) or a Bayesian estimation using either gamma absorption or zero-order absorption population models. The aim of the study was to compare the 3 approaches for the estimation of MPA AUC in 150 renal transplant patients treated with mycophenolate mofetil and tacrolimus. The population parameters were determined in 77 patients (learning study). The AUC estimation methods were compared in the learning population and in 73 patients from another center (validation study). In the latter study, the reference AUCs were estimated by the trapezoidal rule on 8 measurements. MPA concentrations were measured by liquid chromatography. The gamma absorption model gave the best fit. In the learning study, the AUCs estimated by both Bayesian methods were very similar, whereas the multilinear approach was highly correlated but yielded estimates about 20% lower than Bayesian methods. This resulted in dosing recommendations differing by 250 mg/12 h or more in 27% of cases. In the validation study, AUC estimates based on the Bayesian method with gamma absorption model and multilinear regression approach model were, respectively, 12% higher and 7% lower than the reference values. To conclude, the bicompartmental model with gamma absorption rate gave the best fit. The 3 AUC estimation methods are highly correlated but not concordant. For a given patient, the same estimation method should always be used.

  15. Bayesian models: A statistical primer for ecologists

    USGS Publications Warehouse

    Hobbs, N. Thompson; Hooten, Mevin B.

    2015-01-01

    Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods—in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach.Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals.This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management.Presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticiansCovers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and moreDeemphasizes computer coding in favor of basic principlesExplains how to write out properly factored statistical expressions representing Bayesian models

  16. Bayesian randomized clinical trials: From fixed to adaptive design.

    PubMed

    Yin, Guosheng; Lam, Chi Kin; Shi, Haolun

    2017-08-01

    Randomized controlled studies are the gold standard for phase III clinical trials. Using α-spending functions to control the overall type I error rate, group sequential methods are well established and have been dominating phase III studies. Bayesian randomized design, on the other hand, can be viewed as a complement instead of competitive approach to the frequentist methods. For the fixed Bayesian design, the hypothesis testing can be cast in the posterior probability or Bayes factor framework, which has a direct link to the frequentist type I error rate. Bayesian group sequential design relies upon Bayesian decision-theoretic approaches based on backward induction, which is often computationally intensive. Compared with the frequentist approaches, Bayesian methods have several advantages. The posterior predictive probability serves as a useful and convenient tool for trial monitoring, and can be updated at any time as the data accrue during the trial. The Bayesian decision-theoretic framework possesses a direct link to the decision making in the practical setting, and can be modeled more realistically to reflect the actual cost-benefit analysis during the drug development process. Other merits include the possibility of hierarchical modeling and the use of informative priors, which would lead to a more comprehensive utilization of information from both historical and longitudinal data. From fixed to adaptive design, we focus on Bayesian randomized controlled clinical trials and make extensive comparisons with frequentist counterparts through numerical studies. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Bayes' theorem application in the measure information diagnostic value assessment

    NASA Astrophysics Data System (ADS)

    Orzechowski, Piotr D.; Makal, Jaroslaw; Nazarkiewicz, Andrzej

    2006-03-01

    The paper presents Bayesian method application in the measure information diagnostic value assessment that is used in the computer-aided diagnosis system. The computer system described here has been created basing on the Bayesian Network and is used in Benign Prostatic Hyperplasia (BPH) diagnosis. The graphic diagnostic model enables to juxtapose experts' knowledge with data.

  18. A comparison of adaptive sampling designs and binary spatial models: A simulation study using a census of Bromus inermis

    USGS Publications Warehouse

    Irvine, Kathryn M.; Thornton, Jamie; Backus, Vickie M.; Hohmann, Matthew G.; Lehnhoff, Erik A.; Maxwell, Bruce D.; Michels, Kurt; Rew, Lisa

    2013-01-01

    Commonly in environmental and ecological studies, species distribution data are recorded as presence or absence throughout a spatial domain of interest. Field based studies typically collect observations by sampling a subset of the spatial domain. We consider the effects of six different adaptive and two non-adaptive sampling designs and choice of three binary models on both predictions to unsampled locations and parameter estimation of the regression coefficients (species–environment relationships). Our simulation study is unique compared to others to date in that we virtually sample a true known spatial distribution of a nonindigenous plant species, Bromus inermis. The census of B. inermis provides a good example of a species distribution that is both sparsely (1.9 % prevalence) and patchily distributed. We find that modeling the spatial correlation using a random effect with an intrinsic Gaussian conditionally autoregressive prior distribution was equivalent or superior to Bayesian autologistic regression in terms of predicting to un-sampled areas when strip adaptive cluster sampling was used to survey B. inermis. However, inferences about the relationships between B. inermis presence and environmental predictors differed between the two spatial binary models. The strip adaptive cluster designs we investigate provided a significant advantage in terms of Markov chain Monte Carlo chain convergence when trying to model a sparsely distributed species across a large area. In general, there was little difference in the choice of neighborhood, although the adaptive king was preferred when transects were randomly placed throughout the spatial domain.

  19. A bayesian approach for determining velocity and uncertainty estimates from seismic cone penetrometer testing or vertical seismic profiling data

    USGS Publications Warehouse

    Pidlisecky, Adam; Haines, S.S.

    2011-01-01

    Conventional processing methods for seismic cone penetrometer data present several shortcomings, most notably the absence of a robust velocity model uncertainty estimate. We propose a new seismic cone penetrometer testing (SCPT) data-processing approach that employs Bayesian methods to map measured data errors into quantitative estimates of model uncertainty. We first calculate travel-time differences for all permutations of seismic trace pairs. That is, we cross-correlate each trace at each measurement location with every trace at every other measurement location to determine travel-time differences that are not biased by the choice of any particular reference trace and to thoroughly characterize data error. We calculate a forward operator that accounts for the different ray paths for each measurement location, including refraction at layer boundaries. We then use a Bayesian inversion scheme to obtain the most likely slowness (the reciprocal of velocity) and a distribution of probable slowness values for each model layer. The result is a velocity model that is based on correct ray paths, with uncertainty bounds that are based on the data error. ?? NRC Research Press 2011.

  20. Analysis and comparison of safety models using average daily, average hourly, and microscopic traffic.

    PubMed

    Wang, Ling; Abdel-Aty, Mohamed; Wang, Xuesong; Yu, Rongjie

    2018-02-01

    There have been plenty of traffic safety studies based on average daily traffic (ADT), average hourly traffic (AHT), or microscopic traffic at 5 min intervals. Nevertheless, not enough research has compared the performance of these three types of safety studies, and seldom of previous studies have intended to find whether the results of one type of study is transferable to the other two studies. First, this study built three models: a Bayesian Poisson-lognormal model to estimate the daily crash frequency using ADT, a Bayesian Poisson-lognormal model to estimate the hourly crash frequency using AHT, and a Bayesian logistic regression model for the real-time safety analysis using microscopic traffic. The model results showed that the crash contributing factors found by different models were comparable but not the same. Four variables, i.e., the logarithm of volume, the standard deviation of speed, the logarithm of segment length, and the existence of diverge segment, were positively significant in the three models. Additionally, weaving segments experienced higher daily and hourly crash frequencies than merge and basic segments. Then, each of the ADT-based, AHT-based, and real-time models was used to estimate safety conditions at different levels: daily and hourly, meanwhile, the real-time model was also used in 5 min intervals. The results uncovered that the ADT- and AHT-based safety models performed similar in predicting daily and hourly crash frequencies, and the real-time safety model was able to provide hourly crash frequency. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. A heteroskedastic error covariance matrix estimator using a first-order conditional autoregressive Markov simulation for deriving asympotical efficient estimates from ecological sampled Anopheles arabiensis aquatic habitat covariates

    PubMed Central

    Jacob, Benjamin G; Griffith, Daniel A; Muturi, Ephantus J; Caamano, Erick X; Githure, John I; Novak, Robert J

    2009-01-01

    Background Autoregressive regression coefficients for Anopheles arabiensis aquatic habitat models are usually assessed using global error techniques and are reported as error covariance matrices. A global statistic, however, will summarize error estimates from multiple habitat locations. This makes it difficult to identify where there are clusters of An. arabiensis aquatic habitats of acceptable prediction. It is therefore useful to conduct some form of spatial error analysis to detect clusters of An. arabiensis aquatic habitats based on uncertainty residuals from individual sampled habitats. In this research, a method of error estimation for spatial simulation models was demonstrated using autocorrelation indices and eigenfunction spatial filters to distinguish among the effects of parameter uncertainty on a stochastic simulation of ecological sampled Anopheles aquatic habitat covariates. A test for diagnostic checking error residuals in an An. arabiensis aquatic habitat model may enable intervention efforts targeting productive habitats clusters, based on larval/pupal productivity, by using the asymptotic distribution of parameter estimates from a residual autocovariance matrix. The models considered in this research extends a normal regression analysis previously considered in the literature. Methods Field and remote-sampled data were collected during July 2006 to December 2007 in Karima rice-village complex in Mwea, Kenya. SAS 9.1.4® was used to explore univariate statistics, correlations, distributions, and to generate global autocorrelation statistics from the ecological sampled datasets. A local autocorrelation index was also generated using spatial covariance parameters (i.e., Moran's Indices) in a SAS/GIS® database. The Moran's statistic was decomposed into orthogonal and uncorrelated synthetic map pattern components using a Poisson model with a gamma-distributed mean (i.e. negative binomial regression). The eigenfunction values from the spatial configuration matrices were then used to define expectations for prior distributions using a Markov chain Monte Carlo (MCMC) algorithm. A set of posterior means were defined in WinBUGS 1.4.3®. After the model had converged, samples from the conditional distributions were used to summarize the posterior distribution of the parameters. Thereafter, a spatial residual trend analyses was used to evaluate variance uncertainty propagation in the model using an autocovariance error matrix. Results By specifying coefficient estimates in a Bayesian framework, the covariate number of tillers was found to be a significant predictor, positively associated with An. arabiensis aquatic habitats. The spatial filter models accounted for approximately 19% redundant locational information in the ecological sampled An. arabiensis aquatic habitat data. In the residual error estimation model there was significant positive autocorrelation (i.e., clustering of habitats in geographic space) based on log-transformed larval/pupal data and the sampled covariate depth of habitat. Conclusion An autocorrelation error covariance matrix and a spatial filter analyses can prioritize mosquito control strategies by providing a computationally attractive and feasible description of variance uncertainty estimates for correctly identifying clusters of prolific An. arabiensis aquatic habitats based on larval/pupal productivity. PMID:19772590

  2. Recent Transmission Clustering of HIV-1 C and CRF17_BF Strains Characterized by NNRTI-Related Mutations among Newly Diagnosed Men in Central Italy

    PubMed Central

    Orchi, Nicoletta; Gori, Caterina; Bertoli, Ada; Forbici, Federica; Montella, Francesco; Pennica, Alfredo; De Carli, Gabriella; Giuliani, Massimo; Continenza, Fabio; Pinnetti, Carmela; Nicastri, Emanuele; Ceccherini-Silberstein, Francesca; Mastroianni, Claudio Maria; Girardi, Enrico; Andreoni, Massimo; Antinori, Andrea; Santoro, Maria Mercedes; Perno, Carlo Federico

    2015-01-01

    Background Increased evidence of relevant HIV-1 epidemic transmission in European countries is being reported, with an increased circulation of non-B-subtypes. Here, we present two recent HIV-1 non-B transmission clusters characterized by NNRTI-related amino-acidic mutations among newly diagnosed HIV-1 infected men, living in Rome (Central-Italy). Methods Pol and V3 sequences were available at the time of diagnosis for all individuals. Maximum-Likelihood and Bayesian phylogenetic-trees with bootstrap and Bayesian-probability supports defined transmission-clusters. HIV-1 drug-resistance and V3-tropism were also evaluated. Results Among 534 new HIV-1 non-B cases, diagnosed from 2011 to 2014, in Central-Italy, 35 carried virus gathering in two distinct clusters, including 27 HIV-1 C and 8 CRF17_BF subtypes, respectively. Both clusters were centralized in Rome, and their origin was estimated to have been after 2007. All individuals within both clusters were males and 37.1% of them had been recently-infected. While C-cluster was entirely composed by Italian men-who-have-sex-with-men, with a median-age of 34 years (IQR:30–39), individuals in CRF17_BF-cluster were older, with a median-age of 51 years (IQR:48–59) and almost all reported sexual-contacts with men and women. All carried R5-tropic viruses, with evidence of atypical or resistance amino-acidic mutations related to NNRTI-drugs (K103Q in C-cluster, and K101E+E138K in CRF17_BF-cluster). Conclusions These two epidemiological clusters provided evidence of a strong and recent circulation of C and CRF17_BF strains in central Italy, characterized by NNRTI-related mutations among men engaging in high-risk behaviours. These findings underline the role of molecular epidemiology in identifying groups at increased risk of HIV-1 transmission, and in enhancing additional prevention efforts. PMID:26270824

  3. Stoffenmanager exposure model: company-specific exposure assessments using a Bayesian methodology.

    PubMed

    van de Ven, Peter; Fransman, Wouter; Schinkel, Jody; Rubingh, Carina; Warren, Nicholas; Tielemans, Erik

    2010-04-01

    The web-based tool "Stoffenmanager" was initially developed to assist small- and medium-sized enterprises in the Netherlands to make qualitative risk assessments and to provide advice on control at the workplace. The tool uses a mechanistic model to arrive at a "Stoffenmanager score" for exposure. In a recent study it was shown that variability in exposure measurements given a certain Stoffenmanager score is still substantial. This article discusses an extension to the tool that uses a Bayesian methodology for quantitative workplace/scenario-specific exposure assessment. This methodology allows for real exposure data observed in the company of interest to be combined with the prior estimate (based on the Stoffenmanager model). The output of the tool is a company-specific assessment of exposure levels for a scenario for which data is available. The Bayesian approach provides a transparent way of synthesizing different types of information and is especially preferred in situations where available data is sparse, as is often the case in small- and medium sized-enterprises. Real-world examples as well as simulation studies were used to assess how different parameters such as sample size, difference between prior and data, uncertainty in prior, and variance in the data affect the eventual posterior distribution of a Bayesian exposure assessment.

  4. An economic growth model based on financial credits distribution to the government economy priority sectors of each regency in Indonesia using hierarchical Bayesian method

    NASA Astrophysics Data System (ADS)

    Yasmirullah, Septia Devi Prihastuti; Iriawan, Nur; Sipayung, Feronika Rosalinda

    2017-11-01

    The success of regional economic establishment could be measured by economic growth. Since the Act No. 32 of 2004 has been implemented, unbalance economic among the regency in Indonesia is increasing. This condition is contrary different with the government goal to build society welfare through the economic activity development in each region. This research aims to examine economic growth through the distribution of bank credits to each Indonesia's regency. The data analyzed in this research is hierarchically structured data which follow normal distribution in first level. Two modeling approaches are employed in this research, a global-one level Bayesian approach and two-level hierarchical Bayesian approach. The result shows that hierarchical Bayesian has succeeded to demonstrate a better estimation than a global-one level Bayesian. It proves that the different economic growth in each province is significantly influenced by the variations of micro level characteristics in each province. These variations are significantly affected by cities and province characteristics in second level.

  5. A Bayesian hierarchical latent trait model for estimating rater bias and reliability in large-scale performance assessment

    PubMed Central

    2018-01-01

    We propose a novel approach to modelling rater effects in scoring-based assessment. The approach is based on a Bayesian hierarchical model and simulations from the posterior distribution. We apply it to large-scale essay assessment data over a period of 5 years. Empirical results suggest that the model provides a good fit for both the total scores and when applied to individual rubrics. We estimate the median impact of rater effects on the final grade to be ± 2 points on a 50 point scale, while 10% of essays would receive a score at least ± 5 different from their actual quality. Most of the impact is due to rater unreliability, not rater bias. PMID:29614129

  6. Bayesian model reduction and empirical Bayes for group (DCM) studies.

    PubMed

    Friston, Karl J; Litvak, Vladimir; Oswal, Ashwini; Razi, Adeel; Stephan, Klaas E; van Wijk, Bernadette C M; Ziegler, Gabriel; Zeidman, Peter

    2016-03-01

    This technical note describes some Bayesian procedures for the analysis of group studies that use nonlinear models at the first (within-subject) level - e.g., dynamic causal models - and linear models at subsequent (between-subject) levels. Its focus is on using Bayesian model reduction to finesse the inversion of multiple models of a single dataset or a single (hierarchical or empirical Bayes) model of multiple datasets. These applications of Bayesian model reduction allow one to consider parametric random effects and make inferences about group effects very efficiently (in a few seconds). We provide the relatively straightforward theoretical background to these procedures and illustrate their application using a worked example. This example uses a simulated mismatch negativity study of schizophrenia. We illustrate the robustness of Bayesian model reduction to violations of the (commonly used) Laplace assumption in dynamic causal modelling and show how its recursive application can facilitate both classical and Bayesian inference about group differences. Finally, we consider the application of these empirical Bayesian procedures to classification and prediction. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  7. Nonlinear finite element model updating for damage identification of civil structures using batch Bayesian estimation

    NASA Astrophysics Data System (ADS)

    Ebrahimian, Hamed; Astroza, Rodrigo; Conte, Joel P.; de Callafon, Raymond A.

    2017-02-01

    This paper presents a framework for structural health monitoring (SHM) and damage identification of civil structures. This framework integrates advanced mechanics-based nonlinear finite element (FE) modeling and analysis techniques with a batch Bayesian estimation approach to estimate time-invariant model parameters used in the FE model of the structure of interest. The framework uses input excitation and dynamic response of the structure and updates a nonlinear FE model of the structure to minimize the discrepancies between predicted and measured response time histories. The updated FE model can then be interrogated to detect, localize, classify, and quantify the state of damage and predict the remaining useful life of the structure. As opposed to recursive estimation methods, in the batch Bayesian estimation approach, the entire time history of the input excitation and output response of the structure are used as a batch of data to estimate the FE model parameters through a number of iterations. In the case of non-informative prior, the batch Bayesian method leads to an extended maximum likelihood (ML) estimation method to estimate jointly time-invariant model parameters and the measurement noise amplitude. The extended ML estimation problem is solved efficiently using a gradient-based interior-point optimization algorithm. Gradient-based optimization algorithms require the FE response sensitivities with respect to the model parameters to be identified. The FE response sensitivities are computed accurately and efficiently using the direct differentiation method (DDM). The estimation uncertainties are evaluated based on the Cramer-Rao lower bound (CRLB) theorem by computing the exact Fisher Information matrix using the FE response sensitivities with respect to the model parameters. The accuracy of the proposed uncertainty quantification approach is verified using a sampling approach based on the unscented transformation. Two validation studies, based on realistic structural FE models of a bridge pier and a moment resisting steel frame, are performed to validate the performance and accuracy of the presented nonlinear FE model updating approach and demonstrate its application to SHM. These validation studies show the excellent performance of the proposed framework for SHM and damage identification even in the presence of high measurement noise and/or way-out initial estimates of the model parameters. Furthermore, the detrimental effects of the input measurement noise on the performance of the proposed framework are illustrated and quantified through one of the validation studies.

  8. A Bayesian Hierarchical Model for Glacial Dynamics Based on the Shallow Ice Approximation and its Evaluation Using Analytical Solutions

    NASA Astrophysics Data System (ADS)

    Gopalan, Giri; Hrafnkelsson, Birgir; Aðalgeirsdóttir, Guðfinna; Jarosch, Alexander H.; Pálsson, Finnur

    2018-03-01

    Bayesian hierarchical modeling can assist the study of glacial dynamics and ice flow properties. This approach will allow glaciologists to make fully probabilistic predictions for the thickness of a glacier at unobserved spatio-temporal coordinates, and it will also allow for the derivation of posterior probability distributions for key physical parameters such as ice viscosity and basal sliding. The goal of this paper is to develop a proof of concept for a Bayesian hierarchical model constructed, which uses exact analytical solutions for the shallow ice approximation (SIA) introduced by Bueler et al. (2005). A suite of test simulations utilizing these exact solutions suggests that this approach is able to adequately model numerical errors and produce useful physical parameter posterior distributions and predictions. A byproduct of the development of the Bayesian hierarchical model is the derivation of a novel finite difference method for solving the SIA partial differential equation (PDE). An additional novelty of this work is the correction of numerical errors induced through a numerical solution using a statistical model. This error correcting process models numerical errors that accumulate forward in time and spatial variation of numerical errors between the dome, interior, and margin of a glacier.

  9. A Bayes linear Bayes method for estimation of correlated event rates.

    PubMed

    Quigley, John; Wilson, Kevin J; Walls, Lesley; Bedford, Tim

    2013-12-01

    Typically, full Bayesian estimation of correlated event rates can be computationally challenging since estimators are intractable. When estimation of event rates represents one activity within a larger modeling process, there is an incentive to develop more efficient inference than provided by a full Bayesian model. We develop a new subjective inference method for correlated event rates based on a Bayes linear Bayes model under the assumption that events are generated from a homogeneous Poisson process. To reduce the elicitation burden we introduce homogenization factors to the model and, as an alternative to a subjective prior, an empirical method using the method of moments is developed. Inference under the new method is compared against estimates obtained under a full Bayesian model, which takes a multivariate gamma prior, where the predictive and posterior distributions are derived in terms of well-known functions. The mathematical properties of both models are presented. A simulation study shows that the Bayes linear Bayes inference method and the full Bayesian model provide equally reliable estimates. An illustrative example, motivated by a problem of estimating correlated event rates across different users in a simple supply chain, shows how ignoring the correlation leads to biased estimation of event rates. © 2013 Society for Risk Analysis.

  10. Estimating the probability for major gene Alzheimer disease

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Farrer, L.A.; Cupples, L.A.

    1994-02-01

    Alzheimer disease (AD) is a neuropsychiatric illness caused by multiple etiologies. Prediction of whether AD is genetically based in a given family is problematic because of censoring bias among unaffected relatives as a consequence of the late onset of the disorder, diagnostic uncertainties, heterogeneity, and limited information in a single family. The authors have developed a method based on Bayesian probability to compute values for a continuous variable that ranks AD families as having a major gene form of AD (MGAD). In addition, they have compared the Bayesian method with a maximum-likelihood approach. These methods incorporate sex- and age-adjusted riskmore » estimates and allow for phenocopies and familial clustering of age on onset. Agreement is high between the two approaches for ranking families as MGAD (Spearman rank [r] = .92). When either method is used, the numerical outcomes are sensitive to assumptions of the gene frequency and cumulative incidence of the disease in the population. Consequently, risk estimates should be used cautiously for counseling purposes; however, there are numerous valid applications of these procedures in genetic and epidemiological studies. 41 refs., 4 figs., 3 tabs.« less

  11. CCD UBVRI photometry of NGC 6811

    NASA Astrophysics Data System (ADS)

    Yontan, T.; Bilir, S.; Bostancı, Z. F.; Ak, T.; Karaali, S.; Güver, T.; Ak, S.; Duran, Ş.; Paunzen, E.

    2015-02-01

    We present the results of CCD UBVRI observations of the open cluster NGC 6811 obtained on 18th July 2012 with the 1 m telescope at the TÜBİTAK National Observatory (TUG). Using these photometric results, we determine the structural and astrophysical parameters of the cluster. The mean photometric uncertainties are better than 0.02 mag in the V magnitude and B- V, V- R, and V- I colour indices to about 0.03 mag for U- B among stars brighter than magnitude V=18. Cluster member stars were separated from the field stars using the Galaxia model of Sharma et al. (2011) together with other techniques. The core radius of the cluster is found to be r c =3.60 arcmin. The astrophysical parameters were determined simultaneously via Bayesian statistics using the colour-magnitude diagrams V versus B- V, V versus V- I, V versus V- R, and V versus R- I of the cluster. The resulting most likely parameters were further confirmed using independent methods, removing any possible degeneracies. The colour excess, distance modulus, metallicity and the age of the cluster are determined simultaneously as E( B- V)=0.05±0.01 mag, μ=10.06±0.08 mag, [ M/ H]=-0.10±0.01 dex and t=1.00±0.05 Gyr, respectively. Distances of five red clump stars which were found to be members of the cluster further confirm our distance estimation.

  12. Quantifying model-structure- and parameter-driven uncertainties in spring wheat phenology prediction with Bayesian analysis

    DOE PAGES

    Alderman, Phillip D.; Stanfill, Bryan

    2016-10-06

    Recent international efforts have brought renewed emphasis on the comparison of different agricultural systems models. Thus far, analysis of model-ensemble simulated results has not clearly differentiated between ensemble prediction uncertainties due to model structural differences per se and those due to parameter value uncertainties. Additionally, despite increasing use of Bayesian parameter estimation approaches with field-scale crop models, inadequate attention has been given to the full posterior distributions for estimated parameters. The objectives of this study were to quantify the impact of parameter value uncertainty on prediction uncertainty for modeling spring wheat phenology using Bayesian analysis and to assess the relativemore » contributions of model-structure-driven and parameter-value-driven uncertainty to overall prediction uncertainty. This study used a random walk Metropolis algorithm to estimate parameters for 30 spring wheat genotypes using nine phenology models based on multi-location trial data for days to heading and days to maturity. Across all cases, parameter-driven uncertainty accounted for between 19 and 52% of predictive uncertainty, while model-structure-driven uncertainty accounted for between 12 and 64%. Here, this study demonstrated the importance of quantifying both model-structure- and parameter-value-driven uncertainty when assessing overall prediction uncertainty in modeling spring wheat phenology. More generally, Bayesian parameter estimation provided a useful framework for quantifying and analyzing sources of prediction uncertainty.« less

  13. HDDM: Hierarchical Bayesian estimation of the Drift-Diffusion Model in Python.

    PubMed

    Wiecki, Thomas V; Sofer, Imri; Frank, Michael J

    2013-01-01

    The diffusion model is a commonly used tool to infer latent psychological processes underlying decision-making, and to link them to neural mechanisms based on response times. Although efficient open source software has been made available to quantitatively fit the model to data, current estimation methods require an abundance of response time measurements to recover meaningful parameters, and only provide point estimates of each parameter. In contrast, hierarchical Bayesian parameter estimation methods are useful for enhancing statistical power, allowing for simultaneous estimation of individual subject parameters and the group distribution that they are drawn from, while also providing measures of uncertainty in these parameters in the posterior distribution. Here, we present a novel Python-based toolbox called HDDM (hierarchical drift diffusion model), which allows fast and flexible estimation of the the drift-diffusion model and the related linear ballistic accumulator model. HDDM requires fewer data per subject/condition than non-hierarchical methods, allows for full Bayesian data analysis, and can handle outliers in the data. Finally, HDDM supports the estimation of how trial-by-trial measurements (e.g., fMRI) influence decision-making parameters. This paper will first describe the theoretical background of the drift diffusion model and Bayesian inference. We then illustrate usage of the toolbox on a real-world data set from our lab. Finally, parameter recovery studies show that HDDM beats alternative fitting methods like the χ(2)-quantile method as well as maximum likelihood estimation. The software and documentation can be downloaded at: http://ski.clps.brown.edu/hddm_docs/

  14. Joint analysis of input and parametric uncertainties in watershed water quality modeling: A formal Bayesian approach

    NASA Astrophysics Data System (ADS)

    Han, Feng; Zheng, Yi

    2018-06-01

    Significant Input uncertainty is a major source of error in watershed water quality (WWQ) modeling. It remains challenging to address the input uncertainty in a rigorous Bayesian framework. This study develops the Bayesian Analysis of Input and Parametric Uncertainties (BAIPU), an approach for the joint analysis of input and parametric uncertainties through a tight coupling of Markov Chain Monte Carlo (MCMC) analysis and Bayesian Model Averaging (BMA). The formal likelihood function for this approach is derived considering a lag-1 autocorrelated, heteroscedastic, and Skew Exponential Power (SEP) distributed error model. A series of numerical experiments were performed based on a synthetic nitrate pollution case and on a real study case in the Newport Bay Watershed, California. The Soil and Water Assessment Tool (SWAT) and Differential Evolution Adaptive Metropolis (DREAM(ZS)) were used as the representative WWQ model and MCMC algorithm, respectively. The major findings include the following: (1) the BAIPU can be implemented and used to appropriately identify the uncertain parameters and characterize the predictive uncertainty; (2) the compensation effect between the input and parametric uncertainties can seriously mislead the modeling based management decisions, if the input uncertainty is not explicitly accounted for; (3) the BAIPU accounts for the interaction between the input and parametric uncertainties and therefore provides more accurate calibration and uncertainty results than a sequential analysis of the uncertainties; and (4) the BAIPU quantifies the credibility of different input assumptions on a statistical basis and can be implemented as an effective inverse modeling approach to the joint inference of parameters and inputs.

  15. Truth, models, model sets, AIC, and multimodel inference: a Bayesian perspective

    USGS Publications Warehouse

    Barker, Richard J.; Link, William A.

    2015-01-01

    Statistical inference begins with viewing data as realizations of stochastic processes. Mathematical models provide partial descriptions of these processes; inference is the process of using the data to obtain a more complete description of the stochastic processes. Wildlife and ecological scientists have become increasingly concerned with the conditional nature of model-based inference: what if the model is wrong? Over the last 2 decades, Akaike's Information Criterion (AIC) has been widely and increasingly used in wildlife statistics for 2 related purposes, first for model choice and second to quantify model uncertainty. We argue that for the second of these purposes, the Bayesian paradigm provides the natural framework for describing uncertainty associated with model choice and provides the most easily communicated basis for model weighting. Moreover, Bayesian arguments provide the sole justification for interpreting model weights (including AIC weights) as coherent (mathematically self consistent) model probabilities. This interpretation requires treating the model as an exact description of the data-generating mechanism. We discuss the implications of this assumption, and conclude that more emphasis is needed on model checking to provide confidence in the quality of inference.

  16. Nonparametric Bayesian Segmentation of a Multivariate Inhomogeneous Space-Time Poisson Process.

    PubMed

    Ding, Mingtao; He, Lihan; Dunson, David; Carin, Lawrence

    2012-12-01

    A nonparametric Bayesian model is proposed for segmenting time-evolving multivariate spatial point process data. An inhomogeneous Poisson process is assumed, with a logistic stick-breaking process (LSBP) used to encourage piecewise-constant spatial Poisson intensities. The LSBP explicitly favors spatially contiguous segments, and infers the number of segments based on the observed data. The temporal dynamics of the segmentation and of the Poisson intensities are modeled with exponential correlation in time, implemented in the form of a first-order autoregressive model for uniformly sampled discrete data, and via a Gaussian process with an exponential kernel for general temporal sampling. We consider and compare two different inference techniques: a Markov chain Monte Carlo sampler, which has relatively high computational complexity; and an approximate and efficient variational Bayesian analysis. The model is demonstrated with a simulated example and a real example of space-time crime events in Cincinnati, Ohio, USA.

  17. A study of finite mixture model: Bayesian approach on financial time series data

    NASA Astrophysics Data System (ADS)

    Phoong, Seuk-Yen; Ismail, Mohd Tahir

    2014-07-01

    Recently, statistician have emphasized on the fitting finite mixture model by using Bayesian method. Finite mixture model is a mixture of distributions in modeling a statistical distribution meanwhile Bayesian method is a statistical method that use to fit the mixture model. Bayesian method is being used widely because it has asymptotic properties which provide remarkable result. In addition, Bayesian method also shows consistency characteristic which means the parameter estimates are close to the predictive distributions. In the present paper, the number of components for mixture model is studied by using Bayesian Information Criterion. Identify the number of component is important because it may lead to an invalid result. Later, the Bayesian method is utilized to fit the k-component mixture model in order to explore the relationship between rubber price and stock market price for Malaysia, Thailand, Philippines and Indonesia. Lastly, the results showed that there is a negative effect among rubber price and stock market price for all selected countries.

  18. Statistical downscaling of GCM simulations to streamflow using relevance vector machine

    NASA Astrophysics Data System (ADS)

    Ghosh, Subimal; Mujumdar, P. P.

    2008-01-01

    General circulation models (GCMs), the climate models often used in assessing the impact of climate change, operate on a coarse scale and thus the simulation results obtained from GCMs are not particularly useful in a comparatively smaller river basin scale hydrology. The article presents a methodology of statistical downscaling based on sparse Bayesian learning and Relevance Vector Machine (RVM) to model streamflow at river basin scale for monsoon period (June, July, August, September) using GCM simulated climatic variables. NCEP/NCAR reanalysis data have been used for training the model to establish a statistical relationship between streamflow and climatic variables. The relationship thus obtained is used to project the future streamflow from GCM simulations. The statistical methodology involves principal component analysis, fuzzy clustering and RVM. Different kernel functions are used for comparison purpose. The model is applied to Mahanadi river basin in India. The results obtained using RVM are compared with those of state-of-the-art Support Vector Machine (SVM) to present the advantages of RVMs over SVMs. A decreasing trend is observed for monsoon streamflow of Mahanadi due to high surface warming in future, with the CCSR/NIES GCM and B2 scenario.

  19. Bayesian soft X-ray tomography using non-stationary Gaussian Processes

    NASA Astrophysics Data System (ADS)

    Li, Dong; Svensson, J.; Thomsen, H.; Medina, F.; Werner, A.; Wolf, R.

    2013-08-01

    In this study, a Bayesian based non-stationary Gaussian Process (GP) method for the inference of soft X-ray emissivity distribution along with its associated uncertainties has been developed. For the investigation of equilibrium condition and fast magnetohydrodynamic behaviors in nuclear fusion plasmas, it is of importance to infer, especially in the plasma center, spatially resolved soft X-ray profiles from a limited number of noisy line integral measurements. For this ill-posed inversion problem, Bayesian probability theory can provide a posterior probability distribution over all possible solutions under given model assumptions. Specifically, the use of a non-stationary GP to model the emission allows the model to adapt to the varying length scales of the underlying diffusion process. In contrast to other conventional methods, the prior regularization is realized in a probability form which enhances the capability of uncertainty analysis, in consequence, scientists who concern the reliability of their results will benefit from it. Under the assumption of normally distributed noise, the posterior distribution evaluated at a discrete number of points becomes a multivariate normal distribution whose mean and covariance are analytically available, making inversions and calculation of uncertainty fast. Additionally, the hyper-parameters embedded in the model assumption can be optimized through a Bayesian Occam's Razor formalism and thereby automatically adjust the model complexity. This method is shown to produce convincing reconstructions and good agreements with independently calculated results from the Maximum Entropy and Equilibrium-Based Iterative Tomography Algorithm methods.

  20. Bayesian soft X-ray tomography using non-stationary Gaussian Processes.

    PubMed

    Li, Dong; Svensson, J; Thomsen, H; Medina, F; Werner, A; Wolf, R

    2013-08-01

    In this study, a Bayesian based non-stationary Gaussian Process (GP) method for the inference of soft X-ray emissivity distribution along with its associated uncertainties has been developed. For the investigation of equilibrium condition and fast magnetohydrodynamic behaviors in nuclear fusion plasmas, it is of importance to infer, especially in the plasma center, spatially resolved soft X-ray profiles from a limited number of noisy line integral measurements. For this ill-posed inversion problem, Bayesian probability theory can provide a posterior probability distribution over all possible solutions under given model assumptions. Specifically, the use of a non-stationary GP to model the emission allows the model to adapt to the varying length scales of the underlying diffusion process. In contrast to other conventional methods, the prior regularization is realized in a probability form which enhances the capability of uncertainty analysis, in consequence, scientists who concern the reliability of their results will benefit from it. Under the assumption of normally distributed noise, the posterior distribution evaluated at a discrete number of points becomes a multivariate normal distribution whose mean and covariance are analytically available, making inversions and calculation of uncertainty fast. Additionally, the hyper-parameters embedded in the model assumption can be optimized through a Bayesian Occam's Razor formalism and thereby automatically adjust the model complexity. This method is shown to produce convincing reconstructions and good agreements with independently calculated results from the Maximum Entropy and Equilibrium-Based Iterative Tomography Algorithm methods.

  1. [Alcohol consumption and positive alcohol expectancies in young adults: a typological approach using TwoStep cluster].

    PubMed

    Vautier, S; Jmel, S; Fourio, C; Moncany, D

    2007-09-01

    The present study investigates the heterogeneity of the population of young adult drinkers with respect to alcohol consumption and Positive Alcohol Expectancies (PAEs). Based on the positive relationship between both kinds of variables, PAE is commonly viewed as a potential motivational factor of alcoholic addiction. Empirical analyses based on the regression of alcohol consumption on PAEs suppose that the observations are statistically homogeneous with respect to the level of alcohol consumption, however. We explored the existence of moderate drinkers with a high PAE profile, and abusive drinkers with a low PAE profile. 1,017 young adult drinkers, mean age=23 +/- 2.84, with various educational levels, comprising 506 males and 511 females, were recruited as voluntary participants in a survey by undergraduate psychology students from the University of Toulouse Le Mirail. They completed a French version of the Alcohol Use Disorders Identifiction Test (AUDIT) and a French adaptation of the Alcohol Expectancy Questionnaire (AEQ). Three levels of alcohol consumption were defined using the AUDIT score, and six composite scores were obtained by averaging the relevant item-scores from the AEQ. The AEQ scores were interpreted as measurement of six kinds of PAEs, namely Global positive change, Sexual enhancement, Social and physical pleasure, Social assertiveness, Relaxation, and Arousal/Power. The TwoStep cluster methodology was used to explore the data. This methodology is convenient to deal with a mix of quantitative and qualitative variables, and it provides a classification model which is optimized through the use of an information criterion as Schwarz's Bayesian Information Criterion (BIC). The automatic clustering suggested five clusters, whose stability was ascertained until 75% of the sample size. Low drinkers (n=527) were split into one cluster of low PAEs (I1) and, interestingly, one cluster of high PAEs (I3, 46%). High drinkers (n=344) were split into one cluster of intermediate PAEs (II4) and one cluster of high PAEs (II5, 52%). Interestingly again, abusive drinkers (n=146) remained a single group (III2), exhibiting high PAEs. Clusters I3 and III3 comprised a significant proportion of males. Constraining the algorithm to find 6 clusters did not affect class III2, but split low drinkers into three clusters. Although the present results should be considered cautiously because of the novelty of TwoStep cluster methodology, they suggest a group of moderate drinkers with high PAEs. Also, abusive drinkers express high PAEs (except for 2 cases). Statistical homogeneity of moderate drinkers with respect to PAE variables appears as a dubious assumption.

  2. Bayesian Inversion of 2D Models from Airborne Transient EM Data

    NASA Astrophysics Data System (ADS)

    Blatter, D. B.; Key, K.; Ray, A.

    2016-12-01

    The inherent non-uniqueness in most geophysical inverse problems leads to an infinite number of Earth models that fit observed data to within an adequate tolerance. To resolve this ambiguity, traditional inversion methods based on optimization techniques such as the Gauss-Newton and conjugate gradient methods rely on an additional regularization constraint on the properties that an acceptable model can possess, such as having minimal roughness. While allowing such an inversion scheme to converge on a solution, regularization makes it difficult to estimate the uncertainty associated with the model parameters. This is because regularization biases the inversion process toward certain models that satisfy the regularization constraint and away from others that don't, even when both may suitably fit the data. By contrast, a Bayesian inversion framework aims to produce not a single `most acceptable' model but an estimate of the posterior likelihood of the model parameters, given the observed data. In this work, we develop a 2D Bayesian framework for the inversion of transient electromagnetic (TEM) data. Our method relies on a reversible-jump Markov Chain Monte Carlo (RJ-MCMC) Bayesian inverse method with parallel tempering. Previous gradient-based inversion work in this area used a spatially constrained scheme wherein individual (1D) soundings were inverted together and non-uniqueness was tackled by using lateral and vertical smoothness constraints. By contrast, our work uses a 2D model space of Voronoi cells whose parameterization (including number of cells) is fully data-driven. To make the problem work practically, we approximate the forward solution for each TEM sounding using a local 1D approximation where the model is obtained from the 2D model by retrieving a vertical profile through the Voronoi cells. The implicit parsimony of the Bayesian inversion process leads to the simplest models that adequately explain the data, obviating the need for explicit smoothness constraints. In addition, credible intervals in model space are directly obtained, resolving some of the uncertainty introduced by regularization. An example application shows how the method can be used to quantify the uncertainty in airborne EM soundings for imaging subglacial brine channels and groundwater systems.

  3. Bayesian Inference for Signal-Based Seismic Monitoring

    NASA Astrophysics Data System (ADS)

    Moore, D.

    2015-12-01

    Traditional seismic monitoring systems rely on discrete detections produced by station processing software, discarding significant information present in the original recorded signal. SIG-VISA (Signal-based Vertically Integrated Seismic Analysis) is a system for global seismic monitoring through Bayesian inference on seismic signals. By modeling signals directly, our forward model is able to incorporate a rich representation of the physics underlying the signal generation process, including source mechanisms, wave propagation, and station response. This allows inference in the model to recover the qualitative behavior of recent geophysical methods including waveform matching and double-differencing, all as part of a unified Bayesian monitoring system that simultaneously detects and locates events from a global network of stations. We demonstrate recent progress in scaling up SIG-VISA to efficiently process the data stream of global signals recorded by the International Monitoring System (IMS), including comparisons against existing processing methods that show increased sensitivity from our signal-based model and in particular the ability to locate events (including aftershock sequences that can tax analyst processing) precisely from waveform correlation effects. We also provide a Bayesian analysis of an alleged low-magnitude event near the DPRK test site in May 2010 [1] [2], investigating whether such an event could plausibly be detected through automated processing in a signal-based monitoring system. [1] Zhang, Miao and Wen, Lianxing. "Seismological Evidence for a Low-Yield Nuclear Test on 12 May 2010 in North Korea". Seismological Research Letters, January/February 2015. [2] Richards, Paul. "A Seismic Event in North Korea on 12 May 2010". CTBTO SnT 2015 oral presentation, video at https://video-archive.ctbto.org/index.php/kmc/preview/partner_id/103/uiconf_id/4421629/entry_id/0_ymmtpps0/delivery/http

  4. Numerical study on the sequential Bayesian approach for radioactive materials detection

    NASA Astrophysics Data System (ADS)

    Qingpei, Xiang; Dongfeng, Tian; Jianyu, Zhu; Fanhua, Hao; Ge, Ding; Jun, Zeng

    2013-01-01

    A new detection method, based on the sequential Bayesian approach proposed by Candy et al., offers new horizons for the research of radioactive detection. Compared with the commonly adopted detection methods incorporated with statistical theory, the sequential Bayesian approach offers the advantages of shorter verification time during the analysis of spectra that contain low total counts, especially in complex radionuclide components. In this paper, a simulation experiment platform implanted with the methodology of sequential Bayesian approach was developed. Events sequences of γ-rays associating with the true parameters of a LaBr3(Ce) detector were obtained based on an events sequence generator using Monte Carlo sampling theory to study the performance of the sequential Bayesian approach. The numerical experimental results are in accordance with those of Candy. Moreover, the relationship between the detection model and the event generator, respectively represented by the expected detection rate (Am) and the tested detection rate (Gm) parameters, is investigated. To achieve an optimal performance for this processor, the interval of the tested detection rate as a function of the expected detection rate is also presented.

  5. Bayesian Data-Model Fit Assessment for Structural Equation Modeling

    ERIC Educational Resources Information Center

    Levy, Roy

    2011-01-01

    Bayesian approaches to modeling are receiving an increasing amount of attention in the areas of model construction and estimation in factor analysis, structural equation modeling (SEM), and related latent variable models. However, model diagnostics and model criticism remain relatively understudied aspects of Bayesian SEM. This article describes…

  6. Banding of NMR-derived Methyl Order Parameters: Implications for Protein Dynamics

    PubMed Central

    Sharp, Kim A.; Kasinath, Vignesh; Wand, A. Joshua

    2014-01-01

    Our understanding of protein folding, stability and function has begun to more explicitly incorporate dynamical aspects. Nuclear magnetic resonance has emerged as a powerful experimental method for obtaining comprehensive site-resolved insight into protein motion. It has been observed that methyl-group motion tends to cluster into three “classes” when expressed in terms of the popular Lipari-Szabo model-free squared generalized order parameter. Here the origins of the three classes or bands in the distribution of order parameters are examined. As a first step, a Bayesian based approach, which makes no a priori assumption about the existence or number of bands, is developed to detect the banding of O2axis values derived either from NMR experiments or molecular dynamics simulations. The analysis is applied to seven proteins with extensive molecular dynamics simulations of these proteins in explicit water to examine the relationship between O2 and fine details of the motion of methyl bearing side chains. All of the proteins studied display banding, with some subtle differences. We propose a very simple yet plausible physical mechanism for banding. Finally, our Bayesian method is used to analyze the measured distributions of methyl group motions in the catabolite activating protein and several of its mutants in various liganded states and discuss the functional implications of the observed banding to protein dynamics and function. PMID:24677353

  7. Fast Low-Rank Bayesian Matrix Completion With Hierarchical Gaussian Prior Models

    NASA Astrophysics Data System (ADS)

    Yang, Linxiao; Fang, Jun; Duan, Huiping; Li, Hongbin; Zeng, Bing

    2018-06-01

    The problem of low rank matrix completion is considered in this paper. To exploit the underlying low-rank structure of the data matrix, we propose a hierarchical Gaussian prior model, where columns of the low-rank matrix are assumed to follow a Gaussian distribution with zero mean and a common precision matrix, and a Wishart distribution is specified as a hyperprior over the precision matrix. We show that such a hierarchical Gaussian prior has the potential to encourage a low-rank solution. Based on the proposed hierarchical prior model, a variational Bayesian method is developed for matrix completion, where the generalized approximate massage passing (GAMP) technique is embedded into the variational Bayesian inference in order to circumvent cumbersome matrix inverse operations. Simulation results show that our proposed method demonstrates superiority over existing state-of-the-art matrix completion methods.

  8. Bayesian estimation of dynamic matching function for U-V analysis in Japan

    NASA Astrophysics Data System (ADS)

    Kyo, Koki; Noda, Hideo; Kitagawa, Genshiro

    2012-05-01

    In this paper we propose a Bayesian method for analyzing unemployment dynamics. We derive a Beveridge curve for unemployment and vacancy (U-V) analysis from a Bayesian model based on a labor market matching function. In our framework, the efficiency of matching and the elasticities of new hiring with respect to unemployment and vacancy are regarded as time varying parameters. To construct a flexible model and obtain reasonable estimates in an underdetermined estimation problem, we treat the time varying parameters as random variables and introduce smoothness priors. The model is then described in a state space representation, enabling the parameter estimation to be carried out using Kalman filter and fixed interval smoothing. In such a representation, dynamic features of the cyclic unemployment rate and the structural-frictional unemployment rate can be accurately captured.

  9. Prediction and assimilation of surf-zone processes using a Bayesian network: Part II: Inverse models

    USGS Publications Warehouse

    Plant, Nathaniel G.; Holland, K. Todd

    2011-01-01

    A Bayesian network model has been developed to simulate a relatively simple problem of wave propagation in the surf zone (detailed in Part I). Here, we demonstrate that this Bayesian model can provide both inverse modeling and data-assimilation solutions for predicting offshore wave heights and depth estimates given limited wave-height and depth information from an onshore location. The inverse method is extended to allow data assimilation using observational inputs that are not compatible with deterministic solutions of the problem. These inputs include sand bar positions (instead of bathymetry) and estimates of the intensity of wave breaking (instead of wave-height observations). Our results indicate that wave breaking information is essential to reduce prediction errors. In many practical situations, this information could be provided from a shore-based observer or from remote-sensing systems. We show that various combinations of the assimilated inputs significantly reduce the uncertainty in the estimates of water depths and wave heights in the model domain. Application of the Bayesian network model to new field data demonstrated significant predictive skill (R2 = 0.7) for the inverse estimate of a month-long time series of offshore wave heights. The Bayesian inverse results include uncertainty estimates that were shown to be most accurate when given uncertainty in the inputs (e.g., depth and tuning parameters). Furthermore, the inverse modeling was extended to directly estimate tuning parameters associated with the underlying wave-process model. The inverse estimates of the model parameters not only showed an offshore wave height dependence consistent with results of previous studies but the uncertainty estimates of the tuning parameters also explain previously reported variations in the model parameters.

  10. Bayesian model selection techniques as decision support for shaping a statistical analysis plan of a clinical trial: An example from a vertigo phase III study with longitudinal count data as primary endpoint

    PubMed Central

    2012-01-01

    Background A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. Methods We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). Results The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. Conclusions The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint. PMID:22962944

  11. Bayesian model selection techniques as decision support for shaping a statistical analysis plan of a clinical trial: an example from a vertigo phase III study with longitudinal count data as primary endpoint.

    PubMed

    Adrion, Christine; Mansmann, Ulrich

    2012-09-10

    A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.

  12. Predicting summer residential electricity demand across the U.S.A using climate information

    NASA Astrophysics Data System (ADS)

    Sun, X.; Wang, S.; Lall, U.

    2017-12-01

    We developed a Bayesian Hierarchical model to predict monthly residential per capita electricity consumption at the state level across the USA using climate information. The summer period was selected since cooling requirements may be directly associated with electricity use, while for winter a mix of energy sources may be used to meet heating needs. Historical monthly electricity consumption data from 1990 to 2013 were used to build a predictive model with a set of corresponding climate and non-climate covariates. A clustering analysis was performed first to identify groups of states that had similar temporal patterns for the cooling degree days of each state. Then, a partial pooling model was applied to each cluster to assess the sensitivity of monthly per capita residential electricity demand to each predictor (including cooling-degree-days, gross domestic product (GDP) per capita, per capita electricity demand of previous month and previous year, and the residential electricity price). The sensitivity of residential electricity to cooling-degree-days has an identifiable geographic distribution with higher values in northeastern United States.

  13. Bayesian survival analysis in clinical trials: What methods are used in practice?

    PubMed

    Brard, Caroline; Le Teuff, Gwénaël; Le Deley, Marie-Cécile; Hampson, Lisa V

    2017-02-01

    Background Bayesian statistics are an appealing alternative to the traditional frequentist approach to designing, analysing, and reporting of clinical trials, especially in rare diseases. Time-to-event endpoints are widely used in many medical fields. There are additional complexities to designing Bayesian survival trials which arise from the need to specify a model for the survival distribution. The objective of this article was to critically review the use and reporting of Bayesian methods in survival trials. Methods A systematic review of clinical trials using Bayesian survival analyses was performed through PubMed and Web of Science databases. This was complemented by a full text search of the online repositories of pre-selected journals. Cost-effectiveness, dose-finding studies, meta-analyses, and methodological papers using clinical trials were excluded. Results In total, 28 articles met the inclusion criteria, 25 were original reports of clinical trials and 3 were re-analyses of a clinical trial. Most trials were in oncology (n = 25), were randomised controlled (n = 21) phase III trials (n = 13), and half considered a rare disease (n = 13). Bayesian approaches were used for monitoring in 14 trials and for the final analysis only in 14 trials. In the latter case, Bayesian survival analyses were used for the primary analysis in four cases, for the secondary analysis in seven cases, and for the trial re-analysis in three cases. Overall, 12 articles reported fitting Bayesian regression models (semi-parametric, n = 3; parametric, n = 9). Prior distributions were often incompletely reported: 20 articles did not define the prior distribution used for the parameter of interest. Over half of the trials used only non-informative priors for monitoring and the final analysis (n = 12) when it was specified. Indeed, no articles fitting Bayesian regression models placed informative priors on the parameter of interest. The prior for the treatment effect was based on historical data in only four trials. Decision rules were pre-defined in eight cases when trials used Bayesian monitoring, and in only one case when trials adopted a Bayesian approach to the final analysis. Conclusion Few trials implemented a Bayesian survival analysis and few incorporated external data into priors. There is scope to improve the quality of reporting of Bayesian methods in survival trials. Extension of the Consolidated Standards of Reporting Trials statement for reporting Bayesian clinical trials is recommended.

  14. Statistical Bayesian method for reliability evaluation based on ADT data

    NASA Astrophysics Data System (ADS)

    Lu, Dawei; Wang, Lizhi; Sun, Yusheng; Wang, Xiaohong

    2018-05-01

    Accelerated degradation testing (ADT) is frequently conducted in the laboratory to predict the products’ reliability under normal operating conditions. Two kinds of methods, degradation path models and stochastic process models, are utilized to analyze degradation data and the latter one is the most popular method. However, some limitations like imprecise solution process and estimation result of degradation ratio still exist, which may affect the accuracy of the acceleration model and the extrapolation value. Moreover, the conducted solution of this problem, Bayesian method, lose key information when unifying the degradation data. In this paper, a new data processing and parameter inference method based on Bayesian method is proposed to handle degradation data and solve the problems above. First, Wiener process and acceleration model is chosen; Second, the initial values of degradation model and parameters of prior and posterior distribution under each level is calculated with updating and iteration of estimation values; Third, the lifetime and reliability values are estimated on the basis of the estimation parameters; Finally, a case study is provided to demonstrate the validity of the proposed method. The results illustrate that the proposed method is quite effective and accuracy in estimating the lifetime and reliability of a product.

  15. Classification and Clustering Methods for Multiple Environmental Factors in Gene-Environment Interaction: Application to the Multi-Ethnic Study of Atherosclerosis.

    PubMed

    Ko, Yi-An; Mukherjee, Bhramar; Smith, Jennifer A; Kardia, Sharon L R; Allison, Matthew; Diez Roux, Ana V

    2016-11-01

    There has been an increased interest in identifying gene-environment interaction (G × E) in the context of multiple environmental exposures. Most G × E studies analyze one exposure at a time, but we are exposed to multiple exposures in reality. Efficient analysis strategies for complex G × E with multiple environmental factors in a single model are still lacking. Using the data from the Multiethnic Study of Atherosclerosis, we illustrate a two-step approach for modeling G × E with multiple environmental factors. First, we utilize common clustering and classification strategies (e.g., k-means, latent class analysis, classification and regression trees, Bayesian clustering using Dirichlet Process) to define subgroups corresponding to distinct environmental exposure profiles. Second, we illustrate the use of an additive main effects and multiplicative interaction model, instead of the conventional saturated interaction model using product terms of factors, to study G × E with the data-driven exposure subgroups defined in the first step. We demonstrate useful analytical approaches to translate multiple environmental exposures into one summary class. These tools not only allow researchers to consider several environmental exposures in G × E analysis but also provide some insight into how genes modify the effect of a comprehensive exposure profile instead of examining effect modification for each exposure in isolation.

  16. Joint Clustering and Component Analysis of Correspondenceless Point Sets: Application to Cardiac Statistical Modeling.

    PubMed

    Gooya, Ali; Lekadir, Karim; Alba, Xenia; Swift, Andrew J; Wild, Jim M; Frangi, Alejandro F

    2015-01-01

    Construction of Statistical Shape Models (SSMs) from arbitrary point sets is a challenging problem due to significant shape variation and lack of explicit point correspondence across the training data set. In medical imaging, point sets can generally represent different shape classes that span healthy and pathological exemplars. In such cases, the constructed SSM may not generalize well, largely because the probability density function (pdf) of the point sets deviates from the underlying assumption of Gaussian statistics. To this end, we propose a generative model for unsupervised learning of the pdf of point sets as a mixture of distinctive classes. A Variational Bayesian (VB) method is proposed for making joint inferences on the labels of point sets, and the principal modes of variations in each cluster. The method provides a flexible framework to handle point sets with no explicit point-to-point correspondences. We also show that by maximizing the marginalized likelihood of the model, the optimal number of clusters of point sets can be determined. We illustrate this work in the context of understanding the anatomical phenotype of the left and right ventricles in heart. To this end, we use a database containing hearts of healthy subjects, patients with Pulmonary Hypertension (PH), and patients with Hypertrophic Cardiomyopathy (HCM). We demonstrate that our method can outperform traditional PCA in both generalization and specificity measures.

  17. Population structure, genetic diversity and downy mildew resistance among Ocimum species germplasm.

    PubMed

    Pyne, Robert M; Honig, Josh A; Vaiciunas, Jennifer; Wyenandt, Christian A; Simon, James E

    2018-04-23

    The basil (Ocimum spp.) genus maintains a rich diversity of phenotypes and aromatic volatiles through natural and artificial outcrossing. Characterization of population structure and genetic diversity among a representative sample of this genus is severely lacking. Absence of such information has slowed breeding efforts and the development of sweet basil (Ocimum basilicum L.) with resistance to the worldwide downy mildew epidemic, caused by the obligate oomycete Peronospora belbahrii. In an effort to improve classification of relationships 20 EST-SSR markers with species-level transferability were developed and used to resolve relationships among a diverse panel of 180 Ocimum spp. accessions with varying response to downy mildew. Results obtained from nested Bayesian model-based clustering, analysis of molecular variance and unweighted pair group method using arithmetic average (UPGMA) analyses were synergized to provide an updated phylogeny of the Ocimum genus. Three (major) and seven (sub) population (cluster) models were identified and well-supported (P < 0.001) by PhiPT (Φ PT ) values of 0.433 and 0.344, respectively. Allelic frequency among clusters supported previously developed hypotheses of allopolyploid genome structure. Evidence of cryptic population structure was demonstrated for the k1 O. basilicum cluster suggesting prevalence of gene flow. UPGMA analysis provided best resolution for the 36-accession, DM resistant k3 cluster with consistently strong bootstrap support. Although the k3 cluster is a rich source of DM resistance introgression of resistance into the commercially important k1 accessions is impeded by reproductive barriers as demonstrated by multiple sterile F1 hybrids. The k2 cluster located between k1 and k3, represents a source of transferrable tolerance evidenced by fertile backcross progeny. The 90-accession k1 cluster was largely susceptible to downy mildew with accession 'MRI' representing the only source of DM resistance. High levels of genetic diversity support the observed phenotypic diversity among Ocimum spp. accessions. EST-SSRs provided a robust evaluation of molecular diversity and can be used for additional studies to increase resolution of genetic relationships in the Ocimum genus. Elucidation of population structure and genetic relationships among Ocimum spp. germplasm provide the foundation for improved DM resistance breeding strategies and more rapid response to future disease outbreaks.

  18. A guide to Bayesian model selection for ecologists

    USGS Publications Warehouse

    Hooten, Mevin B.; Hobbs, N.T.

    2015-01-01

    The steady upward trend in the use of model selection and Bayesian methods in ecological research has made it clear that both approaches to inference are important for modern analysis of models and data. However, in teaching Bayesian methods and in working with our research colleagues, we have noticed a general dissatisfaction with the available literature on Bayesian model selection and multimodel inference. Students and researchers new to Bayesian methods quickly find that the published advice on model selection is often preferential in its treatment of options for analysis, frequently advocating one particular method above others. The recent appearance of many articles and textbooks on Bayesian modeling has provided welcome background on relevant approaches to model selection in the Bayesian framework, but most of these are either very narrowly focused in scope or inaccessible to ecologists. Moreover, the methodological details of Bayesian model selection approaches are spread thinly throughout the literature, appearing in journals from many different fields. Our aim with this guide is to condense the large body of literature on Bayesian approaches to model selection and multimodel inference and present it specifically for quantitative ecologists as neutrally as possible. We also bring to light a few important and fundamental concepts relating directly to model selection that seem to have gone unnoticed in the ecological literature. Throughout, we provide only a minimal discussion of philosophy, preferring instead to examine the breadth of approaches as well as their practical advantages and disadvantages. This guide serves as a reference for ecologists using Bayesian methods, so that they can better understand their options and can make an informed choice that is best aligned with their goals for inference.

  19. Observation uncertainty in reversible Markov chains.

    PubMed

    Metzner, Philipp; Weber, Marcus; Schütte, Christof

    2010-09-01

    In many applications one is interested in finding a simplified model which captures the essential dynamical behavior of a real life process. If the essential dynamics can be assumed to be (approximately) memoryless then a reasonable choice for a model is a Markov model whose parameters are estimated by means of Bayesian inference from an observed time series. We propose an efficient Monte Carlo Markov chain framework to assess the uncertainty of the Markov model and related observables. The derived Gibbs sampler allows for sampling distributions of transition matrices subject to reversibility and/or sparsity constraints. The performance of the suggested sampling scheme is demonstrated and discussed for a variety of model examples. The uncertainty analysis of functions of the Markov model under investigation is discussed in application to the identification of conformations of the trialanine molecule via Robust Perron Cluster Analysis (PCCA+) .

  20. Accelerated test system strength models based on Birnbaum-Saunders distribution: a complete Bayesian analysis and comparison.

    PubMed

    Upadhyay, S K; Mukherjee, Bhaswati; Gupta, Ashutosh

    2009-09-01

    Several models for studies related to tensile strength of materials are proposed in the literature where the size or length component has been taken to be an important factor for studying the specimens' failure behaviour. An important model, developed on the basis of cumulative damage approach, is the three-parameter extension of the Birnbaum-Saunders fatigue model that incorporates size of the specimen as an additional variable. This model is a strong competitor of the commonly used Weibull model and stands better than the traditional models, which do not incorporate the size effect. The paper considers two such cumulative damage models, checks their compatibility with a real dataset, compares them with some of the recent toolkits, and finally recommends a model, which appears an appropriate one. Throughout the study is Bayesian based on Markov chain Monte Carlo simulation.

Top