Sample records for large-scale Bayesian logistic

  1. Logistic random effects regression models: a comparison of statistical packages for binary and ordinal outcomes.

    PubMed

    Li, Baoyue; Lingsma, Hester F; Steyerberg, Ewout W; Lesaffre, Emmanuel

    2011-05-23

    Logistic random effects models are a popular tool to analyze multilevel (also called hierarchical) data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR; Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability.
There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may be also critical for the convergence of the Markov chain.
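
The two model families compared in this record can be written out explicitly. The notation below is a generic textbook formulation and is an assumption, not taken from the paper: i indexes patients, j indexes centers, x_ij is the covariate vector, and the sign convention for the ordinal thresholds is one common choice among several.

```latex
% Binary (dichotomized GOS) outcome with a random center intercept
\operatorname{logit}\Pr(Y_{ij}=1 \mid u_j)
  = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
  \qquad u_j \sim \mathcal{N}(0,\sigma_u^{2})

% Ordinal (5-point GOS) outcome: proportional odds with a random center intercept
\operatorname{logit}\Pr(Y_{ij}\le c \mid u_j)
  = \alpha_c - \bigl(\mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j\bigr),
  \qquad c = 1,\dots,4, \quad \alpha_1 < \alpha_2 < \alpha_3 < \alpha_4
```

The variance \sigma_u^2 is the quantity the abstract describes as hard to estimate on small data sets: a frequentist fit can return zero for it, while a Bayesian fit can be sensitive to the prior placed on it.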

  2. Bias in logistic regression due to imperfect diagnostic test results and practical correction approaches.

    PubMed

    Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul

    2015-11-04

    Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.
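
The bias described in this record can be illustrated with a small calculation. The sketch below assumes non-differential misclassification (the same sensitivity and specificity in both exposure groups), which is a simplification; the abstract itself reports that sensitivity can vary with covariates. The function names and the example numbers are hypothetical and are not taken from the paper or its WinBUGS code.

```python
def observed_positive_prob(p_true, sensitivity, specificity):
    """Probability that the test reads positive, given the true infection
    probability and the test's sensitivity and specificity."""
    return sensitivity * p_true + (1.0 - specificity) * (1.0 - p_true)


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def apparent_odds_ratio(p_unexposed, true_or, sensitivity, specificity):
    """Odds ratio a naive analysis of the imperfect test results would target,
    for a single binary risk factor with the given true odds ratio."""
    # True infection probability in the exposed group implied by the true OR.
    odds_exposed = odds(p_unexposed) * true_or
    p_exposed = odds_exposed / (1.0 + odds_exposed)
    # Probabilities of a positive test result in each group.
    q_unexposed = observed_positive_prob(p_unexposed, sensitivity, specificity)
    q_exposed = observed_positive_prob(p_exposed, sensitivity, specificity)
    return odds(q_exposed) / odds(q_unexposed)


if __name__ == "__main__":
    # Hypothetical values: 10% baseline prevalence, true OR of 3,
    # test sensitivity 0.80 and specificity 0.95.
    print(apparent_odds_ratio(0.10, 3.0, 0.80, 0.95))
```

With a perfect test the function returns the true odds ratio; with the imperfect test above the apparent odds ratio is pulled toward 1, which is how a real risk factor can be missed.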

  3. Logistic random effects regression models: a comparison of statistical packages for binary and ordinal outcomes

    PubMed Central

    2011-01-01

    Background: Logistic random effects models are a popular tool to analyze multilevel (also called hierarchical) data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. Methods: We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR; Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. Results: The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability.
There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. Conclusions: On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may be also critical for the convergence of the Markov chain. PMID:21605357

  4. Bayesian hierarchical model for large-scale covariance matrix estimation.

    PubMed

    Zhu, Dongxiao; Hero, Alfred O

    2007-12-01

    Many bioinformatics problems implicitly depend on estimating large-scale covariance matrix. The traditional approaches tend to give rise to high variance and low accuracy due to "overfitting." We cast the large-scale covariance matrix estimation problem into the Bayesian hierarchical model framework, and introduce dependency between covariance parameters. We demonstrate the advantages of our approaches over the traditional approaches using simulations and OMICS data analysis.

  5. Novel method to construct large-scale design space in lubrication process utilizing Bayesian estimation based on a small-scale design-of-experiment and small sets of large-scale manufacturing data.

    PubMed

    Maeda, Jin; Suzuki, Tatsuya; Takayama, Kozo

    2012-12-01

    A large-scale design space was constructed using a Bayesian estimation method with a small-scale design of experiments (DoE) and small sets of large-scale manufacturing data without enforcing a large-scale DoE. The small-scale DoE was conducted using various Froude numbers (X(1)) and blending times (X(2)) in the lubricant blending process for theophylline tablets. The response surfaces, design space, and their reliability of the compression rate of the powder mixture (Y(1)), tablet hardness (Y(2)), and dissolution rate (Y(3)) on a small scale were calculated using multivariate spline interpolation, a bootstrap resampling technique, and self-organizing map clustering. The constant Froude number was applied as a scale-up rule. Three experiments under an optimal condition and two experiments under other conditions were performed on a large scale. The response surfaces on the small scale were corrected to those on a large scale by Bayesian estimation using the large-scale results. Large-scale experiments under three additional sets of conditions showed that the corrected design space was more reliable than that on the small scale, even if there was some discrepancy in the pharmaceutical quality between the manufacturing scales. This approach is useful for setting up a design space in pharmaceutical development when a DoE cannot be performed at a commercial large manufacturing scale.

  6. Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data.

    PubMed

    Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger

    2017-01-01

    Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.

  7. Developing Large-Scale Bayesian Networks by Composition: Fault Diagnosis of Electrical Power Systems in Aircraft and Spacecraft

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole Jakob; Poll, Scott; Kurtoglu, Tolga

    2009-01-01

    This CD contains files that support the talk (see CASI ID 20100021404). There are 24 models that relate to the ADAPT system and 1 Excel worksheet. In the paper an investigation into the use of Bayesian networks to construct large-scale diagnostic systems is described. The high-level specifications, Bayesian networks, clique trees, and arithmetic circuits representing 24 different electrical power systems are described in the talk. The data in the CD are the models of the 24 different power systems.

  8. Developing Large-Scale Bayesian Networks by Composition: Fault Diagnosis of Electrical Power Systems in Aircraft and Spacecraft

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole Jakob; Poll, Scott; Kurtoglu, Tolga

    2009-01-01

    In this paper, we investigate the use of Bayesian networks to construct large-scale diagnostic systems. In particular, we consider the development of large-scale Bayesian networks by composition. This compositional approach reflects how (often redundant) subsystems are architected to form systems such as electrical power systems. We develop high-level specifications, Bayesian networks, clique trees, and arithmetic circuits representing 24 different electrical power systems. The largest among these 24 Bayesian networks contains over 1,000 random variables. Another BN represents the real-world electrical power system ADAPT, which is representative of electrical power systems deployed in aerospace vehicles. In addition to demonstrating the scalability of the compositional approach, we briefly report on experimental results from the diagnostic competition DXC, where the ProADAPT team, using techniques discussed here, obtained the highest scores in both Tier 1 (among 9 international competitors) and Tier 2 (among 6 international competitors) of the industrial track. While we consider diagnosis of power systems specifically, we believe this work is relevant to other system health management problems, in particular in dependable systems such as aircraft and spacecraft. (See CASI ID 20100021910 for supplemental data disk.)

  9. Bayesian Estimation of the Logistic Positive Exponent IRT Model

    ERIC Educational Resources Information Center

    Bolfarine, Heleno; Bazan, Jorge Luis

    2010-01-01

    A Bayesian inference approach using Markov Chain Monte Carlo (MCMC) is developed for the logistic positive exponent (LPE) model proposed by Samejima and for a new skewed Logistic Item Response Theory (IRT) model, named Reflection LPE model. Both models lead to asymmetric item characteristic curves (ICC) and can be appropriate because a symmetric…

  10. Careful with Those Priors: A Note on Bayesian Estimation in Two-Parameter Logistic Item Response Theory Models

    ERIC Educational Resources Information Center

    Marcoulides, Katerina M.

    2018-01-01

    This study examined the use of Bayesian analysis methods for the estimation of item parameters in a two-parameter logistic item response theory model. Using simulated data under various design conditions with both informative and non-informative priors, the parameter recovery of Bayesian analysis methods were examined. Overall results showed that…

  11. Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

    ERIC Educational Resources Information Center

    Si, Yajuan; Reiter, Jerome P.

    2013-01-01

    In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…

  12. Validating Bayesian truth serum in large-scale online human experiments.

    PubMed

    Frank, Morgan R; Cebrian, Manuel; Pickard, Galen; Rahwan, Iyad

    2017-01-01

    Bayesian truth serum (BTS) is an exciting new method for improving honesty and information quality in multiple-choice survey, but, despite the method's mathematical reliance on large sample sizes, existing literature about BTS only focuses on small experiments. Combined with the prevalence of online survey platforms, such as Amazon's Mechanical Turk, which facilitate surveys with hundreds or thousands of participants, BTS must be effective in large-scale experiments for BTS to become a readily accepted tool in real-world applications. We demonstrate that BTS quantifiably improves honesty in large-scale online surveys where the "honest" distribution of answers is known in expectation on aggregate. Furthermore, we explore a marketing application where "honest" answers cannot be known, but find that BTS treatment impacts the resulting distributions of answers.

  13. Validating Bayesian truth serum in large-scale online human experiments

    PubMed Central

    Frank, Morgan R.; Cebrian, Manuel; Pickard, Galen; Rahwan, Iyad

    2017-01-01

    Bayesian truth serum (BTS) is an exciting new method for improving honesty and information quality in multiple-choice survey, but, despite the method’s mathematical reliance on large sample sizes, existing literature about BTS only focuses on small experiments. Combined with the prevalence of online survey platforms, such as Amazon’s Mechanical Turk, which facilitate surveys with hundreds or thousands of participants, BTS must be effective in large-scale experiments for BTS to become a readily accepted tool in real-world applications. We demonstrate that BTS quantifiably improves honesty in large-scale online surveys where the “honest” distribution of answers is known in expectation on aggregate. Furthermore, we explore a marketing application where “honest” answers cannot be known, but find that BTS treatment impacts the resulting distributions of answers. PMID:28494000

  14. A Bayesian goodness of fit test and semiparametric generalization of logistic regression with measurement data.

    PubMed

    Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E

    2013-06-01

    Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. 
© 2013, The International Biometric Society.
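
For reference, the Savage-Dickey ratio mentioned in this record has the following general form. The symbols are generic and are an assumption: \theta stands for whatever parameter indexes the departure from the parametric logistic model, with \theta_0 the value that recovers it. The abstract does not say how the authors parameterize their semiparametric generalization.

```latex
% Bayes factor for H_0: \theta = \theta_0 against the encompassing model H_1
BF_{01}
  = \frac{p(y \mid H_0)}{p(y \mid H_1)}
  = \frac{p(\theta = \theta_0 \mid y, H_1)}{p(\theta = \theta_0 \mid H_1)}
```

The identity holds when H_0 is nested in H_1 and the prior on the remaining parameters under H_0 equals their conditional prior under H_1 given \theta = \theta_0. In that case the Bayes factor is the posterior density at \theta_0 divided by the prior density at \theta_0, both evaluated under H_1.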

  15. Extreme-Scale Bayesian Inference for Uncertainty Quantification of Complex Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biros, George

    Uncertainty quantification (UQ)—that is, quantifying uncertainties in complex mathematical models and their large-scale computational implementations—is widely viewed as one of the outstanding challenges facing the field of CS&E over the coming decade. The EUREKA project set to address the most difficult class of UQ problems: those for which both the underlying PDE model as well as the uncertain parameters are of extreme scale. In the project we worked on these extreme-scale challenges in the following four areas: 1. Scalable parallel algorithms for sampling and characterizing the posterior distribution that exploit the structure of the underlying PDEs and parameter-to-observable map. These include structure-exploiting versions of the randomized maximum likelihood method, which aims to overcome the intractability of employing conventional MCMC methods for solving extreme-scale Bayesian inversion problems by appealing to and adapting ideas from large-scale PDE-constrained optimization, which have been very successful at exploring high-dimensional spaces. 2. Scalable parallel algorithms for construction of prior and likelihood functions based on learning methods and non-parametric density estimation. Constructing problem-specific priors remains a critical challenge in Bayesian inference, and more so in high dimensions. Another challenge is construction of likelihood functions that capture unmodeled couplings between observations and parameters. We will create parallel algorithms for non-parametric density estimation using high dimensional N-body methods and combine them with supervised learning techniques for the construction of priors and likelihood functions. 3. Bayesian inadequacy models, which augment physics models with stochastic models that represent their imperfections. The success of the Bayesian inference framework depends on the ability to represent the uncertainty due to imperfections of the mathematical model of the phenomena of interest.
This is a central challenge in UQ, especially for large-scale models. We propose to develop the mathematical tools to address these challenges in the context of extreme-scale problems. 4. Parallel scalable algorithms for Bayesian optimal experimental design (OED). Bayesian inversion yields quantified uncertainties in the model parameters, which can be propagated forward through the model to yield uncertainty in outputs of interest. This opens the way for designing new experiments to reduce the uncertainties in the model parameters and model predictions. Such experimental design problems have been intractable for large-scale problems using conventional methods; we will create OED algorithms that exploit the structure of the PDE model and the parameter-to-output map to overcome these challenges. Parallel algorithms for these four problems were created, analyzed, prototyped, implemented, tuned, and scaled up for leading-edge supercomputers, including UT-Austin’s own 10 petaflops Stampede system, ANL’s Mira system, and ORNL’s Titan system. While our focus is on fundamental mathematical/computational methods and algorithms, we will assess our methods on model problems derived from several DOE mission applications, including multiscale mechanics and ice sheet dynamics.

  16. Predicting Bison Migration out of Yellowstone National Park Using Bayesian Models

    PubMed Central

    Geremia, Chris; White, P. J.; Wallen, Rick L.; Watson, Fred G. R.; Treanor, John J.; Borkowski, John; Potter, Christopher S.; Crabtree, Robert L.

    2011-01-01

    Long distance migrations by ungulate species often surpass the boundaries of preservation areas where conflicts with various publics lead to management actions that can threaten populations. We chose the partially migratory bison (Bison bison) population in Yellowstone National Park as an example of integrating science into management policies to better conserve migratory ungulates. Approximately 60% of these bison have been exposed to bovine brucellosis and thousands of migrants exiting the park boundary have been culled during the past two decades to reduce the risk of disease transmission to cattle. Data were assimilated using models representing competing hypotheses of bison migration during 1990–2009 in a hierarchical Bayesian framework. Migration differed at the scale of herds, but a single unifying logistic model was useful for predicting migrations by both herds. Migration beyond the northern park boundary was affected by herd size, accumulated snow water equivalent, and aboveground dried biomass. Migration beyond the western park boundary was less influenced by these predictors and process model performance suggested an important control on recent migrations was excluded. Simulations of migrations over the next decade suggest that allowing increased numbers of bison beyond park boundaries during severe climate conditions may be the only means of avoiding episodic, large-scale reductions to the Yellowstone bison population in the foreseeable future. This research is an example of how long distance migration dynamics can be incorporated into improved management policies. PMID:21340035

  17. Applying Bayesian Modeling and Receiver Operating Characteristic Methodologies for Test Utility Analysis

    ERIC Educational Resources Information Center

    Wang, Qiu; Diemer, Matthew A.; Maier, Kimberly S.

    2013-01-01

    This study integrated Bayesian hierarchical modeling and receiver operating characteristic analysis (BROCA) to evaluate how interest strength (IS) and interest differentiation (ID) predicted low–socioeconomic status (SES) youth's interest-major congruence (IMC). Using large-scale Kuder Career Search online-assessment data, this study fit three…

  18. A Bayesian hierarchical latent trait model for estimating rater bias and reliability in large-scale performance assessment

    PubMed Central

    2018-01-01

    We propose a novel approach to modelling rater effects in scoring-based assessment. The approach is based on a Bayesian hierarchical model and simulations from the posterior distribution. We apply it to large-scale essay assessment data over a period of 5 years. Empirical results suggest that the model provides a good fit for both the total scores and when applied to individual rubrics. We estimate the median impact of rater effects on the final grade to be ± 2 points on a 50 point scale, while 10% of essays would receive a score at least ± 5 different from their actual quality. Most of the impact is due to rater unreliability, not rater bias. PMID:29614129

  19. Bayesian Hierarchical Modeling for Big Data Fusion in Soil Hydrology

    NASA Astrophysics Data System (ADS)

    Mohanty, B.; Kathuria, D.; Katzfuss, M.

    2016-12-01

    Soil moisture datasets from remote sensing (RS) platforms (such as SMOS and SMAP) and reanalysis products from land surface models are typically available on a coarse spatial granularity of several square km. Ground based sensors on the other hand provide observations on a finer spatial scale (meter scale or less) but are sparsely available. Soil moisture is affected by high variability due to complex interactions between geologic, topographic, vegetation and atmospheric variables. Hydrologic processes usually occur at a scale of 1 km or less and therefore spatially ubiquitous and temporally periodic soil moisture products at this scale are required to aid local decision makers in agriculture, weather prediction and reservoir operations. Past literature has largely focused on downscaling RS soil moisture for a small extent of a field or a watershed and hence the applicability of such products has been limited. The present study employs a spatial Bayesian Hierarchical Model (BHM) to derive soil moisture products at a spatial scale of 1 km for the state of Oklahoma by fusing point scale Mesonet data and coarse scale RS data for soil moisture and its auxiliary covariates such as precipitation, topography, soil texture and vegetation. It is seen that the BHM model handles change of support problems easily while performing accurate uncertainty quantification arising from measurement errors and imperfect retrieval algorithms. The computational challenge arising due to the large number of measurements is tackled by utilizing basis function approaches and likelihood approximations. The BHM model can be considered as a complex Bayesian extension of traditional geostatistical prediction methods (such as Kriging) for large datasets in the presence of uncertainties.

  20. Bayesian Analysis of Nonlinear Structural Equation Models with Nonignorable Missing Data

    ERIC Educational Resources Information Center

    Lee, Sik-Yum

    2006-01-01

    A Bayesian approach is developed for analyzing nonlinear structural equation models with nonignorable missing data. The nonignorable missingness mechanism is specified by a logistic regression model. A hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm is used to produce the joint Bayesian estimates of…

  1. Sequential Inverse Problems Bayesian Principles and the Logistic Map Example

    NASA Astrophysics Data System (ADS)

    Duan, Lian; Farmer, Chris L.; Moroz, Irene M.

    2010-09-01

    Bayesian statistics provides a general framework for solving inverse problems, but is not without interpretation and implementation problems. This paper discusses difficulties arising from the fact that forward models are always in error to some extent. Using a simple example based on the one-dimensional logistic map, we argue that, when implementation problems are minimal, the Bayesian framework is quite adequate. In this paper the Bayesian Filter is shown to be able to recover excellent state estimates in the perfect model scenario (PMS) and to distinguish the PMS from the imperfect model scenario (IMS). Through a quantitative comparison of the way in which the observations are assimilated in both the PMS and the IMS scenarios, we suggest that one can, sometimes, measure the degree of imperfection.
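
The one-dimensional logistic map used as the test problem in this record is simple to write down. The sketch below only generates a trajectory and noisy observations of it; it does not implement the paper's Bayesian filter. The parameter names (`r`, `obs_sd`, `seed`) and the Gaussian observation noise are assumptions made for illustration.

```python
import random


def logistic_map(x, r=4.0):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def simulate(x0, n_steps, r=4.0, obs_sd=0.0, seed=0):
    """Iterate the map from x0 and return (states, observations).

    Each observation is the state plus zero-mean Gaussian noise with
    standard deviation obs_sd (an assumed observation model).
    """
    rng = random.Random(seed)
    states, observations = [], []
    x = x0
    for _ in range(n_steps):
        x = logistic_map(x, r)
        states.append(x)
        observations.append(x + rng.gauss(0.0, obs_sd))
    return states, observations


if __name__ == "__main__":
    # "Truth" run with hypothetical settings.
    truth, obs = simulate(x0=0.2, n_steps=10, r=4.0, obs_sd=0.01)
    for state, y in zip(truth, obs):
        print(f"{state:.4f} {y:.4f}")
```

One way to mimic the perfect and imperfect model scenarios is to generate observations with one value of `r` and run the forward model with the same value (perfect) or a slightly different one (imperfect); this pairing is an illustration, not the authors' exact setup.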

  2. A Bayesian method for assessing multiscale species-habitat relationships

    USGS Publications Warehouse

    Stuber, Erica F.; Gruber, Lutz F.; Fontaine, Joseph J.

    2017-01-01

    Context: Scientists face several theoretical and methodological challenges in appropriately describing fundamental wildlife-habitat relationships in models. The spatial scales of habitat relationships are often unknown, and are expected to follow a multi-scale hierarchy. Typical frequentist or information theoretic approaches often suffer under collinearity in multi-scale studies, fail to converge when models are complex or represent an intractable computational burden when candidate model sets are large. Objectives: Our objective was to implement an automated, Bayesian method for inference on the spatial scales of habitat variables that best predict animal abundance. Methods: We introduce Bayesian latent indicator scale selection (BLISS), a Bayesian method to select spatial scales of predictors using latent scale indicator variables that are estimated with reversible-jump Markov chain Monte Carlo sampling. BLISS does not suffer from collinearity, and substantially reduces computation time of studies. We present a simulation study to validate our method and apply our method to a case-study of land cover predictors for ring-necked pheasant (Phasianus colchicus) abundance in Nebraska, USA. Results: Our method returns accurate descriptions of the explanatory power of multiple spatial scales, and unbiased and precise parameter estimates under commonly encountered data limitations including spatial scale autocorrelation, effect size, and sample size. BLISS outperforms commonly used model selection methods including stepwise and AIC, and reduces runtime by 90%. Conclusions: Given the pervasiveness of scale-dependency in ecology, and the implications of mismatches between the scales of analyses and ecological processes, identifying the spatial scales over which species are integrating habitat information is an important step in understanding species-habitat relationships.
BLISS is a widely applicable method for identifying important spatial scales, propagating scale uncertainty, and testing hypotheses of scaling relationships.

  3. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    PubMed

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.

  4. Hip fracture in the elderly: a re-analysis of the EPIDOS study with causal Bayesian networks.

    PubMed

    Caillet, Pascal; Klemm, Sarah; Ducher, Michel; Aussem, Alexandre; Schott, Anne-Marie

    2015-01-01

    Hip fractures commonly result in permanent disability, institutionalization or death in the elderly. Existing hip-fracture predicting tools are underused in clinical practice, partly due to their lack of intuitive interpretation. By use of a graphical layer, Bayesian network models could increase the attractiveness of fracture prediction tools. Our aim was to study the potential contribution of a causal Bayesian network in this clinical setting. A logistic regression was performed as a standard control approach to check the robustness of the causal Bayesian network approach. EPIDOS is a multicenter study, conducted in an ambulatory care setting in five French cities between 1992 and 1996 and updated in 2010. The study included 7598 women aged 75 years or older, in which fractures were assessed quarterly during 4 years. A causal Bayesian network and a logistic regression were performed on EPIDOS data to describe major variables involved in hip fracture occurrence. Both models had similar association estimations and predictive performances. They detected gait speed and bone mineral density as the variables most involved in the fracture process. The causal Bayesian network showed that gait speed and bone mineral density were directly connected to fracture and seem to mediate the influence of all the other variables included in our model. The logistic regression approach detected multiple interactions involving psychotropic drug use, age and bone mineral density. Both approaches retrieved similar variables as predictors of hip fractures. However, the Bayesian network highlighted the whole web of relations between the variables involved in the analysis, suggesting a possible mechanism leading to hip fracture. According to the latter results, intervention focusing concomitantly on gait speed and bone mineral density may be necessary for an optimal prevention of hip fracture occurrence in elderly people.

  5. A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.

    PubMed

    López Puga, Jorge; García García, Juan

    2012-11-01

    Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research.

  6. Statistical modeling for Bayesian extrapolation of adult clinical trial information in pediatric drug evaluation.

    PubMed

    Gamalo-Siebers, Margaret; Savic, Jasmina; Basu, Cynthia; Zhao, Xin; Gopalakrishnan, Mathangi; Gao, Aijun; Song, Guochen; Baygani, Simin; Thompson, Laura; Xia, H Amy; Price, Karen; Tiwari, Ram; Carlin, Bradley P

    2017-07-01

    Children represent a large underserved population of "therapeutic orphans," as an estimated 80% of children are treated off-label. However, pediatric drug development often faces substantial challenges, including economic, logistical, technical, and ethical barriers, among others. Among many efforts trying to remove these barriers, increased recent attention has been paid to extrapolation; that is, the leveraging of available data from adults or older age groups to draw conclusions for the pediatric population. The Bayesian statistical paradigm is natural in this setting, as it permits the combining (or "borrowing") of information across disparate sources, such as the adult and pediatric data. In this paper, authored by the pediatric subteam of the Drug Information Association Bayesian Scientific Working Group and Adaptive Design Working Group, we develop, illustrate, and provide suggestions on Bayesian statistical methods that could be used to design improved pediatric development programs that use all available information in the most efficient manner. A variety of relevant Bayesian approaches are described, several of which are illustrated through 2 case studies: extrapolating adult efficacy data to expand the labeling for Remicade to include pediatric ulcerative colitis and extrapolating adult exposure-response information for antiepileptic drugs to pediatrics. Copyright © 2017 John Wiley & Sons, Ltd.

  7. The interplay of stressful life events and coping skills on risk for suicidal behavior among youth students in contemporary China: a large scale cross-sectional study.

    PubMed

    Tang, Fang; Xue, Fuzhong; Qin, Ping

    2015-07-31

    Stressful life events are common among youth students and may induce psychological problems and even suicidal behaviors in those with poor coping skills. This study aims to assess the influence of stressful life events and coping skills on risk for suicidal behavior and to elucidate the underlying mechanism using a large sample of university students in China. 5972 students, randomly selected from 6 universities, completed the questionnaire survey. Logistic regression analysis was performed to estimate the effect of stressful life events and coping skills on risk for suicidal behavior. Bayesian network was further adopted to probe their probabilistic relationships. Of the 5972 students, 7.64% reported the presence of suicidal behavior (attempt or ideation) within the past one year period. Stressful life events such as strong conflicts with classmates and a failure in study exam constituted strong risk factors for suicidal behavior. The influence of coping skills varied according to the strategies adapted toward problems with a high score of approach coping skills significantly associated with a reduced risk of suicidal behavior. The Bayesian network indicated that the probability of suicidal behavior associated with specific life events was to a large extent conditional on coping skills. For instance, a stressful experience of having strong conflicts with classmates could result in a probability of suicidal behavior of 21.25% and 15.36% respectively, for female and male students with the score of approach coping skills under the average. Stressful life events and deficient coping skills are strong risk factors for suicidal behavior among youth students. The results underscore the importance of prevention efforts to improve coping skills towards stressful life events.

  8. Impact of Colic Pain as a Significant Factor for Predicting the Stone Free Rate of One-Session Shock Wave Lithotripsy for Treating Ureter Stones: A Bayesian Logistic Regression Model Analysis

    PubMed Central

    Chung, Doo Yong; Cho, Kang Su; Lee, Dae Hun; Han, Jang Hee; Kang, Dong Hyuk; Jung, Hae Do; Kown, Jong Kyou; Ham, Won Sik; Choi, Young Deuk; Lee, Joo Yong

    2015-01-01

    Purpose: This study was conducted to evaluate colic pain as a prognostic pretreatment factor that can influence ureter stone clearance and to estimate the probability of stone-free status in shock wave lithotripsy (SWL) patients with a ureter stone. Materials and Methods: We retrospectively reviewed the medical records of 1,418 patients who underwent their first SWL between 2005 and 2013. Among these patients, 551 had a ureter stone measuring 4–20 mm and were thus eligible for our analyses. Colic pain as the chief complaint was defined as subjective flank pain during history taking and physical examination. Propensity scores for colic pain were calculated for each patient using multivariate logistic regression based upon the following covariates: age, maximal stone length (MSL), and mean stone density (MSD). Each factor was evaluated as a predictor of stone-free status by Bayesian and non-Bayesian logistic regression models. Results: After propensity-score matching, 217 patients were extracted in each group from the total patient cohort. There were no statistical differences in variables used in propensity-score matching. One-session success and stone-free rate were also higher in the painful group (73.7% and 71.0%, respectively) than in the painless group (63.6% and 60.4%, respectively). In multivariate non-Bayesian and Bayesian logistic regression models, a painful stone, shorter MSL, and lower MSD were significant factors for one-session stone-free status in patients who underwent SWL. Conclusions: Colic pain in patients with ureter calculi was one of the significant predicting factors including MSL and MSD for one-session stone-free status of SWL. PMID:25902059

  9. Past and present cosmic structure in the SDSS DR7 main sample

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jasche, J.; Leclercq, F.; Wandelt, B.D., E-mail: jasche@iap.fr, E-mail: florent.leclercq@polytechnique.org, E-mail: wandelt@iap.fr

    2015-01-01

    We present a chrono-cosmography project, aiming at the inference of the four dimensional formation history of the observed large scale structure from its origin to the present epoch. To do so, we perform a full-scale Bayesian analysis of the northern galactic cap of the Sloan Digital Sky Survey (SDSS) Data Release 7 main galaxy sample, relying on a fully probabilistic, physical model of the non-linearly evolved density field. Besides inferring initial conditions from observations, our methodology naturally and accurately reconstructs non-linear features at the present epoch, such as walls and filaments, corresponding to high-order correlation functions generated by late-time structure formation. Our inference framework self-consistently accounts for typical observational systematic and statistical uncertainties such as noise, survey geometry and selection effects. We further account for luminosity dependent galaxy biases and automatic noise calibration within a fully Bayesian approach. As a result, this analysis provides highly-detailed and accurate reconstructions of the present density field on scales larger than ∼ 3 Mpc/h, constrained by SDSS observations. This approach also leads to the first quantitative inference of plausible formation histories of the dynamic large scale structure underlying the observed galaxy distribution. The results described in this work constitute the first full Bayesian non-linear analysis of the cosmic large scale structure with the demonstrated capability of uncertainty quantification. Some of these results will be made publicly available along with this work. The level of detail of inferred results and the high degree of control on observational uncertainties pave the path towards high precision chrono-cosmography, the subject of simultaneously studying the dynamics and the morphology of the inhomogeneous Universe.

  10. Generalizability of Evidence-Based Assessment Recommendations for Pediatric Bipolar Disorder

    PubMed Central

    Jenkins, Melissa M.; Youngstrom, Eric A.; Youngstrom, Jennifer Kogos; Feeny, Norah C.; Findling, Robert L.

    2013-01-01

    Bipolar disorder is frequently clinically diagnosed in youths who do not actually satisfy DSM-IV criteria, yet cases that would satisfy full DSM-IV criteria are often undetected clinically. Evidence-based assessment methods that incorporate Bayesian reasoning have demonstrated improved diagnostic accuracy, and consistency; however, their clinical utility is largely unexplored. The present study examines the effectiveness of promising evidence-based decision-making compared to the clinical gold standard. Participants were 562 youth, ages 5-17 and predominantly African American, drawn from a community mental health clinic. Research diagnoses combined semi-structured interview with youths’ psychiatric, developmental, and family mental health histories. Independent Bayesian estimates relied on published risk estimates from other samples discriminated bipolar diagnoses, Area Under Curve=.75, p<.00005. The Bayes and confidence ratings correlated rs =.30. Agreement about an evidence-based assessment intervention “threshold model” (wait/assess/treat) had K=.24, p<.05. No potential moderators of agreement between the Bayesian estimates and confidence ratings, including type of bipolar illness, were significant. Bayesian risk estimates were highly correlated with logistic regression estimates using optimal sample weights, r=.81, p<.0005. Clinical and Bayesian approaches agree in terms of overall concordance and deciding next clinical action, even when Bayesian predictions are based on published estimates from clinically and demographically different samples. Evidence-based assessment methods may be useful in settings that cannot routinely employ gold standard assessments, and they may help decrease rates of overdiagnosis while promoting earlier identification of true cases. PMID:22004538
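
    The Bayesian reasoning referred to in this record is commonly implemented by converting a prior probability to odds, multiplying by one diagnostic likelihood ratio per finding, and converting back to a probability. The sketch below illustrates that arithmetic only; it is not the authors' procedure, the function name `update_probability` is hypothetical, the input values are made up, and multiplying ratios assumes the findings are conditionally independent.

```python
def update_probability(prior_prob, likelihood_ratios):
    """Posterior probability from a prior and diagnostic likelihood ratios.

    Assumption: the findings are conditionally independent, so their
    likelihood ratios can simply be multiplied on the odds scale.
    """
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    odds = prior_prob / (1.0 - prior_prob)   # probability -> odds
    for lr in likelihood_ratios:
        if lr <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        odds *= lr                           # Bayes' rule on the odds scale
    return odds / (1.0 + odds)               # odds -> probability


# Illustrative numbers only (not taken from the study):
# a 10% base rate and a single finding with likelihood ratio 9.
posterior = update_probability(0.10, [9.0])
```

    With these made-up inputs the prior odds of 1:9 are multiplied by 9, giving even odds, so the posterior is approximately 0.5.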

  11. A Bayesian Hierarchical Model for Large-Scale Educational Surveys: An Application to the National Assessment of Educational Progress. Research Report. ETS RR-04-38

    ERIC Educational Resources Information Center

    Johnson, Matthew S.; Jenkins, Frank

    2005-01-01

    Large-scale educational assessments such as the National Assessment of Educational Progress (NAEP) sample examinees to whom an exam will be administered. In most situations the sampling design is not a simple random sample and must be accounted for in the estimating model. After reviewing the current operational estimation procedure for NAEP, this…

  12. Modeling spatially-varying landscape change points in species occurrence thresholds

    USGS Publications Warehouse

    Wagner, Tyler; Midway, Stephen R.

    2014-01-01

    Predicting species distributions at scales of regions to continents is often necessary, as large-scale phenomena influence the distributions of spatially structured populations. Land use and land cover are important large-scale drivers of species distributions, and landscapes are known to create species occurrence thresholds, where small changes in a landscape characteristic result in abrupt changes in occurrence. The value of the landscape characteristic at which this change occurs is referred to as a change point. We present a hierarchical Bayesian threshold model (HBTM) that allows for estimating spatially varying parameters, including change points. Our model also allows for modeling estimated parameters in an effort to understand large-scale drivers of variability in land use and land cover on species occurrence thresholds. We use range-wide detection/nondetection data for the eastern brook trout (Salvelinus fontinalis), a stream-dwelling salmonid, to illustrate our HBTM for estimating and modeling spatially varying threshold parameters in species occurrence. We parameterized the model for investigating thresholds in landscape predictor variables that are measured as proportions, and which are therefore restricted to values between 0 and 1. Our HBTM estimated spatially varying thresholds in brook trout occurrence for both the proportion agricultural and urban land uses. There was relatively little spatial variation in change point estimates, although there was spatial variability in the overall shape of the threshold response and associated uncertainty. In addition, regional mean stream water temperature was correlated to the change point parameters for the proportion of urban land use, with the change point value increasing with increasing mean stream water temperature. We present a framework for quantifying macrosystem variability in spatially varying threshold model parameters in relation to important large-scale drivers such as land use and land cover. Although the model presented is a logistic HBTM, it can easily be extended to accommodate other statistical distributions for modeling species richness or abundance.

  13. Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach.

    PubMed

    Duarte, Belmiro P M; Wong, Weng Kee

    2015-08-01

    This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted.

  14. Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach

    PubMed Central

    Duarte, Belmiro P. M.; Wong, Weng Kee

    2014-01-01

    Summary This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted. PMID:26512159

  15. Application of Bayesian model averaging to measurements of the primordial power spectrum

    NASA Astrophysics Data System (ADS)

    Parkinson, David; Liddle, Andrew R.

    2010-11-01

    Cosmological parameter uncertainties are often stated assuming a particular model, neglecting the model uncertainty, even when Bayesian model selection is unable to identify a conclusive best model. Bayesian model averaging is a method for assessing parameter uncertainties in situations where there is also uncertainty in the underlying model. We apply model averaging to the estimation of the parameters associated with the primordial power spectra of curvature and tensor perturbations. We use CosmoNest and MultiNest to compute the model evidences and posteriors, using cosmic microwave data from WMAP, ACBAR, BOOMERanG, and CBI, plus large-scale structure data from the SDSS DR7. We find that the model-averaged 95% credible interval for the spectral index using all of the data is 0.940

  16. Evaluating scaling models in biology using hierarchical Bayesian approaches

    PubMed Central

    Price, Charles A; Ogle, Kiona; White, Ethan P; Weitz, Joshua S

    2009-01-01

    Theoretical models for allometric relationships between organismal form and function are typically tested by comparing a single predicted relationship with empirical data. Several prominent models, however, predict more than one allometric relationship, and comparisons among alternative models have not taken this into account. Here we evaluate several different scaling models of plant morphology within a hierarchical Bayesian framework that simultaneously fits multiple scaling relationships to three large allometric datasets. The scaling models include: inflexible universal models derived from biophysical assumptions (e.g. elastic similarity or fractal networks), a flexible variation of a fractal network model, and a highly flexible model constrained only by basic algebraic relationships. We demonstrate that variation in intraspecific allometric scaling exponents is inconsistent with the universal models, and that more flexible approaches that allow for biological variability at the species level outperform universal models, even when accounting for relative increases in model complexity. PMID:19453621

  17. Hierarchical Bayesian Logistic Regression to forecast metabolic control in type 2 DM patients.

    PubMed

    Dagliati, Arianna; Malovini, Alberto; Decata, Pasquale; Cogni, Giulia; Teliti, Marsida; Sacchi, Lucia; Cerra, Carlo; Chiovato, Luca; Bellazzi, Riccardo

    2016-01-01

    In this work we present our efforts in building a model able to forecast patients' changes in clinical conditions when repeated measurements are available. In this case the available risk calculators are typically not applicable. We propose a Hierarchical Bayesian Logistic Regression model, which allows taking into account individual and population variability in model parameters estimate. The model is used to predict metabolic control and its variation in type 2 diabetes mellitus. In particular we have analyzed a population of more than 1000 Italian type 2 diabetic patients, collected within the European project Mosaic. The results obtained in terms of Matthews Correlation Coefficient are significantly better than the ones gathered with standard logistic regression model, based on data pooling.
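
    For reference, the Matthews Correlation Coefficient used as the evaluation metric in this record is computed from the four cells of a binary confusion matrix. The sketch below is a generic implementation of the standard formula, not code from the study; the function name `matthews_corrcoef_from_counts` is hypothetical, and returning 0.0 when a marginal total is zero is a common convention that is assumed here.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined cases are reported as 0.
        return 0.0
    return numerator / denominator


# Illustrative counts only (not taken from the study).
perfect = matthews_corrcoef_from_counts(tp=50, tn=50, fp=0, fn=0)
chance = matthews_corrcoef_from_counts(tp=25, tn=25, fp=25, fn=25)
```

    Unlike plain accuracy, the coefficient uses all four cells, which is one reason it is often preferred when the two outcome classes are unbalanced.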

  18. Bayesian logistic regression in detection of gene-steroid interaction for cancer at PDLIM5 locus.

    PubMed

    Wang, Ke-Sheng; Owusu, Daniel; Pan, Yue; Xie, Changchun

    2016-06-01

    The PDZ and LIM domain 5 (PDLIM5) gene may play a role in cancer, bipolar disorder, major depression, alcohol dependence and schizophrenia; however, little is known about the interaction effect of steroid and PDLIM5 gene on cancer. This study examined 47 single-nucleotide polymorphisms (SNPs) within the PDLIM5 gene in the Marshfield sample with 716 cancer patients (any diagnosed cancer, excluding minor skin cancer) and 2848 noncancer controls. Multiple logistic regression model in PLINK software was used to examine the association of each SNP with cancer. Bayesian logistic regression in PROC GENMOD in SAS statistical software, ver. 9.4 was used to detect gene-steroid interactions influencing cancer. Single marker analysis using PLINK identified 12 SNPs associated with cancer (P < 0.05); especially, SNP rs6532496 revealed the strongest association with cancer (P = 6.84 × 10⁻³); while the next best signal was rs951613 (P = 7.46 × 10⁻³). Classic logistic regression in PROC GENMOD showed that both rs6532496 and rs951613 revealed strong gene-steroid interaction effects (OR=2.18, 95% CI=1.31-3.63 with P = 2.9 × 10⁻³ for rs6532496 and OR=2.07, 95% CI=1.24-3.45 with P = 5.43 × 10⁻³ for rs951613, respectively). Results from Bayesian logistic regression showed stronger interaction effects (OR=2.26, 95% CI=1.2-3.38 for rs6532496 and OR=2.14, 95% CI=1.14-3.2 for rs951613, respectively). All the 12 SNPs associated with cancer revealed significant gene-steroid interaction effects (P < 0.05); whereas 13 SNPs showed gene-steroid interaction effects without main effect on cancer. SNP rs4634230 revealed the strongest gene-steroid interaction effect (OR=2.49, 95% CI=1.5-4.13 with P = 4.0 × 10⁻⁴ based on the classic logistic regression and OR=2.59, 95% CI=1.4-3.97 from Bayesian logistic regression; respectively). This study provides evidence of common genetic variants within the PDLIM5 gene and interactions between PDLIM5 gene polymorphisms and steroid use influencing cancer.

  19. Logistic Mixed Models to Investigate Implicit and Explicit Belief Tracking.

    PubMed

    Lages, Martin; Scheel, Anne

    2016-01-01

    We investigated the proposition of a two-systems Theory of Mind in adults' belief tracking. A sample of N = 45 participants predicted the choice of one of two opponent players after observing several rounds in an animated card game. Three matches of this card game were played and initial gaze direction on target and subsequent choice predictions were recorded for each belief task and participant. We conducted logistic regressions with mixed effects on the binary data and developed Bayesian logistic mixed models to infer implicit and explicit mentalizing in true belief and false belief tasks. Although logistic regressions with mixed effects predicted the data well, a Bayesian logistic mixed model with latent task- and subject-specific parameters gave a better account of the data. As expected, explicit choice predictions suggested a clear understanding of true and false beliefs (TB/FB). Surprisingly, however, model parameters for initial gaze direction also indicated belief tracking. We discuss why task-specific parameters for initial gaze directions are different from choice predictions yet reflect second-order perspective taking.

  20. Bayesian logistic regression approaches to predict incorrect DRG assignment.

    PubMed

    Suleiman, Mani; Demirhan, Haydar; Boyd, Leanne; Girosi, Federico; Aksakalli, Vural

    2018-05-07

    Episodes of care involving similar diagnoses and treatments and requiring similar levels of resource utilisation are grouped to the same Diagnosis-Related Group (DRG). In jurisdictions which implement DRG based payment systems, DRGs are a major determinant of funding for inpatient care. Hence, service providers often dedicate auditing staff to the task of checking that episodes have been coded to the correct DRG. The use of statistical models to estimate an episode's probability of DRG error can significantly improve the efficiency of clinical coding audits. This study implements Bayesian logistic regression models with weakly informative prior distributions to estimate the likelihood that episodes require a DRG revision, comparing these models with each other and to classical maximum likelihood estimates. All Bayesian approaches had more stable model parameters than maximum likelihood. The best performing Bayesian model improved overall classification performance by 6% compared to maximum likelihood, with a 34% gain compared to random classification, respectively. We found that the original DRG, coder and the day of coding all have a significant effect on the likelihood of DRG error. Use of Bayesian approaches has improved model parameter stability and classification accuracy. This method has already led to improved audit efficiency in an operational capacity.
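
    The parameter-stability point in this record can be illustrated with a toy example. The sketch below is not the study's model: it fits a one-predictor logistic regression by gradient ascent, once on the plain log-likelihood and once on the log-posterior under independent Normal(0, 2.5²) priors (an assumed, commonly used weakly informative choice; the abstract does not state which priors were used). It returns a posterior mode (MAP estimate) rather than a full posterior, and the data, the function name `fit_logistic` and the step settings are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, prior_sd=None, steps=5000, lr=0.1):
    """Gradient ascent for intercept b0 and slope b1.

    prior_sd=None   -> maximises the log-likelihood (maximum likelihood).
    prior_sd=number -> maximises the log-posterior with independent
                       Normal(0, prior_sd**2) priors (a MAP estimate).
    """
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)  # gradient of the log-likelihood
            g0 += resid
            g1 += resid * x
        if prior_sd is not None:
            g0 -= b0 / prior_sd ** 2          # gradient of the Normal log-prior
            g1 -= b1 / prior_sd ** 2
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Toy, perfectly separated data: the likelihood has no finite maximiser.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

_, ml_slope = fit_logistic(xs, ys, prior_sd=None)   # keeps growing with more steps
_, map_slope = fit_logistic(xs, ys, prior_sd=2.5)   # settles at a finite value
```

    On separated data like this the maximum likelihood slope drifts upward for as long as the optimiser runs, while the prior keeps the MAP slope finite. This is one simple mechanism by which weakly informative priors can stabilise logistic regression coefficients.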

  1. Bayesian approach for three-dimensional aquifer characterization at the Hanford 300 Area

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murakami, Haruko; Chen, X.; Hahn, Melanie S.

    2010-10-21

    This study presents a stochastic, three-dimensional characterization of a heterogeneous hydraulic conductivity field within DOE's Hanford 300 Area site, Washington, by assimilating large-scale, constant-rate injection test data with small-scale, three-dimensional electromagnetic borehole flowmeter (EBF) measurement data. We first inverted the injection test data to estimate the transmissivity field, using zeroth-order temporal moments of pressure buildup curves. We applied a newly developed Bayesian geostatistical inversion framework, the method of anchored distributions (MAD), to obtain a joint posterior distribution of geostatistical parameters and local log-transmissivities at multiple locations. The unique aspects of MAD that make it suitable for this purpose are its ability to integrate multi-scale, multi-type data within a Bayesian framework and to compute a nonparametric posterior distribution. After we combined the distribution of transmissivities with depth-discrete relative-conductivity profile from EBF data, we inferred the three-dimensional geostatistical parameters of the log-conductivity field, using the Bayesian model-based geostatistics. Such consistent use of the Bayesian approach throughout the procedure enabled us to systematically incorporate data uncertainty into the final posterior distribution. The method was tested in a synthetic study and validated using the actual data that was not part of the estimation. Results showed broader and skewed posterior distributions of geostatistical parameters except for the mean, which suggests the importance of inferring the entire distribution to quantify the parameter uncertainty.

  2. Bayesian sparse channel estimation

    NASA Astrophysics Data System (ADS)

    Chen, Chulong; Zoltowski, Michael D.

    2012-05-01

    In Orthogonal Frequency Division Multiplexing (OFDM) systems, the technique used to estimate and track the time-varying multipath channel is critical to ensure reliable, high data rate communications. It is recognized that wireless channels often exhibit a sparse structure, especially for wideband and ultra-wideband systems. In order to exploit this sparse structure to reduce the number of pilot tones and increase the channel estimation quality, the application of compressed sensing to channel estimation is proposed. In this article, to make the compressed channel estimation more feasible for practical applications, it is investigated from a perspective of Bayesian learning. Under the Bayesian learning framework, the large-scale compressed sensing problem, as well as large time delay for the estimation of the doubly selective channel over multiple consecutive OFDM symbols, can be avoided. Simulation studies show a significant improvement in channel estimation MSE and less computing time compared to the conventional compressed channel estimation techniques.

  3. A Bayesian Nonparametric Approach to Image Super-Resolution.

    PubMed

    Polatkan, Gungor; Zhou, Mingyuan; Carin, Lawrence; Blei, David; Daubechies, Ingrid

    2015-02-01

    Super-resolution methods form high-resolution images from low-resolution images. In this paper, we develop a new Bayesian nonparametric model for super-resolution. Our method uses a beta-Bernoulli process to learn a set of recurring visual patterns, called dictionary elements, from the data. Because it is nonparametric, the number of elements found is also determined from the data. We test the results on both benchmark and natural images, comparing with several other models from the research literature. We perform large-scale human evaluation experiments to assess the visual quality of the results. In a first implementation, we use Gibbs sampling to approximate the posterior. However, this algorithm is not feasible for large-scale data. To circumvent this, we then develop an online variational Bayes (VB) algorithm. This algorithm finds high quality dictionaries in a fraction of the time needed by the Gibbs sampler.

  4. Unmasking the masked Universe: the 2M++ catalogue through Bayesian eyes

    NASA Astrophysics Data System (ADS)

    Lavaux, Guilhem; Jasche, Jens

    2016-01-01

    This work describes a full Bayesian analysis of the Nearby Universe as traced by galaxies of the 2M++ survey. The analysis is run in two sequential steps. The first step self-consistently derives the luminosity-dependent galaxy biases, the power spectrum of matter fluctuations and matter density fields within a Gaussian statistic approximation. The second step makes a detailed analysis of the three-dimensional large-scale structures, assuming a fixed bias model and a fixed cosmology. This second step allows for the reconstruction of both the final density field and the initial conditions at z = 1000 assuming a fixed bias model. From these, we derive fields that self-consistently extrapolate the observed large-scale structures. We give two examples of these extrapolations and their utility for the detection of structures: the visibility of the Sloan Great Wall, and the detection and characterization of the Local Void using DIVA, a Lagrangian based technique to classify structures.

  5. Predicting site locations for biomass using facilities with Bayesian methods

    Treesearch

    Timothy M. Young; James H. Perdue; Xia Huang

    2017-01-01

    Logistic regression models combined with Bayesian inference were developed to predict locations and quantify factors that influence the siting of biomass-using facilities that use woody biomass in the Southeastern United States. Predictions were developed for two groups of mills, one representing larger capacity mills similar to pulp and paper mills (Group II...

  6. A Bayesian Beta-Mixture Model for Nonparametric IRT (BBM-IRT)

    ERIC Educational Resources Information Center

    Arenson, Ethan A.; Karabatsos, George

    2017-01-01

    Item response models typically assume that the item characteristic (step) curves follow a logistic or normal cumulative distribution function, which are strictly monotone functions of person test ability. Such assumptions can be overly-restrictive for real item response data. We propose a simple and more flexible Bayesian nonparametric IRT model…

  7. A Bayesian Hierarchical Modeling Scheme for Estimating Erosion Rates Under Current Climate Conditions

    NASA Astrophysics Data System (ADS)

    Lowman, L.; Barros, A. P.

    2014-12-01

    Computational modeling of surface erosion processes is inherently difficult because of the four-dimensional nature of the problem and the multiple temporal and spatial scales that govern individual mechanisms. Landscapes are modified via surface and fluvial erosion and exhumation, each of which takes place over a range of time scales. Traditional field measurements of erosion/exhumation rates are scale dependent, often valid for a single point-wise location or averaging over large areal extents and periods with intense and mild erosion. We present a method of remotely estimating erosion rates using a Bayesian hierarchical model based upon the stream power erosion law (SPEL). A Bayesian approach allows for estimating erosion rates using the deterministic relationship given by the SPEL and data on channel slopes and precipitation at the basin and sub-basin scale. The spatial scale associated with this framework is the elevation class, where each class is characterized by distinct morphologic behavior observed through different modes in the distribution of basin outlet elevations. Interestingly, the distributions of first-order outlets are similar in shape and extent to the distribution of precipitation events (i.e. individual storms) over a 14-year period between 1998-2011. We demonstrate an application of the Bayesian hierarchical modeling framework for five basins and one intermontane basin located in the central Andes between 5S and 20S. Using remotely sensed data of current annual precipitation rates from the Tropical Rainfall Measuring Mission (TRMM) and topography from a high resolution (3 arc-seconds) digital elevation map (DEM), our erosion rate estimates are consistent with decadal-scale estimates based on landslide mapping and sediment flux observations and 1-2 orders of magnitude larger than most millennial and million year timescale estimates from thermochronology and cosmogenic nuclides.

  8. CMOL/CMOS hardware architectures and performance/price for Bayesian memory - The building block of intelligent systems

    NASA Astrophysics Data System (ADS)

    Zaveri, Mazad Shaheriar

    The semiconductor/computer industry has been following Moore's law for several decades and has reaped the benefits in speed and density of the resultant scaling. Transistor density has reached almost one billion per chip, and transistor delays are in picoseconds. However, scaling has slowed down, and the semiconductor industry is now facing several challenges. Hybrid CMOS/nano technologies, such as CMOL, are considered as an interim solution to some of the challenges. Another potential architectural solution includes specialized architectures for applications/models in the intelligent computing domain, one aspect of which includes abstract computational models inspired from the neuro/cognitive sciences. Consequently in this dissertation, we focus on the hardware implementations of Bayesian Memory (BM), which is a (Bayesian) Biologically Inspired Computational Model (BICM). This model is a simplified version of George and Hawkins' model of the visual cortex, which includes an inference framework based on Judea Pearl's belief propagation. We then present a "hardware design space exploration" methodology for implementing and analyzing the (digital and mixed-signal) hardware for the BM. This particular methodology involves: analyzing the computational/operational cost and the related micro-architecture, exploring candidate hardware components, proposing various custom hardware architectures using both traditional CMOS and hybrid nanotechnology - CMOL, and investigating the baseline performance/price of these architectures. The results suggest that CMOL is a promising candidate for implementing a BM. 
Such implementations can utilize the very high density storage/computation benefits of these new nano-scale technologies much more efficiently; for example, the throughput per 858 mm2 (TPM) obtained for CMOL based architectures is 32 to 40 times better than the TPM for a CMOS based multiprocessor/multi-FPGA system, and almost 2000 times better than the TPM for a PC implementation. We later use this methodology to investigate the hardware implementations of cortex-scale spiking neural system, which is an approximate neural equivalent of BICM based cortex-scale system. The results of this investigation also suggest that CMOL is a promising candidate to implement such large-scale neuromorphic systems. In general, the assessment of such hypothetical baseline hardware architectures provides the prospects for building large-scale (mammalian cortex-scale) implementations of neuromorphic/Bayesian/intelligent systems using state-of-the-art and beyond state-of-the-art silicon structures.

  9. Large-Scale Optimization for Bayesian Inference in Complex Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Willcox, Karen; Marzouk, Youssef

    2013-11-12

    The SAGUARO (Scalable Algorithms for Groundwater Uncertainty Analysis and Robust Optimization) Project focused on the development of scalable numerical algorithms for large-scale Bayesian inversion in complex systems that capitalize on advances in large-scale simulation-based optimization and inversion methods. The project was a collaborative effort among MIT, the University of Texas at Austin, Georgia Institute of Technology, and Sandia National Laboratories. The research was directed in three complementary areas: efficient approximations of the Hessian operator, reductions in complexity of forward simulations via stochastic spectral approximations and model reduction, and employing large-scale optimization concepts to accelerate sampling. The MIT-Sandia component of the SAGUARO Project addressed the intractability of conventional sampling methods for large-scale statistical inverse problems by devising reduced-order models that are faithful to the full-order model over a wide range of parameter values; sampling then employs the reduced model rather than the full model, resulting in very large computational savings. Results indicate little effect on the computed posterior distribution. On the other hand, in the Texas-Georgia Tech component of the project, we retain the full-order model, but exploit inverse problem structure (adjoint-based gradients and partial Hessian information of the parameter-to-observation map) to implicitly extract lower dimensional information on the posterior distribution; this greatly speeds up sampling methods, so that fewer sampling points are needed. We can think of these two approaches as "reduce then sample" and "sample then reduce." In fact, these two approaches are complementary, and can be used in conjunction with each other. 
Moreover, they both exploit deterministic inverse problem structure, in the form of adjoint-based gradient and Hessian information of the underlying parameter-to-observation map, to achieve their speedups.

  10. Final Report: Large-Scale Optimization for Bayesian Inference in Complex Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ghattas, Omar

    2013-10-15

    The SAGUARO (Scalable Algorithms for Groundwater Uncertainty Analysis and Robust Optimization) Project focuses on the development of scalable numerical algorithms for large-scale Bayesian inversion in complex systems that capitalize on advances in large-scale simulation-based optimization and inversion methods. Our research is directed in three complementary areas: efficient approximations of the Hessian operator, reductions in complexity of forward simulations via stochastic spectral approximations and model reduction, and employing large-scale optimization concepts to accelerate sampling. Our efforts are integrated in the context of a challenging testbed problem that considers subsurface reacting flow and transport. The MIT component of the SAGUARO Project addresses the intractability of conventional sampling methods for large-scale statistical inverse problems by devising reduced-order models that are faithful to the full-order model over a wide range of parameter values; sampling then employs the reduced model rather than the full model, resulting in very large computational savings. Results indicate little effect on the computed posterior distribution. On the other hand, in the Texas-Georgia Tech component of the project, we retain the full-order model, but exploit inverse problem structure (adjoint-based gradients and partial Hessian information of the parameter-to-observation map) to implicitly extract lower dimensional information on the posterior distribution; this greatly speeds up sampling methods, so that fewer sampling points are needed. We can think of these two approaches as "reduce then sample" and "sample then reduce." In fact, these two approaches are complementary, and can be used in conjunction with each other. Moreover, they both exploit deterministic inverse problem structure, in the form of adjoint-based gradient and Hessian information of the underlying parameter-to-observation map, to achieve their speedups.

  11. Logistic Mixed Models to Investigate Implicit and Explicit Belief Tracking

    PubMed Central

    Lages, Martin; Scheel, Anne

    2016-01-01

    We investigated the proposition of a two-systems Theory of Mind in adults’ belief tracking. A sample of N = 45 participants predicted the choice of one of two opponent players after observing several rounds in an animated card game. Three matches of this card game were played and initial gaze direction on target and subsequent choice predictions were recorded for each belief task and participant. We conducted logistic regressions with mixed effects on the binary data and developed Bayesian logistic mixed models to infer implicit and explicit mentalizing in true belief and false belief tasks. Although logistic regressions with mixed effects predicted the data well, a Bayesian logistic mixed model with latent task- and subject-specific parameters gave a better account of the data. As expected, explicit choice predictions suggested a clear understanding of true and false beliefs (TB/FB). Surprisingly, however, model parameters for initial gaze direction also indicated belief tracking. We discuss why task-specific parameters for initial gaze directions are different from choice predictions yet reflect second-order perspective taking. PMID:27853440

  12. BMDS: A Collection of R Functions for Bayesian Multidimensional Scaling

    ERIC Educational Resources Information Center

    Okada, Kensuke; Shigemasu, Kazuo

    2009-01-01

    Bayesian multidimensional scaling (MDS) has attracted a great deal of attention because: (1) it provides a better fit than do classical MDS and ALSCAL; (2) it provides estimation errors of the distances; and (3) the Bayesian dimension selection criterion, MDSIC, provides a direct indication of optimal dimensionality. However, Bayesian MDS is not…

  13. Dynamic Dimensionality Selection for Bayesian Classifier Ensembles

    DTIC Science & Technology

    2015-03-19

    learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very competitive with Logistic Regression but much more...classifier, Generative learning, Discriminative learning, Naïve Bayes, Feature selection, Logistic regression, higher order attribute independence 16...discriminative learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very competitive with Logistic Regression but

  14. A functional model for characterizing long-distance movement behaviour

    USGS Publications Warehouse

    Buderman, Frances E.; Hooten, Mevin B.; Ivan, Jacob S.; Shenk, Tanya M.

    2016-01-01

    Advancements in wildlife telemetry techniques have made it possible to collect large data sets of highly accurate animal locations at a fine temporal resolution. These data sets have prompted the development of a number of statistical methodologies for modelling animal movement. Telemetry data sets are often collected for purposes other than fine-scale movement analysis. These data sets may differ substantially from those that are collected with technologies suitable for fine-scale movement modelling and may consist of locations that are irregular in time, are temporally coarse or have large measurement error. These data sets are time-consuming and costly to collect but may still provide valuable information about movement behaviour. We developed a Bayesian movement model that accounts for error from multiple data sources as well as movement behaviour at different temporal scales. The Bayesian framework allows us to calculate derived quantities that describe temporally varying movement behaviour, such as residence time, speed and persistence in direction. The model is flexible, easy to implement and computationally efficient. We apply this model to data from Colorado Canada lynx (Lynx canadensis) and use derived quantities to identify changes in movement behaviour.

  15. Boosting Bayesian parameter inference of stochastic differential equation models with methods from statistical physics

    NASA Astrophysics Data System (ADS)

    Albert, Carlo; Ulzega, Simone; Stoop, Ruedi

    2016-04-01

    Measured time-series of both precipitation and runoff are known to exhibit highly non-trivial statistical properties. For making reliable probabilistic predictions in hydrology, it is therefore desirable to have stochastic models with output distributions that share these properties. When parameters of such models have to be inferred from data, we also need to quantify the associated parametric uncertainty. For non-trivial stochastic models, however, this latter step is typically very demanding, both conceptually and numerically, and almost never done in hydrology. Here, we demonstrate that methods developed in statistical physics make a large class of stochastic differential equation (SDE) models amenable to a full-fledged Bayesian parameter inference. For concreteness we demonstrate these methods by means of a simple yet non-trivial toy SDE model. We consider a natural catchment that can be described by a linear reservoir, at the scale of observation. All the neglected processes are assumed to happen at much shorter time-scales and are therefore modeled with a Gaussian white noise term, the standard deviation of which is assumed to scale linearly with the system state (water volume in the catchment). Even for constant input, the outputs of this simple non-linear SDE model show a wealth of desirable statistical properties, such as fat-tailed distributions and long-range correlations. Standard algorithms for Bayesian inference fail, for models of this kind, because their likelihood functions are extremely high-dimensional intractable integrals over all possible model realizations. The use of Kalman filters is illegitimate due to the non-linearity of the model. Particle filters could be used but become increasingly inefficient with growing number of data points. 
Hamiltonian Monte Carlo algorithms allow us to translate this inference problem to the problem of simulating the dynamics of a statistical mechanics system and give us access to most sophisticated methods that have been developed in the statistical physics community over the last few decades. We demonstrate that such methods, along with automated differentiation algorithms, allow us to perform a full-fledged Bayesian inference, for a large class of SDE models, in a highly efficient and largely automatized manner. Furthermore, our algorithm is highly parallelizable. For our toy model, discretized with a few hundred points, a full Bayesian inference can be performed in a matter of seconds on a standard PC.
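
    The toy model described in this abstract (a linear reservoir driven by constant input, with a white-noise term whose standard deviation scales linearly with the stored volume) can be illustrated with a forward simulation. The sketch below is not the authors' code and does not perform the Hamiltonian Monte Carlo inference; it assumes the SDE takes the form dV = (inflow - V/k) dt + sigma * V dW and discretizes it with the Euler-Maruyama scheme. The function name, the parameter names (inflow, k, sigma) and the non-negativity clamp are all assumptions made for illustration.

```python
import math
import random


def simulate_reservoir(v0, inflow, k, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path of the assumed SDE
    dV = (inflow - V / k) dt + sigma * V dW.

    v0      initial water volume
    inflow  constant input rate (hypothetical parameter name)
    k       reservoir residence time (hypothetical parameter name)
    sigma   noise intensity; the noise std scales linearly with V
    """
    rng = random.Random(seed)
    v = v0
    path = [v]
    for _ in range(n_steps):
        # Brownian increment over one time step: N(0, dt)
        dw = rng.gauss(0.0, math.sqrt(dt))
        v = v + (inflow - v / k) * dt + sigma * v * dw
        # Crude guard added for this sketch: keep the volume non-negative
        v = max(v, 0.0)
        path.append(v)
    return path


if __name__ == "__main__":
    path = simulate_reservoir(v0=1.0, inflow=1.0, k=2.0, sigma=0.3,
                              dt=0.01, n_steps=1000, seed=1)
    print(len(path), min(path), max(path))
```

    With sigma set to 0 the path relaxes to the deterministic steady state inflow * k, which gives a quick sanity check on the discretization.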

  16. Detecting cancer clusters in a regional population with local cluster tests and Bayesian smoothing methods: a simulation study

    PubMed Central

    2013-01-01

    Background There is a rising public and political demand for prospective cancer cluster monitoring. But there is little empirical evidence on the performance of established cluster detection tests under conditions of small and heterogeneous sample sizes and varying spatial scales, such as are the case for most existing population-based cancer registries. Therefore this simulation study aims to evaluate different cluster detection methods, implemented in the open source environment R, in their ability to identify clusters of lung cancer using real-life data from an epidemiological cancer registry in Germany. Methods Risk surfaces were constructed with two different spatial cluster types, representing a relative risk of RR = 2.0 or of RR = 4.0, in relation to the overall background incidence of lung cancer, separately for men and women. Lung cancer cases were sampled from this risk surface as geocodes using an inhomogeneous Poisson process. The realisations of the cancer cases were analysed within small spatial (census tracts, N = 1983) and within aggregated large spatial scales (communities, N = 78). Subsequently, they were submitted to the cluster detection methods. The test accuracy for cluster location was determined in terms of detection rates (DR), false-positive (FP) rates and positive predictive values. The Bayesian smoothing models were evaluated using ROC curves. Results With moderate risk increase (RR = 2.0), local cluster tests showed better DR (for both spatial aggregation scales > 0.90) and lower FP rates (both < 0.05) than the Bayesian smoothing methods. When the cluster RR was raised four-fold, the local cluster tests showed better DR with lower FPs only for the small spatial scale. At a large spatial scale, the Bayesian smoothing methods, especially those implementing a spatial neighbourhood, showed a substantially lower FP rate than the cluster tests. However, the risk increases at this scale were mostly diluted by data aggregation. 
Conclusion High resolution spatial scales seem more appropriate as data base for cancer cluster testing and monitoring than the commonly used aggregated scales. We suggest the development of a two-stage approach that combines methods with high detection rates as a first-line screening with methods of higher predictive ability at the second stage. PMID:24314148

  17. A sparse structure learning algorithm for Gaussian Bayesian Network identification from high-dimensional data.

    PubMed

    Huang, Shuai; Li, Jing; Ye, Jieping; Fleisher, Adam; Chen, Kewei; Wu, Teresa; Reiman, Eric

    2013-06-01

    Structure learning of Bayesian Networks (BNs) is an important topic in machine learning. Driven by modern applications in genetics and brain sciences, accurate and efficient learning of large-scale BN structures from high-dimensional data becomes a challenging problem. To tackle this challenge, we propose a Sparse Bayesian Network (SBN) structure learning algorithm that employs a novel formulation involving one L1-norm penalty term to impose sparsity and another penalty term to ensure that the learned BN is a Directed Acyclic Graph--a required property of BNs. Through both theoretical analysis and extensive experiments on 11 moderate and large benchmark networks with various sample sizes, we show that SBN leads to improved learning accuracy, scalability, and efficiency as compared with 10 existing popular BN learning algorithms. We apply SBN to a real-world application of brain connectivity modeling for Alzheimer's disease (AD) and reveal findings that could lead to advancements in AD research.

  18. A Sparse Structure Learning Algorithm for Gaussian Bayesian Network Identification from High-Dimensional Data

    PubMed Central

    Huang, Shuai; Li, Jing; Ye, Jieping; Fleisher, Adam; Chen, Kewei; Wu, Teresa; Reiman, Eric

    2014-01-01

    Structure learning of Bayesian Networks (BNs) is an important topic in machine learning. Driven by modern applications in genetics and brain sciences, accurate and efficient learning of large-scale BN structures from high-dimensional data becomes a challenging problem. To tackle this challenge, we propose a Sparse Bayesian Network (SBN) structure learning algorithm that employs a novel formulation involving one L1-norm penalty term to impose sparsity and another penalty term to ensure that the learned BN is a Directed Acyclic Graph (DAG)—a required property of BNs. Through both theoretical analysis and extensive experiments on 11 moderate and large benchmark networks with various sample sizes, we show that SBN leads to improved learning accuracy, scalability, and efficiency as compared with 10 existing popular BN learning algorithms. We apply SBN to a real-world application of brain connectivity modeling for Alzheimer’s disease (AD) and reveal findings that could lead to advancements in AD research. PMID:22665720

  19. An allometric scaling relation based on logistic growth of cities

    NASA Astrophysics Data System (ADS)

    Chen, Yanguang

    2014-08-01

    The relationships between urban area and population size have been empirically demonstrated to follow the scaling law of allometric growth. This allometric scaling is based on exponential growth of city size and can be termed "exponential allometry", which is associated with the concepts of fractals. However, both city population and urban area comply with the course of logistic growth rather than exponential growth. In this paper, I will present a new allometric scaling based on logistic growth to solve the abovementioned problem. The logistic growth is a process of replacement dynamics. Defining a pair of replacement quotients as new measurements, which are functions of urban area and population, we can derive an allometric scaling relation from the logistic processes of urban growth, which can be termed "logistic allometry". The exponential allometric relation between urban area and population is the approximate expression of the logistic allometric equation when the city size is not large enough. The proper range of the allometric scaling exponent value is reconsidered through the logistic process. Then, a medium-sized city of Henan Province, China, is employed as an example to validate the new allometric relation. The logistic allometry is helpful for further understanding the fractal property and self-organized process of urban evolution in the right perspective.

  20. Logistic regression accuracy across different spatial and temporal scales for a wide-ranging species, the marbled murrelet

    Treesearch

    Carolyn B. Meyer; Sherri L. Miller; C. John Ralph

    2004-01-01

    The scale at which habitat variables are measured affects the accuracy of resource selection functions in predicting animal use of sites. We used logistic regression models for a wide-ranging species, the marbled murrelet (Brachyramphus marmoratus), in a large region in California to address how much changing the spatial or temporal scale of...

  1. Application of Bayesian methods to habitat selection modeling of the northern spotted owl in California: new statistical methods for wildlife research

    Treesearch

    Howard B. Stauffer; Cynthia J. Zabel; Jeffrey R. Dunk

    2005-01-01

    We compared a set of competing logistic regression habitat selection models for Northern Spotted Owls (Strix occidentalis caurina) in California. The habitat selection models were estimated, compared, evaluated, and tested using multiple sample datasets collected on federal forestlands in northern California. We used Bayesian methods in interpreting...

  2. Bayesian Analysis of Item Response Curves. Research Report 84-1. Mathematical Sciences Technical Report No. 132.

    ERIC Educational Resources Information Center

    Tsutakawa, Robert K.; Lin, Hsin Ying

    Item response curves for a set of binary responses are studied from a Bayesian viewpoint of estimating the item parameters. For the two-parameter logistic model with normally distributed ability, restricted bivariate beta priors are used to illustrate the computation of the posterior mode via the EM algorithm. The procedure is illustrated by data…

  3. Evaluating Mixture Modeling for Clustering: Recommendations and Cautions

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2011-01-01

    This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson,…

  4. A large scale test of the gaming-enhancement hypothesis.

    PubMed

    Przybylski, Andrew K; Wang, John C

    2016-01-01

    A growing research literature suggests that regular electronic game play and game-based training programs may confer practically significant benefits to cognitive functioning. Most evidence supporting this idea, the gaming-enhancement hypothesis, has been collected in small-scale studies of university students and older adults. This research investigated the hypothesis in a general way with a large sample of 1,847 school-aged children. Our aim was to examine the relations between young people's gaming experiences and an objective test of reasoning performance. Using a Bayesian hypothesis testing approach, evidence for the gaming-enhancement and null hypotheses were compared. Results provided no substantive evidence supporting the idea that having preference for or regularly playing commercially available games was positively associated with reasoning ability. Evidence ranged from equivocal to very strong in support for the null hypothesis over what was predicted. The discussion focuses on the value of Bayesian hypothesis testing for investigating electronic gaming effects, the importance of open science practices, and pre-registered designs to improve the quality of future work.

  5. An Evaluation of Hierarchical Bayes Estimation for the Two-Parameter Logistic Model.

    ERIC Educational Resources Information Center

    Kim, Seock-Ho

    Hierarchical Bayes procedures for the two-parameter logistic item response model were compared for estimating item parameters. Simulated data sets were analyzed using two different Bayes estimation procedures, the two-stage hierarchical Bayes estimation (HB2) and the marginal Bayesian with known hyperparameters (MB), and marginal maximum…

  6. Bayesian Analysis of High Dimensional Classification

    NASA Astrophysics Data System (ADS)

    Mukhopadhyay, Subhadeep; Liang, Faming

    2009-12-01

    Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. In these cases, there is a lot of interest in searching for sparse models in the high-dimensional regression (or classification) setup. We first discuss two common challenges for analyzing high-dimensional data. The first is the curse of dimensionality: the complexity of many existing algorithms scales exponentially with the dimensionality of the space, so these algorithms soon become computationally intractable and therefore inapplicable in many real applications. The second is multicollinearity among the predictors, which severely slows down the algorithm. In order to make Bayesian analysis operational in high dimensions, we propose a novel 'Hierarchical stochastic approximation Monte Carlo algorithm' (HSAMC), which overcomes the curse of dimensionality and the multicollinearity of predictors in high dimensions, and which also possesses a self-adjusting mechanism to avoid local minima separated by high energy barriers. Models and methods are illustrated by simulations inspired by the field of genomics. Numerical results indicate that HSAMC can work as a general model selection sampler in high dimensional complex model space.

  7. Selected aspects of prior and likelihood information for a Bayesian classifier in a road safety analysis.

    PubMed

    Nowakowska, Marzena

    2017-04-01

    The development of the Bayesian logistic regression model classifying the road accident severity is discussed. The already exploited informative priors (method of moments, maximum likelihood estimation, and two-stage Bayesian updating), along with the original idea of a Boot prior proposal, are investigated when no expert opinion has been available. In addition, two possible approaches to updating the priors, in the form of unbalanced and balanced training data sets, are presented. The obtained logistic Bayesian models are assessed on the basis of a deviance information criterion (DIC), highest probability density (HPD) intervals, and coefficients of variation estimated for the model parameters. The verification of the model accuracy has been based on sensitivity, specificity and the harmonic mean of sensitivity and specificity, all calculated from a test data set. The models obtained from the balanced training data set have a better classification quality than the ones obtained from the unbalanced training data set. The two-stage Bayesian updating prior model and the Boot prior model, both identified with the use of the balanced training data set, outperform the non-informative, method of moments, and maximum likelihood estimation prior models. It is important to note that one should be careful when interpreting the parameters since different priors can lead to different models. Copyright © 2017 Elsevier Ltd. All rights reserved.
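
    The accuracy measures named in this abstract (sensitivity, specificity, and their harmonic mean) can be computed from a test set as in the minimal sketch below. It is not taken from the paper: the function name, the 0/1 label coding (1 for the positive class) and the example labels are assumptions chosen for illustration.

```python
def classification_scores(y_true, y_pred):
    """Sensitivity, specificity and their harmonic mean for binary labels.

    Labels are assumed to be coded 1 (positive class) and 0 (negative class).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)  # share of true positives that are detected
    specificity = tn / (tn + fp)  # share of true negatives that are detected
    total = sensitivity + specificity
    harmonic_mean = 2 * sensitivity * specificity / total if total > 0 else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "harmonic_mean": harmonic_mean,
    }


if __name__ == "__main__":
    # Made-up labels: 3 positive cases and 5 negative cases
    scores = classification_scores([1, 1, 1, 0, 0, 0, 0, 0],
                                   [1, 1, 0, 0, 0, 0, 1, 1])
    print(scores)
```

    Unlike overall accuracy, the harmonic mean stays low when either sensitivity or specificity is low, which is one reason it is useful for comparing models trained on balanced and unbalanced data.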

  8. Multilevel Analysis of Trachomatous Trichiasis and Corneal Opacity in Nigeria: The Role of Environmental and Climatic Risk Factors on the Distribution of Disease.

    PubMed

    Smith, Jennifer L; Sivasubramaniam, Selvaraj; Rabiu, Mansur M; Kyari, Fatima; Solomon, Anthony W; Gilbert, Clare

    2015-01-01

    The distribution of trachoma in Nigeria is spatially heterogeneous, with large-scale trends observed across the country and more local variation within areas. Relative contributions of individual and cluster-level risk factors to the geographic distribution of disease remain largely unknown. The primary aim of this analysis is to assess the relationship between climatic factors and trachomatous trichiasis (TT) and/or corneal opacity (CO) due to trachoma in Nigeria, while accounting for the effects of individual risk factors and spatial correlation. In addition, we explore the relative importance of variation in the risk of trichiasis and/or corneal opacity (TT/CO) at different levels. Data from the 2007 National Blindness and Visual Impairment Survey were used for this analysis, which included a nationally representative sample of adults aged 40 years and above. Complete data were available from 304 clusters selected using a multi-stage stratified cluster-random sampling strategy. All participants (13,543 individuals) were interviewed and examined by an ophthalmologist for the presence or absence of TT and CO. In addition to field-collected data, remotely sensed climatic data were extracted for each cluster and used to fit Bayesian hierarchical logistic models to disease outcome. The risk of TT/CO was associated with factors at both the individual and cluster levels, with approximately 14% of the total variation attributed to the cluster level. Beyond established individual risk factors (age, gender and occupation), there was strong evidence that environmental/climatic factors at the cluster-level (lower precipitation, higher land surface temperature, higher mean annual temperature and rural classification) were also associated with a greater risk of TT/CO. 
    This study establishes the importance of large-scale risk factors in the geographical distribution of TT/CO in Nigeria, supporting anecdotal evidence that environmental conditions are associated with increased risk in this context and highlighting their potential use in improving estimates of disease burden at large scales.

  9. Bayesian adjustment for measurement error in continuous exposures in an individually matched case-control study.

    PubMed

    Espino-Hernandez, Gabriela; Gustafson, Paul; Burstyn, Igor

    2011-05-14

    In epidemiological studies explanatory variables are frequently subject to measurement error. The aim of this paper is to develop a Bayesian method to correct for measurement error in multiple continuous exposures in individually matched case-control studies. This is a topic that has not been widely investigated. The new method is illustrated using data from an individually matched case-control study of the association between thyroid hormone levels during pregnancy and exposure to perfluorinated acids. The objective of the motivating study was to examine the risk of maternal hypothyroxinemia due to exposure to three perfluorinated acids measured on a continuous scale. Results from the proposed method are compared with those obtained from a naive analysis. Using a Bayesian approach, the developed method considers a classical measurement error model for the exposures, as well as the conditional logistic regression likelihood as the disease model, together with a random-effect exposure model. Proper and diffuse prior distributions are assigned, and results from a quality control experiment are used to estimate the perfluorinated acids' measurement error variability. As a result, posterior distributions and 95% credible intervals of the odds ratios are computed. A sensitivity analysis of the method's performance in this particular application with different measurement error variability was performed. The proposed Bayesian method to correct for measurement error is feasible and can be implemented using statistical software. For the study on perfluorinated acids, a comparison of the inferences which are corrected for measurement error to those which ignore it indicates that little adjustment is manifested for the level of measurement error actually exhibited in the exposures. Nevertheless, a sensitivity analysis shows that more substantial adjustments arise if larger measurement errors are assumed.
    In individually matched case-control studies, the use of conditional logistic regression likelihood as a disease model in the presence of measurement error in multiple continuous exposures can be justified by having a random-effect exposure model. The proposed method can be successfully implemented in WinBUGS to correct individually matched case-control studies for several mismeasured continuous exposures under a classical measurement error model.

  10. Mining pharmacovigilance data using Bayesian logistic regression with James-Stein type shrinkage estimation.

    PubMed

    An, Lihua; Fung, Karen Y; Krewski, Daniel

    2010-09-01

    Spontaneous adverse event reporting systems are widely used to identify adverse reactions to drugs following their introduction into the marketplace. In this article, a James-Stein type shrinkage estimation strategy was developed in a Bayesian logistic regression model to analyze pharmacovigilance data. This method is effective in detecting signals as it combines information and borrows strength across medically related adverse events. Computer simulation demonstrated that the shrinkage estimator is uniformly better than the maximum likelihood estimator in terms of mean squared error. This method was used to investigate the possible association of a series of diabetic drugs and the risk of cardiovascular events using data from the Canada Vigilance Online Database.
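
    The idea of borrowing strength across related adverse events can be illustrated with the classical positive-part James-Stein estimator, which shrinks a set of estimates toward their common mean. This is a sketch under stated assumptions, not the estimator developed in the paper: it assumes independent estimates with a known, equal variance, and the function name and numbers are hypothetical.

```python
def james_stein(estimates, variance):
    """Positive-part James-Stein shrinkage of estimates toward their grand mean.

    Assumes each estimate is approximately normal with the same known variance.
    Shrinking toward the grand mean requires at least four estimates.
    """
    k = len(estimates)
    if k < 4:
        raise ValueError("need at least four estimates")
    grand_mean = sum(estimates) / k
    spread = sum((x - grand_mean) ** 2 for x in estimates)
    if spread == 0:
        return list(estimates)  # all estimates already coincide
    # Shrinkage factor in [0, 1]; noisier estimates (larger variance) shrink more.
    factor = max(0.0, 1.0 - (k - 3) * variance / spread)
    return [grand_mean + factor * (x - grand_mean) for x in estimates]


# Hypothetical log odds ratios for five related adverse events.
log_odds = [0.1, 0.4, -0.2, 0.9, 0.3]
shrunk = james_stein(log_odds, variance=0.04)
print([round(v, 3) for v in shrunk])
```

    Each shrunken value lies between the raw estimate and the grand mean, so extreme estimates based on little information are pulled toward the group.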

  11. Planetary micro-rover operations on Mars using a Bayesian framework for inference and control

    NASA Astrophysics Data System (ADS)

    Post, Mark A.; Li, Junquan; Quine, Brendan M.

    2016-03-01

    With the recent progress toward the application of commercially-available hardware to small-scale space missions, it is now becoming feasible for groups of small, efficient robots based on low-power embedded hardware to perform simple tasks on other planets in the place of large-scale, heavy and expensive robots. In this paper, we describe the design and programming of the Beaver micro-rover developed for Northern Light, a Canadian initiative to send a small lander and rover to Mars to study the Martian surface and subsurface. For a small, hardware-limited rover to handle an uncertain and mostly unknown environment without constant management by human operators, we use a Bayesian network of discrete random variables as an abstraction of expert knowledge about the rover and its environment, and inference operations for control. A framework for efficient construction of and inference in a Bayesian network using only the C language and fixed-point mathematics on embedded hardware has been developed for the Beaver to make intelligent decisions with minimal sensor data. We study the performance of the Beaver as it probabilistically maps a simple outdoor environment with sensor models that include uncertainty. Results indicate that the Beaver and other small and simple robotic platforms can make use of a Bayesian network to make intelligent decisions in uncertain planetary environments.

  12. Genomic-Enabled Prediction of Ordinal Data with Bayesian Logistic Ordinal Regression.

    PubMed

    Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Burgueño, Juan; Eskridge, Kent

    2015-08-18

    Most genomic-enabled prediction models developed so far assume that the response variable is continuous and normally distributed. The exception is the probit model, developed for ordered categorical phenotypes. In statistical applications, because of the easy implementation of the Bayesian probit ordinal regression (BPOR) model, Bayesian logistic ordinal regression (BLOR) is implemented rarely in the context of genomic-enabled prediction [sample size (n) is much smaller than the number of parameters (p)]. For this reason, in this paper we propose a BLOR model using the Pólya-Gamma data augmentation approach that produces a Gibbs sampler with similar full conditional distributions of the BPOR model and with the advantage that the BPOR model is a particular case of the BLOR model. We evaluated the proposed model by using simulation and two real data sets. Results indicate that our BLOR model is a good alternative for analyzing ordinal data in the context of genomic-enabled prediction with the probit or logit link. Copyright © 2015 Montesinos-López et al.

  13. A Conceptual Approach to Assimilating Remote Sensing Data to Improve Soil Moisture Profile Estimates in a Surface Flux/Hydrology Model. 3; Disaggregation

    NASA Technical Reports Server (NTRS)

    Caulfield, John; Crosson, William L.; Inguva, Ramarao; Laymon, Charles A.; Schamschula, Marius

    1998-01-01

    This is a follow-up to the preceding presentation by Crosson and Schamschula. The grid size for remote microwave measurements is much coarser than the hydrological model computational grids. To validate the hydrological models with measurements we propose mechanisms to disaggregate the microwave measurements to allow comparison with outputs from the hydrological models. Weighted interpolation and Bayesian methods are proposed to facilitate the comparison. While remote measurements occur at a large scale, they reflect underlying small-scale features. We can give continuing estimates of the small-scale features by correcting the simple 0th-order, starting with each small-scale model with each large-scale measurement using a straightforward method based on Kalman filtering.

  14. Development of a large-scale transportation optimization course.

    DOT National Transportation Integrated Search

    2011-11-01

    "In this project, a course was developed to introduce transportation and logistics applications of large-scale optimization to graduate students. This report details what similar courses exist in other universities, and the methodology used to gath...

  15. A Bayesian Semiparametric Item Response Model with Dirichlet Process Priors

    ERIC Educational Resources Information Center

    Miyazaki, Kei; Hoshino, Takahiro

    2009-01-01

    In Item Response Theory (IRT), item characteristic curves (ICCs) are illustrated through logistic models or normal ogive models, and the probability that examinees give the correct answer is usually a monotonically increasing function of their ability parameters. However, since only limited patterns of shapes can be obtained from logistic models…

  16. Real-time Bayesian anomaly detection in streaming environmental data

    NASA Astrophysics Data System (ADS)

    Hill, David J.; Minsker, Barbara S.; Amir, Eyal

    2009-04-01

    With large volumes of data arriving in near real time from environmental sensors, there is a need for automated detection of anomalous data caused by sensor or transmission errors or by infrequent system behaviors. This study develops and evaluates three automated anomaly detection methods using dynamic Bayesian networks (DBNs), which perform fast, incremental evaluation of data as they become available, scale to large quantities of data, and require no a priori information regarding process variables or types of anomalies that may be encountered. This study investigates these methods' abilities to identify anomalies in eight meteorological data streams from Corpus Christi, Texas. The results indicate that DBN-based detectors, using either robust Kalman filtering or Rao-Blackwellized particle filtering, outperform a DBN-based detector using Kalman filtering, with the former having false positive/negative rates of less than 2%. These methods were successful at identifying data anomalies caused by two real events: a sensor failure and a large storm.
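
    The incremental, filter-based detection described above can be sketched with a univariate Kalman filter that flags observations falling far outside the one-step-ahead prediction interval. This is a simplified illustration, not the dynamic Bayesian network detectors evaluated in the study; the local-level model, the variances, the threshold, and the sample readings are all assumptions.

```python
def kalman_anomalies(series, process_var, obs_var, threshold=3.0):
    """Flag points whose innovation exceeds `threshold` predictive standard deviations.

    Uses a local-level (random walk plus noise) model. Flagged points are not
    used to update the state, so one bad reading does not corrupt the filter.
    """
    mean, var = series[0], obs_var      # initialize the state from the first reading
    flags = [False]
    for y in series[1:]:
        pred_var = var + process_var    # predict: state uncertainty grows
        innovation = y - mean
        innovation_sd = (pred_var + obs_var) ** 0.5
        is_anomaly = abs(innovation) > threshold * innovation_sd
        flags.append(is_anomaly)
        if is_anomaly:
            var = pred_var              # skip the update, keep the prediction
        else:
            gain = pred_var / (pred_var + obs_var)
            mean = mean + gain * innovation
            var = (1.0 - gain) * pred_var
    return flags


# Hypothetical temperature stream with one spurious reading.
readings = [20.0, 20.1, 19.9, 20.2, 35.0, 20.0, 20.1]
print(kalman_anomalies(readings, process_var=0.01, obs_var=0.04))
```

    Each reading is processed once with constant memory, which is what makes this style of detector suitable for streaming data.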

  17. Bayesian Estimation of Multi-Unidimensional Graded Response IRT Models

    ERIC Educational Resources Information Center

    Kuo, Tzu-Chun

    2015-01-01

    Item response theory (IRT) has gained an increasing popularity in large-scale educational and psychological testing situations because of its theoretical advantages over classical test theory. Unidimensional graded response models (GRMs) are useful when polytomous response items are designed to measure a unified latent trait. They are limited in…

  18. Approximate Bayesian computation in large-scale structure: constraining the galaxy-halo connection

    NASA Astrophysics Data System (ADS)

    Hahn, ChangHoon; Vakili, Mohammadjavad; Walsh, Kilian; Hearin, Andrew P.; Hogg, David W.; Campbell, Duncan

    2017-08-01

    Standard approaches to Bayesian parameter inference in large-scale structure (LSS) assume a Gaussian functional form (chi-squared form) for the likelihood. This assumption, in detail, cannot be correct. Likelihood-free inference methods such as approximate Bayesian computation (ABC) relax these restrictions and make inference possible without making any assumptions about the likelihood. Instead, ABC relies on a forward generative model of the data and a metric for measuring the distance between the model and data. In this work, we demonstrate that ABC is feasible for LSS parameter inference by using it to constrain parameters of the halo occupation distribution (HOD) model for populating dark matter haloes with galaxies. Using a specific implementation of ABC supplemented with population Monte Carlo importance sampling, a generative forward model using HOD, and a distance metric based on galaxy number density, two-point correlation function and galaxy group multiplicity function, we constrain the HOD parameters of mock observations generated from selected 'true' HOD parameters. The parameter constraints we obtain from ABC are consistent with the 'true' HOD parameters, demonstrating that ABC can be reliably used for parameter inference in LSS. Furthermore, we compare our ABC constraints to constraints we obtain using a pseudo-likelihood function of Gaussian form with MCMC and find consistent HOD parameter constraints. Ultimately, our results suggest that ABC can and should be applied in parameter inference for LSS analyses.
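
    The two ingredients ABC relies on, a forward generative model and a distance metric, can be seen in a minimal rejection-ABC sketch. This is the plain rejection variant on a toy Gaussian-mean problem, not the population Monte Carlo implementation or the HOD forward model used in the paper; the prior range, tolerance, and all names are assumptions.

```python
import random
import statistics


def abc_rejection(observed, prior_sample, simulate, distance, eps, n_draws, rng):
    """Keep prior draws whose simulated data land within eps of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)            # draw a candidate parameter
        synthetic = simulate(theta, rng)     # run the forward model
        if distance(synthetic, observed) <= eps:
            accepted.append(theta)           # no likelihood is ever evaluated
    return accepted


rng = random.Random(0)
# Toy "observation": 50 draws from a unit-variance Gaussian with mean 2.
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

posterior = abc_rejection(
    observed,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda mu, r: [r.gauss(mu, 1.0) for _ in range(50)],
    # Distance between summary statistics (here, the sample means).
    distance=lambda a, b: abs(statistics.fmean(a) - statistics.fmean(b)),
    eps=0.2,
    n_draws=5000,
    rng=rng,
)
print(len(posterior), statistics.fmean(posterior))
```

    Shrinking the tolerance tightens the approximation to the posterior at the cost of fewer accepted draws, which is the inefficiency that importance-sampling variants such as population Monte Carlo aim to reduce.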

  19. Uncertainty estimation of Intensity-Duration-Frequency relationships: A regional analysis

    NASA Astrophysics Data System (ADS)

    Mélèse, Victor; Blanchet, Juliette; Molinié, Gilles

    2018-03-01

    We propose in this article a regional study of uncertainties in IDF curves derived from point-rainfall maxima. We develop two generalized extreme value models based on the simple scaling assumption, first in the frequentist framework and second in the Bayesian framework. Within the frequentist framework, uncertainties are obtained i) from the Gaussian density stemming from the asymptotic normality theorem of the maximum likelihood and ii) with a bootstrap procedure. Within the Bayesian framework, uncertainties are obtained from the posterior densities. We compare these two frameworks on the same database covering a large region of 100,000 km² in southern France with contrasting rainfall regimes, in order to be able to draw conclusions that are not specific to the data. The two frameworks are applied to 405 hourly stations with data back to the 1980s, accumulated in the range 3 h-120 h. We show that i) the Bayesian framework is more robust than the frequentist one to the starting point of the estimation procedure, ii) the posterior and the bootstrap densities are able to better adjust uncertainty estimation to the data than the Gaussian density, and iii) the bootstrap density gives unreasonable confidence intervals, in particular for return levels associated with large return periods. Therefore our recommendation goes towards the use of the Bayesian framework to compute uncertainty.

  20. Scale-invariance underlying the logistic equation and its social applications

    NASA Astrophysics Data System (ADS)

    Hernando, A.; Plastino, A.

    2013-01-01

    On the basis of dynamical principles we i) advance a derivation of the Logistic Equation (LE), widely employed (among multiple applications) in the simulation of population growth, and ii) demonstrate that scale-invariance and a mean-value constraint are sufficient and necessary conditions for obtaining it. We also generalize the LE to multi-component systems and show that the above dynamical mechanisms underlie a large number of scale-free processes. Examples are presented regarding city-populations, diffusion in complex networks, and popularity of technological products, all of them obeying the multi-component logistic equation in an either stochastic or deterministic way.
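
    For reference, the single-component logistic equation in its standard textbook form is given below, together with its closed-form solution. The abstract does not fix a notation, so the symbols are assumptions: N(t) is the population size, r the growth rate, K the carrying capacity, and N_0 the initial population.

```latex
\[
\frac{dN}{dt} = r N \left(1 - \frac{N}{K}\right),
\qquad
N(t) = \frac{K}{1 + \dfrac{K - N_0}{N_0}\, e^{-rt}} .
\]
```

    Setting t = 0 recovers N(0) = N_0, and N(t) approaches K as t grows when r > 0.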

  1. Scale Mixture Models with Applications to Bayesian Inference

    NASA Astrophysics Data System (ADS)

    Qin, Zhaohui S.; Damien, Paul; Walker, Stephen

    2003-11-01

    Scale mixtures of uniform distributions are used to model non-normal data in time series and econometrics in a Bayesian framework. Heteroscedastic and skewed data models are also tackled using scale mixture of uniform distributions.

  2. Accounting for Slipping and Other False Negatives in Logistic Models of Student Learning

    ERIC Educational Resources Information Center

    MacLellan, Christopher J.; Liu, Ran; Koedinger, Kenneth R.

    2015-01-01

    Additive Factors Model (AFM) and Performance Factors Analysis (PFA) are two popular models of student learning that employ logistic regression to estimate parameters and predict performance. This is in contrast to Bayesian Knowledge Tracing (BKT) which uses a Hidden Markov Model formalism. While all three models tend to make similar predictions,…

  3. Assessing global vegetation activity using spatio-temporal Bayesian modelling

    NASA Astrophysics Data System (ADS)

    Mulder, Vera L.; van Eck, Christel M.; Friedlingstein, Pierre; Regnier, Pierre A. G.

    2016-04-01

    This work demonstrates the potential of modelling vegetation activity using a hierarchical Bayesian spatio-temporal model. This approach allows modelling changes in vegetation and climate simultaneously in space and time. Changes of vegetation activity such as phenology are modelled as a dynamic process depending on climate variability in both space and time. Additionally, differences in observed vegetation status can be attributed to other abiotic ecosystem properties, e.g. soil and terrain properties. Although these properties do not change in time, they do change in space and may provide valuable information in addition to the climate dynamics. The spatio-temporal Bayesian models were calibrated at a regional scale because the local trends in space and time can be better captured by the model. The regional subsets were defined according to the SREX segmentation, as defined by the IPCC. Each region is considered to be relatively homogeneous in terms of large-scale climate and biomes, while still capturing small-scale (grid-cell level) variability. Modelling within these regions is hence expected to be less uncertain due to the absence of these large-scale patterns, compared to a global approach. This overall modelling approach allows the comparison of model behavior for the different regions and may provide insights on the main dynamic processes driving the interaction between vegetation and climate within different regions. The data employed in this study encompass the global datasets for soil properties (SoilGrids), terrain properties (Global Relief Model based on SRTM DEM and ETOPO), monthly time series of satellite-derived vegetation indices (GIMMS NDVI3g) and climate variables (Princeton Meteorological Forcing Dataset). The findings proved the potential of a spatio-temporal Bayesian modelling approach for assessing vegetation dynamics at a regional scale.
    The observed interrelationships of the employed data and the different spatial and temporal trends support our hypothesis. That is, the change of vegetation in space and time may be better understood when modelling vegetation change as both a dynamic and multivariate process. Therefore, future research will focus on a multivariate dynamical spatio-temporal modelling approach. This ongoing research is performed within the context of the project "Global impacts of hydrological and climatic extremes on vegetation" (project acronym: SAT-EX) which is part of the Belgian research programme for Earth Observation Stereo III.

  4. Spatial processes decouple management from objectives in a heterogeneous landscape: predator control as a case study.

    PubMed

    Mahoney, Peter J; Young, Julie K; Hersey, Kent R; Larsen, Randy T; McMillan, Brock R; Stoner, David C

    2018-04-01

    Predator control is often implemented with the intent of disrupting top-down regulation in sensitive prey populations. However, ambiguity surrounding the efficacy of predator management, as well as the strength of top-down effects of predators in general, is often exacerbated by the spatially implicit analytical approaches used in assessing data with explicit spatial structure. Here, we highlight the importance of considering spatial context in the case of a predator control study in south-central Utah. We assessed the spatial match between aerial removal risk in coyotes (Canis latrans) and mule deer (Odocoileus hemionus) resource selection during parturition using a spatially explicit, multi-level Bayesian model. With our model, we were able to evaluate spatial congruence between management action (i.e., coyote removal) and objective (i.e., parturient deer site selection) at two distinct scales: the level of the management unit and the individual coyote removal. In the case of the former, our results indicated substantial spatial heterogeneity in expected congruence between removal risk and parturient deer site selection across large areas, which reflects logistical constraints acting on the management strategy and differences in space use between the two species. At the level of the individual removal, we demonstrated that the potential management benefits of a removed coyote were highly variable across all individuals removed and in many cases, spatially distinct from parturient deer resource selection. Our methods and results provide a means of evaluating where we might anticipate an impact of predator control, while emphasizing the need to weight individual removals based on spatial proximity to management objectives in any assessment of large-scale predator control.
    Although we highlight the importance of spatial context in assessments of predator control strategy, we believe our methods are readily generalizable in any management or large-scale experimental framework where spatial context is likely an important driver of outcomes. © 2018 by the Ecological Society of America.

  5. A large scale test of the gaming-enhancement hypothesis

    PubMed Central

    Wang, John C.

    2016-01-01

    A growing research literature suggests that regular electronic game play and game-based training programs may confer practically significant benefits to cognitive functioning. Most evidence supporting this idea, the gaming-enhancement hypothesis, has been collected in small-scale studies of university students and older adults. This research investigated the hypothesis in a general way with a large sample of 1,847 school-aged children. Our aim was to examine the relations between young people’s gaming experiences and an objective test of reasoning performance. Using a Bayesian hypothesis testing approach, evidence for the gaming-enhancement and null hypotheses was compared. Results provided no substantive evidence supporting the idea that having a preference for or regularly playing commercially available games was positively associated with reasoning ability. Evidence ranged from equivocal to very strong in support of the null hypothesis over what was predicted. The discussion focuses on the value of Bayesian hypothesis testing for investigating electronic gaming effects, the importance of open science practices, and pre-registered designs to improve the quality of future work. PMID:27896035

  6. Conditional maximum-entropy method for selecting prior distributions in Bayesian statistics

    NASA Astrophysics Data System (ADS)

    Abe, Sumiyoshi

    2014-11-01

    The conditional maximum-entropy method (abbreviated here as C-MaxEnt) is formulated for selecting prior probability distributions in Bayesian statistics for parameter estimation. This method is inspired by a statistical-mechanical approach to systems governed by dynamics with largely separated time scales and is based on three key concepts: conjugate pairs of variables, dimensionless integration measures with coarse-graining factors and partial maximization of the joint entropy. The method enables one to calculate a prior purely from a likelihood in a simple way. It is shown, in particular, how it not only yields Jeffreys's rules but also reveals new structures hidden behind them.

  7. Variational dynamic background model for keyword spotting in handwritten documents

    NASA Astrophysics Data System (ADS)

    Kumar, Gaurav; Wshah, Safwan; Govindaraju, Venu

    2013-12-01

    We propose a Bayesian framework for keyword spotting in handwritten documents. This work extends our previous work, in which we proposed the dynamic background model (DBM) for keyword spotting, which takes into account local character-level scores and global word-level scores to learn a logistic regression classifier that separates keywords from non-keywords. In this work, we add a Bayesian layer on top of the DBM, called the variational dynamic background model (VDBM). The logistic regression classifier uses the sigmoid function to separate keywords from non-keywords. Because the sigmoid function is neither convex nor concave, exact inference in the VDBM is intractable. An expectation maximization step is proposed to perform approximate inference. The advantage of the VDBM over the DBM is multi-fold. Firstly, being Bayesian, it prevents over-fitting of the data. Secondly, it provides better modeling of the data and improved prediction on unseen data. The VDBM is evaluated on the IAM dataset, and the results show that it outperforms our prior work and other state-of-the-art line-based word spotting systems.

  8. Multimethod, multistate Bayesian hierarchical modeling approach for use in regional monitoring of wolves.

    PubMed

    Jiménez, José; García, Emilio J; Llaneza, Luis; Palacios, Vicente; González, Luis Mariano; García-Domínguez, Francisco; Múñoz-Igualada, Jaime; López-Bao, José Vicente

    2016-08-01

    In many cases, the first step in large-carnivore management is to obtain objective, reliable, and cost-effective estimates of population parameters through procedures that are reproducible over time. However, monitoring predators over large areas is difficult, and the data have a high level of uncertainty. We devised a practical multimethod and multistate modeling approach based on Bayesian hierarchical-site-occupancy models that combined multiple survey methods to estimate different population states for use in monitoring large predators at a regional scale. We used wolves (Canis lupus) as our model species and generated reliable estimates of the number of sites with wolf reproduction (presence of pups). We used 2 wolf data sets from Spain (Western Galicia in 2013 and Asturias in 2004) to test the approach. Based on howling surveys, the naïve estimation (i.e., estimate based only on observations) of the number of sites with reproduction was 9 and 25 sites in Western Galicia and Asturias, respectively. Our model showed 33.4 (SD 9.6) and 34.4 (3.9) sites with wolf reproduction, respectively. The number of occupied sites with wolf reproduction was 0.67 (SD 0.19) and 0.76 (0.11), respectively. This approach can be used to design more cost-effective monitoring programs (i.e., to define the sampling effort needed per site). Our approach should inspire well-coordinated surveys across multiple administrative borders and populations and lead to improved decision making for management of large carnivores on a landscape level. The use of this Bayesian framework provides a simple way to visualize the degree of uncertainty around population-parameter estimates and thus provides managers and stakeholders an intuitive approach to interpreting monitoring results. 
    Our approach can be widely applied to large spatial scales in wildlife monitoring where detection probabilities differ between population states and where several methods are being used to estimate different population parameters. © 2016 Society for Conservation Biology.

  9. On the Relationships between Jeffreys Modal and Weighted Likelihood Estimation of Ability under Logistic IRT Models

    ERIC Educational Resources Information Center

    Magis, David; Raiche, Gilles

    2012-01-01

    This paper focuses on two estimators of ability with logistic item response theory models: the Bayesian modal (BM) estimator and the weighted likelihood (WL) estimator. For the BM estimator, Jeffreys' prior distribution is considered, and the corresponding estimator is referred to as the Jeffreys modal (JM) estimator. It is established that under…

  10. Estimation of Logistic Regression Models in Small Samples. A Simulation Study Using a Weakly Informative Default Prior Distribution

    ERIC Educational Resources Information Center

    Gordovil-Merino, Amalia; Guardia-Olmos, Joan; Pero-Cebollero, Maribel

    2012-01-01

    In this paper, we used simulations to compare the performance of classical and Bayesian estimations in logistic regression models using small samples. In the performed simulations, conditions were varied, including the type of relationship between independent and dependent variable values (i.e., unrelated and related values), the type of variable…

  11. Scalable posterior approximations for large-scale Bayesian inverse problems via likelihood-informed parameter and state reduction

    NASA Astrophysics Data System (ADS)

    Cui, Tiangang; Marzouk, Youssef; Willcox, Karen

    2016-06-01

    Two major bottlenecks to the solution of large-scale Bayesian inverse problems are the scaling of posterior sampling algorithms to high-dimensional parameter spaces and the computational cost of forward model evaluations. Yet incomplete or noisy data, the state variation and parameter dependence of the forward model, and correlations in the prior collectively provide useful structure that can be exploited for dimension reduction in this setting, both in the parameter space of the inverse problem and in the state space of the forward model. To this end, we show how to jointly construct low-dimensional subspaces of the parameter space and the state space in order to accelerate the Bayesian solution of the inverse problem. As a byproduct of state dimension reduction, we also show how to identify low-dimensional subspaces of the data in problems with high-dimensional observations. These subspaces enable approximation of the posterior as a product of two factors: (i) a projection of the posterior onto a low-dimensional parameter subspace, wherein the original likelihood is replaced by an approximation involving a reduced model; and (ii) the marginal prior distribution on the high-dimensional complement of the parameter subspace. We present and compare several strategies for constructing these subspaces using only a limited number of forward and adjoint model simulations. The resulting posterior approximations can rapidly be characterized using standard sampling techniques, e.g., Markov chain Monte Carlo. Two numerical examples demonstrate the accuracy and efficiency of our approach: inversion of an integral equation in atmospheric remote sensing, where the data dimension is very high; and the inference of a heterogeneous transmissivity field in a groundwater system, which involves a partial differential equation forward model with high-dimensional state and parameters.

  12. Modular analysis of the probabilistic genetic interaction network.

    PubMed

    Hou, Lin; Wang, Lin; Qian, Minping; Li, Dong; Tang, Chao; Zhu, Yunping; Deng, Minghua; Li, Fangting

    2011-03-15

    Epistatic Miniarray Profiles (EMAP) have enabled the mapping of large-scale genetic interaction networks; however, the quantitative information gained from EMAP cannot be fully exploited since the data are usually interpreted as a discrete network based on an arbitrary hard threshold. To address such limitations, we adopted a mixture modeling procedure to construct a probabilistic genetic interaction network and then implemented a Bayesian approach to identify densely interacting modules in the probabilistic network. Mixture modeling has been demonstrated as an effective soft-threshold technique for EMAP measures. The Bayesian approach was applied to an EMAP dataset studying the early secretory pathway in Saccharomyces cerevisiae. Twenty-seven modules were identified, and 14 of those were enriched by gold standard functional gene sets. We also conducted a detailed comparison with state-of-the-art algorithms, hierarchical clustering and Markov clustering. The experimental results show that the Bayesian approach outperforms others in efficiently recovering biologically significant modules.

  13. Bayesian Analysis of the Power Spectrum of the Cosmic Microwave Background

    NASA Technical Reports Server (NTRS)

    Jewell, Jeffrey B.; Eriksen, H. K.; O'Dwyer, I. J.; Wandelt, B. D.

    2005-01-01

    There is a wealth of cosmological information encoded in the spatial power spectrum of temperature anisotropies of the cosmic microwave background. The sky, when viewed in the microwave, is very uniform, with a nearly perfect blackbody spectrum at 2.7 degrees. Very small amplitude brightness fluctuations (to one part in a million!!) trace small density perturbations in the early universe (roughly 300,000 years after the Big Bang), which later grow through gravitational instability to the large-scale structure seen in redshift surveys... In this talk, I will discuss a Bayesian formulation of this problem; discuss a Gibbs sampling approach to numerically sampling from the Bayesian posterior, and the application of this approach to the first-year data from the Wilkinson Microwave Anisotropy Probe. I will also comment on recent algorithmic developments for this approach to be tractable for the even more massive data set to be returned from the Planck satellite.

  14. Bayesian analysis of the dynamic cosmic web in the SDSS galaxy survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Leclercq, Florent; Wandelt, Benjamin; Jasche, Jens, E-mail: florent.leclercq@polytechnique.org, E-mail: jasche@iap.fr, E-mail: wandelt@iap.fr

    Recent application of the Bayesian algorithm BORG to the Sloan Digital Sky Survey (SDSS) main sample galaxies resulted in the physical inference of the formation history of the observed large-scale structure from its origin to the present epoch. In this work, we use these inferences as inputs for a detailed probabilistic cosmic web-type analysis. To do so, we generate a large set of data-constrained realizations of the large-scale structure using a fast, fully non-linear gravitational model. We then perform a dynamic classification of the cosmic web into four distinct components (voids, sheets, filaments, and clusters) on the basis of the tidal field. Our inference framework automatically and self-consistently propagates typical observational uncertainties to web-type classification. As a result, this study produces accurate cosmographic classification of large-scale structure elements in the SDSS volume. By also providing the history of these structure maps, the approach allows an analysis of the origin and growth of the early traces of the cosmic web present in the initial density field and of the evolution of global quantities such as the volume and mass filling fractions of different structures. For the problem of web-type classification, the results described in this work constitute the first connection between theory and observations at non-linear scales including a physical model of structure formation and the demonstrated capability of uncertainty quantification. A connection between cosmology and information theory using real data also naturally emerges from our probabilistic approach. Our results constitute quantitative chrono-cosmography of the complex web-like patterns underlying the observed galaxy distribution.

  15. Multiscale Bayesian neural networks for soil water content estimation

    NASA Astrophysics Data System (ADS)

    Jana, Raghavendra B.; Mohanty, Binayak P.; Springer, Everett P.

    2008-08-01

    Artificial neural networks (ANN) have been used for some time now to estimate soil hydraulic parameters from other available or more easily measurable soil properties. However, most such uses of ANNs as pedotransfer functions (PTFs) have been at matching spatial scales (1:1) of inputs and outputs. This approach assumes that the outputs are only required at the same scale as the input data. Unfortunately, this is rarely true. Different hydrologic, hydroclimatic, and contaminant transport models require soil hydraulic parameter data at different spatial scales, depending upon their grid sizes. While conventional (deterministic) ANNs have been traditionally used in these studies, the use of Bayesian training of ANNs is a more recent development. In this paper, we develop a Bayesian framework to derive soil water retention function including its uncertainty at the point or local scale using PTFs trained with coarser-scale Soil Survey Geographic (SSURGO)-based soil data. The approach includes an ANN trained with Bayesian techniques as a PTF tool with training and validation data collected across spatial extents (scales) in two different regions in the United States. The two study areas include the Las Cruces Trench site in the Rio Grande basin of New Mexico, and the Southern Great Plains 1997 (SGP97) hydrology experimental region in Oklahoma. Each region-specific Bayesian ANN is trained using soil texture and bulk density data from the SSURGO database (scale 1:24,000), and predictions of the soil water contents at different pressure heads with point scale data (1:1) inputs are made. The resulting outputs are corrected for bias using both linear and nonlinear correction techniques. The results show good agreement between the soil water content values measured at the point scale and those predicted by the Bayesian ANN-based PTFs for both the study sites. 
    Overall, Bayesian ANNs coupled with nonlinear bias correction are found to be very suitable tools for deriving soil hydraulic parameters at the local/fine scale from soil physical properties at coarser-scale and across different spatial extents. This approach could potentially be used for soil hydraulic properties estimation and downscaling.

  16. Reliability of a Bayesian network to predict an elevated aldosterone-to-renin ratio.

    PubMed

    Ducher, Michel; Mounier-Véhier, Claire; Lantelme, Pierre; Vaisse, Bernard; Baguet, Jean-Philippe; Fauvel, Jean-Pierre

    2015-05-01

    Resistant hypertension is common, mainly idiopathic, but sometimes related to primary aldosteronism. Thus, most hypertension specialists recommend screening for primary aldosteronism. The objective was to optimize the selection of patients whose aldosterone-to-renin ratio (ARR) is elevated, using simple clinical and biological characteristics. Data from consecutive patients referred between 1 June 2008 and 30 May 2009 were collected retrospectively from five French 'European excellence hypertension centres' institutional registers. Patients were included if they had at least one of: onset of hypertension before age 40 years, resistant hypertension, history of hypokalaemia, efficient treatment by spironolactone, and potassium supplementation. An ARR>32 ng/L and aldosterone>160 ng/L in patients treated without agents altering the renin-angiotensin system was considered as elevated. Bayesian network and stepwise logistic regression were used to predict an elevated ARR. Of 334 patients, 89 were excluded (31 for incomplete data, 32 for taking agents that alter the renin-angiotensin system and 26 for other reasons). Among 245 included patients, 110 had an elevated ARR. Sensitivity reached 100% or 63.3% using Bayesian network or logistic regression, respectively, and specificity reached 89.6% or 67.2%, respectively. The area under the receiver-operating-characteristic curve obtained with the Bayesian network was significantly higher than that obtained by stepwise regression (0.93±0.02 vs. 0.70±0.03; P<0.001). In hypertension centres, Bayesian network efficiently detected patients with an elevated ARR. An external validation study is required before use in primary clinical settings. Copyright © 2015 Elsevier Masson SAS. All rights reserved.

  17. Comparing models for quantitative risk assessment: an application to the European Registry of foreign body injuries in children.

    PubMed

    Berchialla, Paola; Scarinzi, Cecilia; Snidero, Silvia; Gregori, Dario

    2016-08-01

    Risk Assessment is the systematic study of decisions subject to uncertain consequences. An increasing interest has been focused on modeling techniques like Bayesian Networks because of their capability of (1) combining in the probabilistic framework different types of evidence, including both expert judgments and objective data; (2) overturning previous beliefs in the light of the new information being received; and (3) making predictions even with incomplete data. In this work, we proposed a comparison among Bayesian Networks and other classical Quantitative Risk Assessment techniques such as Neural Networks, Classification Trees, Random Forests and Logistic Regression models. Hybrid approaches, combining both Classification Trees and Bayesian Networks, were also considered. Among Bayesian Networks, a clear distinction between purely data-driven approach and combination of expert knowledge with objective data is made. The aim of this paper is to evaluate which of these models can best be applied, in the framework of Quantitative Risk Assessment, to assess the safety of children who are exposed to the risk of inhalation/insertion/aspiration of consumer products. The issue of preventing injuries in children is of paramount importance, in particular where product design is involved: quantifying the risk associated with product characteristics can be of great usefulness in addressing the product safety design regulation. Data from the European Registry of Foreign Bodies Injuries formed the starting evidence for risk assessment. Results showed that Bayesian Networks appeared to have both the ease of interpretability and accuracy in making predictions, even if simpler models like logistic regression still performed well. © The Author(s) 2013.

  18. Combined target factor analysis and Bayesian soft-classification of interference-contaminated samples: forensic fire debris analysis.

    PubMed

    Williams, Mary R; Sigman, Michael E; Lewis, Jennifer; Pitan, Kelly McHugh

    2012-10-10

    A Bayesian soft classification method combined with target factor analysis (TFA) is described and tested for the analysis of fire debris data. The method relies on analysis of the average mass spectrum across the chromatographic profile (i.e., the total ion spectrum, TIS) from multiple samples taken from a single fire scene. A library of TIS from reference ignitable liquids with assigned ASTM classification is used as the target factors in TFA. The class-conditional distributions of correlations between the target and predicted factors for each ASTM class are represented by kernel functions and analyzed by Bayesian decision theory. The soft classification approach assists in assessing the probability that ignitable liquid residue from a specific ASTM E1618 class is present in a set of samples from a single fire scene, even in the presence of unspecified background contributions from pyrolysis products. The method is demonstrated with sample data sets and then tested on laboratory-scale burn data and large-scale field test burns. The overall performance achieved in laboratory and field tests of the method is approximately 80% correct classification of fire debris samples. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  19. BAYESIAN LARGE-SCALE MULTIPLE REGRESSION WITH SUMMARY STATISTICS FROM GENOME-WIDE ASSOCIATION STUDIES

    PubMed Central

    Zhu, Xiang; Stephens, Matthew

    2017-01-01

    Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss. PMID:29399241
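
    As a rough guide to what "relates the multiple regression coefficients to univariate regression results" means, the RSS likelihood is commonly written in the form below. The symbols are not given in this record and are assumptions here: β-hat is the vector of univariate effect estimates, S a diagonal matrix of their standard errors, R the correlation matrix among SNPs, and β the vector of multiple regression coefficients.

```latex
\[
  \hat{\beta} \mid S, R, \beta \;\sim\; \mathcal{N}\!\left( S R S^{-1} \beta,\; S R S \right)
\]
```

    Read this way, only summary-level quantities (β-hat, S and an estimate of R) enter the likelihood, which is why individual genotypes and phenotypes are not needed.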

  20. Uncovering robust patterns of microRNA co-expression across cancers using Bayesian Relevance Networks

    PubMed Central

    2017-01-01

    Co-expression networks have long been used as a tool for investigating the molecular circuitry governing biological systems. However, most algorithms for constructing co-expression networks were developed in the microarray era, before high-throughput sequencing—with its unique statistical properties—became the norm for expression measurement. Here we develop Bayesian Relevance Networks, an algorithm that uses Bayesian reasoning about expression levels to account for the differing levels of uncertainty in expression measurements between highly- and lowly-expressed entities, and between samples with different sequencing depths. It combines data from groups of samples (e.g., replicates) to estimate group expression levels and confidence ranges. It then computes uncertainty-moderated estimates of cross-group correlations between entities, and uses permutation testing to assess their statistical significance. Using large scale miRNA data from The Cancer Genome Atlas, we show that our Bayesian update of the classical Relevance Networks algorithm provides improved reproducibility in co-expression estimates and lower false discovery rates in the resulting co-expression networks. Software is available at www.perkinslab.ca. PMID:28817636

  1. Uncovering robust patterns of microRNA co-expression across cancers using Bayesian Relevance Networks.

    PubMed

    Ramachandran, Parameswaran; Sánchez-Taltavull, Daniel; Perkins, Theodore J

    2017-01-01

    Co-expression networks have long been used as a tool for investigating the molecular circuitry governing biological systems. However, most algorithms for constructing co-expression networks were developed in the microarray era, before high-throughput sequencing, with its unique statistical properties, became the norm for expression measurement. Here we develop Bayesian Relevance Networks, an algorithm that uses Bayesian reasoning about expression levels to account for the differing levels of uncertainty in expression measurements between highly- and lowly-expressed entities, and between samples with different sequencing depths. It combines data from groups of samples (e.g., replicates) to estimate group expression levels and confidence ranges. It then computes uncertainty-moderated estimates of cross-group correlations between entities, and uses permutation testing to assess their statistical significance. Using large scale miRNA data from The Cancer Genome Atlas, we show that our Bayesian update of the classical Relevance Networks algorithm provides improved reproducibility in co-expression estimates and lower false discovery rates in the resulting co-expression networks. Software is available at www.perkinslab.ca.

  2. Comparing vector-based and Bayesian memory models using large-scale datasets: User-generated hashtag and tag prediction on Twitter and Stack Overflow.

    PubMed

    Stanley, Clayton; Byrne, Michael D

    2016-12-01

    The growth of social media and user-created content on online sites provides unique opportunities to study models of human declarative memory. By framing the task of choosing a hashtag for a tweet and tagging a post on Stack Overflow as a declarative memory retrieval problem, 2 cognitively plausible declarative memory models were applied to millions of posts and tweets and evaluated on how accurately they predict a user's chosen tags. An ACT-R based Bayesian model and a random permutation vector-based model were tested on the large data sets. The results show that past user behavior of tag use is a strong predictor of future behavior. Furthermore, past behavior was successfully incorporated into the random permutation model that previously used only context. Also, ACT-R's attentional weight term was linked to an entropy-weighting natural language processing method used to attenuate high-frequency words (e.g., articles and prepositions). Word order was not found to be a strong predictor of tag use, and the random permutation model performed comparably to the Bayesian model without including word order. This shows that the strength of the random permutation model is not in the ability to represent word order, but rather in the way in which context information is successfully compressed. The results of the large-scale exploration show how the architecture of the 2 memory models can be modified to significantly improve accuracy, and may suggest task-independent general modifications that can help improve model fit to human data in a much wider range of domains. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  3. Extracting Prior Distributions from a Large Dataset of In-Situ Measurements to Support SWOT-based Estimation of River Discharge

    NASA Astrophysics Data System (ADS)

    Hagemann, M.; Gleason, C. J.

    2017-12-01

    The upcoming (2021) Surface Water and Ocean Topography (SWOT) NASA satellite mission aims, in part, to estimate discharge on major rivers worldwide using reach-scale measurements of stream width, slope, and height. Current formalizations of channel and floodplain hydraulics are insufficient to fully constrain this problem mathematically, resulting in an infinitely large solution set for any set of satellite observations. Recent work has reformulated this problem in a Bayesian statistical setting, in which the likelihood distributions derive directly from hydraulic flow-law equations. When coupled with prior distributions on unknown flow-law parameters, this formulation probabilistically constrains the parameter space, and results in a computationally tractable description of discharge. Using a curated dataset of over 200,000 in-situ acoustic Doppler current profiler (ADCP) discharge measurements from over 10,000 USGS gaging stations throughout the United States, we developed empirical prior distributions for flow-law parameters that are not observable by SWOT, but that are required in order to estimate discharge. This analysis quantified prior uncertainties on quantities including cross-sectional area, at-a-station hydraulic geometry width exponent, and discharge variability, that are dependent on SWOT-observable variables including reach-scale statistics of width and height. When compared against discharge estimation approaches that do not use this prior information, the Bayesian approach using ADCP-derived priors demonstrated consistently improved performance across a range of performance metrics. This Bayesian approach formally transfers information from in-situ gaging stations to remote-sensed estimation of discharge, in which the desired quantities are not directly observable. Further investigation using large in-situ datasets is therefore a promising way forward in improving satellite-based estimates of river discharge.

  4. Statistical Surrogate Modeling of Atmospheric Dispersion Events Using Bayesian Adaptive Splines

    NASA Astrophysics Data System (ADS)

    Francom, D.; Sansó, B.; Bulaevskaya, V.; Lucas, D. D.

    2016-12-01

    Uncertainty in the inputs of complex computer models, including atmospheric dispersion and transport codes, is often assessed via statistical surrogate models. Surrogate models are computationally efficient statistical approximations of expensive computer models that enable uncertainty analysis. We introduce Bayesian adaptive spline methods for producing surrogate models that capture the major spatiotemporal patterns of the parent model, while satisfying all the necessities of flexibility, accuracy and computational feasibility. We present novel methodological and computational approaches motivated by a controlled atmospheric tracer release experiment conducted at the Diablo Canyon nuclear power plant in California. Traditional methods for building statistical surrogate models often do not scale well to experiments with large amounts of data. Our approach is well suited to experiments involving large numbers of model inputs, large numbers of simulations, and functional output for each simulation. Our approach allows us to perform global sensitivity analysis with ease. We also present an approach to calibration of simulators using field data.

  5. An adaptive response surface method for crashworthiness optimization

    NASA Astrophysics Data System (ADS)

    Shi, Lei; Yang, Ren-Jye; Zhu, Ping

    2013-11-01

    Response surface-based design optimization has been commonly used for optimizing large-scale design problems in the automotive industry. However, most response surface models are built by a limited number of design points without considering data uncertainty. In addition, the selection of a response surface in the literature is often arbitrary. This article uses a Bayesian metric to systematically select the best available response surface among several candidates in a library while considering data uncertainty. An adaptive, efficient response surface strategy, which minimizes the number of computationally intensive simulations, was developed for design optimization of large-scale complex problems. This methodology was demonstrated by a crashworthiness optimization example.

  6. Multi-Scale Validation of a Nanodiamond Drug Delivery System and Multi-Scale Engineering Education

    ERIC Educational Resources Information Center

    Schwalbe, Michelle Kristin

    2010-01-01

    This dissertation has two primary concerns: (i) evaluating the uncertainty and prediction capabilities of a nanodiamond drug delivery model using Bayesian calibration and bias correction, and (ii) determining conceptual difficulties of multi-scale analysis from an engineering education perspective. A Bayesian uncertainty quantification scheme…

  7. The Spike-and-Slab Lasso Generalized Linear Models for Prediction and Associated Genes Detection.

    PubMed

    Tang, Zaixiang; Shen, Yueping; Zhang, Xinyan; Yi, Nengjun

    2017-01-01

    Large-scale "omics" data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, there are considerable challenges in analyzing high-dimensional molecular data, including the large number of potential molecular predictors, limited number of samples, and small effect of each predictor. We propose new Bayesian hierarchical generalized linear models, called spike-and-slab lasso GLMs, for prognostic prediction and detection of associated genes using large-scale molecular data. The proposed model employs a spike-and-slab mixture double-exponential prior for coefficients that can induce weak shrinkage on large coefficients, and strong shrinkage on irrelevant coefficients. We have developed a fast and stable algorithm to fit large-scale hierarchical GLMs by incorporating expectation-maximization (EM) steps into the fast cyclic coordinate descent algorithm. The proposed approach integrates nice features of two popular methods, i.e., penalized lasso and Bayesian spike-and-slab variable selection. The performance of the proposed method is assessed via extensive simulation studies. The results show that the proposed approach can provide not only more accurate estimates of the parameters, but also better prediction. We demonstrate the proposed procedure on two cancer data sets: a well-known breast cancer data set consisting of 295 tumors, and expression data of 4919 genes; and the ovarian cancer data set from TCGA with 362 tumors, and expression data of 5336 genes. Our analyses show that the proposed procedure can generate powerful models for predicting outcomes and detecting associated genes. The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/). Copyright © 2017 by the Genetics Society of America.
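
    The adaptive shrinkage described above (weak on large coefficients, strong on irrelevant ones) can be illustrated with a minimal sketch of a two-component double-exponential mixture. This is not the BhGLM implementation; the function names and the parameters s0 (spike scale), s1 (slab scale) and theta (prior slab probability) are hypothetical choices made for illustration.

```python
import math


def de_density(beta, scale):
    """Double-exponential (Laplace) density with mean 0 and the given scale."""
    return math.exp(-abs(beta) / scale) / (2.0 * scale)


def inclusion_probability(beta, s0, s1, theta):
    """Probability that beta came from the wide slab (scale s1) rather than
    the narrow spike (scale s0), given prior slab probability theta."""
    slab = theta * de_density(beta, s1)
    spike = (1.0 - theta) * de_density(beta, s0)
    return slab / (slab + spike)


def effective_penalty(beta, s0, s1, theta):
    """Expected inverse scale, which acts as a coefficient-specific lasso
    penalty: close to 1/s1 for large |beta|, close to 1/s0 near zero."""
    p = inclusion_probability(beta, s0, s1, theta)
    return (1.0 - p) / s0 + p / s1


if __name__ == "__main__":
    # Illustrative values only: a large and a near-zero coefficient.
    for b in (2.0, 0.01):
        print(b, effective_penalty(b, s0=0.05, s1=1.0, theta=0.5))
```

    In an EM-style fit, a quantity like `effective_penalty` would be recomputed for each coefficient in the E-step and then used as its penalty in a coordinate-descent M-step. For very large |beta| combined with a very small s0, both densities can underflow, so a production version would work on the log scale.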

  8. Modeling distributional changes in winter precipitation of Canada using Bayesian spatiotemporal quantile regression subjected to different teleconnections

    NASA Astrophysics Data System (ADS)

    Tan, Xuezhi; Gan, Thian Yew; Chen, Shu; Liu, Bingjun

    2018-05-01

    Climate change and large-scale climate patterns may result in changes in probability distributions of climate variables that are associated with changes in the mean and variability, and severity of extreme climate events. In this paper, we applied a flexible framework based on the Bayesian spatiotemporal quantile (BSTQR) model to identify climate changes at different quantile levels and their teleconnections to large-scale climate patterns such as El Niño-Southern Oscillation (ENSO), Pacific Decadal Oscillation (PDO), North Atlantic Oscillation (NAO) and Pacific-North American (PNA). Using the BSTQR model with time (year) as a covariate, we estimated changes in Canadian winter precipitation and their uncertainties at different quantile levels. There were some stations in eastern Canada showing distributional changes in winter precipitation such as an increase in low quantiles but a decrease in high quantiles. Because quantile functions in the BSTQR model vary with space and time and assimilate spatiotemporal precipitation data, the BSTQR model produced much spatially smoother and less uncertain quantile changes than the classic regression without considering spatiotemporal correlations. Using the BSTQR model with five teleconnection indices (i.e., SOI, PDO, PNA, NP and NAO) as covariates, we investigated effects of large-scale climate patterns on Canadian winter precipitation at different quantile levels. Winter precipitation responses to these five teleconnections were found to occur differently at different quantile levels. Effects of five teleconnections on Canadian winter precipitation were stronger at low and high than at medium quantile levels.

  9. Bayesian inference for the spatio-temporal invasion of alien species.

    PubMed

    Cook, Alex; Marion, Glenn; Butler, Adam; Gibson, Gavin

    2007-08-01

    In this paper we develop a Bayesian approach to parameter estimation in a stochastic spatio-temporal model of the spread of invasive species across a landscape. To date, statistical techniques, such as logistic and autologistic regression, have outstripped stochastic spatio-temporal models in their ability to handle large numbers of covariates. Here we seek to address this problem by making use of a range of covariates describing the bio-geographical features of the landscape. Relative to regression techniques, stochastic spatio-temporal models are more transparent in their representation of biological processes. They also explicitly model temporal change, and therefore do not require the assumption that the species' distribution (or other spatial pattern) has already reached equilibrium as is often the case with standard statistical approaches. In order to illustrate the use of such techniques we apply them to the analysis of data detailing the spread of an invasive plant, Heracleum mantegazzianum, across Britain in the 20th Century using geo-referenced covariate information describing local temperature, elevation and habitat type. The use of Markov chain Monte Carlo sampling within a Bayesian framework facilitates statistical assessments of differences in the suitability of different habitat classes for H. mantegazzianum, and enables predictions of future spread to account for parametric uncertainty and system variability. Our results show that ignoring such covariate information may lead to biased estimates of key processes and implausible predictions of future distributions.

  10. Bayesian averaging over Decision Tree models for trauma severity scoring.

    PubMed

    Schetinin, V; Jakaite, L; Krzanowski, W

    2018-01-01

    Health care practitioners analyse possible risks of misleading decisions and need to estimate and quantify uncertainty in predictions. We have examined the "gold" standard of screening a patient's conditions for predicting survival probability, based on logistic regression modelling, which is used in trauma care for clinical purposes and quality audit. This methodology is based on theoretical assumptions about data and uncertainties. Models induced within such an approach have exposed a number of problems, providing unexplained fluctuation of predicted survival and low accuracy of estimating uncertainty intervals within which predictions are made. Bayesian method, which in theory is capable of providing accurate predictions and uncertainty estimates, has been adopted in our study using Decision Tree models. Our approach has been tested on a large set of patients registered in the US National Trauma Data Bank and has outperformed the standard method in terms of prediction accuracy, thereby providing practitioners with accurate estimates of the predictive posterior densities of interest that are required for making risk-aware decisions. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Analysis of Feature Intervisibility and Cumulative Visibility Using GIS, Bayesian and Spatial Statistics: A Study from the Mandara Mountains, Northern Cameroon

    PubMed Central

    Wright, David K.; MacEachern, Scott; Lee, Jaeyong

    2014-01-01

    The locations of diy-geδ-bay (DGB) sites in the Mandara Mountains, northern Cameroon are hypothesized to occur as a function of their ability to see and be seen from points on the surrounding landscape. A series of geostatistical, two-way and Bayesian logistic regression analyses were performed to test two hypotheses related to the intervisibility of the sites to one another and their visual prominence on the landscape. We determine that the intervisibility of the sites to one another is highly statistically significant when compared to 10 stratified-random permutations of DGB sites. Bayesian logistic regression additionally demonstrates that the visibility of the sites to points on the surrounding landscape is statistically significant. The location of sites appears to have also been selected on the basis of lower slope than random permutations of sites. Using statistical measures, many of which are not commonly employed in archaeological research, to evaluate aspects of visibility on the landscape, we conclude that the placement of DGB sites improved their conspicuousness for enhanced ritual, social cooperation and/or competition purposes. PMID:25383883

  12. Microworld Simulations: A New Dimension in Training Army Logistics Management Skills

    DTIC Science & Technology

    2004-01-01

    Providing effective training to Army personnel is always challenging, but the Army faces some new challenges in training its logistics staff managers in...soldiers are stationed and where materiel and services are readily available. The design and management of the Army’s Combat Service Support (CSS) large...scale logistics systems are increasingly important. The skills that are required to manage these systems are difficult to train. Large deployments

  13. ScreenBEAM: a novel meta-analysis algorithm for functional genomics screens via Bayesian hierarchical modeling | Office of Cancer Genomics

    Cancer.gov

    Functional genomics (FG) screens, using RNAi or CRISPR technology, have become a standard tool for systematic, genome-wide loss-of-function studies for therapeutic target discovery. As in many large-scale assays, however, off-target effects, variable reagents' potency and experimental noise must be accounted for to appropriately control for false positives.

  14. Comparing Future Teachers' Beliefs across Countries: Approximate Measurement Invariance with Bayesian Elastic Constraints for Local Item Dependence and Differential Item Functioning

    ERIC Educational Resources Information Center

    Braeken, Johan; Blömeke, Sigrid

    2016-01-01

    Using data from the international Teacher Education and Development Study: Learning to Teach Mathematics (TEDS-M), the measurement equivalence of teachers' beliefs across countries is investigated for the case of "mathematics-as-a fixed-ability". Measurement equivalence is a crucial topic in all international large-scale assessments and…

  15. Nonparametric Bayesian inference of the microcanonical stochastic block model

    NASA Astrophysics Data System (ADS)

    Peixoto, Tiago P.

    2017-01-01

    A principled approach to characterize the hidden modular structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.
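
    The generative step described above (nodes divided into groups, edges placed conditionally on group membership) can be sketched in a few lines. This is a minimal illustration of the ordinary canonical (Bernoulli) SBM, not the microcanonical variant or the inference algorithm of the paper; the function name `sample_sbm` and the parameters `group_of` and `edge_prob` are assumptions made for the example.

```python
import random

def sample_sbm(group_of, edge_prob, seed=0):
    """Sample an undirected simple graph from a canonical (Bernoulli) SBM.

    group_of[i]     -- group label of node i (hypothetical input format)
    edge_prob[r][s] -- probability of an edge between groups r and s
    Returns a list of edges (i, j) with i < j.
    """
    rng = random.Random(seed)
    n = len(group_of)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # The edge probability depends only on the two group labels.
            p = edge_prob[group_of[i]][group_of[j]]
            if rng.random() < p:
                edges.append((i, j))
    return edges

# Two groups of five nodes, dense inside groups and sparse between them.
groups = [0] * 5 + [1] * 5
probs = [[0.9, 0.05], [0.05, 0.9]]
edges = sample_sbm(groups, probs)
within = sum(1 for i, j in edges if groups[i] == groups[j])
print("edges:", len(edges), "within-group:", within)
```

    Inference runs in the opposite direction: given only the edges, recover plausible group labels (and, in the nonparametric setting, the number of groups).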

  16. Immigrant maternal depression and social networks. A multilevel Bayesian spatial logistic regression in South Western Sydney, Australia.

    PubMed

    Eastwood, John G; Jalaludin, Bin B; Kemp, Lynn A; Phung, Hai N; Barnett, Bryanne E W

    2013-09-01

    The purpose is to explore the multilevel spatial distribution of depressive symptoms among migrant mothers in South Western Sydney and to identify any group level associations that could inform subsequent theory building and local public health interventions. Migrant mothers (n=7256) delivering in 2002 and 2003 were assessed at 2-3 weeks after delivery for risk factors for depressive symptoms. The binary outcome variables were Edinburgh Postnatal Depression Scale scores (EPDS) of >9 and >12. Individual level variables included were: financial income, self-reported maternal health, social support network, emotional support, practical support, baby trouble sleeping, baby demanding and baby not content. The group level variable reported here is aggregated social support networks. We used Bayesian hierarchical multilevel spatial modelling with conditional autoregression. Migrant mothers were at higher risk of having depressive symptoms if they lived in a community with predominantly Australian-born mothers and strong social capital as measured by aggregated social networks. These findings suggest that migrant mothers are socially isolated and current home visiting services should be strengthened for migrant mothers living in communities where they may have poor social networks. Copyright © 2013 The Authors. Published by Elsevier Ltd.. All rights reserved.

  17. Final Report, DOE Early Career Award: Predictive modeling of complex physical systems: new tools for statistical inference, uncertainty quantification, and experimental design

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marzouk, Youssef

    Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computational expense. This project intends to make rigorous predictive modeling feasible in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale sequential data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.

  18. Using Bayesian Adaptive Trial Designs for Comparative Effectiveness Research: A Virtual Trial Execution.

    PubMed

    Luce, Bryan R; Connor, Jason T; Broglio, Kristine R; Mullins, C Daniel; Ishak, K Jack; Saunders, Elijah; Davis, Barry R

    2016-09-20

    Bayesian and adaptive clinical trial designs offer the potential for more efficient processes that result in lower sample sizes and shorter trial durations than traditional designs. To explore the use and potential benefits of Bayesian adaptive clinical trial designs in comparative effectiveness research. Virtual execution of ALLHAT (Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial) as if it had been done according to a Bayesian adaptive trial design. Comparative effectiveness trial of antihypertensive medications. Patient data sampled from the more than 42 000 patients enrolled in ALLHAT with publicly available data. Number of patients randomly assigned between groups, trial duration, observed numbers of events, and overall trial results and conclusions. The Bayesian adaptive approach and original design yielded similar overall trial conclusions. The Bayesian adaptive trial randomly assigned more patients to the better-performing group and would probably have ended slightly earlier. This virtual trial execution required limited resampling of ALLHAT patients for inclusion in RE-ADAPT (REsearch in ADAptive methods for Pragmatic Trials). Involvement of a data monitoring committee and other trial logistics were not considered. In a comparative effectiveness research trial, Bayesian adaptive trial designs are a feasible approach and potentially generate earlier results and allocate more patients to better-performing groups. National Heart, Lung, and Blood Institute.
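
    One common way a Bayesian adaptive design steers patients toward the better-performing group is response-adaptive randomization. The sketch below uses Beta-Bernoulli Thompson sampling as a generic stand-in; it is not the RE-ADAPT design, whose allocation rules are not given in the abstract. The function name `adaptive_trial`, the binary favorable/unfavorable outcome, and the uniform Beta(1, 1) priors are all assumptions.

```python
import random

def adaptive_trial(true_rates, n_patients, seed=0):
    """Simulate response-adaptive randomization with Thompson sampling.

    true_rates[a] -- hypothetical probability of a favorable outcome on arm a
    Returns the number of patients allocated to each arm.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_patients):
        # Draw one sample from each arm's Beta posterior (Beta(1, 1) prior).
        draws = [rng.betavariate(1 + successes[a], 1 + failures[a])
                 for a in range(k)]
        # Assign the next patient to the arm with the largest draw.
        arm = max(range(k), key=lambda a: draws[a])
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

# Arm 1 has the higher (assumed) favorable-outcome rate.
print(adaptive_trial([0.30, 0.45], n_patients=500))
```

    As evidence accumulates, the posterior for the better arm concentrates at higher values, so that arm wins the draw more often and receives more patients.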

  19. Bayesian analysis of non-homogeneous Markov chains: application to mental health data.

    PubMed

    Sung, Minje; Soyer, Refik; Nhan, Nguyen

    2007-07-10

    In this paper we present a formal treatment of non-homogeneous Markov chains by introducing a hierarchical Bayesian framework. Our work is motivated by the analysis of correlated categorical data which arise in assessment of psychiatric treatment programs. In our development, we introduce a Markovian structure to describe the non-homogeneity of transition patterns. In doing so, we introduce a logistic regression set-up for Markov chains and incorporate covariates in our model. We present a Bayesian model using Markov chain Monte Carlo methods and develop inference procedures to address issues encountered in the analyses of data from psychiatric treatment programs. Our model and inference procedures are applied to real data from a psychiatric treatment study. Copyright 2006 John Wiley & Sons, Ltd.
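
    To make the idea of a logistic regression set-up for a non-homogeneous Markov chain concrete, the sketch below builds a time-varying transition matrix for a two-state chain from a covariate. The parameterization (one intercept and one slope per current state) and the names `alpha`, `beta` and `transition_matrix` are illustrative assumptions, not the model specified in the paper.

```python
import math

def logistic(z):
    """Inverse logit: maps a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def transition_matrix(alpha, beta, x_t):
    """Transition matrix at time t for a two-state chain.

    Row s holds P(next state = 0 or 1 | current state = s, covariate x_t),
    with logit P(next = 1 | s, x_t) = alpha[s] + beta[s] * x_t (assumed form).
    """
    rows = []
    for s in (0, 1):
        p1 = logistic(alpha[s] + beta[s] * x_t)
        rows.append([1.0 - p1, p1])
    return rows

# The matrix changes with the covariate, so the chain is non-homogeneous.
for x in (0.0, 1.0, 2.0):
    print(x, transition_matrix(alpha=[-1.0, 0.5], beta=[0.8, -0.3], x_t=x))
```

    In a Bayesian treatment, priors are placed on the coefficients and their posterior is explored with Markov chain Monte Carlo, as the abstract describes.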

  20. The pharmacokinetics of dexmedetomidine during long-term infusion in critically ill pediatric patients. A Bayesian approach with informative priors.

    PubMed

    Wiczling, Paweł; Bartkowska-Śniatkowska, Alicja; Szerkus, Oliwia; Siluk, Danuta; Rosada-Kurasińska, Jowita; Warzybok, Justyna; Borsuk, Agnieszka; Kaliszan, Roman; Grześkowiak, Edmund; Bienert, Agnieszka

    2016-06-01

    The purpose of this study was to assess the pharmacokinetics of dexmedetomidine in the ICU setting during prolonged infusion and to compare it with the existing literature data using Bayesian population modeling with literature-based informative priors. Thirty-eight patients were included in the analysis with concentration measurements obtained on two occasions: first from 0 to 24 h after infusion initiation and second from 0 to 8 h after infusion end. Data analysis was conducted using WinBUGS software. The prior information on dexmedetomidine pharmacokinetics was elicited from the literature study pooling results from a relatively large group of 95 children. A two-compartment PK model, with allometrically scaled parameters, maturation of clearance and a Student's t residual distribution on a log scale, was used to describe the data. The incorporation of time-dependent (different between two occasions) PK parameters improved the model. It was observed that the volume of distribution is 1.5-fold higher during the second occasion. There was also evidence of increased (1.3-fold) clearance for the second occasion with posterior probability equal to 62%. This work demonstrated the usefulness of Bayesian modeling with informative priors in analyzing pharmacokinetic data and comparing it with existing literature knowledge.

  1. A Novel Discrete Optimal Transport Method for Bayesian Inverse Problems

    NASA Astrophysics Data System (ADS)

    Bui-Thanh, T.; Myers, A.; Wang, K.; Thiery, A.

    2017-12-01

    We present the Augmented Ensemble Transform (AET) method for generating approximate samples from a high-dimensional posterior distribution as a solution to Bayesian inverse problems. Solving large-scale inverse problems is critical for some of the most relevant and impactful scientific endeavors of our time. Therefore, constructing novel methods for solving the Bayesian inverse problem in more computationally efficient ways can have a profound impact on the science community. This research derives the novel AET method for exploring a posterior by solving a sequence of linear programming problems, resulting in a series of transport maps which map prior samples to posterior samples, allowing for the computation of moments of the posterior. We show both theoretical and numerical results, indicating this method can offer superior computational efficiency when compared to other SMC methods. Most of this efficiency is derived from matrix scaling methods to solve the linear programming problem and derivative-free optimization for particle movement. We use this method to determine inter-well connectivity in a reservoir and the associated uncertainty related to certain parameters. The attached file shows the difference between the true parameter and the AET parameter in an example 3D reservoir problem. The error is within the Morozov discrepancy allowance with lower computational cost than other particle methods.

  2. Large-scale quarantine following biological terrorism in the United States: scientific examination, logistic and legal limits, and possible consequences.

    PubMed

    Barbera, J; Macintyre, A; Gostin, L; Inglesby, T; O'Toole, T; DeAtley, C; Tonat, K; Layton, M

    2001-12-05

    Concern for potential bioterrorist attacks causing mass casualties has increased recently. Particular attention has been paid to scenarios in which a biological agent capable of person-to-person transmission, such as smallpox, is intentionally released among civilians. Multiple public health interventions are possible to effect disease containment in this context. One disease control measure that has been regularly proposed in various settings is the imposition of large-scale or geographic quarantine on the potentially exposed population. Although large-scale quarantine has not been implemented in recent US history, it has been used on a small scale in biological hoaxes, and it has been invoked in federally sponsored bioterrorism exercises. This article reviews the scientific principles that are relevant to the likely effectiveness of quarantine, the logistic barriers to its implementation, legal issues that a large-scale quarantine raises, and possible adverse consequences that might result from quarantine action. Imposition of large-scale quarantine (compulsory sequestration of groups of possibly exposed persons or human confinement within certain geographic areas to prevent spread of contagious disease) should not be considered a primary public health strategy in most imaginable circumstances. In the majority of contexts, other less extreme public health actions are likely to be more effective and create fewer unintended adverse consequences than quarantine. Actions and areas for future research, policy development, and response planning efforts are provided.

  3. Deep Learning with Hierarchical Convolutional Factor Analysis

    PubMed Central

    Chen, Bo; Polatkan, Gungor; Sapiro, Guillermo; Blei, David; Dunson, David; Carin, Lawrence

    2013-01-01

    Unsupervised multi-layered (“deep”) models are considered for general data, with a particular focus on imagery. The model is represented using a hierarchical convolutional factor-analysis construction, with sparse factor loadings and scores. The computation of layer-dependent model parameters is implemented within a Bayesian setting, employing a Gibbs sampler and variational Bayesian (VB) analysis, that explicitly exploit the convolutional nature of the expansion. In order to address large-scale and streaming data, an online version of VB is also developed. The number of basis functions or dictionary elements at each layer is inferred from the data, based on a beta-Bernoulli implementation of the Indian buffet process. Example results are presented for several image-processing applications, with comparisons to related models in the literature. PMID:23787342

  4. A Bayesian Analysis of Scale-Invariant Processes

    DTIC Science & Technology

    2012-01-01

    Earth Grid (EASE-Grid). The NED raster elevation data of one arc-second resolution (30 m) over the continental US are derived from multiple satellites...instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send...empirical and ME distributions, yet ensuring computational efficiency. Instead of computing empirical histograms from large amount of data, only some

  5. A Bayesian Estimate of the CMB-Large-scale Structure Cross-correlation

    NASA Astrophysics Data System (ADS)

    Moura-Santos, E.; Carvalho, F. C.; Penna-Lima, M.; Novaes, C. P.; Wuensche, C. A.

    2016-08-01

    Evidence for late-time acceleration of the universe is provided by multiple probes, such as Type Ia supernovae, the cosmic microwave background (CMB), and large-scale structure (LSS). In this work, we focus on the integrated Sachs-Wolfe (ISW) effect, i.e., secondary CMB fluctuations generated by evolving gravitational potentials due to the transition between, e.g., the matter and dark energy (DE) dominated phases. Therefore, assuming a flat universe, DE properties can be inferred from ISW detections. We present a Bayesian approach to compute the CMB-LSS cross-correlation signal. The method is based on the estimate of the likelihood for measuring a combined set consisting of CMB temperature and galaxy contrast maps, provided that we have some information on the statistical properties of the fluctuations affecting these maps. The likelihood is estimated by a sampling algorithm, therefore avoiding the computationally demanding techniques of direct evaluation in either pixel or harmonic space. As local tracers of the matter distribution at large scales, we used the Two Micron All Sky Survey galaxy catalog and, for the CMB temperature fluctuations, the ninth-year data release of the Wilkinson Microwave Anisotropy Probe (WMAP9). The results show a dominance of cosmic variance over the weak recovered signal, due mainly to the shallowness of the catalog used, with systematics associated with the sampling algorithm playing a secondary role as sources of uncertainty. When combined with other complementary probes, the method presented in this paper is expected to be a useful tool for late-time acceleration studies in cosmology.

  6. A BAYESIAN SPATIAL AND TEMPORAL MODELING APPROACH TO MAPPING GEOGRAPHIC VARIATION IN MORTALITY RATES FOR SUBNATIONAL AREAS WITH R-INLA.

    PubMed

    Khana, Diba; Rossen, Lauren M; Hedegaard, Holly; Warner, Margaret

    2018-01-01

    Hierarchical Bayes models have been used in disease mapping to examine small scale geographic variation. State level geographic variation for less common causes of mortality outcomes has been reported; however, county level variation is rarely examined. Due to concerns about statistical reliability and confidentiality, county-level mortality rates based on fewer than 20 deaths are suppressed based on Division of Vital Statistics, National Center for Health Statistics (NCHS) statistical reliability criteria, precluding an examination of spatio-temporal variation in less common causes of mortality outcomes such as suicide rates (SRs) at the county level using direct estimates. Existing Bayesian spatio-temporal modeling strategies can be applied via Integrated Nested Laplace Approximation (INLA) in R to a large number of rare causes of mortality outcomes to enable examination of spatio-temporal variations on smaller geographic scales such as counties. This method allows examination of spatiotemporal variation across the entire U.S., even where the data are sparse. We used mortality data from 2005-2015 to explore spatiotemporal variation in SRs, as one particular application of the Bayesian spatio-temporal modeling strategy in R-INLA to predict year and county-specific SRs. Specifically, hierarchical Bayesian spatio-temporal models were implemented with spatially structured and unstructured random effects, correlated time effects, time varying confounders and space-time interaction terms in the software R-INLA, borrowing strength across both counties and years to produce smoothed county level SRs. Model-based estimates of SRs were mapped to explore geographic variation.

  7. Modeling Non-Gaussian Time Series with Nonparametric Bayesian Model.

    PubMed

    Xu, Zhiguang; MacEachern, Steven; Xu, Xinyi

    2015-02-01

    We present a class of Bayesian copula models whose major components are the marginal (limiting) distribution of a stationary time series and the internal dynamics of the series. We argue that these are the two features with which an analyst is typically most familiar, and hence that these are natural components with which to work. For the marginal distribution, we use a nonparametric Bayesian prior distribution along with a cdf-inverse cdf transformation to obtain large support. For the internal dynamics, we rely on the traditionally successful techniques of normal-theory time series. Coupling the two components gives us a family of (Gaussian) copula transformed autoregressive models. The models provide coherent adjustments of time scales and are compatible with many extensions, including changes in volatility of the series. We describe basic properties of the models, show their ability to recover non-Gaussian marginal distributions, and use a GARCH modification of the basic model to analyze stock index return series. The models are found to provide better fit and improved short-range and long-range predictions than Gaussian competitors. The models are extensible to a large variety of fields, including continuous time models, spatial models, models for multiple series, models driven by external covariate streams, and non-stationary models.

  8. Skilled delivery care service utilization in Ethiopia: analysis of rural-urban differentials based on national demographic and health survey (DHS) data.

    PubMed

    Fekadu, Melaku; Regassa, Nigatu

    2014-12-01

    Despite the slight progress made on Antenatal Care (ANC) utilization, skilled delivery care service utilization in Ethiopia is still far below any acceptable standard. Only 10% of women receive assistance from skilled birth attendants either at home or at health institutions, and as a result the country is recording a high maternal mortality ratio (MMR) of 676 per 100,000 live births (EDHS, 2011). Hence, this study aimed at identifying the rural-urban differentials in the predictors of skilled delivery care service utilization in Ethiopia. The study used the recent Ethiopian Demographic and Health Survey (EDHS 2011) data. Women who had at least one birth in the five years preceding the survey were included in this study. The data were analyzed using univariate (percentage), bivariate (chi-square) and multivariate (Bayesian logistic regression) methods. The results showed that of the total 6,641 women, only 15.6% received skilled delivery care services either at home or at a health institution. Rural women were at a greater disadvantage in receiving the service. Only 4.5% of women in rural areas received assistance from skilled birth attendants (SBAs) compared to 64.1% of their urban counterparts. Through Bayesian logistic regression analysis, place of residence, ANC utilization, women's education, age and birth order were identified as key predictors of service utilization. The findings highlight the need for coordinated effort from government and stakeholders to improve women's education, as well as strengthen community participation. Furthermore, the study recommended the need to scale up the quality of ANC and family planning services backed by improved and equitable access, availability and quality of skilled delivery care services.

  9. Multi-agent based control of large-scale complex systems employing distributed dynamic inference engine

    NASA Astrophysics Data System (ADS)

    Zhang, Daili

    Increasing societal demand for automation has led to considerable efforts to control large-scale complex systems, especially in the area of autonomous intelligent control methods. The control system of a large-scale complex system needs to satisfy four system level requirements: robustness, flexibility, reusability, and scalability. Corresponding to the four system level requirements, there arise four major challenges. First, it is difficult to get accurate and complete information. Second, the system may be physically highly distributed. Third, the system evolves very quickly. Fourth, emergent global behaviors of the system can be caused by small disturbances at the component level. The Multi-Agent Based Control (MABC) method as an implementation of distributed intelligent control has been the focus of research since the 1970s, in an effort to solve the above-mentioned problems in controlling large-scale complex systems. However, to the author's best knowledge, all MABC systems for large-scale complex systems with significant uncertainties are problem-specific and thus difficult to extend to other domains or larger systems. This situation is partly due to the control architecture of multiple agents being determined by agent to agent coupling and interaction mechanisms. Therefore, the research objective of this dissertation is to develop a comprehensive, generalized framework for the control system design of general large-scale complex systems with significant uncertainties, with the focus on distributed control architecture design and distributed inference engine design. A Hybrid Multi-Agent Based Control (HyMABC) architecture is proposed by combining hierarchical control architecture and module control architecture with logical replication rings. 
    First, it decomposes a complex system hierarchically; second, it combines the components in the same level as a module, and then designs common interfaces for all of the components in the same module; third, replications are made for critical agents and are organized into logical rings. This architecture maintains clear guidelines for complexity decomposition and also increases the robustness of the whole system. Multiple Sectioned Dynamic Bayesian Networks (MSDBNs), as a distributed dynamic probabilistic inference engine, can be embedded into the control architecture to handle uncertainties of general large-scale complex systems. MSDBNs decompose a large knowledge-based system into many agents. Each agent holds its partial perspective of a large problem domain by representing its knowledge as a Dynamic Bayesian Network (DBN). Each agent accesses local evidence from its corresponding local sensors and communicates with other agents through finite message passing. If the distributed agents can be organized into a tree structure, satisfying the running intersection property and d-sep set requirements, globally consistent inferences are achievable in a distributed way. By using different frequencies for local DBN agent belief updating and global system belief updating, it balances the communication cost with the global consistency of inferences. In this dissertation, a fully factorized Boyen-Koller (BK) approximation algorithm is used for local DBN agent belief updating, and the static Junction Forest Linkage Tree (JFLT) algorithm is used for global system belief updating. MSDBNs assume a static structure and a stable communication network for the whole system. However, for a real system, sub-Bayesian networks as nodes could be lost, and the communication network could be shut down due to partial damage in the system. 
    Therefore, on-line and automatic MSDBNs structure formation is necessary for making robust state estimations and increasing survivability of the whole system. A Distributed Spanning Tree Optimization (DSTO) algorithm, a Distributed D-Sep Set Satisfaction (DDSSS) algorithm, and a Distributed Running Intersection Satisfaction (DRIS) algorithm are proposed in this dissertation. Combining these three distributed algorithms and a Distributed Belief Propagation (DBP) algorithm in MSDBNs makes state estimations robust to partial damage in the whole system. Combining the distributed control architecture design and the distributed inference engine design leads to a process of control system design for a general large-scale complex system. As applications of the proposed methodology, the control system design of a simplified ship chilled water system and a notional ship chilled water system have been demonstrated step by step. Simulation results not only show that the proposed methodology gives a clear guideline for control system design for general large-scale complex systems with dynamic and uncertain environments, but also indicate that the combination of MSDBNs and HyMABC can provide excellent performance for controlling general large-scale complex systems.

  10. Model selection for logistic regression models

    NASA Astrophysics Data System (ADS)

    Duller, Christine

    2012-09-01

    Model selection for logistic regression models decides which of some given potential regressors have an effect and hence should be included in the final model. The second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions will be answered with classical as well as with Bayesian methods. The applications show some results of recent research projects in medicine and business administration.

  11. Bayesian Redshift Classification of Emission-line Galaxies with Photometric Equivalent Widths

    NASA Astrophysics Data System (ADS)

    Leung, Andrew S.; Acquaviva, Viviana; Gawiser, Eric; Ciardullo, Robin; Komatsu, Eiichiro; Malz, A. I.; Zeimann, Gregory R.; Bridge, Joanna S.; Drory, Niv; Feldmeier, John J.; Finkelstein, Steven L.; Gebhardt, Karl; Gronwall, Caryl; Hagen, Alex; Hill, Gary J.; Schneider, Donald P.

    2017-07-01

    We present a Bayesian approach to the redshift classification of emission-line galaxies when only a single emission line is detected spectroscopically. We consider the case of surveys for high-redshift Lyα-emitting galaxies (LAEs), which have traditionally been classified via an inferred rest-frame equivalent width (EW, W_Lyα) greater than 20 Å. Our Bayesian method relies on known prior probabilities in measured emission-line luminosity functions and EW distributions for the galaxy populations, and returns the probability that an object in question is an LAE given the characteristics observed. This approach will be directly relevant for the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX), which seeks to classify ~10^6 emission-line galaxies into LAEs and low-redshift [O II] emitters. For a simulated HETDEX catalog with realistic measurement noise, our Bayesian method recovers 86% of LAEs missed by the traditional W_Lyα > 20 Å cutoff over 2 < z < 3, outperforming the EW cut in both contamination and incompleteness. This is due to the method’s ability to trade off between the two types of binary classification error by adjusting the stringency of the probability requirement for classifying an observed object as an LAE. In our simulations of HETDEX, this method reduces the uncertainty in cosmological distance measurements by 14% with respect to the EW cut, equivalent to recovering 29% more cosmological information. Rather than using binary object labels, this method enables the use of classification probabilities in large-scale structure analyses. It can be applied to narrowband emission-line surveys as well as upcoming large spectroscopic surveys including Euclid and WFIRST.
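
    The core of such a two-class Bayesian classifier is Bayes' rule followed by a tunable probability threshold. The sketch below shows only that skeleton; in the paper the priors and likelihoods come from luminosity functions and EW distributions, whereas here they are plain numbers supplied by the caller. The function names and the example values are assumptions.

```python
def posterior_lae(prior_lae, like_lae, like_oii):
    """P(LAE | data) for a two-class problem (LAE vs. [O II] emitter).

    prior_lae -- prior probability that the object is an LAE
    like_lae  -- likelihood of the observed data if it is an LAE
    like_oii  -- likelihood of the observed data if it is an [O II] emitter
    """
    numerator = prior_lae * like_lae
    evidence = numerator + (1.0 - prior_lae) * like_oii
    if evidence == 0.0:
        raise ValueError("data have zero likelihood under both classes")
    return numerator / evidence

def classify(p_lae, threshold=0.5):
    """Label an object; a higher threshold is a stricter LAE requirement."""
    return "LAE" if p_lae >= threshold else "[O II]"

# Hypothetical numbers, for illustration only.
p = posterior_lae(prior_lae=0.2, like_lae=0.8, like_oii=0.1)
print(round(p, 3), classify(p), classify(p, threshold=0.9))
```

    Raising the threshold lowers contamination of the LAE sample at the cost of completeness, which is the trade-off between the two error types that the abstract refers to. Keeping `p_lae` itself, rather than the label, is what allows classification probabilities to be used downstream.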

  12. Large scale air pollution estimation method combining land use regression and chemical transport modeling in a geostatistical framework.

    PubMed

    Akita, Yasuyuki; Baldasano, Jose M; Beelen, Rob; Cirach, Marta; de Hoogh, Kees; Hoek, Gerard; Nieuwenhuijsen, Mark; Serre, Marc L; de Nazelle, Audrey

    2014-04-15

    In recognition that intraurban exposure gradients may be as large as between-city variations, recent air pollution epidemiologic studies have become increasingly interested in capturing within-city exposure gradients. In addition, because of the rapidly accumulating health data, recent studies also need to handle large study populations distributed over large geographic domains. Even though several modeling approaches have been introduced, a consistent modeling framework capturing within-city exposure variability and applicable to large geographic domains is still missing. To address these needs, we proposed a modeling framework based on the Bayesian Maximum Entropy method that integrates monitoring data and outputs from existing air quality models based on Land Use Regression (LUR) and Chemical Transport Models (CTM). The framework was applied to estimate the yearly average NO2 concentrations over the region of Catalunya in Spain. By jointly accounting for the global scale variability in the concentration from the output of CTM and the intraurban scale variability through LUR model output, the proposed framework outperformed more conventional approaches.

  13. Population Genetic Structure of the Tropical Two-Wing Flyingfish (Exocoetus volitans)

    PubMed Central

    Lewallen, Eric A.; Bohonak, Andrew J.; Bonin, Carolina A.; van Wijnen, Andre J.; Pitman, Robert L.; Lovejoy, Nathan R.

    2016-01-01

    Delineating populations of pantropical marine fish is a difficult process, due to widespread geographic ranges and complex life history traits in most species. Exocoetus volitans, a species of two-winged flyingfish, is a good model for understanding large-scale patterns of epipelagic fish population structure because it has a circumtropical geographic range and completes its entire life cycle in the epipelagic zone. Buoyant pelagic eggs should dictate high local dispersal capacity in this species, although a brief larval phase, small body size, and short lifespan may limit the dispersal of individuals over large spatial scales. Based on these biological features, we hypothesized that E. volitans would exhibit statistically and biologically significant population structure defined by recognized oceanographic barriers. We tested this hypothesis by analyzing cytochrome b mtDNA sequence data (1106 bps) from specimens collected in the Pacific, Atlantic and Indian oceans (n = 266). AMOVA, Bayesian, and coalescent analytical approaches were used to assess and interpret population-level genetic variability. A parsimony-based haplotype network did not reveal population subdivision among ocean basins, but AMOVA revealed limited, statistically significant population structure between the Pacific and Atlantic Oceans (ΦST = 0.035, p<0.001). A spatially-unbiased Bayesian approach identified two circumtropical population clusters north and south of the Equator (ΦST = 0.026, p<0.001), a previously unknown dispersal barrier for an epipelagic fish. Bayesian demographic modeling suggested the effective population size of this species increased by at least an order of magnitude ~150,000 years ago, to more than 1 billion individuals currently. Thus, high levels of genetic similarity observed in E. volitans can be explained by high rates of gene flow, a dramatic and recent population expansion, as well as extensive and consistent dispersal throughout the geographic range of the species. PMID:27736863

  14. Population Genetic Structure of the Tropical Two-Wing Flyingfish (Exocoetus volitans).

    PubMed

    Lewallen, Eric A; Bohonak, Andrew J; Bonin, Carolina A; van Wijnen, Andre J; Pitman, Robert L; Lovejoy, Nathan R

    2016-01-01

    Delineating populations of pantropical marine fish is a difficult process, due to widespread geographic ranges and complex life history traits in most species. Exocoetus volitans, a species of two-winged flyingfish, is a good model for understanding large-scale patterns of epipelagic fish population structure because it has a circumtropical geographic range and completes its entire life cycle in the epipelagic zone. Buoyant pelagic eggs should dictate high local dispersal capacity in this species, although a brief larval phase, small body size, and short lifespan may limit the dispersal of individuals over large spatial scales. Based on these biological features, we hypothesized that E. volitans would exhibit statistically and biologically significant population structure defined by recognized oceanographic barriers. We tested this hypothesis by analyzing cytochrome b mtDNA sequence data (1106 bps) from specimens collected in the Pacific, Atlantic and Indian oceans (n = 266). AMOVA, Bayesian, and coalescent analytical approaches were used to assess and interpret population-level genetic variability. A parsimony-based haplotype network did not reveal population subdivision among ocean basins, but AMOVA revealed limited, statistically significant population structure between the Pacific and Atlantic Oceans (ΦST = 0.035, p<0.001). A spatially-unbiased Bayesian approach identified two circumtropical population clusters north and south of the Equator (ΦST = 0.026, p<0.001), a previously unknown dispersal barrier for an epipelagic fish. Bayesian demographic modeling suggested the effective population size of this species increased by at least an order of magnitude ~150,000 years ago, to more than 1 billion individuals currently. Thus, high levels of genetic similarity observed in E. volitans can be explained by high rates of gene flow, a dramatic and recent population expansion, as well as extensive and consistent dispersal throughout the geographic range of the species.

  15. A probabilistic model framework for evaluating year-to-year variation in crop productivity

    NASA Astrophysics Data System (ADS)

    Yokozawa, M.; Iizumi, T.; Tao, F.

    2008-12-01

    Most models describing the relation between crop productivity and weather conditions have so far focused on mean changes in crop yield. For keeping a stable food supply against abnormal weather as well as climate change, evaluating the year-to-year variations in crop productivity is more essential than evaluating the mean changes. We here propose a new probabilistic modeling framework based on Bayesian inference and Monte Carlo simulation. As an example, we first introduce a model of paddy rice production in Japan. It is called PRYSBI (Process-based Regional rice Yield Simulator with Bayesian Inference; Iizumi et al., 2008). The model structure is the same as that of SIMRIW, which was developed and used widely in Japan. The model includes three sub-models describing phenological development, biomass accumulation and maturing of the rice crop. These processes are formulated to include the response of the rice plant to weather conditions. This model was originally developed to predict rice growth and yield at the plot paddy scale. We applied it to evaluate large-scale rice production while keeping the same model structure. Instead of fixing the parameters, we treated them as stochastic variables. In order to let the model match actual yields at the larger scale, model parameters were determined based on agricultural statistical data of each prefecture of Japan together with weather data averaged over the region. The posterior probability distribution functions (PDFs) of the parameters included in the model were obtained using Bayesian inference. A Markov Chain Monte Carlo (MCMC) algorithm was used to evaluate Bayes' theorem numerically. For evaluating the year-to-year changes in rice growth/yield under this framework, we first iterate simulations with sets of parameter values sampled from the estimated posterior PDF of each parameter and then take the ensemble mean weighted with the posterior PDFs. We will also present another example for maize productivity in China. The framework proposed here also provides information on uncertainties, possibilities and limitations for future improvements in crop models.
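
    The two steps named in the abstract above, sampling parameters from a posterior by MCMC and then averaging simulations over the sampled values, can be sketched in Python. This is a toy illustration only: it assumes a one-parameter linear model with Gaussian noise and a flat prior, not the PRYSBI or SIMRIW equations, and all data values and names are hypothetical.

```python
import math
import random


def log_posterior(theta, xs, ys, sigma):
    """Gaussian log-likelihood with a flat prior on theta (toy model y = theta * x)."""
    return -sum((y - theta * x) ** 2 for x, y in zip(xs, ys)) / (2.0 * sigma ** 2)


def metropolis(xs, ys, sigma, n_iter=5000, burn_in=1000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for the single parameter theta."""
    rng = random.Random(seed)
    theta = 1.0
    logp = log_posterior(theta, xs, ys, sigma)
    samples = []
    for i in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        logp_new = log_posterior(proposal, xs, ys, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            theta, logp = proposal, logp_new
        if i >= burn_in:
            samples.append(theta)
    return samples


# Hypothetical data: a weather index x and observed yields y.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
samples = metropolis(xs, ys, sigma=0.5)

# Ensemble prediction at a new input: run the model once per posterior draw and average.
x_new = 6.0
ensemble_mean = sum(t * x_new for t in samples) / len(samples)
print(ensemble_mean)
```

    Because the draws already follow the posterior, a plain average over them acts as a posterior-weighted ensemble mean.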

  16. Targeting trachoma control through risk mapping: the example of Southern Sudan.

    PubMed

    Clements, Archie C A; Kur, Lucia W; Gatpan, Gideon; Ngondi, Jeremiah M; Emerson, Paul M; Lado, Mounir; Sabasio, Anthony; Kolaczinski, Jan H

    2010-08-17

    Trachoma is a major cause of blindness in Southern Sudan. Its distribution has only been partially established and many communities in need of intervention have therefore not been identified or targeted. The present study aimed to develop a tool to improve targeting of survey and control activities. A national trachoma risk map was developed using Bayesian geostatistics models, incorporating trachoma prevalence data from 112 geo-referenced communities surveyed between 2001 and 2009. Logistic regression models were developed using active trachoma (trachomatous inflammation follicular and/or trachomatous inflammation intense) in 6345 children aged 1-9 years as the outcome, and incorporating fixed effects for age, long-term average rainfall (interpolated from weather station data) and land cover (i.e. vegetation type, derived from satellite remote sensing), as well as geostatistical random effects describing spatial clustering of trachoma. The model predicted the west of the country to be at no or low trachoma risk. Trachoma clusters in the central, northern and eastern areas had a radius of 8 km after accounting for the fixed effects. In Southern Sudan, large-scale spatial variation in the risk of active trachoma infection is associated with aridity. Spatial prediction has identified likely high-risk areas to be prioritized for more data collection, potentially to be followed by intervention.

  17. Targeting Trachoma Control through Risk Mapping: The Example of Southern Sudan

    PubMed Central

    Clements, Archie C. A.; Kur, Lucia W.; Gatpan, Gideon; Ngondi, Jeremiah M.; Emerson, Paul M.; Lado, Mounir; Sabasio, Anthony; Kolaczinski, Jan H.

    2010-01-01

    Background Trachoma is a major cause of blindness in Southern Sudan. Its distribution has only been partially established and many communities in need of intervention have therefore not been identified or targeted. The present study aimed to develop a tool to improve targeting of survey and control activities. Methods/Principal Findings A national trachoma risk map was developed using Bayesian geostatistics models, incorporating trachoma prevalence data from 112 geo-referenced communities surveyed between 2001 and 2009. Logistic regression models were developed using active trachoma (trachomatous inflammation follicular and/or trachomatous inflammation intense) in 6345 children aged 1–9 years as the outcome, and incorporating fixed effects for age, long-term average rainfall (interpolated from weather station data) and land cover (i.e. vegetation type, derived from satellite remote sensing), as well as geostatistical random effects describing spatial clustering of trachoma. The model predicted the west of the country to be at no or low trachoma risk. Trachoma clusters in the central, northern and eastern areas had a radius of 8 km after accounting for the fixed effects. Conclusion In Southern Sudan, large-scale spatial variation in the risk of active trachoma infection is associated with aridity. Spatial prediction has identified likely high-risk areas to be prioritized for more data collection, potentially to be followed by intervention. PMID:20808910

  18. A comment on priors for Bayesian occupancy models.

    PubMed

    Northrup, Joseph M; Gerber, Brian D

    2018-01-01

    Understanding patterns of species occurrence and the processes underlying these patterns is fundamental to the study of ecology. One of the more commonly used approaches to investigate species occurrence patterns is occupancy modeling, which can account for imperfect detection of a species during surveys. In recent years, there has been a proliferation of Bayesian modeling in ecology, which includes fitting Bayesian occupancy models. The Bayesian framework is appealing to ecologists for many reasons, including the ability to incorporate prior information through the specification of prior distributions on parameters. While ecologists almost exclusively intend to choose priors so that they are "uninformative" or "vague", such priors can easily be unintentionally highly informative. Here we report on how the specification of a "vague" normally distributed (i.e., Gaussian) prior on coefficients in Bayesian occupancy models can unintentionally influence parameter estimation. Using both simulated data and empirical examples, we illustrate how this issue likely compromises inference about species-habitat relationships. While the extent to which these informative priors influence inference depends on the data set, researchers fitting Bayesian occupancy models should conduct sensitivity analyses to ensure intended inference, or employ less commonly used priors that are less informative (e.g., logistic or t prior distributions). We provide suggestions for addressing this issue in occupancy studies, and an online tool for exploring this issue under different contexts.
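
    The point made above, that a "vague" normal prior on a logit-scale coefficient can be informative on the probability scale, can be checked with a short calculation. The Python sketch below assumes an intercept-only model with prior Normal(0, sd^2) on the logit scale; the function names and the 0.99 cutoff are illustrative choices, not part of the authors' online tool.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def extreme_prior_mass(prior_sd, cutoff=0.99):
    """Prior probability that expit(beta) is below 1 - cutoff or above cutoff,
    for beta ~ Normal(0, prior_sd**2) on the logit scale."""
    z = logit(cutoff) / prior_sd
    # P(|Z| > z) for a standard normal Z equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


for sd in (1.0, 2.0, 10.0, 100.0):
    print(sd, round(extreme_prior_mass(sd), 3))
```

    Under these assumptions, a larger prior standard deviation places more prior mass on occupancy probabilities very close to 0 or 1, which is why such a prior is not neutral.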

  19. Hypothesis testing on the fractal structure of behavioral sequences: the Bayesian assessment of scaling methodology.

    PubMed

    Moscoso del Prado Martín, Fermín

    2013-12-01

    I introduce the Bayesian assessment of scaling (BAS), a simple but powerful Bayesian hypothesis contrast methodology that can be used to test hypotheses on the scaling regime exhibited by a sequence of behavioral data. Rather than comparing parametric models, as typically done in previous approaches, the BAS offers a direct, nonparametric way to test whether a time series exhibits fractal scaling. The BAS provides a simpler and faster test than do previous methods, and the code for making the required computations is provided. The method also enables testing of finely specified hypotheses on the scaling indices, something that was not possible with the previously available methods. I then present 4 simulation studies showing that the BAS methodology outperforms the other methods used in the psychological literature. I conclude with a discussion of methodological issues on fractal analyses in experimental psychology. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  20. Self-optimized construction of transition rate matrices from accelerated atomistic simulations with Bayesian uncertainty quantification

    NASA Astrophysics Data System (ADS)

    Swinburne, Thomas D.; Perez, Danny

    2018-05-01

    A massively parallel method to build large transition rate matrices from temperature-accelerated molecular dynamics trajectories is presented. Bayesian Markov model analysis is used to estimate the expected residence time in the known state space, providing crucial uncertainty quantification for higher-scale simulation schemes such as kinetic Monte Carlo or cluster dynamics. The estimators are additionally used to optimize where exploration is performed and the degree of temperature acceleration on the fly, giving an autonomous, optimal procedure to explore the state space of complex systems. The method is tested against exactly solvable models and used to explore the dynamics of C15 interstitial defects in iron. Our uncertainty quantification scheme allows for accurate modeling of the evolution of these defects over timescales of several seconds.

  1. Trust from the past: Bayesian Personalized Ranking based Link Prediction in Knowledge Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Baichuan; Choudhury, Sutanay; Al-Hasan, Mohammad

    2016-02-01

    Estimating the confidence for a link is a critical task for Knowledge Graph construction. Link prediction, or predicting the likelihood of a link in a knowledge graph based on prior state, is a key research direction within this area. We propose a Latent Feature Embedding based link recommendation model for the prediction task and utilize a Bayesian Personalized Ranking based optimization technique for learning models for each predicate. Experimental results on large-scale knowledge bases such as YAGO2 show that our approach achieves substantially higher performance than several state-of-the-art approaches. Furthermore, we also study the performance of the link prediction algorithm in terms of topological properties of the Knowledge Graph and present a linear regression model to reason about its expected level of accuracy.

  2. Understanding Short-Term Nonmigrating Tidal Variability in the Ionospheric Dynamo Region from SABER Using Information Theory and Bayesian Statistics

    NASA Astrophysics Data System (ADS)

    Kumari, K.; Oberheide, J.

    2017-12-01

    Nonmigrating tidal diagnostics of SABER temperature observations in the ionospheric dynamo region reveal a large amount of variability on time-scales of a few days to weeks. In this paper, we discuss the physical reasons for the observed short-term tidal variability using a novel approach based on Information theory and Bayesian statistics. We diagnose short-term tidal variability as a function of season, QBO, ENSO, and solar cycle and other drivers using time dependent probability density functions, Shannon entropy and Kullback-Leibler divergence. The statistical significance of the approach and its predictive capability is exemplified using SABER tidal diagnostics with emphasis on the responses to the QBO and solar cycle. Implications for F-region plasma density will be discussed.
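
    The two information-theoretic quantities named above have standard definitions for discrete distributions, sketched here in Python. The binned distributions are hypothetical placeholders, not SABER diagnostics, and the function names are illustrative.

```python
import math


def shannon_entropy(p):
    """Entropy in bits of a discrete distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; requires q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical binned tidal-amplitude distributions for two conditions.
p_cond_a = [0.5, 0.5]
p_cond_b = [0.9, 0.1]
print(shannon_entropy(p_cond_a), kl_divergence(p_cond_a, p_cond_b))
```

    The divergence is zero when the two distributions match and grows as they separate, so it can serve as a measure of how strongly a driver shifts the distribution.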

  3. Textual and visual content-based anti-phishing: a Bayesian approach.

    PubMed

    Zhang, Haijun; Liu, Gang; Chow, Tommy W S; Liu, Wenyin

    2011-10-01

    A novel framework using a Bayesian approach for content-based phishing web page detection is presented. Our model takes into account textual and visual contents to measure the similarity between the protected web page and suspicious web pages. A text classifier, an image classifier, and an algorithm fusing the results from classifiers are introduced. An outstanding feature of this paper is the exploration of a Bayesian model to estimate the matching threshold. This is required in the classifier for determining the class of the web page and identifying whether the web page is phishing or not. In the text classifier, the naive Bayes rule is used to calculate the probability that a web page is phishing. In the image classifier, the earth mover's distance is employed to measure the visual similarity, and our Bayesian model is designed to determine the threshold. In the data fusion algorithm, the Bayes theory is used to synthesize the classification results from textual and visual content. The effectiveness of our proposed approach was examined in a large-scale dataset collected from real phishing cases. Experimental results demonstrated that the text classifier and the image classifier we designed deliver promising results, the fusion algorithm outperforms either of the individual classifiers, and our model can be adapted to different phishing cases. © 2011 IEEE
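
    The text-classifier step described above, using the naive Bayes rule to compute the probability that a page is phishing, can be illustrated with a small Python sketch. It assumes a multinomial naive Bayes model with add-one smoothing over whitespace tokens; the training snippets, labels, and function names are hypothetical and do not reproduce the paper's features or data.

```python
import math
from collections import Counter


def train_nb(docs):
    """Count words per class (1 = phishing, 0 = legitimate)."""
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter()
    for text, label in docs:
        counts[label].update(text.lower().split())
        n_docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, n_docs, vocab


def prob_phishing(text, model):
    """Posterior P(phishing | words) with add-one (Laplace) smoothing."""
    counts, n_docs, vocab = model
    total_docs = sum(n_docs.values())
    log_post = {}
    for label in (0, 1):
        lp = math.log(n_docs[label] / total_docs)
        denom = sum(counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                lp += math.log((counts[label][word] + 1) / denom)
        log_post[label] = lp
    m = max(log_post.values())
    e0, e1 = math.exp(log_post[0] - m), math.exp(log_post[1] - m)
    return e1 / (e0 + e1)


# Hypothetical training snippets.
docs = [
    ("verify your account password now", 1),
    ("urgent verify bank login", 1),
    ("meeting agenda for monday", 0),
    ("lunch menu and agenda", 0),
]
model = train_nb(docs)
print(prob_phishing("verify password", model), prob_phishing("meeting agenda", model))
```

    Comparing the returned probability with a matching threshold then gives the class decision.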

  4. Semiparametric time varying coefficient model for matched case-crossover studies.

    PubMed

    Ortega-Villa, Ana Maria; Kim, Inyoung; Kim, H

    2017-03-15

    In matched case-crossover studies, it is generally accepted that the covariates on which a case and associated controls are matched cannot exert a confounding effect on independent predictors included in the conditional logistic regression model. This is because any stratum effect is removed by the conditioning on the fixed number of sets of the case and controls in the stratum. Hence, the conditional logistic regression model is not able to detect any effects associated with the matching covariates by stratum. However, some matching covariates such as time often play an important role as an effect modification leading to incorrect statistical estimation and prediction. Therefore, we propose three approaches to evaluate effect modification by time. The first is a parametric approach, the second is a semiparametric penalized approach, and the third is a semiparametric Bayesian approach. Our parametric approach is a two-stage method, which uses conditional logistic regression in the first stage and then estimates polynomial regression in the second stage. Our semiparametric penalized and Bayesian approaches are one-stage approaches developed by using regression splines. Our semiparametric one-stage approach allows us to not only detect the parametric relationship between the predictor and binary outcomes, but also evaluate nonparametric relationships between the predictor and time. We demonstrate the advantage of our semiparametric one-stage approaches using both a simulation study and an epidemiological example of a 1-4 bi-directional case-crossover study of childhood aseptic meningitis with drinking water turbidity. We also provide statistical inference for the semiparametric Bayesian approach using Bayes Factors. Copyright © 2016 John Wiley & Sons, Ltd.

  5. Statistical analysis of modal parameters of a suspension bridge based on Bayesian spectral density approach and SHM data

    NASA Astrophysics Data System (ADS)

    Li, Zhijun; Feng, Maria Q.; Luo, Longxi; Feng, Dongming; Xu, Xiuli

    2018-01-01

    Uncertainty in modal parameter estimation appears to a significant extent in the structural health monitoring (SHM) practice of civil engineering, owing to environmental influences and modeling errors. Reasonable methodologies are needed for processing the uncertainty. Bayesian inference can provide a promising and feasible identification solution for the purpose of SHM. However, there is relatively little research on the application of the Bayesian spectral method to modal identification using SHM data sets. To extract modal parameters from large data sets collected by an SHM system, the Bayesian spectral density algorithm was applied to address the uncertainty of mode extraction from the output-only response of a long-span suspension bridge. The posterior most probable values of modal parameters and their uncertainties were estimated through Bayesian inference. A long-term variation and statistical analysis was performed using the sensor data sets collected from the SHM system of the suspension bridge over a one-year period. The t location-scale distribution was shown to be a better candidate function for frequencies of lower modes. On the other hand, the Burr distribution provided the best fit to the higher modes, which are sensitive to temperature. In addition, wind-induced variation of modal parameters was also investigated. It was observed that both the damping ratios and modal forces increased during the period of typhoon excitations. Meanwhile, the modal damping ratios exhibit significant correlation with the spectral intensities of the corresponding modal forces.

  6. Bayesian methods to determine performance differences and to quantify variability among centers in multi-center trials: the IHAST trial.

    PubMed

    Bayman, Emine O; Chaloner, Kathryn M; Hindman, Bradley J; Todd, Michael M

    2013-01-16

    To quantify the variability among centers, to identify centers whose performance is potentially outside of normal variability in the primary outcome, and to propose a guideline for declaring them outliers. Novel statistical methodology using a Bayesian hierarchical model is used. Bayesian methods for estimation and outlier detection are applied assuming an additive random center effect on the log odds of response: centers are similar but different (exchangeable). The Intraoperative Hypothermia for Aneurysm Surgery Trial (IHAST) is used as an example. Analyses were adjusted for treatment, age, gender, aneurysm location, World Federation of Neurological Surgeons scale, Fisher score and baseline NIH stroke scale scores. Adjustments for differences in center characteristics were also examined. Graphical and numerical summaries of the between-center standard deviation (sd) and variability, as well as the identification of potential outliers are implemented. In the IHAST, the center-to-center variation in the log odds of favorable outcome at each center is consistent with a normal distribution with posterior sd of 0.538 (95% credible interval: 0.397 to 0.726) after adjusting for the effects of important covariates. Outcome differences among centers show no outlying centers. Four potential outlying centers were identified but did not meet the proposed guideline for declaring them as outlying. Center characteristics (number of subjects enrolled from the center, geographical location, learning over time, nitrous oxide, and temporary clipping use) did not predict outcome, but subject and disease characteristics did. Bayesian hierarchical methods allow for determination of whether outcomes from a specific center differ from others and whether specific clinical practices predict outcome, even when some centers/subgroups have relatively small sample sizes. In the IHAST no outlying centers were found. The estimated variability between centers was moderately large.
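
    To give a feel for the reported between-center standard deviation of 0.538 on the log-odds scale, the short Python calculation below converts it to odds ratios. It assumes normally distributed center effects, as the abstract states, and uses only the posterior sd quoted there; the variable names are illustrative.

```python
import math

posterior_sd = 0.538  # between-center sd on the log-odds scale, from the abstract

# Odds ratio for a center one sd above the average center.
or_one_sd = math.exp(posterior_sd)

# Range of center-specific odds ratios covering about 95% of centers,
# assuming normally distributed center effects.
or_low = math.exp(-1.96 * posterior_sd)
or_high = math.exp(1.96 * posterior_sd)
print(round(or_one_sd, 2), round(or_low, 2), round(or_high, 2))
```

    This treats the posterior sd as fixed and ignores its credible interval, so it is a rough reading rather than a full posterior summary.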

  7. Informational and emotional elements in online support groups: a Bayesian approach to large-scale content analysis.

    PubMed

    Deetjen, Ulrike; Powell, John A

    2016-05-01

    This research examines the extent to which informational and emotional elements are employed in online support forums for 14 purposively sampled chronic medical conditions and the factors that influence whether posts are of a more informational or emotional nature. Large-scale qualitative data were obtained from Dailystrength.org. Based on a hand-coded training dataset, all posts were classified into informational or emotional using a Bayesian classification algorithm to generalize the findings. Posts that could not be classified with a probability of at least 75% were excluded. The overall tendency toward emotional posts differs by condition: mental health (depression, schizophrenia) and Alzheimer's disease consist of more emotional posts, while informational posts relate more to nonterminal physical conditions (irritable bowel syndrome, diabetes, asthma). There is no gender difference across conditions, although prostate cancer forums are oriented toward informational support, whereas breast cancer forums rather feature emotional support. Across diseases, the best predictors for emotional content are lower age and a higher number of overall posts by the support group member. The results are in line with previous empirical research and unify empirical findings from single/2-condition research. Limitations include the analytical restriction to predefined categories (informational, emotional) through the chosen machine-learning approach. Our findings provide an empirical foundation for building theory on informational versus emotional support across conditions, give insights for practitioners to better understand the role of online support groups for different patients, and show the usefulness of machine-learning approaches to analyze large-scale qualitative health data from online settings. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. 

  8. Impact of Federal drug law enforcement on the supply of heroin in Australia.

    PubMed

    Smithson, Michael; McFadden, Michael; Mwesigye, Sue-Ellen

    2005-08-01

    To conduct an empirical investigation of the efficacy of law enforcement in reducing heroin supply in Australia. Specifically, this paper addresses the question of whether heroin purity levels in the Australian Capital Territory (ACT) could be predicted by heroin seizures at the national level by the Australian Federal Police (AFP) in the preceding year. We considered two forms of evidence. First, a Bayesian Markov Chain Monte Carlo (MCMC) change-point model was used to discover (a) if there was a substantial increase in heroin seizures by the AFP, (b) when the increase began and (c) whether it occurred after increased funding to the Australian Federal Police for the purpose of drug law enforcement. Second, standard time-series methods were used to ascertain whether fluctuations in heroin seizure weights or the frequency of large-scale seizures after the aforementioned changes in seizure levels predicted fluctuations in heroin purity levels in the ACT after autocorrelation had been removed from the purity series. A Bayesian MCMC change-point model supported the hypothesis that heroin seizures rapidly increased about a year before the estimated decline in heroin purity and after the increased funding of AFP. The autoregression models suggested that 10-20% of the variance in the residuals of the heroin purity series was predicted by appropriately lagged residuals of the seizure-number and log-weight series, after autocorrelation had been removed. The overall results are consistent with the hypothesis that large-scale heroin seizures by the AFP reduce street-level heroin supply a year or so later, although the short-term dynamics suggest an 'opponent' response to residual fluctuations in seizures. To our knowledge, this is the first time a connection has been identified between large-scale heroin seizures and street-level supply.
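
    A Bayesian change-point analysis of count data can be illustrated without MCMC when the model is simple enough. The Python sketch below swaps in exact enumeration for the MCMC used in the study: it assumes Poisson counts with one rate before and another after the change point, Gamma(1, 1) priors on both rates, and a uniform prior on the change point. The counts and function names are hypothetical and are not the AFP seizure data or the authors' model.

```python
import math


def log_marginal(segment, a=1.0, b=1.0):
    """Log marginal likelihood of Poisson counts with a Gamma(a, rate=b) prior on the rate,
    dropping the factorial terms, which do not depend on the change point."""
    s, n = sum(segment), len(segment)
    return a * math.log(b) - math.lgamma(a) + math.lgamma(a + s) - (a + s) * math.log(b + n)


def changepoint_posterior(counts):
    """Posterior over k, where the first k observations share one rate and the rest another."""
    ks = list(range(1, len(counts)))
    logs = [log_marginal(counts[:k]) + log_marginal(counts[k:]) for k in ks]
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]
    total = sum(weights)
    return {k: w / total for k, w in zip(ks, weights)}


# Hypothetical seizure counts per period with an apparent jump after the sixth period.
counts = [2, 3, 1, 2, 3, 2, 9, 11, 8, 10, 12, 9]
posterior = changepoint_posterior(counts)
k_map = max(posterior, key=posterior.get)
print(k_map, round(posterior[k_map], 3))
```

    The posterior over k answers the "when did the increase begin" question directly, and its spread shows how sharply the data locate the change.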

  9. Robust Bayesian Analysis of Heavy-tailed Stochastic Volatility Models using Scale Mixtures of Normal Distributions

    PubMed Central

    Abanto-Valle, C. A.; Bandyopadhyay, D.; Lachos, V. H.; Enriquez, I.

    2009-01-01

    A Bayesian analysis of stochastic volatility (SV) models using the class of symmetric scale mixtures of normal (SMN) distributions is considered. In the face of non-normality, this provides an appealing robust alternative to the routine use of the normal distribution. Specific distributions examined include the normal, student-t, slash and the variance gamma distributions. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced for parameter estimation. Moreover, the mixing parameters obtained as a by-product of the scale mixture representation can be used to identify outliers. The methods developed are applied to analyze daily stock returns data on S&P500 index. Bayesian model selection criteria as well as out-of-sample forecasting results reveal that the SV models based on heavy-tailed SMN distributions provide significant improvement in model fit as well as prediction to the S&P500 index data over the usual normal model. PMID:20730043
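
    The scale-mixture representation mentioned above can be made concrete for the Student-t case: a normal variate whose precision is multiplied by a gamma-distributed mixing weight is Student-t distributed. The Python sketch below draws from that representation; it illustrates the general SMN construction only, not the paper's stochastic volatility model or its MCMC algorithm, and the names and settings are illustrative.

```python
import random


def sample_student_t(nu, n, seed=0):
    """Draw Student-t variates as a scale mixture of normals.

    lam ~ Gamma(shape=nu/2, rate=nu/2), then x | lam ~ Normal(0, 1/lam).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # random.gammavariate takes (shape, scale), so scale = 1/rate = 2/nu.
        lam = rng.gammavariate(nu / 2.0, 2.0 / nu)
        x = rng.gauss(0.0, 1.0) / lam ** 0.5
        draws.append((x, lam))
    return draws


draws = sample_student_t(nu=5.0, n=20000)
mean_lam = sum(lam for _, lam in draws) / len(draws)
tail_frac = sum(abs(x) > 3.0 for x, _ in draws) / len(draws)
# Small mixing weights lam mark observations the model treats as outliers.
smallest = sorted(draws, key=lambda d: d[1])[:3]
print(round(mean_lam, 2), round(tail_frac, 3))
```

    Draws with unusually small mixing weights are the ones with inflated variance, which is the sense in which the mixing parameters can flag outliers.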

  10. Structure-preserving model reduction of large-scale logistics networks. Applications for supply chains

    NASA Astrophysics Data System (ADS)

    Scholz-Reiter, B.; Wirth, F.; Dashkovskiy, S.; Makuschewitz, T.; Schönlein, M.; Kosmykov, M.

    2011-12-01

    We investigate the problem of model reduction with a view to large-scale logistics networks, specifically supply chains. Such networks are modeled by means of graphs, which describe the structure of material flow. An aim of the proposed model reduction procedure is to preserve important features within the network. As a new methodology we introduce the LogRank as a measure for the importance of locations, which is based on the structure of the flows within the network. We argue that these properties reflect relative importance of locations. Based on the LogRank we identify subgraphs of the network that can be neglected or aggregated. The effect of this is discussed for a few motifs. Using this approach we present a meta algorithm for structure-preserving model reduction that can be adapted to different mathematical modeling frameworks. The capabilities of the approach are demonstrated with a test case, where a logistics network is modeled as a Jackson network, i.e., a particular type of queueing network.

  11. Analysis of efficiency of waste reverse logistics for recycling.

    PubMed

    Veiga, Marcelo M

    2013-10-01

    Brazil is an agricultural country with the highest pesticide consumption in the world. Historically, pesticide packaging has not been disposed of properly. A federal law requires the chemical industry to provide proper waste management for pesticide-related products. A reverse logistics program was implemented, which has been hailed as a great success. This program was designed to target large rural communities, where economy of scale can take place. Over the last 10 years, the recovery rate has been very poor in most small rural communities. The objective of this study was to analyze the case of this compulsory reverse logistics program for pesticide packaging under the recent Brazilian Waste Management Policy, which enforces recycling as the main waste management solution. The results of this exploratory research indicate that despite its aggregate success, the reverse logistics program is not efficient for small rural communities. It is not possible to use the same logistic strategy for small and large communities. The results also indicate that recycling might not be the optimal solution, especially in developing countries with unsatisfactory recycling infrastructure and large transportation costs. Postponement and speculation strategies could be applied for improving reverse logistics performance. In most compulsory reverse logistics programs, there is no economical solution. Companies should comply with the law by ranking cost-effective alternatives.

  12. Ultra-Scalable Algorithms for Large-Scale Uncertainty Quantification in Inverse Wave Propagation

    DTIC Science & Technology

    2016-03-04

    53] N. Petra , J. Martin , G. Stadler, and O. Ghattas, A computational framework for infinite-dimensional Bayesian inverse problems: Part II...positions: Alen Alexanderian (NC State), Tan Bui-Thanh (UT-Austin), Carsten Burstedde (University of Bonn), Noemi Petra (UC Merced), Georg Stalder (NYU), Hari...Baltimore, MD, Nov. 2002. SC2002 Best Technical Paper Award. [3] A. Alexanderian, N. Petra , G. Stadler, and O. Ghattas, A-optimal design of exper

  13. An Illustrative Guide to the Minerva Framework

    NASA Astrophysics Data System (ADS)

    Flom, Erik; Leonard, Patrick; Hoeffel, Udo; Kwak, Sehyun; Pavone, Andrea; Svensson, Jakob; Krychowiak, Maciej; Wendelstein 7-X Team Collaboration

    2017-10-01

    Modern physics experiments require tracking and modelling data and their associated uncertainties on a large scale, as well as the combined implementation of multiple independent data streams for sophisticated modelling and analysis. The Minerva Framework offers a centralized, user-friendly method of large-scale physics modelling and scientific inference. Currently used by teams at multiple large-scale fusion experiments including the Joint European Torus (JET) and Wendelstein 7-X (W7-X), the Minerva framework provides a forward-model friendly architecture for developing and implementing models for large-scale experiments. One aspect of the framework involves so-called data sources, which are nodes in the graphical model. These nodes are supplied with engineering and physics parameters. When end-user level code calls a node, it is checked network-wide against its dependent nodes for changes since its last implementation and returns version-specific data. Here, a filterscope data node is used as an illustrative example of the Minerva Framework's data management structure and its further application to Bayesian modelling of complex systems. This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014-2018 under Grant Agreement No. 633053.

  14. NASA Space Rocket Logistics Challenges

    NASA Technical Reports Server (NTRS)

    Bramon, Chris; Neeley, James R.; Jones, James V.; Watson, Michael D.; Inman, Sharon K.; Tuttle, Loraine

    2014-01-01

    The Space Launch System (SLS) is the new NASA heavy lift launch vehicle in development and is scheduled for its first mission in 2017. SLS has many of the same logistics challenges as any other large scale program. However, SLS also faces unique challenges. This presentation will address the SLS challenges, along with the analysis and decisions to mitigate the threats posed by each.

  15. The Emerging Role of the Data Base Manager. Report No. R-1253-PR.

    ERIC Educational Resources Information Center

    Sawtelle, Thomas K.

    The Air Force Logistics Command (AFLC) is revising and enhancing its data-processing capabilities with the development of a large-scale, multi-site, on-line, integrated data base information system known as the Advanced Logistics System (ALS). A data integrity program is to be built around a Data Base Manager (DBM), an individual or a group of…

  16. Bayesian state space models for dynamic genetic network construction across multiple tissues.

    PubMed

    Liang, Yulan; Kelemen, Arpad

    2016-08-01

    Construction of gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research for complex diseases while estimating the dynamic changes of the temporal correlations and non-stationarity are the keys in this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge for inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix time-variant and include temporal correlation structures in the covariance matrix estimations in the multivariate Bayesian state space models. The unevenly spaced short time courses with unseen time points are treated as hidden state variables. Hierarchical Bayesian approaches with various prior and hyper-prior models with Monte Carlo Markov Chain and Gibbs sampling algorithms are used to estimate the model parameters and the hidden state variables. We apply the proposed Hierarchical Bayesian state space models to multiple tissues (liver, skeletal muscle, and kidney) Affymetrix time course data sets following corticosteroid (CS) drug administration. Both simulation and real data analysis results show that the genomic changes over time and gene-gene interaction in response to CS treatment can be well captured by the proposed models. The proposed dynamic Hierarchical Bayesian state space modeling approaches could be expanded and applied to other large scale genomic data, such as next generation sequence (NGS) combined with real time and time varying electronic health record (EHR) for more comprehensive and robust systematic and network based analysis in order to transform big biomedical data into predictions and diagnostics for precision medicine and personalized healthcare with better decision making and patient outcomes.

  17. Predicting Rotator Cuff Tears Using Data Mining and Bayesian Likelihood Ratios

    PubMed Central

    Lu, Hsueh-Yi; Huang, Chen-Yuan; Su, Chwen-Tzeng; Lin, Chen-Chiang

    2014-01-01

    Objectives Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone. Methods In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models. Results Our proposed data mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve of predicting a rotator cuff tear were statistically better in the ANN and decision tree models compared to logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability of a patient who has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear). Conclusions Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears. PMID:24733553
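
    The step from a pretest probability and a likelihood ratio to a post-test probability, which Fagan's nomogram performs graphically, can be sketched in a few lines. This is a minimal illustration of the standard odds form of Bayes' theorem, not the authors' code; the function names and the example inputs are hypothetical.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity) for a positive test result."""
    return sensitivity / (1.0 - specificity)


def post_test_probability(pretest_prob: float, likelihood_ratio: float) -> float:
    """Update a pretest probability with a likelihood ratio via odds."""
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest_prob must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio  # Bayes' theorem in odds form
    return post_odds / (1.0 + post_odds)


# Hypothetical inputs: a 50% pretest probability and a likelihood ratio of 3.
print(post_test_probability(0.5, 3.0))
```

    A likelihood ratio of 1 leaves the probability unchanged, which is a quick sanity check on any implementation of this update.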

  18. Comparing Three Estimation Methods for the Three-Parameter Logistic IRT Model

    ERIC Educational Resources Information Center

    Lamsal, Sunil

    2015-01-01

    Different estimation procedures have been developed for the unidimensional three-parameter item response theory (IRT) model. These techniques include the marginal maximum likelihood estimation, the fully Bayesian estimation using Markov chain Monte Carlo simulation techniques, and the Metropolis-Hastings Robbins-Monro estimation. With each…

  19. Data Mining in Child Welfare.

    ERIC Educational Resources Information Center

    Schoech, Dick; Quinn, Andrew; Rycraft, Joan R.

    2000-01-01

    Examines the historical and larger context of data mining and describes data mining processes, techniques, and tools. Illustrates these using a child welfare dataset concerning the employee turnover that is mined, using logistic regression and a Bayesian neural network. Discusses the data mining process, the resulting models, their predictive…

  20. Enhancing a Short Measure of Big Five Personality Traits with Bayesian Scaling

    ERIC Educational Resources Information Center

    Jones, W. Paul

    2014-01-01

    A study in a university clinic/laboratory investigated adaptive Bayesian scaling as a supplement to interpretation of scores on the Mini-IPIP. A "probability of belonging" in categories of low, medium, or high on each of the Big Five traits was calculated after each item response and continued until all items had been used or until a…

  1. Bayesian data fusion for spatial prediction of categorical variables in environmental sciences

    NASA Astrophysics Data System (ADS)

    Gengler, Sarah; Bogaert, Patrick

    2014-12-01

    First developed to predict continuous variables, Bayesian Maximum Entropy (BME) has become a complete framework in the context of space-time prediction since it has been extended to predict categorical variables and mixed random fields. This method proposes solutions to combine several sources of data whatever the nature of the information. However, the various attempts that were made for adapting the BME methodology to categorical variables and mixed random fields faced some limitations, such as a high computational burden. The main objective of this paper is to overcome this limitation by generalizing the Bayesian Data Fusion (BDF) theoretical framework to categorical variables, which is in some sense a simplification of the BME method through the convenient conditional independence hypothesis. The BDF methodology for categorical variables is first described and then applied to a practical case study: the estimation of soil drainage classes using a soil map and point observations in the sandy area of Flanders around the city of Mechelen (Belgium). The BDF approach is compared to BME along with more classical approaches, such as Indicator CoKriging (ICK) and logistic regression. Estimators are compared using various indicators, namely the Percentage of Correctly Classified locations (PCC) and the Average Highest Probability (AHP). Although the BDF methodology for categorical variables is in some sense a simplification of the BME approach, both methods lead to similar results and have strong advantages compared to ICK and logistic regression.

  2. Non-ignorable missingness in logistic regression.

    PubMed

    Wang, Joanna J J; Bartlett, Mark; Ryan, Louise

    2017-08-30

    Nonresponses and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimation, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non-ignorable missingness where the probability of nonresponse depends on the outcome. Using a simple case of logistic regression, we quantify the bias in regression estimates and show the observed likelihood is non-identifiable under non-ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in estimated parameters and the robustness of study conclusions against different assumptions. A Bayesian framework for model estimation is used as it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to follow-up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality-of-life. Copyright © 2017 John Wiley & Sons, Ltd.

  3. A comment on priors for Bayesian occupancy models

    PubMed Central

    Gerber, Brian D.

    2018-01-01

    Understanding patterns of species occurrence and the processes underlying these patterns is fundamental to the study of ecology. One of the more commonly used approaches to investigate species occurrence patterns is occupancy modeling, which can account for imperfect detection of a species during surveys. In recent years, there has been a proliferation of Bayesian modeling in ecology, which includes fitting Bayesian occupancy models. The Bayesian framework is appealing to ecologists for many reasons, including the ability to incorporate prior information through the specification of prior distributions on parameters. While ecologists almost exclusively intend to choose priors so that they are “uninformative” or “vague”, such priors can easily be unintentionally highly informative. Here we report on how the specification of a “vague” normally distributed (i.e., Gaussian) prior on coefficients in Bayesian occupancy models can unintentionally influence parameter estimation. Using both simulated data and empirical examples, we illustrate how this issue likely compromises inference about species-habitat relationships. While the extent to which these informative priors influence inference depends on the data set, researchers fitting Bayesian occupancy models should conduct sensitivity analyses to ensure intended inference, or employ less commonly used priors that are less informative (e.g., logistic or t prior distributions). We provide suggestions for addressing this issue in occupancy studies, and an online tool for exploring this issue under different contexts. PMID:29481554
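
    The point about "vague" normal priors can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not the paper's online tool: it draws an intercept from a Normal(0, sd) prior on the logit scale, maps each draw to a probability, and reports how much prior mass lands near 0 or 1. The function names, the 0.05 cutoff, and the two standard deviations compared are all hypothetical choices.

```python
import math
import random


def inv_logit(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def extreme_mass(prior_sd: float, n_draws: int = 20000,
                 cutoff: float = 0.05, seed: int = 0) -> float:
    """Share of prior draws whose implied probability is < cutoff or > 1 - cutoff."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_draws):
        beta = rng.gauss(0.0, prior_sd)   # prior draw on the logit scale
        p = inv_logit(beta)               # implied occupancy probability
        if p < cutoff or p > 1.0 - cutoff:
            extreme += 1
    return extreme / n_draws


for sd in (1.0, 10.0):
    print(f"prior sd = {sd}: share of mass near 0 or 1 = {extreme_mass(sd):.3f}")
```

    Under these assumptions the wide prior puts most of its mass on probabilities close to 0 or 1, whereas the narrow prior puts almost none there, which is one way to see why a prior that looks flat on the logit scale is not flat on the probability scale.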

  4. Logistic Stick-Breaking Process

    PubMed Central

    Ren, Lu; Du, Lan; Carin, Lawrence; Dunson, David B.

    2013-01-01

    A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. PMID:25258593
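
    The stick-breaking construction with logistic links can be sketched as follows. This is a simplified illustration under stated assumptions: in the LSBP each score would be a regression function of spatial or temporal location, whereas here the scores are plain numbers and the function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to a break fraction in (0, 1)."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def stick_breaking_weights(scores):
    """Turn K-1 real-valued scores into K mixture weights that sum to 1.

    Weight k is sigmoid(score_k) times the stick length left over after
    the first k-1 breaks; the final weight is whatever remains.
    """
    weights = []
    remaining = 1.0
    for g in scores:
        frac = sigmoid(g)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)
    return weights


print(stick_breaking_weights([0.0, 0.0]))
```

    A large positive score early in the sequence takes most of the stick, so that component dominates at the corresponding location.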

  5. Hamiltonian Markov Chain Monte Carlo Methods for the CUORE Neutrinoless Double Beta Decay Sensitivity

    NASA Astrophysics Data System (ADS)

    Graham, Eleanor; Cuore Collaboration

    2017-09-01

    The CUORE experiment is a large-scale bolometric detector seeking to observe the never-before-seen process of neutrinoless double beta decay. Predictions for CUORE's sensitivity to neutrinoless double beta decay allow for an understanding of the half-life ranges that the detector can probe, and also to evaluate the relative importance of different detector parameters. Currently, CUORE uses a Bayesian analysis based in BAT, which uses Metropolis-Hastings Markov Chain Monte Carlo, for its sensitivity studies. My work evaluates the viability and potential improvements of switching the Bayesian analysis to Hamiltonian Monte Carlo, realized through the program Stan and its Morpho interface. I demonstrate that the BAT study can be successfully recreated in Stan, and perform a detailed comparison between the results and computation times of the two methods.
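
    For readers unfamiliar with the baseline method named above, a random-walk Metropolis sampler can be sketched in a few lines. This is a generic textbook illustration on a standard normal target, not the BAT or Stan implementation and not the CUORE analysis; the target, step size, and function names are assumptions.

```python
import math
import random


def log_target(x: float) -> float:
    """Unnormalised log-density of a standard normal (illustrative target)."""
    return -0.5 * x * x


def metropolis(log_density, n_samples: int, step: float = 1.0,
               start: float = 0.0, seed: int = 0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


draws = metropolis(log_target, 20000)
mean = sum(draws) / len(draws)
print(f"sample mean = {mean:.3f}")
```

    Hamiltonian Monte Carlo replaces the random-walk proposal with trajectories guided by the gradient of the log-density, which is the difference the abstract's comparison turns on.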

  6. A Bayesian analysis of inflationary primordial spectrum models using Planck data

    NASA Astrophysics Data System (ADS)

    Santos da Costa, Simony; Benetti, Micol; Alcaniz, Jailson

    2018-03-01

    The current available Cosmic Microwave Background (CMB) data show an anomalously low value of the CMB temperature fluctuations at large angular scales (l < 40). This lack of power is not explained by the minimal ΛCDM model, and one of the possible mechanisms explored in the literature to address this problem is the presence of features in the primordial power spectrum (PPS) motivated by the early universe physics. In this paper, we analyse a set of cutoff inflationary PPS models using a Bayesian model comparison approach in light of the latest CMB data from the Planck Collaboration. Our results show that the standard power-law parameterisation is preferred over all models considered in the analysis, which motivates the search for alternative explanations for the observed lack of power in the CMB anisotropy spectrum.

  7. Spotted Towhee population dynamics in a riparian restoration context

    Treesearch

    Stacy L. Small; Frank R., III Thompson; Geoffery R. Geupel; John Faaborg

    2007-01-01

    We investigated factors at multiple scales that might influence nest predation risk for Spotted Towhees (Pipilo maculatus) along the Sacramento River, California, within the context of large-scale riparian habitat restoration. We used the logistic-exposure method and Akaike's information criterion (AIC) for model selection to compare predator...

  8. Bayesian structural equation modeling in sport and exercise psychology.

    PubMed

    Stenling, Andreas; Ivarsson, Andreas; Johnson, Urban; Lindwall, Magnus

    2015-08-01

    Bayesian statistics is on the rise in mainstream psychology, but applications in sport and exercise psychology research are scarce. In this article, the foundations of Bayesian analysis are introduced, and we will illustrate how to apply Bayesian structural equation modeling in a sport and exercise psychology setting. More specifically, we contrasted a confirmatory factor analysis on the Sport Motivation Scale II estimated with the most commonly used estimator, maximum likelihood, and a Bayesian approach with weakly informative priors for cross-loadings and correlated residuals. The results indicated that the model with Bayesian estimation and weakly informative priors provided a good fit to the data, whereas the model estimated with a maximum likelihood estimator did not produce a well-fitting model. The reasons for this discrepancy between maximum likelihood and Bayesian estimation are discussed as well as potential advantages and caveats with the Bayesian approach.

  9. A Hierarchical Multivariate Bayesian Approach to Ensemble Model output Statistics in Atmospheric Prediction

    DTIC Science & Technology

    2017-09-01

    efficacy of statistical post-processing methods downstream of these dynamical model components with a hierarchical multivariate Bayesian approach to...Bayesian hierarchical modeling, Markov chain Monte Carlo methods , Metropolis algorithm, machine learning, atmospheric prediction 15. NUMBER OF PAGES...scale processes. However, this dissertation explores the efficacy of statistical post-processing methods downstream of these dynamical model components

  10. Nonparametric Bayesian Segmentation of a Multivariate Inhomogeneous Space-Time Poisson Process.

    PubMed

    Ding, Mingtao; He, Lihan; Dunson, David; Carin, Lawrence

    2012-12-01

    A nonparametric Bayesian model is proposed for segmenting time-evolving multivariate spatial point process data. An inhomogeneous Poisson process is assumed, with a logistic stick-breaking process (LSBP) used to encourage piecewise-constant spatial Poisson intensities. The LSBP explicitly favors spatially contiguous segments, and infers the number of segments based on the observed data. The temporal dynamics of the segmentation and of the Poisson intensities are modeled with exponential correlation in time, implemented in the form of a first-order autoregressive model for uniformly sampled discrete data, and via a Gaussian process with an exponential kernel for general temporal sampling. We consider and compare two different inference techniques: a Markov chain Monte Carlo sampler, which has relatively high computational complexity; and an approximate and efficient variational Bayesian analysis. The model is demonstrated with a simulated example and a real example of space-time crime events in Cincinnati, Ohio, USA.
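
    The link between the two temporal models mentioned above can be checked numerically: on a uniform grid, an exponential kernel gives the same correlations as a first-order autoregressive process whose coefficient is exp(-step / length_scale). The sketch below illustrates that standard identity only; the function names and parameter values are hypothetical and it is not the authors' model code.

```python
import math


def exp_kernel(t1: float, t2: float, length_scale: float) -> float:
    """Exponential correlation between two time points (any spacing)."""
    return math.exp(-abs(t1 - t2) / length_scale)


def ar1_coefficient(step: float, length_scale: float) -> float:
    """AR(1) coefficient matching the exponential kernel on a uniform grid."""
    return math.exp(-step / length_scale)


step, length_scale = 0.5, 2.0   # hypothetical sampling interval and time scale
rho = ar1_coefficient(step, length_scale)
for lag in range(4):
    kernel_corr = exp_kernel(0.0, lag * step, length_scale)
    print(lag, round(kernel_corr, 6), round(rho ** lag, 6))
```

    For irregular sampling times only the kernel form applies, which is why the abstract pairs the autoregressive form with uniform sampling and the Gaussian process with general sampling.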

  11. Learning Instance-Specific Predictive Models

    PubMed Central

    Visweswaran, Shyam; Cooper, Gregory F.

    2013-01-01

    This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures and its performance was compared to that of several commonly used predictive algorithms, including naive Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. PMID:25045325

  12. In-situ resource utilization for the human exploration of Mars : a Bayesian approach to valuation of precursor missions

    NASA Technical Reports Server (NTRS)

    Smith, Jeffrey H.

    2006-01-01

    The need for sufficient quantities of oxygen, water, and fuel resources to support a crew on the surface of Mars presents a critical logistical issue of whether to transport such resources from Earth or manufacture them on Mars. An approach based on the classical Wildcat Drilling Problem of Bayesian decision theory was applied to the problem of finding water in order to compute the expected value of precursor mission sample information. An implicit (required) probability of finding water on Mars was derived from the value of sample information using the expected mass savings of alternative precursor missions.

  13. A small-area ecologic study of myocardial infarction, neighborhood deprivation, and sex: a Bayesian modeling approach.

    PubMed

    Deguen, Séverine; Lalloue, Benoît; Bard, Denis; Havard, Sabrina; Arveiler, Dominique; Zmirou-Navier, Denis

    2010-07-01

    Socioeconomic inequalities in the risk of coronary heart disease (CHD) are well documented for men and women. CHD incidence is greater for men but its association with socioeconomic status is usually found to be stronger among women. We explored the sex-specific association between neighborhood deprivation level and the risk of myocardial infarction (MI) at a small-area scale. We studied 1193 myocardial infarction events in people aged 35-74 years in the Strasbourg metropolitan area, France (2000-2003). We used a deprivation index to assess the neighborhood deprivation level. To take into account spatial dependence and the variability of MI rates due to the small number of events, we used a hierarchical Bayesian modeling approach. We fitted hierarchical Bayesian models to estimate sex-specific relative and absolute MI risks across deprivation categories. We tested departure from additive joint effects of deprivation and sex. The risk of MI increased with the deprivation level for both sexes, but was higher for men for all deprivation classes. Relative rates increased along the deprivation scale more steadily for women and followed a different pattern: linear for men and nonlinear for women. Our data provide evidence of effect modification, with departure from an additive joint effect of deprivation and sex. We document sex differences in the socioeconomic gradient of MI risk in Strasbourg. Women appear more susceptible at levels of extreme deprivation; this result is not a chance finding, given the large difference in event rates between men and women.

  14. Exploring links between juvenile offenders and social disorganization at a large map scale: a Bayesian spatial modeling approach

    NASA Astrophysics Data System (ADS)

    Law, Jane; Quick, Matthew

    2013-01-01

    This paper adopts a Bayesian spatial modeling approach to investigate the distribution of young offender residences in York Region, Southern Ontario, Canada, at the census dissemination area level. Few geographic studies have analyzed offender (as opposed to offense) data at a large map scale (i.e., using a relatively small areal unit of analysis) to minimize aggregation effects. Providing context is the social disorganization theory, which hypothesizes that areas with economic deprivation, high population turnover, and high ethnic heterogeneity exhibit social disorganization and are expected to facilitate higher instances of young offenders. Non-spatial and spatial Poisson models indicate that spatial methods are superior to non-spatial models with respect to model fit and that index of ethnic heterogeneity, residential mobility (1 year moving rate), and percentage of residents receiving government transfer payments are, respectively, the most significant explanatory variables related to young offender location. These findings provide overwhelming support for social disorganization theory as it applies to offender location in York Region, Ontario. Targeting areas where prevalence of young offenders could or could not be explained by social disorganization through decomposing the estimated risk map is helpful for dealing with juvenile offenders in the region. Results prompt discussion into geographically targeted police services and young offender placement pertaining to risk of recidivism. We discuss possible reasons for differences and similarities between the previous findings (that analyzed offense data and/or were conducted at a smaller map scale) and our findings, limitations of our study, and practical outcomes of this research from a law enforcement perspective.

  15. Bayesian power spectrum inference with foreground and target contamination treatment

    NASA Astrophysics Data System (ADS)

    Jasche, J.; Lavaux, G.

    2017-10-01

    This work presents a joint and self-consistent Bayesian treatment of various foreground and target contaminations when inferring cosmological power spectra and three-dimensional density fields from galaxy redshift surveys. This is achieved by introducing additional block-sampling procedures for unknown coefficients of foreground and target contamination templates to the previously presented ARES framework for Bayesian large-scale structure analyses. As a result, the method infers jointly and fully self-consistently three-dimensional density fields, cosmological power spectra, luminosity-dependent galaxy biases, noise levels of the respective galaxy distributions, and coefficients for a set of a priori specified foreground templates. In addition, this fully Bayesian approach permits detailed quantification of correlated uncertainties amongst all inferred quantities and correctly marginalizes over observational systematic effects. We demonstrate the validity and efficiency of our approach in obtaining unbiased estimates of power spectra via applications to realistic mock galaxy observations that are subject to stellar contamination and dust extinction. While simultaneously accounting for galaxy biases and unknown noise levels, our method reliably and robustly infers three-dimensional density fields and corresponding cosmological power spectra from deep galaxy surveys. Furthermore, our approach correctly accounts for joint and correlated uncertainties between unknown coefficients of foreground templates and the amplitudes of the power spectrum. This effect amounts to correlations and anti-correlations of up to 10 per cent across wide ranges in Fourier space.

  16. Low frequency full waveform seismic inversion within a tree based Bayesian framework

    NASA Astrophysics Data System (ADS)

    Ray, Anandaroop; Kaplan, Sam; Washbourne, John; Albertin, Uwe

    2018-01-01

    Limited illumination, insufficient offset, noisy data and poor starting models can pose challenges for seismic full waveform inversion. We present an application of a tree based Bayesian inversion scheme which attempts to mitigate these problems by accounting for data uncertainty while using a mildly informative prior about subsurface structure. We sample the resulting posterior model distribution of compressional velocity using a trans-dimensional (trans-D) or Reversible Jump Markov chain Monte Carlo method in the wavelet transform domain of velocity. This allows us to attain rapid convergence to a stationary distribution of posterior models while requiring a limited number of wavelet coefficients to define a sampled model. Two synthetic, low frequency, noisy data examples are provided. The first example is a simple reflection + transmission inverse problem, and the second uses a scaled version of the Marmousi velocity model, dominated by reflections. Both examples are initially started from a semi-infinite half-space with incorrect background velocity. We find that the trans-D tree based approach together with parallel tempering for navigating rugged likelihood (i.e. misfit) topography provides a promising, easily generalized method for solving large-scale geophysical inverse problems which are difficult to optimize, but where the true model contains a hierarchy of features at multiple scales.

  17. Fractal analysis of the dark matter and gas distributions in the Mare-Nostrum universe

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gaite, José, E-mail: jose.gaite@upm.es

    2010-03-01

    We develop a method of multifractal analysis of N-body cosmological simulations that improves on the customary counts-in-cells method by taking special care of the effects of discreteness and large scale homogeneity. The analysis of the Mare-Nostrum simulation with our method provides strong evidence of self-similar multifractal distributions of dark matter and gas, with a halo mass function that is of Press-Schechter type but has a power-law exponent -2, as corresponds to a multifractal. Furthermore, our analysis shows that the dark matter and gas distributions are indistinguishable as multifractals. To determine if there is any gas biasing, we calculate the cross-correlation coefficient, with negative but inconclusive results. Hence, we develop an effective Bayesian analysis connected with information theory, which clearly demonstrates that the gas is biased in a long range of scales, up to the scale of homogeneity. However, entropic measures related to the Bayesian analysis show that this gas bias is small (in a precise sense) and is such that the fractal singularities of both distributions coincide and are identical. We conclude that this common multifractal cosmic web structure is determined by the dynamics and is independent of the initial conditions.

  18. Bayesian approach to the assessment of the population-specific risk of inhibitors in hemophilia A patients: a case study

    PubMed Central

    Cheng, Ji; Iorio, Alfonso; Marcucci, Maura; Romanov, Vadim; Pullenayegum, Eleanor M; Marshall, John K; Thabane, Lehana

    2016-01-01

    Background Developing inhibitors is a rare event during the treatment of hemophilia A. The multiple facets and uncertainty surrounding the development of inhibitors further complicate the process of estimating the inhibitor rate from the limited data. Bayesian statistical modeling provides a useful tool in generating, enhancing, and exploring the evidence through incorporating all the available information. Methods We built our Bayesian analysis using three study cases to estimate the inhibitor rates of patients with hemophilia A in three different scenarios: Case 1, a single cohort of previously treated patients (PTPs) or previously untreated patients; Case 2, a meta-analysis of PTP cohorts; and Case 3, a previously unexplored patient population – patients with baseline low-titer inhibitor or history of inhibitor development. The data used in this study were extracted from three published ADVATE (antihemophilic factor [recombinant] is a product of Baxter for treating hemophilia A) post-authorization surveillance studies. Noninformative and informative priors were applied to Bayesian standard (Case 1) or random-effects (Case 2 and Case 3) logistic models. Bayesian probabilities of satisfying three meaningful thresholds of the risk of developing a clinically significant inhibitor (10/100, 5/100 [high rates], and 1/86 [the Food and Drug Administration mandated cutoff rate in PTPs]) were calculated. The effect of discounting prior information or scaling up the study data was evaluated. Results Results based on noninformative priors were similar to the classical approach. Using priors from PTPs lowered the point estimate and narrowed the 95% credible intervals (Case 1: from 1.3 [0.5, 2.7] to 0.8 [0.5, 1.1]; Case 2: from 1.9 [0.6, 6.0] to 0.8 [0.5, 1.1]; Case 3: from 2.3 [0.5, 6.8] to 0.7 [0.5, 1.1]). All probabilities of satisfying a threshold of 1/86 were above 0.65. 
Increasing the number of patients by two and ten times substantially narrowed the credible intervals for the single cohort study (1.4 [0.7, 2.3] and 1.4 [1.1, 1.8], respectively). Increasing the number of studies by two and ten times for the multiple study scenarios (Case 2: 1.9 [0.6, 4.0] and 1.9 [1.5, 2.6]; Case 3: 2.4 [0.9, 5.0] and 2.6 [1.9, 3.5], respectively) had a similar effect. Conclusion The Bayesian approach, as a robust, transparent, and reproducible analytic method, can be used efficiently to estimate the inhibitor rate of hemophilia A in complex clinical settings. PMID:27822129
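
    As a rough illustration of the threshold probabilities described in this record, the sketch below computes the posterior probability that an inhibitor rate lies below a cutoff. It is not the authors' method: it swaps the abstract's logistic models for a simpler conjugate beta-binomial model. The counts (2 events in 150 patients), the Beta(1, 1) prior, and the function names are hypothetical; only the three thresholds are taken from the abstract.

```python
import math


def beta_log_pdf(p, a, b):
    """Log density of a Beta(a, b) distribution at p, for 0 < p < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(p) + (b - 1.0) * math.log(1.0 - p)


def prob_rate_below(threshold, events, n, prior_a=1.0, prior_b=1.0, grid=20000):
    """Posterior P(rate < threshold) under a Beta prior and binomial data.

    The posterior is Beta(prior_a + events, prior_b + n - events); its CDF is
    approximated with a midpoint rule so only the standard library is needed.
    """
    a = prior_a + events
    b = prior_b + (n - events)
    h = threshold / grid
    total = 0.0
    for i in range(grid):
        p = (i + 0.5) * h  # midpoints never touch 0 or 1
        total += math.exp(beta_log_pdf(p, a, b))
    return total * h


# Hypothetical cohort: 2 inhibitor events among 150 patients (not study data).
for label, cutoff in [("10/100", 0.10), ("5/100", 0.05), ("1/86", 1.0 / 86.0)]:
    print(label, round(prob_rate_below(cutoff, events=2, n=150), 3))
```

    Replacing the flat prior with an informative one (larger prior_a and prior_b) would mimic, in this simplified setting, the narrowing of credible intervals that the abstract reports when prior information from PTPs is used.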

  19. Bayesian approach to the assessment of the population-specific risk of inhibitors in hemophilia A patients: a case study.

    PubMed

    Cheng, Ji; Iorio, Alfonso; Marcucci, Maura; Romanov, Vadim; Pullenayegum, Eleanor M; Marshall, John K; Thabane, Lehana

    2016-01-01

    Developing inhibitors is a rare event during the treatment of hemophilia A. The multiple facets and uncertainty surrounding the development of inhibitors further complicate the process of estimating the inhibitor rate from the limited data. Bayesian statistical modeling provides a useful tool in generating, enhancing, and exploring the evidence through incorporating all the available information. We built our Bayesian analysis using three study cases to estimate the inhibitor rates of patients with hemophilia A in three different scenarios: Case 1, a single cohort of previously treated patients (PTPs) or previously untreated patients; Case 2, a meta-analysis of PTP cohorts; and Case 3, a previously unexplored patient population - patients with baseline low-titer inhibitor or history of inhibitor development. The data used in this study were extracted from three published ADVATE (antihemophilic factor [recombinant] is a product of Baxter for treating hemophilia A) post-authorization surveillance studies. Noninformative and informative priors were applied to Bayesian standard (Case 1) or random-effects (Case 2 and Case 3) logistic models. Bayesian probabilities of satisfying three meaningful thresholds of the risk of developing a clinically significant inhibitor (10/100, 5/100 [high rates], and 1/86 [the Food and Drug Administration mandated cutoff rate in PTPs]) were calculated. The effect of discounting prior information or scaling up the study data was evaluated. Results based on noninformative priors were similar to the classical approach. Using priors from PTPs lowered the point estimate and narrowed the 95% credible intervals (Case 1: from 1.3 [0.5, 2.7] to 0.8 [0.5, 1.1]; Case 2: from 1.9 [0.6, 6.0] to 0.8 [0.5, 1.1]; Case 3: from 2.3 [0.5, 6.8] to 0.7 [0.5, 1.1]). All probabilities of satisfying a threshold of 1/86 were above 0.65. 
Increasing the number of patients by two and ten times substantially narrowed the credible intervals for the single cohort study (1.4 [0.7, 2.3] and 1.4 [1.1, 1.8], respectively). Increasing the number of studies by two and ten times for the multiple study scenarios (Case 2: 1.9 [0.6, 4.0] and 1.9 [1.5, 2.6]; Case 3: 2.4 [0.9, 5.0] and 2.6 [1.9, 3.5], respectively) had a similar effect. The Bayesian approach, as a robust, transparent, and reproducible analytic method, can be used efficiently to estimate the inhibitor rate of hemophilia A in complex clinical settings.

  20. Model-based Bayesian inference for ROC data analysis

    NASA Astrophysics Data System (ADS)

    Lei, Tianhu; Bae, K. Ty

    2013-03-01

    This paper presents a study of model-based Bayesian inference applied to Receiver Operating Characteristic (ROC) data. The model is a simple version of the general non-linear regression model. Unlike the Dorfman model, it uses a probit link function with a binary (zero-one) covariate to express binormal distributions in a single formula. The model also includes a scale parameter. Bayesian inference is implemented by the Markov Chain Monte Carlo (MCMC) method carried out by Bayesian inference Using Gibbs Sampling (BUGS). In contrast to classical statistical theory, the Bayesian approach treats model parameters as random variables characterized by prior distributions. With a substantial number of simulated samples generated by the sampling algorithm, the posterior distributions of the parameters, as well as the parameters themselves, can be accurately estimated. MCMC-based BUGS adopts the Adaptive Rejection Sampling (ARS) protocol, which requires that the probability density function (pdf) from which samples are drawn be log concave with respect to the targeted parameters. Our study corrects a common misconception and proves that the pdf of this regression model is log concave with respect to its scale parameter. Therefore, the ARS requirement is satisfied, and a Gaussian prior, which is conjugate and possesses many analytic and computational advantages, is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for the MCMC method are assessed with the CODA package. Models and methods using a continuous Gaussian prior and a discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using the MCMC method.

  1. Bayesian analysis of time-series data under case-crossover designs: posterior equivalence and inference.

    PubMed

    Li, Shi; Mukherjee, Bhramar; Batterman, Stuart; Ghosh, Malay

    2013-12-01

    Case-crossover designs are widely used to study short-term exposure effects on the risk of acute adverse health events. While the frequentist literature on this topic is vast, there is no Bayesian work in this general area. The contribution of this paper is twofold. First, the paper establishes Bayesian equivalence results that require characterization of the set of priors under which the posterior distributions of the risk ratio parameters based on a case-crossover and time-series analysis are identical. Second, the paper studies inferential issues under case-crossover designs in a Bayesian framework. Traditionally, a conditional logistic regression is used for inference on risk-ratio parameters in case-crossover studies. We consider instead a more general full likelihood-based approach which makes less restrictive assumptions on the risk functions. Formulation of a full likelihood leads to growth in the number of parameters proportional to the sample size. We propose a semi-parametric Bayesian approach using a Dirichlet process prior to handle the random nuisance parameters that appear in a full likelihood formulation. We carry out a simulation study to compare the Bayesian methods based on full and conditional likelihood with the standard frequentist approaches for case-crossover and time-series analysis. The proposed methods are illustrated through the Detroit Asthma Morbidity, Air Quality and Traffic study, which examines the association between acute asthma risk and ambient air pollutant concentrations. © 2013, The International Biometric Society.

  2. Bounds on isocurvature perturbations from cosmic microwave background and large scale structure data.

    PubMed

    Crotty, Patrick; García-Bellido, Juan; Lesgourgues, Julien; Riazuelo, Alain

    2003-10-24

    We obtain very stringent bounds on the possible cold dark matter, baryon, and neutrino isocurvature contributions to the primordial fluctuations in the Universe, using recent cosmic microwave background and large scale structure data. Neglecting the possible effects of spatial curvature, tensor perturbations, and reionization, we perform a Bayesian likelihood analysis with nine free parameters, and find that the amplitude of the isocurvature component cannot be larger than about 31% for the cold dark matter mode, 91% for the baryon mode, 76% for the neutrino density mode, and 60% for the neutrino velocity mode, at 2sigma, for uncorrelated models. For correlated adiabatic and isocurvature components, the fraction could be slightly larger. However, the cross-correlation coefficient is strongly constrained, and maximally correlated/anticorrelated models are disfavored. This puts strong bounds on the curvaton model.

  3. The BANYAN-Sigma Bayesian classifier and the search for isolated planetary-mass objects

    NASA Astrophysics Data System (ADS)

    Gagné, Jonathan

    2018-01-01

    I will present new developments in the construction of a Bayesian classification tool to identify members of 22 young associations within 150 pc from partially complete kinematic data sets such as Gaia-DR1 and DR2. The new BANYAN-Sigma tool makes it possible to quickly analyze massive data sets and yields a better classification performance than all its predecessors. It will open the door to large-scale surveys to complete the stellar and substellar populations of nearby associations, which will provide deep insights into the low-mass end of the initial mass function and valuable age-calibrated targets for exoplanet surveys. I will also present preliminary results of a search for T-type isolated planetary-mass objects in these young associations, based on BANYAN-Sigma and a cross-match between the AllWISE and 2MASS-Reject catalogs.

  4. Bayesian planet searches in radial velocity data

    NASA Astrophysics Data System (ADS)

    Gregory, Phil

    2015-08-01

    Intrinsic stellar variability caused by magnetic activity and convection has become the main limiting factor for planet searches in both transit and radial velocity (RV) data. New spectrographs such as ESPRESSO and EXPRES are under development that aim to improve RV precision by a factor of approximately 100 over the current best spectrographs, HARPS and HARPS-N. This will greatly exacerbate the challenge of distinguishing planetary signals from stellar-activity-induced RV signals. At the same time, good progress has been made in simulating stellar activity signals. At the Porto 2014 meeting, “Towards Other Earths II,” Xavier Dumusque challenged the community to a large scale blind test using the simulated RV data to understand the limitations of present solutions to deal with stellar signals and to select the best approach. My talk will focus on some of the statistical lessons learned from this challenge with an emphasis on Bayesian methodology.

  5. Reconciling multiple data sources to improve accuracy of large-scale prediction of forest disease incidence

    USGS Publications Warehouse

    Hanks, E.M.; Hooten, M.B.; Baker, F.A.

    2011-01-01

    Ecological spatial data often come from multiple sources, varying in extent and accuracy. We describe a general approach to reconciling such data sets through the use of the Bayesian hierarchical framework. This approach provides a way for the data sets to borrow strength from one another while allowing for inference on the underlying ecological process. We apply this approach to study the incidence of eastern spruce dwarf mistletoe (Arceuthobium pusillum) in Minnesota black spruce (Picea mariana). A Minnesota Department of Natural Resources operational inventory of black spruce stands in northern Minnesota found mistletoe in 11% of surveyed stands, while a small, specific-pest survey found mistletoe in 56% of the surveyed stands. We reconcile these two surveys within a Bayesian hierarchical framework and predict that 35-59% of black spruce stands in northern Minnesota are infested with dwarf mistletoe. © 2011 by the Ecological Society of America.

  6. SiGN: large-scale gene network estimation environment for high performance computing.

    PubMed

    Tamada, Yoshinori; Shimamura, Teppei; Yamaguchi, Rui; Imoto, Seiya; Nagasaki, Masao; Miyano, Satoru

    2011-01-01

    Our research group is currently developing software for estimating large-scale gene networks from gene expression data. The software, called SiGN, is specifically designed for the Japanese flagship supercomputer "K computer" which is planned to achieve 10 petaflops in 2012, and other high performance computing environments including Human Genome Center (HGC) supercomputer system. SiGN is a collection of gene network estimation software with three different sub-programs: SiGN-BN, SiGN-SSM and SiGN-L1. In these three programs, five different models are available: static and dynamic nonparametric Bayesian networks, state space models, graphical Gaussian models, and vector autoregressive models. All these models require a huge amount of computational resources for estimating large-scale gene networks and therefore are designed to be able to exploit the speed of 10 petaflops. The software will be available freely for "K computer" and HGC supercomputer system users. The estimated networks can be viewed and analyzed by Cell Illustrator Online and SBiP (Systems Biology integrative Pipeline). The software project web site is available at http://sign.hgc.jp/ .

  7. Analyzing thresholds and efficiency with hierarchical Bayesian logistic regression.

    PubMed

    Houpt, Joseph W; Bittner, Jennifer L

    2018-07-01

    Ideal observer analysis is a fundamental tool used widely in vision science for analyzing the efficiency with which a cognitive or perceptual system uses available information. The performance of an ideal observer provides a formal measure of the amount of information in a given experiment. The ratio of human to ideal performance is then used to compute efficiency, a construct that can be directly compared across experimental conditions while controlling for the differences due to the stimuli and/or task specific demands. In previous research using ideal observer analysis, the effects of varying experimental conditions on efficiency have been tested using ANOVAs and pairwise comparisons. In this work, we present a model that combines Bayesian estimates of psychometric functions with hierarchical logistic regression for inference about both unadjusted human performance metrics and efficiencies. Our approach improves upon the existing methods by constraining the statistical analysis using a standard model connecting stimulus intensity to human observer accuracy and by accounting for variability in the estimates of human and ideal observer performance scores. This allows for both individual and group level inferences. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Bayesian selection of misspecified models is overconfident and may cause spurious posterior probabilities for phylogenetic trees.

    PubMed

    Yang, Ziheng; Zhu, Tianqi

    2018-02-20

    The Bayesian method is noted to produce spuriously high posterior probabilities for phylogenetic trees in analysis of large datasets, but the precise reasons for this overconfidence are unknown. In general, the performance of Bayesian selection of misspecified models is poorly understood, even though this is of great scientific interest since models are never true in real data analysis. Here we characterize the asymptotic behavior of Bayesian model selection and show that when the competing models are equally wrong, Bayesian model selection exhibits surprising and polarized behaviors in large datasets, supporting one model with full force while rejecting the others. If one model is slightly less wrong than the other, the less wrong model will eventually win when the amount of data increases, but the method may become overconfident before it becomes reliable. We suggest that this extreme behavior may be a major factor for the spuriously high posterior probabilities for evolutionary trees. The philosophical implications of our results to the application of Bayesian model selection to evaluate opposing scientific hypotheses are yet to be explored, as are the behaviors of non-Bayesian methods in similar situations.
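
    The polarized behavior described in this record can be reproduced in a toy setting. The sketch below is an assumption-laden illustration, not the authors' analysis: data are fair coin flips (true probability 0.5), and two fixed models with heads probability 0.4 and 0.6 are equally wrong by symmetry. With equal prior model probabilities, the posterior for either model tends to drift toward 0 or 1 as the sample size grows. All names and settings are hypothetical.

```python
import math
import random


def posterior_model_a(heads, n, p_a=0.4, p_b=0.6):
    """Posterior probability of model A, given equal prior model probabilities."""
    log_lik_a = heads * math.log(p_a) + (n - heads) * math.log(1.0 - p_a)
    log_lik_b = heads * math.log(p_b) + (n - heads) * math.log(1.0 - p_b)
    # Logistic form of Bayes' rule, guarded against overflow for large n.
    diff = log_lik_b - log_lik_a
    if diff > 700.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


def simulate(n, replicates=200, seed=1):
    """Fraction of replicate data sets in which one model gets posterior > 0.99."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(replicates):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        post = posterior_model_a(heads, n)
        if post > 0.99 or post < 0.01:
            extreme += 1
    return extreme / replicates


# The printed fraction of "decisive" replicates should tend to rise with n.
for n in (10, 100, 1000, 10000):
    print(n, simulate(n))
```

    In this toy case the log posterior odds are proportional to the excess of tails over heads, which behaves like a random walk, so strong support for one of two equally wrong models becomes more common as n increases.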

  9. An efficient Bayesian data-worth analysis using a multilevel Monte Carlo method

    NASA Astrophysics Data System (ADS)

    Lu, Dan; Ricciuto, Daniel; Evans, Katherine

    2018-03-01

    Improving the understanding of subsurface systems and thus reducing prediction uncertainty requires collection of data. As the collection of subsurface data is costly, it is important that the data collection scheme is cost-effective. Design of a cost-effective data collection scheme, i.e., data-worth analysis, requires quantifying model parameter, prediction, and both current and potential data uncertainties. Assessment of these uncertainties in large-scale stochastic subsurface hydrological model simulations using standard Monte Carlo (MC) sampling or surrogate modeling is extremely computationally intensive, sometimes even infeasible. In this work, we propose an efficient Bayesian data-worth analysis using a multilevel Monte Carlo (MLMC) method. Compared to the standard MC that requires a significantly large number of high-fidelity model executions to achieve a prescribed accuracy in estimating expectations, the MLMC can substantially reduce computational costs using multifidelity approximations. Since the Bayesian data-worth analysis involves a great deal of expectation estimation, the cost saving of the MLMC in the assessment can be outstanding. While the proposed MLMC-based data-worth analysis is broadly applicable, we use it for a highly heterogeneous two-phase subsurface flow simulation to select an optimal candidate data set that gives the largest uncertainty reduction in predicting mass flow rates at four production wells. The choices made by the MLMC estimation are validated by the actual measurements of the potential data, and consistent with the standard MC estimation. But compared to the standard MC, the MLMC greatly reduces the computational costs.
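
    The multilevel idea in this record can be sketched on a much simpler problem than subsurface flow. The example below is a minimal illustration under stated assumptions, not the authors' implementation: it estimates the expected terminal value of a geometric Brownian motion with Euler time stepping, where level l uses 2**l steps, and sums the coarsest-level estimate with the mean corrections between successive levels. The model, parameter values, sample allocation, and function names are all hypothetical.

```python
import math
import random


def coupled_euler(level, rng, mu=0.05, sigma=0.2, s0=1.0, horizon=1.0):
    """Terminal values of a fine and a coarse Euler path sharing one Brownian path.

    The fine path uses 2**level steps; the coarse path uses half as many and
    reuses the summed fine increments. At level 0 there is no coarser path.
    """
    n_fine = 2 ** level
    dt = horizon / n_fine
    fine = coarse = s0
    dw_sum = 0.0
    for step in range(n_fine):
        dw = rng.gauss(0.0, math.sqrt(dt))
        fine += mu * fine * dt + sigma * fine * dw
        dw_sum += dw
        if step % 2 == 1:  # every second fine step completes one coarse step
            coarse += mu * coarse * 2.0 * dt + sigma * coarse * dw_sum
            dw_sum = 0.0
    return fine, (coarse if level > 0 else 0.0)


def mlmc_mean(samples_per_level, seed=0):
    """Telescoping-sum estimate of the expected terminal value at the finest level."""
    rng = random.Random(seed)
    estimate = 0.0
    for level, n_samples in enumerate(samples_per_level):
        correction = 0.0
        for _ in range(n_samples):
            fine, coarse = coupled_euler(level, rng)
            correction += fine - coarse
        estimate += correction / n_samples
    return estimate


# Many cheap coarse samples, few expensive fine ones (allocation chosen by hand).
estimate = mlmc_mean([20000, 5000, 2000, 1000, 500])
print(estimate, math.exp(0.05))  # exact mean of the continuous model is exp(mu * T)
```

    Because the fine and coarse paths share the same random increments, the level corrections have small variance, which is why few samples are needed at the expensive levels. A practical implementation would usually choose the number of samples per level adaptively from estimated variances and costs rather than fixing them in advance.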

  10. Bayesian approach to MSD-based analysis of particle motion in live cells.

    PubMed

    Monnier, Nilah; Guo, Syuan-Ming; Mori, Masashi; He, Jun; Lénárt, Péter; Bathe, Mark

    2012-08-08

    Quantitative tracking of particle motion using live-cell imaging is a powerful approach to understanding the mechanism of transport of biological molecules, organelles, and cells. However, inferring complex stochastic motion models from single-particle trajectories in an objective manner is nontrivial due to noise from sampling limitations and biological heterogeneity. Here, we present a systematic Bayesian approach to multiple-hypothesis testing of a general set of competing motion models based on particle mean-square displacements that automatically classifies particle motion, properly accounting for sampling limitations and correlated noise while appropriately penalizing model complexity according to Occam's Razor to avoid over-fitting. We test the procedure rigorously using simulated trajectories for which the underlying physical process is known, demonstrating that it chooses the simplest physical model that explains the observed data. Further, we show that computed model probabilities provide a reliability test for the downstream biological interpretation of associated parameter values. We subsequently illustrate the broad utility of the approach by applying it to disparate biological systems including experimental particle trajectories from chromosomes, kinetochores, and membrane receptors undergoing a variety of complex motions. This automated and objective Bayesian framework easily scales to large numbers of particle trajectories, making it ideal for classifying the complex motion of large numbers of single molecules and cells from high-throughput screens, as well as single-cell-, tissue-, and organism-level studies. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  11. Recent advances in research on climate and human conflict

    NASA Astrophysics Data System (ADS)

    Hsiang, S. M.

    2014-12-01

    A rapidly growing body of empirical, quantitative research examines whether rates of human conflict can be systematically altered by climatic changes. We discuss recent advances in this field, including Bayesian meta-analyses of the effect of temperature and rainfall on current and future large-scale conflicts, the impact of climate variables on gang violence and suicides in Mexico, and probabilistic projections of personal violence and property crime in the United States under RCP scenarios. Criticisms of this research field will also be explained and addressed.

  12. Ancient mitochondrial DNA provides high-resolution time scale of the peopling of the Americas.

    PubMed

    Llamas, Bastien; Fehren-Schmitz, Lars; Valverde, Guido; Soubrier, Julien; Mallick, Swapan; Rohland, Nadin; Nordenfelt, Susanne; Valdiosera, Cristina; Richards, Stephen M; Rohrlach, Adam; Romero, Maria Inés Barreto; Espinoza, Isabel Flores; Cagigao, Elsa Tomasto; Jiménez, Lucía Watson; Makowski, Krzysztof; Reyna, Ilán Santiago Leboreiro; Lory, Josefina Mansilla; Torrez, Julio Alejandro Ballivián; Rivera, Mario A; Burger, Richard L; Ceruti, Maria Constanza; Reinhard, Johan; Wells, R Spencer; Politis, Gustavo; Santoro, Calogero M; Standen, Vivien G; Smith, Colin; Reich, David; Ho, Simon Y W; Cooper, Alan; Haak, Wolfgang

    2016-04-01

    The exact timing, route, and process of the initial peopling of the Americas remains uncertain despite much research. Archaeological evidence indicates the presence of humans as far as southern Chile by 14.6 thousand years ago (ka), shortly after the Pleistocene ice sheets blocking access from eastern Beringia began to retreat. Genetic estimates of the timing and route of entry have been constrained by the lack of suitable calibration points and low genetic diversity of Native Americans. We sequenced 92 whole mitochondrial genomes from pre-Columbian South American skeletons dating from 8.6 to 0.5 ka, allowing a detailed, temporally calibrated reconstruction of the peopling of the Americas in a Bayesian coalescent analysis. The data suggest that a small population entered the Americas via a coastal route around 16.0 ka, following previous isolation in eastern Beringia for ~2.4 to 9 thousand years after separation from eastern Siberian populations. Following a rapid movement throughout the Americas, limited gene flow in South America resulted in a marked phylogeographic structure of populations, which persisted through time. All of the ancient mitochondrial lineages detected in this study were absent from modern data sets, suggesting a high extinction rate. To investigate this further, we applied a novel principal components multiple logistic regression test to Bayesian serial coalescent simulations. The analysis supported a scenario in which European colonization caused a substantial loss of pre-Columbian lineages.

  13. Classification of mislabelled microarrays using robust sparse logistic regression.

    PubMed

    Bootkrajang, Jakramate; Kabán, Ata

    2013-04-01

    Previous studies reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet, few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance crucially depends on some tuning parameters. In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is automatically set using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach is able to counter the bad effects of labelling errors in terms of predictive performance, it is effective at identifying marker genes and simultaneously it detects mislabelled arrays to high accuracy. The code is available from http://cs.bham.ac.uk/∼jxb008. Supplementary data are available at Bioinformatics online.

  14. BAYESIAN METHODS FOR REGIONAL-SCALE EUTROPHICATION MODELS. (R830887)

    EPA Science Inventory

    We demonstrate a Bayesian classification and regression tree (CART) approach to link multiple environmental stressors to biological responses and quantify uncertainty in model predictions. Such an approach can: (1) report prediction uncertainty, (2) be consistent with the amou...

  15. Bayesian Item Selection in Constrained Adaptive Testing Using Shadow Tests

    ERIC Educational Resources Information Center

    Veldkamp, Bernard P.

    2010-01-01

    Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item…

  16. In Silico Syndrome Prediction for Coronary Artery Disease in Traditional Chinese Medicine

    PubMed Central

    Lu, Peng; Chen, Jianxin; Zhao, Huihui; Gao, Yibo; Luo, Liangtao; Zuo, Xiaohan; Shi, Qi; Yang, Yiping; Yi, Jianqiang; Wang, Wei

    2012-01-01

    Coronary artery disease (CAD) is the leading cause of death in the world. The differentiation of syndrome (ZHENG) is the criterion for diagnosis and therapy in traditional Chinese medicine (TCM). Therefore, in silico syndrome prediction can improve the performance of treatment. In this paper, we present a Bayesian network framework to construct a high-confidence syndrome predictor based on the optimum symptom subset, which is collected by Support Vector Machine (SVM) feature selection. Syndromes of CAD can be divided into asthenia and sthenia syndromes. According to the hierarchical characteristics of syndrome, we first label every case with one of three types of syndrome (asthenia, sthenia, or both) to handle patients who present several syndromes. On the basis of these three syndrome classes, we design SVM feature selection to obtain the optimum symptom subset and compare this subset with Markov blanket feature selection using ROC. Using this subset, the six predictors of CAD syndromes are constructed by the Bayesian network technique. We also design Naïve Bayes, C4.5, Logistic, and radial basis function (RBF) network classifiers for comparison with the Bayesian network. In conclusion, the Bayesian network method based on the optimum symptoms offers a practical way to predict six syndromes of CAD in TCM. PMID:22567030

  17. SOMBI: Bayesian identification of parameter relations in unstructured cosmological data

    NASA Astrophysics Data System (ADS)

    Frank, Philipp; Jasche, Jens; Enßlin, Torsten A.

    2016-11-01

    This work describes the implementation and application of a correlation determination method based on self organizing maps and Bayesian inference (SOMBI). SOMBI aims to automatically identify relations between different observed parameters in unstructured cosmological or astrophysical surveys by automatically identifying data clusters in high-dimensional datasets via the self organizing map neural network algorithm. Parameter relations are then revealed by means of a Bayesian inference within respective identified data clusters. Specifically such relations are assumed to be parametrized as a polynomial of unknown order. The Bayesian approach results in a posterior probability distribution function for respective polynomial coefficients. To decide which polynomial order suffices to describe correlation structures in data, we include a method for model selection, the Bayesian information criterion, to the analysis. The performance of the SOMBI algorithm is tested with mock data. As illustration we also provide applications of our method to cosmological data. In particular, we present results of a correlation analysis between galaxy and active galactic nucleus (AGN) properties provided by the SDSS catalog with the cosmic large-scale-structure (LSS). The results indicate that the combined galaxy and LSS dataset indeed is clustered into several sub-samples of data with different average properties (for example different stellar masses or web-type classifications). The majority of data clusters appear to have a similar correlation structure between galaxy properties and the LSS. In particular we revealed a positive and linear dependency between the stellar mass, the absolute magnitude and the color of a galaxy with the corresponding cosmic density field. A remaining subset of data shows inverted correlations, which might be an artifact of non-linear redshift distortions.

  18. Defining Probability in Sex Offender Risk Assessment.

    PubMed

    Elwood, Richard W

    2016-12-01

    There is ongoing debate and confusion over using actuarial scales to predict individuals' risk of sexual recidivism. Much of the debate comes from not distinguishing Frequentist from Bayesian definitions of probability. Much of the confusion comes from applying Frequentist probability to individuals' risk. By definition, only Bayesian probability can be applied to the single case. The Bayesian concept of probability resolves most of the confusion and much of the debate in sex offender risk assessment. Although Bayesian probability is well accepted in risk assessment generally, it has not been widely used to assess the risk of sex offenders. I review the two concepts of probability and show how the Bayesian view alone provides a coherent scheme to conceptualize individuals' risk of sexual recidivism.

  19. Bioregional monitoring design and occupancy estimation for two Sierra Nevadan amphibian taxa

    EPA Science Inventory

    Land-management agencies need quantitative, statistically rigorous monitoring data, often at large spatial and temporal scales, to support resource-management decisions. Monitoring designs typically must accommodate multiple ecological, logistical, political, and economic objec...

  20. Health support for the Raid of the Seven Stones: in the footsteps of Navy physician Jules Crevaux in French Guiana.

    PubMed

    Barthes, N; Boudsocq, J-P

    2017-06-01

    In the summer of 2015, soldiers of the 3rd Foreign Infantry Regiment and civilian scientists mounted a joint expedition on foot to reconnoiter and better define the southern frontier of French Guiana with Brazil. Three doctor-nurse pairs worked in relay to provide medical support for this unprecedented 42-day, 320-km journey through a hostile and isolated environment, a mission whose success was made possible by large-scale logistic and technical prowess. The army health department, using knowledge gained from previous large-scale missions and expeditions and from its staff's local experience, provided its technical support for personnel selection, organization of the health logistics, and field support. This article describes the difficulties encountered from a medical perspective, the diseases encountered, and the final assessments of the personnel who completed this expedition.

  1. Singularity-sensitive gauge-based radar rainfall adjustment methods for urban hydrological applications

    NASA Astrophysics Data System (ADS)

    Wang, L.-P.; Ochoa-Rodríguez, S.; Onof, C.; Willems, P.

    2015-09-01

    Gauge-based radar rainfall adjustment techniques have been widely used to improve the applicability of radar rainfall estimates to large-scale hydrological modelling. However, their use for urban hydrological applications is limited as they were mostly developed based upon Gaussian approximations and therefore tend to smooth off so-called "singularities" (features of a non-Gaussian field) that can be observed in the fine-scale rainfall structure. Overlooking the singularities could be critical, given that their distribution is highly consistent with that of local extreme magnitudes. This deficiency may cause large errors in the subsequent urban hydrological modelling. To address this limitation and improve the applicability of adjustment techniques at urban scales, a method is proposed herein which incorporates a local singularity analysis into existing adjustment techniques and allows the preservation of the singularity structures throughout the adjustment process. In this paper the proposed singularity analysis is incorporated into the Bayesian merging technique and the performance of the resulting singularity-sensitive method is compared with that of the original Bayesian (non singularity-sensitive) technique and the commonly used mean field bias adjustment. This test is conducted using as case study four storm events observed in the Portobello catchment (53 km²) (Edinburgh, UK) during 2011 and for which radar estimates, dense rain gauge and sewer flow records, as well as a recently calibrated urban drainage model were available. The results suggest that, in general, the proposed singularity-sensitive method can effectively preserve the non-normality in local rainfall structure, while retaining the ability of the original adjustment techniques to generate nearly unbiased estimates. Moreover, the ability of the singularity-sensitive technique to preserve the non-normality in rainfall estimates often leads to better reproduction of the urban drainage system's dynamics, particularly of peak runoff flows.

  2. A multiscale Bayesian data integration approach for mapping air dose rates around the Fukushima Daiichi Nuclear Power Plant.

    PubMed

    Wainwright, Haruko M; Seki, Akiyuki; Chen, Jinsong; Saito, Kimiaki

    2017-02-01

    This paper presents a multiscale data integration method to estimate the spatial distribution of air dose rates in the regional scale around the Fukushima Daiichi Nuclear Power Plant. We integrate various types of datasets, such as ground-based walk and car surveys, and airborne surveys, all of which have different scales, resolutions, spatial coverage, and accuracy. This method is based on geostatistics to represent spatial heterogeneous structures, and also on Bayesian hierarchical models to integrate multiscale, multi-type datasets in a consistent manner. The Bayesian method allows us to quantify the uncertainty in the estimates, and to provide the confidence intervals that are critical for robust decision-making. Although this approach is primarily data-driven, it has great flexibility to include mechanistic models for representing radiation transport or other complex correlations. We demonstrate our approach using three types of datasets collected at the same time over Fukushima City in Japan: (1) coarse-resolution airborne surveys covering the entire area, (2) car surveys along major roads, and (3) walk surveys in multiple neighborhoods. Results show that the method can successfully integrate three types of datasets and create an integrated map (including the confidence intervals) of air dose rates over the domain in high resolution. Moreover, this study provides us with various insights into the characteristics of each dataset, as well as radiocaesium distribution. In particular, the urban areas show high heterogeneity in the contaminant distribution due to human activities as well as large discrepancy among different surveys due to such heterogeneity. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. A Development of Nonstationary Regional Frequency Analysis Model with Large-scale Climate Information: Its Application to Korean Watershed

    NASA Astrophysics Data System (ADS)

    Kim, Jin-Young; Kwon, Hyun-Han; Kim, Hung-Soo

    2015-04-01

    The existing regional frequency analysis has disadvantages in that it is difficult to consider geographical characteristics in estimating areal rainfall. In this regard, this study aims to develop a hierarchical Bayesian model based nonstationary regional frequency analysis in that spatial patterns of the design rainfall with geographical information (e.g. latitude, longitude and altitude) are explicitly incorporated. This study assumes that the parameters of Gumbel (or GEV distribution) are a function of geographical characteristics within a general linear regression framework. Posterior distributions of the regression parameters are estimated by Bayesian Markov Chain Monte Carlo (MCMC) method, and the identified functional relationship is used to spatially interpolate the parameters of the distributions by using digital elevation models (DEM) as inputs. The proposed model is applied to derive design rainfalls over the entire Han-river watershed. It was found that the proposed Bayesian regional frequency analysis model showed similar results compared to L-moment based regional frequency analysis. In addition, the model showed an advantage in terms of quantifying uncertainty of the design rainfall and estimating the area rainfall considering geographical information. Finally, comprehensive discussion on design rainfall in the context of nonstationary will be presented. KEYWORDS: Regional frequency analysis, Nonstationary, Spatial information, Bayesian. Acknowledgement: This research was supported by a grant (14AWMP-B082564-01) from Advanced Water Management Research Program funded by Ministry of Land, Infrastructure and Transport of Korean government.

  4. Mapping brucellosis increases relative to elk density using hierarchical Bayesian models

    USGS Publications Warehouse

    Cross, Paul C.; Heisey, Dennis M.; Scurlock, Brandon M.; Edwards, William H.; Brennan, Angela; Ebinger, Michael R.

    2010-01-01

    The relationship between host density and parasite transmission is central to the effectiveness of many disease management strategies. Few studies, however, have empirically estimated this relationship particularly in large mammals. We applied hierarchical Bayesian methods to a 19-year dataset of over 6400 brucellosis tests of adult female elk (Cervus elaphus) in northwestern Wyoming. Management captures that occurred from January to March were over two times more likely to be seropositive than hunted elk that were killed in September to December, while accounting for site and year effects. Areas with supplemental feeding grounds for elk had higher seroprevalence in 1991 than other regions, but by 2009 many areas distant from the feeding grounds were of comparable seroprevalence. The increases in brucellosis seroprevalence were correlated with elk densities at the elk management unit, or hunt area, scale (mean 2070 km²; range = [95–10237]). The data, however, could not differentiate among linear and non-linear effects of host density. Therefore, control efforts that focus on reducing elk densities at a broad spatial scale were only weakly supported. Additional research on how a few, large groups within a region may be driving disease dynamics is needed for more targeted and effective management interventions. Brucellosis appears to be expanding its range into new regions and elk populations, which is likely to further complicate the United States brucellosis eradication program. This study is an example of how the dynamics of host populations can affect their ability to serve as disease reservoirs.

  5. Impact of the biorefinery size on the logistics of corn stover supply – A scenario analysis

    DOE PAGES

    Wang, Yu; Ebadian, Mahmood; Sokhansanj, Shahab; ...

    2017-03-23

    In this study, three scenarios are considered to quantify the impact of the biorefinery size on the required biomass logistical resources. The biorefinery scenarios include small scale (175 dt/day)-SS, medium scale (520 dt/day)-MS and large scale (860 dt/day)-LS. These scenarios are compared against the following logistical resources (1) harvest area and contracted fields, (2) logistics equipment fleet and the workforce to run this fleet and (3) intermediate storage sites and their biomass inventory levels. To this end, the IBSAL-MC simulation model is applied to a corn stover logistics system in Southwestern Ontario. The obtained results show (1) the harvest area and the number of contracted fields increase by 65% and 78% from the SS scenario to the MS and LS scenarios, respectively, (2) the average biomass delivered costs are estimated to be $82.09, $87.49 and $93.75/dry tonne in the SS, MS and LS scenarios. The increase in the capital costs to develop a dedicated logistics equipment fleet is estimated to be far greater than the increase in the delivered costs as the size of the biorefinery increases. The upfront capital costs are estimated to be $6.72, $21.83 and $35.51 million in these scenarios. To run the logistics equipment fleet efficiently, 37, 136 and 235 well-trained operators are required in the SS, MS and LS scenarios, respectively, and (3) the inventory level and the land requirement for storage in the MS and LS scenarios are estimated to be 225% and 425% greater than those of the SS scenario. The sensitivity analysis indicates that the logistical resources are highly sensitive to corn yield and farm participation rate. Overall, this study shows the importance of considering the size of the required logistical resources and the associated level of logistical complexity in evaluating the economic viability of a biorefinery project.

  6. Impact of the biorefinery size on the logistics of corn stover supply – A scenario analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Yu; Ebadian, Mahmood; Sokhansanj, Shahab

    In this study, three scenarios are considered to quantify the impact of the biorefinery size on the required biomass logistical resources. The biorefinery scenarios include small scale (175 dt/day)-SS, medium scale (520 dt/day)-MS and large scale (860 dt/day)-LS. These scenarios are compared against the following logistical resources (1) harvest area and contracted fields, (2) logistics equipment fleet and the workforce to run this fleet and (3) intermediate storage sites and their biomass inventory levels. To this end, the IBSAL-MC simulation model is applied to a corn stover logistics system in Southwestern Ontario. The obtained results show (1) the harvest area and the number of contracted fields increase by 65% and 78% from the SS scenario to the MS and LS scenarios, respectively, (2) the average biomass delivered costs are estimated to be $82.09, $87.49 and $93.75/dry tonne in the SS, MS and LS scenarios. The increase in the capital costs to develop a dedicated logistics equipment fleet is estimated to be far greater than the increase in the delivered costs as the size of the biorefinery increases. The upfront capital costs are estimated to be $6.72, $21.83 and $35.51 million in these scenarios. To run the logistics equipment fleet efficiently, 37, 136 and 235 well-trained operators are required in the SS, MS and LS scenarios, respectively, and (3) the inventory level and the land requirement for storage in the MS and LS scenarios are estimated to be 225% and 425% greater than those of the SS scenario. The sensitivity analysis indicates that the logistical resources are highly sensitive to corn yield and farm participation rate. Overall, this study shows the importance of considering the size of the required logistical resources and the associated level of logistical complexity in evaluating the economic viability of a biorefinery project.

  7. A FAST BAYESIAN METHOD FOR UPDATING AND FORECASTING HOURLY OZONE LEVELS

    EPA Science Inventory

    A Bayesian hierarchical space-time model is proposed by combining information from real-time ambient AIRNow air monitoring data, and output from a computer simulation model known as the Community Multi-scale Air Quality (Eta-CMAQ) forecast model. A model validation analysis shows...

  8. Taming Many-Parameter BSM Models with Bayesian Neural Networks

    NASA Astrophysics Data System (ADS)

    Kuchera, M. P.; Karbo, A.; Prosper, H. B.; Sanchez, A.; Taylor, J. Z.

    2017-09-01

    The search for physics Beyond the Standard Model (BSM) is a major focus of large-scale high energy physics experiments. One method is to look for specific deviations from the Standard Model that are predicted by BSM models. In cases where the model has a large number of free parameters, standard search methods become intractable due to computation time. This talk presents results using Bayesian Neural Networks, a supervised machine learning method, to enable the study of higher-dimensional models. The popular phenomenological Minimal Supersymmetric Standard Model was studied as an example of the feasibility and usefulness of this method. Graphics Processing Units (GPUs) are used to expedite the calculations. Cross-section predictions for 13 TeV proton collisions will be presented. My participation in the Conference Experience for Undergraduates (CEU) in 2004-2006 exposed me to the national and global significance of cutting-edge research. At the 2005 CEU, I presented work from the previous summer's SULI internship at Lawrence Berkeley Laboratory, where I learned to program while working on the Majorana Project. That work inspired me to follow a similar research path, which led me to my current work on computational methods applied to BSM physics.

  9. Supporting Regularized Logistic Regression Privately and Efficiently.

    PubMed

    Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei

    2016-01-01

    As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely used statistical model that has nonetheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc.

  10. Supporting Regularized Logistic Regression Privately and Efficiently

    PubMed Central

    Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei

    2016-01-01

    As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely used statistical model that has nonetheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc. PMID:27271738

  11. Developing and Testing a Model to Predict Outcomes of Organizational Change

    PubMed Central

    Gustafson, David H; Sainfort, François; Eichler, Mary; Adams, Laura; Bisognano, Maureen; Steudel, Harold

    2003-01-01

    Objective: To test the effectiveness of a Bayesian model employing subjective probability estimates for predicting success and failure of health care improvement projects. Data Sources: Experts' subjective assessment data for model development and independent retrospective data on 221 healthcare improvement projects in the United States, Canada, and the Netherlands collected between 1996 and 2000 for validation. Methods: A panel of theoretical and practical experts and literature in organizational change were used to identify factors predicting the outcome of improvement efforts. A Bayesian model was developed to estimate probability of successful change using subjective estimates of likelihood ratios and prior odds elicited from the panel of experts. A subsequent retrospective empirical analysis of change efforts in 198 health care organizations was performed to validate the model. Logistic regression and ROC analysis were used to evaluate the model's performance using three alternative definitions of success. Data Collection: For the model development, experts' subjective assessments were elicited using an integrative group process. For the validation study, a staff person intimately involved in each improvement project responded to a written survey asking questions about model factors and project outcomes. Results: Logistic regression chi-square statistics and areas under the ROC curve demonstrated a high level of model performance in predicting success. Chi-square statistics were significant at the 0.001 level and areas under the ROC curve were greater than 0.84. Conclusions: A subjective Bayesian model was effective in predicting the outcome of actual improvement projects. Additional prospective evaluations as well as testing the impact of this model as an intervention are warranted. PMID:12785571
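
    The odds-form update this abstract refers to can be sketched as follows. The sketch assumes the factors are conditionally independent given the outcome, so that their likelihood ratios simply multiply; the abstract does not state whether the authors' model makes that assumption. The function name and the numeric inputs are hypothetical and are not taken from the study.

```python
from math import prod

def posterior_probability(prior_odds, likelihood_ratios):
    # Odds form of Bayes' rule: posterior odds = prior odds x product of LRs.
    # Assumes the factors are conditionally independent given the outcome.
    posterior_odds = prior_odds * prod(likelihood_ratios)
    # Convert odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical inputs: even prior odds and three elicited likelihood ratios.
p_success = posterior_probability(1.0, [2.0, 0.5, 3.0])
print(p_success)  # 0.75
```

    Here the three ratios multiply to 3, so even prior odds become posterior odds of 3 to 1, a probability of 0.75.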

  12. Homogenization techniques for population dynamics in strongly heterogeneous landscapes.

    PubMed

    Yurk, Brian P; Cobbold, Christina A

    2018-12-01

    An important problem in spatial ecology is to understand how population-scale patterns emerge from individual-level birth, death, and movement processes. These processes, which depend on local landscape characteristics, vary spatially and may exhibit sharp transitions through behavioural responses to habitat edges, leading to discontinuous population densities. Such systems can be modelled using reaction-diffusion equations with interface conditions that capture local behaviour at patch boundaries. In this work we develop a novel homogenization technique to approximate the large-scale dynamics of the system. We illustrate our approach, which also generalizes to multiple species, with an example of logistic growth within a periodic environment. We find that population persistence and the large-scale population carrying capacity is influenced by patch residence times that depend on patch preference, as well as movement rates in adjacent patches. The forms of the homogenized coefficients yield key theoretical insights into how large-scale dynamics arise from the small-scale features.

  13. Spatio-temporal Eigenvector Filtering: Application on Bioenergy Crop Impacts

    NASA Astrophysics Data System (ADS)

    Wang, M.; Kamarianakis, Y.; Georgescu, M.

    2017-12-01

    A suite of 10-year ensemble-based simulations was conducted to investigate the hydroclimatic impacts due to large-scale deployment of perennial bioenergy crops across the continental United States. Given the large size of the simulated dataset (about 60Tb), traditional hierarchical spatio-temporal statistical modelling cannot be implemented for the evaluation of physics parameterizations and biofuel impacts. In this work, we propose a filtering algorithm that takes into account the spatio-temporal autocorrelation structure of the data while avoiding spatial confounding. This method is used to quantify the robustness of simulated hydroclimatic impacts associated with bioenergy crops to alternative physics parameterizations and observational datasets. Results are evaluated against those obtained from three alternative Bayesian spatio-temporal specifications.

  14. Process model comparison and transferability across bioreactor scales and modes of operation for a mammalian cell bioprocess.

    PubMed

    Craven, Stephen; Shirsat, Nishikant; Whelan, Jessica; Glennon, Brian

    2013-01-01

    A Monod kinetic model, logistic equation model, and statistical regression model were developed for a Chinese hamster ovary cell bioprocess operated under three different modes of operation (batch, bolus fed-batch, and continuous fed-batch) and grown on two different bioreactor scales (3 L bench-top and 15 L pilot-scale). The Monod kinetic model was developed for all modes of operation under study and predicted cell density, glucose, glutamine, lactate, and ammonia concentrations well for the bioprocess. However, it was computationally demanding due to the large number of parameters necessary to produce a good model fit. The transferability of the Monod kinetic model structure and parameter set across bioreactor scales and modes of operation was investigated and a parameter sensitivity analysis performed. The experimentally determined parameters had the greatest influence on model performance. They changed with scale and mode of operation, but were easily calculated. The remaining parameters, which were fitted using a differential evolutionary algorithm, were not as crucial. Logistic equation and statistical regression models were investigated as alternatives to the Monod kinetic model. They were less computationally intensive to develop due to the absence of a large parameter set. However, modeling of the nutrient and metabolite concentrations proved to be troublesome due to the logistic equation model structure and the inability of both models to incorporate a feed. The complexity, computational load, and effort required for model development have to be balanced with the necessary level of model sophistication when choosing which model type to develop for a particular application. Copyright © 2012 American Institute of Chemical Engineers (AIChE).

  15. Exploring unobserved heterogeneity in bicyclists' red-light running behaviors at different crossing facilities.

    PubMed

    Guo, Yanyong; Li, Zhibin; Wu, Yao; Xu, Chengcheng

    2018-06-01

    Bicyclists running the red light at crossing facilities increase the potential of colliding with motor vehicles. Exploring the contributing factors could improve the prediction of running red-light probability and develop countermeasures to reduce such behaviors. However, individuals could have unobserved heterogeneities in running a red light, which make the accurate prediction more challenging. Traditional models assume that factor parameters are fixed and cannot capture the varying impacts on red-light running behaviors. In this study, we employed the full Bayesian random parameters logistic regression approach to account for the unobserved heterogeneous effects. Two types of crossing facilities were considered which were the signalized intersection crosswalks and the road segment crosswalks. Electric and conventional bikes were distinguished in the modeling. Data were collected from 16 crosswalks in urban area of Nanjing, China. Factors such as individual characteristics, road geometric design, environmental features, and traffic variables were examined. Model comparison indicates that the full Bayesian random parameters logistic regression approach is statistically superior to the standard logistic regression model. More red-light runners are predicted at signalized intersection crosswalks than at road segment crosswalks. Factors affecting red-light running behaviors are gender, age, bike type, road width, presence of raised median, separation width, signal type, green ratio, bike and vehicle volume, and average vehicle speed. Factors associated with the unobserved heterogeneity are gender, bike type, signal type, separation width, and bike volume. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Crowdsourcing for large-scale mosquito (Diptera: Culicidae) sampling

    USDA-ARS's Scientific Manuscript database

    Sampling a cosmopolitan mosquito (Diptera: Culicidae) species throughout its range is logistically challenging and extremely resource intensive. Mosquito control programmes and regional networks operate at the local level and often conduct sampling activities across much of North America. A method f...

  17. Logistics cost analysis of rice residues for second generation bioenergy production in Ghana.

    PubMed

    Vijay Ramamurthi, Pooja; Cristina Fernandes, Maria; Sieverts Nielsen, Per; Pedro Nunes, Clemente

    2014-12-01

    This study explores the techno-economic potential of rice residues as a bioenergy resource to meet Ghana's energy demands. Major rice growing regions of Ghana have 70-90% of residues available for bioenergy production. To ensure cost-effective biomass logistics, a thorough cost analysis was made for two bioenergy routes. Logistics costs for a 5 MWe straw combustion plant were 39.01, 47.52 and 47.89 USD/t for Northern, Ashanti and Volta regions respectively. Logistics cost for a 0.25 MWe husk gasification plant (with roundtrip distance 10 km) was 2.64 USD/t in all regions. Capital cost (66-72%) contributes significantly to total logistics costs of straw, however for husk logistics, staff (40%) and operation and maintenance costs (46%) dominate. Baling is the major processing logistic cost for straw, contributing to 46-48% of total costs. Scale of straw unit does not have a large impact on logistic costs. Transport distance of husks has considerable impact on logistic costs. Copyright © 2014 Elsevier Ltd. All rights reserved.

  18. Accuracy and Variability of Item Parameter Estimates from Marginal Maximum a Posteriori Estimation and Bayesian Inference via Gibbs Samplers

    ERIC Educational Resources Information Center

    Wu, Yi-Fang

    2015-01-01

    Item response theory (IRT) uses a family of statistical models for estimating stable characteristics of items and examinees and defining how these characteristics interact in describing item and test performance. With a focus on the three-parameter logistic IRT (Birnbaum, 1968; Lord, 1980) model, the current study examines the accuracy and…

  19. Impact of trucking network flow on preferred biorefinery locations in the southern United States

    Treesearch

    Timothy M. Young; Lee D. Han; James H. Perdue; Stephanie R. Hargrove; Frank M. Guess; Xia Huang; Chung-Hao Chen

    2017-01-01

    The impact of the trucking transportation network flow was modeled for the southern United States. The study addresses a gap in existing research by applying a Bayesian logistic regression and Geographic Information System (GIS) geospatial analysis to predict biorefinery site locations. A one-way trucking cost assuming a 128.8 km (80-mile) haul distance was estimated...

  20. Use of Principal Components Analysis and Kriging to Predict Groundwater-Sourced Rural Drinking Water Quality in Saskatchewan

    PubMed Central

    McLeod, Lianne; Bharadwaj, Lalita; Epp, Tasha; Waldner, Cheryl L.

    2017-01-01

    Groundwater drinking water supply surveillance data were accessed to summarize water quality delivered as public and private water supplies in southern Saskatchewan as part of an exposure assessment for epidemiologic analyses of associations between water quality and type 2 diabetes or cardiovascular disease. Arsenic in drinking water has been linked to a variety of chronic diseases and previous studies have identified multiple wells with arsenic above the drinking water standard of 0.01 mg/L; therefore, arsenic concentrations were of specific interest. Principal components analysis was applied to obtain principal component (PC) scores to summarize mixtures of correlated parameters identified as health standards and those identified as aesthetic objectives in the Saskatchewan Drinking Water Quality Standards and Objective. Ordinary, universal, and empirical Bayesian kriging were used to interpolate arsenic concentrations and PC scores in southern Saskatchewan, and the results were compared. Empirical Bayesian kriging performed best across all analyses, based on having the greatest number of variables for which the root mean square error was lowest. While all of the kriging methods appeared to underestimate high values of arsenic and PC scores, empirical Bayesian kriging was chosen to summarize large scale geographic trends in groundwater-sourced drinking water quality and assess exposure to mixtures of trace metals and ions. PMID:28914824

  1. Use of Principal Components Analysis and Kriging to Predict Groundwater-Sourced Rural Drinking Water Quality in Saskatchewan.

    PubMed

    McLeod, Lianne; Bharadwaj, Lalita; Epp, Tasha; Waldner, Cheryl L

    2017-09-15

    Groundwater drinking water supply surveillance data were accessed to summarize water quality delivered as public and private water supplies in southern Saskatchewan as part of an exposure assessment for epidemiologic analyses of associations between water quality and type 2 diabetes or cardiovascular disease. Arsenic in drinking water has been linked to a variety of chronic diseases and previous studies have identified multiple wells with arsenic above the drinking water standard of 0.01 mg/L; therefore, arsenic concentrations were of specific interest. Principal components analysis was applied to obtain principal component (PC) scores to summarize mixtures of correlated parameters identified as health standards and those identified as aesthetic objectives in the Saskatchewan Drinking Water Quality Standards and Objective. Ordinary, universal, and empirical Bayesian kriging were used to interpolate arsenic concentrations and PC scores in southern Saskatchewan, and the results were compared. Empirical Bayesian kriging performed best across all analyses, based on having the greatest number of variables for which the root mean square error was lowest. While all of the kriging methods appeared to underestimate high values of arsenic and PC scores, empirical Bayesian kriging was chosen to summarize large scale geographic trends in groundwater-sourced drinking water quality and assess exposure to mixtures of trace metals and ions.

  2. Bayesian data analysis in population ecology: motivations, methods, and benefits

    USGS Publications Warehouse

    Dorazio, Robert

    2016-01-01

    During the 20th century ecologists largely relied on the frequentist system of inference for the analysis of their data. However, in the past few decades ecologists have become increasingly interested in the use of Bayesian methods of data analysis. In this article I provide guidance to ecologists who would like to decide whether Bayesian methods can be used to improve their conclusions and predictions. I begin by providing a concise summary of Bayesian methods of analysis, including a comparison of differences between Bayesian and frequentist approaches to inference when using hierarchical models. Next I provide a list of problems where Bayesian methods of analysis may arguably be preferred over frequentist methods. These problems are usually encountered in analyses based on hierarchical models of data. I describe the essentials required for applying modern methods of Bayesian computation, and I use real-world examples to illustrate these methods. I conclude by summarizing what I perceive to be the main strengths and weaknesses of using Bayesian methods to solve ecological inference problems.

  3. Reducing uncertainty in Climate Response Time Scale by Bayesian Analysis of the 8.2 ka event

    NASA Astrophysics Data System (ADS)

    Lorenz, A.; Held, H.; Bauer, E.; Schneider von Deimling, T.

    2009-04-01

    We analyze the possibility of uncertainty reduction in Climate Response Time Scale by utilizing Greenland ice-core data that contain the 8.2 ka event within a Bayesian model-data intercomparison with the Earth system model of intermediate complexity, CLIMBER-2.3. Within a stochastic version of the model it has been possible to mimic the 8.2 ka event within a plausible experimental setting and with relatively good accuracy considering the timing of the event in comparison to other modeling exercises [1]. The simulation of the centennial cold event is effectively determined by the oceanic cooling rate, which depends largely on the ocean diffusivity described by diffusion coefficients of relatively wide uncertainty ranges. The idea now is to discriminate between the different values of diffusivities according to their likelihood to rightly represent the duration of the 8.2 ka event and thus to exploit the paleo data to constrain uncertainty in model parameters in analogy to [2]. In implementing this inverse Bayesian analysis with this model, a technical difficulty arises: the related likelihood must be established numerically in addition to the uncertain model parameters. While mainstream uncertainty analyses can assume a quasi-Gaussian shape of likelihood, with weather fluctuating around a long-term mean, the 8.2 ka event as a highly nonlinear effect precludes such an a priori assumption. As a result of this study [3], the Bayesian analysis showed a reduction of uncertainty in vertical ocean diffusivity parameters by a factor of 2 compared to prior knowledge. This learning effect on the model parameters is propagated to other model outputs of interest, e.g. the inverse ocean heat capacity, which is important for the dominant time scale of climate response to anthropogenic forcing, which, in combination with climate sensitivity, strongly influences the climate system's reaction for the near- and medium-term future. References: [1] E. Bauer, A. Ganopolski, M. Montoya: Simulation of the cold climate event 8200 years ago by meltwater outburst from lake Agassiz. Paleoceanography 19:PA3014, (2004) [2] T. Schneider von Deimling, H. Held, A. Ganopolski, S. Rahmstorf, Climate sensitivity estimated from ensemble simulations of glacial climates, Climate Dynamics 27, 149-163, DOI 10.1007/s00382-006-0126-8 (2006). [3] A. Lorenz, Diploma Thesis, U Potsdam (2007).

  4. Prediction of Large Vessel Occlusions in Acute Stroke: National Institute of Health Stroke Scale Is Hard to Beat.

    PubMed

    Vanacker, Peter; Heldner, Mirjam R; Amiguet, Michael; Faouzi, Mohamed; Cras, Patrick; Ntaios, George; Arnold, Marcel; Mattle, Heinrich P; Gralla, Jan; Fischer, Urs; Michel, Patrik

    2016-06-01

    Endovascular treatment for acute ischemic stroke with a large vessel occlusion was recently shown to be effective. We aimed to develop a score capable of predicting large vessel occlusion eligible for endovascular treatment in the early hospital management. Retrospective, cohort study. Two tertiary, Swiss stroke centers. Consecutive acute ischemic stroke patients (1,645 patients; Acute STroke Registry and Analysis of Lausanne registry), who had CT angiography within 6 and 12 hours of symptom onset, were categorized according to the occlusion site. Demographic and clinical information was used in logistic regression analysis to derive predictors of large vessel occlusion (defined as intracranial carotid, basilar, and M1 segment of middle cerebral artery occlusions). Based on logistic regression coefficients, an integer score was created and validated internally and externally (848 patients; Bernese Stroke Registry). None. Large vessel occlusions were present in 316 patients (21%) in the derivation and 566 (28%) in the external validation cohort. Five predictors added significantly to the score: National Institute of Health Stroke Scale at admission, hemineglect, female sex, atrial fibrillation, and no history of stroke and prestroke handicap (modified Rankin Scale score, < 2). Diagnostic accuracy in internal and external validation cohorts was excellent (area under the receiver operating characteristic curve, 0.84 both). The score performed slightly better than National Institute of Health Stroke Scale alone regarding prediction error (Wilcoxon signed rank test, p < 0.001) and regarding discriminatory power in derivation and pooled cohorts (area under the receiver operating characteristic curve, 0.81 vs 0.80; DeLong test, p = 0.02). Our score accurately predicts the presence of emergent large vessel occlusions, which are eligible for endovascular treatment. However, incorporation of additional demographic and historical information available on hospital arrival provides minimal incremental predictive value compared with the National Institute of Health Stroke Scale alone.

  5. Partition of genetic trends by origin in Landrace and Large-White pigs.

    PubMed

    Škorput, D; Gorjanc, G; Kasap, A; Luković, Z

    2015-10-01

    The objective of this study was to analyse the effectiveness of genetic improvement via domestic selection and import for backfat thickness and time on test in a conventional pig breeding programme for Landrace (L) and Large-White (LW) breeds. Phenotype data were available for 25 553 L and 10 432 LW pigs born between 2002 and 2012 from four large-scale farms and 72 family farms. Pedigree information indicated whether each animal was born and registered within the domestic breeding programme or had been imported. This information was used for defining the genetic groups of unknown parents in a pedigree and the partitioning analysis. Breeding values were estimated using a Bayesian analysis of an animal model with and without genetic groups. Such analysis enabled full Bayesian inference of the genetic trends and their partitioning by the origin of germplasm. Estimates of genetic groups indicated that imported germplasm was overall better than domestic, and substantial changes in estimates of breeding values were observed when genetic groups were fitted. The estimated genetic trends in L were favourable and significantly different from zero by the end of the analysed period. Overall, the genetic trends in LW were not different from zero. The relative contribution of imported germplasm to genetic trends was large, especially towards the end of the analysed period, with 78% and 67% in L and from 50% to 67% in LW. The analyses suggest that domestic breeding activities and sources of imported animals need to be re-evaluated, in particular in the LW breed.

  6. Correlation Between Hierarchical Bayesian and Aerosol ...

    EPA Pesticide Factsheets

    Tools to estimate PM2.5 mass have expanded in recent years, and now include: 1) stationary monitor readings, 2) Community Multi-Scale Air Quality (CMAQ) model estimates, 3) Hierarchical Bayesian (HB) estimates from combined stationary monitor readings and CMAQ model output; and, 4) calibrated Aerosol Optical Depth (AOD) readings from two Moderate Resolution Imaging Spectroradiometer (MODIS) units on National Aeronautics and Space Administration’s (NASA) Terra and Aqua satellites. Case-crossover design and conditional logistic regression were used to determine concentration response (CR) functions for three different PM2.5 levels on asthma emergency department (ED) visits and acute myocardial infarction (MI) inpatient hospitalizations in ninety-nine, 12 km2 grids in Baltimore, MD (2005 data). HB analyses for asthma ED visits produced significant results at 3-day lags for the main effect (OR=1.002, 95% CI=1.000-1.005), and two effect modifiers for females (OR=1.003, 95% CI=1.000-1.006), and non-Caucasian/non-African American persons (OR=1.010, 95% CI=1.001-1.019). HB analyses for acute MI inpatient hospitalizations also consistently produced a significant outcome for persons of other race (OR=1.031, 95% CI=1.006-1.056). Correlation coefficients computed between stationary monitor and satellite AOD PM2.5 values were significant for both asthma (rxy=0.944) and acute MI (rxy=0.940). Both monitor and AOD PM2.5 values were higher in February and June through Aug

  7. Ancient mitochondrial DNA provides high-resolution time scale of the peopling of the Americas

    PubMed Central

    Llamas, Bastien; Fehren-Schmitz, Lars; Valverde, Guido; Soubrier, Julien; Mallick, Swapan; Rohland, Nadin; Nordenfelt, Susanne; Valdiosera, Cristina; Richards, Stephen M.; Rohrlach, Adam; Romero, Maria Inés Barreto; Espinoza, Isabel Flores; Cagigao, Elsa Tomasto; Jiménez, Lucía Watson; Makowski, Krzysztof; Reyna, Ilán Santiago Leboreiro; Lory, Josefina Mansilla; Torrez, Julio Alejandro Ballivián; Rivera, Mario A.; Burger, Richard L.; Ceruti, Maria Constanza; Reinhard, Johan; Wells, R. Spencer; Politis, Gustavo; Santoro, Calogero M.; Standen, Vivien G.; Smith, Colin; Reich, David; Ho, Simon Y. W.; Cooper, Alan; Haak, Wolfgang

    2016-01-01

    The exact timing, route, and process of the initial peopling of the Americas remains uncertain despite much research. Archaeological evidence indicates the presence of humans as far as southern Chile by 14.6 thousand years ago (ka), shortly after the Pleistocene ice sheets blocking access from eastern Beringia began to retreat. Genetic estimates of the timing and route of entry have been constrained by the lack of suitable calibration points and low genetic diversity of Native Americans. We sequenced 92 whole mitochondrial genomes from pre-Columbian South American skeletons dating from 8.6 to 0.5 ka, allowing a detailed, temporally calibrated reconstruction of the peopling of the Americas in a Bayesian coalescent analysis. The data suggest that a small population entered the Americas via a coastal route around 16.0 ka, following previous isolation in eastern Beringia for ~2.4 to 9 thousand years after separation from eastern Siberian populations. Following a rapid movement throughout the Americas, limited gene flow in South America resulted in a marked phylogeographic structure of populations, which persisted through time. All of the ancient mitochondrial lineages detected in this study were absent from modern data sets, suggesting a high extinction rate. To investigate this further, we applied a novel principal components multiple logistic regression test to Bayesian serial coalescent simulations. The analysis supported a scenario in which European colonization caused a substantial loss of pre-Columbian lineages. PMID:27051878

  8. Isotropy of low redshift type Ia supernovae: A Bayesian analysis

    NASA Astrophysics Data System (ADS)

    Andrade, U.; Bengaly, C. A. P.; Alcaniz, J. S.; Santos, B.

    2018-04-01

    The standard cosmology strongly relies upon the cosmological principle, which consists of the hypotheses of large-scale isotropy and homogeneity of the Universe. Testing these assumptions is, therefore, crucial to determining if there are deviations from the standard cosmological paradigm. In this paper, we use the latest type Ia supernova compilations, namely JLA and Union2.1, to test the cosmological isotropy at low redshift ranges (z < 0.1). This is performed through a Bayesian selection analysis, in which we compare the standard, isotropic model with another one including a dipole correction due to peculiar velocities. The full covariance matrix of SN distance uncertainties is taken into account. We find that the JLA sample favors the standard model, whilst the Union2.1 results are inconclusive, yet the constraints from both compilations are in agreement with previous analyses. We conclude that there is no evidence for a dipole anisotropy from nearby supernova compilations, albeit this test should be greatly improved with the much-improved data sets from upcoming cosmological surveys.

  9. Spatio-temporal Bayesian model selection for disease mapping

    PubMed Central

    Carroll, R; Lawson, AB; Faes, C; Kirby, RS; Aregay, M; Watjou, K

    2016-01-01

    Spatio-temporal analysis of small area health data often involves choosing a fixed set of predictors prior to the final model fit. In this paper, we propose a spatio-temporal approach of Bayesian model selection to implement model selection for certain areas of the study region as well as certain years in the study time line. Here, we examine the usefulness of this approach by way of a large-scale simulation study accompanied by a case study. Our results suggest that a special case of the model selection methods, a mixture model allowing a weight parameter to indicate if the appropriate linear predictor is spatial, spatio-temporal, or a mixture of the two, offers the best option to fitting these spatio-temporal models. In addition, the case study illustrates the effectiveness of this mixture model within the model selection setting by easily accommodating lifestyle, socio-economic, and physical environmental variables to select a predominantly spatio-temporal linear predictor. PMID:28070156

  10. Correlation Between Hierarchical Bayesian and Aerosol Optical Depth PM2.5 Data and Respiratory-Cardiovascular Chronic Diseases

    EPA Science Inventory

    Tools to estimate PM2.5 mass have expanded in recent years, and now include: 1) stationary monitor readings, 2) Community Multi-Scale Air Quality (CMAQ) model estimates, 3) Hierarchical Bayesian (HB) estimates from combined stationary monitor readings and CMAQ model output; and, ...

  11. Coupled Land-Atmosphere Dynamics Govern Long Duration Floods: A Pilot Study in Missouri River Basin Using a Bayesian Hierarchical Model

    NASA Astrophysics Data System (ADS)

    Najibi, N.; Lu, M.; Devineni, N.

    2017-12-01

    Long duration floods cause substantial damages and prolonged interruptions to water resource facilities and critical infrastructure. We present a novel generalized statistically and physically based model for flood duration with a deeper understanding of the dynamically coupled nexus of the land surface wetness, effective atmospheric circulation and moisture transport/release. We applied the model to large reservoirs in the Missouri River Basin. The results indicate that the flood duration is not only a function of available moisture in the air, but also of the antecedent condition of the blocking system of atmospheric pressure, resulting in enhanced moisture convergence, as well as the effectiveness of the moisture condensation process leading to release. Quantifying these dynamics with a two-layer climate-informed Bayesian multilevel model, we explain more than 80% of the variation in flood duration. The model considers the complex interaction between moisture transport, synoptic-to-large-scale atmospheric circulation pattern, and the antecedent wetness condition in the basin. Our findings suggest that synergy between a large low-pressure blocking system and a higher rate of divergent wind often triggers a long duration flood, and the prerequisite for moisture supply to trigger such an event is moderate, which is more associated with magnitude than duration. In turn, this condition causes an extremely long duration flood if the surface wetness rate advancing to the flood event was already increased.

  12. Assessing Agreement between Multiple Raters with Missing Rating Information, Applied to Breast Cancer Tumour Grading

    PubMed Central

    Ellis, Ian O.; Green, Andrew R.; Hanka, Rudolf

    2008-01-01

    Background: We consider the problem of assessing inter-rater agreement when there are missing data and a large number of raters. Previous studies have shown only ‘moderate’ agreement between pathologists in grading breast cancer tumour specimens. We analyse a large but incomplete data-set consisting of 24177 grades, on a discrete 1–3 scale, provided by 732 pathologists for 52 samples. Methodology/Principal Findings: We review existing methods for analysing inter-rater agreement for multiple raters and demonstrate two further methods. Firstly, we examine a simple non-chance-corrected agreement score based on the observed proportion of agreements with the consensus for each sample, which makes no allowance for missing data. Secondly, treating grades as lying on a continuous scale representing tumour severity, we use a Bayesian latent trait method to model cumulative probabilities of assigning grade values as functions of the severity and clarity of the tumour and of rater-specific parameters representing boundaries between grades 1–2 and 2–3. We simulate from the fitted model to estimate, for each rater, the probability of agreement with the majority. Both methods suggest that there are differences between raters in terms of rating behaviour, most often caused by consistent over- or under-estimation of the grade boundaries, and also considerable variability in the distribution of grades assigned to many individual samples. The Bayesian model addresses the tendency of the agreement score to be biased upwards for raters who, by chance, see a relatively ‘easy’ set of samples. Conclusions/Significance: Latent trait models can be adapted to provide novel information about the nature of inter-rater agreement when the number of raters is large and there are missing data. In this large study there is substantial variability between pathologists and uncertainty in the identity of the ‘true’ grade of many of the breast cancer tumours, a fact often ignored in clinical studies. PMID:18698346

  13. Combined effects of smoking and HPV16 in oropharyngeal cancer

    PubMed Central

    Anantharaman, Devasena; Muller, David C; Lagiou, Pagona; Ahrens, Wolfgang; Holcátová, Ivana; Merletti, Franco; Kjærheim, Kristina; Polesel, Jerry; Simonato, Lorenzo; Canova, Cristina; Castellsague, Xavier; Macfarlane, Tatiana V; Znaor, Ariana; Thomson, Peter; Robinson, Max; Conway, David I; Healy, Claire M; Tjønneland, Anne; Westin, Ulla; Ekström, Johanna; Chang-Claude, Jenny; Kaaks, Rudolf; Overvad, Kim; Drogan, Dagmar; Hallmans, Göran; Laurell, Göran; Bueno-de-Mesquita, HB; Peeters, Petra H; Agudo, Antonio; Larrañaga, Nerea; Travis, Ruth C; Palli, Domenico; Barricarte, Aurelio; Trichopoulou, Antonia; George, Saitakis; Trichopoulos, Dimitrios; Quirós, J Ramón; Grioni, Sara; Sacerdote, Carlotta; Navarro, Carmen; Sánchez, María-José; Tumino, Rosario; Severi, Gianluca; Boutron-Ruault, Marie-Christine; Clavel-Chapelon, Francoise; Panico, Salvatore; Weiderpass, Elisabete; Lund, Eiliv; Gram, Inger T; Riboli, Elio; Pawlita, Michael; Waterboer, Tim; Kreimer, Aimée R; Johansson, Mattias; Brennan, Paul

    2016-01-01

    Background: Although smoking and HPV infection are recognized as important risk factors for oropharyngeal cancer, how their joint exposure impacts on oropharyngeal cancer risk is unclear. Specifically, whether smoking confers any additional risk to HPV-positive oropharyngeal cancer is not understood. Methods: Using HPV serology as a marker of HPV-related cancer, we examined the interaction between smoking and HPV16 in 459 oropharyngeal (and 1445 oral cavity and laryngeal) cancer patients and 3024 control participants from two large European multi-centre studies. Odds ratios and credible intervals [CrI], adjusted for potential confounders, were estimated using Bayesian logistic regression. Results: Both smoking (odds ratio (OR) [CrI]: 6.82 [4.52, 10.29]) and HPV seropositivity (OR [CrI]: 235.69 [99.95, 555.74]) were independently associated with oropharyngeal cancer. The joint association of smoking and HPV seropositivity was consistent with that expected on the additive scale (synergy index [CrI]: 1.32 [0.51, 3.45]), suggesting they act as independent risk factors for oropharyngeal cancer. Conclusions: Smoking was consistently associated with an increase in oropharyngeal cancer risk in models stratified by HPV16 seropositivity. In addition, we report that the prevalence of oropharyngeal cancer increases with smoking for both HPV16-positive and HPV16-negative persons. The impact of smoking on HPV16-positive oropharyngeal cancer highlights the continued need for smoking cessation programmes for primary prevention of head and neck cancer. PMID:27197530

  14. Bayesian screening for active compounds in high-dimensional chemical spaces combining property descriptors and molecular fingerprints.

    PubMed

    Vogt, Martin; Bajorath, Jürgen

    2008-01-01

    Bayesian classifiers are increasingly being used to distinguish active from inactive compounds and search large databases for novel active molecules. We introduce an approach to directly combine the contributions of property descriptors and molecular fingerprints in the search for active compounds that is based on a Bayesian framework. Conventionally, property descriptors and fingerprints are used as alternative features for virtual screening methods. Following the approach introduced here, probability distributions of descriptor values and fingerprint bit settings are calculated for active and database molecules and the divergence between the resulting combined distributions is determined as a measure of biological activity. In test calculations on a large number of compound activity classes, this methodology was found to consistently perform better than similarity searching using fingerprints and multiple reference compounds or Bayesian screening calculations using probability distributions calculated only from property descriptors. These findings demonstrate that there is considerable synergy between different types of property descriptors and fingerprints in recognizing diverse structure-activity relationships, at least in the context of Bayesian modeling.

  15. Co-endemicity of Pulmonary Tuberculosis and Intestinal Helminth Infection in the People’s Republic of China

    PubMed Central

    Li, Xin-Xu; Ren, Zhou-Peng; Wang, Li-Xia; Zhang, Hui; Jiang, Shi-Wen; Chen, Jia-Xu; Wang, Jin-Feng; Zhou, Xiao-Nong

    2016-01-01

    Both pulmonary tuberculosis (PTB) and intestinal helminth infection (IHI) affect millions of individuals every year in China. However, the national-scale estimation of prevalence predictors and prevalence maps for these diseases, as well as co-endemic relative risk (RR) maps of both diseases’ prevalence, are not well developed. There are co-endemic, high prevalence areas of both diseases, whose delimitation is essential for devising effective control strategies. Bayesian geostatistical logistic regression models including socio-economic, climatic, geographical and environmental predictors were fitted separately for active PTB and IHI based on data from the national surveys for PTB and major human parasitic diseases that were completed in 2010 and 2004, respectively. Prevalence maps and co-endemic RR maps were constructed for both diseases by means of a Bayesian kriging model and a Bayesian shared component model capable of appraising the fraction of variance of spatial RRs shared by both diseases, and those specific for each one, under an assumption that there are unobserved covariates common to both diseases. Our results indicate that gross domestic product (GDP) per capita had a negative association, while rural regions, the arid and polar zones and elevation had a positive association with active PTB prevalence; for the IHI prevalence, GDP per capita and distance to water bodies had a negative association, while the equatorial and warm zones and the normalized difference vegetation index had a positive association. Moderate to high prevalence of active PTB and low prevalence of IHI were predicted in western regions, low to moderate prevalence of active PTB and low prevalence of IHI were predicted in north-central regions and the southeast coastal regions, and moderate to high prevalence of active PTB and high prevalence of IHI were predicted in the south-western regions. Thus, co-endemic areas of active PTB and IHI were located in the south-western regions of China, which might be determined by socio-economic factors, such as GDP per capita. PMID:27088504

  16. Bayesian modeling of flexible cognitive control

    PubMed Central

    Jiang, Jiefeng; Heller, Katherine; Egner, Tobias

    2014-01-01

    “Cognitive control” describes endogenous guidance of behavior in situations where routine stimulus-response associations are suboptimal for achieving a desired goal. The computational and neural mechanisms underlying this capacity remain poorly understood. We examine recent advances stemming from the application of a Bayesian learner perspective that provides optimal prediction for control processes. In reviewing the application of Bayesian models to cognitive control, we note that an important limitation in current models is a lack of a plausible mechanism for the flexible adjustment of control over conflict levels changing at varying temporal scales. We then show that flexible cognitive control can be achieved by a Bayesian model with a volatility-driven learning mechanism that modulates dynamically the relative dependence on recent and remote experiences in its prediction of future control demand. We conclude that the emergent Bayesian perspective on computational mechanisms of cognitive control holds considerable promise, especially if future studies can identify neural substrates of the variables encoded by these models, and determine the nature (Bayesian or otherwise) of their neural implementation. PMID:24929218

  17. Learning and Risk Exposure in a Changing Climate

    NASA Astrophysics Data System (ADS)

    Moore, F.

    2015-12-01

    Climate change is a gradual process most apparent over long time-scales and large spatial scales, but it is experienced by those affected as changes in local weather. Climate change will gradually push the weather people experience outside the bounds of historic norms, resulting in unprecedented and extreme weather events. However, people do have the ability to learn about and respond to a changing climate. Therefore, connecting the weather people experience with their perceptions of climate change requires understanding how people infer the current state of the climate given their observations of weather. This learning process constitutes a first-order constraint on the rate of adaptation and is an important determinant of the dynamic adjustment costs associated with climate change. In this paper I explore two learning models that describe how local weather observations are translated into perceptions of climate change: an efficient Bayesian learning model and a simpler rolling-mean heuristic. Both have a period during which the learner's beliefs about the state of the climate are different from its true state, meaning the learner is exposed to a different range of extreme weather outcomes than they are prepared for. Using the example of surface temperature trends, I quantify this additional exposure to extreme heat events under both learning models and both RCP 8.5 and 2.6. Risk exposure increases for both learning models, but by substantially more for the rolling-mean learner. Moreover, there is an interaction between the learning model and the rate of climate change: the inefficient rolling-mean learner benefits much more from the slower rates of change under RCP 2.6 than the Bayesian. Finally, I present results from an experiment that suggests people are able to learn about a trending climate in a manner consistent with the Bayesian model.

  18. Large-angle correlations in the cosmic microwave background

    NASA Astrophysics Data System (ADS)

    Efstathiou, George; Ma, Yin-Zhe; Hanson, Duncan

    2010-10-01

    It has been argued recently by Copi et al. 2009 that the lack of large angular correlations of the CMB temperature field provides strong evidence against the standard, statistically isotropic, inflationary Lambda cold dark matter (ΛCDM) cosmology. We compare various estimators of the temperature correlation function showing how they depend on assumptions of statistical isotropy and how they perform on the Wilkinson Microwave Anisotropy Probe (WMAP) 5-yr Internal Linear Combination (ILC) maps with and without a sky cut. We show that the low multipole harmonics that determine the large-scale features of the temperature correlation function can be reconstructed accurately from the data that lie outside the sky cuts. The reconstructions are only weakly dependent on the assumed statistical properties of the temperature field. The temperature correlation functions computed from these reconstructions are in good agreement with those computed from the ILC map over the whole sky. We conclude that the large-scale angular correlation function for our realization of the sky is well determined. A Bayesian analysis of the large-scale correlations is presented, which shows that the data cannot exclude the standard ΛCDM model. We discuss the differences between our results and those of Copi et al. Either there exists a violation of statistical isotropy as claimed by Copi et al., or these authors have overestimated the significance of the discrepancy because of a posteriori choices of estimator, statistic and sky cut.

  19. Performance/price estimates for cortex-scale hardware: a design space exploration.

    PubMed

    Zaveri, Mazad S; Hammerstrom, Dan

    2011-04-01

    In this paper, we revisit the concept of virtualization. Virtualization is useful for understanding and investigating the performance/price and other trade-offs related to the hardware design space. Moreover, it is perhaps the most important aspect of a hardware design space exploration. Such a design space exploration is a necessary part of the study of hardware architectures for large-scale computational models for intelligent computing, including AI, Bayesian, bio-inspired and neural models. A methodical exploration is needed to identify potentially interesting regions in the design space, and to assess the relative performance/price points of these implementations. As an example, in this paper we investigate the performance/price of (digital and mixed-signal) CMOS and hypothetical CMOL (nanogrid) technology based hardware implementations of human cortex-scale spiking neural systems. Through this analysis, and the resulting performance/price points, we demonstrate, in general, the importance of virtualization, and of doing these kinds of design space explorations. The specific results suggest that hybrid nanotechnology such as CMOL is a promising candidate to implement very large-scale spiking neural systems, providing a more efficient utilization of the density and storage benefits of emerging nano-scale technologies. In general, we believe that the study of such hypothetical designs/architectures will guide the neuromorphic hardware community towards building large-scale systems, and help guide research trends in intelligent computing, and computer engineering. Copyright © 2010 Elsevier Ltd. All rights reserved.

  20. A generative model of whole-brain effective connectivity.

    PubMed

    Frässle, Stefan; Lomakina, Ekaterina I; Kasper, Lars; Manjaly, Zina M; Leff, Alex; Pruessmann, Klaas P; Buhmann, Joachim M; Stephan, Klaas E

    2018-05-25

    The development of whole-brain models that can infer effective (directed) connection strengths from fMRI data represents a central challenge for computational neuroimaging. A recently introduced generative model of fMRI data, regression dynamic causal modeling (rDCM), moves towards this goal as it scales gracefully to very large networks. However, large-scale networks with thousands of connections are difficult to interpret; additionally, one typically lacks information (data points per free parameter) for precise estimation of all model parameters. This paper introduces sparsity constraints to the variational Bayesian framework of rDCM as a solution to these problems in the domain of task-based fMRI. This sparse rDCM approach enables highly efficient effective connectivity analyses in whole-brain networks and does not require a priori assumptions about the network's connectivity structure but prunes fully (all-to-all) connected networks as part of model inversion. Following the derivation of the variational Bayesian update equations for sparse rDCM, we use both simulated and empirical data to assess the face validity of the model. In particular, we show that it is feasible to infer effective connection strengths from fMRI data using a network with more than 100 regions and 10,000 connections. This demonstrates the feasibility of whole-brain inference on effective connectivity from fMRI data - in single subjects and with a run-time below 1 min when using parallelized code. We anticipate that sparse rDCM may find useful application in connectomics and clinical neuromodeling - for example, for phenotyping individual patients in terms of whole-brain network structure. Copyright © 2018. Published by Elsevier Inc.

  1. Bayesian estimation and use of high-throughput remote sensing indices for quantitative genetic analyses of leaf growth.

    PubMed

    Baker, Robert L; Leong, Wen Fung; An, Nan; Brock, Marcus T; Rubin, Matthew J; Welch, Stephen; Weinig, Cynthia

    2018-02-01

    We develop Bayesian function-valued trait models that mathematically isolate genetic mechanisms underlying leaf growth trajectories by factoring out genotype-specific differences in photosynthesis. Remote sensing data can be used instead of leaf-level physiological measurements. Characterizing the genetic basis of traits that vary during ontogeny and affect plant performance is a major goal in evolutionary biology and agronomy. Describing genetic programs that specifically regulate morphological traits can be complicated by genotypic differences in physiological traits. We describe the growth trajectories of leaves using novel Bayesian function-valued trait (FVT) modeling approaches in Brassica rapa recombinant inbred lines raised in heterogeneous field settings. While frequentist approaches estimate parameter values by treating each experimental replicate discretely, Bayesian models can utilize information in the global dataset, potentially leading to more robust trait estimation. We illustrate this principle by estimating growth asymptotes in the face of missing data and comparing heritabilities of growth trajectory parameters estimated by Bayesian and frequentist approaches. Using pseudo-Bayes factors, we compare the performance of an initial Bayesian logistic growth model and a model that incorporates carbon assimilation (Amax) as a cofactor, thus statistically accounting for genotypic differences in carbon resources. We further evaluate two remotely sensed spectroradiometric indices, photochemical reflectance (pri2) and MERIS Terrestrial Chlorophyll Index (mtci) as covariates in lieu of Amax, because these two indices were genetically correlated with Amax across years and treatments yet allow much higher throughput compared to direct leaf-level gas-exchange measurements. For leaf lengths in uncrowded settings, including Amax improves model fit over the initial model. The mtci and pri2 indices also outperform direct Amax measurements. Of particular importance for evolutionary biologists and plant breeders, hierarchical Bayesian models estimating FVT parameters improve heritabilities compared to frequentist approaches.

  2. Gene selection in cancer classification using sparse logistic regression with Bayesian regularization.

    PubMed

    Cawley, Gavin C; Talbot, Nicola L C

    2006-10-01

    Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar; however, the BLogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense. A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/
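
    The step of integrating out the regularization parameter can be sketched as follows. The abstract gives no formulas, so the notation here is assumed: N weights w_i, a regularization parameter α, the penalty E_W, and a data misfit term E_D such as the negative log-likelihood. This is the standard calculation for a Laplace prior combined with a prior proportional to 1/α, offered as an illustration, not as a reproduction of the paper's derivation.

```latex
% Laplace prior over N weights, with regularization parameter \alpha
p(\mathbf{w} \mid \alpha) = \left(\frac{\alpha}{2}\right)^{N} \exp\{-\alpha E_W\},
\qquad E_W = \sum_{i=1}^{N} |w_i|

% Uninformative (Jeffreys) prior for the scale-type parameter \alpha
p(\alpha) \propto \frac{1}{\alpha}

% Marginal prior after integrating \alpha out (a Gamma integral)
p(\mathbf{w}) = \int_0^\infty p(\mathbf{w} \mid \alpha)\, p(\alpha)\, d\alpha
\propto \frac{1}{2^{N}} \int_0^\infty \alpha^{N-1} e^{-\alpha E_W}\, d\alpha
= \frac{\Gamma(N)}{2^{N} E_W^{N}}

% Resulting penalty: no free regularization parameter remains
-\log p(\mathbf{w}) = N \log E_W + \text{const}
```

    Under these assumptions the training criterion becomes E_D + N log E_W, which has nothing left to tune by cross-validation. This is consistent with the abstract's point that the model selection stage disappears.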

  3. Biomass logistics analysis for large scale biofuel production: case study of loblolly pine and switchgrass.

    PubMed

    Lu, Xiaoming; Withers, Mitch R; Seifkar, Navid; Field, Randall P; Barrett, Steven R H; Herzog, Howard J

    2015-05-01

    The objective of this study was to assess the costs, energy consumption and greenhouse gas (GHG) emissions throughout the biomass supply chain for large scale biofuel production. Two types of energy crop were considered, switchgrass and loblolly pine, as representative of herbaceous and woody biomass. A biomass logistics model has been developed to estimate the feedstock supply system from biomass production through transportation. Biomass in the form of woodchip, bale and pellet was investigated with road, railway and waterway transportation options. Our analysis indicated that the farm or forest gate cost is lowest for loblolly pine whole tree woodchip at $39.7/dry tonne and highest for switchgrass round bale at $72.3/dry tonne. Switchgrass farm gate GHG emissions is approximately 146kgCO2e/dry tonne, about 4 times higher than loblolly pine. The optimum biomass transportation mode and delivered form are determined by the tradeoff between fixed and variable costs for feedstock shipment. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Bayesian analysis of factors associated with fibromyalgia syndrome subjects

    NASA Astrophysics Data System (ADS)

    Jayawardana, Veroni; Mondal, Sumona; Russek, Leslie

    2015-01-01

    Factors contributing to movement-related fear were assessed by Russek et al. (2014) for subjects with Fibromyalgia (FM), based on data collected through a national internet survey of community-based individuals. The study focused on the variables, Activities-Specific Balance Confidence scale (ABC), Primary Care Post-Traumatic Stress Disorder screen (PC-PTSD), Tampa Scale of Kinesiophobia (TSK), a Joint Hypermobility Syndrome screen (JHS), Vertigo Symptom Scale (VSS-SF), Obsessive-Compulsive Personality Disorder (OCPD), Pain, work status and physical activity dependent from the "Revised Fibromyalgia Impact Questionnaire" (FIQR). The study presented in this paper revisits the same data with a Bayesian analysis in which appropriate priors were introduced for the variables selected in Russek's paper.

  5. Boosting Bayesian parameter inference of nonlinear stochastic differential equation models by Hamiltonian scale separation.

    PubMed

    Albert, Carlo; Ulzega, Simone; Stoop, Ruedi

    2016-04-01

    Parameter inference is a fundamental problem in data-driven modeling. Given observed data that is believed to be a realization of some parameterized model, the aim is to find parameter values that are able to explain the observed data. In many situations, the dominant sources of uncertainty must be included into the model for making reliable predictions. This naturally leads to stochastic models. Stochastic models render parameter inference much harder, as the aim then is to find a distribution of likely parameter values. In Bayesian statistics, which is a consistent framework for data-driven learning, this so-called posterior distribution can be used to make probabilistic predictions. We propose a novel, exact, and very efficient approach for generating posterior parameter distributions for stochastic differential equation models calibrated to measured time series. The algorithm is inspired by reinterpreting the posterior distribution as a statistical mechanics partition function of an object akin to a polymer, where the measurements are mapped on heavier beads compared to those of the simulated data. To arrive at distribution samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple time-scale integration. A separation of time scales naturally arises if either the number of measurement points or the number of simulation points becomes large. Furthermore, at least for one-dimensional problems, we can decouple the harmonic modes between measurement points and solve the fastest part of their dynamics analytically. Our approach is applicable to a wide range of inference problems and is highly parallelizable.

  6. Large-Scale Disasters

    NASA Astrophysics Data System (ADS)

    Gad-El-Hak, Mohamed

    "Extreme" events - including climatic events, such as hurricanes, tornadoes, and drought - can cause massive disruption to society, including large death tolls and property damage in the billions of dollars. Events in recent years have shown the importance of being prepared and that countries need to work together to help alleviate the resulting pain and suffering. This volume presents a review of the broad research field of large-scale disasters. It establishes a common framework for predicting, controlling and managing both manmade and natural disasters. There is a particular focus on events caused by weather and climate change. Other topics include air pollution, tsunamis, disaster modeling, the use of remote sensing and the logistics of disaster management. It will appeal to scientists, engineers, first responders and health-care professionals, in addition to graduate students and researchers who have an interest in the prediction, prevention or mitigation of large-scale disasters.

  7. Landscape object-based analysis of wetland plant functional types: the effects of spatial scale, vegetation classes and classifier methods

    NASA Astrophysics Data System (ADS)

    Dronova, I.; Gong, P.; Wang, L.; Clinton, N.; Fu, W.; Qi, S.

    2011-12-01

    Remote sensing-based vegetation classifications representing plant function such as photosynthesis and productivity are challenging in wetlands with complex cover and difficult field access. Recent advances in object-based image analysis (OBIA) and machine-learning algorithms offer new classification tools; however, few comparisons of different algorithms and spatial scales have been discussed to date. We applied OBIA to delineate wetland plant functional types (PFTs) for Poyang Lake, the largest freshwater lake in China and a Ramsar wetland conservation site, from a 30-m Landsat TM scene at the peak of the spring growing season. We targeted major PFTs (C3 grasses, C3 forbs and different types of C4 grasses and aquatic vegetation) that are both key players in the system's biogeochemical cycles and critical providers of waterbird habitat. Classification results were compared among: a) several object segmentation scales (with average object sizes 900-9000 m²); b) several families of statistical classifiers (including Bayesian, Logistic, Neural Network, Decision Trees and Support Vector Machines) and c) two hierarchical levels of vegetation classification, a generalized 3-class set and a more detailed 6-class set. We found that classification benefited from the object-based approach, which allowed including object shape, texture and context descriptors in classification. While a number of classifiers achieved high accuracy at the finest pixel-equivalent segmentation scale, the highest accuracies and best agreement among algorithms occurred at coarser object scales. No single classifier was consistently superior across all scales, although selected algorithms of Neural Network, Logistic and K-Nearest Neighbors families frequently provided the best discrimination of classes at different scales. The choice of vegetation categories also affected classification accuracy. The 6-class set allowed for higher individual class accuracies but lower overall accuracies than the 3-class set because individual classes differed in scales at which they were best discriminated from others. Main classification challenges included a) presence of C3 grasses in C4-grass areas, particularly following harvesting of C4 reeds and b) mixtures of emergent, floating and submerged aquatic plants at sub-object and sub-pixel scales. We conclude that OBIA with advanced statistical classifiers offers useful instruments for landscape vegetation analyses, and that spatial scale considerations are critical in mapping PFTs, while multi-scale comparisons can be used to guide class selection. Future work will further apply fuzzy classification and field-collected spectral data for PFT analysis and compare results with MODIS PFT products.

  8. Driver injury severity outcome analysis in rural interstate highway crashes: a two-level Bayesian logistic regression interpretation.

    PubMed

    Chen, Cong; Zhang, Guohui; Liu, Xiaoyue Cathy; Ci, Yusheng; Huang, Helai; Ma, Jianming; Chen, Yanyan; Guan, Hongzhi

    2016-12-01

    There is a high potential of severe injury outcomes in traffic crashes on rural interstate highways due to the significant amount of high speed traffic on these corridors. Hierarchical Bayesian models are capable of incorporating between-crash variance and within-crash correlations into traffic crash data analysis and are increasingly utilized in traffic crash severity analysis. This paper applies a hierarchical Bayesian logistic model to examine the significant factors at crash and vehicle/driver levels and their heterogeneous impacts on driver injury severity in rural interstate highway crashes. Analysis results indicate that the majority of the total variance is induced by the between-crash variance, showing the appropriateness of the utilized hierarchical modeling approach. Three crash-level variables and six vehicle/driver-level variables are found significant in predicting driver injury severities: road curve, maximum vehicle damage in a crash, number of vehicles in a crash, wet road surface, vehicle type, driver age, driver gender, driver seatbelt use and driver alcohol or drug involvement. Among these variables, road curve, functional and disabled vehicle damage in crash, single-vehicle crashes, female drivers, senior drivers, motorcycles and driver alcohol or drug involvement tend to increase the odds of drivers being incapably injured or killed in rural interstate crashes, while wet road surface, male drivers and driver seatbelt use are more likely to decrease the probability of severe driver injuries. The developed methodology and estimation results provide insightful understanding of the internal mechanism of rural interstate crashes and beneficial references for developing effective countermeasures for rural interstate crash prevention. Copyright © 2016 Elsevier Ltd. All rights reserved.
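
    The share of total variance attributed to the between-crash level is commonly summarized by an intraclass correlation. The abstract does not state which formula the authors used; the sketch below assumes the widely used latent-variable definition for a two-level logistic model, in which the level-1 residual variance is fixed at π²/3. The function name and the example variance are hypothetical.

```python
import math


def latent_icc(between_group_variance):
    """Latent-variable intraclass correlation for a two-level logistic model.

    Assumes a random-intercept model on the logit scale, so the level-1
    residual follows a standard logistic distribution with variance pi^2 / 3.
    """
    if between_group_variance < 0:
        raise ValueError("variance must be non-negative")
    residual_variance = math.pi**2 / 3
    return between_group_variance / (between_group_variance + residual_variance)


if __name__ == "__main__":
    # Hypothetical between-crash variance of 4.0 on the logit scale.
    print(round(latent_icc(4.0), 3))
```

    Under this definition, a value above 0.5 means that more than half of the latent variance lies between crashes. That is the situation the abstract describes as supporting a hierarchical model.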

  9. Discriminative Hierarchical K-Means Tree for Large-Scale Image Classification.

    PubMed

    Chen, Shizhi; Yang, Xiaodong; Tian, Yingli

    2015-09-01

    A key challenge in large-scale image classification is how to achieve efficiency in terms of both computation and memory without compromising classification accuracy. The learning-based classifiers achieve the state-of-the-art accuracies, but have been criticized for the computational complexity that grows linearly with the number of classes. The nonparametric nearest neighbor (NN)-based classifiers naturally handle large numbers of categories, but incur prohibitively expensive computation and memory costs. In this brief, we present a novel classification scheme, i.e., discriminative hierarchical K-means tree (D-HKTree), which combines the advantages of both learning-based and NN-based classifiers. The complexity of the D-HKTree only grows sublinearly with the number of categories, which is much better than the recent hierarchical support vector machines-based methods. The memory requirement is the order of magnitude less than the recent Naïve Bayesian NN-based approaches. The proposed D-HKTree classification scheme is evaluated on several challenging benchmark databases and achieves the state-of-the-art accuracies, while with significantly lower computation cost and memory requirement.

  10. An introduction to Bayesian statistics in health psychology.

    PubMed

    Depaoli, Sarah; Rus, Holly M; Clifton, James P; van de Schoot, Rens; Tiemensma, Jitske

    2017-09-01

    The aim of the current article is to provide a brief introduction to Bayesian statistics within the field of health psychology. Bayesian methods are increasing in prevalence in applied fields, and they have been shown in simulation research to improve the estimation accuracy of structural equation models, latent growth curve (and mixture) models, and hierarchical linear models. Likewise, Bayesian methods can be used with small sample sizes since they do not rely on large sample theory. In this article, we discuss several important components of Bayesian statistics as they relate to health-based inquiries. We discuss the incorporation and impact of prior knowledge into the estimation process and the different components of the analysis that should be reported in an article. We present an example implementing Bayesian estimation in the context of blood pressure changes after participants experienced an acute stressor. We conclude with final thoughts on the implementation of Bayesian statistics in health psychology, including suggestions for reviewing Bayesian manuscripts and grant proposals. We have also included an extensive amount of online supplementary material to complement the content presented here, including Bayesian examples using many different software programmes and an extensive sensitivity analysis examining the impact of priors.

  11. Construction of monitoring model and algorithm design on passenger security during shipping based on improved Bayesian network.

    PubMed

    Wang, Jiali; Zhang, Qingnian; Ji, Wenfeng

    2014-01-01

    A large number of data is needed by the computation of the objective Bayesian network, but the data is hard to get in actual computation. The calculation method of Bayesian network was improved in this paper, and the fuzzy-precise Bayesian network was obtained. Then, the fuzzy-precise Bayesian network was used to reason Bayesian network model when the data is limited. The security of passengers during shipping is affected by various factors, and it is hard to predict and control. The index system that has the impact on the passenger safety during shipping was established on basis of the multifield coupling theory in this paper. Meanwhile, the fuzzy-precise Bayesian network was applied to monitor the security of passengers in the shipping process. The model was applied to monitor the passenger safety during shipping of a shipping company in Hainan, and the effectiveness of this model was examined. This research work provides guidance for guaranteeing security of passengers during shipping.

  12. Construction of Monitoring Model and Algorithm Design on Passenger Security during Shipping Based on Improved Bayesian Network

    PubMed Central

    Wang, Jiali; Zhang, Qingnian; Ji, Wenfeng

    2014-01-01

    A large number of data is needed by the computation of the objective Bayesian network, but the data is hard to get in actual computation. The calculation method of Bayesian network was improved in this paper, and the fuzzy-precise Bayesian network was obtained. Then, the fuzzy-precise Bayesian network was used to reason Bayesian network model when the data is limited. The security of passengers during shipping is affected by various factors, and it is hard to predict and control. The index system that has the impact on the passenger safety during shipping was established on basis of the multifield coupling theory in this paper. Meanwhile, the fuzzy-precise Bayesian network was applied to monitor the security of passengers in the shipping process. The model was applied to monitor the passenger safety during shipping of a shipping company in Hainan, and the effectiveness of this model was examined. This research work provides guidance for guaranteeing security of passengers during shipping. PMID:25254227

  13. Bayesian ensemble refinement by replica simulations and reweighting.

    PubMed

    Hummer, Gerhard; Köfinger, Jürgen

    2015-12-28

    We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.

  14. Bayesian ensemble refinement by replica simulations and reweighting

    NASA Astrophysics Data System (ADS)

    Hummer, Gerhard; Köfinger, Jürgen

    2015-12-01

    We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.

  15. A quantification of the effectiveness of EPID dosimetry and software-based plan verification systems in detecting incidents in radiotherapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bojechko, Casey; Phillps, Mark; Kalet, Alan

    Purpose: Complex treatments in radiation therapy require robust verification in order to prevent errors that can adversely affect the patient. For this purpose, the authors estimate the effectiveness of detecting errors with a “defense in depth” system composed of electronic portal imaging device (EPID) based dosimetry and a software-based system composed of rules-based and Bayesian network verifications. Methods: The authors analyzed incidents with a high potential severity score, scored as a 3 or 4 on a 4 point scale, recorded in an in-house voluntary incident reporting system, collected from February 2012 to August 2014. The incidents were categorized into different failure modes. The detectability, defined as the number of incidents that are detectable divided by the total number of incidents, was calculated for each failure mode. Results: In total, 343 incidents were used in this study. Of the incidents 67% were related to photon external beam therapy (EBRT). The majority of the EBRT incidents were related to patient positioning and only a small number of these could be detected by EPID dosimetry when performed prior to treatment (6%). A large fraction could be detected by in vivo dosimetry performed during the first fraction (74%). Rules-based and Bayesian network verifications were found to be complementary to EPID dosimetry, able to detect errors related to patient prescriptions and documentation, and errors unrelated to photon EBRT. Combining all of the verification steps together, 91% of all EBRT incidents could be detected. Conclusions: This study shows that the defense in depth system is potentially able to detect a large majority of incidents. The most effective EPID-based dosimetry verification is in vivo measurements during the first fraction and is complemented by rules-based and Bayesian network plan checking.

  16. Power in Bayesian Mediation Analysis for Small Sample Research

    PubMed Central

    Miočević, Milica; MacKinnon, David P.; Levy, Roy

    2018-01-01

    It was suggested that Bayesian methods have potential for increasing power in mediation analysis (Koopman, Howe, Hollenbeck, & Sin, 2015; Yuan & MacKinnon, 2009). This paper compares the power of Bayesian credibility intervals for the mediated effect to the power of normal theory, distribution of the product, percentile, and bias-corrected bootstrap confidence intervals at N≤ 200. Bayesian methods with diffuse priors have power comparable to the distribution of the product and bootstrap methods, and Bayesian methods with informative priors had the most power. Varying degrees of precision of prior distributions were also examined. Increased precision led to greater power only when N≥ 100 and the effects were small, N < 60 and the effects were large, and N < 200 and the effects were medium. An empirical example from psychology illustrated a Bayesian analysis of the single mediator model from prior selection to interpreting results. PMID:29662296

  17. Power in Bayesian Mediation Analysis for Small Sample Research.

    PubMed

    Miočević, Milica; MacKinnon, David P; Levy, Roy

    2017-01-01

    It was suggested that Bayesian methods have potential for increasing power in mediation analysis (Koopman, Howe, Hollenbeck, & Sin, 2015; Yuan & MacKinnon, 2009). This paper compares the power of Bayesian credibility intervals for the mediated effect to the power of normal theory, distribution of the product, percentile, and bias-corrected bootstrap confidence intervals at N≤ 200. Bayesian methods with diffuse priors have power comparable to the distribution of the product and bootstrap methods, and Bayesian methods with informative priors had the most power. Varying degrees of precision of prior distributions were also examined. Increased precision led to greater power only when N≥ 100 and the effects were small, N < 60 and the effects were large, and N < 200 and the effects were medium. An empirical example from psychology illustrated a Bayesian analysis of the single mediator model from prior selection to interpreting results.

  18. Prior robust empirical Bayes inference for large-scale data by conditioning on rank with application to microarray data

    PubMed Central

    Liao, J. G.; Mcmurry, Timothy; Berg, Arthur

    2014-01-01

    Empirical Bayes methods have been extensively used for microarray data analysis by modeling the large number of unknown parameters as random effects. Empirical Bayes allows borrowing information across genes and can automatically adjust for multiple testing and selection bias. However, the standard empirical Bayes model can perform poorly if the assumed working prior deviates from the true prior. This paper proposes a new rank-conditioned inference in which the shrinkage and confidence intervals are based on the distribution of the error conditioned on rank of the data. Our approach is in contrast to a Bayesian posterior, which conditions on the data themselves. The new method is almost as efficient as standard Bayesian methods when the working prior is close to the true prior, and it is much more robust when the working prior is not close. In addition, it allows a more accurate (but also more complex) non-parametric estimate of the prior to be easily incorporated, resulting in improved inference. The new method’s prior robustness is demonstrated via simulation experiments. Application to a breast cancer gene expression microarray dataset is presented. Our R package rank.Shrinkage provides a ready-to-use implementation of the proposed methodology. PMID:23934072

  19. Logistics modelling: improving resource management and public information strategies in Florida.

    PubMed

    Walsh, Daniel M; Van Groningen, Chuck; Craig, Brian

    2011-10-01

    One of the most time-sensitive and logistically-challenging emergency response operations today is to provide mass prophylaxis to every man, woman and child in a community within 48 hours of a bioterrorism attack. To meet this challenge, federal, state and local public health departments in the USA have joined forces to develop, test and execute large-scale bioterrorism response plans. This preparedness and response effort is funded through the US Centers for Disease Control and Prevention's Cities Readiness Initiative, a programme dedicated to providing oral antibiotics to an entire population within 48 hours of a weaponised inhalation anthrax attack. This paper will demonstrate how the State of Florida used a logistics modelling tool to improve its CRI mass prophylaxis plans. Special focus will be on how logistics modelling strengthened Florida's resource management policies and validated its public information strategies.

  20. Conservation of reef manta rays (Manta alfredi) in a UNESCO World Heritage Site: Large-scale island development or sustainable tourism?

    PubMed Central

    Elamin, Nasreldin Alhasan; Yurkowski, David James; Chekchak, Tarik; Walter, Ryan Patrick; Klaus, Rebecca; Hill, Graham; Hussey, Nigel Edward

    2017-01-01

    A large reef manta ray (Manta alfredi) aggregation has been observed off the north Sudanese Red Sea coast since the 1950s. Sightings have been predominantly within the boundaries of a marine protected area (MPA), which was designated a UNESCO World Heritage Site in July 2016. Contrasting economic development trajectories have been proposed for the area (small-scale ecotourism and large-scale island development). To examine space-use, Wildlife Computers® SPOT 5 tags were secured to three manta rays. A two-state switching Bayesian state space model (BSSM), that allowed movement parameters to switch between resident and travelling, was fit to the recorded locations, and 50% and 95% kernel utilization distributions (KUD) home ranges calculated. A total of 682 BSSM locations were recorded between 30 October 2012 and 6 November 2013. Of these, 98.5% fell within the MPA boundaries; 99.5% for manta 1, 91.5% for manta 2, and 100% for manta 3. The BSSM identified that all three mantas were resident during 99% of transmissions, with 50% and 95% KUD home ranges falling mainly within the MPA boundaries. For all three mantas combined (88.4%), and all individuals (manta 1–92.4%, manta 2–64.9%, manta 3–91.9%), the majority of locations occurred within 15 km of the proposed large-scale island development. Results indicated that the MPA boundaries are spatially appropriate for manta rays in the region, however, a close association to the proposed large-scale development highlights the potential threat of disruption. Conversely, the focused nature of spatial use highlights the potential for reliable ecotourism opportunities. PMID:29069079

  1. Conservation of reef manta rays (Manta alfredi) in a UNESCO World Heritage Site: Large-scale island development or sustainable tourism?

    PubMed

    Kessel, Steven Thomas; Elamin, Nasreldin Alhasan; Yurkowski, David James; Chekchak, Tarik; Walter, Ryan Patrick; Klaus, Rebecca; Hill, Graham; Hussey, Nigel Edward

    2017-01-01

    A large reef manta ray (Manta alfredi) aggregation has been observed off the north Sudanese Red Sea coast since the 1950s. Sightings have been predominantly within the boundaries of a marine protected area (MPA), which was designated a UNESCO World Heritage Site in July 2016. Contrasting economic development trajectories have been proposed for the area (small-scale ecotourism and large-scale island development). To examine space-use, Wildlife Computers® SPOT 5 tags were secured to three manta rays. A two-state switching Bayesian state space model (BSSM), that allowed movement parameters to switch between resident and travelling, was fit to the recorded locations, and 50% and 95% kernel utilization distributions (KUD) home ranges calculated. A total of 682 BSSM locations were recorded between 30 October 2012 and 6 November 2013. Of these, 98.5% fell within the MPA boundaries; 99.5% for manta 1, 91.5% for manta 2, and 100% for manta 3. The BSSM identified that all three mantas were resident during 99% of transmissions, with 50% and 95% KUD home ranges falling mainly within the MPA boundaries. For all three mantas combined (88.4%), and all individuals (manta 1-92.4%, manta 2-64.9%, manta 3-91.9%), the majority of locations occurred within 15 km of the proposed large-scale island development. Results indicated that the MPA boundaries are spatially appropriate for manta rays in the region, however, a close association to the proposed large-scale development highlights the potential threat of disruption. Conversely, the focused nature of spatial use highlights the potential for reliable ecotourism opportunities.

  2. Bayesian Inference on Malignant Breast Cancer in Nigeria: A Diagnosis of MCMC Convergence

    PubMed Central

    Ogunsakin, Ropo Ebenezer; Siaka, Lougue

    2017-01-01

    Background: No previous study has classified malignant breast tumors in detail based on Markov Chain Monte Carlo (MCMC) convergence in Western Nigeria. This study therefore aims to profile patients living with benign and malignant breast tumors in two different hospitals among women of Western Nigeria, with a focus on prognostic factors and MCMC convergence. Materials and Methods: A hospital-based record was used to identify prognostic factors for malignant breast cancer among women of Western Nigeria. This paper describes Bayesian inference and demonstrates its use in estimating the parameters of the logistic regression via a Markov Chain Monte Carlo (MCMC) algorithm. The result of the Bayesian approach is compared with that of classical statistics. Results: The mean age of the respondents was 42.2 ± 16.6 years, with 52% of the women aged between 35 and 49 years. The results of both techniques suggest that age and having at least a high school education are associated with a significantly higher risk of being diagnosed with malignant rather than benign breast tumors. The results also indicate that a reduction in standard errors is associated with the coefficients obtained from the Bayesian approach. In addition, simulation results reveal that women with at least a high school education are 1.3 times more at risk of having a malignant breast lesion in Western Nigeria compared to a benign breast lesion. Conclusion: We concluded that more efforts are required towards creating awareness and advocacy campaigns on how the prevalence of malignant breast lesions can be reduced, especially among women. The Bayesian approach produces precise estimates for modeling malignant breast cancer. PMID:29072396
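
As a rough illustration of the technique named in the abstract above (estimating logistic regression parameters by MCMC), the following is a minimal random-walk Metropolis sketch in Python. It is not the authors' implementation: the synthetic data, the N(0, 10^2) priors, the step size, the burn-in length, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_posterior(beta, xs, ys, prior_sd=10.0):
    # Bernoulli log-likelihood plus independent N(0, prior_sd^2) log-priors,
    # up to an additive constant. The prior is an assumption of this sketch.
    b0, b1 = beta
    log_lik = sum(y * (b0 + b1 * x) - softplus(b0 + b1 * x)
                  for x, y in zip(xs, ys))
    log_prior = -0.5 * (b0 ** 2 + b1 ** 2) / prior_sd ** 2
    return log_lik + log_prior


def metropolis(xs, ys, n_iter=4000, step=0.15, seed=0):
    # Random-walk Metropolis with a symmetric Gaussian proposal.
    rng = random.Random(seed)
    beta = (0.0, 0.0)
    current = log_posterior(beta, xs, ys)
    draws = []
    for _ in range(n_iter):
        proposal = (beta[0] + rng.gauss(0.0, step),
                    beta[1] + rng.gauss(0.0, step))
        candidate = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
        draws.append(beta)
    return draws


# Synthetic data (hypothetical): intercept -0.5, slope 1.0.
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(300)]
ys = [1 if data_rng.random() < 1.0 / (1.0 + math.exp(0.5 - x)) else 0
      for x in xs]

draws = metropolis(xs, ys)
kept = draws[1000:]  # discard the first 1000 draws as burn-in
slopes = sorted(b1 for _, b1 in kept)
slope_mean = sum(slopes) / len(slopes)
# Equal-tailed 95% credible interval from the sorted posterior draws.
slope_ci = (slopes[int(0.025 * len(slopes))], slopes[int(0.975 * len(slopes))])
print("posterior mean slope:", round(slope_mean, 3), "95% CI:", slope_ci)
```

In practice, convergence would be checked with several chains and diagnostics such as trace plots before the posterior summaries are trusted; this sketch runs a single chain only.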

  3. Improving Photometric Redshifts for Hyper Suprime-Cam

    NASA Astrophysics Data System (ADS)

    Speagle, Josh S.; Leauthaud, Alexie; Eisenstein, Daniel; Bundy, Kevin; Capak, Peter L.; Leistedt, Boris; Masters, Daniel C.; Mortlock, Daniel; Peiris, Hiranya; HSC Photo-z Team; HSC Weak Lensing Team

    2017-01-01

    Deriving accurate photometric redshift (photo-z) probability distribution functions (PDFs) is a crucial science component for current and upcoming large-scale surveys. We outline how rigorous Bayesian inference and machine learning can be combined to quickly derive joint photo-z PDFs for individual galaxies and their parent populations. Using the first 170 deg^2 of data from the ongoing Hyper Suprime-Cam survey, we demonstrate our method is able to generate accurate predictions and reliable credible intervals over ~370k high-quality redshifts. We then use galaxy-galaxy lensing to empirically validate our predicted photo-z's over ~14M objects, finding a robust signal.

  4. Vertical land motion controls regional sea level rise patterns on the United States east coast since 1900

    NASA Astrophysics Data System (ADS)

    Piecuch, C. G.; Huybers, P. J.; Hay, C.; Mitrovica, J. X.; Little, C. M.; Ponte, R. M.; Tingley, M.

    2017-12-01

    Understanding observed spatial variations in centennial relative sea level trends on the United States east coast has important scientific and societal applications. Past studies based on models and proxies variously suggest roles for crustal displacement, ocean dynamics, and melting of the Greenland ice sheet. Here we perform joint Bayesian inference on regional relative sea level, vertical land motion, and absolute sea level fields based on tide gauge records and GPS data. Posterior solutions show that regional vertical land motion explains most (80% median estimate) of the spatial variance in the large-scale relative sea level trend field on the east coast over 1900-2016. The posterior estimate for coastal absolute sea level rise is remarkably spatially uniform compared to previous studies, with a spatial average of 1.4-2.3 mm/yr (95% credible interval). Results corroborate glacial isostatic adjustment models and reveal that meaningful long-period, large-scale vertical velocity signals can be extracted from short GPS records.

  5. Cosmology and the neutrino mass ordering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hannestad, Steen; Schwetz, Thomas, E-mail: sth@phys.au.dk, E-mail: schwetz@kit.edu

    We propose a simple method to quantify a possible exclusion of the inverted neutrino mass ordering from cosmological bounds on the sum of the neutrino masses. The method is based on Bayesian inference and allows for a calculation of the posterior odds of normal versus inverted ordering. We apply the method for a specific set of current data from Planck CMB data and large-scale structure surveys, providing an upper bound on the sum of neutrino masses of 0.14 eV at 95% CL. With this analysis we obtain posterior odds for normal versus inverted ordering of about 2:1. If cosmological data is combined with data from oscillation experiments the odds reduce to about 3:2. For an exclusion of the inverted ordering from cosmology at more than 95% CL, an accuracy of better than 0.02 eV is needed for the sum. We demonstrate that such a value could be reached with planned observations of large scale structure by analysing artificial mock data for a EUCLID-like survey.
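
To make the odds quoted in the abstract above concrete, posterior odds of a:b in favour of a hypothesis correspond to a posterior probability of a/(a+b). The short Python sketch below does that conversion; the function name is hypothetical and the code only restates the arithmetic, not the authors' analysis.

```python
from fractions import Fraction


def odds_to_probability(in_favour, against):
    # Posterior odds in_favour:against -> posterior probability of the
    # favoured hypothesis, assuming the two hypotheses are exhaustive.
    return Fraction(in_favour, in_favour + against)


p_cosmology = odds_to_probability(2, 1)   # odds of about 2:1
p_combined = odds_to_probability(3, 2)    # odds of about 3:2
print(float(p_cosmology), float(p_combined))
```

Read this way, odds of 2:1 put roughly two thirds of the posterior probability on normal ordering, and odds of 3:2 put 60% on it, both well short of a 95% exclusion.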

  6. A hierarchical Bayesian GEV model for improving local and regional flood quantile estimates

    NASA Astrophysics Data System (ADS)

    Lima, Carlos H. R.; Lall, Upmanu; Troy, Tara; Devineni, Naresh

    2016-10-01

    We estimate local and regional Generalized Extreme Value (GEV) distribution parameters for flood frequency analysis in a multilevel, hierarchical Bayesian framework, to explicitly model and reduce uncertainties. As prior information for the model, we assume that the GEV location and scale parameters for each site come from independent log-normal distributions, whose mean parameter scales with the drainage area. From empirical and theoretical arguments, the shape parameter for each site is shrunk towards a common mean. Non-informative prior distributions are assumed for the hyperparameters and the MCMC method is used to sample from the joint posterior distribution. The model is tested using annual maximum series from 20 streamflow gauges located in an 83,000 km² flood-prone basin in Southeast Brazil. The results show a significant reduction of uncertainty estimates of flood quantile estimates over the traditional GEV model, particularly for sites with shorter records. For return periods within the range of the data (around 50 years), the Bayesian credible intervals for the flood quantiles tend to be narrower than the classical confidence limits based on the delta method. As the return period increases beyond the range of the data, the confidence limits from the delta method become unreliable and the Bayesian credible intervals provide a way to estimate satisfactory confidence bands for the flood quantiles considering parameter uncertainties and regional information. In order to evaluate the applicability of the proposed hierarchical Bayesian model for regional flood frequency analysis, we estimate flood quantiles for three randomly chosen out-of-sample sites and compare with classical estimates using the index flood method. The posterior distributions of the scaling law coefficients are used to define the predictive distributions of the GEV location and scale parameters for the out-of-sample sites given only their drainage areas and the posterior distribution of the average shape parameter is taken as the regional predictive distribution for this parameter. While the index flood method does not provide a straightforward way to consider the uncertainties in the index flood and in the regional parameters, the results obtained here show that the proposed Bayesian method is able to produce adequate credible intervals for flood quantiles that are in accordance with empirical estimates.
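
For readers unfamiliar with flood quantiles, the sketch below shows how a T-year quantile follows from GEV location, scale, and shape parameters via the standard GEV inverse CDF. It assumes the sign convention in which a positive shape parameter gives a heavy upper tail; the parameter values and function names are hypothetical and are not taken from the study. In a Bayesian analysis, applying the same function to each posterior draw of the parameters yields a posterior sample of the quantile, from which a credible interval can be read off.

```python
import math


def gev_quantile(mu, sigma, xi, return_period):
    # Level exceeded on average once every `return_period` years,
    # i.e. the (1 - 1/T) quantile of the annual-maximum distribution.
    p = 1.0 - 1.0 / return_period
    y = -math.log(p)
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV as the shape parameter tends to zero.
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)


def gev_cdf(x, mu, sigma, xi):
    # GEV cumulative distribution function, used here as a round-trip check.
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


# Hypothetical parameters for a single site (illustration only).
q50 = gev_quantile(mu=100.0, sigma=30.0, xi=0.1, return_period=50)
q100 = gev_quantile(mu=100.0, sigma=30.0, xi=0.1, return_period=100)
print(round(q50, 1), round(q100, 1))
```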

  7. Bayesian CP Factorization of Incomplete Tensors with Automatic Rank Determination.

    PubMed

    Zhao, Qibin; Zhang, Liqing; Cichocki, Andrzej

    2015-09-01

    CANDECOMP/PARAFAC (CP) tensor factorization of incomplete data is a powerful technique for tensor completion through explicitly capturing the multilinear latent factors. The existing CP algorithms require the tensor rank to be manually specified; however, the determination of tensor rank remains a challenging problem, especially for CP rank. In addition, existing approaches do not take into account uncertainty information of latent factors, as well as missing entries. To address these issues, we formulate CP factorization using a hierarchical probabilistic model and employ a fully Bayesian treatment by incorporating a sparsity-inducing prior over multiple latent factors and the appropriate hyperpriors over all hyperparameters, resulting in automatic rank determination. To learn the model, we develop an efficient deterministic Bayesian inference algorithm, which scales linearly with data size. Our method is characterized as a tuning parameter-free approach, which can effectively infer underlying multilinear factors with a low-rank constraint, while also providing predictive distributions over missing entries. Extensive simulations on synthetic data illustrate the intrinsic capability of our method to recover the ground-truth of CP rank and prevent the overfitting problem, even when a large amount of entries are missing. Moreover, the results from real-world applications, including image inpainting and facial image synthesis, demonstrate that our method outperforms state-of-the-art approaches for both tensor factorization and tensor completion in terms of predictive performance.

  8. On parametrized cold dense matter equation-of-state inference

    NASA Astrophysics Data System (ADS)

    Riley, Thomas E.; Raaijmakers, Geert; Watts, Anna L.

    2018-07-01

    Constraining the equation of state of cold dense matter in compact stars is a major science goal for observing programmes being conducted using X-ray, radio, and gravitational wave telescopes. We discuss Bayesian hierarchical inference of parametrized dense matter equations of state. In particular, we generalize and examine two inference paradigms from the literature: (i) direct posterior equation-of-state parameter estimation, conditioned on observations of a set of rotating compact stars; and (ii) indirect parameter estimation, via transformation of an intermediary joint posterior distribution of exterior spacetime parameters (such as gravitational masses and coordinate equatorial radii). We conclude that the former paradigm is not only tractable for large-scale analyses, but is principled and flexible from a Bayesian perspective while the latter paradigm is not. The thematic problem of Bayesian prior definition emerges as the crux of the difference between these paradigms. The second paradigm should in general only be considered as an ill-defined approach to the problem of utilizing archival posterior constraints on exterior spacetime parameters; we advocate for an alternative approach whereby such information is repurposed as an approximative likelihood function. We also discuss why conditioning on a piecewise-polytropic equation-of-state model - currently standard in the field of dense matter study - can easily violate conditions required for transformation of a probability density distribution between spaces of exterior (spacetime) and interior (source matter) parameters.

  9. On parametrised cold dense matter equation of state inference

    NASA Astrophysics Data System (ADS)

    Riley, Thomas E.; Raaijmakers, Geert; Watts, Anna L.

    2018-04-01

    Constraining the equation of state of cold dense matter in compact stars is a major science goal for observing programmes being conducted using X-ray, radio, and gravitational wave telescopes. We discuss Bayesian hierarchical inference of parametrised dense matter equations of state. In particular we generalise and examine two inference paradigms from the literature: (i) direct posterior equation of state parameter estimation, conditioned on observations of a set of rotating compact stars; and (ii) indirect parameter estimation, via transformation of an intermediary joint posterior distribution of exterior spacetime parameters (such as gravitational masses and coordinate equatorial radii). We conclude that the former paradigm is not only tractable for large-scale analyses, but is principled and flexible from a Bayesian perspective whilst the latter paradigm is not. The thematic problem of Bayesian prior definition emerges as the crux of the difference between these paradigms. The second paradigm should in general only be considered as an ill-defined approach to the problem of utilising archival posterior constraints on exterior spacetime parameters; we advocate for an alternative approach whereby such information is repurposed as an approximative likelihood function. We also discuss why conditioning on a piecewise-polytropic equation of state model - currently standard in the field of dense matter study - can easily violate conditions required for transformation of a probability density distribution between spaces of exterior (spacetime) and interior (source matter) parameters.

  10. 2D Bayesian automated tilted-ring fitting of disc galaxies in large H I galaxy surveys: 2DBAT

    NASA Astrophysics Data System (ADS)

    Oh, Se-Heon; Staveley-Smith, Lister; Spekkens, Kristine; Kamphuis, Peter; Koribalski, Bärbel S.

    2018-01-01

    We present a novel algorithm based on a Bayesian method for 2D tilted-ring analysis of disc galaxy velocity fields. Compared to the conventional algorithms based on a chi-squared minimization procedure, this new Bayesian-based algorithm suffers less from local minima of the model parameters even with highly multimodal posterior distributions. Moreover, the Bayesian analysis, implemented via Markov Chain Monte Carlo sampling, only requires broad ranges of posterior distributions of the parameters, which makes the fitting procedure fully automated. This feature will be essential when performing kinematic analysis on the large number of resolved galaxies expected to be detected in neutral hydrogen (H I) surveys with the Square Kilometre Array and its pathfinders. The so-called 2D Bayesian Automated Tilted-ring fitter (2DBAT) implements Bayesian fits of 2D tilted-ring models in order to derive rotation curves of galaxies. We explore 2DBAT performance on (a) artificial H I data cubes built based on representative rotation curves of intermediate-mass and massive spiral galaxies, and (b) Australia Telescope Compact Array H I data from the Local Volume H I Survey. We find that 2DBAT works best for well-resolved galaxies with intermediate inclinations (20° < i < 70°), complementing 3D techniques better suited to modelling inclined galaxies.

  11. Investigating the Theoretical Structure of the DAS-II Core Battery at School Age Using Bayesian Structural Equation Modeling

    ERIC Educational Resources Information Center

    Dombrowski, Stefan C.; Golay, Philippe; McGill, Ryan J.; Canivez, Gary L.

    2018-01-01

    Bayesian structural equation modeling (BSEM) was used to investigate the latent structure of the Differential Ability Scales-Second Edition core battery using the standardization sample normative data for ages 7-17. Results revealed plausibility of a three-factor model, consistent with publisher theory, expressed as either a higher-order (HO) or a…

  12. Bayesian Estimation in the One-Parameter Latent Trait Model.

    DTIC Science & Technology

    1980-03-01

    Journal of Mathematical and Statistical Psychology , 1973, 26, 31-44. (a) Andersen, E. B. A goodness of fit test for the Rasch model. Psychometrika, 1973, 28...technique for estimating latent trait mental test parameters. Educational and Psychological Measurement, 1976, 36, 705-715. Lindley, D. V. The...Lord, F. M. An analysis of verbal Scholastic Aptitude Test using Birnbaum’s three-parameter logistic model. Educational and Psychological

  13. Soldier Quality of Life Assessment

    DTIC Science & Technology

    2016-09-01

    ABSTRACT This report documents survey research and modeling of Soldier quality of life (QoL) on contingency base camps by the U.S. Army Natick...Science and Technology Objective Demonstration, was to develop a way to quantify QoL for camps housing fewer than 1000 personnel. A discrete choice survey ... Survey results were analyzed using hierarchical Bayesian logistic regression to develop a quantitative model for estimating QoL based on base camp

  14. Predicting the geographical distribution of two invasive termite species from occurrence data.

    PubMed

    Tonini, Francesco; Divino, Fabio; Lasinio, Giovanna Jona; Hochmair, Hartwig H; Scheffrahn, Rudolf H

    2014-10-01

    Predicting the potential habitat of species under both current and future climate change scenarios is crucial for monitoring invasive species and understanding a species' response to different environmental conditions. Frequently, the only data available on a species is the location of its occurrence (presence-only data). Using occurrence records only, two models were used to predict the geographical distribution of two destructive invasive termite species, Coptotermes gestroi (Wasmann) and Coptotermes formosanus Shiraki. The first model uses a Bayesian linear logistic regression approach adjusted for presence-only data while the second one is the widely used maximum entropy approach (Maxent). Results show that the predicted distributions of both C. gestroi and C. formosanus are strongly linked to urban development. The impact of future scenarios such as climate warming and population growth on the biotic distribution of both termite species was also assessed. Future climate warming seems to affect their projected probability of presence to a lesser extent than population growth. The Bayesian logistic approach outperformed Maxent consistently in all models according to evaluation criteria such as model sensitivity and ecological realism. The importance of further studies for an explicit treatment of residual spatial autocorrelation and a more comprehensive comparison between both statistical approaches is suggested.

  15. Acquiring data for large aquatic resource surveys: the art of compromise among science, logistics, and reality

    EPA Science Inventory

    The US Environmental Protection Agency (EPA) is revising its strategy to obtain the information needed to answer questions pertinent to water-quality management efficiently and rigorously at national scales. One tool of this revised strategy is use of statistically based surveys ...

  16. General Purpose Sampling in the Domain of Higher Education.

    ERIC Educational Resources Information Center

    Creager, John A.

    The experience of the American Council on Education's Cooperative Institutional Research Program indicates that large-scale national surveys in the domain of higher education can be performed with scientific integrity within the constraints of costs, logistics, and technical resources. The purposes of this report are to provide complete and…

  17. Uncertainty quantification in LES of channel flow

    DOE PAGES

    Safta, Cosmin; Blaylock, Myra; Templeton, Jeremy; ...

    2016-07-12

    In this paper, we present a Bayesian framework for estimating joint densities for large eddy simulation (LES) sub-grid scale model parameters based on canonical forced isotropic turbulence direct numerical simulation (DNS) data. The framework accounts for noise in the independent variables, and we present alternative formulations for accounting for discrepancies between model and data. To generate probability densities for flow characteristics, posterior densities for sub-grid scale model parameters are propagated forward through LES of channel flow and compared with DNS data. Synthesis of the calibration and prediction results demonstrates that model parameters have an explicit filter width dependence and are highly correlated. Discrepancies between DNS and calibrated LES results point to additional model form inadequacies that need to be accounted for.

  18. Transboundary fisheries science: Meeting the challenges of inland fisheries management in the 21st century

    USGS Publications Warehouse

    Midway, Stephen R.; Wagner, Tyler; Zydlewski, Joseph D.; Irwin, Brian J.; Paukert, Craig P.

    2016-01-01

    Managing inland fisheries in the 21st century presents several obstacles, including the need to view fisheries from multiple spatial and temporal scales, which usually involves populations and resources spanning sociopolitical boundaries. Though collaboration is not new to fisheries science, inland aquatic systems have historically been managed at local scales and present different challenges than in marine or large freshwater systems like the Laurentian Great Lakes. Therefore, we outline a flexible strategy that highlights organization, cooperation, analytics, and implementation as building blocks toward effectively addressing transboundary fisheries issues. Additionally, we discuss the use of Bayesian hierarchical models (within the analytical stage), due to their flexibility in dealing with the variability present in data from multiple scales. With growing recognition of both ecological drivers that span spatial and temporal scales and the subsequent need for collaboration to effectively manage heterogeneous resources, we expect implementation of transboundary approaches to become increasingly critical for effective inland fisheries management.

  19. A large-scale assessment of two-way SNP interactions in breast cancer susceptibility using 46 450 cases and 42 461 controls from the breast cancer association consortium

    PubMed Central

    Milne, Roger L.; Herranz, Jesús; Michailidou, Kyriaki; Dennis, Joe; Tyrer, Jonathan P.; Zamora, M. Pilar; Arias-Perez, José Ignacio; González-Neira, Anna; Pita, Guillermo; Alonso, M. Rosario; Wang, Qin; Bolla, Manjeet K.; Czene, Kamila; Eriksson, Mikael; Humphreys, Keith; Darabi, Hatef; Li, Jingmei; Anton-Culver, Hoda; Neuhausen, Susan L.; Ziogas, Argyrios; Clarke, Christina A.; Hopper, John L.; Dite, Gillian S.; Apicella, Carmel; Southey, Melissa C.; Chenevix-Trench, Georgia; Swerdlow, Anthony; Ashworth, Alan; Orr, Nicholas; Schoemaker, Minouk; Jakubowska, Anna; Lubinski, Jan; Jaworska-Bieniek, Katarzyna; Durda, Katarzyna; Andrulis, Irene L.; Knight, Julia A.; Glendon, Gord; Mulligan, Anna Marie; Bojesen, Stig E.; Nordestgaard, Børge G.; Flyger, Henrik; Nevanlinna, Heli; Muranen, Taru A.; Aittomäki, Kristiina; Blomqvist, Carl; Chang-Claude, Jenny; Rudolph, Anja; Seibold, Petra; Flesch-Janys, Dieter; Wang, Xianshu; Olson, Janet E.; Vachon, Celine; Purrington, Kristen; Winqvist, Robert; Pylkäs, Katri; Jukkola-Vuorinen, Arja; Grip, Mervi; Dunning, Alison M.; Shah, Mitul; Guénel, Pascal; Truong, Thérèse; Sanchez, Marie; Mulot, Claire; Brenner, Hermann; Dieffenbach, Aida Karina; Arndt, Volker; Stegmaier, Christa; Lindblom, Annika; Margolin, Sara; Hooning, Maartje J.; Hollestelle, Antoinette; Collée, J. Margriet; Jager, Agnes; Cox, Angela; Brock, Ian W.; Reed, Malcolm W.R.; Devilee, Peter; Tollenaar, Robert A.E.M.; Seynaeve, Caroline; Haiman, Christopher A.; Henderson, Brian E.; Schumacher, Fredrick; Le Marchand, Loic; Simard, Jacques; Dumont, Martine; Soucy, Penny; Dörk, Thilo; Bogdanova, Natalia V.; Hamann, Ute; Försti, Asta; Rüdiger, Thomas; Ulmer, Hans-Ulrich; Fasching, Peter A.; Häberle, Lothar; Ekici, Arif B.; Beckmann, Matthias W.; Fletcher, Olivia; Johnson, Nichola; dos Santos Silva, Isabel; Peto, Julian; Radice, Paolo; Peterlongo, Paolo; Peissel, Bernard; Mariani, Paolo; Giles, Graham G.; Severi, Gianluca; Baglietto, Laura; Sawyer, Elinor; Tomlinson, Ian; Kerin, Michael; Miller, Nicola; Marme, Federik; Burwinkel, Barbara; Mannermaa, Arto; Kataja, Vesa; Kosma, Veli-Matti; Hartikainen, Jaana M.; Lambrechts, Diether; Yesilyurt, Betul T.; Floris, Giuseppe; Leunen, Karin; Alnæs, Grethe Grenaker; Kristensen, Vessela; Børresen-Dale, Anne-Lise; García-Closas, Montserrat; Chanock, Stephen J.; Lissowska, Jolanta; Figueroa, Jonine D.; Schmidt, Marjanka K.; Broeks, Annegien; Verhoef, Senno; Rutgers, Emiel J.; Brauch, Hiltrud; Brüning, Thomas; Ko, Yon-Dschun; Couch, Fergus J.; Toland, Amanda E.; Yannoukakos, Drakoulis; Pharoah, Paul D.P.; Hall, Per; Benítez, Javier; Malats, Núria; Easton, Douglas F.

    2014-01-01

    Part of the substantial unexplained familial aggregation of breast cancer may be due to interactions between common variants, but few studies have had adequate statistical power to detect interactions of realistic magnitude. We aimed to assess all two-way interactions in breast cancer susceptibility between 70 917 single nucleotide polymorphisms (SNPs) selected primarily based on prior evidence of a marginal effect. Thirty-eight international studies contributed data for 46 450 breast cancer cases and 42 461 controls of European origin as part of a multi-consortium project (COGS). First, SNPs were preselected based on evidence (P < 0.01) of a per-allele main effect, and all two-way combinations of those were evaluated by a per-allele (1 d.f.) test for interaction using logistic regression. Second, all 2.5 billion possible two-SNP combinations were evaluated using Boolean operation-based screening and testing, and SNP pairs with the strongest evidence of interaction (P < 10^−4) were selected for more careful assessment by logistic regression. Under the first approach, 3277 SNPs were preselected, but an evaluation of all possible two-SNP combinations (1 d.f.) identified no interactions at P < 10^−8. Results from the second analytic approach were consistent with those from the first (P > 10^−10). In summary, we observed little evidence of two-way SNP interactions in breast cancer susceptibility, despite the large number of SNPs with potential marginal effects considered and the very large sample size. This finding may have important implications for risk prediction, simplifying the modelling required. Further comprehensive, large-scale genome-wide interaction studies may identify novel interacting loci if the inherent logistic and computational challenges can be overcome. PMID:24242184
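
The "2.5 billion" figure in the abstract above is the number of unordered SNP pairs, n(n − 1)/2, for n = 70 917. The short Python check below reproduces that count and the corresponding count for the 3277 preselected SNPs. The Bonferroni-style threshold at the end is an illustrative assumption only; the abstract does not state how its significance thresholds were chosen.

```python
import math

# Unordered pairs among all 70 917 SNPs considered.
all_pairs = math.comb(70917, 2)          # 2514574986, about 2.5 billion

# Unordered pairs among the 3277 preselected SNPs.
preselected_pairs = math.comb(3277, 2)   # 5367726, about 5.4 million

# Illustrative only: a Bonferroni-style per-test threshold at overall
# alpha = 0.05 (an assumption, not a procedure described in the abstract).
illustrative_threshold = 0.05 / preselected_pairs

print(all_pairs, preselected_pairs, illustrative_threshold)
```

The count of pairs grows quadratically with the number of SNPs, which is why an exhaustive two-way scan requires a fast screening step before any logistic regression is fitted.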

  20. A large-scale assessment of two-way SNP interactions in breast cancer susceptibility using 46,450 cases and 42,461 controls from the breast cancer association consortium.

    PubMed

    Milne, Roger L; Herranz, Jesús; Michailidou, Kyriaki; Dennis, Joe; Tyrer, Jonathan P; Zamora, M Pilar; Arias-Perez, José Ignacio; González-Neira, Anna; Pita, Guillermo; Alonso, M Rosario; Wang, Qin; Bolla, Manjeet K; Czene, Kamila; Eriksson, Mikael; Humphreys, Keith; Darabi, Hatef; Li, Jingmei; Anton-Culver, Hoda; Neuhausen, Susan L; Ziogas, Argyrios; Clarke, Christina A; Hopper, John L; Dite, Gillian S; Apicella, Carmel; Southey, Melissa C; Chenevix-Trench, Georgia; Swerdlow, Anthony; Ashworth, Alan; Orr, Nicholas; Schoemaker, Minouk; Jakubowska, Anna; Lubinski, Jan; Jaworska-Bieniek, Katarzyna; Durda, Katarzyna; Andrulis, Irene L; Knight, Julia A; Glendon, Gord; Mulligan, Anna Marie; Bojesen, Stig E; Nordestgaard, Børge G; Flyger, Henrik; Nevanlinna, Heli; Muranen, Taru A; Aittomäki, Kristiina; Blomqvist, Carl; Chang-Claude, Jenny; Rudolph, Anja; Seibold, Petra; Flesch-Janys, Dieter; Wang, Xianshu; Olson, Janet E; Vachon, Celine; Purrington, Kristen; Winqvist, Robert; Pylkäs, Katri; Jukkola-Vuorinen, Arja; Grip, Mervi; Dunning, Alison M; Shah, Mitul; Guénel, Pascal; Truong, Thérèse; Sanchez, Marie; Mulot, Claire; Brenner, Hermann; Dieffenbach, Aida Karina; Arndt, Volker; Stegmaier, Christa; Lindblom, Annika; Margolin, Sara; Hooning, Maartje J; Hollestelle, Antoinette; Collée, J Margriet; Jager, Agnes; Cox, Angela; Brock, Ian W; Reed, Malcolm W R; Devilee, Peter; Tollenaar, Robert A E M; Seynaeve, Caroline; Haiman, Christopher A; Henderson, Brian E; Schumacher, Fredrick; Le Marchand, Loic; Simard, Jacques; Dumont, Martine; Soucy, Penny; Dörk, Thilo; Bogdanova, Natalia V; Hamann, Ute; Försti, Asta; Rüdiger, Thomas; Ulmer, Hans-Ulrich; Fasching, Peter A; Häberle, Lothar; Ekici, Arif B; Beckmann, Matthias W; Fletcher, Olivia; Johnson, Nichola; dos Santos Silva, Isabel; Peto, Julian; Radice, Paolo; Peterlongo, Paolo; Peissel, Bernard; Mariani, Paolo; Giles, Graham G; Severi, Gianluca; Baglietto, Laura; Sawyer, Elinor; Tomlinson, Ian; Kerin, Michael; Miller, 
Nicola; Marme, Federik; Burwinkel, Barbara; Mannermaa, Arto; Kataja, Vesa; Kosma, Veli-Matti; Hartikainen, Jaana M; Lambrechts, Diether; Yesilyurt, Betul T; Floris, Giuseppe; Leunen, Karin; Alnæs, Grethe Grenaker; Kristensen, Vessela; Børresen-Dale, Anne-Lise; García-Closas, Montserrat; Chanock, Stephen J; Lissowska, Jolanta; Figueroa, Jonine D; Schmidt, Marjanka K; Broeks, Annegien; Verhoef, Senno; Rutgers, Emiel J; Brauch, Hiltrud; Brüning, Thomas; Ko, Yon-Dschun; Couch, Fergus J; Toland, Amanda E; Yannoukakos, Drakoulis; Pharoah, Paul D P; Hall, Per; Benítez, Javier; Malats, Núria; Easton, Douglas F

    2014-04-01

    Part of the substantial unexplained familial aggregation of breast cancer may be due to interactions between common variants, but few studies have had adequate statistical power to detect interactions of realistic magnitude. We aimed to assess all two-way interactions in breast cancer susceptibility between 70,917 single nucleotide polymorphisms (SNPs) selected primarily based on prior evidence of a marginal effect. Thirty-eight international studies contributed data for 46,450 breast cancer cases and 42,461 controls of European origin as part of a multi-consortium project (COGS). First, SNPs were preselected based on evidence (P < 0.01) of a per-allele main effect, and all two-way combinations of those were evaluated by a per-allele (1 d.f.) test for interaction using logistic regression. Second, all 2.5 billion possible two-SNP combinations were evaluated using Boolean operation-based screening and testing, and SNP pairs with the strongest evidence of interaction (P < 10(-4)) were selected for more careful assessment by logistic regression. Under the first approach, 3277 SNPs were preselected, but an evaluation of all possible two-SNP combinations (1 d.f.) identified no interactions at P < 10(-8). Results from the second analytic approach were consistent with those from the first (P > 10(-10)). In summary, we observed little evidence of two-way SNP interactions in breast cancer susceptibility, despite the large number of SNPs with potential marginal effects considered and the very large sample size. This finding may have important implications for risk prediction, simplifying the modelling required. Further comprehensive, large-scale genome-wide interaction studies may identify novel interacting loci if the inherent logistic and computational challenges can be overcome.
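
The scale quoted in the abstract above can be checked with a short calculation. The sketch below uses only the SNP counts given in the abstract; the nominal 0.05 level and the Bonferroni-style division are illustrative assumptions, not something the abstract states.

```python
from math import comb

n_snps_all = 70_917          # SNPs screened in the second approach (from the abstract)
n_snps_preselected = 3_277   # SNPs passing P < 0.01 in the first approach (from the abstract)

# Number of unordered two-SNP combinations in each approach.
pairs_all = comb(n_snps_all, 2)
pairs_preselected = comb(n_snps_preselected, 2)

# Assumption for illustration: a Bonferroni correction at a nominal 0.05 level.
alpha = 0.05
bonferroni_preselected = alpha / pairs_preselected

print(f"all pairs:          {pairs_all:,}")           # about 2.5 billion
print(f"preselected pairs:  {pairs_preselected:,}")   # about 5.4 million
print(f"Bonferroni level:   {bonferroni_preselected:.2e}")
```

Under these assumptions the corrected level for the preselected pairs comes out just under 10^-8, the same order as the threshold reported in the abstract.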

  1. Towards quantifying uncertainty in Greenland's contribution to 21st century sea-level rise

    NASA Astrophysics Data System (ADS)

    Perego, M.; Tezaur, I.; Price, S. F.; Jakeman, J.; Eldred, M.; Salinger, A.; Hoffman, M. J.

    2015-12-01

    We present recent work towards developing a methodology for quantifying uncertainty in Greenland's 21st century contribution to sea-level rise. While we focus on uncertainties associated with the optimization and calibration of the basal sliding parameter field, the methodology is largely generic and could be applied to other (or multiple) sets of uncertain model parameter fields. The first step in the workflow is the solution of a large-scale, deterministic inverse problem, which minimizes the mismatch between observed and computed surface velocities by optimizing the two-dimensional coefficient field in a linear-friction sliding law. We then expand the deviation in this coefficient field from its estimated "mean" state using a reduced basis of Karhunen-Loeve Expansion (KLE) vectors. A Bayesian calibration is used to determine the optimal coefficient values for this expansion. The prior for the Bayesian calibration can be computed using the Hessian of the deterministic inversion or using an exponential covariance kernel. The posterior distribution is then obtained using Markov Chain Monte Carlo run on an emulator of the forward model. Finally, the uncertainty in the modeled sea-level rise is obtained by performing an ensemble of forward propagation runs. We present and discuss preliminary results obtained using a moderate-resolution model of the Greenland Ice sheet. As demonstrated in previous work, the primary difficulty in applying the complete workflow to realistic, high-resolution problems is that the effective dimension of the parameter space is very large.
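
The reduced-basis step described above can be sketched on a toy problem. The following is a minimal illustration, assuming a one-dimensional uniform grid, an exponential covariance kernel with a hypothetical correlation length, and no quadrature weighting; the function names `kle_basis` and `expand` are made up for this sketch and are not taken from the authors' code.

```python
import numpy as np

def kle_basis(points, corr_length, variance, n_modes):
    """Leading eigenpairs of an exponential covariance matrix on a 1-D grid."""
    dist = np.abs(points[:, None] - points[None, :])
    cov = variance * np.exp(-dist / corr_length)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_modes]     # keep the largest n_modes
    return eigvals[order], eigvecs[:, order]

def expand(mean_field, eigvals, eigvecs, xi):
    """Field = mean + sum_k sqrt(lambda_k) * xi_k * phi_k (truncated expansion)."""
    return mean_field + eigvecs @ (np.sqrt(eigvals) * xi)

x = np.linspace(0.0, 1.0, 50)                       # hypothetical grid
eigvals, eigvecs = kle_basis(x, corr_length=0.2, variance=1.0, n_modes=5)

rng = np.random.default_rng(0)
xi = rng.standard_normal(5)                         # coefficients a calibration would infer
field = expand(np.zeros_like(x), eigvals, eigvecs, xi)
```

In a calibration setting the five coefficients in `xi` would be the unknowns, which is what makes a 50-point field tractable for sampling.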

  2. Technology and testing.

    PubMed

    Quellmalz, Edys S; Pellegrino, James W

    2009-01-02

    Large-scale testing of educational outcomes benefits already from technological applications that address logistics such as development, administration, and scoring of tests, as well as reporting of results. Innovative applications of technology also provide rich, authentic tasks that challenge the sorts of integrated knowledge, critical thinking, and problem solving seldom well addressed in paper-based tests. Such tasks can be used on both large-scale and classroom-based assessments. Balanced assessment systems can be developed that integrate curriculum-embedded, benchmark, and summative assessments across classroom, district, state, national, and international levels. We discuss here the potential of technology to launch a new era of integrated, learning-centered assessment systems.

  3. Stochasticity of convection in Giga-LES data

    NASA Astrophysics Data System (ADS)

    De La Chevrotière, Michèle; Khouider, Boualem; Majda, Andrew J.

    2016-09-01

    The poor representation of tropical convection in general circulation models (GCMs) is believed to be responsible for much of the uncertainty in the predictions of weather and climate in the tropics. The stochastic multicloud model (SMCM) was recently developed by Khouider et al. (Commun Math Sci 8(1):187-216, 2010) to represent the missing variability in GCMs due to unresolved features of organized tropical convection. The SMCM is based on three cloud types (congestus, deep and stratiform), and transitions between these cloud types are formalized in terms of probability rules that are functions of the large-scale environment convective state and a set of seven arbitrary cloud timescale parameters. Here, a statistical inference method based on the Bayesian paradigm is applied to estimate these key cloud timescales from the Giga-LES dataset, a 24-h large-eddy simulation (LES) of deep tropical convection (Khairoutdinov et al. in J Adv Model Earth Syst 1(12), 2009) over a domain comparable to a GCM gridbox. A sequential learning strategy is used where the Giga-LES domain is partitioned into a few subdomains, and atmospheric time series obtained on each subdomain are used to train the Bayesian procedure incrementally. Convergence of the marginal posterior densities for all seven parameters is demonstrated for two different grid partitions, and sensitivity tests to other model parameters are also presented. A single column model simulation using the SMCM parameterization with the Giga-LES inferred parameters reproduces many important statistical features of the Giga-LES run, without any further tuning. In particular it exhibits intermittent dynamical behavior in both the stochastic cloud fractions and the large scale dynamics, with periods of dry phases followed by a coherent sequence of congestus, deep, and stratiform convection, varying on timescales of a few hours consistent with the Giga-LES time series. 
    The chaotic variations of the cloud area fractions were captured fairly well, both qualitatively and quantitatively, demonstrating the stochastic nature of convection in the Giga-LES simulation.
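
The sequential learning idea in this abstract, where the posterior from one subdomain serves as the prior for the next, can be illustrated with a toy conjugate model. This sketch is not the SMCM inference procedure: it assumes exponentially distributed waiting times with a Gamma prior on the rate, and the timescale, sample sizes, and prior values are hypothetical.

```python
import numpy as np

def update_gamma(shape, rate, waiting_times):
    """Gamma(shape, rate) prior on an exponential rate -> Gamma posterior."""
    return shape + len(waiting_times), rate + float(np.sum(waiting_times))

rng = np.random.default_rng(1)
true_timescale = 2.0  # hours; hypothetical value used only to simulate data
subdomains = [rng.exponential(true_timescale, size=200) for _ in range(4)]

# Incremental training: each subdomain's posterior is the next one's prior.
shape, rate = 1.0, 1.0
for times in subdomains:
    shape, rate = update_gamma(shape, rate, times)

posterior_mean_rate = shape / rate  # estimate of 1 / timescale

# For a conjugate model, one batch update over all data gives the same posterior.
batch_shape, batch_rate = update_gamma(1.0, 1.0, np.concatenate(subdomains))
```

In this conjugate case the incremental and batch posteriors coincide, so partitioning the domain changes only how the data are fed in, not the final answer.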

  4. False Discovery Control in Large-Scale Spatial Multiple Testing

    PubMed Central

    Sun, Wenguang; Reich, Brian J.; Cai, T. Tony; Guindani, Michele; Schwartzman, Armin

    2014-01-01

    This article develops a unified theoretical and computational framework for false discovery control in multiple testing of spatial signals. We consider both point-wise and cluster-wise spatial analyses, and derive oracle procedures which optimally control the false discovery rate, false discovery exceedance and false cluster rate, respectively. A data-driven finite approximation strategy is developed to mimic the oracle procedures on a continuous spatial domain. Our multiple testing procedures are asymptotically valid and can be effectively implemented using Bayesian computational algorithms for analysis of large spatial data sets. Numerical results show that the proposed procedures lead to more accurate error control and better power performance than conventional methods. We demonstrate our methods for analyzing the time trends in tropospheric ozone in eastern US. PMID:25642138
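
One common way to control the false discovery rate from Bayesian output is to threshold posterior null probabilities. The sketch below shows that generic rule for point-wise tests; it is an assumption-laden illustration rather than the authors' oracle or data-driven procedure, and the function name and example probabilities are hypothetical.

```python
import numpy as np

def bayesian_fdr_reject(post_null_prob, alpha):
    """Reject the largest set of hypotheses whose average posterior
    null probability stays at or below alpha."""
    p = np.asarray(post_null_prob, dtype=float)
    order = np.argsort(p)                               # most convincing signals first
    running_mean = np.cumsum(p[order]) / np.arange(1, p.size + 1)
    passing = np.nonzero(running_mean <= alpha)[0]
    reject = np.zeros(p.size, dtype=bool)
    if passing.size:
        reject[order[: passing[-1] + 1]] = True
    return reject

# Hypothetical posterior probabilities that each location carries no signal.
probs = [0.01, 0.02, 0.30, 0.90, 0.05, 0.60]
reject = bayesian_fdr_reject(probs, alpha=0.10)
print(reject.tolist())  # [True, True, True, False, True, False]
```

Here the four smallest probabilities average 0.095, so four locations are flagged; adding the next one (0.60) would push the average above the 0.10 target.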

  5. Constraining ecosystem processes from tower fluxes and atmospheric profiles.

    PubMed

    Hill, T C; Williams, M; Woodward, F I; Moncrieff, J B

    2011-07-01

    The planetary boundary layer (PBL) provides an important link between the scales and processes resolved by global atmospheric sampling/modeling and site-based flux measurements. The PBL is in direct contact with the land surface, both driving and responding to ecosystem processes. Measurements within the PBL (e.g., by radiosondes, aircraft profiles, and flask measurements) have a footprint, and thus an integrating scale, on the order of 1-100 km. We use the coupled atmosphere-biosphere model (CAB) and a Bayesian data assimilation framework to investigate the amount of biosphere process information that can be inferred from PBL measurements. We investigate the information content of PBL measurements in a two-stage study. First, we demonstrate consistency between the coupled model (CAB) and measurements, by comparing the model to eddy covariance flux tower measurements (i.e., water and carbon fluxes) and also PBL scalar profile measurements (i.e., water, carbon dioxide, and temperature) from Canadian boreal forest. Second, we use the CAB model in a set of Bayesian inversion experiments using synthetic data for a single day. In the synthetic experiment, leaf area and respiration were relatively well constrained, whereas surface albedo and plant hydraulic conductance were only moderately constrained. Finally, the abilities of the PBL profiles and the eddy covariance data to constrain the parameters were largely similar and only slightly lower than the combination of both observations.

  6. Local differentiation amidst extensive allele sharing in Oryza nivara and O. rufipogon

    PubMed Central

    Banaticla-Hilario, Maria Celeste N; van den Berg, Ronald G; Hamilton, Nigel Ruaraidh Sackville; McNally, Kenneth L

    2013-01-01

    Genetic variation patterns within and between species may change along geographic gradients and at different spatial scales. This was revealed by microsatellite data at 29 loci obtained from 119 accessions of three Oryza series Sativae species in Asia Pacific: Oryza nivara Sharma and Shastry, O. rufipogon Griff., and O. meridionalis Ng. Genetic similarities between O. nivara and O. rufipogon across their distribution are evident in the clustering and ordination results and in the large proportion of shared alleles between these taxa. However, local-level species separation is recognized by Bayesian clustering and neighbor-joining analyses. At the regional scale, the two species seem more differentiated in South Asia than in Southeast Asia as revealed by FST analysis. The presence of strong gene flow barriers in smaller spatial units is also suggested in the analysis of molecular variance (AMOVA) results where 64% of the genetic variation is contained among populations (as compared to 26% within populations and 10% among species). Oryza nivara (HE = 0.67) exhibits slightly lower diversity and greater population differentiation than O. rufipogon (HE = 0.70). Bayesian inference identified four, and at a finer structural level eight, genetically distinct population groups that correspond to geographic populations within the three taxa. Oryza meridionalis and the Nepalese O. nivara seemed diverged from all the population groups of the series, whereas the Australasian O. rufipogon appeared distinct from the rest of the species. PMID:24101993

  7. Bayesian Modal Estimation of the Four-Parameter Item Response Model in Real, Realistic, and Idealized Data Sets.

    PubMed

    Waller, Niels G; Feuerstahler, Leah

    2017-01-01

    In this study, we explored item and person parameter recovery of the four-parameter model (4PM) in over 24,000 real, realistic, and idealized data sets. In the first analyses, we fit the 4PM and three alternative models to data from three Minnesota Multiphasic Personality Inventory-Adolescent form factor scales using Bayesian modal estimation (BME). Our results indicated that the 4PM fits these scales better than simpler item response theory (IRT) models. Next, using the parameter estimates from these real data analyses, we estimated 4PM item parameters in 6,000 realistic data sets to establish minimum sample size requirements for accurate item and person parameter recovery. Using a factorial design that crossed discrete levels of item parameters, sample size, and test length, we also fit the 4PM to an additional 18,000 idealized data sets to extend our parameter recovery findings. Our combined results demonstrated that 4PM item parameters and parameter functions (e.g., item response functions) can be accurately estimated using BME in moderate to large samples (N ≥ 5,000) and person parameters can be accurately estimated in smaller samples (N ≥ 1,000). In the supplemental files, we report annotated [Formula: see text] code that shows how to estimate 4PM item and person parameters in [Formula: see text] (Chalmers, 2012).
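
For readers unfamiliar with the model, the four-parameter logistic item response function adds an upper asymptote to the three-parameter form. The sketch below evaluates the standard formula; the parameter values are hypothetical and the function name `irf_4pm` is made up for this example, so it should not be read as the code from the authors' supplement.

```python
import numpy as np

def irf_4pm(theta, a, b, c, d):
    """P(correct | theta) = c + (d - c) / (1 + exp(-a * (theta - b))).

    a: discrimination, b: difficulty (location),
    c: lower asymptote, d: upper asymptote.
    """
    theta = np.asarray(theta, dtype=float)
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item: at theta == b the probability is midway between c and d.
theta = np.array([-6.0, 0.5, 6.0])
probs = irf_4pm(theta, a=1.5, b=0.5, c=0.1, d=0.95)
```

Setting `d=1.0` recovers the three-parameter model, and additionally setting `c=0.0` recovers the two-parameter model.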

  8. Bayesian-Driven First-Principles Calculations for Accelerating Exploration of Fast Ion Conductors for Rechargeable Battery Application.

    PubMed

    Jalem, Randy; Kanamori, Kenta; Takeuchi, Ichiro; Nakayama, Masanobu; Yamasaki, Hisatsugu; Saito, Toshiya

    2018-04-11

    Safe and robust batteries are urgently requested today for power sources of electric vehicles. Thus, a growing interest has been noted for fabricating those with solid electrolytes. Materials search by density functional theory (DFT) methods offers great promise for finding new solid electrolytes but the evaluation is known to be computationally expensive, particularly on ion migration property. In this work, we proposed a Bayesian-optimization-driven DFT-based approach to efficiently screen for compounds with low ion migration energies ([Formula: see text]). We demonstrated this on 318 tavorite-type Li- and Na-containing compounds. We found that the scheme only requires ~30% of the total DFT-[Formula: see text] evaluations on the average to recover the optimal compound ~90% of the time. Its recovery performance for desired compounds in the tavorite search space is ~2× more than random search (i.e., for [Formula: see text] < 0.3 eV). Our approach offers a promising way for addressing computational bottlenecks in large-scale material screening for fast ionic conductors.

  9. Functional Interaction Network Construction and Analysis for Disease Discovery.

    PubMed

    Wu, Guanming; Haw, Robin

    2017-01-01

    Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, therefore providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of data via using network modules and increasing the statistical analysis power. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60% of total human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedures on how this functional interaction network is constructed by integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example on how to use ReactomeFIViz for performing network-based data analysis for a list of genes.
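
The train-then-predict step described above can be sketched with a small Bernoulli naive Bayes classifier. This is a generic illustration, not the ReactomeFIViz pipeline: the three binary features (imagined here as evidence flags for a protein pair), the training rows, and the add-one smoothing are all assumptions made for the example.

```python
import numpy as np

def train_naive_bayes(X, y, smoothing=1.0):
    """Estimate class priors and per-feature Bernoulli probabilities."""
    X, y = np.asarray(X), np.asarray(y)
    model = {}
    for label in (0, 1):
        rows = X[y == label]
        prior = (rows.shape[0] + smoothing) / (X.shape[0] + 2 * smoothing)
        feat_prob = (rows.sum(axis=0) + smoothing) / (rows.shape[0] + 2 * smoothing)
        model[label] = (prior, feat_prob)
    return model

def predict_proba(model, x):
    """Posterior probability of class 1, assuming independent features."""
    x = np.asarray(x)
    log_post = {}
    for label, (prior, feat_prob) in model.items():
        log_lik = np.sum(x * np.log(feat_prob) + (1 - x) * np.log(1 - feat_prob))
        log_post[label] = np.log(prior) + log_lik
    top = max(log_post.values())                     # subtract max for stability
    w0, w1 = np.exp(log_post[0] - top), np.exp(log_post[1] - top)
    return w1 / (w0 + w1)

# Hypothetical evidence flags for six protein pairs; y = 1 means "interacts".
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
model = train_naive_bayes(X, y)
p_all_evidence = predict_proba(model, [1, 1, 1])    # about 0.9
p_no_evidence = predict_proba(model, [0, 0, 0])     # about 0.1
```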

  10. On Bayesian Rules for Selecting 3PL Binary Items for Criterion-Referenced Interpretations and Creating Booklets for Bookmark Standard Setting.

    ERIC Educational Resources Information Center

    Huynh, Huynh

    By noting that a Rasch or two parameter logistic (2PL) item belongs to the exponential family of random variables and that the probability density function (pdf) of the correct response (X=1) and the incorrect response (X=0) are symmetric with respect to the vertical line at the item location, it is shown that the conjugate prior for ability is…

  11. A New Family of Models for the Multiple-Choice Item.

    DTIC Science & Technology

    1979-12-19

    analysis of the verbal scholastic aptitude test using Birnbaum's three-parameter logistic model. Educational and Psychological Measurement, 28, 989-1020...16. [8] McBride, J. R. Some properties of a Bayesian adaptive ability testing strategy. Applied Psychological Measurement, 1, 121-140, 1977. [9...University of Michigan Ann Arbor, MI 48106...Non Govt Non Govt 1 Dr. Earl Hunt 1 Dr. Frederick N. Lord Dept. of Psychology Educational Testing

  12. Bayesian cloud detection for MERIS, AATSR, and their combination

    NASA Astrophysics Data System (ADS)

    Hollstein, A.; Fischer, J.; Carbajal Henken, C.; Preusker, R.

    2014-11-01

    A broad range of different Bayesian cloud detection schemes is applied to measurements from the Medium Resolution Imaging Spectrometer (MERIS), the Advanced Along-Track Scanning Radiometer (AATSR), and their combination. The cloud masks were designed to be numerically efficient and suited for the processing of large amounts of data. Results from the classical and naive approach to Bayesian cloud masking are discussed for MERIS and AATSR as well as for their combination. A sensitivity study on the resolution of multidimensional histograms, which were post-processed by Gaussian smoothing, shows how theoretically insufficient amounts of truth data can be used to set up accurate classical Bayesian cloud masks. Sets of exploited features from single and derived channels are numerically optimized and results for naive and classical Bayesian cloud masks are presented. The application of the Bayesian approach is discussed in terms of reproducing existing algorithms, enhancing existing algorithms, increasing the robustness of existing algorithms, and on setting up new classification schemes based on manually classified scenes.

  13. Bayesian cloud detection for MERIS, AATSR, and their combination

    NASA Astrophysics Data System (ADS)

    Hollstein, A.; Fischer, J.; Carbajal Henken, C.; Preusker, R.

    2015-04-01

    A broad range of different Bayesian cloud detection schemes is applied to measurements from the Medium Resolution Imaging Spectrometer (MERIS), the Advanced Along-Track Scanning Radiometer (AATSR), and their combination. The cloud detection schemes were designed to be numerically efficient and suited for the processing of large numbers of data. Results from the classical and naive approach to Bayesian cloud masking are discussed for MERIS and AATSR as well as for their combination. A sensitivity study on the resolution of multidimensional histograms, which were post-processed by Gaussian smoothing, shows how theoretically insufficient numbers of truth data can be used to set up accurate classical Bayesian cloud masks. Sets of exploited features from single and derived channels are numerically optimized and results for naive and classical Bayesian cloud masks are presented. The application of the Bayesian approach is discussed in terms of reproducing existing algorithms, enhancing existing algorithms, increasing the robustness of existing algorithms, and on setting up new classification schemes based on manually classified scenes.
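
The distinction between classical and naive histogram-based masks can be sketched on synthetic data. In this illustration the classical mask uses a joint two-dimensional histogram per class, while the naive mask multiplies one-dimensional marginals. The two features, their distributions, the bin count, and the add-one smoothing (used here in place of the Gaussian smoothing mentioned in the abstract) are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic "truth data": two hypothetical features per pixel, scaled to [0, 1].
cloudy = rng.normal([0.7, 0.3], 0.1, size=(5000, 2)).clip(0, 1)
clear = rng.normal([0.3, 0.7], 0.1, size=(5000, 2)).clip(0, 1)

edges = np.linspace(0.0, 1.0, 11)  # 10 bins per feature

def joint_hist(samples):
    h, _, _ = np.histogram2d(samples[:, 0], samples[:, 1], bins=[edges, edges])
    return h + 1.0  # add-one smoothing so empty bins never give 0/0

def bin_index(value):
    idx = np.searchsorted(edges, value, side="right") - 1
    return int(np.clip(idx, 0, len(edges) - 2))

h_cloud, h_clear = joint_hist(cloudy), joint_hist(clear)

def p_cloud_classical(f1, f2):
    """Posterior from the joint histogram (equal class sizes, so equal priors)."""
    i, j = bin_index(f1), bin_index(f2)
    return h_cloud[i, j] / (h_cloud[i, j] + h_clear[i, j])

def p_cloud_naive(f1, f2):
    """Posterior assuming the two features are independent within each class."""
    i, j = bin_index(f1), bin_index(f2)
    like_cloud = h_cloud.sum(axis=1)[i] * h_cloud.sum(axis=0)[j]
    like_clear = h_clear.sum(axis=1)[i] * h_clear.sum(axis=0)[j]
    return like_cloud / (like_cloud + like_clear)
```

The joint histogram needs far more truth data to fill its bins as the number of features grows, which is the trade-off the sensitivity study in the abstract addresses.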

  14. A Bayesian test for Hardy–Weinberg equilibrium of biallelic X-chromosomal markers

    PubMed Central

    Puig, X; Ginebra, J; Graffelman, J

    2017-01-01

    The X chromosome is a relatively large chromosome, harboring a lot of genetic information. Much of the statistical analysis of X-chromosomal information is complicated by the fact that males only have one copy. Recently, frequentist statistical tests for Hardy–Weinberg equilibrium have been proposed specifically for dealing with markers on the X chromosome. Bayesian test procedures for Hardy–Weinberg equilibrium for the autosomes have been described, but Bayesian work on the X chromosome in this context is lacking. This paper gives the first Bayesian approach for testing Hardy–Weinberg equilibrium with biallelic markers at the X chromosome. Marginal and joint posterior distributions for the inbreeding coefficient in females and the male to female allele frequency ratio are computed, and used for statistical inference. The paper gives a detailed account of the proposed Bayesian test, and illustrates it with data from the 1000 Genomes project. In that implementation, a novel approach to tackle multiple testing from a Bayesian perspective through posterior predictive checks is used. PMID:28900292

  15. Nonlinear and non-Gaussian Bayesian based handwriting beautification

    NASA Astrophysics Data System (ADS)

    Shi, Cao; Xiao, Jianguo; Xu, Canhui; Jia, Wenhua

    2013-03-01

    A framework is proposed in this paper to effectively and efficiently beautify handwriting by means of a novel nonlinear and non-Gaussian Bayesian algorithm. In the proposed framework, the format and size of the handwriting image are first normalized, and then a computer typeface is applied to optimize the visual effect of the handwriting. Bayesian statistics is exploited to characterize the handwriting beautification process as a Bayesian dynamic model. The model parameters that translate, rotate and scale the computer typeface are controlled by the state equation, and the matching optimization between the handwriting and the transformed typeface is handled by the measurement equation. Finally, the new typeface, which is transformed from the original one and attains the best nonlinear and non-Gaussian optimization, is the beautification result of the handwriting. Experimental results demonstrate that the proposed framework provides a creative handwriting beautification methodology to improve visual acceptance.

  16. Municipal Sludge Application in Forests of Northern Michigan: a Case Study.

    Treesearch

    D.G. Brockway; P.V. Nguyen

    1986-01-01

    A large-scale operational demonstration and research project was cooperatively established by the U.S. Environmental Protection Agency, Michigan Department of Natural Resources, and Michigan State University to evaluate the practice of forest land application as an option for sludge utilization. Project objectives included completing (1) a logistic and economic...

  17. Probabilistic mapping of descriptive health status responses onto health state utilities using Bayesian networks: an empirical analysis converting SF-12 into EQ-5D utility index in a national US sample.

    PubMed

    Le, Quang A; Doctor, Jason N

    2011-05-01

    As quality-adjusted life years have become the standard metric in health economic evaluations, mapping health-profile or disease-specific measures onto preference-based measures to obtain quality-adjusted life years has become a solution when health utilities are not directly available. However, current mapping methods are limited due to their predictive validity, reliability, and/or other methodological issues. We employ probability theory together with a graphical model, called a Bayesian network, to convert health-profile measures into preference-based measures and to compare the results to those estimated with current mapping methods. A sample of 19,678 adults who completed both the 12-item Short Form Health Survey (SF-12v2) and EuroQoL 5D (EQ-5D) questionnaires from the 2003 Medical Expenditure Panel Survey was split into training and validation sets. Bayesian networks were constructed to explore the probabilistic relationships between each EQ-5D domain and 12 items of the SF-12v2. The EQ-5D utility scores were estimated on the basis of the predicted probability of each response level of the 5 EQ-5D domains obtained from the Bayesian inference process using the following methods: Monte Carlo simulation, expected utility, and most-likely probability. Results were then compared with current mapping methods including multinomial logistic regression, ordinary least squares, and censored least absolute deviations. The Bayesian networks consistently outperformed other mapping models in the overall sample (mean absolute error=0.077, mean square error=0.013, and R overall=0.802), in different age groups, number of chronic conditions, and ranges of the EQ-5D index. Bayesian networks provide a new robust and natural approach to map health status responses into health utility measures for health economic evaluations.

  18. eDNAoccupancy: An R package for multi-scale occupancy modeling of environmental DNA data

    USGS Publications Warehouse

    Dorazio, Robert; Erickson, Richard A.

    2017-01-01

    In this article we describe eDNAoccupancy, an R package for fitting Bayesian, multi-scale occupancy models. These models are appropriate for occupancy surveys that include three, nested levels of sampling: primary sample units within a study area, secondary sample units collected from each primary unit, and replicates of each secondary sample unit. This design is commonly used in occupancy surveys of environmental DNA (eDNA). eDNAoccupancy allows users to specify and fit multi-scale occupancy models with or without covariates, to estimate posterior summaries of occurrence and detection probabilities, and to compare different models using Bayesian model-selection criteria. We illustrate these features by analyzing two published data sets: eDNA surveys of a fungal pathogen of amphibians and eDNA surveys of an endangered fish species.

  19. Predictors of Outcome in Traumatic Brain Injury: New Insight Using Receiver Operating Curve Indices and Bayesian Network Analysis.

    PubMed

    Zador, Zsolt; Sperrin, Matthew; King, Andrew T

    2016-01-01

    Traumatic brain injury remains a global health problem. Understanding the relative importance of outcome predictors helps optimize our treatment strategies by informing assessment protocols, clinical decisions and trial designs. In this study we establish importance ranking for outcome predictors based on receiver operating indices to identify key predictors of outcome and create simple predictive models. We then explore the associations between key outcome predictors using Bayesian networks to gain further insight into predictor importance. We analyzed the corticosteroid randomization after significant head injury (CRASH) trial database of 10008 patients and included patients for whom demographics, injury characteristics, computer tomography (CT) findings and Glasgow Coma Scale (GCS) were recorded (total of 13 predictors, which would be available to clinicians within a few hours following the injury in 6945 patients). Predictions of clinical outcome (death or severe disability at 6 months) were performed using logistic regression models with 5-fold cross validation. Predictive performance was measured using standardized partial area (pAUC) under the receiver operating curve (ROC) and we used the DeLong test for comparisons. Variable importance ranking was based on pAUC targeted at specificity (pAUCSP) and sensitivity (pAUCSE) intervals of 90-100%. Probabilistic associations were depicted using Bayesian networks. Complete AUC analysis showed very good predictive power (AUC = 0.8237, 95% CI: 0.8138-0.8336) for the complete model. Specificity focused importance ranking highlighted age, pupillary, motor responses, obliteration of basal cisterns/3rd ventricle and midline shift. Interestingly when targeting model sensitivity, the highest-ranking variables were age, severe extracranial injury, verbal response, hematoma on CT and motor response. 
Simplified models, which included only these key predictors, had similar performance (pAUCSP = 0.6523, 95% CI: 0.6402-0.6641 and pAUCSE = 0.6332, 95% CI: 0.62-0.6477) compared to the complete models (pAUCSP = 0.6664, 95% CI: 0.6543-0.679, pAUCSE = 0.6436, 95% CI: 0.6289-0.6585, DeLong p value 0.1165 and 0.3448 respectively). Bayesian networks showed the predictors that did not feature in the simplified models were associated with those that did. We demonstrate that importance based variable selection allows simplified predictive models to be created while maintaining prediction accuracy. Variable selection targeting specificity confirmed key components of clinical assessment in TBI whereas sensitivity based ranking suggested extracranial injury as one of the important predictors. These results help refine our approach to head injury assessment, decision-making and outcome prediction targeted at model sensitivity and specificity. Bayesian networks proved to be a comprehensive tool for depicting probabilistic associations for key predictors giving insight into why the simplified model has maintained accuracy.

  20. A computationally efficient Bayesian sequential simulation approach for the assimilation of vast and diverse hydrogeophysical datasets

    NASA Astrophysics Data System (ADS)

    Nussbaumer, Raphaël; Gloaguen, Erwan; Mariéthoz, Grégoire; Holliger, Klaus

    2016-04-01

    Bayesian sequential simulation (BSS) is a powerful geostatistical technique, which notably has shown significant potential for the assimilation of datasets that are diverse with regard to the spatial resolution and their relationship. However, these types of applications of BSS require a large number of realizations to adequately explore the solution space and to assess the corresponding uncertainties. Moreover, such simulations generally need to be performed on very fine grids in order to adequately exploit the technique's potential for characterizing heterogeneous environments. Correspondingly, the computational cost of BSS algorithms in their classical form is very high, which so far has limited an effective application of this method to large models and/or vast datasets. In this context, it is also important to note that the inherent assumption regarding the independence of the considered datasets is generally regarded as being too strong in the context of sequential simulation. To alleviate these problems, we have revisited the classical implementation of BSS and incorporated two key features to increase the computational efficiency. The first feature is a combined quadrant spiral - superblock search, which targets run-time savings on large grids and adds flexibility with regard to the selection of neighboring points using equal directional sampling and treating hard data and previously simulated points separately. The second feature is a constant path of simulation, which enhances the efficiency for multiple realizations. We have also modified the aggregation operator to be more flexible with regard to the assumption of independence of the considered datasets. This is achieved through log-linear pooling, which essentially allows for attributing weights to the various data components. 
Finally, a multi-grid simulating path was created to enforce large-scale variance and to allow for adapting parameters, such as, for example, the log-linear weights or the type of simulation path at various scales. The newly implemented search method for kriging reduces the computational cost from an exponential dependence with regard to the grid size in the original algorithm to a linear relationship, as each neighboring search becomes independent from the grid size. For the considered examples, our results show a sevenfold reduction in run time for each additional realization when a constant simulation path is used. The traditional criticism that constant path techniques introduce a bias to the simulations was explored and our findings do indeed reveal a minor reduction in the diversity of the simulations. This bias can, however, be largely eliminated by changing the path type at different scales through the use of the multi-grid approach. Finally, we show that adapting the aggregation weight at each scale considered in our multi-grid approach allows for reproducing both the variogram and histogram, and the spatial trend of the underlying data.
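
As a rough illustration of the log-linear pooling idea mentioned above, the sketch below combines several discrete probability distributions over the same categories by a weighted product followed by renormalization. This is a generic form that omits any prior term; the function name, the weights, and the example distributions are hypothetical and are not taken from the authors' implementation.

```python
import math


def log_linear_pool(distributions, weights):
    """Pool discrete distributions as p(a) proportional to prod_i p_i(a) ** w_i.

    Assumes every probability is strictly positive and that all
    distributions are defined over the same ordered set of categories.
    """
    n_categories = len(distributions[0])
    log_pooled = [
        sum(w * math.log(dist[a]) for dist, w in zip(distributions, weights))
        for a in range(n_categories)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(log_pooled)
    unnormalized = [math.exp(v - shift) for v in log_pooled]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Two hypothetical data sources informing a two-category variable.
source_a = [0.7, 0.3]
source_b = [0.6, 0.4]
equal_trust = log_linear_pool([source_a, source_b], [1.0, 1.0])
downweight_b = log_linear_pool([source_a, source_b], [1.0, 0.25])
```

With all weights equal to one, the pooled result reduces to the normalized product that corresponds to treating the sources as independent. Weights below one soften a source's influence, which is how the independence assumption is relaxed.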

  1. Multivariate Bayesian analysis of Gaussian, right censored Gaussian, ordered categorical and binary traits using Gibbs sampling

    PubMed Central

    Korsgaard, Inge Riis; Lund, Mogens Sandø; Sorensen, Daniel; Gianola, Daniel; Madsen, Per; Jensen, Just

    2003-01-01

    A fully Bayesian analysis using Gibbs sampling and data augmentation in a multivariate model of Gaussian, right censored, and grouped Gaussian traits is described. The grouped Gaussian traits are either ordered categorical traits (with more than two categories) or binary traits, where the grouping is determined via thresholds on the underlying Gaussian scale, the liability scale. Allowances are made for unequal models, unknown covariance matrices and missing data. Having outlined the theory, strategies for implementation are reviewed. These include joint sampling of location parameters; efficient sampling from the fully conditional posterior distribution of augmented data, a multivariate truncated normal distribution; and sampling from the conditional inverse Wishart distribution, the fully conditional posterior distribution of the residual covariance matrix. Finally, a simulated dataset was analysed to illustrate the methodology. This paper concentrates on a model where residuals associated with liabilities of the binary traits are assumed to be independent. A Bayesian analysis using Gibbs sampling is outlined for the model where this assumption is relaxed. PMID:12633531

  2. A Monte Carlo–Based Bayesian Approach for Measuring Agreement in a Qualitative Scale

    PubMed Central

    Pérez Sánchez, Carlos Javier

    2014-01-01

    Agreement analysis has been an active research area whose techniques have been widely applied in psychology and other fields. However, statistical agreement among raters has been mainly considered from a classical statistics point of view. Bayesian methodology is a viable alternative that allows the inclusion of subjective initial information coming from expert opinions, personal judgments, or historical data. A Bayesian approach is proposed by providing a unified Monte Carlo–based framework to estimate all types of measures of agreement in a qualitative scale of response. The approach is conceptually simple and it has a low computational cost. Both informative and non-informative scenarios are considered. In case no initial information is available, the results are in line with the classical methodology, while providing more information on the measures of agreement. For the informative case, some guidelines are presented to elicit the prior distribution. The approach has been applied to two applications related to schizophrenia diagnosis and sensory analysis. PMID:29881002

  3. Large-scale monitoring of shorebird populations using count data and N-mixture models: Black Oystercatcher (Haematopus bachmani) surveys by land and sea

    USGS Publications Warehouse

    Lyons, James E.; Royle, J. Andrew; Thomas, Susan M.; Elliott-Smith, Elise; Evenson, Joseph R.; Kelly, Elizabeth G.; Milner, Ruth L.; Nysewander, David R.; Andres, Brad A.

    2012-01-01

    Large-scale monitoring of bird populations is often based on count data collected across spatial scales that may include multiple physiographic regions and habitat types. Monitoring at large spatial scales may require multiple survey platforms (e.g., from boats and land when monitoring coastal species) and multiple survey methods. It becomes especially important to explicitly account for detection probability when analyzing count data that have been collected using multiple survey platforms or methods. We evaluated a new analytical framework, N-mixture models, to estimate actual abundance while accounting for multiple detection biases. During May 2006, we made repeated counts of Black Oystercatchers (Haematopus bachmani) from boats in the Puget Sound area of Washington (n = 55 sites) and from land along the coast of Oregon (n = 56 sites). We used a Bayesian analysis of N-mixture models to (1) assess detection probability as a function of environmental and survey covariates and (2) estimate total Black Oystercatcher abundance during the breeding season in the two regions. Probability of detecting individuals during boat-based surveys was 0.75 (95% credible interval: 0.42–0.91) and was not influenced by tidal stage. Detection probability from surveys conducted on foot was 0.68 (0.39–0.90); the latter was not influenced by fog, wind, or number of observers but was ~35% lower during rain. The estimated population size was 321 birds (262–511) in Washington and 311 (276–382) in Oregon. N-mixture models provide a flexible framework for modeling count data and covariates in large-scale bird monitoring programs designed to understand population change.
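
To make the N-mixture structure concrete, here is a minimal sketch of the site-level likelihood under the standard binomial-Poisson formulation: latent abundance N at a site is Poisson with mean lam, and each repeated count is Binomial(N, p). The abstract does not give the exact model specification or covariate structure, so this formulation, the function names, and the truncation bound n_max are assumptions. The study itself used a Bayesian analysis, whereas the sketch shows only the likelihood that such an analysis would build on.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability of n events with mean lam (computed in log space)."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def binomial_pmf(y, n, p):
    """Probability of detecting y of n individuals with detection probability p."""
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)


def site_likelihood(counts, lam, p, n_max=200):
    """Marginal likelihood of repeated counts at one site.

    Sums over the unknown abundance N, from the largest observed count
    up to n_max (a truncation bound assumed large enough to be harmless).
    """
    total = 0.0
    for n in range(max(counts), n_max + 1):
        detection = 1.0
        for y in counts:
            detection *= binomial_pmf(y, n, p)
        total += poisson_pmf(n, lam) * detection
    return total


# Hypothetical repeated counts at a single site.
example = site_likelihood([3, 4, 2], lam=5.0, p=0.7)
```

Repeated counts at the same site are what allow abundance and detection probability to be separated; with a single visit, only their product is identifiable.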

  4. Simulating large-scale crop yield by using perturbed-parameter ensemble method

    NASA Astrophysics Data System (ADS)

    Iizumi, T.; Yokozawa, M.; Sakurai, G.; Nishimori, M.

    2010-12-01

    Toshichika Iizumi, Masayuki Yokozawa, Gen Sakurai, Motoki Nishimori (Agro-Meteorology Division, National Institute for Agro-Environmental Sciences, Japan). One of the pressing issues for food security under a changing climate is predicting the inter-annual variation of crop production induced by climate extremes and modulated climate. To secure the food supply for a growing world population, a methodology that can accurately predict crop yield on a large scale is needed. However, in developing a process-based large-scale crop model at the scale of general circulation models (GCMs), 100 km in latitude and longitude, researchers encounter difficulties arising from the spatial heterogeneity of available information on crop production, such as cultivated cultivars and management. This study proposed an ensemble-based simulation method that uses a process-based crop model and a systematic parameter perturbation procedure, taking maize in the U.S., China, and Brazil as examples. The crop model was developed by modifying the fundamental structure of the Soil and Water Assessment Tool (SWAT) to incorporate the effect of heat stress on yield. We called the new model PRYSBI: the Process-based Regional-scale Yield Simulator with Bayesian Inference. The posterior probability density function (PDF) of 17 parameters, which represents the crop- and grid-specific features of the crop and its uncertainty under given data, was estimated by Bayesian inversion analysis. We then took 1500 ensemble members of simulated yield values based on the parameter sets sampled from the posterior PDF to describe yearly changes of the yield, i.e., the perturbed-parameter ensemble method. The ensemble median for 27 years (1980-2006) was compared with the data aggregated from the county yield. On a country scale, the ensemble median of the simulated yield showed a good correspondence with the reported yield: the Pearson's correlation coefficient is over 0.6 for all countries. 
On a grid scale, the correspondence remained high in most grids regardless of country. However, the model showed comparatively low reproducibility in the slope areas, such as around the Rocky Mountains in South Dakota, around the Great Xing'anling Mountains in Heilongjiang, and around the Brazilian Plateau. Because local climate conditions vary widely in complex terrain such as mountain slopes, the GCM grid-scale weather inputs are likely one of the major sources of error. The results of this study highlight the benefits of the perturbed-parameter ensemble method in simulating crop yield on a GCM grid scale: (1) the posterior PDF of the parameters can quantify the uncertainty in the parameter values of the crop model associated with local crop production aspects; (2) the method can explicitly account for the uncertainty in the parameter values in the crop model simulations; (3) the method achieves a Monte Carlo approximation of the probability of sub-grid scale yield, accounting for the nonlinear response of crop yield to weather and management; (4) the method is therefore appropriate for aggregating the simulated sub-grid scale yields to a grid-scale yield, which may be one reason for the model's high performance in capturing inter-annual variation of yield.

  5. Light-sheet Bayesian microscopy enables deep-cell super-resolution imaging of heterochromatin in live human embryonic stem cells.

    PubMed

    Hu, Ying S; Zhu, Quan; Elkins, Keri; Tse, Kevin; Li, Yu; Fitzpatrick, James A J; Verma, Inder M; Cang, Hu

    2013-01-01

    Heterochromatin in the nucleus of human embryonic cells plays an important role in the epigenetic regulation of gene expression. The architecture of heterochromatin and its dynamic organization remain elusive because of the lack of fast and high-resolution deep-cell imaging tools. We enable this task by advancing instrumental and algorithmic implementation of the localization-based super-resolution technique. We present light-sheet Bayesian super-resolution microscopy (LSBM). We adapt light-sheet illumination for super-resolution imaging by using a novel prism-coupled condenser design to illuminate a thin slice of the nucleus with high signal-to-noise ratio. Coupled with a Bayesian algorithm that resolves overlapping fluorophores from high-density areas, we show, for the first time, nanoscopic features of the heterochromatin structure in both fixed and live human embryonic stem cells. The enhanced temporal resolution allows capturing the dynamic change of heterochromatin with a lateral resolution of 50-60 nm on a time scale of 2.3 s. Light-sheet Bayesian microscopy opens up broad new possibilities of probing nanometer-scale nuclear structures and real-time sub-cellular processes and other previously difficult-to-access intracellular regions of living cells at the single-molecule, and single cell level.

  6. Light-sheet Bayesian microscopy enables deep-cell super-resolution imaging of heterochromatin in live human embryonic stem cells

    PubMed Central

    Hu, Ying S; Zhu, Quan; Elkins, Keri; Tse, Kevin; Li, Yu; Fitzpatrick, James A J; Verma, Inder M; Cang, Hu

    2016-01-01

    Background Heterochromatin in the nucleus of human embryonic cells plays an important role in the epigenetic regulation of gene expression. The architecture of heterochromatin and its dynamic organization remain elusive because of the lack of fast and high-resolution deep-cell imaging tools. We enable this task by advancing instrumental and algorithmic implementation of the localization-based super-resolution technique. Results We present light-sheet Bayesian super-resolution microscopy (LSBM). We adapt light-sheet illumination for super-resolution imaging by using a novel prism-coupled condenser design to illuminate a thin slice of the nucleus with high signal-to-noise ratio. Coupled with a Bayesian algorithm that resolves overlapping fluorophores from high-density areas, we show, for the first time, nanoscopic features of the heterochromatin structure in both fixed and live human embryonic stem cells. The enhanced temporal resolution allows capturing the dynamic change of heterochromatin with a lateral resolution of 50–60 nm on a time scale of 2.3 s. Conclusion Light-sheet Bayesian microscopy opens up broad new possibilities of probing nanometer-scale nuclear structures and real-time sub-cellular processes and other previously difficult-to-access intracellular regions of living cells at the single-molecule, and single cell level. PMID:27795878

  7. Estimating age from recapture data: integrating incremental growth measures with ancillary data to infer age-at-length

    USGS Publications Warehouse

    Eaton, Mitchell J.; Link, William A.

    2011-01-01

    Estimating the age of individuals in wild populations can be of fundamental importance for answering ecological questions, modeling population demographics, and managing exploited or threatened species. Significant effort has been devoted to determining age through the use of growth annuli, secondary physical characteristics related to age, and growth models. Many species, however, either do not exhibit physical characteristics useful for independent age validation or are too rare to justify sacrificing a large number of individuals to establish the relationship between size and age. Length-at-age models are well represented in the fisheries and other wildlife management literature. Many of these models overlook variation in growth rates of individuals and consider growth parameters as population parameters. More recent models have taken advantage of hierarchical structuring of parameters and Bayesian inference methods to allow for variation among individuals as functions of environmental covariates or individual-specific random effects. Here, we describe hierarchical models in which growth curves vary as individual-specific stochastic processes, and we show how these models can be fit using capture–recapture data for animals of unknown age along with data for animals of known age. We combine these independent data sources in a Bayesian analysis, distinguishing natural variation (among and within individuals) from measurement error. We illustrate using data for African dwarf crocodiles, comparing von Bertalanffy and logistic growth models. The analysis provides the means of predicting crocodile age, given a single measurement of head length. The von Bertalanffy was much better supported than the logistic growth model and predicted that dwarf crocodiles grow from 19.4 cm total length at birth to 32.9 cm in the first year and 45.3 cm by the end of their second year. 
Based on the minimum size of females observed with hatchlings, reproductive maturity was estimated to be at nine years. These size benchmarks are believed to represent thresholds for important demographic parameters; improved estimates of age, therefore, will increase the precision of population projection models. The modeling approach that we present can be applied to other species and offers significant advantages when multiple sources of data are available and traditional aging techniques are not practical.
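
The two growth curves compared above can be written down in a few lines. The sketch below uses the common parameterizations in terms of length at birth, asymptotic length, and a rate constant, and it inverts the von Bertalanffy curve to predict age from a single length measurement. The hierarchical, individual-specific structure of the published model is not reproduced, and the parameter values are hypothetical placeholders rather than the paper's estimates.

```python
import math


def von_bertalanffy(age, l_inf, l0, k):
    """Length at a given age: growth decelerates steadily toward l_inf."""
    return l_inf - (l_inf - l0) * math.exp(-k * age)


def logistic_growth(age, l_inf, l0, k):
    """Length at a given age under a sigmoidal (logistic) curve."""
    return l_inf / (1 + (l_inf / l0 - 1) * math.exp(-k * age))


def age_from_length_vb(length, l_inf, l0, k):
    """Invert the von Bertalanffy curve to obtain age from length."""
    if not l0 <= length < l_inf:
        raise ValueError("length must lie in [l0, l_inf)")
    return -math.log((l_inf - length) / (l_inf - l0)) / k


# Hypothetical parameters (cm, cm, 1/year), chosen only for illustration.
L_INF, L0, K = 180.0, 20.0, 0.09
length_at_5 = von_bertalanffy(5.0, L_INF, L0, K)
recovered_age = age_from_length_vb(length_at_5, L_INF, L0, K)
```

Both curves start at l0 and approach l_inf; they differ in that the logistic curve has an inflection point, whereas von Bertalanffy growth is fastest at birth.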

  8. Pretense, Counterfactuals, and Bayesian Causal Models: Why What Is Not Real Really Matters

    ERIC Educational Resources Information Center

    Weisberg, Deena S.; Gopnik, Alison

    2013-01-01

    Young children spend a large portion of their time pretending about non-real situations. Why? We answer this question by using the framework of Bayesian causal models to argue that pretending and counterfactual reasoning engage the same component cognitive abilities: disengaging with current reality, making inferences about an alternative…

  9. Evidence of major genes affecting stress response in rainbow trout using Bayesian methods of complex segregation analysis

    USDA-ARS's Scientific Manuscript database

    As a first step towards the genetic mapping of quantitative trait loci (QTL) affecting stress response variation in rainbow trout, we performed complex segregation analyses (CSA) fitting mixed inheritance models of plasma cortisol using Bayesian methods in large full-sib families of rainbow trout. ...

  10. Bayesian Models for Astrophysical Data Using R, JAGS, Python, and Stan

    NASA Astrophysics Data System (ADS)

    Hilbe, Joseph M.; de Souza, Rafael S.; Ishida, Emille E. O.

    2017-05-01

    This comprehensive guide to Bayesian methods in astronomy enables hands-on work by supplying complete R, JAGS, Python, and Stan code, to use directly or to adapt. It begins by examining the normal model from both frequentist and Bayesian perspectives and then progresses to a full range of Bayesian generalized linear and mixed or hierarchical models, as well as additional types of models such as ABC and INLA. The book provides code that is largely unavailable elsewhere and includes details on interpreting and evaluating Bayesian models. Initial discussions offer models in synthetic form so that readers can easily adapt them to their own data; later the models are applied to real astronomical data. The consistent focus is on hands-on modeling, analysis of data, and interpretations that address scientific questions. A must-have for astronomers, its concrete approach will also be attractive to researchers in the sciences more generally.

  11. An Intuitive Dashboard for Bayesian Network Inference

    NASA Astrophysics Data System (ADS)

    Reddy, Vikas; Charisse Farr, Anna; Wu, Paul; Mengersen, Kerrie; Yarlagadda, Prasad K. D. V.

    2014-03-01

    Current Bayesian network software packages provide good graphical interfaces for users who design and develop Bayesian networks for various applications. However, the intended end-users of these networks may not necessarily find such an interface appealing, and at times it could be overwhelming, particularly when the number of nodes in the network is large. To circumvent this problem, this paper presents an intuitive dashboard, which provides an additional layer of abstraction, enabling the end-users to easily perform inferences over the Bayesian networks. Unlike most software packages, which display the nodes and arcs of the network, the developed tool organises the nodes based on the cause-and-effect relationship, making the user interaction more intuitive and friendly. In addition to performing various types of inferences, the users can conveniently use the tool to verify the behaviour of the developed Bayesian network. The tool has been developed using the Qt and SMILE libraries in C++.

  12. When mechanism matters: Bayesian forecasting using models of ecological diffusion

    USGS Publications Warehouse

    Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.

    2017-01-01

    Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.

  13. Metis: A Pure Metropolis Markov Chain Monte Carlo Bayesian Inference Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bates, Cameron Russell; Mckigney, Edward Allen

    The use of Bayesian inference in data analysis has become the standard for large scientific experiments [1, 2]. The Monte Carlo Codes Group (XCP-3) at Los Alamos has developed a simple set of algorithms currently implemented in C++ and Python to easily perform flat-prior Markov Chain Monte Carlo Bayesian inference with pure Metropolis sampling. These implementations are designed to be user friendly and extensible for customization based on specific application requirements. This document describes the algorithmic choices made and presents two use cases.
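
For readers unfamiliar with the technique, the following is a minimal sketch of flat-prior inference with pure Metropolis sampling. It is not the Metis API: the function names, the Gaussian random-walk proposal, and the toy data are all assumptions made for illustration. With a flat prior the log posterior equals the log likelihood up to a constant, and with a symmetric proposal the acceptance ratio is simply the posterior ratio.

```python
import math
import random


def metropolis(log_post, x0, step, n_steps, rng):
    """Random-walk Metropolis sampler for a one-dimensional parameter."""
    x = x0
    lp = log_post(x)
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)   # symmetric proposal
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        chain.append(x)
    return chain


# Toy problem: infer the mean of Gaussian data with known sigma, flat prior.
DATA = [4.8, 5.1, 5.3, 4.9, 5.4]
SIGMA = 0.5


def log_likelihood(mu):
    """Gaussian log likelihood up to an additive constant."""
    return -sum((y - mu) ** 2 for y in DATA) / (2 * SIGMA ** 2)


rng = random.Random(0)                        # fixed seed for repeatability
samples = metropolis(log_likelihood, x0=0.0, step=0.3, n_steps=20000, rng=rng)
kept = samples[2000:]                         # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

For this toy problem the posterior is Gaussian and centered on the sample mean of the data, so the chain average can be checked against that value.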

  14. Selectivity curves of the capture of mangrove crab (Ucides cordatus) on the northern coast of Brazil using bayesian inference.

    PubMed

    Furtado-Junior, I; Abrunhosa, F A; Holanda, F C A F; Tavares, M C S

    2016-06-01

    Fishing selectivity of the mangrove crab Ucides cordatus on the north coast of Brazil can be defined as the fisherman's ability to capture and select individuals of a certain size or sex (or a combination of these factors), which suggests an empirical selectivity. Considering this hypothesis, we calculated the selectivity curves for male and female crabs using the logit function of the logistic model in the formulation. The Bayesian inference consisted of obtaining the posterior distribution by applying the Markov chain Monte Carlo (MCMC) method in R using the OpenBUGS, BRugs, and R2WinBUGS libraries. The estimated average carapace width at selection for males and females, compared with previous studies reporting the average carapace width at sexual maturity, allows us to confirm the hypothesis that most mature individuals do not suffer from fishing pressure, thus ensuring their sustainability.

  15. Comparing centralised and decentralised anaerobic digestion of stillage from a large-scale bioethanol plant to animal feed production.

    PubMed

    Drosg, B; Wirthensohn, T; Konrad, G; Hornbachner, D; Resch, C; Wäger, F; Loderer, C; Waltenberger, R; Kirchmayr, R; Braun, R

    2008-01-01

    A comparison of stillage treatment options for large-scale bioethanol plants was based on the data of an existing plant producing approximately 200,000 t/yr of bioethanol and 1,400,000 t/yr of stillage. Animal feed production--the state-of-the-art technology at the plant--was compared to anaerobic digestion. The latter was simulated in two different scenarios: digestion in small-scale biogas plants in the surrounding area versus digestion in a large-scale biogas plant at the bioethanol production site. Emphasis was placed on a holistic simulation balancing chemical parameters and calculating logistic algorithms to compare the efficiency of the stillage treatment solutions. For central anaerobic digestion, different digestate handling solutions were considered because of the large amount of digestate. For land application, a minimum of 36,000 ha of available agricultural area and 600,000 m³ of storage volume would be needed. Second, membrane purification of the digestate, consisting of decanter, microfiltration, and reverse osmosis, was investigated. As a third option, aerobic wastewater treatment of the digestate was discussed. The final outcome was an economic evaluation of the three mentioned stillage treatment options, as a guide to stillage management for operators of large-scale bioethanol plants. Copyright IWA Publishing 2008.

  16. Efficient Implementation of MrBayes on Multi-GPU

    PubMed Central

    Zhou, Jianfu; Liu, Xiaoguang; Wang, Gang

    2013-01-01

    MrBayes, which uses Metropolis-coupled Markov chain Monte Carlo (MCMCMC or (MC)3), is a popular program for Bayesian inference. Although (MC)3 is a leading method for inferring phylogeny from DNA data, the (MC)3 Bayesian algorithm and its improved and parallel versions are not yet fast enough for biologists to analyze massive real-world DNA data. Recently, the graphics processing unit (GPU) has shown its power as a coprocessor (or rather, an accelerator) in many fields. This article describes an efficient implementation, a(MC)3 (aMCMCMC), of MrBayes (MC)3 on the Compute Unified Device Architecture (CUDA). By dynamically adjusting the task granularity to adapt to input data size and hardware configuration, it makes full use of GPU cores with different data sets. An adaptive method is also developed to split and combine DNA sequences to make full use of a large number of GPU cards. Furthermore, a new “node-by-node” task scheduling strategy is developed to improve concurrency, and several optimizing methods are used to reduce extra overhead. Experimental results show that a(MC)3 achieves up to 63× speedup over serial MrBayes on a single machine with one GPU card, up to 170× speedup with four GPU cards, and up to 478× speedup with a 32-node GPU cluster. a(MC)3 is dramatically faster than all the previous (MC)3 algorithms and scales well to large GPU clusters. PMID:23493260
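
The coupling step that gives Metropolis-coupled MCMC its name can be sketched briefly. Several chains run in parallel, each targeting the posterior raised to a power beta (beta = 1 for the cold chain, smaller values for heated chains), and pairs of chains periodically propose to swap states. The sketch below shows the standard swap acceptance probability and an incremental heating schedule of the form 1 / (1 + delta * i). The function names and the delta value are assumptions for illustration, and nothing here reflects the GPU implementation described in the abstract.

```python
import math


def heating(chain_index, delta=0.1):
    """Inverse temperature of a chain; index 0 is the cold chain (beta = 1)."""
    return 1.0 / (1.0 + delta * chain_index)


def swap_acceptance(log_post_i, log_post_j, beta_i, beta_j):
    """Probability of accepting a state swap between chains i and j.

    The ratio of joint densities after and before the swap simplifies to
    (pi(x_j) / pi(x_i)) ** (beta_i - beta_j), evaluated here in log space.
    """
    log_ratio = (beta_i - beta_j) * (log_post_j - log_post_i)
    return math.exp(min(0.0, log_ratio))


# A heated chain holding a better state than the cold chain always swaps.
always = swap_acceptance(log_post_i=-10.0, log_post_j=-8.0,
                         beta_i=heating(0), beta_j=heating(5))
# The reverse situation is accepted only some of the time.
sometimes = swap_acceptance(log_post_i=-8.0, log_post_j=-10.0,
                            beta_i=heating(0), beta_j=heating(5))
```

Heated chains see a flattened posterior and cross valleys more easily; swaps let the cold chain, which is the only one sampled for inference, inherit those moves.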

  17. Efficient implementation of MrBayes on multi-GPU.

    PubMed

    Bao, Jie; Xia, Hongju; Zhou, Jianfu; Liu, Xiaoguang; Wang, Gang

    2013-06-01

    MrBayes, which uses Metropolis-coupled Markov chain Monte Carlo (MCMCMC or (MC)(3)), is a popular program for Bayesian inference. Although (MC)(3) is a leading method for inferring phylogeny from DNA data, the (MC)(3) Bayesian algorithm and its improved and parallel versions are not yet fast enough for biologists to analyze massive real-world DNA data. Recently, the graphics processing unit (GPU) has shown its power as a coprocessor (or rather, an accelerator) in many fields. This article describes an efficient implementation, a(MC)(3) (aMCMCMC), of MrBayes (MC)(3) on the Compute Unified Device Architecture (CUDA). By dynamically adjusting the task granularity to adapt to input data size and hardware configuration, it makes full use of GPU cores with different data sets. An adaptive method is also developed to split and combine DNA sequences to make full use of a large number of GPU cards. Furthermore, a new "node-by-node" task scheduling strategy is developed to improve concurrency, and several optimizing methods are used to reduce extra overhead. Experimental results show that a(MC)(3) achieves up to 63× speedup over serial MrBayes on a single machine with one GPU card, up to 170× speedup with four GPU cards, and up to 478× speedup with a 32-node GPU cluster. a(MC)(3) is dramatically faster than all the previous (MC)(3) algorithms and scales well to large GPU clusters.

  18. Uncertainty aggregation and reduction in structure-material performance prediction

    NASA Astrophysics Data System (ADS)

    Hu, Zhen; Mahadevan, Sankaran; Ao, Dan

    2018-02-01

    An uncertainty aggregation and reduction framework is presented for structure-material performance prediction. Different types of uncertainty sources, structural analysis model, and material performance prediction model are connected through a Bayesian network for systematic uncertainty aggregation analysis. To reduce the uncertainty in the computational structure-material performance prediction model, Bayesian updating using experimental observation data is investigated based on the Bayesian network. It is observed that the Bayesian updating results will have large error if the model cannot accurately represent the actual physics, and that this error will be propagated to the predicted performance distribution. To address this issue, this paper proposes a novel uncertainty reduction method by integrating Bayesian calibration with model validation adaptively. The observation domain of the quantity of interest is first discretized into multiple segments. An adaptive algorithm is then developed to perform model validation and Bayesian updating over these observation segments sequentially. Only information from observation segments where the model prediction is highly reliable is used for Bayesian updating; this is found to increase the effectiveness and efficiency of uncertainty reduction. A composite rotorcraft hub component fatigue life prediction model, which combines a finite element structural analysis model and a material damage model, is used to demonstrate the proposed method.

  19. Comparison between the basic least squares and the Bayesian approach for elastic constants identification

    NASA Astrophysics Data System (ADS)

    Gogu, C.; Haftka, R.; LeRiche, R.; Molimard, J.; Vautrin, A.; Sankar, B.

    2008-11-01

    The basic formulation of the least squares method, based on the L2 norm of the misfit, is still widely used today for identifying elastic material properties from experimental data. An alternative statistical approach is the Bayesian method. We seek here situations with significant difference between the material properties found by the two methods. For a simple three bar truss example we illustrate three such situations in which the Bayesian approach leads to more accurate results: different magnitude of the measurements, different uncertainty in the measurements and correlation among measurements. When all three effects add up, the Bayesian approach can have a large advantage. We then compared the two methods for identification of elastic constants from plate vibration natural frequencies.

  20. Relationships between probabilistic Boolean networks and dynamic Bayesian networks as models of gene regulatory networks

    PubMed Central

    Lähdesmäki, Harri; Hautaniemi, Sampsa; Shmulevich, Ilya; Yli-Harja, Olli

    2006-01-01

    A significant amount of attention has recently been focused on modeling of gene regulatory networks. Two frequently used large-scale modeling frameworks are Bayesian networks (BNs) and Boolean networks, the latter being a special case of its recent stochastic extension, probabilistic Boolean networks (PBNs). PBNs are a promising model class that generalizes the standard rule-based interactions of Boolean networks into the stochastic setting. Dynamic Bayesian networks (DBNs) are a general and versatile model class that is able to represent complex temporal stochastic processes and has also been proposed as a model for gene regulatory systems. In this paper, we concentrate on these two model classes and demonstrate that PBNs and a certain subclass of DBNs can represent the same joint probability distribution over their common variables. The major benefit of introducing the relationships between the models is that it opens up the possibility of applying the standard tools of DBNs to PBNs and vice versa. Hence, the standard learning tools of DBNs can be applied in the context of PBNs, and the inference methods give a natural way of handling the missing values in PBNs, which are often present in gene expression measurements. Conversely, the tools for controlling the stationary behavior of the networks, tools for projecting networks onto sub-networks, and efficient learning schemes can be used for DBNs. In other words, the introduced relationships between the models extend the collection of analysis tools for both model classes. PMID:17415411

  1. A flexible Bayesian hierarchical model of preterm birth risk among US Hispanic subgroups in relation to maternal nativity and education

    PubMed Central

    2011-01-01

    Background: Previous research has documented heterogeneity in the effects of maternal education on adverse birth outcomes by nativity and Hispanic subgroup in the United States. In this article, we considered the risk of preterm birth (PTB) using 9 years of vital statistics birth data from New York City. We employed finer categorizations of exposure than used previously and estimated the risk dose-response across the range of education by nativity and ethnicity. Methods: Using Bayesian random effects logistic regression models with restricted quadratic spline terms for years of completed maternal education, we calculated and plotted the estimated posterior probabilities of PTB (gestational age < 37 weeks) for each year of education by ethnic and nativity subgroups adjusted for only maternal age, as well as with more extensive covariate adjustments. We then estimated the posterior risk difference between native and foreign born mothers by ethnicity over the continuous range of education exposures. Results: The risk of PTB varied substantially by education, nativity and ethnicity. Native born groups showed higher absolute risk of PTB and declining risk associated with higher levels of education beyond about 10 years, as did foreign-born Puerto Ricans. For most other foreign born groups, however, risk of PTB was flatter across the education range. For Mexicans, Central Americans, Dominicans, South Americans and "Others", the protective effect of foreign birth diminished progressively across the educational range. Only for Puerto Ricans was there no nativity advantage for the foreign born, although small numbers of foreign born Cubans limited precision of estimates for that group. Conclusions: Using flexible Bayesian regression models with random effects allowed us to estimate absolute risks without strong modeling assumptions. Risk comparisons for any sub-groups at any exposure level were simple to calculate. Shrinkage of posterior estimates through the use of random effects allowed for finer categorization of exposures without restricting joint effects to follow a fixed parametric scale. Although foreign born Hispanic women with the least education appeared to generally have low risk, this seems likely to be a marker for unmeasured environmental and behavioral factors, rather than a causally protective effect of low education itself. PMID:21504612

  2. A flexible Bayesian hierarchical model of preterm birth risk among US Hispanic subgroups in relation to maternal nativity and education.

    PubMed

    Kaufman, Jay S; MacLehose, Richard F; Torrone, Elizabeth A; Savitz, David A

    2011-04-19

    Previous research has documented heterogeneity in the effects of maternal education on adverse birth outcomes by nativity and Hispanic subgroup in the United States. In this article, we considered the risk of preterm birth (PTB) using 9 years of vital statistics birth data from New York City. We employed finer categorizations of exposure than used previously and estimated the risk dose-response across the range of education by nativity and ethnicity. Using Bayesian random effects logistic regression models with restricted quadratic spline terms for years of completed maternal education, we calculated and plotted the estimated posterior probabilities of PTB (gestational age < 37 weeks) for each year of education by ethnic and nativity subgroups adjusted for only maternal age, as well as with more extensive covariate adjustments. We then estimated the posterior risk difference between native and foreign born mothers by ethnicity over the continuous range of education exposures. The risk of PTB varied substantially by education, nativity and ethnicity. Native born groups showed higher absolute risk of PTB and declining risk associated with higher levels of education beyond about 10 years, as did foreign-born Puerto Ricans. For most other foreign born groups, however, risk of PTB was flatter across the education range. For Mexicans, Central Americans, Dominicans, South Americans and "Others", the protective effect of foreign birth diminished progressively across the educational range. Only for Puerto Ricans was there no nativity advantage for the foreign born, although small numbers of foreign born Cubans limited precision of estimates for that group. Using flexible Bayesian regression models with random effects allowed us to estimate absolute risks without strong modeling assumptions. Risk comparisons for any sub-groups at any exposure level were simple to calculate. Shrinkage of posterior estimates through the use of random effects allowed for finer categorization of exposures without restricting joint effects to follow a fixed parametric scale. Although foreign born Hispanic women with the least education appeared to generally have low risk, this seems likely to be a marker for unmeasured environmental and behavioral factors, rather than a causally protective effect of low education itself.

  3. Constructing Model of Relationship among Behaviors and Injuries to Products Based on Large Scale Text Data on Injuries

    NASA Astrophysics Data System (ADS)

    Nomori, Koji; Kitamura, Koji; Motomura, Yoichi; Nishida, Yoshifumi; Yamanaka, Tatsuhiro; Komatsubara, Akinori

    In Japan, childhood injury prevention is an urgent issue. Safety measures based on knowledge created from injury data are essential for preventing childhood injuries. The injury prevention approach based on product modification is especially important. Risk assessment is one of the most fundamental methods for designing safe products. Conventional risk assessment has been carried out subjectively because product makers have poor data on injuries. This paper deals with evidence-based risk assessment, in which artificial intelligence technologies are strongly needed. This paper describes a new method of foreseeing usage of products, which is the first step of the evidence-based risk assessment, and presents a retrieval system of injury data. The system enables a product designer to foresee how children use a product and which types of injuries occur due to the product in the daily environment. The developed system consists of large scale injury data, text mining technology and probabilistic modeling technology. Large scale text data on childhood injuries was collected from medical institutions by an injury surveillance system. Types of behaviors toward a product were derived from the injury text data using text mining technology. The relationship among products, types of behaviors, types of injuries and characteristics of children was modeled by a Bayesian network. The fundamental functions of the developed system and examples of new findings obtained by the system are reported in this paper.

  4. Bayesian just-so stories in psychology and neuroscience.

    PubMed

    Bowers, Jeffrey S; Davis, Colin J

    2012-05-01

    According to Bayesian theories in psychology and neuroscience, minds and brains are (near) optimal in solving a wide range of tasks. We challenge this view and argue that more traditional, non-Bayesian approaches are more promising. We make 3 main arguments. First, we show that the empirical evidence for Bayesian theories in psychology is weak. This weakness relates to the many arbitrary ways that priors, likelihoods, and utility functions can be altered in order to account for the data that are obtained, making the models unfalsifiable. It further relates to the fact that Bayesian theories are rarely better at predicting data compared with alternative (and simpler) non-Bayesian theories. Second, we show that the empirical evidence for Bayesian theories in neuroscience is weaker still. There are impressive mathematical analyses showing how populations of neurons could compute in a Bayesian manner but little or no evidence that they do. Third, we challenge the general scientific approach that characterizes Bayesian theorizing in cognitive science. A common premise is that theories in psychology should largely be constrained by a rational analysis of what the mind ought to do. We question this claim and argue that many of the important constraints come from biological, evolutionary, and processing (algorithmic) considerations that have no adaptive relevance to the problem per se. In our view, these factors have contributed to the development of many Bayesian "just so" stories in psychology and neuroscience; that is, mathematical analyses of cognition that can be used to explain almost any behavior as optimal. 2012 APA, all rights reserved.

  5. Comparing spatially varying coefficient models: a case study examining violent crime rates and their relationships to alcohol outlets and illegal drug arrests

    NASA Astrophysics Data System (ADS)

    Wheeler, David C.; Waller, Lance A.

    2009-03-01

    In this paper, we compare and contrast a Bayesian spatially varying coefficient process (SVCP) model with a geographically weighted regression (GWR) model for the estimation of the potentially spatially varying regression effects of alcohol outlets and illegal drug activity on violent crime in Houston, Texas. In addition, we focus on the inherent coefficient shrinkage properties of the Bayesian SVCP model as a way to address increased coefficient variance that follows from collinearity in GWR models. We outline the advantages of the Bayesian model in terms of reducing inflated coefficient variance, enhanced model flexibility, and more formal measuring of model uncertainty for prediction. We find spatially varying effects for alcohol outlets and drug violations, but the amount of variation depends on the type of model used. For the Bayesian model, this variation is controllable through the amount of prior influence placed on the variance of the coefficients. For example, the spatial pattern of coefficients is similar for the GWR and Bayesian models when a relatively large prior variance is used in the Bayesian model.

  6. Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization

    ERIC Educational Resources Information Center

    Gelman, Andrew; Lee, Daniel; Guo, Jiqiang

    2015-01-01

    Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models and can be called from the command line, R, Python, Matlab, or Julia and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…

  7. Surface shape analysis with an application to brain surface asymmetry in schizophrenia.

    PubMed

    Brignell, Christopher J; Dryden, Ian L; Gattone, S Antonio; Park, Bert; Leask, Stuart; Browne, William J; Flynn, Sean

    2010-10-01

    Some methods for the statistical analysis of surface shapes and asymmetry are introduced. We focus on a case study where magnetic resonance images of the brain are available from groups of 30 schizophrenia patients and 38 controls, and we investigate large-scale brain surface shape differences. Key aspects of shape analysis are to remove nuisance transformations by registration and to identify which parts of one object correspond with the parts of another object. We introduce maximum likelihood and Bayesian methods for registering brain images and providing large-scale correspondences of the brain surfaces. Brain surface size-and-shape analysis is considered using random field theory, and also dimension reduction is carried out using principal and independent components analysis. Some small but significant differences are observed between the patient and control groups. We then investigate a particular type of asymmetry called torque. Differences in asymmetry are observed between the control and patient groups, which add strength to other observations in the literature. Further investigations of the midline plane location in the 2 groups and the fitting of nonplanar curved midlines are also considered.

  8. Using Bayesian neural networks to classify forest scenes

    NASA Astrophysics Data System (ADS)

    Vehtari, Aki; Heikkonen, Jukka; Lampinen, Jouko; Juujarvi, Jouni

    1998-10-01

    We present results that compare the performance of Bayesian learning methods for neural networks on the task of classifying forest scenes into trees and background. The classification task is demanding due to the texture richness of the trees, occlusions of the forest scene objects and diverse lighting conditions under operation. This makes it difficult to determine which are optimal image features for the classification. A natural way to proceed is to extract many different types of potentially suitable features, and to evaluate their usefulness in later processing stages. One approach to coping with a large number of features is to use Bayesian methods to control the model complexity. Bayesian learning uses a prior on model parameters, combines this with evidence from the training data, and then integrates over the resulting posterior to make predictions. With this method, we can use large networks and many features without fear of overfitting. For this classification task we compare two Bayesian learning methods for multi-layer perceptron (MLP) neural networks: (1) The evidence framework of MacKay uses a Gaussian approximation to the posterior weight distribution and maximizes with respect to hyperparameters. (2) In a Markov Chain Monte Carlo (MCMC) method due to Neal, the posterior distribution of the network parameters is numerically integrated using the MCMC method. As baseline classifiers for comparison we use (3) MLP early stop committee, (4) K-nearest-neighbor and (5) Classification And Regression Tree.
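    The prior, posterior and integration steps named in this abstract can be written out as follows. The notation is assumed here for illustration and is not taken from the paper: w denotes the network weights, D the training data, (x*, y*) a new input and its label, and w^(s) one of S posterior samples.

```latex
\begin{align}
  p(w \mid D) &\propto p(D \mid w)\, p(w), \\
  p(y^{*} \mid x^{*}, D) &= \int p(y^{*} \mid x^{*}, w)\, p(w \mid D)\, \mathrm{d}w
    \;\approx\; \frac{1}{S} \sum_{s=1}^{S} p\bigl(y^{*} \mid x^{*}, w^{(s)}\bigr),
    \qquad w^{(s)} \sim p(w \mid D).
\end{align}
```

    The sample average on the right is the generic Monte Carlo form that an MCMC approach evaluates. A Gaussian approximation to p(w | D), as in the evidence framework, replaces the sampling step with an analytic approximation.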

  9. The cross-validated AUC for MCP-logistic regression with high-dimensional data.

    PubMed

    Jiang, Dingfeng; Huang, Jian; Zhang, Ying

    2013-10-01

    We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with the existing methods including the Akaike information criterion (AIC), Bayesian information criterion (BIC) or Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from the studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
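    A cross-validated AUC criterion of this general kind can be sketched as below. This is one plausible reading, not the authors' exact definition: averaging the held-out AUC over folds is an assumption, and `fit_predict` is a hypothetical callback standing in for whatever penalized logistic fit (for example MCP) is being tuned.

```python
import numpy as np


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Fraction of positive/negative pairs ranked correctly; ties count one half.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size))


def cv_auc(fit_predict, X, y, lam, n_folds=5, seed=0):
    """Mean held-out AUC for tuning parameter `lam`.

    `fit_predict(X_train, y_train, X_test, lam)` is a hypothetical callback
    that fits a penalized logistic model and returns scores for X_test.
    X and y are NumPy arrays; each fold is assumed to contain both classes.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    fold_aucs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        scores = fit_predict(X[train_idx], y[train_idx], X[test_idx], lam)
        fold_aucs.append(auc(y[test_idx], scores))
    return float(np.mean(fold_aucs))
```

    A tuning parameter would then be chosen by evaluating `cv_auc` over a grid of `lam` values and keeping the one with the largest value.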

  10. Large scale landslide susceptibility assessment using the statistical methods of logistic regression and BSA - study case: the sub-basin of the small Niraj (Transylvania Depression, Romania)

    NASA Astrophysics Data System (ADS)

    Roşca, S.; Bilaşco, Ş.; Petrea, D.; Fodorean, I.; Vescan, I.; Filip, S.; Măguţ, F.-L.

    2015-11-01

    The existence of a large number of GIS models for the identification of landslide occurrence probability makes difficult the selection of a specific one. The present study focuses on the application of two quantitative models: the logistic and the BSA models. The comparative analysis of the results aims at identifying the most suitable model. The territory corresponding to the Niraj Mic Basin (87 km2) is an area characterised by a wide variety of the landforms with their morphometric, morphographical and geological characteristics as well as by a high complexity of the land use types where active landslides exist. This is the reason why it represents the test area for applying the two models and for the comparison of the results. The large complexity of input variables is illustrated by 16 factors which were represented as 72 dummy variables, analysed on the basis of their importance within the model structures. The testing of the statistical significance corresponding to each variable reduced the number of dummy variables to 12 which were considered significant for the test area within the logistic model, whereas for the BSA model all the variables were employed. The predictability degree of the models was tested through the identification of the area under the ROC curve which indicated a good accuracy (AUROC = 0.86 for the testing area) and predictability of the logistic model (AUROC = 0.63 for the validation area).

  11. NASA Space Flight Vehicle Fault Isolation Challenges

    NASA Technical Reports Server (NTRS)

    Bramon, Christopher; Inman, Sharon K.; Neeley, James R.; Jones, James V.; Tuttle, Loraine

    2016-01-01

    The Space Launch System (SLS) is the new NASA heavy lift launch vehicle and is scheduled for its first mission in 2017. The goal of the first mission, which will be uncrewed, is to demonstrate the integrated system performance of the SLS rocket and spacecraft before a crewed flight in 2021. SLS has many of the same logistics challenges as any other large scale program. Common logistics concerns for SLS include integration of discrete programs geographically separated, multiple prime contractors with distinct and different goals, schedule pressures and funding constraints. However, SLS also faces unique challenges. The new program is a confluence of new hardware and heritage, with heritage hardware constituting seventy-five percent of the program. This unique approach to design makes logistics concerns such as testability of the integrated flight vehicle especially problematic. The cost of fully automated diagnostics can be completely justified for a large fleet, but not so for a single flight vehicle. Fault detection is mandatory to assure the vehicle is capable of a safe launch, but fault isolation is another issue. SLS has considered various methods for fault isolation which can provide a reasonable balance between adequacy, timeliness and cost. This paper will address the analyses and decisions the NASA Logistics engineers are making to mitigate risk while providing a reasonable testability solution for fault isolation.

  12. Hierarchical Bayesian modeling of spatio-temporal patterns of lung cancer incidence risk in Georgia, USA: 2000-2007

    NASA Astrophysics Data System (ADS)

    Yin, Ping; Mu, Lan; Madden, Marguerite; Vena, John E.

    2014-10-01

    Lung cancer is the second most commonly diagnosed cancer in both men and women in Georgia, USA. However, the spatio-temporal patterns of lung cancer risk in Georgia have not been fully studied. Hierarchical Bayesian models are used here to explore the spatio-temporal patterns of lung cancer incidence risk by race and gender in Georgia for the period of 2000-2007. With the census tract level as the spatial scale and the 2-year period aggregation as the temporal scale, we compare a total of seven Bayesian spatio-temporal models including two under a separate modeling framework and five under a joint modeling framework. One joint model outperforms others based on the deviance information criterion. Results show that the northwest region of Georgia has consistently high lung cancer incidence risk for all population groups during the study period. In addition, there are inverse relationships between the socioeconomic status and the lung cancer incidence risk among all Georgian population groups, and the relationships in males are stronger than those in females. By mapping more reliable variations in lung cancer incidence risk at a relatively fine spatio-temporal scale for different Georgian population groups, our study aims to better support healthcare performance assessment, etiological hypothesis generation, and health policy making.

  13. Vehicle Scheduling Schemes for Commercial and Emergency Logistics Integration

    PubMed Central

    Li, Xiaohui; Tan, Qingmei

    2013-01-01

    In modern logistics operations, large-scale logistics companies, besides active participation in profit-seeking commercial business, also play an essential role during an emergency relief process by dispatching urgently-required materials to disaster-affected areas. Therefore, the question of how logistics companies can achieve maximum commercial profit on condition that emergency tasks are performed effectively and satisfactorily has been widely addressed by logistics practitioners and has attracted increasing attention from researchers. In this paper, two vehicle scheduling models are proposed to solve the problem. One is a prediction-related scheme, which predicts the amounts of disaster-relief materials and commercial business and then accepts the business that will generate maximum profits; the other is a priority-directed scheme, which first groups commercial and emergency business according to priority grades and then schedules both types of business jointly and simultaneously by arriving at the maximum priority in total. Moreover, computer-based simulations are carried out to evaluate the performance of these two models by comparing them with two traditional disaster-relief tactics in China. The results testify to the feasibility and effectiveness of the proposed models. PMID:24391724

  14. Vehicle scheduling schemes for commercial and emergency logistics integration.

    PubMed

    Li, Xiaohui; Tan, Qingmei

    2013-01-01

    In modern logistics operations, large-scale logistics companies, besides active participation in profit-seeking commercial business, also play an essential role during an emergency relief process by dispatching urgently-required materials to disaster-affected areas. Therefore, the question of how logistics companies can achieve maximum commercial profit on condition that emergency tasks are performed effectively and satisfactorily has been widely addressed by logistics practitioners and has attracted increasing attention from researchers. In this paper, two vehicle scheduling models are proposed to solve the problem. One is a prediction-related scheme, which predicts the amounts of disaster-relief materials and commercial business and then accepts the business that will generate maximum profits; the other is a priority-directed scheme, which first groups commercial and emergency business according to priority grades and then schedules both types of business jointly and simultaneously by arriving at the maximum priority in total. Moreover, computer-based simulations are carried out to evaluate the performance of these two models by comparing them with two traditional disaster-relief tactics in China. The results testify to the feasibility and effectiveness of the proposed models.

  15. Effect of extreme data loss on heart rate signals quantified by entropy analysis

    NASA Astrophysics Data System (ADS)

    Li, Yu; Wang, Jun; Li, Jin; Liu, Dazhao

    2015-02-01

    The phenomenon of data loss always occurs in the analysis of large databases. Maintaining the stability of analysis results in the event of data loss is very important. In this paper, we used a segmentation approach to generate a synthetic signal that is randomly wiped from data according to the Gaussian distribution and the exponential distribution of the original signal. Then, the logistic map is used as verification. Finally, two methods of measuring entropy, base-scale entropy and approximate entropy, are comparatively analyzed. Our results show the following: (1) Two key parameters, the percentage and the average length of removed data segments, can change the sequence complexity according to logistic map testing. (2) The calculation results have preferable stability for base-scale entropy analysis, which is not sensitive to data loss. (3) The loss percentage of HRV signals should be controlled below the range (p = 30%), which can provide useful information in clinical applications.
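    The two ingredients mentioned in this abstract, a logistic-map test signal and random removal of data segments controlled by a loss percentage and an average segment length, can be sketched as follows. The function names, the exponential choice for segment lengths, and the rounding rules are assumptions made here for illustration and are not the authors' procedure.

```python
import random


def logistic_map(r, x0, n):
    """Return n iterates of x_{k+1} = r * x_k * (1 - x_k), starting from x0."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def remove_random_segments(signal, loss_fraction, mean_length, seed=0):
    """Delete about loss_fraction of the samples in randomly placed segments.

    Segment lengths are drawn from an exponential distribution with the given
    mean (an assumption of this sketch). Requires 0 <= loss_fraction <= 1 and
    mean_length > 0.
    """
    rng = random.Random(seed)
    n = len(signal)
    target = int(round(loss_fraction * n))
    keep = [True] * n
    removed = 0
    while removed < target:
        length = max(1, int(round(rng.expovariate(1.0 / mean_length))))
        start = rng.randrange(n)
        for i in range(start, min(n, start + length)):
            if keep[i] and removed < target:
                keep[i] = False
                removed += 1
    return [x for x, k in zip(signal, keep) if k]


# Hypothetical usage: a chaotic test signal with 30% of its samples removed.
signal = logistic_map(4.0, 0.3, 1000)
damaged = remove_random_segments(signal, 0.3, 5.0)
```

    An entropy measure would then be computed on both `signal` and `damaged` and compared as the loss fraction and mean segment length vary.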

  16. Sequential Designs Based on Bayesian Uncertainty Quantification in Sparse Representation Surrogate Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Ray-Bing; Wang, Weichung; Wu, C. F. Jeff

    A numerical method, called OBSM, was recently proposed which employs overcomplete basis functions to achieve sparse representations. While the method can handle non-stationary response without the need of inverting large covariance matrices, it lacks the capability to quantify uncertainty in predictions. We address this issue by proposing a Bayesian approach which first imposes a normal prior on the large space of linear coefficients, then applies the MCMC algorithm to generate posterior samples for predictions. From these samples, Bayesian credible intervals can then be obtained to assess prediction uncertainty. A key application for the proposed method is the efficient construction of sequential designs. Several sequential design procedures with different infill criteria are proposed based on the generated posterior samples. As a result, numerical studies show that the proposed schemes are capable of solving problems of positive point identification, optimization, and surrogate fitting.

  17. Sequential Designs Based on Bayesian Uncertainty Quantification in Sparse Representation Surrogate Modeling

    DOE PAGES

    Chen, Ray-Bing; Wang, Weichung; Wu, C. F. Jeff

    2017-04-12

    A numerical method, called OBSM, was recently proposed which employs overcomplete basis functions to achieve sparse representations. While the method can handle non-stationary response without the need of inverting large covariance matrices, it lacks the capability to quantify uncertainty in predictions. We address this issue by proposing a Bayesian approach which first imposes a normal prior on the large space of linear coefficients, then applies the MCMC algorithm to generate posterior samples for predictions. From these samples, Bayesian credible intervals can then be obtained to assess prediction uncertainty. A key application for the proposed method is the efficient construction of sequential designs. Several sequential design procedures with different infill criteria are proposed based on the generated posterior samples. As a result, numerical studies show that the proposed schemes are capable of solving problems of positive point identification, optimization, and surrogate fitting.

  18. Revisiting the 2004 Sumatra-Andaman earthquake in a Bayesian framework

    NASA Astrophysics Data System (ADS)

    Bletery, Q.; Sladen, A.; Jiang, J.; Simons, M.

    2015-12-01

    The 2004 Mw 9.25 Sumatra-Andaman earthquake is the largest seismic event of the modern instrumental era. Despite considerable effort to analyze the characteristics of its rupture, the different available observations have proven difficult to simultaneously integrate jointly into a finite-fault slip model. In particular, the critical near-field geodetic records contain variable and significant post-seismic signal (between 2 weeks and 2 months) while the satellite altimetry records of the associated tsunami are affected by various sources of uncertainties (e.g. source rupture velocity, meso-scale oceanic currents). In this study, we investigate the quasi-static slip distribution of the Sumatra-Andaman earthquake by carefully accounting for the different sources of uncertainties in the joint inversion of an extended set of geodetic and tsunami data. To do so, we use non-diagonal covariance matrices reflecting both data and model uncertainties in a fully Bayesian inversion framework. As model errors are particularly large for mega-earthquakes, we also rely on advanced simulation codes (normal mode theory on a layered spherical Earth for the static displacement field and non-hydrostatic equations for the tsunami) and account for the 3D curvature of the megathrust interface to reduce the associated epistemic uncertainties. The fully Bayesian inversion framework then enables us to derive the families of possible models compatible with the unevenly distributed and sometimes ambiguous measurements. We find two regions of high slip at latitudes 3°-4°N and 7°-8°N with amplitudes that probably reached values as large as 40 m and possibly larger. Such amounts of slip were not proposed by previous studies, which might have been biased by smoothing regularizations. We also find significant slip (around 20 m) offshore Andaman islands absent in earlier studies. Furthermore, we find that the rupture very likely involved shallow slip, with the possibility of reaching the trench.

  19. Low coverage of central point vaccination against dog rabies in Bamako, Mali.

    PubMed

    Muthiani, Yvonne; Traoré, Abdallah; Mauti, Stephanie; Zinsstag, Jakob; Hattendorf, Jan

    2015-06-15

    Canine rabies remains an important public-health problem in Africa. Dog mass vaccination is the recommended method for rabies control and elimination. We report on the first small-scale mass dog vaccination campaign trial in Bamako, Mali. Our objective was to estimate coverage of the vaccination campaign and to quantify determinants of intervention effectiveness. In September 2013, a central point vaccination campaign--free of cost for dog owners--was carried out in 17 posts on three consecutive days within Bamako's Commune 1. Vaccination coverage and the proportion of ownerless dogs were estimated by combining mark-recapture household and transect surveys using Bayesian modeling. The estimated vaccination coverage was 17.6% (95% Credibility Interval, CI: 14.4-22.1%) which is far below the World Health Organization (WHO) recommended vaccination coverage of 70%. The Bayesian estimate for the owned dog population of Commune 1 was 3459 dogs (95% CI: 2786-4131) and the proportion of ownerless dogs was about 8%. The low coverage observed is primarily attributed to low participation by dog owners. Dog owners reported several reasons for not bringing their dogs to the vaccination posts. The most frequently reported reasons for non-attendance were lack of information (25%) and the inability to handle the dog (16%). For 37% of respondents, no clear reason was given for non-vaccination. Despite low coverage, the vaccination campaign in Bamako was relatively easy to implement, both in terms of logistics and organization. Almost half of the participating dog owners brought their pets on the first day of the campaign. Participatory stakeholder processes involving communities and local authorities are needed to identify effective communication channels and locally adapted vaccination strategies, which could include both central-point and door-to-door vaccination. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Making Supply Chains Resilient to Floods Using a Bayesian Network

    NASA Astrophysics Data System (ADS)

    Haraguchi, M.

    2015-12-01

    Natural hazards distress the global economy by disrupting the interconnected supply chain networks. Manufacturing companies have created cost-efficient supply chains by reducing inventories, streamlining logistics and limiting the number of suppliers. As a result, today's supply chains are profoundly susceptible to systemic risks. In Thailand, for example, the GDP growth rate declined by 76% in 2011 due to prolonged flooding. Thailand incurred economic damage including the loss of USD 46.5 billion, approximately 70% of which was caused by major supply chain disruptions in the manufacturing sector. Similar problems occurred after the Great East Japan Earthquake and Tsunami in 2011, the Mississippi River floods and droughts during 2011-2013, and Hurricane Sandy in 2012. This study proposes a methodology for modeling supply chain disruptions using a Bayesian network analysis (BNA) to estimate expected values of countermeasures of floods, such as inventory management, supplier management and hard infrastructure management. We first performed a spatio-temporal correlation analysis between floods and extreme precipitation data for the last 100 years at a global scale. Then we used a BNA to create synthetic networks that include variables associated with the magnitude and duration of floods, major components of supply chains and market demands. We also included decision variables of countermeasures that would mitigate potential losses caused by supply chain disruptions. Finally, we conducted a cost-benefit analysis by estimating the expected values of these potential countermeasures while conducting a sensitivity analysis. The methodology was applied to supply chain disruptions caused by the 2011 Thailand floods. Our study demonstrates desirable typical data requirements for the analysis, such as anonymized supplier network data (i.e. critical dependencies, vulnerability information of suppliers) and sourcing data (i.e. locations of suppliers, and production rates and volume), and data from previous experiences (i.e. companies' risk mitigation strategy decisions).

  1. Bayesian Inference for the Stereotype Regression Model: Application to a Case-control Study of Prostate Cancer

    PubMed Central

    Ahn, Jaeil; Mukherjee, Bhramar; Banerjee, Mousumi; Cooney, Kathleen A.

    2011-01-01

    Summary: The stereotype regression model for categorical outcomes, proposed by Anderson (1984), is nested between the baseline category logits and adjacent category logits models with proportional odds structure. The stereotype model is more parsimonious than the ordinary baseline-category (or multinomial logistic) model due to a product representation of the log odds-ratios in terms of a common parameter corresponding to each predictor and category specific scores. The model could be used for both ordered and unordered outcomes. For ordered outcomes, the stereotype model allows more flexibility than the popular proportional odds model in capturing highly subjective ordinal scaling that does not result from categorization of a single latent variable, but is inherently multidimensional in nature. As pointed out by Greenland (1994), an additional advantage of the stereotype model is that it provides unbiased and valid inference under outcome-stratified sampling as in case-control studies. In addition, for matched case-control studies, the stereotype model is amenable to the classical conditional likelihood principle, whereas there is no reduction due to sufficiency under the proportional odds model. In spite of these attractive features, the model has been applied less, as there are issues with maximum likelihood estimation and likelihood based testing approaches due to non-linearity and lack of identifiability of the parameters. We present a comprehensive Bayesian inference and model comparison procedure for this class of models as an alternative to the classical frequentist approach. We illustrate our methodology by analyzing data from The Flint Men’s Health Study, a case-control study of prostate cancer in African-American men aged 40 to 79 years. We use clinical staging of prostate cancer in terms of Tumors, Nodes and Metastasis (TNM) as the categorical response of interest. PMID:19731262

  2. Metapopulation Tracking Juvenile Penguins Reveals an Ecosystem-wide Ecological Trap.

    PubMed

    Sherley, Richard B; Ludynia, Katrin; Dyer, Bruce M; Lamont, Tarron; Makhado, Azwianewi B; Roux, Jean-Paul; Scales, Kylie L; Underhill, Les G; Votier, Stephen C

    2017-02-20

    Climate change and fisheries are transforming the oceans, but we lack a complete understanding of their ecological impact [1-3]. Environmental degradation can cause maladaptive habitat selection, inducing ecological traps with profound consequences for biodiversity [4-6]. However, whether ecological traps operate in marine systems is unclear [7]. Large marine vertebrates may be vulnerable to ecological traps [6], but their broad-scale movements and complex life histories obscure the population-level consequences of habitat selection [8, 9]. We satellite tracked postnatal dispersal in African penguins (Spheniscus demersus) from eight sites across their breeding range to test whether they have become ecologically trapped in the degraded Benguela ecosystem. Bayesian state-space and habitat models show that penguins traversed thousands of square kilometers to areas of low sea surface temperatures (14.5°C-17.5°C) and high chlorophyll-a (∼11 mg m⁻³). These were once reliable cues for prey-rich waters, but climate change and industrial fishing have depleted forage fish stocks in this system [10, 11]. Juvenile penguin survival is low in populations selecting degraded areas, and Bayesian projection models suggest that breeding numbers are ∼50% lower than if non-impacted habitats were used, revealing the extent and effect of a marine ecological trap for the first time. These cascading impacts of localized forage fish depletion, unobserved in studies on adults, were only elucidated via broad-scale movement and demographic data on juveniles. Our results support suspending fishing when prey biomass drops below critical thresholds [12, 13] and suggest that mitigation of marine ecological traps will require matching conservation action to the scale of ecological processes [14]. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.

  3. NASA Space Rocket Logistics Challenges

    NASA Technical Reports Server (NTRS)

    Neeley, James R.; Jones, James V.; Watson, Michael D.; Bramon, Christopher J.; Inman, Sharon K.; Tuttle, Loraine

    2014-01-01

    The Space Launch System (SLS) is the new NASA heavy lift launch vehicle and is scheduled for its first mission in 2017. The goal of the first mission, which will be uncrewed, is to demonstrate the integrated system performance of the SLS rocket and spacecraft before a crewed flight in 2021. SLS has many of the same logistics challenges as any other large scale program. Common logistics concerns for SLS include integration of discrete, geographically separated programs, multiple prime contractors with distinct and different goals, schedule pressures and funding constraints. However, SLS also faces unique challenges. The new program is a confluence of new hardware and heritage, with heritage hardware constituting seventy-five percent of the program. This unique approach to design makes logistics concerns such as commonality especially problematic. Additionally, a very low manifest rate of one flight every four years makes logistics comparatively expensive. That, along with the SLS architecture being developed using a block upgrade evolutionary approach, exacerbates long-range planning for supportability considerations. These common and unique logistics challenges must be clearly identified and tackled to allow SLS to have a successful program. This paper will address the common and unique challenges facing the SLS programs, along with the analysis and decisions the NASA Logistics engineers are making to mitigate the threats posed by each.

  4. 2D VARIABLY SATURATED FLOWS: PHYSICAL SCALING AND BAYESIAN ESTIMATION

    EPA Science Inventory

    A novel dimensionless formulation for water flow in two-dimensional variably saturated media is presented. It shows that scaling physical systems requires conservation of the ratio between capillary forces and gravity forces. A direct result of this finding is that for two phys...

  5. In situ genetic differentiation in a Hispaniolan lizard (Ameiva chrysolaema): a multilocus perspective.

    PubMed

    Gifford, Matthew E; Larson, Allan

    2008-10-01

    A previous phylogeographic study of mitochondrial haplotypes for the Hispaniolan lizard Ameiva chrysolaema revealed deep genetic structure associated with seawater inundation during the late Pliocene/early Pleistocene and evidence of subsequent population expansion into formerly inundated areas. We revisit hypotheses generated by our previous study using increased geographic sampling of populations and analysis of three nuclear markers (alpha-enolase intron 8, alpha-cardiac-actin intron 4, and beta-actin intron 3) in addition to mitochondrial haplotypes (ND2). Large genetic discontinuities correspond spatially and temporally with historical barriers to gene flow (sea inundations). NCPA cross-validation analysis and Bayesian multilocus analyses of divergence times (IMa and MCMCcoal) reveal two separate episodes of fragmentation associated with Pliocene and Pleistocene sea inundations, separating the species into historically separate Northern, East-Central, West-Central, and Southern population lineages. Multilocus Bayesian analysis using IMa indicates asymmetrical migration from the East-Central to the West-Central populations following secondary contact, consistent with expectations from the more pervasive sea inundation in the western region. The West-Central lineage has a genetic signature of population growth consistent with the expectation of geographic expansion into formerly inundated areas. Within each lineage, significant spatial genetic structure indicates isolation by distance at comparable temporal scales. This study adds to the growing body of evidence that vicariant speciation may be the prevailing source of lineage accumulation on oceanic islands. Thus, prior theories of island biogeography generally underestimate the role and temporal scale of intra-island vicariant processes.

  6. Bayesian flood forecasting methods: A review

    NASA Astrophysics Data System (ADS)

    Han, Shasha; Coulibaly, Paulin

    2017-08-01

    Over the past few decades, floods have been seen as one of the most common and widely distributed natural disasters in the world. If floods could be accurately forecasted in advance, then their negative impacts could be greatly minimized. It is widely recognized that quantification and reduction of uncertainty associated with the hydrologic forecast is of great importance for flood estimation and rational decision making. The Bayesian forecasting system (BFS) offers an ideal theoretical framework for uncertainty quantification that can be developed for probabilistic flood forecasting via any deterministic hydrologic model. It provides a suitable theoretical structure, empirically validated models and a reasonable analytic-numerical computation method, and can be developed into various Bayesian forecasting approaches. This paper presents a comprehensive review of Bayesian forecasting approaches applied in flood forecasting from 1999 till now. The review starts with an overview of the fundamentals of BFS and recent advances in BFS, followed by BFS applications in river stage forecasting and real-time flood forecasting, then moves to a critical analysis evaluating the advantages and limitations of Bayesian forecasting methods and other predictive uncertainty assessment approaches in flood forecasting, and finally discusses future research directions in Bayesian flood forecasting. Results show that the Bayesian flood forecasting approach is an effective and advanced way of estimating floods: it considers all sources of uncertainty and produces a predictive distribution of the river stage, river discharge or runoff, and thus gives more accurate and reliable flood forecasts. Some emerging Bayesian forecasting methods (e.g. ensemble Bayesian forecasting system, Bayesian multi-model combination) were shown to overcome the limitations of a single model or fixed model weights and to effectively reduce predictive uncertainty. In recent years, various Bayesian flood forecasting approaches have been developed and widely applied, but there is still room for improvement. Future research in the context of Bayesian flood forecasting should focus on assimilation of various sources of newly available information and improvement of predictive performance assessment methods.

  7. Dimensionality of the 9-item Utrecht Work Engagement Scale revisited: A Bayesian structural equation modeling approach.

    PubMed

    Fong, Ted C T; Ho, Rainbow T H

    2015-01-01

    The aim of this study was to reexamine the dimensionality of the widely used 9-item Utrecht Work Engagement Scale using the maximum likelihood (ML) approach and Bayesian structural equation modeling (BSEM) approach. Three measurement models (1-factor, 3-factor, and bi-factor models) were evaluated in two split samples of 1,112 health-care workers using confirmatory factor analysis and BSEM, which specified small-variance informative priors for cross-loadings and residual covariances. Model fit and comparisons were evaluated by posterior predictive p-value (PPP), deviance information criterion, and Bayesian information criterion (BIC). None of the three ML-based models showed an adequate fit to the data. The use of informative priors for cross-loadings did not improve the PPP for the models. The 1-factor BSEM model with approximately zero residual covariances displayed a good fit (PPP>0.10) to both samples and a substantially lower BIC than its 3-factor and bi-factor counterparts. The BSEM results demonstrate empirical support for the 1-factor model as a parsimonious and reasonable representation of work engagement.

  8. Bayesian Techniques for Comparing Time-dependent GRMHD Simulations to Variable Event Horizon Telescope Observations

    NASA Astrophysics Data System (ADS)

    Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan; Medeiros, Lia; Özel, Feryal; Psaltis, Dimitrios

    2016-12-01

    The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.

  9. BAYESIAN TECHNIQUES FOR COMPARING TIME-DEPENDENT GRMHD SIMULATIONS TO VARIABLE EVENT HORIZON TELESCOPE OBSERVATIONS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan

    2016-12-01

    The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.

  10. High-resolution behavioral mapping of electric fishes in Amazonian habitats.

    PubMed

    Madhav, Manu S; Jayakumar, Ravikrishnan P; Demir, Alican; Stamper, Sarah A; Fortune, Eric S; Cowan, Noah J

    2018-04-11

    The study of animal behavior has been revolutionized by sophisticated methodologies that identify and track individuals in video recordings. Video recording of behavior, however, is challenging for many species and habitats including fishes that live in turbid water. Here we present a methodology for identifying and localizing weakly electric fishes on the centimeter scale with subsecond temporal resolution based solely on the electric signals generated by each individual. These signals are recorded with a grid of electrodes and analyzed using a two-part algorithm that identifies the signals from each individual fish and then estimates the position and orientation of each fish using Bayesian inference. Interestingly, because this system involves eavesdropping on electrocommunication signals, it permits monitoring of complex social and physical interactions in the wild. This approach has potential for large-scale non-invasive monitoring of aquatic habitats in the Amazon basin and other tropical freshwater systems.

  11. Large-Scale Simulations of Plastic Neural Networks on Neuromorphic Hardware

    PubMed Central

    Knight, James C.; Tully, Philip J.; Kaplan, Bernhard A.; Lansner, Anders; Furber, Steve B.

    2016-01-01

    SpiNNaker is a digital, neuromorphic architecture designed for simulating large-scale spiking neural networks at speeds close to biological real-time. Rather than using bespoke analog or digital hardware, the basic computational unit of a SpiNNaker system is a general-purpose ARM processor, allowing it to be programmed to simulate a wide variety of neuron and synapse models. This flexibility is particularly valuable in the study of biological plasticity phenomena. A recently proposed learning rule based on the Bayesian Confidence Propagation Neural Network (BCPNN) paradigm offers a generic framework for modeling the interaction of different plasticity mechanisms using spiking neurons. However, it can be computationally expensive to simulate large networks with BCPNN learning since it requires multiple state variables for each synapse, each of which needs to be updated every simulation time-step. We discuss the trade-offs in efficiency and accuracy involved in developing an event-based BCPNN implementation for SpiNNaker based on an analytical solution to the BCPNN equations, and detail the steps taken to fit this within the limited computational and memory resources of the SpiNNaker architecture. We demonstrate this learning rule by learning temporal sequences of neural activity within a recurrent attractor network which we simulate at scales of up to 2.0 × 10⁴ neurons and 5.1 × 10⁷ plastic synapses: the largest plastic neural network ever to be simulated on neuromorphic hardware. We also run a comparable simulation on a Cray XC-30 supercomputer system and find that, if it is to match the run-time of our SpiNNaker simulation, the supercomputer system uses approximately 45× more power. This suggests that cheaper, more power efficient neuromorphic systems are becoming useful discovery tools in the study of plasticity in large-scale brain models. PMID:27092061

  12. Planck data versus large scale structure: Methods to quantify discordance

    NASA Astrophysics Data System (ADS)

    Charnock, Tom; Battye, Richard A.; Moss, Adam

    2017-06-01

    Discordance in the Λ cold dark matter cosmological model can be seen by comparing parameters constrained by cosmic microwave background (CMB) measurements to those inferred by probes of large scale structure. Recent improvements in observations, including final data releases from both Planck and SDSS-III BOSS, as well as improved astrophysical uncertainty analysis of CFHTLenS, allow for an update in the quantification of any tension between large and small scales. This paper is intended, primarily, as a discussion on the quantifications of discordance when comparing the parameter constraints of a model when given two different data sets. We consider Kullback-Leibler divergence, comparison of Bayesian evidences and other statistics which are sensitive to the mean, variance and shape of the distributions. However, as a byproduct, we present an update to the similar analysis in [R. A. Battye, T. Charnock, and A. Moss, Phys. Rev. D 91, 103508 (2015), 10.1103/PhysRevD.91.103508], where we find that, considering new data and treatment of priors, the constraints from the CMB and from a combination of large scale structure (LSS) probes are in greater agreement and any tension only persists to a minor degree. In particular, we find the parameter constraints from the combination of LSS probes which are most discrepant with the Planck 2015+Pol+BAO parameter distributions can be quantified at a ∼2.55σ tension using the method introduced in [R. A. Battye, T. Charnock, and A. Moss, Phys. Rev. D 91, 103508 (2015), 10.1103/PhysRevD.91.103508]. If instead we use the distributions constrained by the combination of LSS probes which are in greatest agreement with those from Planck 2015+Pol+BAO this tension is only 0.76σ.
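
    As an illustration of one statistic named in this abstract, the sketch below computes the Kullback-Leibler divergence between two parameter distributions under the assumption that both are well approximated by multivariate Gaussians. That Gaussian approximation, the function name `gaussian_kl` and the example numbers are assumptions made here for illustration; they are not taken from the paper.

```python
import numpy as np

def gaussian_kl(mean_p, cov_p, mean_q, cov_q):
    """Return KL(P || Q) in nats for multivariate normal P and Q."""
    mean_p = np.atleast_1d(np.asarray(mean_p, dtype=float))
    mean_q = np.atleast_1d(np.asarray(mean_q, dtype=float))
    cov_p = np.atleast_2d(np.asarray(cov_p, dtype=float))
    cov_q = np.atleast_2d(np.asarray(cov_q, dtype=float))
    k = mean_p.size                      # number of parameters
    diff = mean_q - mean_p
    cov_q_inv = np.linalg.inv(cov_q)
    # slogdet is numerically safer than log(det(...)) for small determinants
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(cov_q_inv @ cov_p)
                  + diff @ cov_q_inv @ diff
                  - k
                  + logdet_q - logdet_p)

# Hypothetical one-parameter example: two unit-variance posteriors whose
# means differ by one standard deviation.
kl_shifted = gaussian_kl([0.0], [[1.0]], [1.0], [[1.0]])
kl_same = gaussian_kl([0.0, 0.0], np.eye(2), [0.0, 0.0], np.eye(2))
```

    The divergence is zero only when the two distributions coincide and grows with both the shift in the means and the mismatch in the covariances, which is why it is sensitive to more than the mean offset alone.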

  13. Genetic basis of climatic adaptation in Scots pine by Bayesian quantitative trait locus analysis.

    PubMed Central

    Hurme, P; Sillanpää, M J; Arjas, E; Repo, T; Savolainen, O

    2000-01-01

    We examined the genetic basis of large adaptive differences in timing of bud set and frost hardiness between natural populations of Scots pine. As a mapping population, we considered an "open-pollinated backcross" progeny by collecting seeds of a single F(1) tree (cross between trees from southern and northern Finland) growing in southern Finland. Due to the special features of the design (no marker information available on grandparents or the father), we applied a Bayesian quantitative trait locus (QTL) mapping method developed previously for outcrossed offspring. We found four potential QTL for timing of bud set and seven for frost hardiness. Bayesian analyses detected more QTL than ANOVA for frost hardiness, but the opposite was true for bud set. These QTL included alleles with rather large effects, and additionally smaller QTL were supported. The largest QTL for bud set date accounted for about a fourth of the mean difference between populations. Thus, natural selection during adaptation has resulted in selection of at least some alleles of rather large effect. PMID:11063704

  14. Improved inference in Bayesian segmentation using Monte Carlo sampling: application to hippocampal subfield volumetry.

    PubMed

    Iglesias, Juan Eugenio; Sabuncu, Mert Rory; Van Leemput, Koen

    2013-10-01

    Many segmentation algorithms in medical image analysis use Bayesian modeling to augment local image appearance with prior anatomical knowledge. Such methods often contain a large number of free parameters that are first estimated and then kept fixed during the actual segmentation process. However, a faithful Bayesian analysis would marginalize over such parameters, accounting for their uncertainty by considering all possible values they may take. Here we propose to incorporate this uncertainty into Bayesian segmentation methods in order to improve the inference process. In particular, we approximate the required marginalization over model parameters using computationally efficient Markov chain Monte Carlo techniques. We illustrate the proposed approach using a recently developed Bayesian method for the segmentation of hippocampal subfields in brain MRI scans, showing a significant improvement in an Alzheimer's disease classification task. As an additional benefit, the technique also allows one to compute informative "error bars" on the volume estimates of individual structures. Copyright © 2013 Elsevier B.V. All rights reserved.

  15. Improved Inference in Bayesian Segmentation Using Monte Carlo Sampling: Application to Hippocampal Subfield Volumetry

    PubMed Central

    Iglesias, Juan Eugenio; Sabuncu, Mert Rory; Leemput, Koen Van

    2013-01-01

    Many segmentation algorithms in medical image analysis use Bayesian modeling to augment local image appearance with prior anatomical knowledge. Such methods often contain a large number of free parameters that are first estimated and then kept fixed during the actual segmentation process. However, a faithful Bayesian analysis would marginalize over such parameters, accounting for their uncertainty by considering all possible values they may take. Here we propose to incorporate this uncertainty into Bayesian segmentation methods in order to improve the inference process. In particular, we approximate the required marginalization over model parameters using computationally efficient Markov chain Monte Carlo techniques. We illustrate the proposed approach using a recently developed Bayesian method for the segmentation of hippocampal subfields in brain MRI scans, showing a significant improvement in an Alzheimer’s disease classification task. As an additional benefit, the technique also allows one to compute informative “error bars” on the volume estimates of individual structures. PMID:23773521

  16. Bayesian inference of a historical bottleneck in a heavily exploited marine mammal.

    PubMed

    Hoffman, J I; Grant, S M; Forcada, J; Phillips, C D

    2011-10-01

    Emerging Bayesian analytical approaches offer increasingly sophisticated means of reconstructing historical population dynamics from genetic data, but have been little applied to scenarios involving demographic bottlenecks. Consequently, we analysed a large mitochondrial and microsatellite dataset from the Antarctic fur seal Arctocephalus gazella, a species subjected to one of the most extreme examples of uncontrolled exploitation in history when it was reduced to the brink of extinction by the sealing industry during the late eighteenth and nineteenth centuries. Classical bottleneck tests, which exploit the fact that rare alleles are rapidly lost during demographic reduction, yielded ambiguous results. In contrast, a strong signal of recent demographic decline was detected using both Bayesian skyline plots and Approximate Bayesian Computation, the latter also allowing derivation of posterior parameter estimates that were remarkably consistent with historical observations. This was achieved using only contemporary samples, further emphasizing the potential of Bayesian approaches to address important problems in conservation and evolutionary biology. © 2011 Blackwell Publishing Ltd.

  17. Partially Observed Mixtures of IRT Models: An Extension of the Generalized Partial-Credit Model

    ERIC Educational Resources Information Center

    Von Davier, Matthias; Yamamoto, Kentaro

    2004-01-01

    The generalized partial-credit model (GPCM) is used frequently in educational testing and in large-scale assessments for analyzing polytomous data. Special cases of the generalized partial-credit model are the partial-credit model--or Rasch model for ordinal data--and the two parameter logistic (2PL) model. This article extends the GPCM to the…

  18. Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

    ERIC Educational Resources Information Center

    Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

    2016-01-01

    In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…

  19. An approach for siting poplar energy production systems to increase productivity and associated ecosystem services

    Treesearch

    Ronald S. Zalesny; Deahn M. Donner; David R. Coyle; William L. Headlee

    2012-01-01

    Short rotation woody crops such as Populus spp. and their hybrids (i.e., poplars) are a significant component of the total biofuels and bioenergy feedstock resource in the USA. Production of these dedicated energy crops may result in large-scale land conversion, which leads to questions about their economic, logistic, and ecologic feasibility. To...

  20. A guide to Bayesian model selection for ecologists

    USGS Publications Warehouse

    Hooten, Mevin B.; Hobbs, N.T.

    2015-01-01

    The steady upward trend in the use of model selection and Bayesian methods in ecological research has made it clear that both approaches to inference are important for modern analysis of models and data. However, in teaching Bayesian methods and in working with our research colleagues, we have noticed a general dissatisfaction with the available literature on Bayesian model selection and multimodel inference. Students and researchers new to Bayesian methods quickly find that the published advice on model selection is often preferential in its treatment of options for analysis, frequently advocating one particular method above others. The recent appearance of many articles and textbooks on Bayesian modeling has provided welcome background on relevant approaches to model selection in the Bayesian framework, but most of these are either very narrowly focused in scope or inaccessible to ecologists. Moreover, the methodological details of Bayesian model selection approaches are spread thinly throughout the literature, appearing in journals from many different fields. Our aim with this guide is to condense the large body of literature on Bayesian approaches to model selection and multimodel inference and present it specifically for quantitative ecologists as neutrally as possible. We also bring to light a few important and fundamental concepts relating directly to model selection that seem to have gone unnoticed in the ecological literature. Throughout, we provide only a minimal discussion of philosophy, preferring instead to examine the breadth of approaches as well as their practical advantages and disadvantages. This guide serves as a reference for ecologists using Bayesian methods, so that they can better understand their options and can make an informed choice that is best aligned with their goals for inference.

  1. Bayesian Fundamentalism or Enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition.

    PubMed

    Jones, Matt; Love, Bradley C

    2011-08-01

    The prominence of Bayesian modeling of cognition has increased recently largely because of mathematical advances in specifying and deriving predictions from complex probabilistic models. Much of this research aims to demonstrate that cognitive behavior can be explained from rational principles alone, without recourse to psychological or neurological processes and representations. We note commonalities between this rational approach and other movements in psychology - namely, Behaviorism and evolutionary psychology - that set aside mechanistic explanations or make use of optimality assumptions. Through these comparisons, we identify a number of challenges that limit the rational program's potential contribution to psychological theory. Specifically, rational Bayesian models are significantly unconstrained, both because they are uninformed by a wide range of process-level data and because their assumptions about the environment are generally not grounded in empirical measurement. The psychological implications of most Bayesian models are also unclear. Bayesian inference itself is conceptually trivial, but strong assumptions are often embedded in the hypothesis sets and the approximation algorithms used to derive model predictions, without a clear delineation between psychological commitments and implementational details. Comparing multiple Bayesian models of the same task is rare, as is the realization that many Bayesian models recapitulate existing (mechanistic level) theories. Despite the expressive power of current Bayesian models, we argue they must be developed in conjunction with mechanistic considerations to offer substantive explanations of cognition. We lay out several means for such an integration, which take into account the representations on which Bayesian inference operates, as well as the algorithms and heuristics that carry it out. We argue this unification will better facilitate lasting contributions to psychological theory, avoiding the pitfalls that have plagued previous theoretical movements.

  2. Tmax Determined Using a Bayesian Estimation Deconvolution Algorithm Applied to Bolus Tracking Perfusion Imaging: A Digital Phantom Validation Study.

    PubMed

    Uwano, Ikuko; Sasaki, Makoto; Kudo, Kohsuke; Boutelier, Timothé; Kameda, Hiroyuki; Mori, Futoshi; Yamashita, Fumio

    2017-01-10

    The Bayesian estimation algorithm improves the precision of bolus tracking perfusion imaging. However, this algorithm cannot directly calculate Tmax, the time scale widely used to identify ischemic penumbra, because Tmax is a non-physiological, artificial index that reflects the tracer arrival delay (TD) and other parameters. We calculated Tmax from the TD and mean transit time (MTT) obtained by the Bayesian algorithm and determined its accuracy in comparison with Tmax obtained by singular value decomposition (SVD) algorithms. The TD and MTT maps were generated by the Bayesian algorithm applied to digital phantoms with time-concentration curves that reflected a range of values for various perfusion metrics using a global arterial input function. Tmax was calculated from the TD and MTT using constants obtained by a linear least-squares fit to Tmax obtained from the two SVD algorithms that showed the best benchmarks in a previous study. Correlations between the Tmax values obtained by the Bayesian and SVD methods were examined. The Bayesian algorithm yielded accurate TD and MTT values relative to the true values of the digital phantom. Tmax calculated from the TD and MTT values with the least-squares fit constants showed excellent correlation (Pearson's correlation coefficient = 0.99) and agreement (intraclass correlation coefficient = 0.99) with Tmax obtained from SVD algorithms. Quantitative analyses of Tmax values calculated from Bayesian-estimation algorithm-derived TD and MTT from a digital phantom correlated and agreed well with Tmax values determined using SVD algorithms.
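
The calibration step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear form Tmax ≈ a·TD + b·MTT + c, the function names, and the synthetic values are all assumptions made for the example, since the abstract states only that the constants came from a linear least-squares fit.

```python
import numpy as np

def fit_tmax_constants(td, mtt, tmax_ref):
    """Least-squares fit of tmax_ref ≈ a*td + b*mtt + c; returns (a, b, c).

    The affine form is an assumption for illustration only.
    """
    td = np.asarray(td, dtype=float)
    mtt = np.asarray(mtt, dtype=float)
    # Design matrix with columns TD, MTT and a constant term.
    design = np.column_stack([td, mtt, np.ones_like(td)])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(tmax_ref, dtype=float), rcond=None)
    return tuple(coeffs)

def tmax_from_td_mtt(td, mtt, constants):
    """Apply the fitted constants to TD and MTT values."""
    a, b, c = constants
    return a * np.asarray(td, dtype=float) + b * np.asarray(mtt, dtype=float) + c

# Synthetic stand-in for phantom values (made-up numbers, not study data).
rng = np.random.default_rng(0)
td = rng.uniform(0.0, 6.0, size=200)     # tracer arrival delay, seconds
mtt = rng.uniform(2.0, 12.0, size=200)   # mean transit time, seconds
tmax_ref = 1.0 * td + 0.5 * mtt + 0.2    # hypothetical reference Tmax

constants = fit_tmax_constants(td, mtt, tmax_ref)
tmax_est = tmax_from_td_mtt(td, mtt, constants)
```

With real data, `tmax_ref` would be the Tmax map from the reference SVD algorithm, and agreement between `tmax_est` and `tmax_ref` could then be checked with a correlation coefficient.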

  3. Bayesian LASSO, scale space and decision making in association genetics.

    PubMed

    Pasanen, Leena; Holmström, Lasse; Sillanpää, Mikko J

    2015-01-01

    LASSO is a penalized regression method that facilitates model fitting in situations where there are as many, or even more, explanatory variables than observations, and only a few variables are relevant in explaining the data. We focus on the Bayesian version of LASSO and consider four problems that need special attention: (i) controlling false positives, (ii) multiple comparisons, (iii) collinearity among explanatory variables, and (iv) the choice of the tuning parameter that controls the amount of shrinkage and the sparsity of the estimates. The particular application considered is association genetics, where LASSO regression can be used to find links between chromosome locations and phenotypic traits in a biological organism. However, the proposed techniques are relevant also in other contexts where LASSO is used for variable selection. We separate the true associations from false positives using the posterior distribution of the effects (regression coefficients) provided by Bayesian LASSO. We propose to solve the multiple comparisons problem by using simultaneous inference based on the joint posterior distribution of the effects. Bayesian LASSO also tends to distribute an effect among collinear variables, making detection of an association difficult. We propose to solve this problem by considering not only individual effects but also their functionals (i.e. sums and differences). Finally, whereas in Bayesian LASSO the tuning parameter is often regarded as a random variable, we adopt a scale space view and instead consider a whole range of fixed tuning parameters. The effect estimates and the associated inference are considered for all tuning parameters in the selected range and the results are visualized with color maps that provide useful insights into data and the association problem considered. The methods are illustrated using two sets of artificial data and one real data set, all representing typical settings in association genetics.

  4. Astrophysical data analysis with information field theory

    NASA Astrophysics Data System (ADS)

    Enßlin, Torsten

    2014-12-01

    Non-parametric imaging and data analysis in astrophysics and cosmology can be addressed by information field theory (IFT), a means of Bayesian, data based inference on spatially distributed signal fields. IFT is a statistical field theory, which permits the construction of optimal signal recovery algorithms. It exploits spatial correlations of the signal fields even for nonlinear and non-Gaussian signal inference problems. The alleviation of a perception threshold for recovering signals of unknown correlation structure by using IFT will be discussed in particular as well as a novel improvement on instrumental self-calibration schemes. IFT can be applied to many areas. Here, applications in cosmology (cosmic microwave background, large-scale structure) and astrophysics (galactic magnetism, radio interferometry) are presented.

  5. Controlling bias and inflation in epigenome- and transcriptome-wide association studies using the empirical null distribution.

    PubMed

    van Iterson, Maarten; van Zwet, Erik W; Heijmans, Bastiaan T

    2017-01-27

    We show that epigenome- and transcriptome-wide association studies (EWAS and TWAS) are prone to significant inflation and bias of test statistics, an unrecognized phenomenon introducing spurious findings if left unaddressed. Neither GWAS-based methodology nor state-of-the-art confounder adjustment methods completely remove bias and inflation. We propose a Bayesian method to control bias and inflation in EWAS and TWAS based on estimation of the empirical null distribution. Using simulations and real data, we demonstrate that our method maximizes power while properly controlling the false positive rate. We illustrate the utility of our method in large-scale EWAS and TWAS meta-analyses of age and smoking.

  6. Mitochondrial DNA Reveals Genetic Structuring of Pinna nobilis across the Mediterranean Sea

    PubMed Central

    Sanna, Daria; Cossu, Piero; Dedola, Gian Luca; Scarpa, Fabio; Maltagliati, Ferruccio; Castelli, Alberto; Franzoi, Piero; Lai, Tiziana; Cristo, Benedetto; Curini-Galletti, Marco; Francalacci, Paolo; Casu, Marco

    2013-01-01

    Pinna nobilis is the largest endemic Mediterranean marine bivalve. During past centuries, various human activities have promoted the regression of its populations. As a consequence of stringent standards of protection, demographic expansions are currently reported in many sites. The aim of this study was to provide the first large broad-scale insight into the genetic variability of P. nobilis in the area that encompasses the western Mediterranean, Ionian Sea, and Adriatic Sea marine ecoregions. To accomplish this objective twenty-five populations from this area were surveyed using two mitochondrial DNA markers (COI and 16S). Our dataset was then merged with those obtained in other studies for the Aegean and Tunisian populations (eastern Mediterranean), and statistical analyses (Bayesian model-based clustering, median-joining network, AMOVA, mismatch distribution, Tajima’s and Fu’s neutrality tests and Bayesian skyline plots) were performed. The results revealed genetic divergence among three distinguishable areas: (1) western Mediterranean and Ionian Sea; (2) Adriatic Sea; and (3) Aegean Sea and Tunisian coastal areas. From a conservational point of view, populations from the three genetically divergent groups found may be considered as different management units. PMID:23840684

  7. Eddington's demon: inferring galaxy mass functions and other distributions from uncertain data

    NASA Astrophysics Data System (ADS)

    Obreschkow, D.; Murray, S. G.; Robotham, A. S. G.; Westmeier, T.

    2018-03-01

    We present a general modified maximum likelihood (MML) method for inferring generative distribution functions from uncertain and biased data. The MML estimator is identical to, but easier and many orders of magnitude faster to compute than the solution of the exact Bayesian hierarchical modelling of all measurement errors. As a key application, this method can accurately recover the mass function (MF) of galaxies, while simultaneously dealing with observational uncertainties (Eddington bias), complex selection functions and unknown cosmic large-scale structure. The MML method is free of binning and natively accounts for small number statistics and non-detections. Its fast implementation in the R-package dftools is equally applicable to other objects, such as haloes, groups, and clusters, as well as observables other than mass. The formalism readily extends to multidimensional distribution functions, e.g. a Choloniewski function for the galaxy mass-angular momentum distribution, also handled by dftools. The code provides uncertainties and covariances for the fitted model parameters and approximate Bayesian evidences. We use numerous mock surveys to illustrate and test the MML method, as well as to emphasize the necessity of accounting for observational uncertainties in MFs of modern galaxy surveys.

  8. The Prevalences of Salmonella Genomic Island 1 Variants in Human and Animal Salmonella Typhimurium DT104 Are Distinguishable Using a Bayesian Approach

    PubMed Central

    Mather, Alison E.; Denwood, Matthew J.; Haydon, Daniel T.; Matthews, Louise; Mellor, Dominic J.; Coia, John E.; Brown, Derek J.; Reid, Stuart W. J.

    2011-01-01

    Throughout the 1990s, there was an epidemic of multidrug resistant Salmonella Typhimurium DT104 in both animals and humans in Scotland. The use of antimicrobials in agriculture is often cited as a major source of antimicrobial resistance in pathogenic bacteria of humans, suggesting that DT104 in animals and humans should demonstrate similar prevalences of resistance determinants. Until very recently, only the application of molecular methods would allow such a comparison and our understanding has been hindered by the fact that surveillance data are primarily phenotypic in nature. Here, using large scale surveillance datasets and a novel Bayesian approach, we infer and compare the prevalence of Salmonella Genomic Island 1 (SGI1), SGI1 variants, and resistance determinants independent of SGI1 in animal and human DT104 isolates from such phenotypic data. We demonstrate differences in the prevalences of SGI1, SGI1-B, SGI1-C, absence of SGI1, and tetracycline resistance determinants independent of SGI1 between these human and animal populations, a finding that challenges established tenets that DT104 in domestic animals and humans are from the same well-mixed microbial population. PMID:22125606

  9. The Empirical Distribution of Singletons for Geographic Samples of DNA Sequences.

    PubMed

    Cubry, Philippe; Vigouroux, Yves; François, Olivier

    2017-01-01

    Rare variants are important for drawing inference about past demographic events in a species' history. A singleton is a rare variant for which genetic variation is carried by a unique chromosome in a sample. How singletons are distributed across geographic space provides a local measure of genetic diversity that can be measured at the individual level. Here, we define the empirical distribution of singletons in a sample of chromosomes as the proportion of the total number of singletons that each chromosome carries, and we present a theoretical background for studying this distribution. Next, we use computer simulations to evaluate the potential for the empirical distribution of singletons to provide a description of genetic diversity across geographic space. In a Bayesian framework, we show that the empirical distribution of singletons leads to accurate estimates of the geographic origin of range expansions. We apply the Bayesian approach to estimating the origin of the cultivated plant species Pennisetum glaucum [L.] R. Br. (pearl millet) in Africa, and find support for range expansion having started from Northern Mali. Overall, we report that the empirical distribution of singletons is a useful measure to analyze results of sequencing projects based on large scale sampling of individuals across geographic space.
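
The definition given in the abstract, each chromosome's share of all singletons in the sample, can be computed directly from a haplotype matrix. The sketch below assumes a 0/1 matrix with chromosomes as rows and variant sites as columns, where 1 marks the variant allele. That encoding, the function name, and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def singleton_distribution(haplotypes):
    """Share of all singletons carried by each chromosome.

    haplotypes: 0/1 array of shape (n_chromosomes, n_sites); 1 marks the
    variant allele (an assumed encoding). A site is a singleton when
    exactly one chromosome in the sample carries the variant.
    """
    h = np.asarray(haplotypes, dtype=int)
    is_singleton = h.sum(axis=0) == 1
    # Number of singleton sites carried by each chromosome.
    per_chromosome = h[:, is_singleton].sum(axis=1)
    total = per_chromosome.sum()
    if total == 0:
        raise ValueError("sample contains no singletons")
    return per_chromosome / total

# Toy sample: 4 chromosomes, 5 sites (made-up data).
haps = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
dist = singleton_distribution(haps)
```

In the toy sample only the first two sites are singletons, so the first two chromosomes each carry half of them and the other two carry none.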

  10. Bayesian cross-validation for model evaluation and selection, with application to the North American Breeding Bird Survey

    USGS Publications Warehouse

    Link, William; Sauer, John R.

    2016-01-01

    The analysis of ecological data has changed in two important ways over the last 15 years. The development and easy availability of Bayesian computational methods has allowed and encouraged the fitting of complex hierarchical models. At the same time, there has been increasing emphasis on acknowledging and accounting for model uncertainty. Unfortunately, the ability to fit complex models has outstripped the development of tools for model selection and model evaluation: familiar model selection tools such as Akaike's information criterion and the deviance information criterion are widely known to be inadequate for hierarchical models. In addition, little attention has been paid to the evaluation of model adequacy in the context of hierarchical modeling, i.e., to the evaluation of fit for a single model. In this paper, we describe Bayesian cross-validation, which provides tools for model selection and evaluation. We describe the Bayesian predictive information criterion and a Bayesian approximation to the BPIC known as the Watanabe-Akaike information criterion. We illustrate the use of these tools for model selection, and the use of Bayesian cross-validation as a tool for model evaluation, using three large data sets from the North American Breeding Bird Survey.
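
As a rough illustration of the Watanabe-Akaike information criterion mentioned above, the sketch below computes WAIC from a matrix of pointwise log-likelihoods evaluated at posterior draws. It uses the commonly cited form, minus two times the log pointwise predictive density less the summed posterior variance of the log-likelihood. Whether this matches the exact computation in the paper is an assumption, and the function name and input layout are hypothetical.

```python
import numpy as np

def waic(log_lik):
    """WAIC from pointwise log-likelihoods.

    log_lik: array of shape (n_draws, n_observations), where entry [s, i]
    is log p(y_i | theta_s) for posterior draw theta_s.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density, using a stable log-mean-exp.
    col_max = log_lik.max(axis=0)
    lppd_i = col_max + np.log(np.exp(log_lik - col_max).sum(axis=0)) - np.log(n_draws)
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_waic_i = log_lik.var(axis=0, ddof=1)
    lppd = lppd_i.sum()
    p_waic = p_waic_i.sum()
    return {"lppd": lppd, "p_waic": p_waic, "waic": -2.0 * (lppd - p_waic)}

# Made-up log-likelihood values standing in for real MCMC output.
rng = np.random.default_rng(1)
fake_log_lik = rng.normal(loc=-1.5, scale=0.1, size=(1000, 50))
result = waic(fake_log_lik)
```

Lower WAIC indicates better expected out-of-sample predictive fit, so two candidate models fitted to the same observations can be compared through their `waic` values.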

  11. Probabilistic Inference: Task Dependency and Individual Differences of Probability Weighting Revealed by Hierarchical Bayesian Modeling

    PubMed Central

    Boos, Moritz; Seer, Caroline; Lange, Florian; Kopp, Bruno

    2016-01-01

    Cognitive determinants of probabilistic inference were examined using hierarchical Bayesian modeling techniques. A classic urn-ball paradigm served as experimental strategy, involving a factorial two (prior probabilities) by two (likelihoods) design. Five computational models of cognitive processes were compared with the observed behavior. Parameter-free Bayesian posterior probabilities and parameter-free base rate neglect provided inadequate models of probabilistic inference. The introduction of distorted subjective probabilities yielded more robust and generalizable results. A general class of (inverted) S-shaped probability weighting functions had been proposed; however, the possibility of large differences in probability distortions not only across experimental conditions, but also across individuals, seems critical for the model's success. It also seems advantageous to consider individual differences in parameters of probability weighting as being sampled from weakly informative prior distributions of individual parameter values. Thus, the results from hierarchical Bayesian modeling converge with previous results in revealing that probability weighting parameters show considerable task dependency and individual differences. Methodologically, this work exemplifies the usefulness of hierarchical Bayesian modeling techniques for cognitive psychology. Theoretically, human probabilistic inference might be best described as the application of individualized strategic policies for Bayesian belief revision. PMID:27303323

  12. Incorporating approximation error in surrogate based Bayesian inversion

    NASA Astrophysics Data System (ADS)

    Zhang, J.; Zeng, L.; Li, W.; Wu, L.

    2015-12-01

    There is increasing interest in applying surrogates in inverse Bayesian modeling to reduce repetitive evaluations of the original model and thereby save computational cost. However, the approximation error of the surrogate model is usually overlooked, partly because it is difficult to evaluate the approximation error for many surrogates. Previous studies have shown that the direct combination of surrogates and Bayesian methods (e.g., Markov Chain Monte Carlo, MCMC) may lead to biased estimates when the surrogate cannot emulate the highly nonlinear original system. This problem can be alleviated by implementing MCMC in a two-stage manner. However, the computational cost is still high since a relatively large number of original model simulations are required. In this study, we illustrate the importance of incorporating approximation error in inverse Bayesian modeling. A Gaussian process (GP) is chosen to construct the surrogate for its convenience in approximation error evaluation. Numerical cases of Bayesian experimental design and parameter estimation for contaminant source identification are used to illustrate this idea. It is shown that, once the surrogate approximation error is well incorporated into the Bayesian framework, promising results can be obtained even when the surrogate is directly used, and no further original model simulations are required.
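
One simple way to carry surrogate approximation error into a Bayesian likelihood is to add the surrogate's predictive variance to the measurement-error variance. The sketch below shows this under the assumption of independent Gaussian errors. The abstract does not give the authors' formulation, so the function, its arguments, and the numbers are illustrative only.

```python
import numpy as np

def log_likelihood_with_surrogate_error(obs, surrogate_mean, surrogate_var, obs_var):
    """Gaussian log-likelihood with surrogate error folded into the variance.

    obs:            observed data
    surrogate_mean: surrogate (e.g. GP) predictive mean at the candidate parameters
    surrogate_var:  surrogate predictive variance (the approximation error)
    obs_var:        measurement-error variance
    Errors are assumed independent and Gaussian (an assumption of this sketch).
    """
    resid = np.asarray(obs, dtype=float) - np.asarray(surrogate_mean, dtype=float)
    total_var = np.asarray(surrogate_var, dtype=float) + obs_var
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * total_var) + resid**2 / total_var))

# Made-up numbers: the surrogate misses the observation by 1.0.
obs = [1.0]
pred = [0.0]
ll_ignoring_error = log_likelihood_with_surrogate_error(obs, pred, [0.0], obs_var=0.01)
ll_with_error = log_likelihood_with_surrogate_error(obs, pred, [1.0], obs_var=0.01)
```

Setting `surrogate_var` to zero treats the surrogate as exact and heavily penalizes the mismatch. With the surrogate's own uncertainty included the penalty is much milder, which is one way the bias from an inaccurate surrogate can be reduced.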

  13. Bayesian Latent Class Analysis Tutorial.

    PubMed

    Li, Yuelin; Lord-Bessen, Jennifer; Shiyko, Mariya; Loeb, Rebecca

    2018-01-01

    This article is a how-to guide on Bayesian computation using Gibbs sampling, demonstrated in the context of Latent Class Analysis (LCA). It is written for students in quantitative psychology or related fields who have a working knowledge of Bayes Theorem and conditional probability and have experience in writing computer programs in the statistical language R. The overall goals are to provide an accessible and self-contained tutorial, along with a practical computation tool. We begin with how Bayesian computation is typically described in academic articles. Technical difficulties are addressed by a hypothetical, worked-out example. We show how Bayesian computation can be broken down into a series of simpler calculations, which can then be assembled together to complete a computationally more complex model. The details are described much more explicitly than what is typically available in elementary introductions to Bayesian modeling so that readers are not overwhelmed by the mathematics. Moreover, the provided computer program shows how Bayesian LCA can be implemented with relative ease. The computer program is then applied in a large, real-world data set and explained line-by-line. We outline the general steps in how to extend these considerations to other methodological applications. We conclude with suggestions for further reading.

  14. Impact assessment of extreme storm events using a Bayesian network

    USGS Publications Warehouse

    den Heijer, C.(Kees); Knipping, Dirk T.J.A.; Plant, Nathaniel G.; van Thiel de Vries, Jaap S. M.; Baart, Fedor; van Gelder, Pieter H. A. J. M.

    2012-01-01

    This paper describes an investigation on the usefulness of Bayesian Networks in the safety assessment of dune coasts. A network has been created that predicts the erosion volume based on hydraulic boundary conditions and a number of cross-shore profile indicators. Field measurement data along a large part of the Dutch coast has been used to train the network. Corresponding storm impact on the dunes was calculated with an empirical dune erosion model named duros+. Comparison between the Bayesian Network predictions and the original duros+ results, here considered as observations, results in a skill up to 0.88, provided that the training data covers the range of predictions. Hence, the predictions from a deterministic model (duros+) can be captured in a probabilistic model (Bayesian Network) such that both the process knowledge and uncertainties can be included in impact and vulnerability assessments.

  15. An evaluation of Bayesian techniques for controlling model complexity and selecting inputs in a neural network for short-term load forecasting.

    PubMed

    Hippert, Henrique S; Taylor, James W

    2010-04-01

    Artificial neural networks have frequently been proposed for electricity load forecasting because of their capabilities for the nonlinear modelling of large multivariate data sets. Modelling with neural networks is not an easy task though; two of the main challenges are defining the appropriate level of model complexity, and choosing the input variables. This paper evaluates techniques for automatic neural network modelling within a Bayesian framework, as applied to six samples containing daily load and weather data for four different countries. We analyse input selection as carried out by the Bayesian 'automatic relevance determination', and the usefulness of the Bayesian 'evidence' for the selection of the best structure (in terms of number of neurones), as compared to methods based on cross-validation.

  16. Bayesian Factor Analysis as a Variable Selection Problem: Alternative Priors and Consequences

    PubMed Central

    Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric

    2016-01-01

    Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, a Bayesian structural equation modeling (BSEM) approach (Muthén & Asparouhov, 2012) has been proposed as a way to explore the presence of cross-loadings in CFA models. We show that the issue of determining factor loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov’s approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike and slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set (Byrne, 2012; Pettegrew & Wolf, 1982) is used to demonstrate our approach. PMID:27314566

  17. Efficient occupancy model-fitting for extensive citizen-science data.

    PubMed

    Dennis, Emily B; Morgan, Byron J T; Freeman, Stephen N; Ridout, Martin S; Brereton, Tom M; Fox, Richard; Powney, Gary D; Roy, David B

    2017-01-01

    Appropriate large-scale citizen-science data present important new opportunities for biodiversity modelling, due in part to the wide spatial coverage of information. Recently proposed occupancy modelling approaches naturally incorporate random effects in order to account for annual variation in the composition of sites surveyed. In turn this leads to Bayesian analysis and model fitting, which are typically extremely time consuming. Motivated by presence-only records of occurrence from the UK Butterflies for the New Millennium data base, we present an alternative approach, in which site variation is described in a standard way through logistic regression on relevant environmental covariates. This allows efficient occupancy model-fitting using classical inference, which is easily achieved using standard computers. This is especially important when models need to be fitted each year, typically for many different species, as with British butterflies for example. Using both real and simulated data we demonstrate that the two approaches, with and without random effects, can result in similar conclusions regarding trends. There are many advantages to classical model-fitting, including the ability to compare a range of alternative models, identify appropriate covariates and assess model fit, using standard tools of maximum likelihood. In addition, modelling in terms of covariates provides opportunities for understanding the ecological processes that are in operation. We show that there is even greater potential; the classical approach allows us to construct regional indices simply, which indicate how changes in occupancy typically vary over a species' range. In addition we are also able to construct dynamic occupancy maps, which provide a novel, modern tool for examining temporal changes in species distribution. These new developments may be applied to a wide range of taxa, and are valuable at a time of climate change. 
    They also have the potential to motivate citizen scientists.

  18. Efficient occupancy model-fitting for extensive citizen-science data

    PubMed Central

    Morgan, Byron J. T.; Freeman, Stephen N.; Ridout, Martin S.; Brereton, Tom M.; Fox, Richard; Powney, Gary D.; Roy, David B.

    2017-01-01

    Appropriate large-scale citizen-science data present important new opportunities for biodiversity modelling, due in part to the wide spatial coverage of information. Recently proposed occupancy modelling approaches naturally incorporate random effects in order to account for annual variation in the composition of sites surveyed. In turn this leads to Bayesian analysis and model fitting, which are typically extremely time consuming. Motivated by presence-only records of occurrence from the UK Butterflies for the New Millennium data base, we present an alternative approach, in which site variation is described in a standard way through logistic regression on relevant environmental covariates. This allows efficient occupancy model-fitting using classical inference, which is easily achieved using standard computers. This is especially important when models need to be fitted each year, typically for many different species, as with British butterflies for example. Using both real and simulated data we demonstrate that the two approaches, with and without random effects, can result in similar conclusions regarding trends. There are many advantages to classical model-fitting, including the ability to compare a range of alternative models, identify appropriate covariates and assess model fit, using standard tools of maximum likelihood. In addition, modelling in terms of covariates provides opportunities for understanding the ecological processes that are in operation. We show that there is even greater potential; the classical approach allows us to construct regional indices simply, which indicate how changes in occupancy typically vary over a species’ range. In addition we are also able to construct dynamic occupancy maps, which provide a novel, modern tool for examining temporal changes in species distribution. These new developments may be applied to a wide range of taxa, and are valuable at a time of climate change. 
    They also have the potential to motivate citizen scientists. PMID:28328937

  19. The dawn of open access to phylogenetic data.

    PubMed

    Magee, Andrew F; May, Michael R; Moore, Brian R

    2014-01-01

    The scientific enterprise depends critically on the preservation of and open access to published data. This basic tenet applies acutely to phylogenies (estimates of evolutionary relationships among species). Increasingly, phylogenies are estimated from increasingly large, genome-scale datasets using increasingly complex statistical methods that require increasing levels of expertise and computational investment. Moreover, the resulting phylogenetic data provide an explicit historical perspective that critically informs research in a vast and growing number of scientific disciplines. One such use is the study of changes in rates of lineage diversification (speciation--extinction) through time. As part of a meta-analysis in this area, we sought to collect phylogenetic data (comprising nucleotide sequence alignment and tree files) from 217 studies published in 46 journals over a 13-year period. We document our attempts to procure those data (from online archives and by direct request to corresponding authors), and report results of analyses (using Bayesian logistic regression) to assess the impact of various factors on the success of our efforts. Overall, complete phylogenetic data for [Formula: see text] of these studies are effectively lost to science. Our study indicates that phylogenetic data are more likely to be deposited in online archives and/or shared upon request when: (1) the publishing journal has a strong data-sharing policy; (2) the publishing journal has a higher impact factor, and; (3) the data are requested from faculty rather than students. Importantly, our survey spans recent policy initiatives and infrastructural changes; our analyses indicate that the positive impact of these community initiatives has been both dramatic and immediate. Although the results of our study indicate that the situation is dire, our findings also reveal tremendous recent progress in the sharing and preservation of phylogenetic data.

  20. Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology

    NASA Astrophysics Data System (ADS)

    Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen

    2018-07-01

    Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper, we use massive asymptotically optimal data compression to reduce the dimensionality of the data space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parametrized model for joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate DELFI with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ~10^4 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological data sets.

  1. Combining Computational Methods for Hit to Lead Optimization in Mycobacterium tuberculosis Drug Discovery

    PubMed Central

    Ekins, Sean; Freundlich, Joel S.; Hobrath, Judith V.; White, E. Lucile; Reynolds, Robert C

    2013-01-01

    Purpose: Tuberculosis treatments need to be shorter and overcome drug resistance. Our previous large scale phenotypic high-throughput screening against Mycobacterium tuberculosis (Mtb) has identified 737 active compounds and thousands that are inactive. We have used these data for building computational models as an approach to minimize the number of compounds tested. Methods: A cheminformatics clustering approach followed by Bayesian machine learning models (based on publicly available Mtb screening data) was used to illustrate that application of these models for screening set selections can enrich the hit rate. Results: In order to explore chemical diversity around active cluster scaffolds of the dose-response hits obtained from our previous Mtb screens, a set of 1924 commercially available molecules was selected and evaluated for antitubercular activity and cytotoxicity using Vero, THP-1 and HepG2 cell lines with 4.3%, 4.2% and 2.7% hit rates, respectively. We demonstrate that models incorporating antitubercular and cytotoxicity data in Vero cells can significantly enrich the selection of non-toxic actives compared to random selection. Across all cell lines, the Molecular Libraries Small Molecule Repository (MLSMR) and cytotoxicity model identified ~10% of the hits in the top 1% screened (>10-fold enrichment). We also showed that seven out of nine Mtb active compounds from different academic published studies and eight out of eleven Mtb active compounds from a pharmaceutical screen (GSK) would have been identified by these Bayesian models. Conclusion: Combining clustering and Bayesian models represents a useful strategy for compound prioritization and hit-to-lead optimization of antitubercular agents. PMID:24132686
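
The enrichment figure quoted above follows from simple arithmetic: finding about 10% of all hits within the top-ranked 1% of compounds is roughly a tenfold improvement over random selection. A minimal sketch of that calculation is given below. The function name and the synthetic ranking are illustrative assumptions, not the study's data or code.

```python
import numpy as np

def enrichment_factor(scores, is_hit, top_fraction):
    """Fraction of all hits found in the top-scored subset, divided by the
    fraction of the library that subset represents."""
    scores = np.asarray(scores, dtype=float)
    is_hit = np.asarray(is_hit, dtype=bool)
    n_total = scores.size
    n_top = max(1, int(round(top_fraction * n_total)))
    # Rank compounds from highest to lowest model score.
    order = np.argsort(-scores, kind="stable")
    hits_in_top = is_hit[order[:n_top]].sum()
    fraction_of_hits = hits_in_top / is_hit.sum()
    fraction_screened = n_top / n_total
    return fraction_of_hits / fraction_screened

# Synthetic library of 1000 compounds with 50 hits, 5 of them ranked in the
# top 10 by a hypothetical model score (made-up numbers).
scores = np.arange(1000, 0, -1)
is_hit = np.zeros(1000, dtype=bool)
is_hit[:5] = True
is_hit[500:545] = True
ef = enrichment_factor(scores, is_hit, top_fraction=0.01)
```

Here 5 of 50 hits (10%) fall in the top 1% of the ranked list, so the enrichment factor is about 10. A value near 1 would mean the ranking performs no better than random selection.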

  2. Environmental Factors Affecting Brook Trout Occurrence in Headwater Stream Segments

    Treesearch

    Yoichiro Kanno; Benjamin H. Letcher; Ana L. Rosner; Kyle P. O' Neil; Keith H. Nislow

    2015-01-01

    We analyzed the associations of catchment-scale and riparian-scale environmental factors with occurrence of Brook Trout Salvelinus fontinalis in Connecticut headwater stream segments with catchment areas of <15 km². A hierarchical Bayesian approach was applied to a statewide stream survey data set, in which Brook...

  3. Natural Hazards and Supply Chain Disruptions

    NASA Astrophysics Data System (ADS)

    Haraguchi, M.

    2016-12-01

    Natural hazards distress the global economy through disruptions in supply chain networks. Moreover, despite increasing investment in infrastructure for disaster risk management, economic damages and losses caused by natural hazards are increasing. Manufacturing companies today have reduced inventories and streamlined logistics in order to maximize economic competitiveness. As a result, today's supply chains are profoundly susceptible to systemic risk, that is, the risk that an entire network collapses because of a few of its nodes. For instance, the prolonged floods in Thailand in 2011 caused supply chain disruptions in the country's primary industries, i.e. the electronic and automotive industries, harming not only the Thai economy but also the global economy. Similar problems occurred after the Great East Japan Earthquake and Tsunami in 2011, the Mississippi River floods and droughts during 2011-2013, and the earthquake in Kumamoto, Japan, in 2016. This study attempts to discover what kinds of effective measures are available for private companies to manage supply chain disruptions caused by floods. It also proposes a method to estimate potential risks using a Bayesian network. The study uses a Bayesian network to create synthetic networks that include variables associated with the magnitude and duration of floods and major components of supply chains such as logistics, multiple layers of suppliers, warehouses, and consumer markets. Considering situations across different times, our study shows desirable data requirements for the analysis and effective measures to improve Value at Risk (VaR) for private enterprises and supply chains.

  4. Psychosocial stress factors, including the relationship with the coach, and their influence on acute and overuse injury risk in elite female football players.

    PubMed

    Pensgaard, Anne Marte; Ivarsson, Andreas; Nilstad, Agnethe; Solstad, Bård Erlend; Steffen, Kathrin

    2018-01-01

    The relationship between specific types of stressors (eg, teammates, coach) and acute versus overuse injuries is not well understood. To examine the roles of different types of stressors as well as the effect of motivational climate on the occurrence of acute and overuse injuries. Players in the Norwegian elite female football league (n=193 players from 12 teams) participated in baseline screening tests prior to the 2009 competitive football season. As part of the screening, we included the Life Event Survey for Collegiate Athletes and the Perceived Motivational Climate in Sport Questionnaire (Norwegian short version). Acute and overuse time-loss injuries and exposure to training and matches were recorded prospectively in the football season using weekly text messaging. Data were analysed with Bayesian logistic regression analyses. Using Bayesian logistic regression analyses, we showed that perceived negative life event stress from teammates was associated with an increased risk of acute injuries (OR=1.23, 95% credibility interval (1.01 to 1.48)). There was a credible positive association between perceived negative life event stress from the coach and the risk of overuse injuries (OR=1.21, 95% credibility interval (1.01 to 1.45)). Players who report teammates as a source of stress have a greater risk of sustaining an acute injury, while players reporting the coach as a source of stress are at greater risk of sustaining an overuse injury. Motivational climate did not relate to increased injury occurrence.
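
    The odds ratios and credibility intervals reported above are the kind of summary usually derived from posterior draws of a log-odds coefficient. The following is a minimal sketch of that step, not the authors' analysis: the function name `odds_ratio_summary` and the simulated draws are hypothetical, and the study's own data and sampler are not reproduced here.

```python
import numpy as np

def odds_ratio_summary(beta_draws, level=0.95):
    """Summarize posterior draws of a log-odds coefficient on the odds-ratio scale."""
    draws = np.asarray(beta_draws, dtype=float)
    or_draws = np.exp(draws)  # each draw mapped from log-odds to an odds ratio
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(or_draws, [tail, 1.0 - tail])
    return float(np.median(or_draws)), float(lower), float(upper)

# Hypothetical posterior draws (illustrative only, not the study's data).
rng = np.random.default_rng(0)
beta_draws = rng.normal(loc=0.2, scale=0.09, size=20000)
median_or, lower_or, upper_or = odds_ratio_summary(beta_draws)
```

    An equal-tailed interval whose lower bound lies above 1 is what an abstract of this kind typically describes as a credible positive association.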

  5. Development of a Bayesian model to estimate health care outcomes in the severely wounded

    PubMed Central

    Stojadinovic, Alexander; Eberhardt, John; Brown, Trevor S; Hawksworth, Jason S; Gage, Frederick; Tadaki, Douglas K; Forsberg, Jonathan A; Davis, Thomas A; Potter, Benjamin K; Dunne, James R; Elster, E A

    2010-01-01

    Background: Graphical probabilistic models have the ability to provide insights as to how clinical factors are conditionally related. These models can be used to help us understand factors influencing health care outcomes and resource utilization, and to estimate morbidity and clinical outcomes in trauma patient populations. Study design: Thirty-two combat casualties with severe extremity injuries enrolled in a prospective observational study were analyzed using step-wise machine-learned Bayesian belief network (BBN) and step-wise logistic regression (LR). Models were evaluated using 10-fold cross-validation to calculate area-under-the-curve (AUC) from receiver operating characteristics (ROC) curves. Results: Our BBN showed important associations between various factors in our data set that could not be developed using standard regression methods. Cross-validated ROC curve analysis showed that our BBN model was a robust representation of our data domain and that LR models trained on these findings were also robust: hospital-acquired infection (AUC: LR, 0.81; BBN, 0.79), intensive care unit length of stay (AUC: LR, 0.97; BBN, 0.81), and wound healing (AUC: LR, 0.91; BBN, 0.72) showed strong AUC. Conclusions: A BBN model can effectively represent clinical outcomes and biomarkers in patients hospitalized after severe wounding, and is confirmed by 10-fold cross-validation and further confirmed through logistic regression modeling. The method warrants further development and independent validation in other, more diverse patient populations. PMID:21197361
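
    The evaluation described above rests on two ingredients: held-out folds and the area under the ROC curve. A minimal sketch of both follows, assuming nothing about the authors' software. The helper names `roc_auc` and `kfold_test_indices` are hypothetical, and the AUC is computed as the fraction of positive-negative pairs ranked correctly, with ties counted as one half.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outranks a random negative case."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative case")
    # Compare every positive score with every negative score; ties count one half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def kfold_test_indices(n_samples, n_folds=10, seed=0):
    """Shuffle the sample indices once and split them into n_folds held-out sets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

auc_example = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
folds = kfold_test_indices(32, n_folds=10)  # 32 matches the cohort size in the abstract
```

    With only 32 cases, each held-out fold contains three or four patients, so pooling out-of-fold predictions before computing a single AUC is one common choice; whether the authors did so is not stated in the abstract.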

  6. Model verification of large structural systems. [space shuttle model response

    NASA Technical Reports Server (NTRS)

    Lee, L. T.; Hasselman, T. K.

    1978-01-01

    A computer program for the application of parameter identification on the structural dynamic models of space shuttle and other large models with hundreds of degrees of freedom is described. Finite element, dynamic, analytic, and modal models are used to represent the structural system. The interface with math models is such that output from any structural analysis program applied to any structural configuration can be used directly. Processed data from either sine-sweep tests or resonant dwell tests are directly usable. The program uses measured modal data to condition the prior analytic model so as to improve the frequency match between model and test. A Bayesian estimator generates an improved analytical model and a linear estimator is used in an iterative fashion on highly nonlinear equations. Mass and stiffness scaling parameters are generated for an improved finite element model, and the optimum set of parameters is obtained in one step.

  7. Massive parallelization of serial inference algorithms for a complex generalized linear model

    PubMed Central

    Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David

    2014-01-01

    Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363
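
    Cyclic coordinate descent updates one coefficient at a time while holding the others fixed. The sketch below is a serial, small-scale illustration for an ordinary logistic model with an independent normal prior on each coefficient (a MAP fit). It is an assumption-laden stand-in, not the paper's conditioned model or its GPU implementation; the function name `ccd_logistic_map` and the parameters `prior_var`, `n_sweeps`, and `max_step` are hypothetical.

```python
import numpy as np

def ccd_logistic_map(X, y, prior_var=1.0, n_sweeps=100, max_step=1.0):
    """MAP logistic regression by cyclic coordinate descent with a normal prior."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    eta = np.zeros(n)  # linear predictor X @ beta, updated incrementally
    for _ in range(n_sweeps):
        for j in range(p):
            prob = 1.0 / (1.0 + np.exp(-eta))
            xj = X[:, j]
            # Gradient and curvature of the negative log posterior in coordinate j.
            grad = xj @ (prob - y) + beta[j] / prior_var
            hess = (xj ** 2) @ (prob * (1.0 - prob)) + 1.0 / prior_var
            # One-dimensional Newton step, capped to avoid overshooting.
            step = float(np.clip(-grad / hess, -max_step, max_step))
            beta[j] += step
            eta += step * xj
    return beta
```

    Each coordinate update needs only one column of the design matrix and the current linear predictor, which is the property that makes the inner sums natural targets for parallel reduction.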

  8. A full-Bayesian approach to parameter inference from tracer travel time moments and investigation of scale effects at the Cape Cod experimental site

    USGS Publications Warehouse

    Woodbury, Allan D.; Rubin, Yoram

    2000-01-01

    A method for inverting the travel time moments of solutes in heterogeneous aquifers is presented and is based on peak concentration arrival times as measured at various samplers in an aquifer. The approach combines a Lagrangian [Rubin and Dagan, 1992] solute transport framework with full‐Bayesian hydrogeological parameter inference. In the full‐Bayesian approach the noise values in the observed data are treated as hyperparameters, and their effects are removed by marginalization. The prior probability density functions (pdfs) for the model parameters (horizontal integral scale, velocity, and log K variance) and noise values are represented by prior pdfs developed from minimum relative entropy considerations. Analysis of the Cape Cod (Massachusetts) field experiment is presented. Inverse results for the hydraulic parameters indicate an expected value for the velocity, variance of log hydraulic conductivity, and horizontal integral scale of 0.42 m/d, 0.26, and 3.0 m, respectively. While these results are consistent with various direct‐field determinations, the importance of the findings is in the reduction of confidence range about the various expected values. On selected control planes we compare observed travel time frequency histograms with the theoretical pdf, conditioned on the observed travel time moments. We observe a positive skew in the travel time pdf which tends to decrease as the travel time distance grows. We also test the hypothesis that there is no scale dependence of the integral scale λ with the scale of the experiment at Cape Cod. We adopt two strategies. The first strategy is to use subsets of the full data set and then to see if the resulting parameter fits are different as we use different data from control planes at expanding distances from the source. The second approach is from the viewpoint of entropy concentration. 
No increase in integral scale with distance is inferred from either approach over the range of the Cape Cod tracer experiment.

  9. An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit

    PubMed Central

    Wong, Rowena Syn Yin; Ismail, Noor Azina

    2016-01-01

    Background and Objectives There are not many studies that attempt to model intensive care unit (ICU) risk of death in developing countries, especially in South East Asia. The aim of this study was to propose and describe application of a Bayesian approach in modeling in-ICU deaths in a Malaysian ICU. Methods This was a prospective study in a mixed medical-surgery ICU in a multidisciplinary tertiary referral hospital in Malaysia. Data collection included variables that were defined in Acute Physiology and Chronic Health Evaluation IV (APACHE IV) model. Bayesian Markov Chain Monte Carlo (MCMC) simulation approach was applied in the development of four multivariate logistic regression predictive models for the ICU, where the main outcome measure was in-ICU mortality risk. The performance of the models was assessed through overall model fit, discrimination and calibration measures. Results from the Bayesian models were also compared against results obtained using frequentist maximum likelihood method. Results The study involved 1,286 consecutive ICU admissions between January 1, 2009 and June 30, 2010, of which 1,111 met the inclusion criteria. Patients who were admitted to the ICU were generally younger, predominantly male, with low co-morbidity load and mostly under mechanical ventilation. The overall in-ICU mortality rate was 18.5% and the overall mean Acute Physiology Score (APS) was 68.5. All four models exhibited good discrimination, with area under receiver operating characteristic curve (AUC) values approximately 0.8. Calibration was acceptable (Hosmer-Lemeshow p-values > 0.05) for all models, except for model M3. Model M1 was identified as the model with the best overall performance in this study. Conclusion Four prediction models were proposed, where the best model was chosen based on its overall performance in this study. 
This study has also demonstrated the promising potential of the Bayesian MCMC approach as an alternative in the analysis and modeling of in-ICU mortality outcomes. PMID:27007413

  10. An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit.

    PubMed

    Wong, Rowena Syn Yin; Ismail, Noor Azina

    2016-01-01

    There are not many studies that attempt to model intensive care unit (ICU) risk of death in developing countries, especially in South East Asia. The aim of this study was to propose and describe application of a Bayesian approach in modeling in-ICU deaths in a Malaysian ICU. This was a prospective study in a mixed medical-surgery ICU in a multidisciplinary tertiary referral hospital in Malaysia. Data collection included variables that were defined in Acute Physiology and Chronic Health Evaluation IV (APACHE IV) model. Bayesian Markov Chain Monte Carlo (MCMC) simulation approach was applied in the development of four multivariate logistic regression predictive models for the ICU, where the main outcome measure was in-ICU mortality risk. The performance of the models was assessed through overall model fit, discrimination and calibration measures. Results from the Bayesian models were also compared against results obtained using frequentist maximum likelihood method. The study involved 1,286 consecutive ICU admissions between January 1, 2009 and June 30, 2010, of which 1,111 met the inclusion criteria. Patients who were admitted to the ICU were generally younger, predominantly male, with low co-morbidity load and mostly under mechanical ventilation. The overall in-ICU mortality rate was 18.5% and the overall mean Acute Physiology Score (APS) was 68.5. All four models exhibited good discrimination, with area under receiver operating characteristic curve (AUC) values approximately 0.8. Calibration was acceptable (Hosmer-Lemeshow p-values > 0.05) for all models, except for model M3. Model M1 was identified as the model with the best overall performance in this study. Four prediction models were proposed, where the best model was chosen based on its overall performance in this study. This study has also demonstrated the promising potential of the Bayesian MCMC approach as an alternative in the analysis and modeling of in-ICU mortality outcomes.

  11. Phylogenomic analysis of a rapid radiation of misfit fishes (Syngnathiformes) using ultraconserved elements.

    PubMed

    Longo, S J; Faircloth, B C; Meyer, A; Westneat, M W; Alfaro, M E; Wainwright, P C

    2017-08-01

    Phylogenetics is undergoing a revolution as large-scale molecular datasets reveal unexpected but repeatable rearrangements of clades that were previously thought to be disparate lineages. One of the most unusual clades of fishes that has been found using large-scale molecular datasets is an expanded Syngnathiformes including traditional long-snouted syngnathiform lineages (Aulostomidae, Centriscidae, Fistulariidae, Solenostomidae, Syngnathidae), as well as a diverse set of largely benthic-associated fishes (Callionymoidei, Dactylopteridae, Mullidae, Pegasidae) that were previously dispersed across three orders. The monophyly of this surprising clade of fishes has been upheld by recent studies utilizing both nuclear and mitogenomic data, but the relationships among major lineages within Syngnathiformes remain ambiguous; previous analyses have inconsistent topologies and are plagued by low support at deep divergences between the major lineages. In this study, we use a dataset of ultraconserved elements (UCEs) to conduct the first phylogenomic study of Syngnathiformes. UCEs have been effective markers for resolving deep phylogenetic relationships in fishes and, combined with increased taxon sampling, we expected UCEs to resolve problematic syngnathiform relationships. Overall, UCEs were effective at resolving relationships within Syngnathiformes at a range of evolutionary timescales. We find consistent support for the monophyly of traditional long-snouted syngnathiform lineages (Aulostomidae, Centriscidae, Fistulariidae, Solenostomidae, Syngnathidae), which better agrees with morphological hypotheses than previously published topologies from molecular data. This result was supported by all Bayesian and maximum likelihood analyses, was robust to differences in matrix completeness and potential sources of bias, and was highly supported in coalescent-based analyses in ASTRAL when matrices were filtered to contain the most phylogenetically informative loci. 
While Bayesian and maximum likelihood analyses found support for a benthic-associated clade (Callionymidae, Dactylopteridae, Mullidae, and Pegasidae) as sister to the long-snouted clade, this result was not replicated in the ASTRAL analyses. The base of our phylogeny is characterized by short internodes separating major syngnathiform lineages and is consistent with the hypothesis of an ancient rapid radiation at the base of Syngnathiformes. Syngnathiformes therefore present an exciting opportunity to study patterns of morphological variation and functional innovation arising from rapid but ancient radiation. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Use of large-scale acoustic monitoring to assess anthropogenic pressures on Orthoptera communities.

    PubMed

    Penone, Caterina; Le Viol, Isabelle; Pellissier, Vincent; Julien, Jean-François; Bas, Yves; Kerbiriou, Christian

    2013-10-01

    Biodiversity monitoring at large spatial and temporal scales is greatly needed in the context of global changes. Although insects are a species-rich group and are important for ecosystem functioning, they have been largely neglected in conservation studies and policies, mainly due to technical and methodological constraints. Sound detection, a nondestructive method, is easily applied within a citizen-science framework and could be an interesting solution for insect monitoring. However, it has not yet been tested at a large scale. We assessed the value of a citizen-science program in which Orthoptera species (Tettigoniidae) were monitored acoustically along roads. We used Bayesian model-averaging analyses to test whether we could detect widely known patterns of anthropogenic effects on insects, such as the negative effects of urbanization or intensive agriculture on Orthoptera populations and communities. We also examined site-abundance correlations between years and estimated the biases in species detection to evaluate and improve the protocol. Urbanization and intensive agricultural landscapes negatively affected Orthoptera species richness, diversity, and abundance. This finding is consistent with results of previous studies of Orthoptera, vertebrates, carabids, and butterflies. The average mass of communities decreased as urbanization increased. The dispersal ability of communities increased as the percentage of agricultural land and, to a lesser extent, urban area increased. Despite changes in abundances over time, we found significant correlations between yearly abundances. We identified biases linked to the protocol (e.g., car speed or temperature) that can be accounted for easily in analyses. We argue that acoustic monitoring of Orthoptera along roads offers several advantages for assessing Orthoptera biodiversity at large spatial and temporal extents, particularly in a citizen science framework. © 2013 Society for Conservation Biology.

  13. Automated high resolution mapping of coffee in Rwanda using an expert Bayesian network

    NASA Astrophysics Data System (ADS)

    Mukashema, A.; Veldkamp, A.; Vrieling, A.

    2014-12-01

    African highland agro-ecosystems are dominated by small-scale agricultural fields that often contain a mix of annual and perennial crops. This makes such systems difficult to map by remote sensing. We developed an expert Bayesian network model to extract the small-scale coffee fields of Rwanda from very high resolution data. The model was subsequently applied to aerial orthophotos covering more than 99% of Rwanda and on one QuickBird image for the remaining part. The method consists of a stepwise adjustment of pixel probabilities, which incorporates expert knowledge on size of coffee trees and fields, and on their location. The initial naive Bayesian network, which is a spectral-based classification, yielded a coffee map with an overall accuracy of around 50%. This confirms that standard spectral variables alone cannot accurately identify coffee fields from high resolution images. The combination of spectral and ancillary data (DEM and a forest map) allowed mapping of coffee fields and associated uncertainties with an overall accuracy of 87%. Aggregated to district units, the mapped coffee areas demonstrated a high correlation with the coffee areas reported in the detailed national coffee census of 2009 (R² = 0.92). Unlike the census data, our map provides high spatial resolution of coffee area patterns of Rwanda. The proposed method has potential for mapping other perennial small scale cropping systems in the East African Highlands and elsewhere.

  14. Green Packaging Management of Logistics Enterprises

    NASA Astrophysics Data System (ADS)

    Zhang, Guirong; Zhao, Zongjian

    Starting from the concept of green logistics management, we discuss the principles of green packaging and propose specific management strategies at two levels, government and enterprise. The management of green packaging can be promoted directly and indirectly by laws, regulations, taxation, institutional and other measures. The government can also promote new investment in the development of green packaging materials and establish specialized institutions to identify new packaging materials; standardization of packaging must also be accomplished through the power of the government. Large-scale business units can reduce the use of packaging materials through container-based packaging, and can develop and use green packaging materials and easily recyclable packaging materials.

  15. NASA Space Flight Vehicle Fault Isolation Challenges

    NASA Technical Reports Server (NTRS)

    Neeley, James R.; Jones, James V.; Bramon, Christopher J.; Inman, Sharon K.; Tuttle, Loraine

    2016-01-01

    The Space Launch System (SLS) is the new NASA heavy lift launch vehicle in development and is scheduled for its first mission in 2018. SLS has many of the same logistics challenges as any other large scale program. However, SLS also faces unique challenges related to testability. This presentation will address the SLS challenges for diagnostics and fault isolation, along with the analyses and decisions to mitigate risk.

  16. Leaf optical properties shed light on foliar trait variability at individual to global scales

    NASA Astrophysics Data System (ADS)

    Shiklomanov, A. N.; Serbin, S.; Dietze, M.

    2016-12-01

    Recent syntheses of large trait databases have contributed immensely to our understanding of drivers of plant function at the global scale. However, the global trade-offs revealed by such syntheses, such as the trade-off between leaf productivity and resilience (i.e. "leaf economics spectrum"), are often absent at smaller scales and fail to correlate with actual functional limitations. An improved understanding of how traits vary within communities, species, and individuals is critical to accurate representations of vegetation ecophysiology and ecological dynamics in ecosystem models. Spectral data from both field observations and remote sensing platforms present a potentially rich and widely available source of information on plant traits. In particular, the inversion of physically-based radiative transfer models (RTMs) is an effective and general method for estimating plant traits from spectral measurements. Here, we apply Bayesian inversion of the PROSPECT leaf RTM to a large database of field spectra and plant traits spanning tropical, temperate, and boreal forests, agricultural plots, arid shrublands, and tundra to identify dominant sources of variability and characterize trade-offs in plant functional traits. By leveraging such a large and diverse dataset, we re-calibrate the empirical absorption coefficients underlying the PROSPECT model and expand its scope to include additional leaf biochemical components, namely leaf nitrogen content. Our work provides a key methodological contribution as a physically-based retrieval of leaf nitrogen from remote sensing observations, and provides substantial insights about trait trade-offs related to plant acclimation, adaptation, and community assembly.

  17. Phylogenetic congruence of armored scale insects (Hemiptera: Diaspididae) and their primary endosymbionts from the phylum Bacteroidetes.

    PubMed

    Gruwell, Matthew E; Morse, Geoffrey E; Normark, Benjamin B

    2007-07-01

    Insects in the sap-sucking hemipteran suborder Sternorrhyncha typically harbor maternally transmitted bacteria housed in a specialized organ, the bacteriome. In three of the four superfamilies of Sternorrhyncha (Aphidoidea, Aleyrodoidea, Psylloidea), the bacteriome-associated (primary) bacterial lineage is from the class Gammaproteobacteria (phylum Proteobacteria). The fourth superfamily, Coccoidea (scale insects), has a diverse array of bacterial endosymbionts whose affinities are largely unexplored. We have amplified fragments of two bacterial ribosomal genes from each of 68 species of armored scale insects (Diaspididae). In spite of initially using primers designed for Gammaproteobacteria, we consistently amplified sequences from a different bacterial phylum: Bacteroidetes. We use these sequences (16S and 23S, 2105 total base pairs), along with previously published sequences from the armored scale hosts (elongation factor 1alpha and 28S rDNA) to investigate phylogenetic congruence between the two clades. The Bayesian tree for the bacteria is roughly congruent with that of the hosts, with 67% of nodes identical. Partition homogeneity tests found no significant difference between the host and bacterial data sets. Of thirteen Shimodaira-Hasegawa tests, comparing the original Bayesian bacterial tree to bacterial trees with incongruent clades forced to match the host tree, 12 found no significant difference. A significant difference in topology was found only when the entire host tree was compared with the entire bacterial tree. For the bacterial data set, the treelengths of the most parsimonious host trees are only 1.8-2.4% longer than that of the most parsimonious bacterial trees. The high level of congruence between the topologies indicates that these Bacteroidetes are the primary endosymbionts of armored scale insects. 
To investigate the phylogenetic affinities of these endosymbionts, we aligned some of their 16S rDNA sequences with other known Bacteroidetes endosymbionts and with other similar sequences identified by BLAST searches. Although the endosymbionts of armored scales are only distantly related to the endosymbionts of the other sternorrhynchan insects, they are closely related to bacteria associated with eriococcid and margarodid scale insects, to cockroach and auchenorrhynchan endosymbionts (Blattabacterium and Sulcia), and to male-killing endosymbionts of ladybird beetles. We propose the name "Candidatus Uzinura diaspidicola" for the primary endosymbionts of armored scale insects.

  18. Association of LMX1A genetic polymorphisms with susceptibility to congenital scoliosis in Chinese Han population.

    PubMed

    Wu, Nan; Yuan, Suomao; Liu, Jiaqi; Chen, Jun; Fei, Qi; Liu, Sen; Su, Xinlin; Wang, Shengru; Zhang, Jianguo; Li, Shugang; Wang, Yipeng; Qiu, Guixing; Wu, Zhihong

    2014-10-01

    A genetic association study of single nucleotide polymorphisms (SNPs) for the LMX1A gene with congenital scoliosis (CS) in the Chinese Han population. To determine whether LMX1A genetic polymorphisms are associated with susceptibility to CS. CS is a lateral curvature of the spine due to congenital vertebral defects, whose exact genetic cause has not been well established. The LMX1A gene was suggested as a potential human candidate gene for CS. However, no genetic study of LMX1A in CS has ever been reported. We genotyped 13 SNPs of the LMX1A gene in 154 patients with CS and 144 controls with matched sex and age. After conducting the Hardy-Weinberg equilibrium test, the data of 13 SNPs were analyzed by the allelic and genotypic association with logistic regression analysis. Furthermore, the genotype-phenotype association and haplotype association analysis were also performed. The 13 SNPs of the LMX1A gene met Hardy-Weinberg equilibrium in the controls but not in the cases. None of the allelic and genotypic frequencies of these SNPs showed significant difference between case and control groups (P > 0.05). However, the genotypic frequencies of rs1354510 and rs16841013 in the LMX1A gene were associated with CS predisposition in the unconditional logistic regression analysis (P = 0.02 and 0.018, respectively). Genotypic frequencies of 3 SNPs at rs6671290, rs1354510, and rs16841013 were found to exhibit significant differences between patients with CS with failure of formation and the healthy controls (P = 0.019, 0.007, and 0.006, respectively). 
In addition, in the model analysis using unconditional logistic regression, the optimized models for the 3 genotypic positive SNPs with failure of formation were rs6671290 (codominant; P = 0.025, Akaike information value = 316.6, Bayesian information criterion = 333.9), rs1354510 (overdominant; P = 0.0017, Akaike information value = 312.1, Bayesian information criterion = 325.9), and rs16841013 (overdominant; P = 0.0016, Akaike information value = 311.1, Bayesian information criterion = 325), respectively. However, the haplotype distributions in the case group were not significantly different from those of the control group in the 3 haplotype blocks. To our knowledge, this is the first study to identify that the SNPs of the LMX1A gene might be associated with the susceptibility to CS and different clinical phenotypes of CS in the Chinese Han population.
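
    For reference, the two model-selection criteria quoted above have the following standard definitions. The symbols are not taken from the abstract and are introduced here only for illustration: k is the number of estimated parameters, n the number of observations, and \hat{L} the maximized likelihood of the fitted model. Lower values indicate the preferred inheritance model.

```latex
\[
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
\]
```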

  19. Bayesian multivariate hierarchical transformation models for ROC analysis.

    PubMed

    O'Malley, A James; Zou, Kelly H

    2006-02-15

    A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box-Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.
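
    The Box-Cox transformation mentioned in problem (1) is, in its standard one-parameter form, the monotone map below for a positive outcome y. This is the textbook definition, given as an aid to the reader; how the transformation parameter \lambda is modelled within each cluster is specific to the paper and is not reproduced here.

```latex
\[
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[1.5ex]
\ln y, & \lambda = 0,
\end{cases}
\qquad y > 0
\]
```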

  20. Bayesian multivariate hierarchical transformation models for ROC analysis

    PubMed Central

    O'Malley, A. James; Zou, Kelly H.

    2006-01-01

    SUMMARY: A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial. PMID:16217836

  1. Bayesian Estimation of Pneumonia Etiology: Epidemiologic Considerations and Applications to the Pneumonia Etiology Research for Child Health Study

    PubMed Central

    Fu, Wei; Shi, Qiyuan; Prosperi, Christine; Wu, Zhenke; Hammitt, Laura L.; Feikin, Daniel R.; Baggett, Henry C.; Howie, Stephen R.C.; Scott, J. Anthony G.; Murdoch, David R.; Madhi, Shabir A.; Thea, Donald M.; Brooks, W. Abdullah; Kotloff, Karen L.; Li, Mengying; Park, Daniel E.; Lin, Wenyi; Levine, Orin S.; O’Brien, Katherine L.; Zeger, Scott L.

    2017-01-01

    In pneumonia, specimens are rarely obtained directly from the infection site, the lung, so the pathogen causing infection is determined indirectly from multiple tests on peripheral clinical specimens. These tests may have imperfect and uncertain sensitivity and specificity, which makes inference about the cause complex. Analytic approaches have included expert review of case-only results, case–control logistic regression, latent class analysis, and attributable fraction, but each has serious limitations and none naturally integrates multiple test results. The Pneumonia Etiology Research for Child Health (PERCH) study required an analytic solution appropriate for a case–control design that could incorporate evidence from multiple specimens from cases and controls and that accounted for measurement error. We describe a Bayesian integrated approach we developed that combined and extended elements of attributable fraction and latent class analyses to meet some of these challenges and illustrate the advantage it confers regarding the challenges identified for other methods. PMID:28575370

  2. Inferring Markov chains: Bayesian estimation, model comparison, entropy rate, and out-of-class modeling.

    PubMed

    Strelioff, Christopher C; Crutchfield, James P; Hübler, Alfred W

    2007-07-01

    Markov chains are a natural and well understood tool for describing one-dimensional patterns in time or space. We show how to infer kth order Markov chains, for arbitrary k , from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between Bayesian evidence and the partition function which allows for straightforward calculation of the expectation and variance of the conditional relative entropy and the source entropy rate. Finally, we introduce a method that uses finite data-size scaling with model-order comparison to infer the structure of out-of-class processes.
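
The abstract describes Bayesian parameter estimation and model-order selection for kth order Markov chains. The sketch below uses the standard conjugate Dirichlet-multinomial calculation with a symmetric prior; the function names, the prior concentration `alpha`, and the toy sequence are assumptions for illustration, and this is not claimed to be the authors' exact formulation.

```python
from collections import Counter
from math import lgamma


def context_counts(seq, order, start):
    """Count (context, next symbol) pairs for positions start..len(seq)-1."""
    counts = Counter()
    for t in range(start, len(seq)):
        counts[(tuple(seq[t - order:t]), seq[t])] += 1
    return counts


def context_totals(counts):
    """Total number of transitions observed out of each context."""
    totals = Counter()
    for (context, _), n in counts.items():
        totals[context] += n
    return totals


def log_evidence(seq, order, alphabet, alpha=1.0, start=None):
    """Log marginal likelihood under a symmetric Dirichlet(alpha) prior per context.

    `start` fixes how many leading symbols are conditioned on, so that
    different orders are compared on the same predicted symbols.
    """
    start = order if start is None else start
    counts = context_counts(seq, order, start)
    k = len(alphabet)
    log_ev = 0.0
    for context, n_c in context_totals(counts).items():
        log_ev += lgamma(k * alpha) - lgamma(n_c + k * alpha)
        for s in alphabet:
            log_ev += lgamma(counts[(context, s)] + alpha) - lgamma(alpha)
    return log_ev


def posterior_mean(seq, order, alphabet, alpha=1.0):
    """Posterior mean transition probabilities for each observed context."""
    counts = context_counts(seq, order, order)
    totals = context_totals(counts)
    k = len(alphabet)
    return {
        (context, s): (counts[(context, s)] + alpha) / (totals[context] + k * alpha)
        for context in totals
        for s in alphabet
    }


# Toy data: a strictly alternating binary sequence.
seq = "01" * 20
for order in (0, 1, 2):
    print(order, log_evidence(seq, order, "01", start=2))
```

On this alternating sequence the order-1 model has far higher evidence than the order-0 model, because knowing the previous symbol makes the next one almost certain.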

  3. Between-Site Differences in the Scale of Dispersal and Gene Flow in Red Oak

    PubMed Central

    Moran, Emily V.; Clark, James S.

    2012-01-01

    Background: Nut-bearing trees, including oaks (Quercus spp.), are considered to be highly dispersal limited, leading to concerns about their ability to colonize new sites or migrate in response to climate change. However, estimating seed dispersal is challenging in species that are secondarily dispersed by animals, and differences in disperser abundance or behavior could lead to large spatio-temporal variation in dispersal ability. Parentage and dispersal analyses combining genetic and ecological data provide accurate estimates of current dispersal, while spatial genetic structure (SGS) can shed light on past patterns of dispersal and establishment. Methodology and Principal Findings: In this study, we estimate seed and pollen dispersal and parentage for two mixed-species red oak populations using a hierarchical Bayesian approach. We compare these results to those of a genetic ML parentage model. We also test whether observed patterns of SGS in three size cohorts are consistent with known site history and current dispersal patterns. We find that, while pollen dispersal is extensive at both sites, the scale of seed dispersal differs substantially. Parentage results differ between models due to additional data included in the Bayesian model and differing genotyping error assumptions, but both indicate between-site dispersal differences. Patterns of SGS in large adults, small adults, and seedlings are consistent with known site history (farmed vs. selectively harvested), and with long-term differences in seed dispersal. This difference is consistent with predator/disperser satiation due to higher acorn production at the low-dispersal site. While this site-to-site variation results in substantial differences in asymptotic spread rates, dispersal for both sites is substantially lower than required to track latitudinal temperature shifts.
Conclusions: Animal-dispersed trees can exhibit considerable spatial variation in seed dispersal, although patterns may be surprisingly constant over time. However, even under favorable conditions, migration in heavy-seeded species is likely to lag contemporary climate change. PMID:22563504

  4. Double the dates and go for Bayes - Impacts of model choice, dating density and quality on chronologies

    NASA Astrophysics Data System (ADS)

    Blaauw, Maarten; Christen, J. Andrés; Bennett, K. D.; Reimer, Paula J.

    2018-05-01

    Reliable chronologies are essential for most Quaternary studies, but little is known about how age-depth model choice, as well as dating density and quality, affect the precision and accuracy of chronologies. A meta-analysis suggests that most existing late-Quaternary studies contain fewer than one date per millennium, and provide millennial-scale precision at best. We use existing and simulated sediment cores to estimate what dating density and quality are required to obtain accurate chronologies at a desired precision. For many sites, a doubling in dating density would significantly improve chronologies and thus their value for reconstructing and interpreting past environmental changes. Commonly used classical age-depth models stop becoming more precise after a minimum dating density is reached, but the precision of Bayesian age-depth models which take advantage of chronological ordering continues to improve with more dates. Our simulations show that classical age-depth models severely underestimate uncertainty and are inaccurate at low dating densities, and also perform poorly at high dating densities. On the other hand, Bayesian age-depth models provide more realistic precision estimates, including at low to average dating densities, and are much more robust against dating scatter and outliers. Indeed, Bayesian age-depth models outperform classical ones at all tested dating densities, qualities and time-scales. We recommend that chronologies should be produced using Bayesian age-depth models taking into account chronological ordering and based on a minimum of 2 dates per millennium.

  5. Early onset obsessive-compulsive disorder with and without tics.

    PubMed

    de Mathis, Maria Alice; Diniz, Juliana B; Shavitt, Roseli G; Torres, Albina R; Ferrão, Ygor A; Fossaluza, Victor; Pereira, Carlos; Miguel, Eurípedes; do Rosario, Maria Conceicão

    2009-07-01

    Research suggests that obsessive-compulsive disorder (OCD) is not a unitary entity, but rather a highly heterogeneous condition, with complex and variable clinical manifestations. The aims of this study were to compare clinical and demographic characteristics of OCD patients with early and late age of onset of obsessive-compulsive symptoms (OCS); and to compare the same features in early onset OCD with and without tics. The independent impact of age at onset and presence of tics on comorbidity patterns was investigated. Three hundred and thirty consecutive outpatients meeting Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition criteria for OCD were evaluated: 160 patients belonged to the "early onset" group (EOG): before 11 years of age, 75 patients had an "intermediate onset" (IOG), and 95 patients were from the "late onset" group (LOG): after 18 years of age. From the 160 EOG, 60 had comorbidity with tic disorders. The diagnostic instruments used were: the Yale-Brown Obsessive Compulsive Scale and the Dimensional Yale-Brown Obsessive Compulsive Scale (DY-BOCS), Yale Global Tics Severity Scale, and Structured Clinical Interview for DSM-IV Axis I Disorders-patient edition. Statistical tests used were: Mann-Whitney, full Bayesian significance test, and logistic regression. The EOG had a predominance of males, higher frequency of family history of OCS, higher mean scores on the "aggression/violence" and "miscellaneous" dimensions, and higher mean global DY-BOCS scores. Patients with EOG without tic disorders presented higher mean global DY-BOCS scores and higher mean scores in the "contamination/cleaning" dimension. The current results disentangle some of the clinical overlap between early onset OCD with and without tics.

  6. Separation in Logistic Regression: Causes, Consequences, and Control.

    PubMed

    Mansournia, Mohammad Ali; Geroldinger, Angelika; Greenland, Sander; Heinze, Georg

    2018-04-01

    Separation is encountered in regression models with a discrete outcome (such as logistic regression) where the covariates perfectly predict the outcome. It is most frequent under the same conditions that lead to small-sample and sparse-data bias, such as presence of a rare outcome, rare exposures, highly correlated covariates, or covariates with strong effects. In theory, separation will produce infinite estimates for some coefficients. In practice, however, separation may be unnoticed or mishandled because of software limits in recognizing and handling the problem and in notifying the user. We discuss causes of separation in logistic regression and describe how common software packages deal with it. We then describe methods that remove separation, focusing on the same penalized-likelihood techniques used to address more general sparse-data problems. These methods improve accuracy, avoid software problems, and allow interpretation as Bayesian analyses with weakly informative priors. We discuss likelihood penalties, including some that can be implemented easily with any software package, and their relative advantages and disadvantages. We provide an illustration of ideas and methods using data from a case-control study of contraceptive practices and urinary tract infection.
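
The abstract notes that separation produces infinite coefficient estimates in theory and that penalized likelihood, interpretable as a weakly informative prior, removes the problem. The sketch below illustrates this on a toy data set with a single covariate and no intercept. It uses a zero-mean Gaussian prior (a ridge penalty) as one simple choice of penalty; the data, the prior standard deviation of 2.5, and the function names are assumptions for illustration and are not the specific penalties recommended in the paper.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(beta, xs, ys):
    """Unpenalized Bernoulli log-likelihood with a logit link, no intercept."""
    return sum(
        y * beta * x - math.log1p(math.exp(beta * x)) for x, y in zip(xs, ys)
    )


def fit_penalized(xs, ys, prior_sd=2.5, n_iter=50):
    """Newton iterations on log-likelihood minus beta^2 / (2 * prior_sd^2)."""
    beta = 0.0
    precision = 1.0 / prior_sd ** 2  # assumed prior precision
    for _ in range(n_iter):
        probs = [sigmoid(beta * x) for x in xs]
        grad = sum(x * (y - p) for x, y, p in zip(xs, ys, probs)) - precision * beta
        hess = -sum(x * x * p * (1.0 - p) for x, p in zip(xs, probs)) - precision
        beta -= grad / hess
    return beta


# Perfectly separated toy data: y = 1 exactly when x > 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

# The unpenalized log-likelihood keeps rising as beta grows, so no finite
# maximum exists; the penalized fit settles on a finite value.
for beta in (1.0, 5.0, 10.0):
    print("beta =", beta, "log-likelihood =", log_likelihood(beta, xs, ys))
print("penalized estimate:", fit_penalized(xs, ys))
```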

  7. New machine-learning algorithms for prediction of Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Mandal, Indrajit; Sairam, N.

    2014-03-01

    This article presents a robust inference system that improves the prediction accuracy of Parkinson's disease (PD) diagnosis, with the aim of preventing delayed diagnosis and misdiagnosis of patients. New machine-learning methods are proposed and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods applied to PD include sparse multinomial logistic regression, rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method comprising the Bayesian network optimised by Tabu search algorithm as classifier and Haar wavelets as projection filter is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All experiments are conducted at 95% and 99% confidence levels, and the results are established with corrected t-tests. This work shows a high degree of advancement in software reliability and quality of the computer-aided diagnosis system and experimentally shows best results with supportive statistical inference.

  8. Sparse Logistic Regression for Diagnosis of Liver Fibrosis in Rat by Using SCAD-Penalized Likelihood

    PubMed Central

    Yan, Fang-Rong; Lin, Jin-Guan; Liu, Yu

    2011-01-01

    The objective of the present study is to determine the quantitative relationship between progression of liver fibrosis and the levels of certain serum markers using a mathematical model. We propose a sparse logistic regression with the smoothly clipped absolute deviation (SCAD) penalty function to diagnose liver fibrosis in rats. Not only does it give a sparse solution with high accuracy, it also provides the users with the precise probabilities of classification with the class information. In both the simulated and the experimental cases, the proposed method is comparable to the stepwise linear discriminant analysis (SLDA) and the sparse logistic regression with least absolute shrinkage and selection operator (LASSO) penalty, using receiver operating characteristic (ROC) analysis with Bayesian bootstrap estimation of the area under the curve (AUC) and diagnostic sensitivity for the selected variables. Results show that the new approach provides a good correlation between the serum marker levels and the liver fibrosis induced by thioacetamide (TAA) in rats. Meanwhile, this approach might also be used in predicting the development of liver cirrhosis. PMID:21716672
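
For reference, the SCAD penalty named in the abstract can be written as a short piecewise function. This is a minimal sketch of the standard definition, with the commonly used default a = 3.7; the function name and the example values are illustrative assumptions, and the sketch covers only the penalty term, not the authors' fitting procedure.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty for one coefficient (requires lam > 0 and a > 2)."""
    t = abs(theta)
    if t <= lam:
        # Small coefficients: identical to the LASSO penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic blend that tapers the slope.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no additional shrinkage.
    return lam * lam * (a + 1) / 2


for theta in (0.5, 2.0, 10.0):
    print(theta, scad_penalty(theta, lam=1.0), "LASSO:", 1.0 * abs(theta))
```

Unlike the LASSO penalty, which grows linearly without bound, the SCAD penalty levels off for large coefficients, which reduces the bias of large estimates while still setting small ones to zero.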

  9. Spatiotemporal Bayesian analysis of Lyme disease in New York state, 1990-2000.

    PubMed

    Chen, Haiyan; Stratton, Howard H; Caraco, Thomas B; White, Dennis J

    2006-07-01

    Mapping ordinarily increases our understanding of nontrivial spatial and temporal heterogeneities in disease rates. However, the large number of parameters required by the corresponding statistical models often complicates detailed analysis. This study investigates the feasibility of a fully Bayesian hierarchical regression approach to the problem and identifies how it outperforms two more popular methods: crude rate estimates (CRE) and empirical Bayes standardization (EBS). In particular, we apply a fully Bayesian approach to the spatiotemporal analysis of Lyme disease incidence in New York state for the period 1990-2000. These results are compared with those obtained by CRE and EBS in Chen et al. (2005). We show that the fully Bayesian regression model not only gives more reliable estimates of disease rates than the other two approaches but also allows for tractable models that can accommodate more numerous sources of variation and unknown parameters.

  10. Bayesian learning of visual chunks by human observers

    PubMed Central

    Orbán, Gergő; Fiser, József; Aslin, Richard N.; Lengyel, Máté

    2008-01-01

    Efficient and versatile processing of any hierarchically structured information requires a learning mechanism that combines lower-level features into higher-level chunks. We investigated this chunking mechanism in humans with a visual pattern-learning paradigm. We developed an ideal learner based on Bayesian model comparison that extracts and stores only those chunks of information that are minimally sufficient to encode a set of visual scenes. Our ideal Bayesian chunk learner not only reproduced the results of a large set of previous empirical findings in the domain of human pattern learning but also made a key prediction that we confirmed experimentally. In accordance with Bayesian learning but contrary to associative learning, human performance was well above chance when pair-wise statistics in the exemplars contained no relevant information. Thus, humans extract chunks from complex visual patterns by generating accurate yet economical representations and not by encoding the full correlational structure of the input. PMID:18268353

  11. Bayesian Recurrent Neural Network for Language Modeling.

    PubMed

    Chien, Jen-Tzung; Ku, Yuan-Chu

    2016-02-01

    A language model (LM) assigns a probability to a word sequence and thereby provides the basis for word prediction in a variety of information systems. A recurrent neural network (RNN) is a powerful tool for learning the large-span dynamics of a word sequence in continuous space. However, the training of the RNN-LM is an ill-posed problem because of too many parameters from a large dictionary size and a high-dimensional hidden layer. This paper presents a Bayesian approach to regularize the RNN-LM and apply it for continuous speech recognition. We aim to penalize an overly complicated RNN-LM by compensating for the uncertainty of the estimated model parameters, which is represented by a Gaussian prior. The objective function in a Bayesian classification network is formed as the regularized cross-entropy error function. The regularized model is constructed not only by calculating the regularized parameters according to the maximum a posteriori criterion but also by estimating the Gaussian hyperparameter by maximizing the marginal likelihood. A rapid approximation to a Hessian matrix is developed to implement the Bayesian RNN-LM (BRNN-LM) by selecting a small set of salient outer-products. The proposed BRNN-LM achieves a sparser model than the RNN-LM. Experiments on different corpora show the robustness of system performance by applying the rapid BRNN-LM under different conditions.
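
The link between a Gaussian prior and a regularized cross-entropy objective can be made concrete. Under a zero-mean isotropic Gaussian prior with precision alpha, the negative log-posterior equals the cross-entropy error plus (alpha / 2) times the squared norm of the weights, up to a constant. The sketch below computes that objective for given inputs; the function name and the numbers are hypothetical, and it does not implement the RNN, the hyperparameter estimation, or the Hessian approximation described in the abstract.

```python
import math


def regularized_cross_entropy(target_probs, weights, alpha):
    """Cross-entropy error plus the Gaussian-prior (L2) penalty.

    target_probs: model probabilities assigned to the observed target words.
    weights: flat list of model parameters.
    alpha: precision of the assumed zero-mean Gaussian prior.
    """
    cross_entropy = -sum(math.log(p) for p in target_probs)
    penalty = 0.5 * alpha * sum(w * w for w in weights)
    return cross_entropy + penalty


# Hypothetical values: probabilities of three observed words, four weights.
print(regularized_cross_entropy([0.5, 0.25, 0.8], [0.1, -0.3, 0.7, 0.0], alpha=0.1))
```

Minimizing this objective over the weights yields the maximum a posteriori estimate for a fixed alpha.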

  12. Least Squares Method for Equating Logistic Ability Scales: A General Approach and Evaluation. Iowa Testing Programs Occasional Papers, Number 30.

    ERIC Educational Resources Information Center

    Haebara, Tomokazu

    When several ability scales in item response models are separately derived from different test forms administered to different samples of examinees, these scales must be equated to a common scale because their units and origins are arbitrarily determined and generally different from scale to scale. A general method for equating logistic ability…

  13. Risk factors for pneumonic and ulceroglandular tularaemia in Finland: a population-based case-control study.

    PubMed

    Rossow, H; Ollgren, J; Klemets, P; Pietarinen, I; Saikku, J; Pekkanen, E; Nikkari, S; Syrjälä, H; Kuusi, M; Nuorti, J P

    2014-10-01

    Few population-based data are available on factors associated with pneumonic and ulceroglandular type B tularaemia. We conducted a case-control study during a large epidemic in 2000. Laboratory-confirmed case patients were identified through active surveillance and matched control subjects (age, sex, residency) from the national population information system. Data were collected using a self-administered questionnaire. A conditional logistic regression model addressing missing data with Bayesian full-likelihood modelling included 227 case patients and 415 control subjects; reported mosquito bites [adjusted odds ratio (aOR) 9·2, 95% confidence interval (CI) 4·4-22, population-attributable risk (PAR) 82%] and farming activities (aOR 4·3, 95% CI 2·5-7·2, PAR 32%) were independently associated with ulceroglandular tularaemia, whereas exposure to hay dust (aOR 6·6, 95% CI 1·9-25·4, PAR 48%) was associated with pneumonic tularaemia. Although the bulk of tularaemia type B disease burden is attributable to mosquito bites, risk factors for ulceroglandular and pneumonic forms of tularaemia are different, enabling targeting of prevention efforts accordingly.

  14. A Bayesian Assessment of Seismic Semi-Periodicity Forecasts

    NASA Astrophysics Data System (ADS)

    Nava, F.; Quinteros, C.; Glowacka, E.; Frez, J.

    2016-01-01

    Among the schemes for earthquake forecasting, the search for semi-periodicity during large earthquakes in a given seismogenic region plays an important role. When considering earthquake forecasts based on semi-periodic sequence identification, the Bayesian formalism is a useful tool for: (1) assessing how well a given earthquake satisfies a previously made forecast; (2) re-evaluating the semi-periodic sequence probability; and (3) testing other prior estimations of the sequence probability. A comparison of Bayesian estimates with updated estimates of semi-periodic sequences that incorporate new data not used in the original estimates shows extremely good agreement, indicating that: (1) the probability that a semi-periodic sequence is not due to chance is an appropriate estimate for the prior sequence probability estimate; and (2) the Bayesian formalism does a very good job of estimating corrected semi-periodicity probabilities, using slightly less data than that used for updated estimates. The Bayesian approach is exemplified explicitly by its application to the Parkfield semi-periodic forecast, and results are given for its application to other forecasts in Japan and Venezuela.
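
Re-evaluating a sequence probability after a new earthquake is, at its core, an application of Bayes' rule to two competing hypotheses. The sketch below shows only that generic update; the function name, the prior, and the two likelihood values are hypothetical placeholders, and how the authors actually compute the likelihood of an earthquake under each hypothesis is not specified in the abstract.

```python
def posterior_probability(prior, like_if_true, like_if_false):
    """Bayes' rule for a binary hypothesis.

    prior: prior probability that the sequence is genuinely semi-periodic.
    like_if_true: probability of the observed event if it is semi-periodic.
    like_if_false: probability of the observed event if it arose by chance.
    """
    evidence = like_if_true * prior + like_if_false * (1.0 - prior)
    return like_if_true * prior / evidence


# Hypothetical numbers: an event much more likely under the forecast than by chance.
updated = posterior_probability(prior=0.5, like_if_true=0.8, like_if_false=0.2)
print(updated)
```

An event that is more probable under the semi-periodic hypothesis than under chance raises the sequence probability, while an event equally probable under both leaves it unchanged.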

  15. BCM: toolkit for Bayesian analysis of Computational Models using samplers.

    PubMed

    Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A

    2016-10-21

    Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics; however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.

  16. A Commercialization Roadmap for Carbon-Negative Energy Systems

    NASA Astrophysics Data System (ADS)

    Sanchez, D.

    2016-12-01

    The Intergovernmental Panel on Climate Change (IPCC) envisages the need for large-scale deployment of net-negative CO2 emissions technologies by mid-century to meet stringent climate mitigation goals and yield a net drawdown of atmospheric carbon. Yet there are few commercial deployments of BECCS outside of niche markets, creating uncertainty about commercialization pathways and sustainability impacts at scale. This uncertainty is exacerbated by the absence of a strong policy framework, such as high carbon prices and research coordination. Here, we propose a strategy for the potential commercial deployment of BECCS. This roadmap proceeds via three steps: 1) via capture and utilization of biogenic CO2 from existing bioenergy facilities, notably ethanol fermentation, 2) via thermochemical co-conversion of biomass and fossil fuels, particularly coal, and 3) via dedicated, large-scale BECCS. Although biochemical conversion is a proven first market for BECCS, this trajectory alone is unlikely to drive commercialization of BECCS at the gigatonne scale. In contrast to biochemical conversion, thermochemical conversion of coal and biomass enables large-scale production of fuels and electricity with a wide range of carbon intensities, process efficiencies and process scales. Aside from systems integration, primarily technical barriers are involved in large-scale biomass logistics, gasification and gas cleaning. Key uncertainties around large-scale BECCS deployment are not limited to commercialization pathways; rather, they include physical constraints on biomass cultivation or CO2 storage, as well as social barriers, including public acceptance of new technologies and conceptions of renewable and fossil energy, which co-conversion systems confound. Despite sustainability risks, this commercialization strategy presents a pathway where energy suppliers, manufacturers and governments could transition from laggards to leaders in climate change mitigation efforts.

  17. A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction

    PubMed Central

    Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R.; Buenrostro-Mariscal, Raymundo

    2017-01-01

    There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. PMID:28391241

  18. A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction.

    PubMed

    Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R; Buenrostro-Mariscal, Raymundo

    2017-06-07

    There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. Copyright © 2017 Montesinos-López et al.

  19. Framework for network modularization and Bayesian network analysis to investigate the perturbed metabolic network

    PubMed Central

    2011-01-01

    Background: Genome-scale metabolic network models have contributed to elucidating biological phenomena, and predicting gene targets to engineer for biotechnological applications. With their increasing importance, their precise network characterization has also been crucial for better understanding of the cellular physiology. Results: We herein introduce a framework for network modularization and Bayesian network analysis (FMB) to investigate an organism’s metabolism under perturbation. FMB reveals direction of influences among metabolic modules, in which reactions with similar or positively correlated flux variation patterns are clustered, in response to specific perturbation using metabolic flux data. With metabolic flux data calculated by constraints-based flux analysis under both control and perturbation conditions, FMB, in essence, reveals the effects of specific perturbations on the biological system through network modularization and Bayesian network analysis at metabolic modular level. As a demonstration, this framework was applied to the genetically perturbed Escherichia coli metabolism, specifically an lpdA gene knockout mutant, using its genome-scale metabolic network model. Conclusions: Overall, FMB provides alternative scenarios of metabolic flux distributions in response to the perturbation, which are complementary to the data obtained from conventionally available genome-wide high-throughput techniques or metabolic flux analysis. PMID:22784571

  20. Framework for network modularization and Bayesian network analysis to investigate the perturbed metabolic network.

    PubMed

    Kim, Hyun Uk; Kim, Tae Yong; Lee, Sang Yup

    2011-01-01

    Genome-scale metabolic network models have contributed to elucidating biological phenomena, and predicting gene targets to engineer for biotechnological applications. With their increasing importance, their precise network characterization has also been crucial for better understanding of the cellular physiology. We herein introduce a framework for network modularization and Bayesian network analysis (FMB) to investigate an organism's metabolism under perturbation. FMB reveals direction of influences among metabolic modules, in which reactions with similar or positively correlated flux variation patterns are clustered, in response to specific perturbation using metabolic flux data. With metabolic flux data calculated by constraints-based flux analysis under both control and perturbation conditions, FMB, in essence, reveals the effects of specific perturbations on the biological system through network modularization and Bayesian network analysis at metabolic modular level. As a demonstration, this framework was applied to the genetically perturbed Escherichia coli metabolism, specifically an lpdA gene knockout mutant, using its genome-scale metabolic network model. Overall, FMB provides alternative scenarios of metabolic flux distributions in response to the perturbation, which are complementary to the data obtained from conventionally available genome-wide high-throughput techniques or metabolic flux analysis.

  1. Bayesian nonparametric regression with varying residual density

    PubMed Central

    Pati, Debdeep; Dunson, David B.

    2013-01-01

    We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications. PMID:24465053

  2. Quantification of downscaled precipitation uncertainties via Bayesian inference

    NASA Astrophysics Data System (ADS)

    Nury, A. H.; Sharma, A.; Marshall, L. A.

    2017-12-01

    Prediction of precipitation from global climate model (GCM) outputs remains critical to decision-making in water-stressed regions. In this regard, downscaling of GCM output has been a useful tool for analysing future hydro-climatological states. Several approaches have been developed for precipitation downscaling, including dynamical and statistical methods. Frequently, outputs from dynamical downscaling are not readily transferable across regions because of significant methodological and computational difficulties. Statistical downscaling approaches provide a flexible and efficient alternative, providing hydro-climatological outputs across multiple temporal and spatial scales in many locations. However, these approaches are subject to significant uncertainty, arising from uncertainty in the downscaled model parameters and from the use of different reanalysis products for inferring appropriate model parameters. Consequently, these uncertainties affect simulation performance at the catchment scale. This study develops a Bayesian framework for modelling downscaled daily precipitation from GCM outputs. It aims to introduce uncertainties in downscaling by evaluating reanalysis datasets against observational rainfall data over Australia. A consistent technique for quantifying downscaling uncertainties by means of a Bayesian downscaling framework is proposed. The results suggest that there are differences in downscaled precipitation occurrences and extremes.

  3. A Hierarchical Bayesian Multidimensional Scaling Methodology for Accommodating Both Structural and Preference Heterogeneity

    ERIC Educational Resources Information Center

    Park, Joonwook; Desarbo, Wayne S.; Liechty, John

    2008-01-01

    Multidimensional scaling (MDS) models for the analysis of dominance data have been developed in the psychometric and classification literature to simultaneously capture subjects' "preference heterogeneity" and the underlying dimensional structure for a set of designated stimuli in a parsimonious manner. There are two major types of latent utility…

  4. Second-hand market as an alternative in reverse logistics

    NASA Astrophysics Data System (ADS)

    Pochampally, Kishore K.; Gupta, Surendra M.

    2004-02-01

    Collectors of discarded products seldom know when those products were bought and why they are discarded. Also, the products do not indicate their remaining life periods. So, it is difficult to decide if it is "sensible" to repair (if necessary) a particular product for subsequent sale on the second-hand market or to disassemble it partially or completely for subsequent remanufacturing and/or recycling. To this end, we build an expert system using a Bayesian updating process and fuzzy set theory to aid such decision-making. A numerical example demonstrates the building approach.

  5. Harvesting forest biomass for energy in Minnesota: An assessment of guidelines, costs and logistics

    NASA Astrophysics Data System (ADS)

    Saleh, Dalia El Sayed Abbas Mohamed

    The emerging market for renewable energy in Minnesota has generated a growing interest in utilizing more forest biomass for energy. However, this growing interest is paralleled with limited knowledge of the environmental impacts and cost effectiveness of utilizing this resource. To address environmental and economic viability concerns, this dissertation addresses three areas related to biomass harvest: First, existing biomass harvesting guidelines and sustainability considerations are examined. Second, the potential contribution of biomass energy production to reduce the costs of hazardous fuel reduction treatments in these trials is assessed. Third, the logistics of biomass production trials are analyzed. Findings show that: (1) Existing forest related guidelines are not sufficient to allow large-scale production of biomass energy from forest residue sustainably. Biomass energy guidelines need to be based on scientific assessments of how repeated and large scale biomass production is going to affect soil, water and habitat values, in an integrated and individual manner over time. Furthermore, such guidelines would need to recommend production logistics (planning, implementation, and coordination of operations) necessary for a potential supply with the least site and environmental impacts. (2) The costs of biomass production trials were assessed and compared with conventional treatment costs. In these trials, conventional mechanical treatment costs were lower than biomass energy production costs less income from biomass sale. However, a sensitivity analysis indicated that cost reductions are possible under certain site, prescription and distance conditions.
(3) Semi-structured interviews with forest machine operators indicate that existing fuel reduction prescriptions need to be more realistic in making recommendations that can overcome operational barriers (technical and physical) and planning and coordination concerns (guidelines and communications) identified by machine operators, and which are necessary for a viable biomass energy production system. The results of this dissertation suggest that once biomass energy production is intended, incorporating an early understanding of production logistics while developing environmentally sensitive guidelines and site-specific prescriptions can improve biomass energy production, costs, performance and sustainability.

  6. Measurement of faculty anesthesiologists' quality of clinical supervision has greater reliability when controlling for the leniency of the rating anesthesia resident: a retrospective cohort study.

    PubMed

    Dexter, Franklin; Ledolter, Johannes; Hindman, Bradley J

    2017-06-01

    Our department monitors the quality of anesthesiologists' clinical supervision and provides each anesthesiologist with periodic feedback. We hypothesized that greater differentiation among anesthesiologists' supervision scores could be obtained by adjusting for leniency of the rating resident. From July 1, 2013 to December 31, 2015, our department has utilized the de Oliveira Filho unidimensional nine-item supervision scale to assess the quality of clinical supervision provided by faculty as rated by residents. We examined all 13,664 ratings of the 97 anesthesiologists (ratees) by the 65 residents (raters). Testing for internal consistency among answers to questions (large Cronbach's alpha > 0.90) was performed to rule out that one or two questions accounted for leniency. Mixed-effects logistic regression was used to compare ratees while controlling for rater leniency vs using Student t tests without rater leniency. The mean supervision scale score was calculated for each combination of the 65 raters and nine questions. The Cronbach's alpha was very large (0.977). The mean score was calculated for each of the 3,421 observed combinations of resident and anesthesiologist. The logits of the percentage of scores equal to the maximum value of 4.00 were normally distributed (residents, P = 0.24; anesthesiologists, P = 0.50). There were 20/97 anesthesiologists identified as significant outliers (13 with below average supervision scores and seven with better than average) using the mixed-effects logistic regression with rater leniency entered as a fixed effect but not by Student's t test. In contrast, there were three of 97 anesthesiologists identified as outliers (all three above average) using Student's t tests but not by logistic regression with leniency. The 20 vs 3 was significant (P < 0.001). 
Use of logistic regression with leniency results in greater detection of anesthesiologists with significantly better (or worse) clinical supervision scores than use of Student's t tests (i.e., without adjustment for rater leniency).

  7. Robust phenotyping strategies for evaluation of stem non-structural carbohydrates (NSC) in rice

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Diane R.; Wolfrum, Edward J.; Virk, Parminder

    Rice plants (Oryza sativa) accumulate excess photoassimilates in the form of non-structural carbohydrates (NSCs) in their stems prior to heading that can later be mobilized to supplement photosynthate production during grain-filling. Despite longstanding interest in stem NSC for rice improvement, the dynamics of NSC accumulation, remobilization, and re-accumulation that have genetic potential for optimization have not been systematically investigated. Here we conducted three pilot experiments to lay the groundwork for large-scale diversity studies on rice stem NSC. We assessed the relationship of stem NSC components with 21 agronomic traits in large-scale, tropical yield trials using 33 breeder-nominated lines, established an appropriate experimental design for future genetic studies using a Bayesian framework to sample sub-datasets from highly replicated greenhouse data using 36 genetically diverse genotypes, and used 434 phenotypically divergent rice stem samples to develop two partial least-squares (PLS) models using near-infrared (NIR) spectra for accurate, rapid prediction of rice stem starch, sucrose, and total non-structural carbohydrates. Lastly, we find evidence that stem reserves are most critical for short-duration varieties and suggest that pre-heading stem NSC is worthy of further experimentation for breeding early maturing rice.

  8. Robust phenotyping strategies for evaluation of stem non-structural carbohydrates (NSC) in rice

    PubMed Central

    Wang, Diane R.; Wolfrum, Edward J.; Virk, Parminder; Ismail, Abdelbagi; Greenberg, Anthony J.; McCouch, Susan R.

    2016-01-01

    Rice plants (Oryza sativa) accumulate excess photoassimilates in the form of non-structural carbohydrates (NSCs) in their stems prior to heading that can later be mobilized to supplement photosynthate production during grain-filling. Despite longstanding interest in stem NSC for rice improvement, the dynamics of NSC accumulation, remobilization, and re-accumulation that have genetic potential for optimization have not been systematically investigated. Here we conducted three pilot experiments to lay the groundwork for large-scale diversity studies on rice stem NSC. We assessed the relationship of stem NSC components with 21 agronomic traits in large-scale, tropical yield trials using 33 breeder-nominated lines, established an appropriate experimental design for future genetic studies using a Bayesian framework to sample sub-datasets from highly replicated greenhouse data using 36 genetically diverse genotypes, and used 434 phenotypically divergent rice stem samples to develop two partial least-squares (PLS) models using near-infrared (NIR) spectra for accurate, rapid prediction of rice stem starch, sucrose, and total non-structural carbohydrates. We find evidence that stem reserves are most critical for short-duration varieties and suggest that pre-heading stem NSC is worthy of further experimentation for breeding early maturing rice. PMID:27707775

  9. Robust phenotyping strategies for evaluation of stem non-structural carbohydrates (NSC) in rice

    DOE PAGES

    Wang, Diane R.; Wolfrum, Edward J.; Virk, Parminder; ...

    2016-10-05

    Rice plants (Oryza sativa) accumulate excess photoassimilates in the form of non-structural carbohydrates (NSCs) in their stems prior to heading that can later be mobilized to supplement photosynthate production during grain-filling. Despite longstanding interest in stem NSC for rice improvement, the dynamics of NSC accumulation, remobilization, and re-accumulation that have genetic potential for optimization have not been systematically investigated. Here we conducted three pilot experiments to lay the groundwork for large-scale diversity studies on rice stem NSC. We assessed the relationship of stem NSC components with 21 agronomic traits in large-scale, tropical yield trials using 33 breeder-nominated lines, established an appropriate experimental design for future genetic studies using a Bayesian framework to sample sub-datasets from highly replicated greenhouse data using 36 genetically diverse genotypes, and used 434 phenotypically divergent rice stem samples to develop two partial least-squares (PLS) models using near-infrared (NIR) spectra for accurate, rapid prediction of rice stem starch, sucrose, and total non-structural carbohydrates. Lastly, we find evidence that stem reserves are most critical for short-duration varieties and suggest that pre-heading stem NSC is worthy of further experimentation for breeding early maturing rice.

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bazillian, Morgan; Pedersen, Ascha Lychett; Pless, Jacuelyn

    Shale gas resource potential in China is assessed to be large, and its development could have wide-ranging economic, environmental, and energy security implications. Although commercial scale shale gas development has not yet begun in China, it holds the potential to change the global energy landscape. Chinese decision-makers are wrestling with the challenges associated with bringing the potential to reality: geologic complexity; infrastructure and logistical difficulties; technological, institutional, social and market development issues; and environmental impacts, including greenhouse gas emissions, impacts on water availability and quality, and air pollution. This paper briefly examines the current situation and outlook for shale gas in China, and explores existing and potential avenues for international cooperation. We find that despite some barriers to large-scale development, Chinese shale gas production has the potential to grow rapidly over the medium-term.

  11. A new probe of the magnetic field power spectrum in cosmic web filaments

    NASA Astrophysics Data System (ADS)

    Hales, Christopher A.; Greiner, Maksim; Ensslin, Torsten A.

    2015-08-01

    Establishing the properties of magnetic fields on scales larger than galaxy clusters is critical for resolving the unknown origin and evolution of galactic and cluster magnetism. More generally, observations of magnetic fields on cosmic scales are needed for assessing the impacts of magnetism on cosmology, particle physics, and structure formation over the full history of the Universe. However, firm observational evidence for magnetic fields in large scale structure remains elusive. In an effort to address this problem, we have developed a novel statistical method to infer the magnetic field power spectrum in cosmic web filaments using observation of the two-point correlation of Faraday rotation measures from a dense grid of extragalactic radio sources. Here we describe our approach, which embeds and extends the pioneering work of Kolatt (1998) within the context of Information Field Theory (a statistical theory for Bayesian inference on spatially distributed signals; Enßlin et al., 2009). We describe prospects for observation, for example with forthcoming data from the ultra-deep JVLA CHILES Con Pol survey and future surveys with the SKA.

  12. Objectified quantification of uncertainties in Bayesian atmospheric inversions

    NASA Astrophysics Data System (ADS)

    Berchet, A.; Pison, I.; Chevallier, F.; Bousquet, P.; Bonne, J.-L.; Paris, J.-D.

    2015-05-01

    Classical Bayesian atmospheric inversions process atmospheric observations and prior emissions, the two being connected by an observation operator picturing mainly the atmospheric transport. These inversions rely on prescribed errors in the observations, the prior emissions and the observation operator. When data pieces are sparse, inversion results are very sensitive to the prescribed error distributions, which are not accurately known. The classical Bayesian framework experiences difficulties in quantifying the impact of mis-specified error distributions on the optimized fluxes. In order to cope with this issue, we rely on recent research results to enhance the classical Bayesian inversion framework through a marginalization on a large set of plausible errors that can be prescribed in the system. The marginalization consists in computing inversions for all possible error distributions weighted by the probability of occurrence of the error distributions. The posterior distribution of the fluxes calculated by the marginalization is not explicitly describable. As a consequence, we carry out a Monte Carlo sampling based on an approximation of the probability of occurrence of the error distributions. This approximation is deduced from the well-tested method of the maximum likelihood estimation. Thus, the marginalized inversion relies on an automatic objectified diagnosis of the error statistics, without any prior knowledge about the matrices. It robustly accounts for the uncertainties on the error distributions, contrary to what is classically done with frozen expert-knowledge error statistics. Some expert knowledge is still used in the method for the choice of an emission aggregation pattern and of a sampling protocol in order to reduce the computation cost. The relevance and the robustness of the method are tested on a case study: the inversion of methane surface fluxes at the mesoscale with virtual observations on a realistic network in Eurasia.
Observing system simulation experiments are carried out with different transport patterns, flux distributions and total prior amounts of emitted methane. The method proves to consistently reproduce the known "truth" in most cases, with satisfactory tolerance intervals. Additionally, the method explicitly provides influence scores and posterior correlation matrices. An in-depth interpretation of the inversion results is then possible. The more objective quantification of the influence of the observations on the fluxes proposed here allows us to evaluate the impact of the observation network on the characterization of the surface fluxes. The explicit correlations between emission aggregates reveal the mis-separated regions, hence the typical temporal and spatial scales the inversion can analyse. These scales are consistent with the chosen aggregation patterns.

  13. A 3-level Bayesian mixed effects location scale model with an application to ecological momentary assessment data.

    PubMed

    Lin, Xiaolei; Mermelstein, Robin J; Hedeker, Donald

    2018-06-15

    Ecological momentary assessment studies usually produce intensively measured longitudinal data with large numbers of observations per unit, and research interest is often centered around understanding the changes in variation of people's thoughts, emotions and behaviors. Hedeker et al developed a 2-level mixed effects location scale model that allows observed covariates as well as unobserved variables to influence both the mean and the within-subjects variance, for a 2-level data structure where observations are nested within subjects. In some ecological momentary assessment studies, subjects are measured at multiple waves, and within each wave, subjects are measured over time. Li and Hedeker extended the original 2-level model to a 3-level data structure where observations are nested within days and days are then nested within subjects, by including a random location and scale intercept at the intermediate wave level. However, the 3-level random intercept model assumes constant response change rate for both the mean and variance. To account for changes in variance across waves, as well as clustering attributable to waves, we propose a more comprehensive location scale model that allows subject heterogeneity at baseline as well as across different waves, for a 3-level data structure where observations are nested within waves and waves are then further nested within subjects. The model parameters are estimated using Markov chain Monte Carlo methods. We provide details on the Bayesian estimation approach and demonstrate how the Stan statistical software can be used to sample from the desired distributions and achieve consistent estimates. The proposed model is validated via a series of simulation studies. Data from an adolescent smoking study are analyzed to demonstrate this approach. The analyses clearly favor the proposed model and show significant subject heterogeneity at baseline as well as change over time, for both mood mean and variance. 
The proposed 3-level location scale model can be widely applied to areas of research where the interest lies in the consistency in addition to the mean level of the responses. Copyright © 2018 John Wiley & Sons, Ltd.

  14. Multivariable and Bayesian Network Analysis of Outcome Predictors in Acute Aneurysmal Subarachnoid Hemorrhage: Review of a Pure Surgical Series in the Post-International Subarachnoid Aneurysm Trial Era.

    PubMed

    Zador, Zsolt; Huang, Wendy; Sperrin, Matthew; Lawton, Michael T

    2018-06-01

    Following the International Subarachnoid Aneurysm Trial (ISAT), evolving treatment modalities for acute aneurysmal subarachnoid hemorrhage (aSAH) have changed the case mix of patients undergoing urgent surgical clipping. We aimed to update our knowledge on outcome predictors by analyzing admission parameters in a pure surgical series using variable importance ranking and machine learning. We reviewed a single surgeon's case series of 226 patients suffering from aSAH treated with urgent surgical clipping. Predictions were made using logistic regression models, and predictive performance was assessed using the area under the receiver operating characteristic curve (AUC). We established variable importance ranking using partial Nagelkerke R2 scores. Probabilistic associations between variables were depicted using Bayesian networks, a method of machine learning. Importance ranking showed that World Federation of Neurosurgical Societies (WFNS) grade and age were the most influential outcome prognosticators. Inclusion of only these 2 predictors was sufficient to maintain model performance compared to when all variables were considered (AUC = 0.8222, 95% confidence interval (CI): 0.7646-0.88 vs 0.8218, 95% CI: 0.7616-0.8821, respectively, DeLong's P = .992). Bayesian networks showed that age and WFNS grade were associated with several variables such as laboratory results and cardiorespiratory parameters. Our study is the first to report early outcomes and formal predictor importance ranking following aSAH in a post-ISAT surgical case series. Models showed good predictive power with fewer relevant predictors than in similar size series. Bayesian networks proved to be a powerful tool in visualizing the widespread association of the 2 key predictors with admission variables, explaining their importance and demonstrating the potential for hypothesis generation.

  15. Integrating probabilistic models of perception and interactive neural networks: a historical and tutorial review

    PubMed Central

    McClelland, James L.

    2013-01-01

    This article seeks to establish a rapprochement between explicitly Bayesian models of contextual effects in perception and neural network models of such effects, particularly the connectionist interactive activation (IA) model of perception. The article is in part an historical review and in part a tutorial, reviewing the probabilistic Bayesian approach to understanding perception and how it may be shaped by context, and also reviewing ideas about how such probabilistic computations may be carried out in neural networks, focusing on the role of context in interactive neural networks, in which both bottom-up and top-down signals affect the interpretation of sensory inputs. It is pointed out that connectionist units that use the logistic or softmax activation functions can exactly compute Bayesian posterior probabilities when the bias terms and connection weights affecting such units are set to the logarithms of appropriate probabilistic quantities. Bayesian concepts such as the prior, likelihood, (joint and marginal) posterior, probability matching and maximizing, and calculating vs. sampling from the posterior are all reviewed and linked to neural network computations. Probabilistic and neural network models are explicitly linked to the concept of a probabilistic generative model that describes the relationship between the underlying target of perception (e.g., the word intended by a speaker or other source of sensory stimuli) and the sensory input that reaches the perceiver for use in inferring the underlying target. It is shown how a new version of the IA model called the multinomial interactive activation (MIA) model can sample correctly from the joint posterior of a proposed generative model for perception of letters in words, indicating that interactive processing is fully consistent with principled probabilistic computation. Ways in which these computations might be realized in real neural systems are also considered. PMID:23970868
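
    The abstract's claim that a softmax unit can reproduce a Bayesian posterior can be checked numerically. The sketch below is an illustration only and is not taken from the article: the function names and the numbers in PRIORS and LIKELIHOODS are hypothetical. It assumes one unit per hypothesis, with the bias set to the log prior and the weight from a single active evidence unit set to the log likelihood.

```python
import math

def bayes_posterior(priors, likelihoods):
    """Direct Bayes' rule: P(h | e) = P(h) P(e | h) / sum_k P(k) P(e | k)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def softmax(net_inputs):
    """Softmax activation, shifted by the maximum for numerical stability."""
    peak = max(net_inputs)
    exps = [math.exp(x - peak) for x in net_inputs]
    total = sum(exps)
    return [e / total for e in exps]

def unit_posterior(priors, likelihoods):
    """One softmax unit per hypothesis.

    Assumption for this sketch: bias = log prior, and the weight from the
    single active evidence unit = log likelihood, so each net input is
    log P(h) + log P(e | h).
    """
    net = [math.log(p) + math.log(l) for p, l in zip(priors, likelihoods)]
    return softmax(net)

# Hypothetical values for three candidate hypotheses (not from the article).
PRIORS = [0.5, 0.3, 0.2]
LIKELIHOODS = [0.10, 0.40, 0.70]

direct = bayes_posterior(PRIORS, LIKELIHOODS)
via_units = unit_posterior(PRIORS, LIKELIHOODS)
```

    Under these assumptions the two lists agree up to floating-point error, because exponentiating log P(h) + log P(e | h) recovers the joint probability and the softmax denominator performs the normalization.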

  16. Integrating probabilistic models of perception and interactive neural networks: a historical and tutorial review.

    PubMed

    McClelland, James L

    2013-01-01

    This article seeks to establish a rapprochement between explicitly Bayesian models of contextual effects in perception and neural network models of such effects, particularly the connectionist interactive activation (IA) model of perception. The article is in part an historical review and in part a tutorial, reviewing the probabilistic Bayesian approach to understanding perception and how it may be shaped by context, and also reviewing ideas about how such probabilistic computations may be carried out in neural networks, focusing on the role of context in interactive neural networks, in which both bottom-up and top-down signals affect the interpretation of sensory inputs. It is pointed out that connectionist units that use the logistic or softmax activation functions can exactly compute Bayesian posterior probabilities when the bias terms and connection weights affecting such units are set to the logarithms of appropriate probabilistic quantities. Bayesian concepts such as the prior, likelihood, (joint and marginal) posterior, probability matching and maximizing, and calculating vs. sampling from the posterior are all reviewed and linked to neural network computations. Probabilistic and neural network models are explicitly linked to the concept of a probabilistic generative model that describes the relationship between the underlying target of perception (e.g., the word intended by a speaker or other source of sensory stimuli) and the sensory input that reaches the perceiver for use in inferring the underlying target. It is shown how a new version of the IA model called the multinomial interactive activation (MIA) model can sample correctly from the joint posterior of a proposed generative model for perception of letters in words, indicating that interactive processing is fully consistent with principled probabilistic computation. Ways in which these computations might be realized in real neural systems are also considered.

  17. Bayesian inference for joint modelling of longitudinal continuous, binary and ordinal events.

    PubMed

    Li, Qiuju; Pan, Jianxin; Belcher, John

    2016-12-01

    In medical studies, repeated measurements of continuous, binary and ordinal outcomes are routinely collected from the same patient. Instead of modelling each outcome separately, in this study we propose to jointly model the trivariate longitudinal responses, so as to take account of the inherent association between the different outcomes and thus improve statistical inferences. This work is motivated by a large cohort study in the North West of England, involving trivariate responses from each patient: Body Mass Index, Depression (Yes/No) ascertained with cut-off score not less than 8 at the Hospital Anxiety and Depression Scale, and Pain Interference generated from the Medical Outcomes Study 36-item short-form health survey with values returned on an ordinal scale 1-5. There are some well-established methods for combined continuous and binary, or even continuous and ordinal responses, but little work was done on the joint analysis of continuous, binary and ordinal responses. We propose conditional joint random-effects models, which take into account the inherent association between the continuous, binary and ordinal outcomes. Bayesian analysis methods are used to make statistical inferences. Simulation studies show that, by jointly modelling the trivariate outcomes, standard deviations of the estimates of parameters in the models are smaller and much more stable, leading to more efficient parameter estimates and reliable statistical inferences. In the real data analysis, the proposed joint analysis yields a much smaller deviance information criterion value than the separate analysis, and shows other good statistical properties too. © The Author(s) 2014.

  18. Live immunization against East Coast fever--current status.

    PubMed

    Di Giulio, Giuseppe; Lynen, Godelieve; Morzaria, Subhash; Oura, Chris; Bishop, Richard

    2009-02-01

    The infection-and-treatment method (ITM) for immunization of cattle against East Coast fever has historically been used only on a limited scale because of logistical and policy constraints. Recent large-scale deployment among pastoralists in Tanzania has stimulated demand. Concurrently, a suite of molecular tools, developed from the Theileria parva genome, has enabled improved quality control of the immunizing stabilate and post-immunization monitoring of the efficacy and biological impact of ITM in the field. This article outlines the current status of ITM immunization in the field, with associated developments in the molecular epidemiology of T. parva.

  19. Hierarchical faunal filters: An approach to assessing effects of habitat and nonnative species on native fishes

    USGS Publications Warehouse

    Quist, M.C.; Rahel, F.J.; Hubert, W.A.

    2005-01-01

    Understanding factors related to the occurrence of species across multiple spatial and temporal scales is critical to the conservation and management of native fishes, especially for those species at the edge of their natural distribution. We used the concept of hierarchical faunal filters to provide a framework for investigating the influence of habitat characteristics and nonnative piscivores on the occurrence of 10 native fishes in streams of the North Platte River watershed in Wyoming. Three faunal filters were developed for each species: (i) large-scale biogeographic, (ii) local abiotic, and (iii) biotic. The large-scale biogeographic filter, composed of elevation and stream-size thresholds, was used to determine the boundaries within which each species might be expected to occur. Then, a local abiotic filter (i.e., habitat associations), developed using binary logistic-regression analysis, estimated the probability of occurrence of each species from features such as maximum depth, substrate composition, submergent aquatic vegetation, woody debris, and channel morphology (e.g., amount of pool habitat). Lastly, a biotic faunal filter was developed using binary logistic regression to estimate the probability of occurrence of each species relative to the abundance of nonnative piscivores in a reach. Conceptualising fish assemblages within a framework of hierarchical faunal filters is simple and logical, helps direct conservation and management activities, and provides important information on the ecology of fishes in the western Great Plains of North America. © Blackwell Munksgaard, 2004.

  20. Markov blanket-based approach for learning multi-dimensional Bayesian network classifiers: an application to predict the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39).

    PubMed

    Borchani, Hanen; Bielza, Concha; Martínez-Martín, Pablo; Larrañaga, Pedro

    2012-12-01

    Multi-dimensional Bayesian network classifiers (MBCs) are probabilistic graphical models recently proposed to deal with multi-dimensional classification problems, where each instance in the data set has to be assigned to more than one class variable. In this paper, we propose a Markov blanket-based approach for learning MBCs from data. Basically, it consists of determining the Markov blanket around each class variable using the HITON algorithm, then specifying the directionality over the MBC subgraphs. Our approach is applied to the prediction problem of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39) in order to estimate the health-related quality of life of Parkinson's patients. Fivefold cross-validation experiments were carried out on randomly generated synthetic data sets, Yeast data set, as well as on a real-world Parkinson's disease data set containing 488 patients. The experimental study, including comparison with additional Bayesian network-based approaches, back propagation for multi-label learning, multi-label k-nearest neighbor, multinomial logistic regression, ordinary least squares, and censored least absolute deviations, shows encouraging results in terms of predictive accuracy as well as the identification of dependence relationships among class and feature variables. Copyright © 2012 Elsevier Inc. All rights reserved.

  1. How to distinguish between cloudy mini-Neptunes and water/volatile-dominated super-Earths

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Benneke, Björn; Seager, Sara, E-mail: bbenneke@mit.edu

    One of the most profound questions about the newly discovered class of low-density super-Earths is whether these exoplanets are predominately H₂-dominated mini-Neptunes or volatile-rich worlds with gas envelopes dominated by H₂O, CO₂, CO, CH₄, or N₂. Transit observations of the super-Earth GJ 1214b rule out cloud-free H₂-dominated scenarios, but are not able to determine whether the lack of deep spectral features is due to high-altitude clouds or the presence of a high mean molecular mass atmosphere. Here, we demonstrate that one can unambiguously distinguish between cloudy mini-Neptunes and volatile-dominated worlds based on wing steepness and relative depths of absorption features in moderate-resolution near-infrared transmission spectra (R ∼ 100). In a numerical retrieval study, we show for GJ 1214b that an unambiguous distinction between a cloudy H₂-dominated atmosphere and cloud-free H₂O atmosphere will be possible if the uncertainties in the spectral transit depth measurements can be reduced by a factor of ∼3 compared to the published Hubble Space Telescope Wide-Field Camera 3 and Very Large Telescope transit observations by Berta et al. and Bean et al. We argue that the required precision for the distinction may be achievable with currently available instrumentation by stacking 10-15 repeated transit observations. We provide a scaling law that scales our quantitative results to other transiting super-Earths and Neptunes such as HD 97658b, 55 Cnc e, GJ 3470b and GJ 436b. The analysis in this work is performed using an improved version of our Bayesian atmospheric retrieval framework. The new framework not only constrains the gas composition and cloud/haze parameters, but also determines our confidence in having detected molecules and cloud/haze species through Bayesian model comparison. Using the Bayesian tool, we demonstrate quantitatively that the subtle transit depth variation in the Berta et al. data is not sufficient to claim the detection of water absorption.

  2. Data analysis using scale-space filtering and Bayesian probabilistic reasoning

    NASA Technical Reports Server (NTRS)

    Kulkarni, Deepak; Kutulakos, Kiriakos; Robinson, Peter

    1991-01-01

    This paper describes a program for analysis of output curves from a Differential Thermal Analyzer (DTA). The program first extracts probabilistic qualitative features from a DTA curve of a soil sample, and then uses Bayesian probabilistic reasoning to infer the mineral in the soil. The qualifier module employs a simple and efficient extension of scale-space filtering suitable for handling DTA data. We have observed that points can vanish from contours in the scale-space image when filtering operations are not highly accurate. To handle the problem of vanishing points, perceptual organization heuristics are used to group the points into lines. Next, these lines are grouped into contours by using additional heuristics. Probabilities are associated with these contours using domain-specific correlations. A Bayes tree classifier processes probabilistic features to infer the presence of different minerals in the soil. Experiments show that the algorithm that uses domain-specific correlation to infer qualitative features outperforms a domain-independent algorithm that does not.

  3. Nonparametric weighted stochastic block models

    NASA Astrophysics Data System (ADS)

    Peixoto, Tiago P.

    2018-01-01

    We present a Bayesian formulation of weighted stochastic block models that can be used to infer the large-scale modular structure of weighted networks, including their hierarchical organization. Our method is nonparametric, and thus does not require the prior knowledge of the number of groups or other dimensions of the model, which are instead inferred from data. We give a comprehensive treatment of different kinds of edge weights (i.e., continuous or discrete, signed or unsigned, bounded or unbounded), as well as arbitrary weight transformations, and describe an unsupervised model selection approach to choose the best network description. We illustrate the application of our method to a variety of empirical weighted networks, such as global migrations, voting patterns in congress, and neural connections in the human brain.

  4. Revealing Less Derived Nature of Cartilaginous Fish Genomes with Their Evolutionary Time Scale Inferred with Nuclear Genes

    PubMed Central

    Renz, Adina J.; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a lower substitution rate with a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon. PMID:23825540

  5. Revealing less derived nature of cartilaginous fish genomes with their evolutionary time scale inferred with nuclear genes.

    PubMed

    Renz, Adina J; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a lower substitution rate with a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon.

  6. Fast genomic predictions via Bayesian G-BLUP and multilocus models of threshold traits including censored Gaussian data.

    PubMed

    Kärkkäinen, Hanni P; Sillanpää, Mikko J

    2013-09-04

    Because of the increased availability of genome-wide sets of molecular markers along with reduced cost of genotyping large samples of individuals, genomic estimated breeding values have become an essential resource in plant and animal breeding. Bayesian methods for breeding value estimation have proven to be accurate and efficient; however, the ever-increasing data sets are placing heavy demands on the parameter estimation algorithms. Although a commendable number of fast estimation algorithms are available for Bayesian models of continuous Gaussian traits, there is a shortage for corresponding models of discrete or censored phenotypes. In this work, we consider a threshold approach of binary, ordinal, and censored Gaussian observations for Bayesian multilocus association models and Bayesian genomic best linear unbiased prediction and present a high-speed generalized expectation maximization algorithm for parameter estimation under these models. We demonstrate our method with simulated and real data. Our example analyses suggest that the use of the extra information present in an ordered categorical or censored Gaussian data set, instead of dichotomizing the data into case-control observations, increases the accuracy of genomic breeding values predicted by Bayesian multilocus association models or by Bayesian genomic best linear unbiased prediction. Furthermore, the example analyses indicate that the correct threshold model is more accurate than the directly used Gaussian model with a censored Gaussian data, while with a binary or an ordinal data the superiority of the threshold model could not be confirmed.

  7. Fast Genomic Predictions via Bayesian G-BLUP and Multilocus Models of Threshold Traits Including Censored Gaussian Data

    PubMed Central

    Kärkkäinen, Hanni P.; Sillanpää, Mikko J.

    2013-01-01

    Because of the increased availability of genome-wide sets of molecular markers along with reduced cost of genotyping large samples of individuals, genomic estimated breeding values have become an essential resource in plant and animal breeding. Bayesian methods for breeding value estimation have proven to be accurate and efficient; however, the ever-increasing data sets are placing heavy demands on the parameter estimation algorithms. Although a commendable number of fast estimation algorithms are available for Bayesian models of continuous Gaussian traits, there is a shortage for corresponding models of discrete or censored phenotypes. In this work, we consider a threshold approach of binary, ordinal, and censored Gaussian observations for Bayesian multilocus association models and Bayesian genomic best linear unbiased prediction and present a high-speed generalized expectation maximization algorithm for parameter estimation under these models. We demonstrate our method with simulated and real data. Our example analyses suggest that the use of the extra information present in an ordered categorical or censored Gaussian data set, instead of dichotomizing the data into case-control observations, increases the accuracy of genomic breeding values predicted by Bayesian multilocus association models or by Bayesian genomic best linear unbiased prediction. Furthermore, the example analyses indicate that the correct threshold model is more accurate than the directly used Gaussian model with a censored Gaussian data, while with a binary or an ordinal data the superiority of the threshold model could not be confirmed. PMID:23821618

  8. A Bayesian partition modelling approach to resolve spatial variability in climate records from borehole temperature inversion

    NASA Astrophysics Data System (ADS)

    Hopcroft, Peter O.; Gallagher, Kerry; Pain, Christopher C.

    2009-08-01

    Collections of suitably chosen borehole profiles can be used to infer large-scale trends in ground-surface temperature (GST) histories for the past few hundred years. These reconstructions are based on a large database of carefully selected borehole temperature measurements from around the globe. Since non-climatic thermal influences are difficult to identify, representative temperature histories are derived by averaging individual reconstructions to minimize the influence of these perturbing factors. This may lead to three potentially important drawbacks: the net signal of non-climatic factors may not be zero, meaning that the average does not reflect the best estimate of past climate; the averaging over large areas restricts the useful amount of more local climate change information available; and the inversion methods used to reconstruct the past temperatures at each site must be mathematically identical and are therefore not necessarily best suited to all data sets. In this work, we avoid these issues by using a Bayesian partition model (BPM), which is computed using a trans-dimensional form of a Markov chain Monte Carlo algorithm. This then allows the number and spatial distribution of different GST histories to be inferred from a given set of borehole data by partitioning the geographical area into discrete partitions. Profiles that are heavily influenced by non-climatic factors will be partitioned separately. Conversely, profiles with climatic information, which is consistent with neighbouring profiles, will then be inferred to lie in the same partition. The geographical extent of these partitions then leads to information on the regional extent of the climatic signal. In this study, three case studies are described using synthetic and real data. The first demonstrates that the Bayesian partition model method is able to correctly partition a suite of synthetic profiles according to the inferred GST history. 
In the second, more realistic case, a series of temperature profiles are calculated using surface air temperatures of a global climate model simulation. In the final case, 23 real boreholes from the United Kingdom, previously used for climatic reconstructions, are examined and the results compared with a local instrumental temperature series and the previous estimate derived from the same borehole data. The results indicate that the majority (17) of the 23 boreholes are unsuitable for climatic reconstruction purposes, at least without including other thermal processes in the forward model.

  9. Dynamical Bayesian inference of time-evolving interactions: from a pair of coupled oscillators to networks of oscillators.

    PubMed

    Duggento, Andrea; Stankovski, Tomislav; McClintock, Peter V E; Stefanovska, Aneta

    2012-12-01

    Living systems have time-evolving interactions that, until recently, could not be identified accurately from recorded time series in the presence of noise. Stankovski et al. [Phys. Rev. Lett. 109, 024101 (2012)] introduced a method based on dynamical Bayesian inference that facilitates the simultaneous detection of time-varying synchronization, directionality of influence, and coupling functions. It can distinguish unsynchronized dynamics from noise-induced phase slips. The method is based on phase dynamics, with Bayesian inference of the time-evolving parameters being achieved by shaping the prior densities to incorporate knowledge of previous samples. We now present the method in detail using numerically generated data, data from an analog electronic circuit, and cardiorespiratory data. We also generalize the method to encompass networks of interacting oscillators and thus demonstrate its applicability to small-scale networks.

  10. Disentangling Complexity in Bayesian Automatic Adaptive Quadrature

    NASA Astrophysics Data System (ADS)

    Adam, Gheorghe; Adam, Sanda

    2018-02-01

    The paper describes a Bayesian automatic adaptive quadrature (BAAQ) solution for numerical integration which is simultaneously robust, reliable, and efficient. Detailed discussion is provided of three main factors which contribute to the enhancement of these features: (1) refinement of the m-panel automatic adaptive scheme through the use of integration-domain-length-scale-adapted quadrature sums; (2) fast early problem complexity assessment - enables the non-transitive choice among three execution paths: (i) immediate termination (exceptional cases); (ii) pessimistic - involves time and resource consuming Bayesian inference resulting in radical reformulation of the problem to be solved; (iii) optimistic - asks exclusively for subrange subdivision by bisection; (3) use of the weaker accuracy target from the two possible ones (the input accuracy specifications and the intrinsic integrand properties respectively) - results in maximum possible solution accuracy under minimum possible computing time.

  11. Low-Complexity Polynomial Channel Estimation in Large-Scale MIMO With Arbitrary Statistics

    NASA Astrophysics Data System (ADS)

    Shariati, Nafiseh; Bjornson, Emil; Bengtsson, Mats; Debbah, Merouane

    2014-10-01

    This paper considers pilot-based channel estimation in large-scale multiple-input multiple-output (MIMO) communication systems, also known as massive MIMO, where there are hundreds of antennas at one side of the link. Motivated by the fact that computational complexity is one of the main challenges in such systems, a set of low-complexity Bayesian channel estimators, coined Polynomial ExpAnsion CHannel (PEACH) estimators, are introduced for arbitrary channel and interference statistics. While the conventional minimum mean square error (MMSE) estimator has cubic complexity in the dimension of the covariance matrices, due to an inversion operation, our proposed estimators significantly reduce this to square complexity by approximating the inverse by an L-degree matrix polynomial. The coefficients of the polynomial are optimized to minimize the mean square error (MSE) of the estimate. We show numerically that near-optimal MSEs are achieved with low polynomial degrees. We also derive the exact computational complexity of the proposed estimators, in terms of the floating-point operations (FLOPs), by which we prove that the proposed estimators outperform the conventional estimators in large-scale MIMO systems of practical dimensions while providing reasonable MSEs. Moreover, we show that L need not scale with the system dimensions to maintain a certain normalized MSE. By analyzing different interference scenarios, we observe that the relative MSE loss of using the low-complexity PEACH estimators is smaller in realistic scenarios with pilot contamination. On the other hand, PEACH estimators are not well suited for noise-limited scenarios with high pilot power; therefore, we also introduce the low-complexity diagonalized estimator that performs well in this regime. Finally, we ...
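
    The idea of replacing a matrix inversion with an L-degree matrix polynomial can be illustrated with a truncated Neumann series. This is a minimal sketch only: it uses equal, unoptimized coefficients rather than the MSE-optimized coefficients described in the abstract, and the function names, the step size `alpha`, and the 2x2 test matrix are assumptions chosen for illustration. Each iteration costs one matrix-vector product, which is where the square (rather than cubic) complexity comes from.

```python
def matvec(A, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def poly_inverse_apply(A, b, degree, alpha):
    """Approximate A^{-1} b by alpha * sum_{l=0}^{degree} (I - alpha*A)^l b.

    Assumes A is symmetric positive definite and 0 < alpha < 2 / lambda_max(A),
    so that the Neumann series converges. Evaluated in Horner form, which needs
    only `degree` matrix-vector products and never forms an inverse.
    """
    x = [alpha * bi for bi in b]  # degree-0 term
    for _ in range(degree):
        residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        x = [xi + alpha * ri for xi, ri in zip(x, residual)]
    return x


# Hypothetical 2x2 example; the exact solution of A x = b is [0.2, 0.6].
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
approx_low = poly_inverse_apply(A, b, degree=2, alpha=0.25)
approx_high = poly_inverse_apply(A, b, degree=50, alpha=0.25)
```

    Raising the degree tightens the approximation; the abstract's claim is that, with optimized coefficients, a low degree already gives near-optimal MSE.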

  12. Regional-scale integration of hydrological and geophysical data using Bayesian sequential simulation: application to field data

    NASA Astrophysics Data System (ADS)

    Ruggeri, Paolo; Irving, James; Gloaguen, Erwan; Holliger, Klaus

    2013-04-01

    Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale. However, extending corresponding approaches to the regional scale still represents a major challenge, yet is critically important for the development of groundwater flow and contaminant transport models. To address this issue, we have developed a regional-scale hydrogeophysical data integration technique based on a two-step Bayesian sequential simulation procedure. The objective is to simulate the regional-scale distribution of a hydraulic parameter based on spatially exhaustive, but poorly resolved, measurements of a pertinent geophysical parameter and locally highly resolved, but spatially sparse, measurements of the considered geophysical and hydraulic parameters. To this end, our approach first involves linking the low- and high-resolution geophysical data via a downscaling procedure before relating the downscaled regional-scale geophysical data to the high-resolution hydraulic parameter field. We present the application of this methodology to a pertinent field scenario, where we consider collocated high-resolution measurements of the electrical conductivity, measured using a cone penetrometer testing (CPT) system, and the hydraulic conductivity, estimated from EM flowmeter and slug test measurements, in combination with low-resolution exhaustive electrical conductivity estimates obtained from dipole-dipole ERT measurements.

  13. European Invasion of North American Pinus strobus at Large and Fine Scales: High Genetic Diversity and Fine-Scale Genetic Clustering over Time in the Adventive Range

    PubMed Central

    Mandák, Bohumil; Hadincová, Věroslava; Mahelka, Václav; Wildová, Radka

    2013-01-01

    Background North American Pinus strobus is a highly invasive tree species in Central Europe. Using ten polymorphic microsatellite loci we compared various aspects of the large-scale genetic diversity of individuals from 30 sites in the native distribution range with those from 30 sites in the European adventive distribution range. To investigate the ascertained pattern of genetic diversity of this intercontinental comparison further, we surveyed fine-scale genetic diversity patterns and changes over time within four highly invasive populations in the adventive range. Results Our data show that at the large scale the genetic diversity found within the relatively small adventive range in Central Europe, surprisingly, equals the diversity found within the sampled area in the native range, which is about thirty times larger. Bayesian assignment grouped individuals into two genetic clusters separating North American native populations from the European, non-native populations, without any strong genetic structure shown over either range. In the case of the fine scale, our comparison of genetic diversity parameters among the localities and age classes yielded no evidence of genetic diversity increase over time. We found that SGS differed across age classes within the populations under study. Old trees in general completely lacked any SGS, which increased over time and reached its maximum in the sapling stage. Conclusions Based on (1) the absence of difference in genetic diversity between the native and adventive ranges, together with the lack of structure in the native range, and (2) the lack of any evidence of any temporal increase in genetic diversity at four highly invasive populations in the adventive range, we conclude that population amalgamation probably first happened in the native range, prior to introduction. 
In such case, there would have been no need for multiple introductions from previously isolated populations, but only several introductions from genetically diverse populations. PMID:23874648

  14. Prevalence and risk factors of Coxiella burnetii seropositivity in Danish beef and dairy cattle at slaughter adjusted for test uncertainty.

    PubMed

    Paul, Suman; Agger, Jens F; Agerholm, Jørgen S; Markussen, Bo

    2014-03-01

    Antibodies to Coxiella burnetii have been found in the Danish dairy cattle population with high levels of herd and within herd seroprevalences. However, the prevalence of antibodies to C. burnetii in Danish beef cattle remains unknown. The objectives of this study were to (1) estimate the prevalence and (2) identify risk factors associated with C. burnetii seropositivity in Danish beef and dairy cattle based on sampling at slaughter. Eight hundred blood samples from slaughtered cattle were collected from six Danish slaughter houses from August to October 2012 following a random sampling procedure. Blood samples were tested by a commercially available C. burnetii antibody ELISA kit. A sample was defined positive if the sample-to-positive ratio was greater than or equal to 40. Animal and herd information were extracted from the Danish Cattle Database. Apparent (AP) and true prevalences (TPs) specific for breed, breed groups, gender and herd type; and breed-specific true prevalences with a random effect of breed were estimated in a Bayesian framework. A Bayesian logistic regression model was used to identify risk factors of C. burnetii seropositivity. Test sensitivity and specificity estimates from a previous study involving Danish dairy cattle were used to generate prior information. The prevalence was significantly higher in dairy breeds (AP=9.11%; TP=9.45%) than in beef breeds (AP=4.32%; TP=3.54%), in females (AP=9.10%; TP=9.40%) than in males (AP=3.62%; TP=2.61%) and in dairy herds (AP=15.10%; TP=16.67%) compared to beef herds (AP=4.54%; TP=3.66%). The Bayesian logistic regression model identified breed group along with age, and number of movements as contributors for C. burnetii seropositivity. The risk of seropositivity increased with age and increasing number of movements between herds. Results indicate that seroprevalence of C. burnetii is lower in cattle sent for slaughter than in Danish dairy cows in production units. 
A greater proportion of this prevalence is attributed to slaughtered cattle of dairy breeds or cattle raised in dairy herds rather than beef breeds. Copyright © 2014 Elsevier B.V. All rights reserved.
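
    The step from apparent to true prevalence can be illustrated with the classical Rogan-Gladen point estimator. This is a sketch under stated assumptions, not the Bayesian model used in the study: the abstract does not report the test sensitivity and specificity, so the values below are hypothetical, and a Bayesian analysis would place priors on them rather than plug in fixed numbers.

```python
def rogan_gladen(apparent_prev, sensitivity, specificity):
    """Classical point estimate of true prevalence from apparent prevalence.

    true = (apparent + specificity - 1) / (sensitivity + specificity - 1),
    clamped to [0, 1]. Requires sensitivity + specificity > 1, i.e. a test
    that performs better than chance.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent_prev + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))


# Hypothetical inputs for illustration only (not taken from the study).
true_prev = rogan_gladen(apparent_prev=0.10, sensitivity=0.90, specificity=0.95)
```

    With an imperfect test the corrected value can fall above or below the apparent one, which is consistent with the abstract reporting TP above AP for some groups and below it for others.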

  15. Identification of transmissivity fields using a Bayesian strategy and perturbative approach

    NASA Astrophysics Data System (ADS)

    Zanini, Andrea; Tanda, Maria Giovanna; Woodbury, Allan D.

    2017-10-01

    The paper deals with the crucial problem of groundwater parameter estimation, which is the basis for efficient modeling and reclamation activities. A hierarchical Bayesian approach is developed: it uses Akaike's Bayesian Information Criterion in order to estimate the hyperparameters (related to the covariance model chosen) and to quantify the unknown noise variance. The transmissivity identification proceeds in two steps: the first, called empirical Bayesian interpolation, uses Y* (Y = ln T) observations to interpolate Y values on a specified grid; the second, called empirical Bayesian update, improves the previous Y estimate through the addition of hydraulic head observations. The relationship between the head and ln T has been linearized through a perturbative solution of the flow equation. In order to test the proposed approach, synthetic aquifers from the literature have been considered. The aquifers in question contain a variety of boundary conditions (both Dirichlet and Neumann type) and scales of heterogeneity (σ_Y² = 1.0 and σ_Y² = 5.3). The estimated transmissivity fields were compared to the true ones. The joint use of Y* and head measurements improves the estimation of Y for both degrees of heterogeneity. Even if the variance of the strong transmissivity field can be considered high for the application of the perturbative approach, the results show the same order of approximation as the non-linear methods proposed in the literature. The procedure allows computing the posterior probability distribution of the target quantities and quantifying the uncertainty in the model prediction. Bayesian updating has advantages related both to the Monte-Carlo (MC) and non-MC approaches. In fact, like MC methods, Bayesian updating allows computing the direct posterior probability distribution of the target quantities, and like non-MC methods it has computational times in the order of seconds.

  16. Novel probabilistic and distributed algorithms for guidance, control, and nonlinear estimation of large-scale multi-agent systems

    NASA Astrophysics Data System (ADS)

    Bandyopadhyay, Saptarshi

    Multi-agent systems are widely used for constructing a desired formation shape, exploring an area, surveillance, coverage, and other cooperative tasks. This dissertation introduces novel algorithms in the three main areas of shape formation, distributed estimation, and attitude control of large-scale multi-agent systems. In the first part of this dissertation, we address the problem of shape formation for thousands to millions of agents. Here, we present two novel algorithms for guiding a large-scale swarm of robotic systems into a desired formation shape in a distributed and scalable manner. These probabilistic swarm guidance algorithms adopt an Eulerian framework, where the physical space is partitioned into bins and the swarm's density distribution over each bin is controlled using tunable Markov chains. In the first algorithm - Probabilistic Swarm Guidance using Inhomogeneous Markov Chains (PSG-IMC) - each agent determines its bin transition probabilities using a time-inhomogeneous Markov chain that is constructed in real-time using feedback from the current swarm distribution. This PSG-IMC algorithm minimizes the expected cost of the transitions required to achieve and maintain the desired formation shape, even when agents are added to or removed from the swarm. The algorithm scales well with a large number of agents and complex formation shapes, and can also be adapted for area exploration applications. In the second algorithm - Probabilistic Swarm Guidance using Optimal Transport (PSG-OT) - each agent determines its bin transition probabilities by solving an optimal transport problem, which is recast as a linear program. In the presence of perfect feedback of the current swarm distribution, this algorithm minimizes the given cost function, guarantees faster convergence, reduces the number of transitions for achieving the desired formation, and is robust to disturbances or damages to the formation. 
We demonstrate the effectiveness of these two proposed swarm guidance algorithms using results from numerical simulations and closed-loop hardware experiments on multiple quadrotors. In the second part of this dissertation, we present two novel discrete-time algorithms for distributed estimation, which track a single target using a network of heterogeneous sensing agents. In the Distributed Bayesian Filtering (DBF) algorithm, the sensing agents combine their normalized likelihood functions using the logarithmic opinion pool and the discrete-time dynamic average consensus algorithm. Each agent's estimated likelihood function converges to an error ball centered on the joint likelihood function of the centralized multi-sensor Bayesian filtering algorithm. Using a new proof technique, the convergence, stability, and robustness properties of the DBF algorithm are rigorously characterized. The explicit bounds on the time step of the robust DBF algorithm are shown to depend on the time-scale of the target dynamics. Furthermore, the DBF algorithm for linear-Gaussian models can be cast into a modified form of the Kalman information filter. In the Bayesian Consensus Filtering (BCF) algorithm, the agents combine their estimated posterior pdfs multiple times within each time step using the logarithmic opinion pool scheme. Thus, each agent's consensual pdf minimizes the sum of Kullback-Leibler divergences with the local posterior pdfs. The performance and robust properties of these algorithms are validated using numerical simulations. In the third part of this dissertation, we present an attitude control strategy and a new nonlinear tracking controller for a spacecraft carrying a large object, such as an asteroid or a boulder. 
If the captured object is larger than or comparable in size to the spacecraft and has significant modeling uncertainties, conventional nonlinear control laws that use exact feed-forward cancellation are not suitable because they exhibit a large resultant disturbance torque. The proposed nonlinear tracking control law guarantees global exponential convergence of tracking errors with finite-gain Lp stability in the presence of modeling uncertainties and disturbances, and reduces the resultant disturbance torque. Further, this control law permits the use of any attitude representation and its integral control formulation eliminates any constant disturbance. Under small uncertainties, the best strategy for stabilizing the combined system is to track a fuel-optimal reference trajectory using this nonlinear control law, because it consumes the least amount of fuel. In the presence of large uncertainties, the most effective strategy is to track the derivative plus proportional-derivative based reference trajectory, because it reduces the resultant disturbance torque. The effectiveness of the proposed attitude control law is demonstrated using numerical simulation results based on an Asteroid Redirect Mission concept. The new algorithms proposed in this dissertation will facilitate the development of versatile autonomous multi-agent systems that are capable of performing a variety of complex tasks in a robust and scalable manner.
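
The logarithmic opinion pool mentioned in this abstract can be illustrated with a small sketch. It assumes discrete pdfs over a shared set of bins and strictly positive weights; the function names and the example numbers are hypothetical and are not taken from the dissertation. The pooled pdf is the normalized weighted geometric mean of the local pdfs, which is the distribution that minimizes the weighted sum of Kullback-Leibler divergences KL(p || p_i).

```python
import math


def log_opinion_pool(pdfs, weights=None):
    """Combine discrete pdfs by a normalized, weighted geometric mean."""
    n_agents = len(pdfs)
    n_bins = len(pdfs[0])
    if weights is None:
        weights = [1.0 / n_agents] * n_agents  # equal weights (assumption)
    log_pool = []
    for k in range(n_bins):
        if any(p[k] <= 0.0 for p in pdfs):
            # An agent that assigns zero mass to a bin vetoes that bin.
            log_pool.append(-math.inf)
        else:
            log_pool.append(sum(w * math.log(p[k]) for w, p in zip(weights, pdfs)))
    peak = max(log_pool)
    if peak == -math.inf:
        raise ValueError("the local pdfs share no common support")
    # Subtract the peak before exponentiating for numerical stability.
    unnormalized = [math.exp(v - peak) for v in log_pool]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete pdfs."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical local posteriors of three agents over three bins.
local_posteriors = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]
consensus = log_opinion_pool(local_posteriors)
print(consensus)
```

Comparing the summed divergence of `consensus` with that of any other candidate pdf, for example the arithmetic mean of the rows, is a quick way to check the minimizing property on a given data set.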

  17. Horvitz-Thompson survey sample methods for estimating large-scale animal abundance

    USGS Publications Warehouse

    Samuel, M.D.; Garton, E.O.

    1994-01-01

    Large-scale surveys to estimate animal abundance can be useful for monitoring population status and trends, for measuring responses to management or environmental alterations, and for testing ecological hypotheses about abundance. However, large-scale surveys may be expensive and logistically complex. To ensure resources are not wasted on unattainable targets, the goals and uses of each survey should be specified carefully and alternative methods for addressing these objectives should always be considered. During survey design, the importance of each survey error component (spatial design, proportion of detected animals, precision in detection) should be considered carefully to produce a complete statistically based survey. Failure to address these three survey components may produce population estimates that are inaccurate (biased low), have unrealistic precision (too precise) and do not satisfactorily meet the survey objectives. Optimum survey design requires trade-offs in these sources of error relative to the costs of sampling plots and detecting animals on plots, considerations that are specific to the spatial logistics and survey methods. The Horvitz-Thompson estimators provide a comprehensive framework for considering all three survey components during the design and analysis of large-scale wildlife surveys. Problems of spatial and temporal (especially survey to survey) heterogeneity in detection probabilities have received little consideration, but failure to account for heterogeneity produces biased population estimates. The goal of producing unbiased population estimates is in conflict with the increased variation from heterogeneous detection in the population estimate. One solution to this conflict is to use an MSE-based approach to achieve a balance between bias reduction and increased variation. 
Further research is needed to develop methods that address spatial heterogeneity in detection, evaluate the effects of temporal heterogeneity on survey objectives and optimize decisions related to survey bias and variance. Finally, managers and researchers involved in the survey design process must realize that obtaining the best survey results requires an interactive and recursive process of survey design, execution, analysis and redesign. Survey refinements will be possible as further knowledge is gained on the actual abundance and distribution of the population and on the most efficient techniques for detecting animals.
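
As a rough illustration of the Horvitz-Thompson idea described in this abstract, the sketch below inflates each plot count by the inverse of the probability that an animal on that plot is both sampled and detected. It covers only two of the three survey components (spatial inclusion and detection probability), not the precision of detection. The function names and all numbers are hypothetical, and this is not the authors' implementation.

```python
def horvitz_thompson_total(counts, inclusion_probs, detection_probs):
    """Estimate total abundance as the sum of count / (inclusion * detection)."""
    total = 0.0
    for count, inclusion, detection in zip(counts, inclusion_probs, detection_probs):
        if not (0.0 < inclusion <= 1.0 and 0.0 < detection <= 1.0):
            raise ValueError("probabilities must lie in (0, 1]")
        total += count / (inclusion * detection)
    return total


def mean_squared_error(bias, variance):
    """MSE combines both error sources: squared bias plus variance."""
    return bias ** 2 + variance


# Hypothetical survey: 4 of 40 equal plots sampled (inclusion 0.1),
# with an assumed 80% chance of detecting each animal on a sampled plot.
counts = [12, 0, 7, 30]
estimate = horvitz_thompson_total(counts, [0.1] * 4, [0.8] * 4)
print(estimate)  # approximately 612.5
```

The `mean_squared_error` helper reflects the trade-off the abstract points to: correcting for heterogeneous detection reduces bias but can increase variance, and the MSE weighs the two together.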

  18. New seismogenic stress fields for southern Italy from a Bayesian approach

    NASA Astrophysics Data System (ADS)

    Totaro, Cristina; Orecchio, Barbara; Presti, Debora; Scolaro, Silvia; Neri, Giancarlo

    2017-04-01

    A new database of high-quality waveform inversion focal mechanisms has been compiled for southern Italy by integrating the highest quality solutions, available from literature and catalogues, and 146 newly-computed ones. All the selected focal mechanisms either (i) come from the Italian CMT, Regional CMT and TDMT catalogues (Pondrelli et al., PEPI 2006, PEPI 2011; http://www.ingv.it), or (ii) were computed by using the Cut And Paste (CAP) method (Zhao & Helmberger, BSSA 1994; Zhu & Helmberger, BSSA 1996). Specific tests have been carried out in order to evaluate the robustness of the obtained solutions (e.g., by varying both seismic network configuration and Earth structure parameters) and to estimate uncertainties on the focal mechanism parameters. Only the resulting highest-quality solutions have been included in the database, which has then been used for computation of posterior density distributions of stress tensor components by a Bayesian method (Arnold & Townend, GJI 2007). This algorithm furnishes the posterior density function of the principal components of the stress tensor (maximum σ1, intermediate σ2, and minimum σ3 compressive stress, respectively) and the stress-magnitude ratio (R). Before stress computation, we applied the k-means clustering algorithm to subdivide the focal mechanism catalog on the basis of earthquake locations. This approach allows identifying the sectors to be investigated without any "a priori" constraint from faulting type distribution. The large amount of data and the application of the Bayesian algorithm allowed us to provide a more accurate local-to-regional scale stress distribution that has shed new light on the kinematics and dynamics of this very complex area, where lithospheric unit configuration and geodynamic engines are still strongly debated. The new high-quality information furnished here will provide very useful tools and constraints for future geophysical analyses and geodynamic modeling.

  19. Evaluating the Impact of Genomic Data and Priors on Bayesian Estimates of the Angiosperm Evolutionary Timescale.

    PubMed

    Foster, Charles S P; Sauquet, Hervé; van der Merwe, Marlien; McPherson, Hannah; Rossetto, Maurizio; Ho, Simon Y W

    2017-05-01

    The evolutionary timescale of angiosperms has long been a key question in biology. Molecular estimates of this timescale have shown considerable variation, being influenced by differences in taxon sampling, gene sampling, fossil calibrations, evolutionary models, and choices of priors. Here, we analyze a data set comprising 76 protein-coding genes from the chloroplast genomes of 195 taxa spanning 86 families, including novel genome sequences for 11 taxa, to evaluate the impact of models, priors, and gene sampling on Bayesian estimates of the angiosperm evolutionary timescale. Using a Bayesian relaxed molecular-clock method, with a core set of 35 minimum and two maximum fossil constraints, we estimated that crown angiosperms arose 221 (251-192) Ma during the Triassic. Based on a range of additional sensitivity and subsampling analyses, we found that our date estimates were generally robust to large changes in the parameters of the birth-death tree prior and of the model of rate variation across branches. We found an exception to this when we implemented fossil calibrations in the form of highly informative gamma priors rather than as uniform priors on node ages. Under all other calibration schemes, including trials of seven maximum age constraints, we consistently found that the earliest divergences of angiosperm clades substantially predate the oldest fossils that can be assigned unequivocally to their crown group. Overall, our results and experiments with genome-scale data suggest that reliable estimates of the angiosperm crown age will require increased taxon sampling, significant methodological changes, and new information from the fossil record. [Angiospermae, chloroplast, genome, molecular dating, Triassic.] © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  20. Distributed Bayesian Computation and Self-Organized Learning in Sheets of Spiking Neurons with Local Lateral Inhibition

    PubMed Central

    Bill, Johannes; Buesing, Lars; Habenschuss, Stefan; Nessler, Bernhard; Maass, Wolfgang; Legenstein, Robert

    2015-01-01

    During the last decade, Bayesian probability theory has emerged as a framework in cognitive science and neuroscience for describing perception, reasoning and learning of mammals. However, our understanding of how probabilistic computations could be organized in the brain, and how the observed connectivity structure of cortical microcircuits supports these calculations, is rudimentary at best. In this study, we investigate statistical inference and self-organized learning in a spatially extended spiking network model, that accommodates both local competitive and large-scale associative aspects of neural information processing, under a unified Bayesian account. Specifically, we show how the spiking dynamics of a recurrent network with lateral excitation and local inhibition in response to distributed spiking input, can be understood as sampling from a variational posterior distribution of a well-defined implicit probabilistic model. This interpretation further permits a rigorous analytical treatment of experience-dependent plasticity on the network level. Using machine learning theory, we derive update rules for neuron and synapse parameters which equate with Hebbian synaptic and homeostatic intrinsic plasticity rules in a neural implementation. In computer simulations, we demonstrate that the interplay of these plasticity rules leads to the emergence of probabilistic local experts that form distributed assemblies of similarly tuned cells communicating through lateral excitatory connections. The resulting sparse distributed spike code of a well-adapted network carries compressed information on salient input features combined with prior experience on correlations among them. Our theory predicts that the emergence of such efficient representations benefits from network architectures in which the range of local inhibition matches the spatial extent of pyramidal cells that share common afferent input. PMID:26284370

  1. Leaf optical properties shed light on foliar trait variability at individual to global scales

    NASA Astrophysics Data System (ADS)

    Shiklomanov, A. N.; Serbin, S.; Dietze, M.

    2017-12-01

    Recent syntheses of large trait databases have contributed immensely to our understanding of drivers of plant function at the global scale. However, the global trade-offs revealed by such syntheses, such as the trade-off between leaf productivity and resilience (i.e. "leaf economics spectrum"), are often absent at smaller scales and fail to correlate with actual functional limitations. An improved understanding of how traits vary among communities, species, and individuals is critical to accurate representations of vegetation ecophysiology and ecological dynamics in ecosystem models. Spectral data from both field observations and remote sensing platforms present a rich and widely available source of information on plant traits. Here, we apply Bayesian inversion of the PROSPECT leaf radiative transfer model to a large global database of over 60,000 field spectra and plant traits to (1) comprehensively assess the accuracy of leaf trait estimation using PROSPECT spectral inversion; (2) investigate the correlations between optical traits estimable from PROSPECT and other important foliar traits such as nitrogen and lignin concentrations; and (3) identify dominant sources of variability and characterize trade-offs in optical and non-optical foliar traits. Our work provides a key methodological contribution by validating physically-based retrieval of plant traits from remote sensing observations, and provides insights about trait trade-offs related to plant acclimation, adaptation, and community assembly.

  2. Mapping porosity of the deep critical zone in 3D using near-surface geophysics, rock physics modeling, and drilling

    NASA Astrophysics Data System (ADS)

    Flinchum, B. A.; Holbrook, W. S.; Grana, D.; Parsekian, A.; Carr, B.; Jiao, J.

    2017-12-01

    Porosity is generated by chemical, physical and biological processes that work to transform bedrock into soil. The resulting porosity structure can provide specifics about these processes and can improve understanding of groundwater storage in the deep critical zone. Near-surface geophysical methods, when combined with rock physics and drilling, can be used to map porosity over large spatial scales. In this study, we estimate porosity in three dimensions (3D) across a 58 ha granite catchment. Observations focus on seismic refraction, downhole nuclear magnetic resonance logs, downhole sonic logs, and samples of core acquired by push coring. We use a novel petrophysical approach integrating two rock physics models, a porous medium for the saprolite and a differential effective medium for the fractured rock, that drive a Bayesian inversion to calculate porosity from seismic velocities. The inverted geophysical porosities are within about 0.05 m³/m³ of lab-measured values. We extrapolate the porosity estimates below seismic refraction lines to a 3D volume using ordinary kriging to map the distribution of porosity in 3D up to depths of 80 m. This study provides a unique map of porosity on a scale never before seen in critical zone science. Estimating porosity on these large spatial scales opens the door for improving and understanding the processes that shape the deep critical zone.

  3. Construction and Application of a Refined Hospital Management Chain.

    PubMed

    Lihua, Yi

    2016-01-01

    Large-scale development was quite common in the later period of hospital industrialization in China. Today, Chinese hospital management faces problems such as service inefficiency, high human resources costs, and a low rate of capital use. This study analyzes the refined management chain of Wuxi No.2 People's Hospital. The chain consists of six gears, namely "organizational structure, clinical practice, outpatient service, medical technology, nursing care, and logistics." The gears are based on "flat management system targets, chief of medical staff, centralized outpatient service, intensified medical examinations, vertical nursing management and socialized logistics." The core concepts of refined hospital management are optimizing process flow, reducing waste, improving efficiency, saving costs, and, most importantly, taking good care of patients. Keywords: Hospital, Refined, Management chain

  4. A Bayesian Framework for False Belief Reasoning in Children: A Rational Integration of Theory-Theory and Simulation Theory

    PubMed Central

    Asakura, Nobuhiko; Inui, Toshio

    2016-01-01

    Two apparently contrasting theories have been proposed to account for the development of children's theory of mind (ToM): theory-theory and simulation theory. We present a Bayesian framework that rationally integrates both theories for false belief reasoning. This framework exploits two internal models for predicting the belief states of others: one of self and one of others. These internal models are responsible for simulation-based and theory-based reasoning, respectively. The framework further takes into account empirical studies of a developmental ToM scale (e.g., Wellman and Liu, 2004): developmental progressions of various mental state understandings leading up to false belief understanding. By representing the internal models and their interactions as a causal Bayesian network, we formalize the model of children's false belief reasoning as probabilistic computations on the Bayesian network. This model probabilistically weighs and combines the two internal models and predicts children's false belief ability as a multiplicative effect of their early-developed abilities to understand the mental concepts of diverse beliefs and knowledge access. Specifically, the model predicts that children's proportion of correct responses on a false belief task can be closely approximated as the product of their proportions correct on the diverse belief and knowledge access tasks. To validate this prediction, we illustrate that our model provides good fits to a variety of ToM scale data for preschool children. We discuss the implications and extensions of our model for a deeper understanding of developmental progressions of children's ToM abilities. PMID:28082941
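
The multiplicative prediction stated in this abstract can be written as a one-line calculation. The sketch below only restates that approximation (false belief proportion correct ≈ diverse belief proportion correct × knowledge access proportion correct). The function name and the example proportions are hypothetical, not data from the paper.

```python
def predicted_false_belief(p_diverse_belief, p_knowledge_access):
    """Approximate proportion correct on a false belief task as a product."""
    for p in (p_diverse_belief, p_knowledge_access):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    return p_diverse_belief * p_knowledge_access


# Hypothetical group of children: 90% correct on the diverse belief task
# and 70% correct on the knowledge access task.
print(predicted_false_belief(0.9, 0.7))  # about 0.63
```

Because both factors are at most 1, the predicted false belief performance can never exceed performance on either earlier-developing task, which matches the ordering implied by a developmental ToM scale.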

  5. A Bayesian Framework for False Belief Reasoning in Children: A Rational Integration of Theory-Theory and Simulation Theory.

    PubMed

    Asakura, Nobuhiko; Inui, Toshio

    2016-01-01

    Two apparently contrasting theories have been proposed to account for the development of children's theory of mind (ToM): theory-theory and simulation theory. We present a Bayesian framework that rationally integrates both theories for false belief reasoning. This framework exploits two internal models for predicting the belief states of others: one of self and one of others. These internal models are responsible for simulation-based and theory-based reasoning, respectively. The framework further takes into account empirical studies of a developmental ToM scale (e.g., Wellman and Liu, 2004): developmental progressions of various mental state understandings leading up to false belief understanding. By representing the internal models and their interactions as a causal Bayesian network, we formalize the model of children's false belief reasoning as probabilistic computations on the Bayesian network. This model probabilistically weighs and combines the two internal models and predicts children's false belief ability as a multiplicative effect of their early-developed abilities to understand the mental concepts of diverse beliefs and knowledge access. Specifically, the model predicts that children's proportion of correct responses on a false belief task can be closely approximated as the product of their proportions correct on the diverse belief and knowledge access tasks. To validate this prediction, we illustrate that our model provides good fits to a variety of ToM scale data for preschool children. We discuss the implications and extensions of our model for a deeper understanding of developmental progressions of children's ToM abilities.

  6. Predicting coastal cliff erosion using a Bayesian probabilistic model

    USGS Publications Warehouse

    Hapke, Cheryl J.; Plant, Nathaniel G.

    2010-01-01

    Regional coastal cliff retreat is difficult to model due to the episodic nature of failures and the along-shore variability of retreat events. There is a growing demand, however, for predictive models that can be used to forecast areas vulnerable to coastal erosion hazards. Increasingly, probabilistic models are being employed that require data sets of high temporal density to define the joint probability density function that relates forcing variables (e.g. wave conditions) and initial conditions (e.g. cliff geometry) to erosion events. In this study we use a multi-parameter Bayesian network to investigate correlations between key variables that control and influence variations in cliff retreat processes. The network uses Bayesian statistical methods to estimate event probabilities using existing observations. Within this framework, we forecast the spatial distribution of cliff retreat along two stretches of cliffed coast in Southern California. The input parameters are the height and slope of the cliff, a descriptor of material strength based on the dominant cliff-forming lithology, and the long-term cliff erosion rate that represents prior behavior. The model is forced using predicted wave impact hours. Results demonstrate that the Bayesian approach is well-suited to the forward modeling of coastal cliff retreat, with the correct outcomes forecast in 70–90% of the modeled transects. The model also performs well in identifying specific locations of high cliff erosion, thus providing a foundation for hazard mapping. This approach can be employed to predict cliff erosion at time-scales ranging from storm events to the impacts of sea-level rise at the century-scale.

  7. Stochastic extreme downscaling model for an assessment of changes in rainfall intensity-duration-frequency curves over South Korea using multiple regional climate models

    NASA Astrophysics Data System (ADS)

    So, Byung-Jin; Kim, Jin-Young; Kwon, Hyun-Han; Lima, Carlos H. R.

    2017-10-01

    A downscaling model based on a conditional copula function in a fully Bayesian framework is developed in this study to evaluate future changes in intensity-duration-frequency (IDF) curves in South Korea. The model incorporates a quantile mapping approach for bias correction while integrated Bayesian inference allows accounting for parameter uncertainties. The proposed approach is used to temporally downscale expected changes in daily rainfall, inferred from multiple CORDEX-RCMs based on Representative Concentration Pathways (RCPs) 4.5 and 8.5 scenarios, into sub-daily temporal scales. Among the CORDEX-RCMs, a noticeable increase in rainfall intensity is observed in the HadGem3-RA (9%), RegCM (28%), and SNU_WRF (13%) on average, whereas no noticeable changes are observed in the GRIMs (-2%) for the period 2020-2050. More specifically, a 5-30% increase in rainfall intensity is expected in all of the CORDEX-RCMs for 50-year return values under the RCP 8.5 scenario. Uncertainty in simulated rainfall intensity gradually decreases toward the longer durations, which is largely associated with the enhanced strength of the relationship with the 24-h annual maximum rainfalls (AMRs). A primary advantage of the proposed model is that projected changes in future rainfall intensities are well preserved.
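
The quantile mapping step mentioned in this abstract can be sketched in its simplest empirical form: a model value is replaced by the observed value that sits at the same quantile of the historical record. The abstract does not specify which variant the authors use, so the version below (empirical quantiles with linear interpolation and clamping at the extremes) is an assumption, and the function names and rainfall values are hypothetical.

```python
import bisect


def empirical_quantiles(sample, n_nodes=101):
    """Quantiles at evenly spaced probabilities, using linear interpolation."""
    data = sorted(sample)
    last = len(data) - 1
    nodes = []
    for j in range(n_nodes):
        pos = last * j / (n_nodes - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        nodes.append(data[lo] + (pos - lo) * (data[hi] - data[lo]))
    return nodes


def quantile_map(value, model_nodes, obs_nodes):
    """Map a model value to the observed value at the same quantile."""
    # Values outside the historical model range are clamped to the extremes.
    if value <= model_nodes[0]:
        return obs_nodes[0]
    if value >= model_nodes[-1]:
        return obs_nodes[-1]
    i = bisect.bisect_right(model_nodes, value) - 1
    x0, x1 = model_nodes[i], model_nodes[i + 1]
    frac = (value - x0) / (x1 - x0)
    return obs_nodes[i] + frac * (obs_nodes[i + 1] - obs_nodes[i])


# Hypothetical daily rainfall (mm): the model runs too wet over the same period.
observed = [0.0, 1.2, 3.5, 4.0, 7.8, 10.1, 15.4, 22.0, 30.5, 48.0]
simulated = [0.5, 2.0, 4.1, 6.3, 9.9, 14.2, 19.8, 27.5, 39.0, 60.0]
model_nodes = empirical_quantiles(simulated)
obs_nodes = empirical_quantiles(observed)
corrected = [quantile_map(v, model_nodes, obs_nodes) for v in [5.0, 25.0, 70.0]]
print(corrected)
```

Clamping at the historical extremes is a simplification; in practice, extreme rainfall is usually handled with a fitted tail distribution rather than the empirical range alone.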

  8. An improved approximate-Bayesian model-choice method for estimating shared evolutionary history

    PubMed Central

    2014-01-01

    Background: To understand biological diversification, it is important to account for large-scale processes that affect the evolutionary history of groups of co-distributed populations of organisms. Such events predict temporally clustered divergence times, a pattern that can be estimated using genetic data from co-distributed species. I introduce a new approximate-Bayesian method for comparative phylogeographical model-choice that estimates the temporal distribution of divergences across taxa from multi-locus DNA sequence data. The model is an extension of that implemented in msBayes. Results: By reparameterizing the model, introducing more flexible priors on demographic and divergence-time parameters, and implementing a non-parametric Dirichlet-process prior over divergence models, I improved the robustness, accuracy, and power of the method for estimating shared evolutionary history across taxa. Conclusions: The results demonstrate that the improved performance of the new method is due to (1) more appropriate priors on divergence-time and demographic parameters that avoid prohibitively small marginal likelihoods for models with more divergence events, and (2) the Dirichlet-process providing a flexible prior on divergence histories that does not strongly disfavor models with intermediate numbers of divergence events. The new method yields more robust estimates of posterior uncertainty, and thus greatly reduces the tendency to incorrectly estimate models of shared evolutionary history with strong support. PMID:24992937

  9. An accessible method for implementing hierarchical models with spatio-temporal abundance data

    USGS Publications Warehouse

    Ross, Beth E.; Hooten, Melvin B.; Koons, David N.

    2012-01-01

    A common goal in ecology and wildlife management is to determine the causes of variation in population dynamics over long periods of time and across large spatial scales. Many challenges must nevertheless be overcome to make appropriate inference about spatio-temporal variation in population dynamics, such as autocorrelation among data points, excess zeros, and observation error in count data. To address these issues, many scientists and statisticians have recommended the use of Bayesian hierarchical models. Unfortunately, hierarchical statistical models remain somewhat difficult to use because of the quantitative background needed to implement them, or because of the computational demands of using Markov Chain Monte Carlo algorithms to estimate parameters. Fortunately, new tools have recently been developed that make it more feasible for wildlife biologists to fit sophisticated hierarchical Bayesian models (e.g., Integrated Nested Laplace Approximation, ‘INLA’). We present a case study using two important game species in North America, the lesser and greater scaup, to demonstrate how INLA can be used to estimate the parameters in a hierarchical model that decouples observation error from process variation, and accounts for unknown sources of excess zeros as well as spatial and temporal dependence in the data. Ultimately, our goal was to make unbiased inference about spatial variation in population trends over time.

  10. Monte Carlo Bayesian inference on a statistical model of sub-gridcolumn moisture variability using high-resolution cloud observations. Part 1: Method.

    PubMed

    Norris, Peter M; da Silva, Arlindo M

    2016-07-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.
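
For readers unfamiliar with the Metropolis version of MCMC mentioned in this abstract, the sketch below shows a generic random-walk Metropolis sampler on a toy one-dimensional target. It is not the authors' moisture model or their multiple-try variant. The target (a standard normal), the step size, and the function name are illustrative assumptions.

```python
import math
import random


def metropolis(log_post, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional log-posterior."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        # Symmetric Gaussian proposal: a finite jump, not an infinitesimal one.
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio); no gradient is needed.
        if lp_prop >= lp or rng.random() < math.exp(lp_prop - lp):
            x, lp = proposal, lp_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


def toy_log_post(x):
    """Toy target: standard normal log-density, up to an additive constant."""
    return -0.5 * x * x


samples, acceptance_rate = metropolis(toy_log_post, x0=0.0, n_samples=20000, step=2.0)
mean = sum(samples) / len(samples)
print(mean, acceptance_rate)
```

Because each move is a finite random jump that is accepted or rejected from posterior values alone, the sampler does not rely on local derivatives, which is the property the abstract contrasts with linearized assimilation methods.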

  11. Monte Carlo Bayesian Inference on a Statistical Model of Sub-Gridcolumn Moisture Variability Using High-Resolution Cloud Observations. Part 1: Method

    NASA Technical Reports Server (NTRS)

    Norris, Peter M.; Da Silva, Arlindo M.

    2016-01-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.

  12. Monte Carlo Bayesian inference on a statistical model of sub-gridcolumn moisture variability using high-resolution cloud observations. Part 1: Method

    PubMed Central

    Norris, Peter M.; da Silva, Arlindo M.

    2018-01-01

    A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC. PMID:29618847

  13. Accurate prediction of protein–protein interactions from sequence alignments using a Bayesian method

    PubMed Central

    Burger, Lukas; van Nimwegen, Erik

    2008-01-01

    Accurate and large-scale prediction of protein–protein interactions directly from amino-acid sequences is one of the great challenges in computational biology. Here we present a new Bayesian network method that predicts interaction partners using only multiple alignments of amino-acid sequences of interacting protein domains, without tunable parameters, and without the need for any training examples. We first apply the method to bacterial two-component systems and comprehensively reconstruct two-component signaling networks across all sequenced bacteria. Comparisons of our predictions with known interactions show that our method infers interaction partners genome-wide with high accuracy. To demonstrate the general applicability of our method we show that it also accurately predicts interaction partners in a recent dataset of polyketide synthases. Analysis of the predicted genome-wide two-component signaling networks shows that cognates (interacting kinase/regulator pairs, which lie adjacent on the genome) and orphans (which lie isolated) form two relatively independent components of the signaling network in each genome. In addition, while most genes are predicted to have only a small number of interaction partners, we find that 10% of orphans form a separate class of ‘hub’ nodes that distribute and integrate signals to and from up to tens of different interaction partners. PMID:18277381

  14. Optimization of nonlinear, non-Gaussian Bayesian filtering for diagnosis and prognosis of monotonic degradation processes

    NASA Astrophysics Data System (ADS)

    Corbetta, Matteo; Sbarufatti, Claudio; Giglio, Marco; Todd, Michael D.

    2018-05-01

    The present work critically analyzes the probabilistic definition of dynamic state-space models subject to Bayesian filters used for monitoring and predicting monotonic degradation processes. The study focuses on the selection of the random process, often called process noise, which is a key perturbation source in the evolution equation of particle filtering. Despite the large number of applications of particle filtering predicting structural degradation, the adequacy of the picked process noise has not been investigated. This paper reviews existing process noise models that are typically embedded in particle filters dedicated to monitoring and predicting structural damage caused by fatigue, which is monotonic in nature. The analysis emphasizes that existing formulations of the process noise can jeopardize the performance of the filter in terms of state estimation and remaining life prediction (i.e., damage prognosis). This paper subsequently proposes an optimal and unbiased process noise model and a list of requirements that the stochastic model must satisfy to guarantee high prognostic performance. These requirements are useful for future and further implementations of particle filtering for monotonic system dynamics. The validity of the new process noise formulation is assessed against experimental fatigue crack growth data from a full-scale aeronautical structure using dedicated performance metrics.

  15. Psychosocial stress factors, including the relationship with the coach, and their influence on acute and overuse injury risk in elite female football players

    PubMed Central

    Pensgaard, Anne Marte; Ivarsson, Andreas; Nilstad, Agnethe; Solstad, Bård Erlend; Steffen, Kathrin

    2018-01-01

    Background: The relationship between specific types of stressors (eg, teammates, coach) and acute versus overuse injuries is not well understood. Objective: To examine the roles of different types of stressors as well as the effect of motivational climate on the occurrence of acute and overuse injuries. Methods: Players in the Norwegian elite female football league (n=193 players from 12 teams) participated in baseline screening tests prior to the 2009 competitive football season. As part of the screening, we included the Life Event Survey for Collegiate Athletes and the Perceived Motivational Climate in Sport Questionnaire (Norwegian short version). Acute and overuse time-loss injuries and exposure to training and matches were recorded prospectively in the football season using weekly text messaging. Data were analysed with Bayesian logistic regression analyses. Results: Using Bayesian logistic regression analyses, we showed that perceived negative life event stress from teammates was associated with an increased risk of acute injuries (OR=1.23, 95% credibility interval (1.01 to 1.48)). There was a credible positive association between perceived negative life event stress from the coach and the risk of overuse injuries (OR=1.21, 95% credibility interval (1.01 to 1.45)). Conclusions: Players who report teammates as a source of stress have a greater risk of sustaining an acute injury, while players reporting the coach as a source of stress are at greater risk of sustaining an overuse injury. Motivational climate did not relate to increased injury occurrence. PMID:29629182
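
    Odds ratios with credibility intervals of the kind reported above are typically read off posterior draws of a log-odds coefficient. The sketch below shows one way to do that. The draws are synthetic (a normal distribution with arbitrary mean and spread), not data from this study, and the helper name `odds_ratio_summary` is hypothetical.

```python
import math
import random
import statistics


def odds_ratio_summary(log_odds_draws):
    """Return (median OR, lower, upper) for a 95% equal-tailed credible interval."""
    # With n=40 the cut points fall at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(log_odds_draws, n=40)
    lower, upper = cuts[0], cuts[-1]
    median = statistics.median(log_odds_draws)
    # exp() is monotone, so quantiles of the log-odds map to quantiles of the OR.
    return math.exp(median), math.exp(lower), math.exp(upper)


# Synthetic posterior draws (assumption, for illustration only).
rng = random.Random(0)
synthetic_draws = [rng.gauss(0.2, 0.1) for _ in range(5000)]
or_median, or_low, or_high = odds_ratio_summary(synthetic_draws)
print(f"OR={or_median:.2f}, 95% credibility interval ({or_low:.2f} to {or_high:.2f})")
```

    An interval whose lower bound lies above 1 is what the abstract describes as a credible positive association.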

  16. Clinical features of pure obsessive-compulsive disorder.

    PubMed

    Torres, Albina R; Shavitt, Roseli G; Torresan, Ricardo C; Ferrão, Ygor A; Miguel, Euripedes C; Fontenelle, Leonardo F

    2013-10-01

    Psychiatric comorbidity is the rule in obsessive-compulsive disorder (OCD); however, very few studies have evaluated the clinical characteristics of patients with no co-occurring disorders (non-comorbid or "pure" OCD). The aim of this study was to estimate the prevalence of pure cases in a large multicenter sample of OCD patients and compare the sociodemographic and clinical characteristics of individuals with and without any lifetime axis I comorbidity. This was a cross-sectional study of 955 adult patients of the Brazilian Research Consortium on Obsessive-Compulsive Spectrum Disorders (C-TOC). Assessment instruments included the Yale-Brown Obsessive-Compulsive Scale, the Dimensional Yale-Brown Obsessive-Compulsive Scale, the USP-Sensory Phenomena Scale and the Brown Assessment of Beliefs Scale. Comorbidities were evaluated using the Structured Clinical Interview for DSM-IV Axis I Disorders. Bivariate analyses were followed by logistic regression. Only 74 patients (7.7%) presented pure OCD. Compared with those presenting at least one lifetime comorbidity (881, 92.3%), non-comorbid patients were more likely to be female and to be working, reported fewer traumatic experiences and presented lower scores in the Y-BOCS obsession subscale and in total DY-BOCS scores. All symptom dimensions except contamination-cleaning and hoarding were less severe in non-comorbid patients. They also presented less severe depression and anxiety, lower suicidality and fewer previous treatments. In the logistic regression, the following variables predicted pure OCD: sex, severity of depressive and anxious symptoms, previous suicidal thoughts and psychotherapy. Pure OCD patients were the minority in this large sample and were characterized by female sex, less severe depressive and anxious symptoms, fewer suicidal thoughts and less use of psychotherapy as a treatment modality. The implications of these findings for clinical practice are discussed. Copyright © 2013 Elsevier Inc. All rights reserved.

  17. A simplified gross primary production and evapotranspiration model for boreal coniferous forests - is a generic calibration sufficient?

    NASA Astrophysics Data System (ADS)

    Minunno, F.; Peltoniemi, M.; Launiainen, S.; Aurela, M.; Lindroth, A.; Lohila, A.; Mammarella, I.; Minkkinen, K.; Mäkelä, A.

    2015-07-01

    The problem of model complexity has been vigorously debated in environmental sciences as well as in the forest modelling community. Simple models are less input demanding and their calibration involves a lower number of parameters, but they might be suitable only at local scale. In this work we calibrated a simplified ecosystem process model (PRELES) to data from multiple sites and we tested if PRELES can be used at regional scale to estimate the carbon and water fluxes of Boreal conifer forests. We compared a multi-site (M-S) with site-specific (S-S) calibrations. Model calibrations and evaluations were carried out by means of the Bayesian method; Bayesian calibration (BC) and Bayesian model comparison (BMC) were used to quantify the uncertainty in model parameters and model structure. To evaluate model performances BMC results were combined with more classical analysis of model-data mismatch (M-DM). Evapotranspiration (ET) and gross primary production (GPP) measurements collected in 10 sites of Finland and Sweden were used in the study. Calibration results showed that similar estimates were obtained for the parameters to which model outputs are most sensitive. No significant differences were encountered in the predictions of the multi-site and site-specific versions of PRELES with exception of a site with agricultural history (Alkkia). Although PRELES predicted GPP better than evapotranspiration, we concluded that the model can be reliably used at regional scale to simulate carbon and water fluxes of Boreal forests. Our analyses underlined also the importance of using long and carefully collected flux datasets in model calibration. In fact, even a single site can provide model calibrations that can be applied at a wider spatial scale, since it covers a wide range of variability in climatic conditions.

  18. Comparison of adjoint and analytical Bayesian inversion methods for constraining Asian sources of carbon monoxide using satellite (MOPITT) measurements of CO columns

    NASA Astrophysics Data System (ADS)

    Kopacz, Monika; Jacob, Daniel J.; Henze, Daven K.; Heald, Colette L.; Streets, David G.; Zhang, Qiang

    2009-02-01

    We apply the adjoint of an atmospheric chemical transport model (GEOS-Chem CTM) to constrain Asian sources of carbon monoxide (CO) with 2° × 2.5° spatial resolution using Measurement of Pollution in the Troposphere (MOPITT) satellite observations of CO columns in February-April 2001. Results are compared to the more common analytical method for solving the same Bayesian inverse problem and applied to the same data set. The analytical method is more exact but because of computational limitations it can only constrain emissions over coarse regions. We find that the correction factors to the a priori CO emission inventory from the adjoint inversion are generally consistent with those of the analytical inversion when averaged over the large regions of the latter. The adjoint solution reveals fine-scale variability (cities, political boundaries) that the analytical inversion cannot resolve, for example, in the Indian subcontinent or between Korea and Japan, and some of that variability is of opposite sign which points to large aggregation errors in the analytical solution. Upward correction factors to Chinese emissions from the prior inventory are largest in central and eastern China, consistent with a recent bottom-up revision of that inventory, although the revised inventory also sees the need for upward corrections in southern China where the adjoint and analytical inversions call for downward correction. Correction factors for biomass burning emissions derived from the adjoint and analytical inversions are consistent with a recent bottom-up inventory on the basis of MODIS satellite fire data.

  19. Enhancements of Bayesian Blocks; Application to Large Light Curve Databases

    NASA Technical Reports Server (NTRS)

    Scargle, Jeff

    2015-01-01

    Bayesian Blocks are optimal piecewise constant representations (step function fits) of light curves. The simple algorithm implementing this idea, using dynamic programming, has been extended to include more data modes and fitness metrics, multivariate analysis, and data on the circle (Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations, Scargle, Norris, Jackson and Chiang 2013, ApJ, 764, 167), as well as new results on background subtraction and refinement of the procedure for precise timing of transient events in sparse data. Example demonstrations will include exploratory analysis of the Kepler light curve archive in a search for "star-tickling" signals from extraterrestrial civilizations. (The Cepheid Galactic Internet, Learned, Kudritzki, Pakvasa, and Zee, 2008, arXiv: 0809.0339; Walkowicz et al., in progress).

  20. Greenhouse Gas Source Attribution: Measurements Modeling and Uncertainty Quantification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Zhen; Safta, Cosmin; Sargsyan, Khachik

    2014-09-01

    In this project we have developed atmospheric measurement capabilities and a suite of atmospheric modeling and analysis tools that are well suited for verifying emissions of greenhouse gases (GHGs) on an urban-through-regional scale. We have for the first time applied the Community Multiscale Air Quality (CMAQ) model to simulate atmospheric CO2. This will allow for the examination of regional-scale transport and distribution of CO2 along with air pollutants traditionally studied using CMAQ at relatively high spatial and temporal resolution with the goal of leveraging emissions verification efforts for both air quality and climate. We have developed a bias-enhanced Bayesian inference approach that can remedy the well-known problem of transport model errors in atmospheric CO2 inversions. We have tested the approach using data and model outputs from the TransCom3 global CO2 inversion comparison project. We have also performed two prototyping studies on inversion approaches in the generalized convection-diffusion context. One of these studies employed Polynomial Chaos Expansion to accelerate the evaluation of a regional transport model and enable efficient Markov Chain Monte Carlo sampling of the posterior for Bayesian inference. The other approach uses deterministic inversion of a convection-diffusion-reaction system in the presence of uncertainty. These approaches should, in principle, be applicable to realistic atmospheric problems with moderate adaptation. We outline a regional greenhouse gas source inference system that integrates (1) two approaches of atmospheric dispersion simulation and (2) a class of Bayesian inference and uncertainty quantification algorithms. We use two different and complementary approaches to simulate atmospheric dispersion. Specifically, we use a Eulerian chemical transport model CMAQ and a Lagrangian Particle Dispersion Model - FLEXPART-WRF. These two models share the same WRF assimilated meteorology fields, making it possible to perform a hybrid simulation, in which the Eulerian model (CMAQ) can be used to compute the initial condition needed by the Lagrangian model, while the source-receptor relationships for a large state vector can be efficiently computed using the Lagrangian model in its backward mode. In addition, CMAQ has a complete treatment of atmospheric chemistry of a suite of traditional air pollutants, many of which could help attribute GHGs from different sources. The inference of emissions sources using atmospheric observations is cast as a Bayesian model calibration problem, which is solved using a variety of Bayesian techniques, such as the bias-enhanced Bayesian inference algorithm, which accounts for the intrinsic model deficiency, Polynomial Chaos Expansion to accelerate model evaluation and Markov Chain Monte Carlo sampling, and Karhunen-Loève (KL) Expansion to reduce the dimensionality of the state space. We have established an atmospheric measurement site in Livermore, CA and are collecting continuous measurements of CO2, CH4 and other species that are typically co-emitted with these GHGs. Measurements of co-emitted species can assist in attributing the GHGs to different emissions sectors. Automatic calibrations using traceable standards are performed routinely for the gas-phase measurements. We are also collecting standard meteorological data at the Livermore site as well as planetary boundary height measurements using a ceilometer. The location of the measurement site is well suited to sample air transported between the San Francisco Bay area and the California Central Valley.

  1. Analysis of training sample selection strategies for regression-based quantitative landslide susceptibility mapping methods

    NASA Astrophysics Data System (ADS)

    Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem

    2017-07-01

    All of the quantitative landslide susceptibility mapping (QLSM) methods require two basic data types, namely, landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on the type of landslides, the nature of triggers and the LIF, the accuracy of the QLSM methods differs. Moreover, how to balance the number of 0s (nonoccurrence) and 1s (occurrence) in the training set obtained from the landslide inventory, and how to select which of the 1s and 0s to include in QLSM models, play a critical role in the accuracy of the QLSM. Although the performance of various QLSM methods is widely investigated in the literature, the challenge of training set construction is not adequately investigated for the QLSM methods. In order to tackle this challenge, in this study three different training set selection strategies along with the original data set are used for testing the performance of three different regression methods, namely Logistic Regression (LR), Bayesian Logistic Regression (BLR) and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, namely non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1s and their surrounding neighborhood. A randomly selected group of landslide sites and their neighborhood are considered in the analyses, similar to the NNS parameters. It is found that the LR-PRS, FLR-PRS and BLR-Whole Data set-ups, in that order, yield the best fits among the alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for the model's performance.

  2. Bayesian Inference on the Radio-quietness of Gamma-ray Pulsars

    NASA Astrophysics Data System (ADS)

    Yu, Hoi-Fung; Hui, Chung Yue; Kong, Albert K. H.; Takata, Jumpei

    2018-04-01

    For the first time we demonstrate the use of a robust Bayesian approach to analyze the populations of radio-quiet (RQ) and radio-loud (RL) gamma-ray pulsars. We quantify their differences and obtain their distributions of the radio-cone opening half-angle δ and the magnetic inclination angle α by Bayesian inference. In contrast to the conventional frequentist point estimations that might be non-representative when the distribution is highly skewed or multi-modal, which is often the case when data points are scarce, Bayesian statistics provides the complete posterior distribution, so that the uncertainties can be readily obtained regardless of the skewness and modality. We found that the spin period, the magnetic field strength at the light cylinder, the spin-down power, the gamma-ray-to-X-ray flux ratio, and the spectral curvature significance of the two groups of pulsars exhibit significant differences at the 99% level. Using Bayesian inference, we are able to infer the values and uncertainties of δ and α from the distribution of RQ and RL pulsars. We found that δ is between 10° and 35° and the distribution of α is skewed toward large values.

  3. Rediscovery of Good-Turing estimators via Bayesian nonparametrics.

    PubMed

    Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye

    2016-03-01

    The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library. © 2015, The International Biometric Society.
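
    For orientation, the classical (unsmoothed) Good-Turing estimator of a discovery probability can be sketched in a few lines. With n observations and m_r species seen exactly r times, the estimated probability that the next observation belongs to a species seen l times so far is (l + 1) m_{l+1} / n, with l = 0 giving the probability of a new species. The function name and the toy sample below are assumptions for illustration; the Bayesian nonparametric estimators and the smoothing studied in the article are not shown.

```python
from collections import Counter


def good_turing_discovery(sample, l):
    """Good-Turing estimate of the probability that the next observation
    is a species seen exactly `l` times in `sample` (l=0: a new species)."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    species_counts = Counter(sample)                 # species -> frequency
    freq_of_freq = Counter(species_counts.values())  # r -> number of species seen r times
    # A Counter returns 0 for missing keys, so unseen frequencies contribute 0.
    return (l + 1) * freq_of_freq[l + 1] / n


toy_sample = list("aaabbcd")  # hypothetical labels: a x3, b x2, c x1, d x1
for l in range(4):
    print(l, good_turing_discovery(toy_sample, l))
```

    Summed over all l, these estimates add up to 1, since the sum of (l + 1) m_{l+1} is the sample size n.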

  4. Classification of Large-Scale Remote Sensing Images for Automatic Identification of Health Hazards: Smoke Detection Using an Autologistic Regression Classifier.

    PubMed

    Wolters, Mark A; Dean, C B

    2017-01-01

    Remote sensing images from Earth-orbiting satellites are a potentially rich data source for monitoring and cataloguing atmospheric health hazards that cover large geographic regions. A method is proposed for classifying such images into hazard and nonhazard regions using the autologistic regression model, which may be viewed as a spatial extension of logistic regression. The method includes a novel and simple approach to parameter estimation that makes it well suited to handling the large and high-dimensional datasets arising from satellite-borne instruments. The methodology is demonstrated on both simulated images and a real application to the identification of forest fire smoke.

  5. Imprints of the large-scale structure on AGN formation and evolution

    NASA Astrophysics Data System (ADS)

    Porqueres, Natàlia; Jasche, Jens; Enßlin, Torsten A.; Lavaux, Guilhem

    2018-04-01

    Black hole masses are found to correlate with several global properties of their host galaxies, suggesting that black holes and galaxies have an intertwined evolution and that active galactic nuclei (AGN) have a significant impact on galaxy evolution. Since the large-scale environment can also affect AGN, this work studies how their formation and properties depend on the environment. We have used a reconstructed three-dimensional high-resolution density field obtained from a Bayesian large-scale structure reconstruction method applied to the 2M++ galaxy sample. A web-type classification relying on the shear tensor is used to identify different structures on the cosmic web, defining voids, sheets, filaments, and clusters. We confirm that the environmental density affects the AGN formation and their properties. We found that the AGN abundance is equivalent to the galaxy abundance, indicating that active and inactive galaxies reside in similar dark matter halos. However, occurrence rates are different for each spectral type and accretion rate. These differences are consistent with the AGN evolutionary sequence suggested by previous authors, Seyferts and Transition objects transforming into low-ionization nuclear emission line regions (LINERs), the weaker counterpart of Seyferts. We conclude that AGN properties depend on the environmental density more than on the web-type. More powerful starbursts and younger stellar populations are found in high densities, where interactions and mergers are more likely. AGN hosts show smaller masses in clusters for Seyferts and Transition objects, which might be due to gas stripping. In voids, the AGN population is dominated by the most massive galaxy hosts.

  6. Bayesian accounts of covert selective attention: A tutorial review.

    PubMed

    Vincent, Benjamin T

    2015-05-01

    Decision making and optimal observer models offer an important theoretical approach to the study of covert selective attention. While their probabilistic formulation allows quantitative comparison to human performance, the models can be complex and their insights are not always immediately apparent. Part 1 establishes the theoretical appeal of the Bayesian approach, and introduces the way in which probabilistic approaches can be applied to covert search paradigms. Part 2 presents novel formulations of Bayesian models of 4 important covert attention paradigms, illustrating optimal observer predictions over a range of experimental manipulations. Graphical model notation is used to present models in an accessible way and Supplementary Code is provided to help bridge the gap between model theory and practical implementation. Part 3 reviews a large body of empirical and modelling evidence showing that many experimental phenomena in the domain of covert selective attention are a set of by-products. These effects emerge as the result of observers conducting Bayesian inference with noisy sensory observations, prior expectations, and knowledge of the generative structure of the stimulus environment.

  7. Revised standards for statistical evidence.

    PubMed

    Johnson, Valen E

    2013-11-26

    Recent advances in Bayesian hypothesis testing have led to the development of uniformly most powerful Bayesian tests, which represent an objective, default class of Bayesian hypothesis tests that have the same rejection regions as classical significance tests. Based on the correspondence between these two classes of tests, it is possible to equate the size of classical hypothesis tests with evidence thresholds in Bayesian tests, and to equate P values with Bayes factors. An examination of these connections suggests that recent concerns over the lack of reproducibility of scientific studies can be attributed largely to the conduct of significance tests at unjustifiably high levels of significance. To correct this problem, evidence thresholds required for the declaration of a significant finding should be increased to 25-50:1, and to 100-200:1 for the declaration of a highly significant finding. In terms of classical hypothesis tests, these evidence standards mandate the conduct of tests at the 0.005 or 0.001 level of significance.
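
    The link between the test sizes and evidence thresholds quoted above can be checked numerically under one simplifying assumption that the abstract does not state: for a one-sided z-test with known variance, the evidence threshold γ matching a test of size α is taken to be γ = exp(z_α² / 2), where z_α is the upper-α standard normal quantile. The sketch below is illustrative only; the function name is hypothetical and the formula should be verified against the paper before being relied on.

```python
import math
from statistics import NormalDist


def evidence_threshold(alpha):
    """Evidence threshold gamma matched to a one-sided z-test of size alpha,
    under the assumed correspondence gamma = exp(z_alpha**2 / 2)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper-alpha normal quantile
    return math.exp(z_alpha ** 2 / 2.0)


for alpha in (0.05, 0.005, 0.001):
    print(alpha, round(evidence_threshold(alpha), 1))
```

    Under this assumption, sizes of 0.005 and 0.001 fall inside the 25-50:1 and 100-200:1 ranges respectively, while the conventional 0.05 level corresponds to much weaker evidence.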

  8. Application of a predictive Bayesian model to environmental accounting.

    PubMed

    Anex, R P; Englehardt, J D

    2001-03-30

    Environmental accounting techniques are intended to capture important environmental costs and benefits that are often overlooked in standard accounting practices. Environmental accounting methods themselves often ignore or inadequately represent large but highly uncertain environmental costs and costs conditioned by specific prior events. Use of a predictive Bayesian model is demonstrated for the assessment of such highly uncertain environmental and contingent costs. The predictive Bayesian approach presented generates probability distributions for the quantity of interest (rather than parameters thereof). A spreadsheet implementation of a previously proposed predictive Bayesian model, extended to represent contingent costs, is described and used to evaluate whether a firm should undertake an accelerated phase-out of its PCB containing transformers. Variability and uncertainty (due to lack of information) in transformer accident frequency and severity are assessed simultaneously using a combination of historical accident data, engineering model-based cost estimates, and subjective judgement. Model results are compared using several different risk measures. Use of the model for incorporation of environmental risk management into a company's overall risk management strategy is discussed.

  9. Mixture models for undiagnosed prevalent disease and interval-censored incident disease: applications to a cohort assembled from electronic health records.

    PubMed

    Cheung, Li C; Pan, Qing; Hyun, Noorie; Schiffman, Mark; Fetterman, Barbara; Castle, Philip E; Lorey, Thomas; Katki, Hormuzd A

    2017-09-30

    For cost-effectiveness and efficiency, many large-scale general-purpose cohort studies are being assembled within large health-care providers who use electronic health records. Two key features of such data are that incident disease is interval-censored between irregular visits and there can be pre-existing (prevalent) disease. Because prevalent disease is not always immediately diagnosed, some disease diagnosed at later visits is actually undiagnosed prevalent disease. We consider prevalent disease as a point mass at time zero for clinical applications where there is no interest in time of prevalent disease onset. We demonstrate that the naive Kaplan-Meier cumulative risk estimator underestimates risks at early time points and overestimates later risks. We propose a general family of mixture models for undiagnosed prevalent disease and interval-censored incident disease that we call prevalence-incidence models. Parameters for parametric prevalence-incidence models, such as the logistic regression and Weibull survival (logistic-Weibull) model, are estimated by direct likelihood maximization or by EM algorithm. Non-parametric methods are proposed to calculate cumulative risks for cases without covariates. We compare naive Kaplan-Meier, logistic-Weibull, and non-parametric estimates of cumulative risk in the cervical cancer screening program at Kaiser Permanente Northern California. Kaplan-Meier provided poor estimates while the logistic-Weibull model was a close fit to the non-parametric. Our findings support our use of logistic-Weibull models to develop the risk estimates that underlie current US risk-based cervical cancer screening guidelines. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA.
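
    The mixture structure described above (a point mass of prevalent disease at time zero plus a survival model for incident disease) can be written as a cumulative risk function. The sketch below assumes a logistic-Weibull form without covariates in the Weibull part; the parameter names and numerical values are hypothetical and are not estimates from the article.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_risk(t, prevalence_logit, weibull_shape, weibull_scale):
    """Cumulative risk at time t under a prevalence-incidence mixture:
    prevalent disease is a point mass at t=0, and incident disease
    among the initially disease-free follows a Weibull distribution."""
    if t < 0:
        raise ValueError("t must be non-negative")
    prevalence = logistic(prevalence_logit)
    incident_cdf = 1.0 - math.exp(-((t / weibull_scale) ** weibull_shape))
    return prevalence + (1.0 - prevalence) * incident_cdf


# Hypothetical parameters: about 2% prevalence, slowly accumulating incidence.
for years in (0, 1, 3, 5):
    print(years, round(cumulative_risk(years, -3.9, 1.2, 60.0), 4))
```

    At t = 0 the risk equals the prevalence rather than zero, which is the feature a naive Kaplan-Meier curve cannot represent.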

  10. Large Scale Helium Liquefaction and Considerations for Site Services for a Plant Located in Algeria

    NASA Astrophysics Data System (ADS)

    Froehlich, P.; Clausen, J. J.

    2008-03-01

    The large-scale liquefaction of helium extracted from natural gas is depicted. Based on a block diagram the process chain, starting with the pipeline downstream of the natural-gas plant to the final storage of liquid helium, is explained. Information will be provided about the recent experiences during installation and start-up of a bulk helium liquefaction plant located in Skikda, Algeria, including part-load operation based on a reduced feed gas supply. The local working and ambient conditions are described, including challenging logistic problems like shipping and receiving of parts, qualified and semi-qualified subcontractors, basic provisions and tools on site, and precautions to sea water and ambient conditions. Finally, the differences in commissioning (technically and evaluation of time and work packages) to European locations and standards will be discussed.

  11. Genome-wide regression and prediction with the BGLR statistical package.

    PubMed

    Pérez, Paulino; de los Campos, Gustavo

    2014-10-01

    Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis. Copyright © 2014 by the Genetics Society of America.

  12. Improving photoelectron counting and particle identification in scintillation detectors with Bayesian techniques

    NASA Astrophysics Data System (ADS)

    Akashi-Ronquest, M.; Amaudruz, P.-A.; Batygov, M.; Beltran, B.; Bodmer, M.; Boulay, M. G.; Broerman, B.; Buck, B.; Butcher, A.; Cai, B.; Caldwell, T.; Chen, M.; Chen, Y.; Cleveland, B.; Coakley, K.; Dering, K.; Duncan, F. A.; Formaggio, J. A.; Gagnon, R.; Gastler, D.; Giuliani, F.; Gold, M.; Golovko, V. V.; Gorel, P.; Graham, K.; Grace, E.; Guerrero, N.; Guiseppe, V.; Hallin, A. L.; Harvey, P.; Hearns, C.; Henning, R.; Hime, A.; Hofgartner, J.; Jaditz, S.; Jillings, C. J.; Kachulis, C.; Kearns, E.; Kelsey, J.; Klein, J. R.; Kuźniak, M.; LaTorre, A.; Lawson, I.; Li, O.; Lidgard, J. J.; Liimatainen, P.; Linden, S.; McFarlane, K.; McKinsey, D. N.; MacMullin, S.; Mastbaum, A.; Mathew, R.; McDonald, A. B.; Mei, D.-M.; Monroe, J.; Muir, A.; Nantais, C.; Nicolics, K.; Nikkel, J. A.; Noble, T.; O'Dwyer, E.; Olsen, K.; Orebi Gann, G. D.; Ouellet, C.; Palladino, K.; Pasuthip, P.; Perumpilly, G.; Pollmann, T.; Rau, P.; Retière, F.; Rielage, K.; Schnee, R.; Seibert, S.; Skensved, P.; Sonley, T.; Vázquez-Jáuregui, E.; Veloce, L.; Walding, J.; Wang, B.; Wang, J.; Ward, M.; Zhang, C.

    2015-05-01

    Many current and future dark matter and neutrino detectors are designed to measure scintillation light with a large array of photomultiplier tubes (PMTs). The energy resolution and particle identification capabilities of these detectors depend in part on the ability to accurately identify individual photoelectrons in PMT waveforms despite large variability in pulse amplitudes and pulse pileup. We describe a Bayesian technique that can identify the times of individual photoelectrons in a sampled PMT waveform without deconvolution, even when pileup is present. To demonstrate the technique, we apply it to the general problem of particle identification in single-phase liquid argon dark matter detectors. Using the output of the Bayesian photoelectron counting algorithm described in this paper, we construct several test statistics for rejection of backgrounds for dark matter searches in argon. Compared to simpler methods based on either observed charge or peak finding, the photoelectron counting technique improves both energy resolution and particle identification of low energy events in calibration data from the DEAP-1 detector and simulation of the larger MiniCLEAN dark matter detector.

  13. The watershed-scale optimized and rearranged landscape design (WORLD) model and local biomass processing depots for sustainable biofuel production: Integrated life cycle assessments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eranki, Pragnya L.; Manowitz, David H.; Bals, Bryan D.

    An array of feedstock is being evaluated as potential raw material for cellulosic biofuel production. Thorough assessments are required in regional landscape settings before these feedstocks can be cultivated and sustainable management practices can be implemented. On the processing side, a potential solution to the logistical challenges of large biorefineries is provided by a network of distributed processing facilities called local biomass processing depots. A large-scale cellulosic ethanol industry is likely to emerge soon in the United States. We have the opportunity to influence the sustainability of this emerging industry. The watershed-scale optimized and rearranged landscape design (WORLD) model estimates land allocations for different cellulosic feedstocks at biorefinery scale without displacing current animal nutrition requirements. This model also incorporates a network of the aforementioned depots. An integrated life cycle assessment is then conducted over the unified system of optimized feedstock production, processing, and associated transport operations to evaluate net energy yields (NEYs) and environmental impacts.

  14. Dynamical Bayesian inference of time-evolving interactions: From a pair of coupled oscillators to networks of oscillators

    NASA Astrophysics Data System (ADS)

    Duggento, Andrea; Stankovski, Tomislav; McClintock, Peter V. E.; Stefanovska, Aneta

    2012-12-01

    Living systems have time-evolving interactions that, until recently, could not be identified accurately from recorded time series in the presence of noise. Stankovski et al. [Phys. Rev. Lett. 109, 024101 (2012); doi:10.1103/PhysRevLett.109.024101] introduced a method based on dynamical Bayesian inference that facilitates the simultaneous detection of time-varying synchronization, directionality of influence, and coupling functions. It can distinguish unsynchronized dynamics from noise-induced phase slips. The method is based on phase dynamics, with Bayesian inference of the time-evolving parameters being achieved by shaping the prior densities to incorporate knowledge of previous samples. We now present the method in detail using numerically generated data, data from an analog electronic circuit, and cardiorespiratory data. We also generalize the method to encompass networks of interacting oscillators and thus demonstrate its applicability to small-scale networks.

  15. Predicting Software Suitability Using a Bayesian Belief Network

    NASA Technical Reports Server (NTRS)

    Beaver, Justin M.; Schiavone, Guy A.; Berrios, Joseph S.

    2005-01-01

    The ability to reliably predict the end quality of software under development presents a significant advantage for a development team. It provides an opportunity to address high risk components earlier in the development life cycle, when their impact is minimized. This research proposes a model that captures the evolution of the quality of a software product, and provides reliable forecasts of the end quality of the software being developed in terms of product suitability. Development team skill, software process maturity, and software problem complexity are hypothesized as driving factors of software product quality. The cause-effect relationships between these factors and the elements of software suitability are modeled using Bayesian Belief Networks, a machine learning method. This research presents a Bayesian Network for software quality, and the techniques used to quantify the factors that influence and represent software quality. The developed model is found to be effective in predicting the end product quality of small-scale software development efforts.

  16. Bayesian denoising in digital radiography: a comparison in the dental field.

    PubMed

    Frosio, I; Olivieri, C; Lucchese, M; Borghese, N A; Boccacci, P

    2013-01-01

    We compared two Bayesian denoising algorithms for digital radiographs, based on Total Variation regularization and wavelet decomposition. The comparison was performed on simulated radiographs with different photon counts and frequency content and on real dental radiographs. Four different quality indices were considered to quantify the quality of the filtered radiographs. The experimental results suggested that Total Variation is more suited to preserve fine anatomical details, whereas wavelets produce images of higher quality at global scale; they also highlighted the need for more reliable image quality indices. Copyright © 2012 Elsevier Ltd. All rights reserved.

  17. Global biogeography of scaly tree ferns (Cyatheaceae): evidence for Gondwanan vicariance and limited transoceanic dispersal

    PubMed Central

    Korall, Petra; Pryer, Kathleen M

    2014-01-01

    Aim: Scaly tree ferns, Cyatheaceae, are a well-supported group of mostly tree-forming ferns found throughout the tropics, the subtropics and the south-temperate zone. Fossil evidence shows that the lineage originated in the Late Jurassic period. We reconstructed large-scale historical biogeographical patterns of Cyatheaceae and tested the hypothesis that some of the observed distribution patterns are in fact compatible, in time and space, with a vicariance scenario related to the break-up of Gondwana. Location: Tropics, subtropics and south-temperate areas of the world. Methods: The historical biogeography of Cyatheaceae was analysed in a maximum likelihood framework using Lagrange. The 78 ingroup taxa are representative of the geographical distribution of the entire family. The phylogenies that served as a basis for the analyses were obtained by Bayesian inference analyses of mainly previously published DNA sequence data using MrBayes. Lineage divergence dates were estimated in a Bayesian Markov chain Monte Carlo framework using BEAST. Results: Cyatheaceae originated in the Late Jurassic in either South America or Australasia. Following a range expansion, the ancestral distribution of the marginate-scaled clade included both these areas, whereas Sphaeropteris is reconstructed as having its origin only in Australasia. Within the marginate-scaled clade, reconstructions of early divergences are hampered by the unresolved relationships among the Alsophila, Cyathea and Gymnosphaera lineages. Nevertheless, it is clear that the occurrence of the Cyathea and Sphaeropteris lineages in South America may be related to vicariance, whereas transoceanic dispersal needs to be inferred for the range shifts seen in Alsophila and Gymnosphaera. Main conclusions: The evolutionary history of Cyatheaceae involves both Gondwanan vicariance scenarios as well as long-distance dispersal events. The number of transoceanic dispersals reconstructed for the family is rather few when compared with other fern lineages. We suggest that a causal relationship between reproductive mode (outcrossing) and dispersal limitations is the most plausible explanation for the pattern observed. PMID:25435648

  18. Global biogeography of scaly tree ferns (Cyatheaceae): evidence for Gondwanan vicariance and limited transoceanic dispersal.

    PubMed

    Korall, Petra; Pryer, Kathleen M

    2014-02-01

    Scaly tree ferns, Cyatheaceae, are a well-supported group of mostly tree-forming ferns found throughout the tropics, the subtropics and the south-temperate zone. Fossil evidence shows that the lineage originated in the Late Jurassic period. We reconstructed large-scale historical biogeographical patterns of Cyatheaceae and tested the hypothesis that some of the observed distribution patterns are in fact compatible, in time and space, with a vicariance scenario related to the break-up of Gondwana. The study area covers the tropics, subtropics and south-temperate areas of the world. The historical biogeography of Cyatheaceae was analysed in a maximum likelihood framework using Lagrange. The 78 ingroup taxa are representative of the geographical distribution of the entire family. The phylogenies that served as a basis for the analyses were obtained by Bayesian inference analyses of mainly previously published DNA sequence data using MrBayes. Lineage divergence dates were estimated in a Bayesian Markov chain Monte Carlo framework using BEAST. Cyatheaceae originated in the Late Jurassic in either South America or Australasia. Following a range expansion, the ancestral distribution of the marginate-scaled clade included both these areas, whereas Sphaeropteris is reconstructed as having its origin only in Australasia. Within the marginate-scaled clade, reconstructions of early divergences are hampered by the unresolved relationships among the Alsophila, Cyathea and Gymnosphaera lineages. Nevertheless, it is clear that the occurrence of the Cyathea and Sphaeropteris lineages in South America may be related to vicariance, whereas transoceanic dispersal needs to be inferred for the range shifts seen in Alsophila and Gymnosphaera. The evolutionary history of Cyatheaceae involves both Gondwanan vicariance scenarios as well as long-distance dispersal events. The number of transoceanic dispersals reconstructed for the family is rather few when compared with other fern lineages. We suggest that a causal relationship between reproductive mode (outcrossing) and dispersal limitations is the most plausible explanation for the pattern observed.

  19. Closed-loop supply chain models with considering the environmental impact.

    PubMed

    Mohajeri, Amir; Fallah, Mohammad

    2014-01-01

    Global warming and climate change caused by large-scale emissions of greenhouse gases are a worldwide concern. As a result, green supply chain management has received more attention in the last decade. In this study, a closed-loop logistics concept, which serves the recycling, reuse, and recovery purposes required in a green supply chain, is applied to integrate environmental issues into a traditional logistics system. We formulate a comprehensive closed-loop model for logistics planning that considers both profitability and ecological goals. The ecological goal is to reduce the overall amount of CO2 emitted from journeys, while the profitability criterion is supported in the cyclic network through minimum costs and maximum service level. We apply three scenarios, develop a problem formulation for each scenario corresponding to the specified regulations, and investigate the effect of the regulation on the preferred transport mode and the emissions. To validate the models, numerical experiments are carried out and a comparative analysis is performed.

  20. The network adjustment aimed for the campaigned gravity survey using a Bayesian approach: methodology and model test

    NASA Astrophysics Data System (ADS)

    Chen, Shi; Liao, Xu; Ma, Hongsheng; Zhou, Longquan; Wang, Xingzhou; Zhuang, Jiancang

    2017-04-01

    The relative gravimeter, which generally uses a zero-length spring as the gravity sensor, is still the first choice for terrestrial gravity measurement because of its efficiency and low cost. Because the instrument drift rate varies with time and from meter to meter, estimating it requires returning to a base station, or to stations with known gravity values, for repeated measurements at regular intervals of hours during a practical survey. However, in a campaign gravity survey over a large-scale region, where stations are several to tens of kilometers apart, frequent returns to close the measurement loop greatly reduce survey efficiency and are extremely time-consuming. In this paper, we propose a new gravity data adjustment method that estimates the meter drift by means of Bayesian statistical inference. In our approach, we assume that the change in drift rate is a smooth function of elapsed time. Trade-off parameters are used to control the fitting residuals, and we employ Akaike's Bayesian Information Criterion (ABIC) to estimate these trade-off parameters. A comparison of the classical and Bayesian adjustments on simulated data shows that our method is robust and adapts itself to irregular non-linear meter drift. Finally, we use this approach to process real campaign gravity data from North China. Our adjustment method is suitable for recovering the time-varying drift rate function of each meter and for detecting abnormal meter drift during the gravity survey. We also define an alternative error estimate for the inverted gravity value at each station on the basis of marginal distribution theory. Acknowledgment: This research is supported by the Science Foundation of the Institute of Geophysics, CEA, from the Ministry of Science and Technology of China (Nos. DQJB16A05; DQJB16B07) and the China National Special Fund for Earthquake Scientific Research in Public Interest (Nos. 201508006; 201508009).
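
    The idea of adjusting gravity readings under a smooth-drift assumption can be sketched as a penalized least-squares problem. This is an illustration under stated assumptions, not the authors' implementation: the function name `adjust_with_smooth_drift`, the single-meter setup, the second-derivative penalty, and the fixed `trade_off` value are all hypothetical, and the ABIC-based selection of the trade-off parameter described in the abstract is not implemented here. Observation times are assumed to be strictly increasing.

```python
import numpy as np


def adjust_with_smooth_drift(times, stations, readings, trade_off=1.0):
    """Estimate station gravity (relative to station 0) and meter drift.

    Assumed model: reading_k = g[station_k] + drift(t_k) + noise, with a
    penalty on the second time-derivative of the drift (smoothness prior).
    """
    t = np.asarray(times, dtype=float)
    s = np.asarray(stations, dtype=int)
    r = np.asarray(readings, dtype=float)
    n, m = t.size, int(s.max()) + 1

    # Unknowns: drift at each observation time (n values), then g[1..m-1].
    A = np.zeros((n, n + m - 1))
    A[np.arange(n), np.arange(n)] = 1.0
    moved = s > 0                          # station 0 is the datum, g[0] = 0
    A[np.nonzero(moved)[0], n + s[moved] - 1] = 1.0

    # Second-derivative rows for uneven time spacing (zero for linear drift).
    D = np.zeros((n - 2, n + m - 1))
    for k in range(1, n - 1):
        h1, h2 = t[k] - t[k - 1], t[k + 1] - t[k]
        D[k - 1, k - 1] = 2.0 / (h1 * (h1 + h2))
        D[k - 1, k] = -2.0 / (h1 * h2)
        D[k - 1, k + 1] = 2.0 / (h2 * (h1 + h2))

    # Stack data equations and weighted smoothness equations, then solve.
    lhs = np.vstack([A, trade_off * D])
    rhs = np.concatenate([r, np.zeros(n - 2)])
    sol, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    drift = sol[:n]
    gravity = np.concatenate([[0.0], sol[n:]])
    return gravity, drift
```

    A larger `trade_off` forces the estimated drift toward a straight line, while a smaller value lets it follow the readings more closely; choosing that balance from the data is the role the abstract assigns to ABIC.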
