Makowsky, Robert; Cox, Christian L; Roelke, Corey; Chippindale, Paul T
2010-11-01
Determining the appropriate gene for phylogeny reconstruction can be a difficult process. Rapidly evolving genes tend to resolve recent relationships, but suffer from alignment issues and increased homoplasy among distantly related species. Conversely, slowly evolving genes generally perform best for deeper relationships, but lack sufficient variation to resolve recent relationships. We determine the relationship between sequence divergence and Bayesian phylogenetic reconstruction ability using both natural and simulated datasets. The natural data are based on 28 well-supported relationships within the subphylum Vertebrata. Sequences of 12 genes were acquired and Bayesian analyses were used to determine phylogenetic support for correct relationships. Simulated datasets were designed to determine whether an optimal range of sequence divergence exists across extreme phylogenetic conditions. Across all genes we found that an optimal range of divergence for resolving the correct relationships does exist, although, as expected, this level of divergence depends on the distance metric. Simulated datasets show that an optimal range of sequence divergence exists across diverse topologies and models of evolution. We determine that a simple-to-measure property of genetic sequences (genetic distance) is related to phylogenetic reconstruction ability in Bayesian analyses. This information should be useful for selecting the most informative gene to resolve any relationship, especially those that are difficult to resolve, as well as for minimizing both cost and confounding information during project design.
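The "simple-to-measure property" in question is pairwise genetic distance. As a minimal illustration (not the authors' pipeline), an uncorrected p-distance between two aligned sequences can be computed as follows; the sequence fragments are hypothetical.

```python
# Minimal sketch: uncorrected p-distance between two aligned sequences,
# the kind of quick divergence screen suggested for gene selection.

def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, ignoring gap positions."""
    assert len(seq1) == len(seq2), "sequences must be aligned"
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != '-' and b != '-']
    if not pairs:
        return float('nan')
    return sum(a != b for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    # Hypothetical aligned fragments.
    s1 = "ATGCTAGCTAGGCTA-ATC"
    s2 = "ATGTTAGCTCGGCTAGATC"
    print(f"p-distance = {p_distance(s1, s2):.3f}")
```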
Phylogeny of sipunculan worms: A combined analysis of four gene regions and morphology.
Schulze, Anja; Cutler, Edward B; Giribet, Gonzalo
2007-01-01
The intra-phyletic relationships of sipunculan worms were analyzed based on DNA sequence data from four gene regions and 58 morphological characters. Initially we analyzed the data under direct optimization using parsimony as the optimality criterion. An implied alignment resulting from the direct optimization analysis was subsequently utilized to perform a Bayesian analysis with mixed models for the different data partitions. For this we applied a doublet model for the stem regions of the 18S rRNA. Both analyses support monophyly of Sipuncula and most of the same clades within the phylum. The analyses differ with respect to the relationships among the major groups; whereas the deep nodes in the direct optimization analysis generally show low jackknife support, they are supported by 100% posterior probability in the Bayesian analysis. Direct optimization has been useful for handling sequences of unequal length and generating conservative phylogenetic hypotheses, whereas the Bayesian analysis under mixed models provided high resolution in the basal nodes of the tree.
ERIC Educational Resources Information Center
Vos, Hans J.
An approach to simultaneous optimization of assignments of subjects to treatments followed by an end-of-mastery test is presented using the framework of Bayesian decision theory. Focus is on demonstrating how rules for the simultaneous optimization of sequences of decisions can be found. The main advantages of the simultaneous approach, compared…
Krishnan, Neeraja M; Seligmann, Hervé; Stewart, Caro-Beth; De Koning, A P Jason; Pollock, David D
2004-10-01
Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and flexibility. To determine whether biased reconstructions using optimization methods might affect inferences of functional properties, ancestral primate mitochondrial tRNA sequences were inferred and helix-forming propensities for conserved pairs were evaluated in silico. For ambiguously reconstructed nucleotides at sites with high base composition variability, ancestral tRNA sequences from Bayesian analyses were more compatible with canonical base pairing than were those inferred by other methods. Thus, nucleotide bias in reconstructed sequences apparently can lead to serious bias and inaccuracies in functional predictions.
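The bias described here can be illustrated with a toy calculation: base frequencies taken from a single most-probable (MAP) reconstruction versus frequencies averaged over the posterior ensemble. The per-site probabilities below are simulated, not from the primate data.

```python
# Sketch of the contrast drawn above: ancestral base frequencies from a
# single "best" (MAP) reconstruction vs. frequencies averaged over the
# Bayesian posterior. Probabilities are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
BASES = "ACGT"

# Hypothetical per-site posterior probabilities at an ancestral node
# (rows = sites, columns = A, C, G, T).
post = rng.dirichlet(alpha=[1.0, 0.6, 0.6, 1.0], size=1000)

# MAP reconstruction: pick the single most probable base per site.
map_states = post.argmax(axis=1)
map_freqs = np.bincount(map_states, minlength=4) / len(post)

# Posterior-ensemble frequencies: average the probabilities directly.
ensemble_freqs = post.mean(axis=0)

for b, m, e in zip(BASES, map_freqs, ensemble_freqs):
    print(f"{b}: MAP = {m:.3f}   ensemble = {e:.3f}")
# At uncertain sites the MAP call over-collects the locally most
# probable base, so map_freqs is typically more extreme than
# ensemble_freqs -- the deterministic bias described in the abstract.
```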
Numerical study on the sequential Bayesian approach for radioactive materials detection
NASA Astrophysics Data System (ADS)
Qingpei, Xiang; Dongfeng, Tian; Jianyu, Zhu; Fanhua, Hao; Ge, Ding; Jun, Zeng
2013-01-01
A new detection method, based on the sequential Bayesian approach proposed by Candy et al., offers new horizons for research on radioactive material detection. Compared with commonly adopted detection methods grounded in classical statistical theory, the sequential Bayesian approach offers the advantage of shorter verification times when analyzing spectra that contain low total counts, especially for complex radionuclide compositions. In this paper, a simulation experiment platform implementing the methodology of the sequential Bayesian approach was developed. Event sequences of γ-rays associated with the true parameters of a LaBr3(Ce) detector were obtained from an event-sequence generator using Monte Carlo sampling theory to study the performance of the sequential Bayesian approach. The numerical experimental results are in accordance with those of Candy. Moreover, the relationship between the detection model and the event generator, represented respectively by the expected detection rate (Am) and the tested detection rate (Gm) parameters, is investigated. To achieve optimal performance for this processor, the interval of the tested detection rate as a function of the expected detection rate is also presented.
Bayesian Optimization Under Mixed Constraints with A Slack-Variable Augmented Lagrangian
DOE Office of Scientific and Technical Information (OSTI.GOV)
Picheny, Victor; Gramacy, Robert B.; Wild, Stefan M.
An augmented Lagrangian (AL) can convert a constrained optimization problem into a sequence of simpler (e.g., unconstrained) problems, which are then usually solved with local solvers. Recently, surrogate-based Bayesian optimization (BO) sub-solvers have been successfully deployed in the AL framework for a more global search in the presence of inequality constraints; however, a drawback was that expected improvement (EI) evaluations relied on Monte Carlo. Here we introduce an alternative slack variable AL, and show that in this formulation the EI may be evaluated with library routines. The slack variables furthermore facilitate equality as well as inequality constraints, and mixtures thereof. We show our new slack "ALBO" compares favorably to the original. Its superiority over conventional alternatives is reinforced on several mixed constraint examples.
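A generic textbook sketch of the slack-variable construction (not the ALBO implementation): inequality constraints c(x) <= 0 become equalities c(x) + s = 0 with s >= 0, and each subproblem is handed to a local solver where ALBO would use a BO sub-solver. The objective and constraint are hypothetical.

```python
# Minimal slack-variable augmented Lagrangian loop (textbook version).
import numpy as np
from scipy.optimize import minimize

def f(x):                        # objective (hypothetical)
    return (x[0] - 1.0) ** 2 + (x[1] - 0.5) ** 2

def c(x):                        # inequality constraint, want c(x) <= 0
    return np.array([x[0] + x[1] - 1.0])

def al(x, s, lam, rho):          # augmented Lagrangian with slack s >= 0
    r = c(x) + s                 # residual of the equality c(x) + s = 0
    return f(x) + lam @ r + 0.5 / rho * (r @ r)

x, s = np.zeros(2), np.zeros(1)
lam, rho = np.zeros(1), 1.0
for outer in range(15):
    # Inner solve over (x, s) jointly; s >= 0 is enforced by clipping.
    z = minimize(lambda z: al(z[:2], np.maximum(z[2:], 0.0), lam, rho),
                 np.concatenate([x, s]), method="Nelder-Mead").x
    x, s = z[:2], np.maximum(z[2:], 0.0)
    r = c(x) + s
    lam = lam + r / rho          # multiplier update
    rho *= 0.7                   # slowly tighten the penalty
print("x ~", x.round(3), " c(x) =", c(x).round(4))
```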
A Bayesian Framework for Human Body Pose Tracking from Depth Image Sequences
Zhu, Youding; Fujimura, Kikuo
2010-01-01
This paper addresses the problem of accurate and robust tracking of 3D human body pose from depth image sequences. Recovering the large number of degrees of freedom in human body movements from a depth image sequence is challenging due to the need to resolve the depth ambiguity caused by self-occlusions and the difficulty of recovering from tracking failure. Human body poses could be estimated through model fitting using dense correspondences between depth data and an articulated human model (local optimization method). Although it usually achieves high accuracy due to dense correspondences, it may fail to recover from tracking failure. Alternatively, human pose may be reconstructed by detecting and tracking human body anatomical landmarks (key-points) based on low-level depth image analysis. While this method (key-point based method) is robust and recovers from tracking failure, its pose estimation accuracy depends solely on the image-based localization accuracy of key-points. To address these limitations, we present a flexible Bayesian framework for integrating pose estimation results obtained by methods based on key-points and local optimization. Experimental results are shown and a performance comparison is presented to demonstrate the effectiveness of the proposed approach. PMID:22399933
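The flavor of the integration step can be conveyed by fusing two Gaussian estimates of the same quantity by precision weighting; the numbers are hypothetical, and the actual framework fuses full articulated poses rather than single points.

```python
# Toy product-of-Gaussians fusion of a key-point estimate and a
# local-optimization estimate of the same 3D joint position.
import numpy as np

def fuse(mu_a, cov_a, mu_b, cov_b):
    """Precision-weighted fusion of two Gaussian estimates."""
    prec_a, prec_b = np.linalg.inv(cov_a), np.linalg.inv(cov_b)
    cov = np.linalg.inv(prec_a + prec_b)
    mu = cov @ (prec_a @ mu_a + prec_b @ mu_b)
    return mu, cov

# Key-point detector: robust but coarse (large covariance).
mu_kp, cov_kp = np.array([0.42, 1.10, 2.00]), np.eye(3) * 0.05
# Local optimization: accurate while tracking holds (small covariance).
mu_lo, cov_lo = np.array([0.45, 1.05, 1.98]), np.eye(3) * 0.01
mu, cov = fuse(mu_kp, cov_kp, mu_lo, cov_lo)
print("fused estimate:", mu.round(3))  # pulled toward the more certain source
```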
Bayesian Regression with Network Prior: Optimal Bayesian Filtering Perspective
Qian, Xiaoning; Dougherty, Edward R.
2017-01-01
The recently introduced intrinsically Bayesian robust filter (IBRF) provides fully optimal filtering relative to a prior distribution over an uncertainty class of joint random process models, whereas formerly the theory was limited to model-constrained Bayesian robust filters, for which optimization was limited to the filters that are optimal for models in the uncertainty class. This paper extends the IBRF theory to the situation where there are both a prior on the uncertainty class and sample data. The result is optimal Bayesian filtering (OBF), where optimality is relative to the posterior distribution derived from the prior and the data. The IBRF theories for effective characteristics and canonical expansions extend to the OBF setting. A salient focus of the present work is to demonstrate the advantages of Bayesian regression within the OBF setting over the classical Bayesian approach in the context of linear Gaussian models. PMID:28824268
Park, Joong-Ki; Rho, Hyun Soo; Kristensen, Reinhardt Møbjerg; Kim, Won; Giribet, Gonzalo
2006-11-01
Recent progress in molecular techniques has generated a wealth of information for phylogenetic analysis. Among metazoans all but a single phylum have been incorporated into some sort of molecular analysis. However, the minute and rare species of the phylum Loricifera have remained elusive to molecular systematists. Here we report the first molecular sequence data (nearly complete 18S rRNA) for a member of the phylum Loricifera, Pliciloricus sp. from Korea. The new sequence data were analyzed together with 52 other ecdysozoan sequences, with all other phyla represented by three or more sequences. The data set was analyzed using parsimony as an optimality criterion under direct optimization as well as using a Bayesian approach. The parsimony analysis was also accompanied by a sensitivity analysis. The results of both analyses are largely congruent, finding monophyly of each ecdysozoan phylum, except for Priapulida, in which the coelomate Meiopriapulus is separate from a clade of pseudocoelomate priapulids. The data also suggest a relationship of the pseudocoelomate priapulids to kinorhynchs, and a relationship of nematodes to tardigrades. The Bayesian analysis placed the arthropods as the sister group to a clade that includes tardigrades and nematodes. However, these results were shown to be parameter dependent in the sensitivity analysis. The position of Loricifera was extremely unstable to parameter variation, and support for a relationship of loriciferans to any particular ecdysozoan phylum was not found in the data.
A Bayesian Assessment of Seismic Semi-Periodicity Forecasts
NASA Astrophysics Data System (ADS)
Nava, F.; Quinteros, C.; Glowacka, E.; Frez, J.
2016-01-01
Among the schemes for earthquake forecasting, the search for semi-periodicity in the occurrence of large earthquakes in a given seismogenic region plays an important role. When considering earthquake forecasts based on semi-periodic sequence identification, the Bayesian formalism is a useful tool for: (1) assessing how well a given earthquake satisfies a previously made forecast; (2) re-evaluating the semi-periodic sequence probability; and (3) testing other prior estimations of the sequence probability. A comparison of Bayesian estimates with updated estimates of semi-periodic sequences that incorporate new data not used in the original estimates shows extremely good agreement, indicating that: (1) the probability that a semi-periodic sequence is not due to chance is an appropriate prior sequence probability estimate; and (2) the Bayesian formalism does a very good job of estimating corrected semi-periodicity probabilities, using slightly less data than that used for updated estimates. The Bayesian approach is exemplified explicitly by its application to the Parkfield semi-periodic forecast, and results are given for its application to other forecasts in Japan and Venezuela.
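The re-evaluation step (2) amounts to a single Bayes update on the hypothesis that the sequence is real rather than chance; a sketch with hypothetical window probabilities:

```python
# Hedged sketch: update the prior probability p that a semi-periodic
# sequence is real (not due to chance) after a new earthquake either
# falls inside the forecast window ("hit") or not.

def update_sequence_probability(p, hit, p_hit_seq, p_hit_chance):
    """One Bayes step on 'the sequence is real' after a new event."""
    like_seq = p_hit_seq if hit else (1.0 - p_hit_seq)
    like_chance = p_hit_chance if hit else (1.0 - p_hit_chance)
    num = p * like_seq
    return num / (num + (1.0 - p) * like_chance)

p = 0.70   # prior: probability the sequence is not due to chance
# A hit is likely under the semi-periodic model (0.9) and much less
# likely under pure chance (0.2); both values are hypothetical inputs.
p = update_sequence_probability(p, hit=True, p_hit_seq=0.9, p_hit_chance=0.2)
print(f"posterior after a hit: {p:.3f}")   # ~0.913
```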
Friston, Karl J.; Dolan, Raymond J.
2017-01-01
Normative models of human cognition often appeal to Bayesian filtering, which provides optimal online estimates of unknown or hidden states of the world, based on previous observations. However, in many cases it is necessary to optimise beliefs about sequences of states rather than just the current state. Importantly, Bayesian filtering and sequential inference strategies make different predictions about beliefs and subsequent choices, rendering them behaviourally dissociable. Taking data from a probabilistic reversal task we show that subjects’ choices provide strong evidence that they are representing short sequences of states. Between-subject measures of this implicit sequential inference strategy had a neurobiological underpinning and correlated with grey matter density in prefrontal and parietal cortex, as well as the hippocampus. Our findings provide, to our knowledge, the first evidence for sequential inference in human cognition, and by exploiting between-subject variation in this measure we provide pointers to its neuronal substrates. PMID:28486504
A Novel Discrete Optimal Transport Method for Bayesian Inverse Problems
NASA Astrophysics Data System (ADS)
Bui-Thanh, T.; Myers, A.; Wang, K.; Thiery, A.
2017-12-01
We present the Augmented Ensemble Transform (AET) method for generating approximate samples from a high-dimensional posterior distribution as a solution to Bayesian inverse problems. Solving large-scale inverse problems is critical for some of the most relevant and impactful scientific endeavors of our time. Therefore, constructing novel methods for solving the Bayesian inverse problem in more computationally efficient ways can have a profound impact on the science community. This research derives the novel AET method for exploring a posterior by solving a sequence of linear programming problems, resulting in a series of transport maps which map prior samples to posterior samples, allowing for the computation of moments of the posterior. We show both theoretical and numerical results, indicating this method can offer superior computational efficiency when compared to other SMC methods. Most of this efficiency is derived from matrix scaling methods to solve the linear programming problem and derivative-free optimization for particle movement. We use this method to determine inter-well connectivity in a reservoir and the associated uncertainty related to certain parameters. The attached file shows the difference between the true parameter and the AET parameter in an example 3D reservoir problem. The error is within the Morozov discrepancy allowance with lower computational cost than other particle methods.
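The abstract's pairing of linear programming with matrix-scaling solvers is close in spirit to entropy-regularized optimal transport, where Sinkhorn matrix scaling maps equally weighted prior particles onto posterior importance weights. The toy below is a generic illustration of that idea, not the AET implementation; the likelihood and regularization value are arbitrary.

```python
# Sinkhorn matrix scaling as a stand-in for the exact transport LP:
# move uniform prior particle weights onto posterior importance weights.
import numpy as np

rng = np.random.default_rng(1)
N = 200
x = rng.normal(0.0, 2.0, size=N)                   # prior samples
loglike = -0.5 * (x - 1.0) ** 2 / 0.5 ** 2         # toy Gaussian likelihood
w = np.exp(loglike - loglike.max()); w /= w.sum()  # posterior weights

a, b = np.full(N, 1.0 / N), w                      # source/target marginals
C = (x[:, None] - x[None, :]) ** 2                 # squared-distance cost
K = np.exp(-C / 0.5)                               # Gibbs kernel (eps = 0.5)
u = np.full(N, 1.0 / N)
for _ in range(300):                               # Sinkhorn iterations
    v = b / (K.T @ u)
    u = a / (K @ v)
T = u[:, None] * K * v[None, :]                    # approximate transport plan
mapped = (T @ x) / T.sum(axis=1)                   # barycentric transport map
print("weighted posterior mean:      ", np.sum(w * x))
print("mean of transported particles:", mapped.mean())
```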
Search Parameter Optimization for Discrete, Bayesian, and Continuous Search Algorithms
2017-09-01
Naval Postgraduate School thesis, Monterey, California. Abstract (fragment): …simple search and rescue acts to prosecuting aerial/surface/submersible targets on mission. This research looks at varying the known discrete and…
Bayesian ensemble refinement by replica simulations and reweighting.
Hummer, Gerhard; Köfinger, Jürgen
2015-12-28
We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.
A Bayesian Sampler for Optimization of Protein Domain Hierarchies
2014-01-01
The process of identifying and modeling functionally divergent subgroups for a specific protein domain class and arranging these subgroups hierarchically has, thus far, largely been done via manual curation. How to accomplish this automatically and optimally is an unsolved statistical and algorithmic problem that is addressed here via Markov chain Monte Carlo sampling. Taking as input a (typically very large) multiple-sequence alignment, the sampler creates and optimizes a hierarchy by adding and deleting leaf nodes, by moving nodes and subtrees up and down the hierarchy, by inserting or deleting internal nodes, and by redefining the sequences and conserved patterns associated with each node. All such operations are based on a probability distribution that models the conserved and divergent patterns defining each subgroup. When we view these patterns as sequence determinants of protein function, each node or subtree in such a hierarchy corresponds to a subgroup of sequences with similar biological properties. The sampler can be applied either de novo or to an existing hierarchy. When applied to 60 protein domains from multiple starting points in this way, it converged on similar solutions with nearly identical log-likelihood ratio scores, suggesting that it typically finds the optimal peak in the posterior probability distribution. Similarities and differences between independently generated, nearly optimal hierarchies for a given domain help distinguish robust from statistically uncertain features. Thus, a future application of the sampler is to provide confidence measures for various features of a domain hierarchy. PMID:24494927
The Bayesian reader: explaining word recognition as an optimal Bayesian decision process.
Norris, Dennis
2006-04-01
This article presents a theory of visual word recognition that assumes that, in the tasks of word identification, lexical decision, and semantic categorization, human readers behave as optimal Bayesian decision makers. This leads to the development of a computational model of word recognition, the Bayesian reader. The Bayesian reader successfully simulates some of the most significant data on human reading. The model accounts for the nature of the function relating word frequency to reaction time and identification threshold, the effects of neighborhood density and its interaction with frequency, and the variation in the pattern of neighborhood density effects seen in different experimental tasks. Both the general behavior of the model and the way the model predicts different patterns of results in different tasks follow entirely from the assumption that human readers approximate optimal Bayesian decision makers.
Encoding probabilistic brain atlases using Bayesian inference.
Van Leemput, Koen
2009-06-01
This paper addresses the problem of creating probabilistic brain atlases from manually labeled training data. Probabilistic atlases are typically constructed by counting the relative frequency of occurrence of labels in corresponding locations across the training images. However, such an "averaging" approach generalizes poorly to unseen cases when the number of training images is limited, and provides no principled way of aligning the training datasets using deformable registration. In this paper, we generalize the generative image model implicitly underlying standard "average" atlases, using mesh-based representations endowed with an explicit deformation model. Bayesian inference is used to infer the optimal model parameters from the training data, leading to a simultaneous group-wise registration and atlas estimation scheme that encompasses standard averaging as a special case. We also use Bayesian inference to compare alternative atlas models in light of the training data, and show how this leads to a data compression problem that is intuitive to interpret and computationally feasible. Using this technique, we automatically determine the optimal amount of spatial blurring, the best deformation field flexibility, and the most compact mesh representation. We demonstrate, using 2-D training datasets, that the resulting models are better at capturing the structure in the training data than conventional probabilistic atlases. We also present experiments of the proposed atlas construction technique in 3-D, and show the resulting atlases' potential in fully-automated, pulse sequence-adaptive segmentation of 36 neuroanatomical structures in brain MRI scans.
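As a point of reference for the "averaging" baseline this paper generalizes, a conventional probabilistic atlas can be built by counting label frequencies at each voxel across aligned, manually labeled training images. A minimal sketch with made-up shapes and labels:

```python
# Frequency-count atlas: atlas[k, i, j] is the relative frequency of
# label k at voxel (i, j) across co-registered training images.
import numpy as np

rng = np.random.default_rng(0)
n_images, shape, n_labels = 10, (4, 4), 3
# Stand-in for manually labeled, spatially aligned training images.
labels = rng.integers(0, n_labels, size=(n_images, *shape))

atlas = np.stack([(labels == k).mean(axis=0) for k in range(n_labels)])
assert np.allclose(atlas.sum(axis=0), 1.0)   # probabilities per voxel
print(atlas[:, 0, 0])                        # label probabilities at one voxel
```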
Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach.
Duarte, Belmiro P M; Wong, Weng Kee
2015-08-01
This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted.
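To make the quadrature step concrete, here is a hedged sketch of approximating a Bayesian D-optimality criterion with Gauss-Hermite nodes for a toy one-parameter exponential regression model under a normal prior; the model and candidate designs are illustrative, and no SDP is involved in this simplified scalar case.

```python
# Gauss-Hermite approximation of E_theta[log M(xi, theta)] for the toy
# model eta(x, theta) = exp(-theta * x), theta ~ N(mu, sigma^2).
import numpy as np

def info(design_x, design_w, theta):
    """Scalar Fisher information for the one-parameter model."""
    sens = -design_x * np.exp(-theta * design_x)   # d eta / d theta
    return np.sum(design_w * sens ** 2)

def bayes_D(design_x, design_w, mu=1.0, sigma=0.3, n_nodes=20):
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    thetas = mu + np.sqrt(2.0) * sigma * nodes     # change of variables
    vals = np.log([info(design_x, design_w, t) for t in thetas])
    return np.sum(weights * vals) / np.sqrt(np.pi)

# Two candidate three-point designs with equal weights (hypothetical).
xa, xb = np.array([0.5, 1.0, 1.5]), np.array([0.2, 1.0, 2.8])
wa = np.full(3, 1 / 3)
print("design A:", bayes_D(xa, wa), " design B:", bayes_D(xb, wa))
```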
Stochastic Model of Seasonal Runoff Forecasts
NASA Astrophysics Data System (ADS)
Krzysztofowicz, Roman; Watada, Leslie M.
1986-03-01
Each year the National Weather Service and the Soil Conservation Service issue a monthly sequence of five (or six) categorical forecasts of the seasonal snowmelt runoff volume. To describe uncertainties in these forecasts for the purposes of optimal decision making, a stochastic model is formulated. It is a discrete-time, finite, continuous-space, nonstationary Markov process. Posterior densities of the actual runoff conditional upon a forecast, and transition densities of forecasts are obtained from a Bayesian information processor. Parametric densities are derived for the process with a normal prior density of the runoff and a linear model of the forecast error. The structure of the model and the estimation procedure are motivated by analyses of forecast records from five stations in the Snake River basin, from the period 1971-1983. The advantages of supplementing the current forecasting scheme with a Bayesian analysis are discussed.
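The normal prior plus linear forecast-error model described here leads to a standard conjugate update; a sketch with hypothetical numbers (not estimates from the Snake River record):

```python
# Normal-linear Bayesian information processor: runoff Y ~ N(m, s2),
# forecast F = a + b*Y + e with e ~ N(0, tau2), posterior of Y given F.
import numpy as np

def posterior_runoff(m, s2, a, b, tau2, forecast):
    prec = 1.0 / s2 + b * b / tau2          # posterior precision
    var = 1.0 / prec
    mean = var * (m / s2 + b * (forecast - a) / tau2)
    return mean, var

# Prior: climatological runoff 900 (kaf), sd 200; forecast nearly
# unbiased (a = 50, b = 0.9) with error sd 120. All values hypothetical.
mean, var = posterior_runoff(m=900.0, s2=200.0 ** 2,
                             a=50.0, b=0.9, tau2=120.0 ** 2,
                             forecast=700.0)
print(f"posterior runoff: {mean:.0f} +/- {np.sqrt(var):.0f}")
```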
Iterative updating of model error for Bayesian inversion
NASA Astrophysics Data System (ADS)
Calvetti, Daniela; Dunlop, Matthew; Somersalo, Erkki; Stuart, Andrew
2018-02-01
In computational inverse problems, it is common that a detailed and accurate forward model is approximated by a computationally less challenging substitute. The model reduction may be necessary to meet constraints in computing time when optimization algorithms are used to find a single estimate, or to speed up Markov chain Monte Carlo (MCMC) calculations in the Bayesian framework. The use of an approximate model introduces a discrepancy, or modeling error, that may have a detrimental effect on the solution of the ill-posed inverse problem, or it may severely distort the estimate of the posterior distribution. In the Bayesian paradigm, the modeling error can be considered as a random variable, and by using an estimate of the probability distribution of the unknown, one may estimate the probability distribution of the modeling error and incorporate it into the inversion. We introduce an algorithm which iterates this idea to update the distribution of the model error, leading to a sequence of posterior distributions that are demonstrated empirically to capture the underlying truth with increasing accuracy. Since the algorithm is not based on rejections, it requires only limited full model evaluations. We show analytically that, in the linear Gaussian case, the algorithm converges geometrically fast with respect to the number of iterations when the data is finite dimensional. For more general models, we introduce particle approximations of the iteratively generated sequence of distributions; we also prove that each element of the sequence converges in the large particle limit under a simplifying assumption. We show numerically that, as in the linear case, rapid convergence occurs with respect to the number of iterations. Additionally, we show through computed examples that point estimates obtained from this iterative algorithm are superior to those obtained by neglecting the model error.
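In the linear Gaussian case the paper analyzes, the iteration can be written in a few lines: the modeling error (F − G)u is approximated as Gaussian, its mean and covariance are recomputed from the current posterior, and the posterior is re-solved. The matrices and noise levels below are toy stand-ins, not taken from the paper.

```python
# Iterative model-error update, linear Gaussian toy version.
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 6
F = rng.normal(size=(m, n))                 # accurate forward model
G = F + 0.3 * rng.normal(size=(m, n))       # cheap approximate model
Gamma = np.eye(n)                           # prior covariance of u
sig2 = 0.05 ** 2                            # observation noise variance

u_true = rng.normal(size=n)
y = F @ u_true + np.sqrt(sig2) * rng.normal(size=m)

mu_err, C_err = np.zeros(m), np.zeros((m, m))   # model-error statistics
for it in range(10):
    # Posterior of u under y = G u + err + e, err ~ N(mu_err, C_err).
    C_noise = sig2 * np.eye(m) + C_err
    K = Gamma @ G.T @ np.linalg.inv(G @ Gamma @ G.T + C_noise)
    mu_u = K @ (y - mu_err)
    C_u = Gamma - K @ G @ Gamma
    # Re-estimate the distribution of the modeling error (F - G) u
    # under the current posterior (exact in the linear Gaussian case).
    D = F - G
    mu_err, C_err = D @ mu_u, D @ C_u @ D.T
print("error in posterior mean:", np.linalg.norm(mu_u - u_true))
```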
NASA Technical Reports Server (NTRS)
Mengshoel, Ole J.; Wilkins, David C.; Roth, Dan
2010-01-01
For hard computational problems, stochastic local search has proven to be a competitive approach to finding optimal or approximately optimal problem solutions. Two key research questions for stochastic local search algorithms are: Which algorithms are effective for initialization? When should the search process be restarted? In the present work we investigate these research questions in the context of approximate computation of most probable explanations (MPEs) in Bayesian networks (BNs). We introduce a novel approach, based on the Viterbi algorithm, to explanation initialization in BNs. While the Viterbi algorithm works on sequences and trees, our approach works on BNs with arbitrary topologies. We also give a novel formalization of stochastic local search, with focus on initialization and restart, using probability theory and mixture models. Experimentally, we apply our methods to the problem of MPE computation, using a stochastic local search algorithm known as Stochastic Greedy Search. By carefully optimizing both initialization and restart, we reduce the MPE search time for application BNs by several orders of magnitude compared to using uniform at random initialization without restart. On several BNs from applications, the performance of Stochastic Greedy Search is competitive with clique tree clustering, a state-of-the-art exact algorithm used for MPE computation in BNs.
Bayesian Population Genomic Inference of Crossing Over and Gene Conversion
Padhukasahasram, Badri; Rannala, Bruce
2011-01-01
Meiotic recombination is a fundamental cellular mechanism in sexually reproducing organisms and its different forms, crossing over and gene conversion both play an important role in shaping genetic variation in populations. Here, we describe a coalescent-based full-likelihood Markov chain Monte Carlo (MCMC) method for jointly estimating the crossing-over, gene-conversion, and mean tract length parameters from population genomic data under a Bayesian framework. Although computationally more expensive than methods that use approximate likelihoods, the relative efficiency of our method is expected to be optimal in theory. Furthermore, it is also possible to obtain a posterior sample of genealogies for the data using this method. We first check the performance of the new method on simulated data and verify its correctness. We also extend the method for inference under models with variable gene-conversion and crossing-over rates and demonstrate its ability to identify recombination hotspots. Then, we apply the method to two empirical data sets that were sequenced in the telomeric regions of the X chromosome of Drosophila melanogaster. Our results indicate that gene conversion occurs more frequently than crossing over in the su-w and su-s gene sequences while the local rates of crossing over as inferred by our program are not low. The mean tract lengths for gene-conversion events are estimated to be ∼70 bp and 430 bp, respectively, for these data sets. Finally, we discuss ideas and optimizations for reducing the execution time of our algorithm. PMID:21840857
Review of Reliability-Based Design Optimization Approach and Its Integration with Bayesian Method
NASA Astrophysics Data System (ADS)
Zhang, Xiangnan
2018-03-01
Many uncertain factors arise in practical engineering, such as the external load environment, material properties, geometrical shape, initial conditions, and boundary conditions. Reliability methods measure the structural safety condition and determine the optimal combination of design parameters based on probabilistic theory. Reliability-based design optimization (RBDO), which combines reliability theory and optimization, is the most commonly used approach to minimize the structural cost or other performance measures under uncertain variables. However, it cannot handle various kinds of incomplete information. The Bayesian approach is utilized to incorporate this kind of incomplete information in its uncertainty quantification. In this paper, the RBDO approach and its integration with the Bayesian method are introduced.
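As an illustration of the RBDO formulation described above (not drawn from the paper), the following toy problem minimizes a cost subject to a reliability constraint; the limit state is kept linear in the uncertain variable so the failure probability has a closed form and the optimizer sees a smooth constraint.

```python
# Toy RBDO: minimize cost(x) subject to P[g(x, theta) > 0] >= 0.99,
# with limit state g = x0*theta + x1 - 2 and theta ~ N(1, 0.1^2).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

MU, SIGMA = 1.0, 0.1            # uncertain material property (assumed)

def cost(x):
    return x[0] + 2.0 * x[1]    # e.g., weight of the design

def reliability(x):
    mean_g = x[0] * MU + x[1] - 2.0
    sd_g = abs(x[0]) * SIGMA + 1e-12
    return norm.sf(-mean_g / sd_g)          # P[g > 0], smooth in x

res = minimize(cost, x0=np.array([2.0, 0.5]),
               constraints=[{"type": "ineq",
                             "fun": lambda x: reliability(x) - 0.99}],
               bounds=[(0.0, 5.0), (0.0, 5.0)], method="SLSQP")
print("design:", res.x.round(3), "reliability:", round(reliability(res.x), 4))
```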
Optimal speech motor control and token-to-token variability: a Bayesian modeling approach.
Patri, Jean-François; Diard, Julien; Perrier, Pascal
2015-12-01
The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From these, the Bayesian model is constructed progressively. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.
Nonlinear and non-Gaussian Bayesian based handwriting beautification
NASA Astrophysics Data System (ADS)
Shi, Cao; Xiao, Jianguo; Xu, Canhui; Jia, Wenhua
2013-03-01
A framework is proposed in this paper to effectively and efficiently beautify handwriting by means of a novel nonlinear and non-Gaussian Bayesian algorithm. In the proposed framework, the format and size of the handwriting image are first normalized, and a computer typeface is then applied to optimize the visual effect of the handwriting. Bayesian statistics is exploited to characterize the handwriting beautification process as a Bayesian dynamic model. The model parameters that translate, rotate, and scale the typeface are controlled by the state equation, and the matching optimization between the handwriting and the transformed typeface is handled by the measurement equation. Finally, the new typeface, which is transformed from the original one and achieves the best nonlinear and non-Gaussian optimization, is the beautification result of the handwriting. Experimental results demonstrate that the proposed framework provides a creative handwriting beautification methodology that improves visual acceptance.
BMDS: A Collection of R Functions for Bayesian Multidimensional Scaling
ERIC Educational Resources Information Center
Okada, Kensuke; Shigemasu, Kazuo
2009-01-01
Bayesian multidimensional scaling (MDS) has attracted a great deal of attention because: (1) it provides a better fit than do classical MDS and ALSCAL; (2) it provides estimation errors of the distances; and (3) the Bayesian dimension selection criterion, MDSIC, provides a direct indication of optimal dimensionality. However, Bayesian MDS is not…
Bayesian Correlation Analysis for Sequence Count Data
Lau, Nelson; Perkins, Theodore J.
2016-01-01
Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities’ measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low—especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities’ signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset. PMID:27701449
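One hedged way to realize this idea, shown below for illustration (the paper's exact model may differ): give each entity's per-experiment rate a conjugate Gamma-Poisson posterior that reflects sequencing depth, then average Pearson correlations over posterior draws so that low-confidence measurements contribute less.

```python
# Depth-aware Bayesian correlation sketch via Gamma-Poisson posteriors.
import numpy as np

rng = np.random.default_rng(0)
depths = np.array([1.0, 0.2, 1.5, 0.1, 0.8])    # relative sequencing depth
counts_a = np.array([50, 8, 80, 3, 35])         # entity A counts (toy)
counts_b = np.array([48, 11, 71, 2, 40])        # entity B counts (toy)
alpha, beta = 1.0, 1.0                          # Gamma prior (assumed)

def posterior_rates(counts, depths, size):
    # Conjugacy: rate | count ~ Gamma(alpha + count, beta + depth).
    return rng.gamma(alpha + counts, 1.0 / (beta + depths),
                     size=(size, len(counts)))

ra = posterior_rates(counts_a, depths, 2000)
rb = posterior_rates(counts_b, depths, 2000)
cors = [np.corrcoef(x, y)[0, 1] for x, y in zip(ra, rb)]
print("Bayesian correlation ~", np.mean(cors))
print("naive Pearson        =", np.corrcoef(counts_a / depths,
                                            counts_b / depths)[0, 1])
```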
NASA Astrophysics Data System (ADS)
Felgaer, Pablo; Britos, Paola; García-Martínez, Ramón
A Bayesian network is a directed acyclic graph in which each node represents a variable and each arc a probabilistic dependency; such networks provide a compact representation of knowledge and flexible methods of reasoning. Obtaining a network from data is a learning process that is divided into two steps: structural learning and parametric learning. In this paper we define an automatic learning method that optimizes Bayesian networks applied to classification, using a hybrid learning method that combines the advantages of decision-tree induction techniques (TDIDT-C4.5) with those of Bayesian networks. The resulting method is applied to prediction in the health domain.
Ruff, Kiersten M.; Harmon, Tyler S.; Pappu, Rohit V.
2015-01-01
We report the development and deployment of a coarse-graining method that is well suited for computer simulations of aggregation and phase separation of protein sequences with block-copolymeric architectures. Our algorithm, named CAMELOT for Coarse-grained simulations Aided by MachinE Learning Optimization and Training, leverages information from converged all atom simulations that is used to determine a suitable resolution and parameterize the coarse-grained model. To parameterize a system-specific coarse-grained model, we use a combination of Boltzmann inversion, non-linear regression, and a Gaussian process Bayesian optimization approach. The accuracy of the coarse-grained model is demonstrated through direct comparisons to results from all atom simulations. We demonstrate the utility of our coarse-graining approach using the block-copolymeric sequence from the exon 1 encoded sequence of the huntingtin protein. This sequence comprises 17 residues from the N-terminal end of huntingtin (N17) followed by a polyglutamine (polyQ) tract. Simulations based on the CAMELOT approach are used to show that the adsorption and unfolding of the wild type N17 and its sequence variants on the surface of polyQ tracts engender a patchy colloid like architecture that promotes the formation of linear aggregates. These results provide a plausible explanation for experimental observations, which show that N17 accelerates the formation of linear aggregates in block-copolymeric N17-polyQ sequences. The CAMELOT approach is versatile and is generalizable for simulating the aggregation and phase behavior of a range of block-copolymeric protein sequences. PMID:26723608
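The Boltzmann inversion step named in the abstract has a simple closed form, U(r) = -kT ln g(r); a sketch with a synthetic pair distribution standing in for converged all-atom statistics:

```python
# Boltzmann inversion of a (synthetic) radial distribution function.
import numpy as np

kT = 0.593                       # kcal/mol at ~298 K
r = np.linspace(3.0, 12.0, 50)   # pair distance grid (angstrom)
# Synthetic g(r) with a solvation-shell peak near r = 5 (illustrative).
g = 1.0 + 0.8 * np.exp(-0.5 * ((r - 5.0) / 0.7) ** 2)

U = -kT * np.log(g)              # potential of mean force estimate
print("contact minimum at r =", r[np.argmin(U)], "A,",
      "depth =", round(U.min(), 3), "kcal/mol")
```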
Arbitrary norm support vector machines.
Huang, Kaizhu; Zheng, Danian; King, Irwin; Lyu, Michael R
2009-02-01
Support vector machines (SVM) are state-of-the-art classifiers. Typically the L2-norm or L1-norm is adopted as a regularization term in SVMs, while other norm-based SVMs, for example, the L0-norm SVM or even the L(infinity)-norm SVM, are rarely seen in the literature. The major reason is that the L0-norm describes a discontinuous and nonconvex term, leading to a combinatorially NP-hard optimization problem. In this letter, motivated by Bayesian learning, we propose a novel framework that can implement arbitrary norm-based SVMs in polynomial time. One significant feature of this framework is that only a sequence of sequential minimal optimization problems needs to be solved, thus making it practical in many real applications. The proposed framework is important in the sense that Bayesian priors can be efficiently plugged into most learning methods without knowing the explicit form. Hence, this builds a connection between Bayesian learning and the kernel machines. We derive the theoretical framework, demonstrate how our approach works on the L0-norm SVM as a typical example, and perform a series of experiments to validate its advantages. Experimental results on nine benchmark data sets are very encouraging. The implemented L0-norm is competitive with or even better than the standard L2-norm SVM in terms of accuracy, but with a reduced number of support vectors (a 9.46% reduction on average). When compared with another sparse model, the relevance vector machine, our proposed algorithm also demonstrates better sparse properties with a training speed over seven times faster.
Bayesian Integration of Spatial Information
ERIC Educational Resources Information Center
Cheng, Ken; Shettleworth, Sara J.; Huttenlocher, Janellen; Rieser, John J.
2007-01-01
Spatial judgments and actions are often based on multiple cues. The authors review a multitude of phenomena on the integration of spatial cues in diverse species to consider how nearly optimally animals combine the cues. Under the banner of Bayesian perception, cues are sometimes combined and weighted in a near optimal fashion. In other instances…
Support vector machine multiuser receiver for DS-CDMA signals in multipath channels.
Chen, S; Samingan, A K; Hanzo, L
2001-01-01
The problem of constructing an adaptive multiuser detector (MUD) is considered for direct sequence code division multiple access (DS-CDMA) signals transmitted through multipath channels. The emerging learning technique, called support vector machines (SVM), is proposed as a method of obtaining a nonlinear MUD from a relatively small training data block. Computer simulation is used to study this SVM MUD, and the results show that it can closely match the performance of the optimal Bayesian one-shot detector. Comparisons with an adaptive radial basis function (RBF) MUD trained by an unsupervised clustering algorithm are discussed.
Determining the Intensity of a Point-Like Source Observed on the Background of AN Extended Source
NASA Astrophysics Data System (ADS)
Kornienko, Y. V.; Skuratovskiy, S. I.
2014-12-01
The problem of determining the time dependence of the intensity of a point-like source in the presence of atmospheric blur is formulated and solved using the Bayesian statistical approach. The point-like source is assumed to be observed against the background of an extended source whose brightness is constant in time but unknown. The equation system for optimal statistical estimation of the sequence of intensity values at the observation moments is obtained. The problem is particularly relevant for studying gravitational mirages, which appear when a quasar is observed through the gravitational field of a distant galaxy.
Protein construct storage: Bayesian variable selection and prediction with mixtures.
Clyde, M A; Parmigiani, G
1998-07-01
Determining optimal conditions for protein storage while maintaining a high level of protein activity is an important question in pharmaceutical research. A designed experiment based on a space-filling design was conducted to understand the effects of factors affecting protein storage and to establish optimal storage conditions. Different model-selection strategies to identify important factors may lead to very different answers about optimal conditions. Uncertainty about which factors are important, or model uncertainty, can be a critical issue in decision-making. We use Bayesian variable selection methods for linear models to identify important variables in the protein storage data, while accounting for model uncertainty. We also use the Bayesian framework to build predictions based on a large family of models, rather than an individual model, and to evaluate the probability that certain candidate storage conditions are optimal.
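A minimal sketch of prediction under model uncertainty in the spirit described above (illustrative, not the paper's mixture formulation): enumerate variable subsets for a linear model, weight each by a BIC approximation to its posterior probability, and average predictions across models.

```python
# BIC-weighted Bayesian model averaging over variable subsets.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

def fit_rss(cols):
    Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, rss, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = float(rss[0]) if len(rss) else float(((y - Xs @ beta) ** 2).sum())
    return beta, rss

models = [c for k in range(p + 1) for c in itertools.combinations(range(p), k)]
bics = []
for cols in models:
    _, rss = fit_rss(cols)
    k = len(cols) + 1                       # parameters incl. intercept
    bics.append(n * np.log(rss / n) + k * np.log(n))
w = np.exp(-0.5 * (np.array(bics) - min(bics)))
w /= w.sum()                                # approximate model posteriors
for cols, wi in zip(models, w):
    if wi > 0.01:
        print(f"variables {cols}: weight {wi:.3f}")
# A BMA prediction at a new point averages per-model predictions with w.
```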
Bayesian estimation of the discrete coefficient of determination.
Chen, Ting; Braga-Neto, Ulisses M
2016-12-01
The discrete coefficient of determination (CoD) measures the nonlinear interaction between discrete predictor and target variables and has had far-reaching applications in Genomic Signal Processing. Previous work has addressed the inference of the discrete CoD using classical parametric and nonparametric approaches. In this paper, we introduce a Bayesian framework for the inference of the discrete CoD. We derive analytically the optimal minimum mean-square error (MMSE) CoD estimator, as well as a CoD estimator based on the Optimal Bayesian Predictor (OBP). For the latter estimator, exact expressions for its bias, variance, and root-mean-square (RMS) are given. The accuracy of both Bayesian CoD estimators with non-informative and informative priors, under fixed or random parameters, is studied via analytical and numerical approaches. We also demonstrate the application of the proposed Bayesian approach in the inference of gene regulatory networks, using gene-expression data from a previously published study on metastatic melanoma.
Human-in-the-loop Bayesian optimization of wearable device parameters
Malcolm, Philippe; Speeckaert, Jozefien; Siviy, Christoper J.; Walsh, Conor J.; Kuindersma, Scott
2017-01-01
The increasing capabilities of exoskeletons and powered prosthetics for walking assistance have paved the way for more sophisticated and individualized control strategies. In response to this opportunity, recent work on human-in-the-loop optimization has considered the problem of automatically tuning control parameters based on realtime physiological measurements. However, the common use of metabolic cost as a performance metric creates significant experimental challenges due to its long measurement times and low signal-to-noise ratio. We evaluate the use of Bayesian optimization—a family of sample-efficient, noise-tolerant, and global optimization methods—for quickly identifying near-optimal control parameters. To manage experimental complexity and provide comparisons against related work, we consider the task of minimizing metabolic cost by optimizing walking step frequencies in unaided human subjects. Compared to an existing approach based on gradient descent, Bayesian optimization identified a near-optimal step frequency with a faster time to convergence (12 minutes, p < 0.01), smaller inter-subject variability in convergence time (± 2 minutes, p < 0.01), and lower overall energy expenditure (p < 0.01). PMID:28926613
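A generic version of such a loop (not the authors' code) pairs a Gaussian-process surrogate with an expected-improvement rule; here the "subject" is a synthetic noisy cost curve over step frequency, and all kernel settings are arbitrary.

```python
# GP Bayesian optimization with expected improvement on a noisy
# synthetic "metabolic cost" curve over step frequency.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def metabolic_cost(freq):                    # hidden subject: noisy bowl
    return (freq - 1.8) ** 2 + 0.05 * rng.normal()

grid = np.linspace(1.0, 2.6, 200)[:, None]   # candidate frequencies (Hz)
X = list(rng.uniform(1.0, 2.6, 3)[:, None])  # a few initial trials
y = [metabolic_cost(x[0]) for x in X]

gp = GaussianProcessRegressor(kernel=RBF(0.3) + WhiteKernel(0.05),
                              normalize_y=True)
for it in range(12):
    gp.fit(np.array(X), np.array(y))
    mu, sd = gp.predict(grid, return_std=True)
    best = min(y)
    z = (best - mu) / (sd + 1e-9)
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)  # expected improvement
    x_next = grid[np.argmax(ei)]
    X.append(x_next)
    y.append(metabolic_cost(x_next[0]))
print("estimated optimal step frequency:", X[int(np.argmin(y))][0])
```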
Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization
ERIC Educational Resources Information Center
Gelman, Andrew; Lee, Daniel; Guo, Jiqiang
2015-01-01
Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models and can be called from the command line, R, Python, Matlab, or Julia and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…
Bayesian Decision Theoretical Framework for Clustering
ERIC Educational Resources Information Center
Chen, Mo
2011-01-01
In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the…
Using Alien Coins to Test Whether Simple Inference Is Bayesian
ERIC Educational Resources Information Center
Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.
2016-01-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…
Bayesian Optimization for Neuroimaging Pre-processing in Brain Age Classification and Prediction
Lancaster, Jenessa; Lorenz, Romy; Leech, Rob; Cole, James H.
2018-01-01
Neuroimaging-based age prediction using machine learning is proposed as a biomarker of brain aging, relating to cognitive performance, health outcomes and progression of neurodegenerative disease. However, even leading age-prediction algorithms contain measurement error, motivating efforts to improve experimental pipelines. T1-weighted MRI is commonly used for age prediction, and the pre-processing of these scans involves normalization to a common template and resampling to a common voxel size, followed by spatial smoothing. Resampling parameters are often selected arbitrarily. Here, we sought to improve brain-age prediction accuracy by optimizing resampling parameters using Bayesian optimization. Using data on N = 2003 healthy individuals (aged 16–90 years) we trained support vector machines to (i) distinguish between young (<22 years) and old (>50 years) brains (classification) and (ii) predict chronological age (regression). We also evaluated generalisability of the age-regression model to an independent dataset (CamCAN, N = 648, aged 18–88 years). Bayesian optimization was used to identify optimal voxel size and smoothing kernel size for each task. This procedure adaptively samples the parameter space to evaluate accuracy across a range of possible parameters, using independent sub-samples to iteratively assess different parameter combinations to arrive at optimal values. When distinguishing between young and old brains, a classification accuracy of 88.1% was achieved (optimal voxel size = 11.5 mm³, smoothing kernel = 2.3 mm). For predicting chronological age, a mean absolute error (MAE) of 5.08 years was achieved (optimal voxel size = 3.73 mm³, smoothing kernel = 3.68 mm). This was compared to performance using default values of 1.5 mm³ and 4 mm, respectively, resulting in MAE = 5.48 years, though this 7.3% improvement was not statistically significant. When assessing generalisability, best performance was achieved when applying the entire Bayesian optimization framework to the new dataset, outperforming the parameters optimized for the initial training dataset. Our study outlines the proof-of-principle that neuroimaging models for brain-age prediction can use Bayesian optimization to derive case-specific pre-processing parameters. Our results suggest that different pre-processing parameters are selected when optimization is conducted in specific contexts. This potentially motivates the use of optimization techniques at many different points during the experimental process, which may improve statistical sensitivity and reduce opportunities for experimenter-led bias. PMID:29483870
Advanced obstacle avoidance for a laser based wheelchair using optimised Bayesian neural networks.
Trieu, Hoang T; Nguyen, Hung T; Willey, Keith
2008-01-01
In this paper we present an advanced method of obstacle avoidance for a laser-based intelligent wheelchair using optimized Bayesian neural networks. Three neural networks are designed for three separate sub-tasks: passing through a doorway, corridor and wall following, and general obstacle avoidance. The accurate usable accessible space is determined by including the actual wheelchair dimensions in a real-time map used as input to each network. Data acquisition is performed separately to collect the patterns required for each specified sub-task. The Bayesian framework is used to determine the optimal neural network structure in each case. These networks are then trained under the supervision of the Bayesian rule. Experimental results showed that, compared to the VFH algorithm, our neural networks navigated a smoother path, following a near-optimal trajectory.
Bayesian just-so stories in psychology and neuroscience.
Bowers, Jeffrey S; Davis, Colin J
2012-05-01
According to Bayesian theories in psychology and neuroscience, minds and brains are (near) optimal in solving a wide range of tasks. We challenge this view and argue that more traditional, non-Bayesian approaches are more promising. We make 3 main arguments. First, we show that the empirical evidence for Bayesian theories in psychology is weak. This weakness relates to the many arbitrary ways that priors, likelihoods, and utility functions can be altered in order to account for the data that are obtained, making the models unfalsifiable. It further relates to the fact that Bayesian theories are rarely better at predicting data compared with alternative (and simpler) non-Bayesian theories. Second, we show that the empirical evidence for Bayesian theories in neuroscience is weaker still. There are impressive mathematical analyses showing how populations of neurons could compute in a Bayesian manner but little or no evidence that they do. Third, we challenge the general scientific approach that characterizes Bayesian theorizing in cognitive science. A common premise is that theories in psychology should largely be constrained by a rational analysis of what the mind ought to do. We question this claim and argue that many of the important constraints come from biological, evolutionary, and processing (algorithmic) considerations that have no adaptive relevance to the problem per se. In our view, these factors have contributed to the development of many Bayesian "just so" stories in psychology and neuroscience; that is, mathematical analyses of cognition that can be used to explain almost any behavior as optimal.
BOP2: Bayesian optimal design for phase II clinical trials with simple and complex endpoints.
Zhou, Heng; Lee, J Jack; Yuan, Ying
2017-09-20
We propose a flexible Bayesian optimal phase II (BOP2) design that is capable of handling simple (e.g., binary) and complicated (e.g., ordinal, nested, and co-primary) endpoints under a unified framework. We use a Dirichlet-multinomial model to accommodate different types of endpoints. At each interim, the go/no-go decision is made by evaluating a set of posterior probabilities of the events of interest, which is optimized to maximize power or minimize the number of patients under the null hypothesis. Unlike other existing Bayesian designs, the BOP2 design explicitly controls the type I error rate, thereby bridging the gap between Bayesian designs and frequentist designs. In addition, the stopping boundary of the BOP2 design can be enumerated prior to the onset of the trial. These features make the BOP2 design accessible to a wide range of users and regulatory agencies and particularly easy to implement in practice. Simulation studies show that the BOP2 design has favorable operating characteristics with higher power and lower risk of incorrectly terminating the trial than some existing Bayesian phase II designs. The software to implement the BOP2 design is freely available at www.trialdesign.org. Copyright © 2017 John Wiley & Sons, Ltd.
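As an illustration of the posterior-probability decision rule described above, the sketch below evaluates a go/no-go boundary for a single binary endpoint under a Beta-binomial model. The boundary constants lam and gamma are illustrative assumptions; BOP2 optimizes them to control the type I error rate and maximize power.

# Sketch of a BOP2-style interim go/no-go rule for a binary endpoint.
# With a Beta(a, b) prior on response rate p, continue ("go") when
# Pr(p > p0 | data) exceeds a boundary that tightens with sample size n.
from scipy.stats import beta

def go_decision(responses, n, p0=0.2, a=1.0, b=1.0,
                lam=0.6, gamma=1.0, n_max=40):
    post_prob = 1.0 - beta.cdf(p0, a + responses, b + n - responses)
    boundary = lam * (n / n_max) ** gamma   # power-function boundary form
    return post_prob > boundary, post_prob, boundary

for n, x in [(10, 2), (20, 6), (40, 13)]:
    go, pp, c = go_decision(x, n)
    print(f"n={n:2d} responses={x:2d} Pr(p>p0|data)={pp:.3f} "
          f"boundary={c:.3f} -> {'go' if go else 'stop'}")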
Is multiple-sequence alignment required for accurate inference of phylogeny?
Höhl, Michael; Ragan, Mark A
2007-04-01
The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from http://www.bioinformatics.org.au (as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.
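For concreteness, the sketch below implements one simple member of the word-based (k-mer) family of alignment-free distances benchmarked in such studies: a Euclidean distance between normalized k-mer frequency profiles. It is not the paper's pattern-based method, and the sequences are toy examples.

# A simple word-based (k-mer) alignment-free distance between two
# unaligned sequences, via normalized k-mer frequency vectors.
from collections import Counter
import math

def kmer_profile(seq, k=5):
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kmer_distance(seq1, seq2, k=5):
    p, q = kmer_profile(seq1, k), kmer_profile(seq2, k)
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in keys))

s1 = "ATGGCGTACGTTAGCATCGATCGATGCTAGCTAGCATCG"
s2 = "ATGGCGTTCGTTAGCATGGATCGATGCTTGCTAGCATCG"
print(kmer_distance(s1, s2, k=5))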
Is That What Bayesians Believe? Reply to Griffiths, Chater, Norris, and Pouget (2012)
ERIC Educational Resources Information Center
Bowers, Jeffrey S.; Davis, Colin J.
2012-01-01
Griffiths, Chater, Norris, and Pouget (2012) argue that we have misunderstood the Bayesian approach. In their view, it is rarely the case that researchers are making claims that performance in a given task is near optimal, and few, if any, researchers adopt the theoretical Bayesian perspective according to which the mind or brain is actually…
Zonta, Zivko J; Flotats, Xavier; Magrí, Albert
2014-08-01
The procedure commonly used for the assessment of the parameters included in activated sludge models (ASMs) relies on the estimation of their optimal value within a confidence region (i.e. frequentist inference). Once optimal values are estimated, parameter uncertainty is computed through the covariance matrix. However, alternative approaches based on the consideration of the model parameters as probability distributions (i.e. Bayesian inference) may be of interest. The aim of this work is to apply (and compare) both Bayesian and frequentist inference methods when assessing uncertainty for an ASM-type model that simultaneously considers intracellular storage and biomass growth. Practical identifiability was addressed exclusively considering respirometric profiles based on the oxygen uptake rate and with the aid of probabilistic global sensitivity analysis. Parameter uncertainty was thus estimated according to both the Bayesian and frequentist inferential procedures. Results were compared in order to evidence the strengths and weaknesses of both approaches. Since it was demonstrated that Bayesian inference can be reduced to a frequentist approach under particular hypotheses, the former can be considered the more general methodology. Hence, the use of Bayesian inference is encouraged for tackling inferential issues in ASM environments.
Optimal inference with suboptimal models: Addiction and active Bayesian inference
Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl
2015-01-01
When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321
Human Inferences about Sequences: A Minimal Transition Probability Model
2016-01-01
The brain constantly infers the causes of the inputs it receives and uses these inferences to generate statistical expectations about future observations. Experimental evidence for these expectations and their violations include explicit reports, sequential effects on reaction times, and mismatch or surprise signals recorded in electrophysiology and functional MRI. Here, we explore the hypothesis that the brain acts as a near-optimal inference device that constantly attempts to infer the time-varying matrix of transition probabilities between the stimuli it receives, even when those stimuli are in fact fully unpredictable. This parsimonious Bayesian model, with a single free parameter, accounts for a broad range of findings on surprise signals, sequential effects and the perception of randomness. Notably, it explains the pervasive asymmetry between repetitions and alternations encountered in those studies. Our analysis suggests that a neural machinery for inferring transition probabilities lies at the core of human sequence knowledge. PMID:28030543
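A minimal sketch of a learner in this spirit: leaky transition counts with a single forgetting parameter omega yield predictive probabilities whose negative log is a surprise signal. The parameterization below is an assumption for illustration, not the published model.

# Leaky transition-probability learner with one free parameter (omega).
# Counts of observed transitions decay exponentially, so the estimates
# track time-varying statistics; surprise is -log2 of the prediction.
import numpy as np

def surprise_series(stimuli, omega=0.05, prior=1.0):
    counts = np.full((2, 2), prior)   # counts[a, b]: leaky count of a -> b
    out = []
    for prev, nxt in zip(stimuli[:-1], stimuli[1:]):
        p = counts[prev, nxt] / counts[prev].sum()  # predict before updating
        out.append(-np.log2(p))
        counts *= 1.0 - omega                       # forget old evidence
        counts[prev, nxt] += 1.0
    return np.array(out)

rng = np.random.default_rng(1)
seq = rng.integers(0, 2, 500)   # a fully unpredictable (random) stream
print("mean surprise (bits):", round(float(surprise_series(seq).mean()), 3))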
Li, Hu; Leavengood, John M.; Chapman, Eric G.; Burkhardt, Daniel; Song, Fan; Jiang, Pei; Liu, Jinpeng; Cai, Wanzhi
2017-01-01
Hemiptera, the largest non-holometabolous order of insects, represents approximately 7% of metazoan diversity. With extraordinary life histories and highly specialized morphological adaptations, hemipterans have exploited diverse habitats and food sources through approximately 300 Myr of evolution. To elucidate the phylogeny and evolutionary history of Hemiptera, we carried out the most comprehensive mitogenomics analysis on the richest taxon sampling to date covering all the suborders and infraorders, including 34 newly sequenced and 94 published mitogenomes. With optimized branch length and sequence heterogeneity, Bayesian analyses using a site-heterogeneous mixture model resolved the higher-level hemipteran phylogeny as (Sternorrhyncha, (Auchenorrhyncha, (Coleorrhyncha, Heteroptera))). Ancestral character state reconstruction and divergence time estimation suggest that the success of true bugs (Heteroptera) is probably due to angiosperm coevolution, but key adaptive innovations (e.g. prognathous mouthpart, predatory behaviour, and haemelytron) facilitated multiple independent shifts among diverse feeding habits and multiple independent colonizations of aquatic habitats. PMID:28878063
Inferring action structure and causal relationships in continuous sequences of human action.
Buchsbaum, Daphna; Griffiths, Thomas L; Plunkett, Dillon; Gopnik, Alison; Baldwin, Dare
2015-02-01
In the real world, causal variables do not come pre-identified or occur in isolation, but instead are embedded within a continuous temporal stream of events. A challenge faced by both human learners and machine learning algorithms is identifying subsequences that correspond to the appropriate variables for causal inference. A specific instance of this problem is action segmentation: dividing a sequence of observed behavior into meaningful actions, and determining which of those actions lead to effects in the world. Here we present a Bayesian analysis of how statistical and causal cues to segmentation should optimally be combined, as well as four experiments investigating human action segmentation and causal inference. We find that both people and our model are sensitive to statistical regularities and causal structure in continuous action, and are able to combine these sources of information in order to correctly infer both causal relationships and segmentation boundaries. Copyright © 2014. Published by Elsevier Inc.
Bettenbühl, Mario; Rusconi, Marco; Engbert, Ralf; Holschneider, Matthias
2012-01-01
Complex biological dynamics often generate sequences of discrete events which can be described as a Markov process. The order of the underlying Markovian stochastic process is fundamental for characterizing statistical dependencies within sequences. As an example for this class of biological systems, we investigate the Markov order of sequences of microsaccadic eye movements from human observers. We calculate the integrated likelihood of a given sequence for various orders of the Markov process and use this in a Bayesian framework for statistical inference on the Markov order. Our analysis shows that data from most participants are best explained by a first-order Markov process. This is compatible with recent findings of a statistical coupling of subsequent microsaccade orientations. Our method might prove to be useful for a broad class of biological systems.
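The integrated-likelihood computation has a closed form for discrete sequences: under a symmetric Dirichlet prior on each context's transition probabilities, the marginal likelihood of an order-m Markov model is a product of Dirichlet-multinomial terms. A sketch, with an assumed Dirichlet parameter alpha = 1:

# Bayesian inference on Markov order via integrated likelihoods.
import numpy as np
from collections import defaultdict
from scipy.special import gammaln

def log_marginal_likelihood(seq, order, n_symbols=2, alpha=1.0):
    # For each context of length `order`, count each following symbol.
    counts = defaultdict(lambda: np.zeros(n_symbols))
    for i in range(order, len(seq)):
        counts[tuple(seq[i - order:i])][seq[i]] += 1
    # Dirichlet-multinomial marginal likelihood, one term per context
    logml = 0.0
    for c in counts.values():
        logml += (gammaln(n_symbols * alpha) - gammaln(n_symbols * alpha + c.sum())
                  + np.sum(gammaln(alpha + c)) - n_symbols * gammaln(alpha))
    return logml

rng = np.random.default_rng(0)
seq = [0]
for _ in range(999):                     # first-order chain, P(repeat) = 0.7
    seq.append(seq[-1] if rng.random() < 0.7 else 1 - seq[-1])
for m in range(4):
    print(f"order {m}: log marginal likelihood = "
          f"{log_marginal_likelihood(seq, m):.1f}")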
Convergence among cave catfishes: long-branch attraction and a Bayesian relative rates test.
Wilcox, T P; García de León, F J; Hendrickson, D A; Hillis, D M
2004-06-01
Convergence has long been of interest to evolutionary biologists. Cave organisms appear to be ideal candidates for studying convergence in morphological, physiological, and developmental traits. Here we report apparent convergence in two cave-catfishes that were described on morphological grounds as congeners: Prietella phreatophila and Prietella lundbergi. We collected mitochondrial DNA sequence data from 10 species of catfishes, representing five of the seven genera in Ictaluridae, as well as seven species from a broad range of siluriform outgroups. Analysis of the sequence data under parsimony supports a monophyletic Prietella. However, both maximum-likelihood and Bayesian analyses support polyphyly of the genus, with P. lundbergi sister to Ictalurus and P. phreatophila sister to Ameiurus. The topological difference between parsimony and the other methods appears to result from long-branch attraction between the Prietella species. Similarly, the sequence data do not support several other relationships within Ictaluridae supported by morphology. We develop a new Bayesian method for examining variation in molecular rates of evolution across a phylogeny.
Romer, Katherine A.; Kayombya, Guy-Richard; Fraenkel, Ernest
2007-01-01
WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply these strategies. Using a single submission form, users can run several motif discovery programs and score, cluster and visualize the results. In addition, the Bayesian motif discovery program THEME can be used to determine the class of transcription factors that is most likely to regulate a set of sequences. Input can be provided as a list of gene or probe identifiers. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species. WebMOTIFS is freely available at http://fraenkel.mit.edu/webmotifs. PMID:17584794
Accurate, Rapid Taxonomic Classification of Fungal Large-Subunit rRNA Genes
Liu, Kuan-Liang; Porras-Alfaro, Andrea; Eichorst, Stephanie A.
2012-01-01
Taxonomic and phylogenetic fingerprinting based on sequence analysis of gene fragments from the large-subunit rRNA (LSU) gene or the internal transcribed spacer (ITS) region is becoming an integral part of fungal classification. The lack of an accurate and robust classification tool trained by a validated sequence database for taxonomic placement of fungal LSU genes is a severe limitation in taxonomic analysis of fungal isolates or large data sets obtained from environmental surveys. Using a hand-curated set of 8,506 fungal LSU gene fragments, we determined the performance characteristics of a naïve Bayesian classifier across multiple taxonomic levels and compared the classifier performance to that of a sequence similarity-based (BLASTN) approach. The naïve Bayesian classifier was computationally more rapid (>460-fold with our system) than the BLASTN approach, and it provided equal or superior classification accuracy. Classifier accuracies were compared using sequence fragments of 100 bp and 400 bp and two different PCR primer anchor points to mimic sequence read lengths commonly obtained using current high-throughput sequencing technologies. Accuracy was higher with 400-bp sequence reads than with 100-bp reads. It was also significantly affected by sequence location across the 1,400-bp test region. The highest accuracy was obtained across either the D1 or D2 variable region. The naïve Bayesian classifier provides an effective and rapid means to classify fungal LSU sequences from large environmental surveys. The training set and tool are publicly available through the Ribosomal Database Project (http://rdp.cme.msu.edu/classifier/classifier.jsp). PMID:22194300
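A toy sketch of the naive Bayesian word-matching idea behind such classifiers; the word length, smoothing constants, and two-taxon reference set are illustrative assumptions, not the RDP classifier's actual training data or smoothing scheme.

# Naive Bayesian sequence classifier: each taxon is modeled by the
# probability that each k-mer (word) appears in one of its reference
# sequences; a query goes to the taxon maximizing summed log-probabilities.
import math
from collections import defaultdict

def words(seq, k=8):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def train(refs):  # refs: {taxon: [sequences]}
    model = {}
    for taxon, seqs in refs.items():
        counts = defaultdict(int)
        for s in seqs:
            for w in words(s):
                counts[w] += 1
        n = len(seqs)
        model[taxon] = ({w: (c + 0.5) / (n + 1.0) for w, c in counts.items()}, n)
    return model

def classify(model, query):
    scores = {taxon: sum(math.log(probs.get(w, 0.5 / (n + 1.0)))
                         for w in words(query))
              for taxon, (probs, n) in model.items()}
    return max(scores, key=scores.get)

refs = {"TaxonA": ["ATGGCGTACGTTAGCATCGATCGATGCTAGCT"],
        "TaxonB": ["TTACCGGATACCGGTTAACCGGATTCCGGAAT"]}
print(classify(train(refs), "ATGGCGTACGTTAGCATCGATCAATGCTAGCT"))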
Informative priors on fetal fraction increase power of the noninvasive prenatal screen.
Xu, Hanli; Wang, Shaowei; Ma, Lin-Lin; Huang, Shuai; Liang, Lin; Liu, Qian; Liu, Yang-Yang; Liu, Ke-Di; Tan, Ze-Min; Ban, Hao; Guan, Yongtao; Lu, Zuhong
2017-11-09
Purpose: Noninvasive prenatal screening (NIPS) sequences a mixture of the maternal and fetal cell-free DNA. Fetal trisomy can be detected by examining chromosomal dosages estimated from sequencing reads. The traditional method uses the Z-test, which compares a subject against a set of euploid controls, where the information of fetal fraction is not fully utilized. Here we present a Bayesian method that leverages informative priors on the fetal fraction. Method: Our Bayesian method combines the Z-test likelihood and informative priors of the fetal fraction, which are learned from the sex chromosomes, to compute Bayes factors. The Bayesian framework can account for nongenetic risk factors through the prior odds, and our method can report individual positive/negative predictive values. Results: Our Bayesian method has more power than the Z-test method. We analyzed 3,405 NIPS samples and spotted at least 9 (of 51) possible Z-test false positives. Conclusion: Bayesian NIPS is more powerful than the Z-test method, is able to account for nongenetic risk factors through prior odds, and can report individual positive/negative predictive values. Genetics in Medicine advance online publication, 9 November 2017; doi:10.1038/gim.2017.186.
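The reporting logic described in the conclusion can be illustrated in a few lines: a Bayes factor combines with prior odds, which may encode nongenetic risk factors, to yield an individual posterior probability. The numbers below are illustrative, not from the study.

# Posterior probability (positive predictive value) from a Bayes factor
# and prior odds; the prior can encode nongenetic risk such as maternal age.
def posterior_probability(bayes_factor, prior_prob):
    prior_odds = prior_prob / (1.0 - prior_prob)
    post_odds = bayes_factor * prior_odds
    return post_odds / (1.0 + post_odds)

for bf, prior in [(1e3, 1 / 800), (1e3, 1 / 100)]:
    print(f"BF={bf:g}, prior={prior:.4f} -> "
          f"PPV={posterior_probability(bf, prior):.3f}")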
Rediscovery of Good-Turing estimators via Bayesian nonparametrics.
Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye
2016-03-01
The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library. © 2015, The International Biometric Society.
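For reference, the classical Good-Turing discovery probability that the Bayesian nonparametric estimators are shown to smooth can be computed directly from the frequency-of-frequencies. A minimal sketch on a toy sample:

# Good-Turing: P(next observation is a new species) is about n_1 / n,
# where n_1 is the number of species seen exactly once. More generally,
# species seen r times get probability about (r+1) * n_{r+1} / (n * n_r).
from collections import Counter

def good_turing_discovery(sample):
    freqs = Counter(sample)                  # species -> count
    freq_of_freqs = Counter(freqs.values())  # r -> number of species seen r times
    return freq_of_freqs[1] / len(sample)

sample = list("AAABBCCCCDDEFG")  # toy "EST library": species labels
print("estimated discovery probability:", good_turing_discovery(sample))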
Computer-aided diagnosis of lung nodule using gradient tree boosting and Bayesian optimization.
Nishio, Mizuho; Nishizawa, Mitsuo; Sugiyama, Osamu; Kojima, Ryosuke; Yakami, Masahiro; Kuroda, Tomohiro; Togashi, Kaori
2018-01-01
We aimed to evaluate a computer-aided diagnosis (CADx) system for lung nodule classification focussing on (i) usefulness of the conventional CADx system (hand-crafted imaging feature + machine learning algorithm), (ii) comparison between support vector machine (SVM) and gradient tree boosting (XGBoost) as machine learning algorithms, and (iii) effectiveness of parameter optimization using Bayesian optimization and random search. Data on 99 lung nodules (62 lung cancers and 37 benign lung nodules) were included from public databases of CT images. A variant of the local binary pattern was used for calculating a feature vector. SVM or XGBoost was trained using the feature vector and its corresponding label. Tree Parzen Estimator (TPE) was used as Bayesian optimization for parameters of SVM and XGBoost. Random search was done for comparison with TPE. Leave-one-out cross-validation was used for optimizing and evaluating the performance of our CADx system. Performance was evaluated using area under the curve (AUC) of receiver operating characteristic analysis. AUC was calculated 10 times, and its average was obtained. The best averaged AUC of SVM and XGBoost was 0.850 and 0.896, respectively; both were obtained using TPE. XGBoost was generally superior to SVM. Optimal parameters for achieving high AUC were obtained with fewer numbers of trials when using TPE, compared with random search. Bayesian optimization of SVM and XGBoost parameters was more efficient than random search. Based on observer study, AUC values of two board-certified radiologists were 0.898 and 0.822. The results show that diagnostic accuracy of our CADx system was comparable to that of radiologists with respect to classifying lung nodules.
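A sketch of the TPE search for the SVM arm, assuming the hyperopt package and substituting synthetic data for the image features; the parameter ranges are illustrative assumptions.

# TPE-based hyperparameter search with hyperopt; fmin() with tpe.suggest
# adaptively proposes parameter settings, unlike uniform random search.
import numpy as np
from hyperopt import fmin, tpe, hp, Trials
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=99, n_features=20, random_state=0)

def objective(params):
    clf = SVC(C=params["C"], gamma=params["gamma"])
    auc = cross_val_score(clf, X, y, scoring="roc_auc", cv=5).mean()
    return -auc  # hyperopt minimizes

space = {"C": hp.loguniform("C", np.log(1e-2), np.log(1e3)),
         "gamma": hp.loguniform("gamma", np.log(1e-4), np.log(1e1))}
trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
print("best parameters:", best, "best AUC:", -min(trials.losses()))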
Don't Fear Optimality: Sampling for Probabilistic-Logic Sequence Models
NASA Astrophysics Data System (ADS)
Thon, Ingo
One of the current challenges in artificial intelligence is modeling dynamic environments that change due to the actions or activities undertaken by people or agents. The task of inferring hidden states, e.g. the activities or intentions of people, based on observations is called filtering. Standard probabilistic models such as Dynamic Bayesian Networks are able to solve this task efficiently using approximative methods such as particle filters. However, these models do not support logical or relational representations. The key contribution of this paper is the upgrade of a particle filter algorithm for use with a probabilistic logical representation through the definition of a proposal distribution. The performance of the algorithm depends largely on how well this distribution fits the target distribution. We adopt the idea of logical compilation into Binary Decision Diagrams for sampling. This allows us to use the optimal proposal distribution which is normally prohibitively slow.
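For background, a generic bootstrap particle filter on a tiny discrete state space is sketched below; it proposes from the transition prior, which is exactly the weak proposal the paper improves on for relational models (the BDD-compiled proposal itself is not shown).

# Bootstrap particle filter for a two-state hidden Markov model:
# propose particles from the transition prior, reweight by the
# observation likelihood, then resample.
import numpy as np

rng = np.random.default_rng(0)
T = np.array([[0.9, 0.1], [0.2, 0.8]])   # state transition matrix
E = np.array([[0.8, 0.2], [0.3, 0.7]])   # observation likelihoods

def particle_filter(obs, n_particles=500):
    particles = rng.integers(0, 2, n_particles)
    estimates = []
    for o in obs:
        # Propose from the transition prior (the "bootstrap" proposal)
        particles = np.array([rng.choice(2, p=T[s]) for s in particles])
        weights = E[particles, o]
        weights /= weights.sum()
        particles = rng.choice(particles, size=n_particles, p=weights)
        estimates.append(particles.mean())  # posterior P(state=1 | obs so far)
    return estimates

print(particle_filter([0, 0, 1, 1, 1]))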
Effective Online Bayesian Phylogenetics via Sequential Monte Carlo with Guided Proposals
Fourment, Mathieu; Claywell, Brian C; Dinh, Vu; McCoy, Connor; Matsen IV, Frederick A; Darling, Aaron E
2018-01-01
Modern infectious disease outbreak surveillance produces continuous streams of sequence data which require phylogenetic analysis as data arrives. Current software packages for Bayesian phylogenetic inference are unable to quickly incorporate new sequences as they become available, making them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, wherein new data can be continuously incorporated to update the estimate of the posterior probability distribution. In this article, we describe and evaluate several different online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior suffers from poor performance, and we develop “guided” proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that the situation can be resolved by heating the proposal density. The results demonstrate that relative to the widely used MCMC-based algorithm implemented in MrBayes, the total time required to compute a series of phylogenetic posteriors as sequences arrive can be significantly reduced by the use of OPSMC, without incurring a significant loss in accuracy. PMID:29186587
A Bayesian Account of Vocal Adaptation to Pitch-Shifted Auditory Feedback
Hahnloser, Richard H. R.
2017-01-01
Motor systems are highly adaptive. Both birds and humans compensate for synthetically induced shifts in the pitch (fundamental frequency) of auditory feedback stemming from their vocalizations. Pitch-shift compensation is partial in the sense that large shifts lead to smaller relative compensatory adjustments of vocal pitch than small shifts. Also, compensation is larger in subjects with high motor variability. To formulate a mechanistic description of these findings, we adapt a Bayesian model of error relevance. We assume that vocal-auditory feedback loops in the brain cope optimally with known sensory and motor variability. Based on measurements of motor variability, optimal compensatory responses in our model provide accurate fits to published experimental data. Optimal compensation correctly predicts sensory acuity, which has been estimated in psychophysical experiments as just-noticeable pitch differences. Our model extends the utility of Bayesian approaches to adaptive vocal behaviors. PMID:28135267
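A simplified sketch of the error-relevance idea, assuming Gaussian likelihoods for "relevant" (self-generated) and "irrelevant" (artifactual) causes of the perceived pitch error; parameter values are illustrative, not the fitted values from the paper.

# The compensatory response weights the perceived error by the posterior
# probability that it is self-relevant. Large shifts are unlikely under
# the sensorimotor noise model, so they draw proportionally smaller
# compensation; larger motor variability raises that posterior.
from scipy.stats import norm

def compensation(shift_cents, sigma_motor=25.0, sigma_irrelevant=200.0,
                 p_relevant=0.8):
    l_rel = norm.pdf(shift_cents, 0.0, sigma_motor)
    l_irr = norm.pdf(shift_cents, 0.0, sigma_irrelevant)
    post_rel = p_relevant * l_rel / (p_relevant * l_rel
                                     + (1 - p_relevant) * l_irr)
    return -post_rel * shift_cents   # corrective response opposes the shift

for shift in [25, 100, 300]:
    c = compensation(shift)
    print(f"shift {shift:4d} cents -> compensation {c:7.1f} "
          f"({abs(c) / shift:.0%} of shift)")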
Merlé, Y; Mentré, F
1995-02-01
In this paper, 3 criteria for designing experiments for Bayesian estimation of the parameters of models that are nonlinear with respect to their parameters, when a prior distribution is available, are presented: the determinant of the Bayesian information matrix, the determinant of the pre-posterior covariance matrix, and the expected information provided by an experiment. A procedure to simplify the computation of these criteria is proposed in the case of continuous prior distributions and is compared with the criterion obtained from a linearization of the model about the mean of the prior distribution for the parameters. This procedure is applied to two models commonly encountered in the area of pharmacokinetics and pharmacodynamics: the one-compartment open model with bolus intravenous single-dose injection and the Emax model. They both involve two parameters. Additive as well as multiplicative Gaussian measurement errors are considered, with normal prior distributions. Various combinations of the variances of the prior distribution and of the measurement error are studied. Our attention is restricted to designs with limited numbers of measurements (1 or 2 measurements). This situation often occurs in practice when Bayesian estimation is performed. The optimal Bayesian designs that result vary with the variances of the parameter distribution and with the measurement error. The two-point optimal designs sometimes differ from the D-optimal designs for the mean of the prior distribution and may consist of replicated measurements. For the studied cases, the determinant of the Bayesian information matrix and its linearized form lead to the same optimal designs. In some cases, the pre-posterior covariance matrix can be far from its lower bound, namely, the inverse of the Bayesian information matrix, especially for the Emax model and a multiplicative measurement error. The expected information provided by the experiment and the determinant of the pre-posterior covariance matrix generally lead to the same designs, except for the Emax model and the multiplicative measurement error. Results show that these criteria can be easily computed and that they could be incorporated in modules for designing experiments.
Randomized path optimization for the mitigated counter detection of UAVs.
2017-06-01
A recursive Bayesian filtering scheme is used to assimilate noisy measurements of the UAV's position and to predict its terminal location. The KL divergence is used to compare the probability density of aircraft termination to a normal distribution around the true terminal location, providing a measure of the algorithm's success.
An Active RBSE Framework to Generate Optimal Stimulus Sequences in a BCI for Spelling
NASA Astrophysics Data System (ADS)
Moghadamfalahi, Mohammad; Akcakaya, Murat; Nezamfar, Hooman; Sourati, Jamshid; Erdogmus, Deniz
2017-10-01
A class of brain computer interfaces (BCIs) employs noninvasive recordings of electroencephalography (EEG) signals to enable users with severe speech and motor impairments to interact with their environment and social network. For example, EEG-based BCIs for typing popularly utilize event related potentials (ERPs) for inference. Presentation paradigms in current ERP-based letter-by-letter typing BCIs typically query the user with an arbitrary subset of characters. However, both typing accuracy and typing speed can potentially be enhanced with more informed subset selection and flash assignment. In this manuscript, we introduce the active recursive Bayesian state estimation (active-RBSE) framework for inference and sequence optimization. Prior to presentation in each iteration, rather than showing a subset of randomly selected characters, the developed framework optimally selects a subset based on a query function. Selected queries are adaptively specialized to the user during each intent detection. Through a simulation-based study, we assess the effect of active-RBSE on the performance of a language-model assisted typing BCI in terms of typing speed and accuracy. To provide a baseline for comparison, we also utilize standard presentation paradigms, namely the row-and-column matrix presentation paradigm and the random rapid serial visual presentation paradigm. The results show that utilization of active-RBSE can enhance the online performance of the system, both in terms of typing accuracy and speed.
Ramachandran, Parameswaran; Sánchez-Taltavull, Daniel; Perkins, Theodore J
2017-01-01
Co-expression networks have long been used as a tool for investigating the molecular circuitry governing biological systems. However, most algorithms for constructing co-expression networks were developed in the microarray era, before high-throughput sequencing-with its unique statistical properties-became the norm for expression measurement. Here we develop Bayesian Relevance Networks, an algorithm that uses Bayesian reasoning about expression levels to account for the differing levels of uncertainty in expression measurements between highly- and lowly-expressed entities, and between samples with different sequencing depths. It combines data from groups of samples (e.g., replicates) to estimate group expression levels and confidence ranges. It then computes uncertainty-moderated estimates of cross-group correlations between entities, and uses permutation testing to assess their statistical significance. Using large scale miRNA data from The Cancer Genome Atlas, we show that our Bayesian update of the classical Relevance Networks algorithm provides improved reproducibility in co-expression estimates and lower false discovery rates in the resulting co-expression networks. Software is available at www.perkinslab.ca.
Posterior Predictive Bayesian Phylogenetic Model Selection
Lewis, Paul O.; Xie, Wangang; Chen, Ming-Hui; Fan, Yu; Kuo, Lynn
2014-01-01
We present two distinctly different posterior predictive approaches to Bayesian phylogenetic model selection and illustrate these methods using examples from green algal protein-coding cpDNA sequences and flowering plant rDNA sequences. The Gelfand–Ghosh (GG) approach allows dissection of an overall measure of model fit into components due to posterior predictive variance (GGp) and goodness-of-fit (GGg), which distinguishes this method from the posterior predictive P-value approach. The conditional predictive ordinate (CPO) method provides a site-specific measure of model fit useful for exploratory analyses and can be combined over sites yielding the log pseudomarginal likelihood (LPML) which is useful as an overall measure of model fit. CPO provides a useful cross-validation approach that is computationally efficient, requiring only a sample from the posterior distribution (no additional simulation is required). Both GG and CPO add new perspectives to Bayesian phylogenetic model selection based on the predictive abilities of models and complement the perspective provided by the marginal likelihood (including Bayes Factor comparisons) based solely on the fit of competing models to observed data. [Bayesian; conditional predictive ordinate; CPO; L-measure; LPML; model selection; phylogenetics; posterior predictive.] PMID:24193892
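The CPO/LPML computation is simple given per-site log-likelihoods evaluated at posterior samples: CPO_i is the harmonic mean of the site-i likelihoods and LPML is the sum of log CPO_i. A sketch on placeholder draws:

# CPO and LPML from posterior samples; no extra simulation is needed.
import numpy as np
from scipy.special import logsumexp

def lpml(loglik):
    """loglik: (n_samples, n_sites) array of per-site log-likelihoods."""
    n = loglik.shape[0]
    # log CPO_i = -log( (1/n) * sum_s exp(-loglik[s, i]) )  (harmonic mean)
    log_cpo = np.log(n) - logsumexp(-loglik, axis=0)
    return log_cpo.sum(), log_cpo

rng = np.random.default_rng(0)
fake_loglik = -np.abs(rng.normal(2.0, 0.3, size=(1000, 50)))  # toy draws
total, per_site = lpml(fake_loglik)
print("LPML:", round(total, 2), "worst-fit site:", int(per_site.argmin()))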
Estimation of submarine mass failure probability from a sequence of deposits with age dates
Geist, Eric L.; Chaytor, Jason D.; Parsons, Thomas E.; ten Brink, Uri S.
2013-01-01
The empirical probability of submarine mass failure is quantified from a sequence of dated mass-transport deposits. Several different techniques are described to estimate the parameters for a suite of candidate probability models. The techniques, previously developed for analyzing paleoseismic data, include maximum likelihood and Type II (Bayesian) maximum likelihood methods derived from renewal process theory and Monte Carlo methods. The estimated mean return time from these methods, unlike estimates from a simple arithmetic mean of the center age dates and standard likelihood methods, includes the effects of age-dating uncertainty and of open time intervals before the first and after the last event. The likelihood techniques are evaluated using Akaike’s Information Criterion (AIC) and Akaike’s Bayesian Information Criterion (ABIC) to select the optimal model. The techniques are applied to mass transport deposits recorded in two Integrated Ocean Drilling Program (IODP) drill sites located in the Ursa Basin, northern Gulf of Mexico. Dates of the deposits were constrained by regional bio- and magnetostratigraphy from a previous study. Results of the analysis indicate that submarine mass failures in this location occur primarily according to a Poisson process in which failures are independent and return times follow an exponential distribution. However, some of the model results suggest that submarine mass failures may occur quasiperiodically at one of the sites (U1324). The suite of techniques described in this study provides quantitative probability estimates of submarine mass failure occurrence, for any number of deposits and age uncertainty distributions.
Computational Neuropsychology and Bayesian Inference
Parr, Thomas; Rees, Geraint; Friston, Karl J.
2018-01-01
Computational theories of brain function have become very influential in neuroscience. They have facilitated the growth of formal approaches to disease, particularly in psychiatric research. In this paper, we provide a narrative review of the body of computational research addressing neuropsychological syndromes, and focus on those that employ Bayesian frameworks. Bayesian approaches to understanding brain function formulate perception and action as inferential processes. These inferences combine ‘prior’ beliefs with a generative (predictive) model to explain the causes of sensations. Under this view, neuropsychological deficits can be thought of as false inferences that arise due to aberrant prior beliefs (that are poor fits to the real world). This draws upon the notion of a Bayes optimal pathology – optimal inference with suboptimal priors – and provides a means for computational phenotyping. In principle, any given neuropsychological disorder could be characterized by the set of prior beliefs that would make a patient’s behavior appear Bayes optimal. We start with an overview of some key theoretical constructs and use these to motivate a form of computational neuropsychology that relates anatomical structures in the brain to the computations they perform. Throughout, we draw upon computational accounts of neuropsychological syndromes. These are selected to emphasize the key features of a Bayesian approach, and the possible types of pathological prior that may be present. They range from visual neglect through hallucinations to autism. Through these illustrative examples, we review the use of Bayesian approaches to understand the link between biology and computation that is at the heart of neuropsychology. PMID:29527157
Bayesian accounts of covert selective attention: A tutorial review.
Vincent, Benjamin T
2015-05-01
Decision making and optimal observer models offer an important theoretical approach to the study of covert selective attention. While their probabilistic formulation allows quantitative comparison to human performance, the models can be complex and their insights are not always immediately apparent. Part 1 establishes the theoretical appeal of the Bayesian approach, and introduces the way in which probabilistic approaches can be applied to covert search paradigms. Part 2 presents novel formulations of Bayesian models of 4 important covert attention paradigms, illustrating optimal observer predictions over a range of experimental manipulations. Graphical model notation is used to present models in an accessible way and Supplementary Code is provided to help bridge the gap between model theory and practical implementation. Part 3 reviews a large body of empirical and modelling evidence showing that many experimental phenomena in the domain of covert selective attention are a set of by-products. These effects emerge as the result of observers conducting Bayesian inference with noisy sensory observations, prior expectations, and knowledge of the generative structure of the stimulus environment.
NASA Astrophysics Data System (ADS)
Frosini, Mikael; Bernard, Denis
2017-09-01
We revisit the precision of the measurement of track parameters (position, angle) with optimal methods in the presence of detector resolution, multiple scattering and zero magnetic field. We then obtain an optimal estimator of the track momentum by a Bayesian analysis of the filtering innovations of a series of Kalman filters applied to the track. This work could pave the way to the development of autonomous high-performance gas time-projection chambers (TPC) or silicon wafer γ-ray space telescopes and be a powerful guide in the optimization of the design of the multi-kilo-ton liquid argon TPCs that are under development for neutrino studies.
Hu, X H; Li, Y P; Huang, G H; Zhuang, X W; Ding, X W
2016-05-01
In this study, a Bayesian-based two-stage inexact optimization (BTIO) method is developed for supporting water quality management through coupling Bayesian analysis with interval two-stage stochastic programming (ITSP). The BTIO method is capable of addressing uncertainties caused by insufficient inputs in the water quality model as well as uncertainties expressed as probabilistic distributions and interval numbers. The BTIO method is applied to a real case of water quality management for the Xiangxi River basin in the Three Gorges Reservoir region to seek optimal water quality management schemes under various uncertainties. Interval solutions for production patterns under a range of probabilistic water quality constraints have been generated. Results obtained demonstrate compromises between the system benefit and the system failure risk due to inherent uncertainties that exist in various system components. Moreover, information about pollutant emissions is obtained, which would help managers to adjust production patterns of regional industry and local policies considering interactions of water quality requirements, economic benefit, and industry structure.
Optimal execution in high-frequency trading with Bayesian learning
NASA Astrophysics Data System (ADS)
Du, Bian; Zhu, Hongliang; Zhao, Jingdong
2016-11-01
We consider optimal trading strategies in which traders submit bid and ask quotes to maximize the expected quadratic utility of total terminal wealth in a limit order book. The trader's bid and ask quotes will be changed by the Poisson arrival of market orders. Meanwhile, the trader may update his estimate of other traders' target sizes and directions by Bayesian learning. The solution of optimal execution in the limit order book is a two-step procedure. First, we model inactive trading with no limit order in the market. The dealer simply holds dollars and shares of stocks until terminal time. Second, he calibrates his bid and ask quotes to the limit order book. The optimal solutions are given by dynamic programming and in fact they are globally optimal. We also provide numerical simulations of the value function and optimal quotes in the last part of the article.
NASA Astrophysics Data System (ADS)
Bai, Bing
2012-03-01
There has been a lot of work on total variation (TV) regularized tomographic image reconstruction recently. Many of them use gradient-based optimization algorithms with a differentiable approximation of the TV functional. In this paper we apply TV regularization in Positron Emission Tomography (PET) image reconstruction. We reconstruct the PET image in a Bayesian framework, using Poisson noise model and TV prior functional. The original optimization problem is transformed to an equivalent problem with inequality constraints by adding auxiliary variables. Then we use an interior point method with logarithmic barrier functions to solve the constrained optimization problem. In this method, a series of points approaching the solution from inside the feasible region are found by solving a sequence of subproblems characterized by an increasing positive parameter. We use preconditioned conjugate gradient (PCG) algorithm to solve the subproblems directly. The nonnegativity constraint is enforced by bend line search. The exact expression of the TV functional is used in our calculations. Simulation results show that the algorithm converges fast and the convergence is insensitive to the values of the regularization and reconstruction parameters.
A Bayesian Approach to Interactive Retrieval
ERIC Educational Resources Information Center
Tague, Jean M.
1973-01-01
A probabilistic model for interactive retrieval is presented. Bayesian statistical decision theory principles are applied: use of prior and sample information about the relationship of document descriptions to query relevance; maximization of expected value of a utility function, to the problem of optimally restructuring search strategies in an…
Model averaging, optimal inference, and habit formation
FitzGerald, Thomas H. B.; Dolan, Raymond J.; Friston, Karl J.
2014-01-01
Postulating that the brain performs approximate Bayesian inference generates principled and empirically testable models of neuronal function—the subject of much current interest in neuroscience and related disciplines. Current formulations address inference and learning under some assumed and particular model. In reality, organisms are often faced with an additional challenge—that of determining which model or models of their environment are the best for guiding behavior. Bayesian model averaging—which says that an agent should weight the predictions of different models according to their evidence—provides a principled way to solve this problem. Importantly, because model evidence is determined by both the accuracy and complexity of the model, optimal inference requires that these be traded off against one another. This means an agent's behavior should show an equivalent balance. We hypothesize that Bayesian model averaging plays an important role in cognition, given that it is both optimal and realizable within a plausible neuronal architecture. We outline model averaging and how it might be implemented, and then explore a number of implications for brain and behavior. In particular, we propose that model averaging can explain a number of apparently suboptimal phenomena within the framework of approximate (bounded) Bayesian inference, focusing particularly upon the relationship between goal-directed and habitual behavior. PMID:25018724
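As a minimal illustration of the averaging step, model weights follow from (log) model evidences and model priors; the evidences below are placeholders for values obtained by fitting competing models.

# Bayesian model averaging: weight each model's prediction by its
# posterior probability, proportional to evidence times model prior.
import numpy as np

def bma_weights(log_evidences, priors=None):
    log_ev = np.asarray(log_evidences, dtype=float)
    if priors is not None:
        log_ev = log_ev + np.log(priors)
    w = np.exp(log_ev - log_ev.max())   # subtract max for numerical stability
    return w / w.sum()

log_evidences = [-102.3, -100.1, -104.9]   # three candidate models
predictions = np.array([0.2, 0.6, 0.9])    # each model's prediction
w = bma_weights(log_evidences)
print("weights:", w.round(3), "averaged prediction:", float(w @ predictions))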
Moscoso del Prado Martín, Fermín
2013-12-01
I introduce the Bayesian assessment of scaling (BAS), a simple but powerful Bayesian hypothesis contrast methodology that can be used to test hypotheses on the scaling regime exhibited by a sequence of behavioral data. Rather than comparing parametric models, as typically done in previous approaches, the BAS offers a direct, nonparametric way to test whether a time series exhibits fractal scaling. The BAS provides a simpler and faster test than do previous methods, and the code for making the required computations is provided. The method also enables testing of finely specified hypotheses on the scaling indices, something that was not possible with the previously available methods. I then present 4 simulation studies showing that the BAS methodology outperforms the other methods used in the psychological literature. I conclude with a discussion of methodological issues on fractal analyses in experimental psychology. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Optimal Bayesian Adaptive Design for Test-Item Calibration.
van der Linden, Wim J; Ren, Hao
2015-06-01
An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
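A sketch of the D-optimality criterion for assigning a field-test item under a 2PL model, using point estimates where the proposed design would use full posterior draws from the MCMC scheme; the item parameters and prior information below are illustrative.

# D-optimal field-test item assignment for 2PL calibration: assign the
# item whose expected Fisher information for its (a, b) parameters most
# increases the determinant of its accumulated information matrix.
import numpy as np

def fisher_2pl(theta, a, b):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    w = p * (1.0 - p)
    d = theta - b
    # info = w * g g^T with g = [d(eta)/da, d(eta)/db] = [theta - b, -a]
    return w * np.array([[d * d, -a * d], [-a * d, a * a]])

def pick_item(theta, items, accumulated):
    gains = [np.linalg.det(accumulated[i] + fisher_2pl(theta, a, b))
             for i, (a, b) in enumerate(items)]
    return int(np.argmax(gains))

items = [(1.2, -0.5), (0.8, 0.0), (1.5, 1.0)]   # (a, b) estimates
acc = [np.eye(2) * 0.1 for _ in items]          # prior information
print("assign item", pick_item(theta=0.8, items=items, accumulated=acc))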
NASA Astrophysics Data System (ADS)
Hanish Nithin, Anu; Omenzetter, Piotr
2017-04-01
Optimization of the life-cycle costs and reliability of offshore wind turbines (OWTs) is an area of immense interest due to the widespread increase in wind power generation across the world. Most of the existing studies have used structural reliability and the Bayesian pre-posterior analysis for optimization. This paper proposes an extension to the previous approaches in a framework for probabilistic optimization of the total life-cycle costs and reliability of OWTs by combining the elements of structural reliability/risk analysis (SRA), the Bayesian pre-posterior analysis with optimization through a genetic algorithm (GA). The SRA techniques are adopted to compute the probabilities of damage occurrence and failure associated with the deterioration model. The probabilities are used in the decision tree and are updated using the Bayesian analysis. The output of this framework would determine the optimal structural health monitoring and maintenance schedules to be implemented during the life span of OWTs while maintaining a trade-off between the life-cycle costs and risk of the structural failure. Numerical illustrations with a generic deterioration model for one monitoring exercise in the life cycle of a system are demonstrated. Two case scenarios, namely to build initially an expensive and robust or a cheaper but more quickly deteriorating structures and to adopt expensive monitoring system, are presented to aid in the decision-making process.
Simple summation rule for optimal fixation selection in visual search.
Najemnik, Jiri; Geisler, Wilson S
2009-06-01
When searching for a known target in a natural texture, practiced humans achieve near-optimal performance compared to a Bayesian ideal searcher constrained with the human map of target detectability across the visual field [Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387-391]. To do so, humans must be good at choosing where to fixate during the search [Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal strategy. Journal of Vision, 8(3):4, 1-14]; however, it seems unlikely that a biological nervous system would implement the computations for the Bayesian ideal fixation selection because of their complexity. Here we derive and test a simple heuristic for optimal fixation selection that appears to be a much better candidate for implementation within a biological nervous system. Specifically, we show that the near-optimal fixation location is the maximum of the current posterior probability distribution for target location after the distribution is filtered by (convolved with) the square of the retinotopic target detectability map. We term the model that uses this strategy the entropy limit minimization (ELM) searcher. We show that when constrained with a human-like retinotopic map of target detectability and human search error rates, the ELM searcher performs as well as the Bayesian ideal searcher, and produces fixation statistics similar to those of humans.
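The heuristic itself is nearly a one-liner on gridded maps: convolve the posterior with the squared detectability map and take the argmax. A sketch with synthetic maps:

# ELM fixation selection: fixate the maximum of (posterior * d'^2 map),
# where * denotes convolution. The maps here are synthetic placeholders.
import numpy as np
from scipy.signal import fftconvolve

grid = np.arange(-10, 11)             # degrees of visual field
xx, yy = np.meshgrid(grid, grid)

# Squared detectability: d' falls off with retinal eccentricity
dprime_sq = (3.0 * np.exp(-(xx ** 2 + yy ** 2) / (2 * 6.0 ** 2))) ** 2

# Toy posterior over target location (updated after each fixation)
posterior = np.exp(-((xx - 4) ** 2 + (yy + 2) ** 2) / (2 * 3.0 ** 2))
posterior /= posterior.sum()

elm_map = fftconvolve(posterior, dprime_sq, mode="same")
iy, ix = np.unravel_index(elm_map.argmax(), elm_map.shape)
print("next fixation (x, y):", grid[ix], grid[iy])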
A Bayesian Hybrid Adaptive Randomisation Design for Clinical Trials with Survival Outcomes.
Moatti, M; Chevret, S; Zohar, S; Rosenberger, W F
2016-01-01
Response-adaptive randomisation designs have been proposed to improve the efficiency of phase III randomised clinical trials and improve the outcomes of the clinical trial population. In the setting of failure time outcomes, Zhang and Rosenberger (2007) developed a response-adaptive randomisation approach that targets an optimal allocation, based on a fixed sample size. The aim of this research is to propose a response-adaptive randomisation procedure for survival trials with an interim monitoring plan, based on the following optimality criterion: for fixed variance of the estimated log hazard ratio, what allocation minimizes the expected hazard of failure? We demonstrate the utility of the design by redesigning a clinical trial on multiple myeloma. To handle continuous monitoring of data, we propose a Bayesian response-adaptive randomisation procedure, where the log hazard ratio is the effect measure of interest. Combining the prior with the normal likelihood, the mean posterior estimate of the log hazard ratio allows derivation of the optimal target allocation. We perform a simulation study to assess and compare the performance of this proposed Bayesian hybrid adaptive design to those of fixed, sequential or adaptive - either frequentist or fully Bayesian - designs. Non-informative normal priors of the log hazard ratio were used, as well as mixtures of enthusiastic and skeptical priors. Stopping rules based on the posterior distribution of the log hazard ratio were computed. The method is then illustrated by redesigning a phase III randomised clinical trial of chemotherapy in patients with multiple myeloma, with mixtures of normal priors elicited from experts. As expected, there was a reduction in the proportion of observed deaths in the adaptive vs. non-adaptive designs; this reduction was maximized using a Bayes mixture prior, with no clear-cut improvement from using a fully Bayesian procedure. The use of stopping rules allows a slight decrease in the observed proportion of deaths under the alternative hypothesis compared with the adaptive designs without stopping rules. Such Bayesian hybrid adaptive survival trials may be promising alternatives to traditional designs, reducing the duration of survival trials as well as addressing ethical concerns for patients enrolled in the trial.
Alfonso-Morales, Abdulahi; Martínez-Pérez, Orlando; Dolz, Roser; Valle, Rosa; Perera, Carmen L; Bertran, Kateri; Frías, Maria T; Majó, Natàlia; Ganges, Llilianne; Pérez, Lester J
2013-01-01
Infectious bursal disease is a highly contagious and acute viral disease caused by the infectious bursal disease virus (IBDV); it affects all major poultry producing areas of the world. The current study was designed to rigorously measure the global phylogeographic dynamics of IBDV strains to gain insight into viral population expansion as well as the emergence, spread and pattern of the geographical structure of very virulent IBDV (vvIBDV) strains. Sequences of the hyper-variable region of the VP2 (HVR-VP2) gene from IBDV strains isolated from diverse geographic locations were obtained from the GenBank database; Cuban sequences were obtained in the current work. All sequences were analysed by Bayesian phylogeographic analysis, implemented in the Bayesian Evolutionary Analysis Sampling Trees (BEAST), Bayesian Tip-association Significance testing (BaTS) and Spatial Phylogenetic Reconstruction of Evolutionary Dynamics (SPREAD) software packages. Selection pressure on the HVR-VP2 was also assessed. The phylogeographic association-trait analysis showed that viruses sampled from individual countries tend to cluster together, suggesting a geographic pattern for IBDV strains. Spatial analysis from this study revealed that strains carrying sequences that were linked to increased virulence of IBDV appeared in Iran in 1981 and spread to Western Europe (Belgium) in 1987, Africa (Egypt) around 1990, East Asia (China and Japan) in 1993, the Caribbean Region (Cuba) by 1995 and South America (Brazil) around 2000. Selection pressure analysis showed that several codons in the HVR-VP2 region were under purifying selection. To our knowledge, this work is the first study applying the Bayesian phylogeographic reconstruction approach to analyse the emergence and spread of vvIBDV strains worldwide.
Statistical estimation via convex optimization for trending and performance monitoring
NASA Astrophysics Data System (ADS)
Samar, Sikandar
This thesis presents an optimization-based statistical estimation approach for finding unknown trends in noisy data. A Bayesian framework is used to explicitly take into account prior information about the trends via trend models and constraints. The main focus is on convex formulation of the Bayesian estimation problem, which allows efficient computation of (globally) optimal estimates. The thesis has two main parts. The first part formulates trend estimation in systems described by known detailed models as a convex optimization problem; statistically optimal estimates are then obtained by maximizing a concave log-likelihood function subject to convex constraints. To address the growth in problem dimension as more measurements become available, we introduce a moving horizon framework that enables recursive estimation of the unknown trend by solving a fixed-size convex optimization problem at each horizon. We also present a distributed estimation framework, based on the dual decomposition method, for a system formed by a network of complex sensors with local (convex) estimation. Two specific applications of the convex optimization-based Bayesian estimation approach are described in the second part of the thesis. Batch estimation for parametric diagnostics in a flight control simulation of a space launch vehicle is shown to detect incipient fault trends despite the natural masking properties of feedback in the guidance and control loops. The moving horizon approach is used to estimate time-varying fault parameters in a detailed nonlinear simulation model of an unmanned aerial vehicle; excellent performance is demonstrated in the presence of winds and turbulence.
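To make the convex formulation concrete, here is a minimal sketch in Python: a Gaussian smoothness prior on the trend turns MAP estimation into a convex quadratic problem with a closed-form global optimum. The model, the penalty weight lam, and the function name map_trend are illustrative choices, not the thesis's implementation.

```python
import numpy as np

def map_trend(y, lam=100.0):
    """MAP estimate of a smooth trend under a Gaussian smoothness prior.

    Minimizes ||y - x||^2 + lam * ||D2 x||^2, a convex quadratic whose
    global optimum solves a single linear system.
    """
    n = len(y)
    # Second-difference operator D2 (shape (n-2, n)).
    D2 = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.solve(np.eye(n) + lam * D2.T @ D2, y)

# Noisy ramp: the estimator recovers the underlying linear trend.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
y = 2 * t + 0.3 * rng.standard_normal(200)
x_hat = map_trend(y)
```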
NASA Astrophysics Data System (ADS)
Beck, Joakim; Dia, Ben Mansour; Espath, Luis F. R.; Long, Quan; Tempone, Raúl
2018-06-01
In calculating the expected information gain in optimal Bayesian experimental design, the computation of the inner loop in the classical double-loop Monte Carlo requires a large number of samples and suffers from underflow if the number of samples is small. These drawbacks can be avoided by using an importance sampling approach. We present a computationally efficient method for optimal Bayesian experimental design that introduces importance sampling, based on the Laplace method, to the inner loop. We derive the optimal values of the method parameters for which the average computational cost is minimized subject to the desired error tolerance. We use three numerical examples to demonstrate the computational efficiency of our method compared with the classical double-loop Monte Carlo and a more recent single-loop Monte Carlo method that uses the Laplace method as an approximation of the return value of the inner loop. The first example is a scalar problem that is linear in the uncertain parameter. The second example is a nonlinear scalar problem. The third example deals with optimal sensor placement for an electrical impedance tomography experiment to recover the fiber orientation in laminate composites.
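The following toy sketch (not the authors' code) illustrates the classical double-loop Monte Carlo estimator of expected information gain for a hypothetical linear-Gaussian model; the logsumexp trick mitigates, but does not remove, the underflow issue the importance-sampling method is designed to avoid.

```python
import numpy as np
from scipy.special import logsumexp

# Toy linear-Gaussian model: y = g(theta, d) + eps, eps ~ N(0, sigma^2).
sigma = 0.1
g = lambda theta, d: d * theta          # design variable d scales the signal

def eig_dlmc(d, N=500, M=500, rng=np.random.default_rng(1)):
    """Classical double-loop Monte Carlo estimate of expected information gain.

    EIG(d) = E_{theta,y}[ log p(y|theta,d) - log p(y|d) ], with the evidence
    p(y|d) estimated by an inner Monte Carlo loop over fresh prior samples.
    """
    theta = rng.standard_normal(N)            # outer prior samples
    y = g(theta, d) + sigma * rng.standard_normal(N)
    loglik = -0.5 * ((y - g(theta, d)) / sigma) ** 2   # Gaussian constants cancel
    theta_in = rng.standard_normal((N, M))    # inner prior samples
    ll_in = -0.5 * ((y[:, None] - g(theta_in, d)) / sigma) ** 2
    # logsumexp keeps the inner average stable where a naive exp() underflows.
    log_evidence = logsumexp(ll_in, axis=1) - np.log(M)
    return np.mean(loglik - log_evidence)

print(eig_dlmc(d=1.0))   # larger |d| -> more informative experiment
```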
Drug delivery optimization through Bayesian networks.
Bellazzi, R.
1992-01-01
This paper describes how Bayesian networks can be used in combination with compartmental models to plan Recombinant Human Erythropoietin (r-HuEPO) delivery in the treatment of anemia in chronic uremic patients. Past measurements of hematocrit or hemoglobin concentration in a patient during therapy can be exploited to adjust the parameters of a compartmental model of erythropoiesis. This adaptive process allows more accurate patient-specific predictions, and hence more rational dosage planning. We describe a drug delivery optimization protocol based on our approach. Some results obtained on real data are presented. PMID:1482938
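As a rough illustration of this adaptive process, the sketch below performs a grid-based Bayesian update of a single patient-specific parameter from past measurements; the response model, prior, and all numbers are hypothetical stand-ins for the paper's compartmental erythropoiesis model.

```python
import numpy as np

def posterior_update(theta_grid, prior, observed, predict, sigma=1.0):
    """Grid-based Bayesian update of a patient-specific model parameter.

    `predict(theta)` returns the model's predicted responses (e.g. hematocrit
    under the past dosing schedule); the posterior re-weights the population
    prior by the Gaussian likelihood of the observed responses.
    """
    loglik = np.array([-0.5 * np.sum(((observed - predict(th)) / sigma) ** 2)
                       for th in theta_grid])
    post = np.log(prior) + loglik
    post = np.exp(post - post.max())
    return post / post.sum()

# Illustrative only: sensitivity parameter of a toy linear response model.
grid = np.linspace(0.1, 3.0, 300)
prior = np.exp(-0.5 * ((grid - 1.0) / 0.5) ** 2)       # population prior
obs = np.array([30.0, 32.5, 34.0])                     # past hematocrit (%)
doses = np.array([50.0, 60.0, 70.0])
post = posterior_update(grid, prior, obs, lambda th: 28.0 + th * doses / 10.0)
theta_map = grid[int(np.argmax(post))]                 # used for dose planning
```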
Open-loop-feedback control of serum drug concentrations: pharmacokinetic approaches to drug therapy.
Jelliffe, R W
1983-01-01
Recent developments to optimize open-loop-feedback control of drug dosage regimens, generally applicable to pharmacokinetically oriented therapy with many drugs, involve computation of patient-individualized strategies for obtaining desired serum drug concentrations. Analyses of past therapy are performed by least squares, extended least squares, and maximum a posteriori probability Bayesian methods of fitting pharmacokinetic models to serum level data. Future possibilities for truly optimal open-loop-feedback therapy with full Bayesian methods, and conceivably for optimal closed-loop therapy in such data-poor clinical situations, are also discussed. Implementation of these various therapeutic strategies, using automated, locally controlled infusion devices, has also been achieved in prototype form.
MDTS: automatic complex materials design using Monte Carlo tree search.
M Dieb, Thaer; Ju, Shenghong; Yoshizoe, Kazuki; Hou, Zhufeng; Shiomi, Junichiro; Tsuda, Koji
2017-01-01
Complex materials design is often represented as a black-box combinatorial optimization problem. In this paper, we present a novel Python library called MDTS (Materials Design using Tree Search). Our algorithm employs Monte Carlo tree search, an approach that has shown exceptional performance in computer Go. Unlike evolutionary algorithms that require user intervention to set parameters appropriately, MDTS has no tuning parameters and works autonomously on various problems. In comparison to a Bayesian optimization package, our algorithm showed competitive search efficiency and superior scalability. We succeeded in designing large Silicon-Germanium (Si-Ge) alloy structures that Bayesian optimization could not deal with due to excessive computational cost. MDTS is available at https://github.com/tsudalab/MDTS.
The choice of sample size: a mixed Bayesian / frequentist approach.
Pezeshk, Hamid; Nematollahi, Nader; Maroufy, Vahed; Gittins, John
2009-04-01
Sample size computations are largely based on frequentist or classical methods. In the Bayesian approach the prior information on the unknown parameters is taken into account. In this work we consider a fully Bayesian approach to the sample size determination problem which was introduced by Grundy et al. and developed by Lindley. This approach treats the problem as a decision problem and employs a utility function to find the optimal sample size of a trial. Furthermore, we assume that a regulatory authority, which is deciding on whether or not to grant a licence to a new treatment, uses a frequentist approach. We then find the optimal sample size for the trial by maximising the expected net benefit, which is the expected benefit of subsequent use of the new treatment minus the cost of the trial.
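A minimal Monte Carlo sketch of the mixed approach, under assumed (illustrative) prior, benefit, and cost figures: the sponsor averages over a prior on the treatment effect while licensing success is judged by a frequentist significance test, and the sample size maximizing the expected net benefit is read off a grid.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def expected_net_benefit(n, n_sim=20000, benefit=1e6, cost_per_patient=500.0):
    """Monte Carlo expected net benefit of a two-arm trial with n per arm.

    Prior on the treatment effect delta ~ N(0.2, 0.1^2); the regulator grants
    a licence iff a frequentist one-sided z-test (known unit variance) is
    significant at 2.5%. All monetary figures are illustrative.
    """
    delta = rng.normal(0.2, 0.1, n_sim)                 # prior draws
    z = rng.normal(delta * np.sqrt(n / 2.0), 1.0)       # test statistic
    licensed = z > norm.ppf(0.975)
    return benefit * licensed.mean() - cost_per_patient * 2 * n

sizes = np.arange(50, 2001, 50)
enb = [expected_net_benefit(n) for n in sizes]
n_opt = sizes[int(np.argmax(enb))]                      # optimal sample size
```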
Probabilistic models in human sensorimotor control
Wolpert, Daniel M.
2009-01-01
Sensory and motor uncertainty form a fundamental constraint on human sensorimotor control. Bayesian decision theory (BDT) has emerged as a unifying framework to understand how the central nervous system performs optimal estimation and control in the face of such uncertainty. BDT has two components: Bayesian statistics and decision theory. Here we review Bayesian statistics and show how it applies to estimating the state of the world and our own body. Recent results suggest that when learning novel tasks we are able to learn the statistical properties of both the world and our own sensory apparatus so as to perform estimation using Bayesian statistics. We review studies which suggest that humans can combine multiple sources of information to form maximum likelihood estimates, can incorporate prior beliefs about possible states of the world so as to generate maximum a posteriori estimates and can use Kalman filter-based processes to estimate time-varying states. Finally, we review Bayesian decision theory in motor control and how the central nervous system processes errors to determine loss functions and optimal actions. We review results that suggest we plan movements based on statistics of our actions that result from signal-dependent noise on our motor outputs. Taken together these studies provide a statistical framework for how the motor system performs in the presence of uncertainty. PMID:17628731
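The maximum likelihood cue-combination result reviewed here has a compact form for independent Gaussian cues; the sketch below (names and numbers illustrative) weights each cue by its reliability, and a prior can enter as an extra "cue" to give the maximum a posteriori estimate.

```python
import numpy as np

def fuse_gaussian_cues(means, variances):
    """Minimum-variance (maximum-likelihood) fusion of independent Gaussian cues.

    Each cue is weighted by its inverse variance (its reliability); a Gaussian
    prior can be folded in as just another cue.
    """
    w = 1.0 / np.asarray(variances)
    mean = np.sum(w * np.asarray(means)) / np.sum(w)
    return mean, 1.0 / np.sum(w)

# Vision (sigma=1) dominates audition (sigma=3) in the fused estimate.
mu, var = fuse_gaussian_cues([10.0, 14.0], [1.0, 9.0])
```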
BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data
Ji, Yuan; Xu, Yanxun; Zhang, Qiong; Tsui, Kam-Wah; Yuan, Yuan; Norris, Clift; Liang, Shoudan; Liang, Han
2011-01-01
Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information for various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to correct genomic locations within the source genome. While most reads are mapped to a unique location, a significant proportion of reads align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping the multireads may lead to bias in downstream analyses. Currently, most practitioners discard the multireads in their analysis, resulting in a loss of valuable information, especially for the genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real life data that the Bayesian method yields better mapping than the current leading methods. We provide a downloadable C++ program that is being packaged into user-friendly software. PMID:21517792
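A stripped-down sketch of the underlying Bayes rule (not the BM-Map model itself): the posterior probability of each competing location combines a mismatch-based likelihood with a prior informed by estimated expression; the error rate, read length, and all inputs are illustrative.

```python
import numpy as np

def multiread_posterior(mismatches, log_expression, error_rate=0.01, read_len=50):
    """Posterior probability that a multiread arose from each candidate locus.

    Likelihood: independent per-base errors, so a hit with m mismatches has
    log-likelihood m*log(e) + (L-m)*log(1-e). The prior favours loci with
    higher (estimated) expression. Both inputs are per-candidate arrays.
    """
    m = np.asarray(mismatches, float)
    loglik = m * np.log(error_rate) + (read_len - m) * np.log1p(-error_rate)
    logpost = loglik + np.asarray(log_expression, float)
    logpost -= logpost.max()                      # stabilise before exp
    p = np.exp(logpost)
    return p / p.sum()

# Read hits three loci with 0, 1 and 1 mismatches; locus 2 is highly expressed.
print(multiread_posterior([0, 1, 1], np.log([5.0, 50.0, 1.0])))
```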
BASiCS: Bayesian Analysis of Single-Cell Sequencing Data
Vallejos, Catalina A.; Marioni, John C.; Richardson, Sylvia
2015-01-01
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell’s lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable support the efficacy of our approach. PMID:26107944
Current treatment paradigms in rheumatoid arthritis.
Fries, J F
2000-06-01
Rheumatoid arthritis (RA) has traditionally been treated using the pyramid approach, in which non-steroidal anti-inflammatory drugs (NSAIDs) are the first-line treatment and disease-modifying anti-rheumatic drugs (DMARDs) are introduced relatively late in the disease. This approach is no longer valid. Previously regarded as a benign disease, RA is now recognized as causing substantial morbidity and mortality, as do the NSAIDs used in treatment. DMARDs are more effective in controlling the pain and disability of RA than NSAIDs, and are often no more toxic. The current treatment paradigm emphasizes early, consistent use of DMARDs. A 'sawtooth' strategy of DMARD use has been proposed, in which a rising but low level of disability triggers a change in therapy. Determining the most clinically useful DMARD combinations and the optimal sequence of DMARD use requires effectiveness studies, Bayesian approaches and analyses of long-term outcomes. Such approaches will allow optimization of multiple drug therapies in RA, and should substantially improve the long-term outcome for many patients.
Seeing Like a Geologist: Bayesian Use of Expert Categories in Location Memory
ERIC Educational Resources Information Center
Holden, Mark P.; Newcombe, Nora S.; Resnick, Ilyse; Shipley, Thomas F.
2016-01-01
Memory for spatial location is typically biased, with errors trending toward the center of a surrounding region. According to the category adjustment model (CAM), this bias reflects the optimal, Bayesian combination of fine-grained and categorical representations of a location. However, there is disagreement about whether categories are malleable.…
Depaoli, Sarah
2013-06-01
Growth mixture modeling (GMM) represents a technique that is designed to capture change over time for unobserved subgroups (or latent classes) that exhibit qualitatively different patterns of growth. The aim of the current article was to explore the impact of latent class separation (i.e., how similar growth trajectories are across latent classes) on GMM performance. Several estimation conditions were compared: maximum likelihood via the expectation maximization (EM) algorithm and the Bayesian framework implementing diffuse priors, "accurate" informative priors, weakly informative priors, data-driven informative priors, priors reflecting partial-knowledge of parameters, and "inaccurate" (but informative) priors. The main goal was to provide insight about the optimal estimation condition under different degrees of latent class separation for GMM. Results indicated that optimal parameter recovery was obtained though the Bayesian approach using "accurate" informative priors, and partial-knowledge priors showed promise for the recovery of the growth trajectory parameters. Maximum likelihood and the remaining Bayesian estimation conditions yielded poor parameter recovery for the latent class proportions and the growth trajectories. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
Bayesian Lagrangian Data Assimilation and Drifter Deployment Strategies
NASA Astrophysics Data System (ADS)
Dutt, A.; Lermusiaux, P. F. J.
2017-12-01
Ocean currents transport a variety of natural materials (e.g. water masses, phytoplankton, zooplankton, sediments) and man-made materials and objects (e.g. pollutants, floating debris, search-and-rescue targets). Lagrangian Coherent Structures (LCSs), or the most influential/persistent material lines in a flow, provide a robust approach to characterize such Lagrangian transports and organize classic trajectories. Using the flow-map stochastic advection and a dynamically-orthogonal decomposition, we develop uncertainty prediction schemes for both Eulerian and Lagrangian variables. We then extend our Bayesian Gaussian Mixture Model (GMM)-DO filter to a joint Eulerian-Lagrangian Bayesian data assimilation scheme. The resulting nonlinear filter allows the simultaneous non-Gaussian estimation of Eulerian variables (e.g. velocity, temperature, salinity) and Lagrangian variables (e.g. drifter/float positions, trajectories, LCSs). Its results are showcased using a double-gyre flow with a random frequency, a stochastic flow past a cylinder, and realistic ocean examples. We further show how our Bayesian mutual information and adaptive sampling equations provide a rigorous, efficient methodology to plan optimal drifter deployment strategies and predict the optimal times, locations, and types of measurements to be collected.
Perdikaris, Paris; Karniadakis, George Em
2016-05-01
We present a computational framework for model inversion based on multi-fidelity information fusion and Bayesian optimization. The proposed methodology targets the accurate construction of response surfaces in parameter space and the efficient identification of global optima while keeping the number of expensive function evaluations at a minimum. We train families of correlated surrogates on available data using Gaussian processes and auto-regressive stochastic schemes, and exploit the resulting predictive posterior distributions within a Bayesian optimization setting. This enables a smart adaptive sampling procedure that uses the predictive posterior variance to balance the exploration versus exploitation trade-off, and is a key enabler for practical computations under limited budgets. The effectiveness of the proposed framework is tested on three parameter estimation problems. The first two involve the calibration of outflow boundary conditions of blood flow simulations in arterial bifurcations using multi-fidelity realizations of one- and three-dimensional models, whereas the last one aims to identify the forcing term that generated a particular solution to an elliptic partial differential equation. © 2016 The Author(s).
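For intuition, here is a generic single-fidelity Bayesian optimization loop in Python, using a Gaussian process surrogate and the expected improvement acquisition; it is a simplified stand-in for the paper's multi-fidelity scheme, and the objective f and all settings are illustrative.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

f = lambda x: np.sin(6 * x) + 0.1 * x        # expensive black box (stand-in)

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, 4).reshape(-1, 1)      # small initial design
y = f(X).ravel()
grid = np.linspace(0, 1, 400).reshape(-1, 1)

for _ in range(10):                          # sequential design loop
    gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-6,
                                  normalize_y=True).fit(X, y)
    mu, sd = gp.predict(grid, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sd, 1e-12)
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
    x_next = grid[int(np.argmax(ei))]        # explore/exploit balance via sd
    X = np.vstack([X, [x_next]])
    y = np.append(y, f(x_next))

x_star = X[int(np.argmin(y))]                # incumbent optimum
```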
Screening for SNPs with Allele-Specific Methylation based on Next-Generation Sequencing Data.
Hu, Bo; Ji, Yuan; Xu, Yaomin; Ting, Angela H
2013-05-01
Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multiple subjects, leading to a posterior probability of ASM. We flag SNPs with high posterior probabilities of ASM by accounting for multiple comparisons based on posterior false discovery rates. Applying the Bayesian approach to the in-house prostate cell line data, we identify 269 SNPs as candidates of ASM. A simulation study is carried out to demonstrate the quantitative performance of the proposed approach.
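The multiple-comparison step can be sketched compactly: given posterior probabilities of ASM, the posterior false discovery rate of the top-k set is the mean of (1 - p) over that set, and one keeps the largest set below the target level. The function below is an illustrative reconstruction, not the authors' code.

```python
import numpy as np

def flag_by_posterior_fdr(post_prob, alpha=0.05):
    """Flag SNPs while controlling the posterior false discovery rate.

    Sort by posterior probability of ASM; the FDR of the top-k set is the
    mean of (1 - p) over that set. Keep the largest set with FDR <= alpha.
    """
    p = np.asarray(post_prob, float)
    order = np.argsort(-p)
    fdr = np.cumsum(1.0 - p[order]) / np.arange(1, len(p) + 1)
    k = int(np.max(np.where(fdr <= alpha)[0]) + 1) if np.any(fdr <= alpha) else 0
    flags = np.zeros(len(p), bool)
    flags[order[:k]] = True
    return flags

print(flag_by_posterior_fdr([0.99, 0.97, 0.80, 0.30], alpha=0.05))
```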
The genetic diversity of hepatitis A genotype I in Bulgaria
Cella, Eleonora; Golkocheva-Markova, Elitsa N.; Trandeva-Bankova, Diljana; Gregori, Giulia; Bruni, Roberto; Taffon, Stefania; Equestre, Michele; Costantino, Angela; Spoto, Silvia; Curtis, Melissa; Ciccaglione, Anna Rita; Ciccozzi, Massimo; Angeletti, Silvia
2018-01-01
The purpose of this study was to analyze sequences of hepatitis A virus (HAV) Ia and Ib genotypes from Bulgarian patients to investigate the molecular epidemiology of HAV genotype I during the years 2012 to 2014. Around 105 serum samples were collected by the Department of Virology of the National Center of Infectious and Parasitic Diseases in Bulgaria; 103 of these samples yielded sequences. The sequenced region encompassed the VP1/2A region of the HAV genome. For the phylogenetic analyses, 5 datasets were built to investigate the viral gene flow into and out of distinct HAV subpopulations in different geographic areas and to build a Bayesian dated tree; Bayesian phylogenetic and migration pattern analyses were performed. HAV Ib Bulgarian sequences mostly grouped into a single clade. This indicates that the Bulgarian epidemic is partially compartmentalized. It originated from a limited number of viruses and then spread through fecal-oral local transmission. HAV Ia Bulgarian sequences were intermixed with European sequences, suggesting that an Ia epidemic is not restricted to Bulgaria but can affect other European countries. The time-scaled phylogeny reconstruction showed the root of the tree dating in 2008 for genotype Ib and in 1999 for genotype Ia with a second epidemic entrance in 2003. The Bayesian skyline plot for genotype Ib showed a slow but continuous growth, sustained by fecal-oral route transmission. For genotype Ia, there was an exponential growth followed by a plateau, which suggests better infection control. Bidirectional viral flow for the Ib genotype, involving different Bulgarian areas, was observed, whereas a unidirectional flow from Sofia to Ihtiman for genotype Ia was highlighted, suggesting the fecal-oral transmission route for Ia. PMID:29504993
Predicting ICU mortality: a comparison of stationary and nonstationary temporal models.
Kayaalp, M.; Cooper, G. F.; Clermont, G.
2000-01-01
OBJECTIVE: This study evaluates the effectiveness of the stationarity assumption in predicting the mortality of intensive care unit (ICU) patients at the ICU discharge. DESIGN: This is a comparative study. A stationary temporal Bayesian network learned from data was compared to a set of (33) nonstationary temporal Bayesian networks learned from data. A process observed as a sequence of events is stationary if its stochastic properties stay the same when the sequence is shifted in a positive or negative direction by a constant time parameter. The temporal Bayesian networks forecast mortalities of patients, where each patient has one record per day. The predictive performance of the stationary model is compared with nonstationary models using the area under the receiver operating characteristics (ROC) curves. RESULTS: The stationary model usually performed best. However, one nonstationary model using large data sets performed significantly better than the stationary model. CONCLUSION: Results suggest that using a combination of stationary and nonstationary models may predict better than using either alone. PMID:11079917
Spatio-Temporal History of HIV-1 CRF35_AD in Afghanistan and Iran.
Eybpoosh, Sana; Bahrampour, Abbas; Karamouzian, Mohammad; Azadmanesh, Kayhan; Jahanbakhsh, Fatemeh; Mostafavi, Ehsan; Zolala, Farzaneh; Haghdoost, Ali Akbar
2016-01-01
HIV-1 Circulating Recombinant Form 35_AD (CRF35_AD) has an important position in the epidemiological profile of Afghanistan and Iran. Despite the presence of this clade in Afghanistan and Iran for over a decade, our understanding of its origin and dissemination patterns is limited. In this study, we performed a Bayesian phylogeographic analysis to reconstruct the spatio-temporal dispersion pattern of this clade using eligible CRF35_AD gag and pol sequences available in the Los Alamos HIV database (432 sequences available from Iran, 16 sequences available from Afghanistan, and a single CRF35_AD-like pol sequence available from USA). Bayesian Markov Chain Monte Carlo algorithm was implemented in BEAST v1.8.1. Between-country dispersion rates were tested with Bayesian stochastic search variable selection method and were considered significant where Bayes factor values were greater than three. The findings suggested that CRF35_AD sequences were genetically similar to parental sequences from Kenya and Uganda, and to a set of subtype A1 sequences available from Afghan refugees living in Pakistan. Our results also showed that across all phylogenies, Afghan and Iranian CRF35_AD sequences formed a monophyletic cluster (posterior clade credibility > 0.7). The divergence date of this cluster was estimated to be between 1990 and 1992. Within this cluster, a bidirectional dispersion of the virus was observed across Afghanistan and Iran. We could not clearly identify if Afghanistan or Iran first established or received this epidemic, as the root location of this cluster could not be robustly estimated. Three CRF35_AD sequences from Afghan refugees living in Pakistan nested among Afghan and Iranian CRF35_AD branches. However, the CRF35_AD-like sequence available from USA diverged independently from Kenyan subtype A1 sequences, suggesting it not to be a true CRF35_AD lineage. Potential factors contributing to viral exchange between Afghanistan and Iran could be injection drug networks and mass migration of Afghan refugees and labourers to Iran, which calls for extensive preventive efforts.
Algama, Manjula; Tasker, Edward; Williams, Caitlin; Parslow, Adam C; Bryson-Richardson, Robert J; Keith, Jonathan M
2017-03-27
Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences. We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focused elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors. This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences.
Dumont, Cyrielle; Lestini, Giulia; Le Nagard, Hervé; Mentré, France; Comets, Emmanuelle; Nguyen, Thu Thuy; for the PFIM Group
2018-03-01
Nonlinear mixed-effect models (NLMEMs) are increasingly used for the analysis of longitudinal studies during drug development. When designing these studies, the expected Fisher information matrix (FIM) can be used instead of performing time-consuming clinical trial simulations. The function PFIM is the first tool for design evaluation and optimization that has been developed in R. In this article, we present an extended version, PFIM 4.0, which includes several new features. Compared with version 3.0, PFIM 4.0 includes a more complete pharmacokinetic/pharmacodynamic library of models and accommodates models including additional random effects for inter-occasion variability as well as discrete covariates. A new input method has been added to specify user-defined models through an R function. Optimization can be performed assuming some fixed parameters or some fixed sampling times. New outputs have been added regarding the FIM such as eigenvalues, conditional numbers, and the option of saving the matrix obtained after evaluation or optimization. Previously obtained results, which are summarized in a FIM, can be taken into account in evaluation or optimization of one-group protocols. This feature enables the use of PFIM for adaptive designs. The Bayesian individual FIM has been implemented, taking into account a priori distribution of random effects. Designs for maximum a posteriori Bayesian estimation of individual parameters can now be evaluated or optimized and the predicted shrinkage is also reported. It is also possible to visualize the graphs of the model and the sensitivity functions without performing evaluation or optimization. The usefulness of these approaches and the simplicity of use of PFIM 4.0 are illustrated by two examples: (i) an example of designing a population pharmacokinetic study accounting for previous results, which highlights the advantage of adaptive designs; (ii) an example of Bayesian individual design optimization for a pharmacodynamic study, showing that the Bayesian individual FIM can be a useful tool in therapeutic drug monitoring, allowing efficient prediction of estimation precision and shrinkage for individual parameters. PFIM 4.0 is a useful tool for design evaluation and optimization of longitudinal studies in pharmacometrics and is freely available at http://www.pfim.biostat.fr. Copyright © 2018 Elsevier B.V. All rights reserved.
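A minimal sketch of the core computation PFIM automates, transplanted here to Python (PFIM itself is an R tool): the expected FIM for fixed effects under additive Gaussian noise, built from a finite-difference sensitivity matrix, with the D-criterion used to compare candidate sampling designs. The one-compartment model and all parameter values are illustrative.

```python
import numpy as np

def one_compartment(t, ka, ke, dose=100.0, v=10.0):
    """Oral one-compartment concentration profile (first-order in/out)."""
    return dose * ka / (v * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

def fim(times, theta, sigma=0.5, h=1e-6):
    """Expected Fisher information for fixed effects under additive noise.

    FIM = S^T S / sigma^2, with S the sensitivity matrix df/dtheta computed
    by central finite differences at the candidate sampling times.
    """
    t = np.asarray(times, float)
    S = np.empty((len(t), len(theta)))
    for j in range(len(theta)):
        up, lo = np.array(theta, float), np.array(theta, float)
        up[j] += h
        lo[j] -= h
        S[:, j] = (one_compartment(t, *up) - one_compartment(t, *lo)) / (2 * h)
    return S.T @ S / sigma**2

# Compare two candidate designs by the D-criterion (log-determinant).
theta = (1.2, 0.15)                    # (ka, ke), illustrative values
for design in ([0.5, 1, 2, 6, 12, 24], [1, 2, 3, 4, 5, 6]):
    sign, logdet = np.linalg.slogdet(fim(design, theta))
    print(design, logdet)
```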
Bayesian models: A statistical primer for ecologists
Hobbs, N. Thompson; Hooten, Mevin B.
2015-01-01
Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods, in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals. This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management. It presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticians; covers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and more; deemphasizes computer coding in favor of basic principles; and explains how to write out properly factored statistical expressions representing Bayesian models.
Abdul-Latiff, Muhammad Abu Bakar; Ruslin, Farhani; Fui, Vun Vui; Abu, Mohd-Hashim; Rovie-Ryan, Jeffrine Japning; Abdul-Patah, Pazil; Lakim, Maklarin; Roos, Christian; Yaakop, Salmah; Md-Zain, Badrul Munir
2014-01-01
Phylogenetic relationships among Malaysia’s long-tailed macaques have yet to be established, despite abundant genetic studies of the species worldwide. The aims of this study are to examine the phylogenetic relationships of Macaca fascicularis in Malaysia and to test its classification as a morphological subspecies. A total of 25 genetic samples of M. fascicularis yielding 383 bp of Cytochrome b (Cyt b) sequences were used in phylogenetic analysis along with one sample each of M. nemestrina and M. arctoides used as outgroups. Sequence character analysis reveals that Cyt b locus is a highly conserved region with only 23% parsimony informative character detected among ingroups. Further analysis indicates a clear separation between populations originating from different regions; the Malay Peninsula versus Borneo Insular, the East Coast versus West Coast of the Malay Peninsula, and the island versus mainland Malay Peninsula populations. Phylogenetic trees (NJ, MP and Bayesian) portray a consistent clustering paradigm as Borneo’s population was distinguished from Peninsula’s population (99% and 100% bootstrap value in NJ and MP respectively and 1.00 posterior probability in Bayesian trees). The East coast population was separated from other Peninsula populations (64% in NJ, 66% in MP and 0.53 posterior probability in Bayesian). West coast populations were divided into 2 clades: the North-South (47%/54% in NJ, 26%/26% in MP and 1.00/0.80 posterior probability in Bayesian) and Island-Mainland (93% in NJ, 90% in MP and 1.00 posterior probability in Bayesian). The results confirm the previous morphological assignment of 2 subspecies, M. f. fascicularis and M. f. argentimembris, in the Malay Peninsula. These populations should be treated as separate genetic entities in order to conserve the genetic diversity of Malaysia’s M. fascicularis. These findings are crucial in aiding the conservation management and translocation process of M. fascicularis populations in Malaysia. PMID:24899832
Bayesian cloud detection for MERIS, AATSR, and their combination
NASA Astrophysics Data System (ADS)
Hollstein, A.; Fischer, J.; Carbajal Henken, C.; Preusker, R.
2014-11-01
A broad range of different Bayesian cloud detection schemes is applied to measurements from the Medium Resolution Imaging Spectrometer (MERIS), the Advanced Along-Track Scanning Radiometer (AATSR), and their combination. The cloud masks were designed to be numerically efficient and suited for the processing of large amounts of data. Results from the classical and naive approach to Bayesian cloud masking are discussed for MERIS and AATSR as well as for their combination. A sensitivity study on the resolution of multidimensional histograms, which were post-processed by Gaussian smoothing, shows how theoretically insufficient amounts of truth data can be used to set up accurate classical Bayesian cloud masks. Sets of exploited features from single and derived channels are numerically optimized and results for naive and classical Bayesian cloud masks are presented. The application of the Bayesian approach is discussed in terms of reproducing existing algorithms, enhancing existing algorithms, increasing the robustness of existing algorithms, and setting up new classification schemes based on manually classified scenes.
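A naive Bayesian mask of the kind described can be sketched in a few lines: per-feature class-conditional histograms are fitted from classified scenes and combined under the feature-independence assumption. The functions and the synthetic training data below are illustrative, not the study's implementation.

```python
import numpy as np

def fit_naive_bayes(features, labels, bins=32, span=(0.0, 1.0)):
    """Per-feature class-conditional histograms for a naive Bayesian mask."""
    hists = []
    for f in features.T:
        h = [np.histogram(f[labels == c], bins=bins, range=span, density=True)[0]
             + 1e-6 for c in (0, 1)]          # small floor avoids log(0)
        hists.append(np.log(h))
    edges = np.linspace(span[0], span[1], bins + 1)
    log_prior = np.log([np.mean(labels == 0), np.mean(labels == 1)])
    return hists, edges, log_prior

def cloud_probability(features, model):
    """P(cloud | features) under the naive (feature-independence) assumption."""
    hists, edges, log_prior = model
    logp = np.tile(log_prior, (features.shape[0], 1))
    for j, f in enumerate(features.T):
        idx = np.clip(np.searchsorted(edges, f) - 1, 0, len(edges) - 2)
        logp += hists[j][:, idx].T
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True))[:, 1]

# Synthetic "manually classified scenes": cloudy pixels look brighter here.
rg = np.random.default_rng(4)
labels = (rg.uniform(size=2000) < 0.4).astype(int)
feats = np.clip(rg.normal(0.3 + 0.3 * labels[:, None], 0.15, (2000, 2)), 0, 1)
p_cloud = cloud_probability(feats, fit_naive_bayes(feats, labels))
```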
Bayesian cloud detection for MERIS, AATSR, and their combination
NASA Astrophysics Data System (ADS)
Hollstein, A.; Fischer, J.; Carbajal Henken, C.; Preusker, R.
2015-04-01
A broad range of different Bayesian cloud detection schemes is applied to measurements from the Medium Resolution Imaging Spectrometer (MERIS), the Advanced Along-Track Scanning Radiometer (AATSR), and their combination. The cloud detection schemes were designed to be numerically efficient and suited for the processing of large amounts of data. Results from the classical and naive approach to Bayesian cloud masking are discussed for MERIS and AATSR as well as for their combination. A sensitivity study on the resolution of multidimensional histograms, which were post-processed by Gaussian smoothing, shows how theoretically insufficient amounts of truth data can be used to set up accurate classical Bayesian cloud masks. Sets of exploited features from single and derived channels are numerically optimized and results for naive and classical Bayesian cloud masks are presented. The application of the Bayesian approach is discussed in terms of reproducing existing algorithms, enhancing existing algorithms, increasing the robustness of existing algorithms, and setting up new classification schemes based on manually classified scenes.
Bayesian modeling of flexible cognitive control
Jiang, Jiefeng; Heller, Katherine; Egner, Tobias
2014-01-01
“Cognitive control” describes endogenous guidance of behavior in situations where routine stimulus-response associations are suboptimal for achieving a desired goal. The computational and neural mechanisms underlying this capacity remain poorly understood. We examine recent advances stemming from the application of a Bayesian learner perspective that provides optimal prediction for control processes. In reviewing the application of Bayesian models to cognitive control, we note that an important limitation in current models is a lack of a plausible mechanism for the flexible adjustment of control over conflict levels changing at varying temporal scales. We then show that flexible cognitive control can be achieved by a Bayesian model with a volatility-driven learning mechanism that modulates dynamically the relative dependence on recent and remote experiences in its prediction of future control demand. We conclude that the emergent Bayesian perspective on computational mechanisms of cognitive control holds considerable promise, especially if future studies can identify neural substrates of the variables encoded by these models, and determine the nature (Bayesian or otherwise) of their neural implementation. PMID:24929218
Bayesian estimation inherent in a Mexican-hat-type neural network
NASA Astrophysics Data System (ADS)
Takiyama, Ken
2016-05-01
Brain functions, such as perception, motor control and learning, and decision making, have been explained based on a Bayesian framework, i.e., to decrease the effects of noise inherent in the human nervous system or external environment, our brain integrates sensory and a priori information in a Bayesian optimal manner. However, it remains unclear how Bayesian computations are implemented in the brain. Herein, I address this issue by analyzing a Mexican-hat-type neural network, which was used as a model of the visual cortex, motor cortex, and prefrontal cortex. I analytically demonstrate that the dynamics of an order parameter in the model corresponds exactly to a variational inference of a linear Gaussian state-space model, a Bayesian estimation, when the strength of recurrent synaptic connectivity is appropriately stronger than that of an external stimulus, a plausible condition in the brain. This exact correspondence can reveal the relationship between the parameters in the Bayesian estimation and those in the neural network, providing insight for understanding brain functions.
Zhang, J L; Li, Y P; Huang, G H; Baetz, B W; Liu, J
2017-06-01
In this study, a Bayesian estimation-based simulation-optimization modeling approach (BESMA) is developed for identifying effluent trading strategies. BESMA incorporates nutrient fate modeling with soil and water assessment tool (SWAT), Bayesian estimation, and probabilistic-possibilistic interval programming with fuzzy random coefficients (PPI-FRC) within a general framework. Based on the water quality protocols provided by SWAT, posterior distributions of parameters can be analyzed through Bayesian estimation; stochastic characteristic of nutrient loading can be investigated which provides the inputs for the decision making. PPI-FRC can address multiple uncertainties in the form of intervals with fuzzy random boundaries and the associated system risk through incorporating the concept of possibility and necessity measures. The possibility and necessity measures are suitable for optimistic and pessimistic decision making, respectively. BESMA is applied to a real case of effluent trading planning in the Xiangxihe watershed, China. A number of decision alternatives can be obtained under different trading ratios and treatment rates. The results can not only facilitate identification of optimal effluent-trading schemes, but also gain insight into the effects of trading ratio and treatment rate on decision making. The results also reveal that decision maker's preference towards risk would affect decision alternatives on trading scheme as well as system benefit. Compared with the conventional optimization methods, it is proved that BESMA is advantageous in (i) dealing with multiple uncertainties associated with randomness and fuzziness in effluent-trading planning within a multi-source, multi-reach and multi-period context; (ii) reflecting uncertainties existing in nutrient transport behaviors to improve the accuracy in water quality prediction; and (iii) supporting pessimistic and optimistic decision making for effluent trading as well as promoting diversity of decision alternatives. Copyright © 2017 Elsevier Ltd. All rights reserved.
Bayesian Recurrent Neural Network for Language Modeling.
Chien, Jen-Tzung; Ku, Yuan-Chu
2016-02-01
A language model (LM) assigns a probability to a word sequence and thereby provides the solution to word prediction in a variety of information systems. A recurrent neural network (RNN) is powerful for learning the large-span dynamics of a word sequence in continuous space. However, training an RNN-LM is an ill-posed problem because of the many parameters arising from a large dictionary size and a high-dimensional hidden layer. This paper presents a Bayesian approach to regularize the RNN-LM and applies it to continuous speech recognition. We aim to penalize an overly complicated RNN-LM by compensating for the uncertainty of the estimated model parameters, which is represented by a Gaussian prior. The objective function in a Bayesian classification network is formed as the regularized cross-entropy error function. The regularized model is constructed not only by calculating the regularized parameters according to the maximum a posteriori criterion but also by estimating the Gaussian hyperparameter by maximizing the marginal likelihood. A rapid approximation to the Hessian matrix is developed to implement the Bayesian RNN-LM (BRNN-LM) by selecting a small set of salient outer-products. The proposed BRNN-LM achieves a sparser model than the RNN-LM. Experiments on different corpora show the robustness of system performance by applying the rapid BRNN-LM under different conditions.
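The MAP view of this regularization is easy to state in code: a zero-mean Gaussian prior on the weights adds a quadratic penalty to the cross-entropy error. The sketch below shows only that objective (hyperparameter estimation by marginal likelihood and the Hessian approximation are not shown); the shapes and names are illustrative.

```python
import numpy as np

def map_loss(logits, targets, weights, prior_var=10.0):
    """Regularized cross-entropy of a Bayesian network LM (MAP view).

    A zero-mean Gaussian prior on the parameters adds the penalty
    ||w||^2 / (2*prior_var) to the cross-entropy error, i.e. weight decay
    with strength set by the prior (hyper)parameter.
    """
    z = logits - logits.max(axis=1, keepdims=True)          # stable softmax
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -logp[np.arange(len(targets)), targets].mean()
    penalty = sum((w ** 2).sum() for w in weights) / (2 * prior_var)
    return nll + penalty

rng = np.random.default_rng(5)
logits = rng.standard_normal((8, 100))                      # 8 steps, 100 words
loss = map_loss(logits, rng.integers(0, 100, 8), [rng.standard_normal((100, 32))])
```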
Le Bras, Ronan J; Kuzma, Heidi; Sucic, Victor; Bokelmann, Götz
2016-05-01
A notable sequence of calls, spanning several days in January 2003, was encountered in the central part of the Indian Ocean on a hydrophone triplet recording acoustic data at a 250 Hz sampling rate. This paper presents signal processing methods applied to the waveform data to detect and group the recorded signals and to extract amplitude and bearing estimates for them. An approximate location for the source of the sequence of calls is inferred from the extracted features. As the source approaches the hydrophone triplet, the source level (SL) of the calls is estimated at 187 ± 6 dB re: 1 μPa-1 m in the 15-60 Hz frequency range. The calls are attributed to a subgroup of blue whales, Balaenoptera musculus, with a characteristic acoustic signature. A Bayesian location method using probabilistic models for bearing and amplitude is demonstrated on the call sequence. The method is applied to the case of detection at a single triad of hydrophones and results in a probability distribution map for the origin of the calls. It can be extended to detections at multiple triads, and because of the Bayesian formulation, additional modeling complexity can be built in as needed.
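A toy version of such a Bayesian location map, with Gaussian likelihoods on bearing and received level and simple spherical spreading loss; the source level is taken from the abstract, but the geometry, noise levels, and grid are illustrative assumptions.

```python
import numpy as np

def location_posterior(grid_xy, array_xy, bearing_deg, received_db,
                       sl_db=187.0, sig_bearing=5.0, sig_db=6.0):
    """Posterior map over candidate source positions from one hydrophone triad.

    Gaussian likelihoods on the measured bearing and on the received level,
    with spherical transmission loss 20*log10(range). A flat spatial prior is
    assumed; the numbers here are illustrative, not the paper's calibration.
    """
    dx = grid_xy[:, 0] - array_xy[0]
    dy = grid_xy[:, 1] - array_xy[1]
    r = np.hypot(dx, dy)
    pred_bearing = np.degrees(np.arctan2(dx, dy))           # clockwise from north
    dbear = (pred_bearing - bearing_deg + 180) % 360 - 180  # wrap to [-180, 180)
    pred_rl = sl_db - 20 * np.log10(np.maximum(r, 1.0))
    logpost = (-0.5 * (dbear / sig_bearing) ** 2
               - 0.5 * ((pred_rl - received_db) / sig_db) ** 2)
    p = np.exp(logpost - logpost.max())
    return p / p.sum()

# 100 km x 100 km grid (metres), array at the origin.
g = np.stack(np.meshgrid(np.linspace(-5e4, 5e4, 200),
                         np.linspace(-5e4, 5e4, 200)), -1).reshape(-1, 2)
post = location_posterior(g, (0.0, 0.0), bearing_deg=40.0, received_db=120.0)
```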
NASA Astrophysics Data System (ADS)
Swinburne, Thomas D.; Perez, Danny
2018-05-01
A massively parallel method to build large transition rate matrices from temperature-accelerated molecular dynamics trajectories is presented. Bayesian Markov model analysis is used to estimate the expected residence time in the known state space, providing crucial uncertainty quantification for higher-scale simulation schemes such as kinetic Monte Carlo or cluster dynamics. The estimators are additionally used to optimize where exploration is performed and the degree of temperature acceleration on the fly, giving an autonomous, optimal procedure to explore the state space of complex systems. The method is tested against exactly solvable models and used to explore the dynamics of C15 interstitial defects in iron. Our uncertainty quantification scheme allows for accurate modeling of the evolution of these defects over timescales of several seconds.
Montoya-Ruiz, Carolina; Cajimat, Maria N B; Milazzo, Mary Louise; Diaz, Francisco J; Rodas, Juan David; Valbuena, Gustavo; Fulhorst, Charles F
2015-07-01
The results of a previous study suggested that Cherrie's cane rat (Zygodontomys cherriei) is the principal host of Necoclí virus (family Bunyaviridae, genus Hantavirus) in Colombia. Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences in this study confirmed that Necoclí virus is phylogenetically closely related to Maporal virus, which is principally associated with the delicate pygmy rice rat (Oligoryzomys delicatus) in western Venezuela. In pairwise comparisons, nonidentities between the complete amino acid sequence of the nucleocapsid protein of Necoclí virus and the complete amino acid sequences of the nucleocapsid proteins of other hantaviruses were ≥8.7%. Likewise, nonidentities between the complete amino acid sequence of the glycoprotein precursor of Necoclí virus and the complete amino acid sequences of the glycoprotein precursors of other hantaviruses were ≥11.7%. Collectively, the unique association of Necoclí virus with Z. cherriei in Colombia, results of the Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences, and results of the pairwise comparisons of amino acid sequences strongly support the notion that Necoclí virus represents a novel species in the genus Hantavirus. Further work is needed to determine whether Calabazo virus (a hantavirus associated with Z. brevicauda cherriei in Panama) and Necoclí virus are conspecific.
Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun
2017-01-03
Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low-level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5% and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.
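A minimal sketch of the two ideas this abstract combines: collapsing reads that share a molecular barcode into one consensus observation per original DNA molecule, then a Bayesian comparison of a low-fraction variant hypothesis against an error-only hypothesis. This is an illustrative toy model, not smCounter's actual probabilistic model.

    import math
    from collections import Counter

    def barcode_consensus(reads_by_barcode):
        # Majority-vote base within each barcode family: one consensus
        # observation per original DNA molecule.
        return [Counter(r).most_common(1)[0][0]
                for r in reads_by_barcode.values()]

    def log_odds_variant(consensus_bases, alt, f=0.01, p_err=1e-3, prior=1e-4):
        # Toy posterior log-odds: variant at allele fraction f versus
        # barcode-level errors at rate p_err (binomial likelihoods; the
        # binomial coefficient cancels in the ratio).
        n = len(consensus_bases)
        k = sum(b == alt for b in consensus_bases)

        def ll(p):
            return k * math.log(p) + (n - k) * math.log(1.0 - p)

        return (ll(f) + math.log(prior)) - (ll(p_err) + math.log(1.0 - prior))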
Causal gene identification using combinatorial V-structure search.
Cai, Ruichu; Zhang, Zhenjie; Hao, Zhifeng
2013-07-01
With the advances of biomedical techniques in the last decade, the costs of human genomic sequencing and genomic activity monitoring are coming down rapidly. To support the huge genome-based business expected in the near future, researchers are eager to find killer applications based on human genome information. Causal gene identification is one of the most promising applications, which may help potential patients to estimate the risk of certain genetic diseases and locate the target gene for further genetic therapy. Unfortunately, existing pattern recognition techniques, such as Bayesian networks, cannot be directly applied to find accurate causal relationships between genes and diseases. This is mainly due to the insufficient number of samples and the extremely high dimensionality of the gene space. In this paper, we present the first practical solution to causal gene identification, utilizing a new combinatorial formulation over V-structures commonly used in conventional Bayesian networks, by exploring combinations of significant V-structures. We prove the NP-hardness of the combinatorial search problem under general settings of the significance measure on the V-structures, and present a greedy algorithm to find sub-optimal results. Extensive experiments show that our proposal is both scalable and effective, particularly with interesting findings on the causal genes over real human genome data. Copyright © 2013 Elsevier Ltd. All rights reserved.
Posada, David; Buckley, Thomas R
2004-10-01
Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001).
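For readers unfamiliar with AIC-based model averaging, the Akaike weights it relies on follow directly from each model's maximized log-likelihood and parameter count; a minimal sketch:

    import numpy as np

    def akaike_weights(log_likelihoods, n_params):
        # AIC = -2 lnL + 2K; weights are normalized exp(-delta_AIC / 2).
        lnL = np.asarray(log_likelihoods, dtype=float)
        K = np.asarray(n_params, dtype=float)
        aic = -2.0 * lnL + 2.0 * K
        delta = aic - aic.min()
        w = np.exp(-0.5 * delta)
        return w / w.sum()

    # Model-averaged estimate of a parameter shared across candidate models:
    # theta_avg = np.sum(akaike_weights(lnLs, Ks) * thetas)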
Error-based analysis of optimal tuning functions explains phenomena observed in sensory neurons.
Yaeli, Steve; Meir, Ron
2010-01-01
Biological systems display impressive capabilities in effectively responding to environmental signals in real time. There is increasing evidence that organisms may indeed be employing near optimal Bayesian calculations in their decision-making. An intriguing question relates to the properties of optimal encoding methods, namely determining the properties of neural populations in sensory layers that optimize performance, subject to physiological constraints. Within an ecological theory of neural encoding/decoding, we show that optimal Bayesian performance requires neural adaptation which reflects environmental changes. Specifically, we predict that neuronal tuning functions possess an optimal width, which increases with prior uncertainty and environmental noise, and decreases with the decoding time window. Furthermore, even for static stimuli, we demonstrate that dynamic sensory tuning functions, acting at relatively short time scales, lead to improved performance. Interestingly, the narrowing of tuning functions as a function of time was recently observed in several biological systems. Such results set the stage for a functional theory which may explain the high reliability of sensory systems, and the utility of neuronal adaptation occurring at multiple time scales.
Defining the Estimated Core Genome of Bacterial Populations Using a Bayesian Decision Model
van Tonder, Andries J.; Mistry, Shilan; Bray, James E.; Hill, Dorothea M. C.; Cody, Alison J.; Farmer, Chris L.; Klugman, Keith P.; von Gottberg, Anne; Bentley, Stephen D.; Parkhill, Julian; Jolley, Keith A.; Maiden, Martin C. J.; Brueggemann, Angela B.
2014-01-01
The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually-informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study, e.g. 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model of estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the lack of knowledge currently available for many core genes in bacterial genomes of significant global public health importance. PMID:25144616
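The central sequence computation here, the pairwise p-distance summarized per core gene, is straightforward; a minimal sketch with simplified gap handling:

    import numpy as np

    def p_distance(seq1, seq2):
        # Proportion of differing sites between two aligned sequences,
        # ignoring positions with an alignment gap in either sequence.
        pairs = [(a, b) for a, b in zip(seq1, seq2) if a != '-' and b != '-']
        if not pairs:
            return float('nan')
        return sum(a != b for a, b in pairs) / len(pairs)

    def median_p_distance(aligned_seqs):
        # Median pairwise p-distance for one core gene across genomes.
        d = [p_distance(s, t)
             for i, s in enumerate(aligned_seqs)
             for t in aligned_seqs[i + 1:]]
        return float(np.median(d))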
Yu, Yi-Kuo; Capra, John A.; Stojmirović, Aleksandar; Landsman, David; Altschul, Stephen F.
2015-01-01
Motivation: DNA and protein patterns are usefully represented by sequence logos. However, the methods for logo generation in common use lack a proper statistical basis, and are non-optimal for recognizing functionally relevant alignment columns. Results: We redefine the information at a logo position as a per-observation multiple alignment log-odds score. Such scores are positive or negative, depending on whether a column’s observations are better explained as arising from relatedness or chance. Within this framework, we propose distinct normalized maximum likelihood and Bayesian measures of column information. We illustrate these measures on High Mobility Group B (HMGB) box proteins and a dataset of enzyme alignments. Particularly in the context of protein alignments, our measures improve the discrimination of biologically relevant positions. Availability and implementation: Our new measures are implemented in an open-source Web-based logo generation program, which is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/logoddslogo/index.html. A stand-alone version of the program is also available from this site. Contact: altschul@ncbi.nlm.nih.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25294922
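A sketch of the basic per-observation log-odds quantity that the proposed measures refine; the pseudocount estimate of the column frequencies below is an assumption, standing in for the paper's normalized maximum likelihood and Bayesian estimators:

    import math
    from collections import Counter

    def column_log_odds(column, background, pseudo=0.5):
        # Per-observation log-odds (nats) for one alignment column: positive
        # when the observations are better explained by relatedness than by
        # the background model. Gap handling omitted for brevity.
        counts = Counter(column)
        n = sum(counts.get(a, 0) for a in background)
        score = 0.0
        for a, p in background.items():
            c = counts.get(a, 0)
            q = (c + pseudo) / (n + pseudo * len(background))
            score += c * math.log(q / p)
        return score / n

    # column_log_odds("LLLLIIL", {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"})
    # returns a positive score for this strongly conserved column.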
A Bayesian nonparametric method for prediction in EST analysis
Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor
2007-01-01
Background Expressed sequence tag (EST) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed with sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt incorporates the available information into prediction in a statistically rigorous way. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445
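For contrast with the Bayesian nonparametric estimators, the classical frequentist counterparts alluded to (which become unstable for large future samples) are easy to state; a sketch of the Good-Turing coverage estimate and the Good-Toulmin prediction of new genes:

    from collections import Counter

    def good_turing_coverage(gene_counts):
        # Coverage estimate 1 - n1/n, where n1 is the number of genes seen
        # exactly once among n reads. gene_counts maps gene -> read count.
        counts = list(gene_counts.values())
        n = sum(counts)
        n1 = sum(c == 1 for c in counts)
        return 1.0 - n1 / n

    def good_toulmin_new_genes(gene_counts, m):
        # Expected new genes in a future sample of size m; the alternating
        # series becomes unstable when m is much larger than n, which is
        # precisely the weakness the Bayesian approach addresses.
        n = sum(gene_counts.values())
        freq_of_freq = Counter(gene_counts.values())
        t = m / n
        return sum(((-1) ** (r + 1)) * (t ** r) * nr
                   for r, nr in freq_of_freq.items())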
Khan, Haseeb A; Arif, Ibrahim A; Bahkali, Ali H; Al Farhan, Ahmad H; Al Homaidan, Ali A
2008-10-06
This investigation aimed to compare the inference of antelope phylogenies resulting from the 16S rRNA, cytochrome-b (cyt-b) and d-loop segments of mitochondrial DNA using three different computational models: Bayesian (BA), maximum parsimony (MP) and the unweighted pair group method with arithmetic mean (UPGMA). The respective nucleotide sequences of three Oryx species (Oryx leucoryx, Oryx dammah and Oryx gazella) and an out-group (Addax nasomaculatus) were aligned and subjected to the BA, MP and UPGMA models for comparing the topologies of the respective phylogenetic trees. The 16S rRNA region possessed the highest frequency of conserved sequences (97.65%) followed by cyt-b (94.22%) and d-loop (87.29%). There were few transitions (2.35%) and no transversions in 16S rRNA, as compared to cyt-b (5.61% transitions and 0.17% transversions) and d-loop (11.57% transitions and 1.14% transversions), when comparing the four taxa. All three mitochondrial segments clearly differentiated the genus Addax from Oryx using the BA or UPGMA models. The topologies of all the gamma-corrected Bayesian trees were identical irrespective of the marker type. The UPGMA trees resulting from 16S rRNA and d-loop sequences were also identical (Oryx dammah grouped with Oryx leucoryx) to the Bayesian trees, except that the UPGMA tree based on cyt-b showed a slightly different phylogeny (Oryx dammah grouped with Oryx gazella) with low bootstrap support. However, the MP model failed to differentiate the genus Addax from Oryx. These findings demonstrate the efficiency and robustness of the BA and UPGMA methods for phylogenetic analysis of antelopes using mitochondrial markers.
A Two-Step Bayesian Approach for Propensity Score Analysis: Simulations and Case Study.
Kaplan, David; Chen, Jianshen
2012-07-01
A two-step Bayesian propensity score approach is introduced that incorporates prior information in the propensity score equation and outcome equation without the problems associated with simultaneous Bayesian propensity score approaches. The corresponding variance estimators are also provided. The two-step Bayesian propensity score is provided for three methods of implementation: propensity score stratification, weighting, and optimal full matching. Three simulation studies and one case study are presented to elaborate the proposed two-step Bayesian propensity score approach. Results of the simulation studies reveal that greater precision in the propensity score equation yields better recovery of the frequentist-based treatment effect. A slight advantage is shown for the Bayesian approach in small samples. Results also reveal that greater precision around the wrong treatment effect can lead to seriously distorted results. However, greater precision around the correct treatment effect parameter yields quite good results, with slight improvement seen with greater precision in the propensity score equation. A comparison of coverage rates for the conventional frequentist approach and proposed Bayesian approach is also provided. The case study reveals that credible intervals are wider than frequentist confidence intervals when priors are non-informative.
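A frequentist sketch of the weighting variant of the two-step idea, estimating propensity scores first and then forming an inverse-probability-weighted treatment effect; the paper replaces both steps with Bayesian estimation so that uncertainty in the propensity score equation is carried into the outcome stage:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def ipw_ate(X, treat, y):
        # Step 1: propensity scores from a logistic model of treatment.
        # Step 2: inverse-probability-weighted average treatment effect.
        treat = np.asarray(treat, dtype=float)
        y = np.asarray(y, dtype=float)
        ps = LogisticRegression(max_iter=1000).fit(X, treat).predict_proba(X)[:, 1]
        ps = np.clip(ps, 0.01, 0.99)            # guard against extreme weights
        w1, w0 = treat / ps, (1.0 - treat) / (1.0 - ps)
        return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)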
Reducing uncertainties in decadal variability of the global carbon budget with multiple datasets
Li, Wei; Ciais, Philippe; Wang, Yilong; Peng, Shushi; Broquet, Grégoire; Ballantyne, Ashley P.; Canadell, Josep G.; Cooper, Leila; Friedlingstein, Pierre; Le Quéré, Corinne; Myneni, Ranga B.; Peters, Glen P.; Piao, Shilong; Pongratz, Julia
2016-01-01
Conventional calculations of the global carbon budget infer the land sink as a residual between emissions, atmospheric accumulation, and the ocean sink. Thus, the land sink accumulates the errors from the other flux terms and bears the largest uncertainty. Here, we present a Bayesian fusion approach that combines multiple observations in different carbon reservoirs to optimize the land (B) and ocean (O) carbon sinks, land use change emissions (L), and indirectly fossil fuel emissions (F) from 1980 to 2014. Compared with the conventional approach, Bayesian optimization decreases the uncertainties in B by 41% and in O by 46%. The L uncertainty decreases by 47%, whereas the F uncertainty is marginally improved through the knowledge of natural fluxes. Both ocean and net land uptake (B + L) rates have positive trends of 29 ± 8 and 37 ± 17 Tg C⋅y⁻² since 1980, respectively. Our Bayesian fusion of multiple observations reduces uncertainties, thereby allowing us to isolate important variability in global carbon cycle processes. PMID:27799533
Exact calculation of distributions on integers, with application to sequence alignment.
Newberg, Lee A; Lawrence, Charles E
2009-01-01
Computational biology is replete with high-dimensional discrete prediction and inference problems. Dynamic programming recursions can be applied to several of the most important of these, including sequence alignment, RNA secondary-structure prediction, phylogenetic inference, and motif finding. In these problems, attention is frequently focused on some scalar quantity of interest, a score, such as an alignment score or the free energy of an RNA secondary structure. In many cases, score is naturally defined on integers, such as a count of the number of pairing differences between two sequence alignments, or else an integer score has been adopted for computational reasons, such as in the test of significance of motif scores. The probability distribution of the score under an appropriate probabilistic model is of interest, such as in tests of significance of motif scores, or in calculation of Bayesian confidence limits around an alignment. Here we present three algorithms for calculating the exact distribution of a score of this type; then, in the context of pairwise local sequence alignments, we apply the approach so as to find the alignment score distribution and Bayesian confidence limits.
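The simplest instance of such an exact calculation, the distribution of a total integer score that is a sum of independent per-position scores, is an iterated convolution; a minimal sketch:

    import numpy as np

    def exact_score_distribution(position_dists):
        # position_dists[i][s] = P(score s at position i), scores >= 0.
        # Returns dist with dist[s] = P(total score == s), computed exactly.
        dist = np.array([1.0])
        for p in position_dists:
            dist = np.convolve(dist, p)
        return dist

    # Two positions scoring 0 or 1 with equal probability:
    # exact_score_distribution([[0.5, 0.5], [0.5, 0.5]]) -> [0.25, 0.5, 0.25]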
Bayesian exploration for intelligent identification of textures.
Fishel, Jeremy A; Loeb, Gerald E
2012-01-01
In order to endow robots with human-like abilities to characterize and identify objects, they must be provided with tactile sensors and intelligent algorithms to select, control, and interpret data from useful exploratory movements. Humans make informed decisions on the sequence of exploratory movements that would yield the most information for the task, depending on what the object may be and prior knowledge of what to expect from possible exploratory movements. This study is focused on texture discrimination, a subset of a much larger group of exploratory movements and percepts that humans use to discriminate, characterize, and identify objects. Using a testbed equipped with a biologically inspired tactile sensor (the BioTac), we produced sliding movements similar to those that humans make when exploring textures. Measurement of tactile vibrations and reaction forces when exploring textures were used to extract measures of textural properties inspired from psychophysical literature (traction, roughness, and fineness). Different combinations of normal force and velocity were identified to be useful for each of these three properties. A total of 117 textures were explored with these three movements to create a database of prior experience to use for identifying these same textures in future encounters. When exploring a texture, the discrimination algorithm adaptively selects the optimal movement to make and property to measure based on previous experience to differentiate the texture from a set of plausible candidates, a process we call Bayesian exploration. Performance of 99.6% in correctly discriminating pairs of similar textures was found to exceed human capabilities. Absolute classification from the entire set of 117 textures generally required a small number of well-chosen exploratory movements (median = 5) and yielded a 95.4% success rate. The method of Bayesian exploration developed and tested in this paper may generalize well to other cognitive problems.
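A simplified sketch of the select-measure-update loop that the abstract calls Bayesian exploration, with a Gaussian measurement model per (texture, movement) pair standing in for the database of prior experience; the greedy discriminability criterion below is illustrative, not the authors' exact selection rule:

    import numpy as np

    def bayesian_exploration(models, measure, threshold=0.99, max_moves=10):
        # models[texture][movement] = (mean, std) of the measured property;
        # measure(movement) performs the movement and returns an observation.
        textures = list(models)
        movements = list(next(iter(models.values())))
        log_post = np.zeros(len(textures))          # uniform prior
        for _ in range(max_moves):
            post = np.exp(log_post - log_post.max())
            post /= post.sum()
            if post.max() >= threshold:
                break

            def spread(m):
                # Posterior-weighted spread of candidate means, scaled by
                # measurement noise: how well movement m separates candidates.
                mu = np.array([models[t][m][0] for t in textures])
                sd = np.array([models[t][m][1] for t in textures])
                return float(np.sum(post * ((mu - np.sum(post * mu)) / sd) ** 2))

            best = max(movements, key=spread)
            x = measure(best)
            for i, t in enumerate(textures):
                mean, std = models[t][best]
                log_post[i] += -0.5 * ((x - mean) / std) ** 2 - np.log(std)
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        return textures[int(np.argmax(post))], post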
MPN estimation of qPCR target sequence recoveries from whole cell calibrator samples.
Sivaganesan, Mano; Siefring, Shawn; Varma, Manju; Haugland, Richard A
2011-12-01
DNA extracts from enumerated target organism cells (calibrator samples) have been used for estimating Enterococcus cell equivalent densities in surface waters by a comparative cycle threshold (Ct) qPCR analysis method. To compare surface water Enterococcus density estimates from different studies by this approach, either a consistent source of calibrator cells must be used or the estimates must account for any differences in target sequence recoveries from different sources of calibrator cells. In this report we describe two methods for estimating target sequence recoveries from whole cell calibrator samples based on qPCR analyses of their serially diluted DNA extracts and most probable number (MPN) calculation. The first method employed a traditional MPN calculation approach. The second method employed a Bayesian hierarchical statistical modeling approach and a Monte Carlo Markov Chain (MCMC) simulation method to account for the uncertainty in these estimates associated with different individual samples of the cell preparations, different dilutions of the DNA extracts and different qPCR analytical runs. The two methods were applied to estimate mean target sequence recoveries per cell from two different lots of a commercially available source of enumerated Enterococcus cell preparations. The mean target sequence recovery estimates (and standard errors) per cell from Lot A and B cell preparations by the Bayesian method were 22.73 (3.4) and 11.76 (2.4), respectively, when the data were adjusted for potential false positive results. Means were similar for the traditional MPN approach which cannot comparably assess uncertainty in the estimates. Cell numbers and estimates of recoverable target sequences in calibrator samples prepared from the two cell sources were also used to estimate cell equivalent and target sequence quantities recovered from surface water samples in a comparative Ct method. Our results illustrate the utility of the Bayesian method in accounting for uncertainty, the high degree of precision attainable by the MPN approach and the need to account for the differences in target sequence recoveries from different calibrator sample cell sources when they are used in the comparative Ct method. Published by Elsevier B.V.
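A sketch of the traditional maximum likelihood MPN point estimate for a serial-dilution design; the Bayesian hierarchical version, which adds levels for cell samples, dilutions and qPCR runs via MCMC, is not reproduced here:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def mpn_estimate(volumes, n_tubes, n_positive):
        # Concentration c maximizing the likelihood in which each replicate
        # assayed at volume v is positive with probability 1 - exp(-c * v).
        v = np.asarray(volumes, dtype=float)
        n = np.asarray(n_tubes, dtype=float)
        x = np.asarray(n_positive, dtype=float)

        def neg_loglik(log_c):
            p = 1.0 - np.exp(-np.exp(log_c) * v)
            p = np.clip(p, 1e-12, 1.0 - 1e-12)
            return -np.sum(x * np.log(p) + (n - x) * np.log(1.0 - p))

        res = minimize_scalar(neg_loglik, bounds=(-20.0, 20.0), method="bounded")
        return np.exp(res.x)

    # Three 10-fold dilutions with 5 replicates each:
    # mpn_estimate([0.1, 0.01, 0.001], [5, 5, 5], [5, 3, 0])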
Assessment of phylogenetic sensitivity for reconstructing HIV-1 epidemiological relationships.
Beloukas, Apostolos; Magiorkinis, Emmanouil; Magiorkinis, Gkikas; Zavitsanou, Asimina; Karamitros, Timokratis; Hatzakis, Angelos; Paraskevis, Dimitrios
2012-06-01
Phylogenetic analysis has been extensively used as a tool for the reconstruction of epidemiological relations for research or for forensic purposes. Our objective was to assess the sensitivity of different phylogenetic methods and various phylogenetic programs in reconstructing epidemiological links among HIV-1 infected patients, that is, the probability of revealing a true transmission relationship. Multiple datasets (90) were prepared consisting of HIV-1 sequences in protease (PR) and partial reverse transcriptase (RT) sampled from patients with documented epidemiological relationships (target population) and from unrelated individuals (control population) belonging to the same HIV-1 subtype as the target population. Each dataset varied regarding the number, the geographic origin and the transmission risk groups of the sequences in the control population. Phylogenetic trees were inferred by neighbor-joining (NJ), maximum likelihood heuristics (hML) and Bayesian methods. All clusters of sequences belonging to the target population were correctly reconstructed by NJ and Bayesian methods, receiving high bootstrap and posterior probability (PP) support, respectively. On the other hand, TreePuzzle failed to reconstruct or provide significant support for several clusters; high puzzling step support was associated with the inclusion of control sequences from the same geographic area as the target population. By contrast, all clusters were correctly reconstructed by hML as implemented in PhyML 3.0, receiving high bootstrap support. We report that, under the conditions of our study, hML using PhyML, NJ and Bayesian methods were the most sensitive for the reconstruction of epidemiological links, mostly from sexually infected individuals. Copyright © 2012 Elsevier B.V. All rights reserved.
Bayesian reconstruction of transmission within outbreaks using genomic variants.
De Maio, Nicola; Worby, Colin J; Wilson, Daniel J; Stoesser, Nicole
2018-04-01
Pathogen genome sequencing can reveal details of transmission histories and is a powerful tool in the fight against infectious disease. In particular, within-host pathogen genomic variants identified through heterozygous nucleotide base calls are a potential source of information to identify linked cases and infer direction and time of transmission. However, using such data effectively to model disease transmission presents a number of challenges, including differentiating genuine variants from those observed due to sequencing error, as well as the specification of a realistic model for within-host pathogen population dynamics. Here we propose a new Bayesian approach to transmission inference, BadTrIP (BAyesian epiDemiological TRansmission Inference from Polymorphisms), that explicitly models evolution of pathogen populations in an outbreak, transmission (including transmission bottlenecks), and sequencing error. BadTrIP enables the inference of host-to-host transmission from pathogen sequencing data and epidemiological data. By assuming that genomic variants are unlinked, our method does not require the computationally intensive and unreliable reconstruction of individual haplotypes. Using simulations we show that BadTrIP is robust in most scenarios and can accurately infer transmission events by efficiently combining information from genetic and epidemiological sources; thanks to its realistic model of pathogen evolution and the inclusion of epidemiological data, BadTrIP is also more accurate than existing approaches. BadTrIP is distributed as an open source package (https://bitbucket.org/nicofmay/badtrip) for the phylogenetic software BEAST2. We apply our method to reconstruct transmission history at the early stages of the 2014 Ebola outbreak, showcasing the power of within-host genomic variants to reconstruct transmission events.
QUANTIFYING ALTERNATIVE SPLICING FROM PAIRED-END RNA-SEQUENCING DATA.
Rossell, David; Stephan-Otto Attolini, Camille; Kroiss, Manuel; Stöcker, Almond
2014-03-01
RNA-sequencing has revolutionized biomedical research and, in particular, our ability to study gene alternative splicing. The problem has important implications for human health, as alternative splicing may be involved in malfunctions at the cellular level and multiple diseases. However, the high-dimensional nature of the data and the existence of experimental biases pose serious data analysis challenges. We find that the standard data summaries used to study alternative splicing are severely limited, as they ignore a substantial amount of valuable information. Current data analysis methods are based on such summaries and are hence sub-optimal. Further, they have limited flexibility in accounting for technical biases. We propose novel data summaries and a Bayesian modeling framework that overcome these limitations and determine biases in a non-parametric, highly flexible manner. These summaries adapt naturally to the rapid improvements in sequencing technology. We provide efficient point estimates and uncertainty assessments. The approach allows the study of alternative splicing patterns for individual samples and can also serve as the basis for downstream analyses. We found a several-fold improvement in estimation mean squared error compared with popular approaches in simulations, and substantially higher consistency between replicates in experimental data. Our findings indicate the need to adjust the routine summarization and analysis of alternative splicing RNA-seq studies. We provide a software implementation in the R package casper.
BAYESIAN PROTEIN STRUCTURE ALIGNMENT.
Rodriguez, Abel; Schmidler, Scott C
The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures; however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.
A phylogenetic study of Laeliinae (Orchidaceae) based on combined nuclear and plastid DNA sequences
van den Berg, Cássio; Higgins, Wesley E.; Dressler, Robert L.; Whitten, W. Mark; Soto-Arenas, Miguel A.; Chase, Mark W.
2009-01-01
Background and Aims Laeliinae are a neotropical orchid subtribe with approx. 1500 species in 50 genera. In this study, an attempt is made to assess generic alliances based on molecular phylogenetic analysis of DNA sequence data. Methods Six DNA datasets were gathered: plastid trnL intron, trnL-F spacer, matK gene and trnK introns upstream and downstream from matK, and nuclear ITS rDNA. Data were analysed with maximum parsimony (MP) and Bayesian analysis with mixed models (BA). Key Results Although relationships between Laeliinae and outgroups are well supported, within the subtribe sequence variation is low considering the broad taxonomic range covered. Localized incongruence between the ITS and plastid trees was found. A combined tree followed the ITS trees more closely, but the levels of support obtained with MP were low. The Bayesian analysis recovered more well-supported nodes. The trees from combined MP and BA allowed eight generic alliances to be recognized within Laeliinae, all of which show trends in morphological characters but lack unambiguous synapomorphies. Conclusions By using combined plastid and nuclear DNA data in conjunction with mixed-models Bayesian inference, it is possible to delimit smaller groups within Laeliinae and discuss general patterns of pollination and hybridization compatibility. Furthermore, these small groups can now be used for further detailed studies to explain morphological evolution and diversification patterns within the subtribe. PMID:19423551
Validation of Pooled Whole-Genome Re-Sequencing in Arabidopsis lyrata.
Fracassetti, Marco; Griffin, Philippa C; Willi, Yvonne
2015-01-01
Sequencing pooled DNA of multiple individuals from a population instead of sequencing individuals separately has become popular due to its cost-effectiveness and simple wet-lab protocol, although some criticism of this approach remains. Here we validated a protocol for pooled whole-genome re-sequencing (Pool-seq) of Arabidopsis lyrata libraries prepared with low amounts of DNA (1.6 ng per individual). The validation was based on comparing single nucleotide polymorphism (SNP) frequencies obtained by pooling with those obtained by individual-based Genotyping By Sequencing (GBS). Furthermore, we investigated the effect of sample number, sequencing depth per individual and variant caller on population SNP frequency estimates. For Pool-seq data, we compared frequency estimates from two SNP callers, VarScan and Snape; the former employs a frequentist SNP calling approach while the latter uses a Bayesian approach. Results revealed concordance correlation coefficients well above 0.8, confirming that Pool-seq is a valid method for acquiring population-level SNP frequency data. Higher accuracy was achieved by pooling more samples (25 compared to 14) and working with higher sequencing depth (4.1× per individual compared to 1.4× per individual), which increased the concordance correlation coefficient to 0.955. The Bayesian-based SNP caller produced somewhat higher concordance correlation coefficients, particularly at low sequencing depth. We recommend pooling at least 25 individuals combined with sequencing at a depth of 100× to produce satisfactory frequency estimates for common SNPs (minor allele frequency above 0.05).
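The concordance correlation coefficient used for this validation has a closed form; a minimal sketch comparing two vectors of SNP frequency estimates (e.g. Pool-seq versus individual-based GBS):

    import numpy as np

    def concordance_correlation(x, y):
        # Lin's CCC: penalizes both poor correlation and systematic shifts
        # in location or scale between the two sets of estimates.
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        sxy = np.mean((x - x.mean()) * (y - y.mean()))
        return 2.0 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)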
Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies.
David, Maria Pamela C; Concepcion, Gisela P; Padlan, Eduardo A
2010-02-08
All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general.
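A minimal sketch of a naive Bayesian classifier of the kind explored here, using amino acid composition as a stand-in feature set (the paper's actual features and alignments are not reproduced):

    import numpy as np

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

    def aa_counts(seq):
        # Amino acid composition of one immunoglobulin sequence.
        s = seq.upper()
        return np.array([s.count(a) for a in AMINO_ACIDS], dtype=float)

    def train_naive_bayes(seqs, labels, pseudo=0.5):
        # Per-class smoothed amino acid frequencies plus a log class prior.
        params = {}
        for c in set(labels):
            counts = pseudo + sum(aa_counts(s)
                                  for s, l in zip(seqs, labels) if l == c)
            params[c] = (np.log(counts / counts.sum()),
                         np.log(sum(l == c for l in labels) / len(labels)))
        return params

    def predict(params, seq):
        x = aa_counts(seq)
        scores = {c: x @ logp + logprior
                  for c, (logp, logprior) in params.items()}
        return max(scores, key=scores.get)

    # Leave-one-out: for each i, train on all sequences but i, predict i.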
Optimal Sequential Rules for Computer-Based Instruction.
ERIC Educational Resources Information Center
Vos, Hans J.
1998-01-01
Formulates sequential rules for adapting the appropriate amount of instruction to learning needs in the context of computer-based instruction. Topics include Bayesian decision theory, threshold and linear-utility structure, psychometric model, optimal sequential number of test questions, and an empirical example of sequential instructional…
Bayesian Spatial Design of Optimal Deep Tubewell Locations in Matlab, Bangladesh.
Warren, Joshua L; Perez-Heydrich, Carolina; Yunus, Mohammad
2013-09-01
We introduce a method for statistically identifying the optimal locations of deep tubewells (dtws) to be installed in Matlab, Bangladesh. Dtw installations serve to mitigate exposure to naturally occurring arsenic found at groundwater depths less than 200 meters, a serious environmental health threat for the population of Bangladesh. We introduce an objective function, which incorporates both arsenic level and nearest town population size, to identify optimal locations for dtw placement. Assuming complete knowledge of the arsenic surface, we then demonstrate how minimizing the objective function over a domain favors dtws placed in areas with high arsenic values and close to largely populated regions. Given only a partial realization of the arsenic surface over a domain, we use a Bayesian spatial statistical model to predict the full arsenic surface and estimate the optimal dtw locations. The uncertainty associated with these estimated locations is correctly characterized as well. The new method is applied to a dataset from a village in Matlab and the estimated optimal locations are analyzed along with their respective 95% credible regions.
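The abstract does not give the objective function's exact form; one plausible shape, rewarding candidate sites with high local arsenic that sit close to a large town, might look as follows (all names, weights and the distance term are assumptions):

    import numpy as np

    def dtw_objective(site, arsenic_at, towns, w_pop=1.0):
        # Lower is better. arsenic_at(site) returns the (predicted) arsenic
        # level; towns is a list of (xy_coordinates, population) pairs.
        dists = np.array([np.linalg.norm(site - xy) for xy, _ in towns])
        pops = np.array([p for _, p in towns])
        j = int(np.argmin(dists))               # nearest town
        return -arsenic_at(site) - w_pop * pops[j] / (1.0 + dists[j])

In the paper's setting the arsenic surface is itself a Bayesian spatial prediction, so the minimizing location inherits a posterior distribution rather than a single value.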
Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu
2016-12-01
The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present: a phylogenetic tree showing a dichotomy with two distinct clusters of SODs, inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2]; Ramachandran plots of G. raimondii and G. arboreum SODs; the protein sequences used to generate 3D structures of the proteins and the template accessions via the SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information" [3]; and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].
Conceptual issues in Bayesian divergence time estimation.
Rannala, Bruce
2016-07-19
Bayesian inference of species divergence times is an unusual statistical problem, because the divergence time parameters are not identifiable unless both fossil calibrations and sequence data are available. Commonly used marginal priors on divergence times derived from fossil calibrations may conflict with node order on the phylogenetic tree, causing a change in the prior on divergence times for a particular topology. Care should be taken to avoid confusing this effect with changes due to informative sequence data. This effect is illustrated with examples. A topology-consistent prior that preserves the marginal priors is defined and examples are constructed. Conflicts between fossil calibrations and relative branch lengths (based on sequence data) can cause estimates of divergence times that are grossly incorrect, yet have a narrow posterior distribution. An example of this effect is given; it is recommended that overly narrow posterior distributions of divergence times should be carefully scrutinized. This article is part of the themed issue 'Dating species divergences using rocks and clocks'. © 2016 The Author(s).
TotalReCaller: improved accuracy and performance via integrated alignment and base-calling.
Menges, Fabian; Narzisi, Giuseppe; Mishra, Bud
2011-09-01
Currently, re-sequencing approaches use multiple modules serially to interpret raw sequencing data from next-generation sequencing platforms, while remaining oblivious to the genomic information until the final alignment step. Such approaches fail to exploit the full information from both the raw sequencing data and the reference genome that could yield better quality sequence reads, SNP calls and variant detection, as well as an alignment at the best possible location in the reference genome. Thus, there is a need for novel reference-guided bioinformatics algorithms for interpreting analog signals representing sequences of the bases ({A, C, G, T}), while simultaneously aligning possible sequence reads to a source reference genome whenever available. Here, we propose a new base-calling algorithm, TotalReCaller, to achieve improved performance. A linear error model for the raw intensity data and Burrows-Wheeler transform (BWT) based alignment are combined via a Bayesian score function, which is then globally optimized over all possible genomic locations using an efficient branch-and-bound approach. The algorithm has been implemented in software and hardware [field-programmable gate array (FPGA)] to achieve real-time performance. Empirical results on real high-throughput Illumina data were used to evaluate TotalReCaller's performance relative to its peers (Bustard, BayesCall, Ibis and Rolexa) on several criteria, particularly those important in clinical and scientific applications: (i) base-calling speed and throughput, (ii) read accuracy and (iii) specificity and sensitivity in variant calling. A software implementation of TotalReCaller, as well as additional information, is available at http://bioinformatics.nyu.edu/wordpress/projects/totalrecaller/. Contact: fabian.menges@nyu.edu.
Spatial Prediction and Optimized Sampling Design for Sodium Concentration in Groundwater
Shabbir, Javid; M. AbdEl-Salam, Nasser; Hussain, Tajammal
2016-01-01
Sodium is an integral part of water, and its excessive amount in drinking water causes high blood pressure and hypertension. In the present paper, the spatial distribution of sodium concentration in drinking water is modeled, and optimized sampling designs for selecting sampling locations are calculated for three divisions in Punjab, Pakistan. Universal kriging and Bayesian universal kriging are used to predict the sodium concentrations. Spatial simulated annealing is used to generate optimized sampling designs. Different estimation methods (i.e., maximum likelihood, restricted maximum likelihood, ordinary least squares, and weighted least squares) are used to estimate the parameters of the variogram model (i.e., exponential, Gaussian, spherical and cubic). It is concluded that Bayesian universal kriging fits better than universal kriging. It is also observed that the universal kriging predictor provides the minimum mean universal kriging variance for both adding and deleting locations during sampling design. PMID:27683016
A Bayesian multi-stage cost-effectiveness design for animal studies in stroke research
Cai, Chunyan; Ning, Jing; Huang, Xuelin
2017-01-01
Much progress has been made in the area of adaptive designs for clinical trials. However, little has been done regarding adaptive designs to identify optimal treatment strategies in animal studies. Motivated by an animal study of a novel strategy for treating strokes, we propose a Bayesian multi-stage cost-effectiveness design to simultaneously identify the optimal dose and determine the therapeutic treatment window for administrating the experimental agent. We consider a non-monotonic pattern for the dose-schedule-efficacy relationship and develop an adaptive shrinkage algorithm to assign more cohorts to admissible strategies. We conduct simulation studies to evaluate the performance of the proposed design by comparing it with two standard designs. These simulation studies show that the proposed design yields a significantly higher probability of selecting the optimal strategy, while it is generally more efficient and practical in terms of resource usage. PMID:27405325
NASA Astrophysics Data System (ADS)
Seko, Atsuto; Togo, Atsushi; Hayashi, Hiroyuki; Tsuda, Koji; Chaput, Laurent; Tanaka, Isao
2015-11-01
Compounds of low lattice thermal conductivity (LTC) are essential for seeking thermoelectric materials with high conversion efficiency. Some strategies have been used to decrease LTC. However, such trials have yielded successes only within a limited exploration space. Here, we report the virtual screening of a library containing 54 779 compounds. Our strategy is to search the library through Bayesian optimization, using as initial data the LTC obtained from first-principles anharmonic lattice-dynamics calculations for a set of 101 compounds. We discovered 221 materials with very low LTC. Two of them even have an electronic band gap <1 eV, which makes them exceptional candidates for thermoelectric applications. In addition to those newly discovered thermoelectric materials, the present strategy is believed to be powerful for many other applications in which the chemistry of materials is required to be optimized.
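A compact sketch of such a screening loop, pairing a Gaussian-process surrogate with an expected-improvement acquisition over a discrete candidate library; the descriptors, kernel and settings are illustrative, not the authors':

    import numpy as np
    from scipy.stats import norm

    def rbf(A, B, ls=1.0):
        # Squared-exponential kernel between descriptor matrices A and B.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ls ** 2)

    def bayes_opt_screen(X, evaluate, n_init=10, n_iter=20, noise=1e-6):
        # X: descriptor matrix of the compound library (one row per compound);
        # evaluate(i): expensive first-principles LTC of compound i (minimized).
        rng = np.random.default_rng(0)
        idx = list(rng.choice(len(X), n_init, replace=False))
        y = [evaluate(i) for i in idx]
        for _ in range(n_iter):
            Xs, ys = X[idx], np.array(y)
            mu0, sd0 = ys.mean(), ys.std() + 1e-12
            z = (ys - mu0) / sd0                    # standardize observations
            K = rbf(Xs, Xs) + noise * np.eye(len(idx))
            Ks = rbf(X, Xs)
            mean = mu0 + sd0 * (Ks @ np.linalg.solve(K, z))
            var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
            std = sd0 * np.sqrt(np.clip(var, 1e-12, None))
            imp = ys.min() - mean                   # improvement over best LTC
            ei = imp * norm.cdf(imp / std) + std * norm.pdf(imp / std)
            ei[idx] = 0.0                           # do not re-evaluate
            nxt = int(np.argmax(ei))
            idx.append(nxt)
            y.append(evaluate(nxt))
        return idx[int(np.argmin(y))], min(y)       # best compound and its LTC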
Veneziano, D.; Agarwal, A.; Karaca, E.
2009-01-01
The problem of accounting for epistemic uncertainty in risk management decisions is conceptually straightforward, but is riddled with practical difficulties. Simple approximations are often used whereby future variations in epistemic uncertainty are ignored or worst-case scenarios are postulated. These strategies tend to produce sub-optimal decisions. We develop a general framework based on Bayesian decision theory and exemplify it for the case of seismic design of buildings. When temporal fluctuations of the epistemic uncertainties and regulatory safety constraints are included, the optimal level of seismic protection exceeds the normative level at the time of construction. Optimal Bayesian decisions do not depend on the aleatory or epistemic nature of the uncertainties, but only on the total (epistemic plus aleatory) uncertainty and how that total uncertainty varies randomly during the lifetime of the project. © 2009 Elsevier Ltd. All rights reserved.
Optimization of the resources management in fighting wildfires.
Martin-Fernández, Susana; Martínez-Falero, Eugenio; Pérez-González, J Manuel
2002-09-01
Wildfires lead to important economic, social, and environmental losses, especially in areas of Mediterranean climate where they are of a high intensity and frequency. Over the past 30 years there has been a dramatic surge in the development and use of fire spread models. However, given the chaotic nature of environmental systems, it is very difficult to develop real-time fire-extinguishing models. This article proposes a method of optimizing the performance of wildfire fighting resources such that losses are kept to a minimum. The optimization procedure includes discrete simulation algorithms and Bayesian optimization methods for discrete and continuous problems (simulated annealing and Bayesian global optimization). Fast calculus algorithms are applied to provide optimization outcomes in short periods of time such that the predictions of the model and the real behavior of the fire, combat resources, and meteorological conditions are similar. In addition, adaptive algorithms take into account the chaotic behavior of wildfire so that the system can be updated with data corresponding to the real situation to obtain a new optimum solution. The application of this method to the Northwest Forest of Madrid (Spain) is also described. This application allowed us to check that it is a helpful tool in the decision-making process.
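A generic sketch of the simulated annealing component; the coupling to the fire-spread simulator and the Bayesian global optimization are not reproduced, and all names below are illustrative:

    import numpy as np

    def simulated_annealing(x0, loss, neighbor, t0=1.0, cooling=0.995,
                            n_iter=5000, seed=0):
        # x0: initial allocation of fire-fighting resources; loss(x): simulated
        # losses under allocation x; neighbor(x, rng): small random modification.
        rng = np.random.default_rng(seed)
        x, fx, t = x0, loss(x0), t0
        best, fbest = x, fx
        for _ in range(n_iter):
            y = neighbor(x, rng)
            fy = loss(y)
            # Accept improvements always; accept worse moves with a probability
            # that shrinks as the temperature t cools.
            if fy < fx or rng.random() < np.exp(-(fy - fx) / t):
                x, fx = y, fy
                if fx < fbest:
                    best, fbest = x, fx
            t *= cooling
        return best, fbest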
User-customized brain computer interfaces using Bayesian optimization
NASA Astrophysics Data System (ADS)
Bashashati, Hossein; Ward, Rabab K.; Bashashati, Ali
2016-04-01
Objective. The brain characteristics of different people are not the same. Brain computer interfaces (BCIs) should thus be customized for each individual person. In motor-imagery based synchronous BCIs, a number of parameters (referred to as hyper-parameters), including the EEG frequency bands, the channels and the time intervals from which the features are extracted, should be pre-determined based on each subject’s brain characteristics. Approach. To determine the hyper-parameter values, previous work has relied on manual or semi-automatic methods that are not applicable to high-dimensional search spaces. In this paper, we propose a fully automatic, scalable and computationally inexpensive algorithm that uses Bayesian optimization to tune these hyper-parameters. We then build different classifiers trained on the sets of hyper-parameter values proposed by the Bayesian optimization. A final classifier aggregates the results of the different classifiers. Main Results. We have applied our method to 21 subjects from three BCI competition datasets. We have conducted rigorous statistical tests, and have shown the positive impact of hyper-parameter optimization in improving the accuracy of BCIs. Furthermore, we have compared our results to those reported in the literature. Significance. Unlike the best reported results in the literature, which are based on more sophisticated feature extraction and classification methods, and rely on prestudies to determine the hyper-parameter values, our method has the advantage of being fully automated, uses less sophisticated feature extraction and classification methods, and yields similar or superior results compared to the best performing designs in the literature.
SIBIS: a Bayesian model for inconsistent protein sequence estimation.
Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D
2014-09-01
The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a Linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/∼julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
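The central Bayesian step, estimating the probability of observing each amino acid at an alignment position, can be sketched with a single Dirichlet prior; SIBIS itself uses Dirichlet mixture models, so this is a deliberate simplification, and the example column is invented.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
alpha = np.full(20, 0.5)   # symmetric Dirichlet prior (SIBIS uses mixtures of these)

def column_posterior(column):
    """Posterior mean amino-acid probabilities for one alignment column."""
    counts = np.array([column.count(a) for a in AA], dtype=float)
    return (counts + alpha) / (counts.sum() + alpha.sum())

col = "LLLLLLLLLW"          # hypothetical column; the lone W looks suspicious
post = column_posterior(col)
for residue in "LW":
    print(residue, round(post[AA.index(residue)], 3))
# A low posterior probability for an observed residue is the kind of signal
# used to flag an inconsistent or erroneous sequence segment.
```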
Dyvorne, Hadrien A; Galea, Nicola; Nevers, Thomas; Fiel, M Isabel; Carpenter, David; Wong, Edmund; Orton, Matthew; de Oliveira, Andre; Feiweier, Thorsten; Vachon, Marie-Louise; Babb, James S; Taouli, Bachir
2013-03-01
To optimize intravoxel incoherent motion (IVIM) diffusion-weighted (DW) imaging by estimating the effects of diffusion gradient polarity and breathing acquisition scheme on image quality, signal-to-noise ratio (SNR), IVIM parameters, and parameter reproducibility, as well as to investigate the potential of IVIM in the detection of hepatic fibrosis. In this institutional review board-approved prospective study, 20 subjects (seven healthy volunteers, 13 patients with hepatitis C virus infection; 14 men, six women; mean age, 46 years) underwent IVIM DW imaging with four sequences: (a) respiratory-triggered (RT) bipolar (BP) sequence, (b) RT monopolar (MP) sequence, (c) free-breathing (FB) BP sequence, and (d) FB MP sequence. Image quality scores were assessed for all sequences. A biexponential analysis with the Bayesian method yielded true diffusion coefficient (D), pseudodiffusion coefficient (D*), and perfusion fraction (PF) in liver parenchyma. Mixed-model analysis of variance was used to compare image quality, SNR, IVIM parameters, and interexamination variability between the four sequences, as well as the ability to differentiate areas of liver fibrosis from normal liver tissue. Image quality with RT sequences was superior to that with FB acquisitions (P = .02) and was not affected by gradient polarity. SNR did not vary significantly between sequences. IVIM parameter reproducibility was moderate to excellent for PF and D, while it was less reproducible for D*. PF and D were both significantly lower in patients with hepatitis C virus than in healthy volunteers with the RT BP sequence (PF = 13.5% ± 5.3 [standard deviation] vs 9.2% ± 2.5, P = .038; D = [1.16 ± 0.07] × 10^-3 mm^2/sec vs [1.03 ± 0.1] × 10^-3 mm^2/sec, P = .006). The RT BP DW imaging sequence had the best results in terms of image quality, reproducibility, and ability to discriminate between healthy and fibrotic liver with biexponential fitting.
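The biexponential IVIM signal model is compact enough to state in code. The sketch below fits it by nonlinear least squares with SciPy rather than the Bayesian method used in the study, and the b-values, parameter values, and noise level are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit

def ivim(b, S0, PF, Dstar, D):
    """Biexponential IVIM signal: perfusion (D*) plus true diffusion (D) pools."""
    return S0 * (PF * np.exp(-b * Dstar) + (1 - PF) * np.exp(-b * D))

# Hypothetical b-values (s/mm^2) and synthetic liver-like signal with noise.
b = np.array([0, 15, 30, 45, 60, 100, 200, 400, 600, 800], float)
rng = np.random.default_rng(1)
signal = ivim(b, 1.0, 0.13, 80e-3, 1.1e-3) + rng.normal(0, 0.01, b.size)

p0 = [1.0, 0.1, 50e-3, 1e-3]
lb = [0.5, 0.0, 5e-3, 0.2e-3]
ub = [1.5, 0.5, 500e-3, 3e-3]
(S0, PF, Dstar, D), _ = curve_fit(ivim, b, signal, p0=p0, bounds=(lb, ub))
print(f"PF={PF:.1%}  D={D:.2e}  D*={Dstar:.2e} mm^2/s")
```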
Simultaneous Optimization of Decisions Using a Linear Utility Function.
ERIC Educational Resources Information Center
Vos, Hans J.
1990-01-01
An approach is presented to simultaneously optimize decision rules for combinations of elementary decisions through a framework derived from Bayesian decision theory. The developed linear utility model for selection-mastery decisions was applied to a sample of 43 first year medical students to illustrate the procedure. (SLD)
Optimal Predictions in Everyday Cognition: The Wisdom of Individuals or Crowds?
ERIC Educational Resources Information Center
Mozer, Michael C.; Pashler, Harold; Homaei, Hadjar
2008-01-01
Griffiths and Tenenbaum (2006) asked individuals to make predictions about the duration or extent of everyday events (e.g., cake baking times), and reported that predictions were optimal, employing Bayesian inference based on veridical prior distributions. Although the predictions conformed strikingly to statistics of the world, they reflect…
A Rational Analysis of the Selection Task as Optimal Data Selection.
ERIC Educational Resources Information Center
Oaksford, Mike; Chater, Nick
1994-01-01
Experimental data on human reasoning in hypothesis-testing tasks is reassessed in light of a Bayesian model of optimal data selection in inductive hypothesis testing. The rational analysis provided by the model suggests that reasoning in such tasks may be rational rather than subject to systematic bias. (SLD)
Zhao, Wei; Cella, Massimo; Della Pasqua, Oscar; Burger, David; Jacqz-Aigrain, Evelyne
2012-01-01
AIMS To develop a population pharmacokinetic model for abacavir in HIV-infected infants and toddlers, which will be used to describe both once and twice daily pharmacokinetic profiles, identify covariates that explain variability and propose optimal sampling time points to optimize area under the concentration–time curve (AUC) targeted dosage and individualize therapy. METHODS The pharmacokinetics of abacavir was described with plasma concentrations from 23 patients using nonlinear mixed-effects modelling (NONMEM) software. A two-compartment model with first-order absorption and elimination was developed. The final model was validated using bootstrap, visual predictive check and normalized prediction distribution errors. The Bayesian estimator was validated using the cross-validation and simulation–estimation method. RESULTS The typical population pharmacokinetic parameters and relative standard errors (RSE) were apparent systemic clearance (CL) 13.4 l h−1 (RSE 6.3%), apparent central volume of distribution 4.94 l (RSE 28.7%), apparent peripheral volume of distribution 8.12 l (RSE 14.2%), apparent intercompartment clearance 1.25 l h−1 (RSE 16.9%) and absorption rate constant 0.758 h−1 (RSE 5.8%). The covariate analysis identified weight as the individual factor influencing the apparent oral clearance: CL = 13.4 × (weight/12)^1.14. The maximum a posteriori probability Bayesian estimator, based on three concentrations measured at 0, 1 or 2, and 3 h after drug intake, allowed prediction of individual AUC(0–t). CONCLUSIONS The population pharmacokinetic model developed for abacavir in HIV-infected infants and toddlers accurately described both once and twice daily pharmacokinetic profiles. The maximum a posteriori probability Bayesian estimator of AUC(0–t) was developed from the final model and can be used routinely to optimize individual dosing. PMID:21988586
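The reported covariate model supports a small worked example. The clearance formula below comes from the abstract; the target AUC and body weights are invented for illustration, and none of this is dosing guidance.

```python
def abacavir_clearance(weight_kg, cl_typ=13.4, wt_ref=12.0, exponent=1.14):
    """Population apparent clearance (l/h) from the covariate model reported
    in the abstract: CL = 13.4 x (weight/12)^1.14."""
    return cl_typ * (weight_kg / wt_ref) ** exponent

target_auc = 6.0  # mg*h/l per dosing interval; illustrative value only
for wt in (8.0, 12.0, 20.0):
    cl = abacavir_clearance(wt)
    dose = target_auc * cl  # steady state: AUC = dose / CL, so dose = AUC x CL
    print(f"{wt:4.0f} kg: CL = {cl:5.1f} l/h, dose per interval = {dose:5.0f} mg")
```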
The trade-off between morphology and control in the co-optimized design of robots.
Rosendo, Andre; von Atzigen, Marco; Iida, Fumiya
2017-01-01
Conventionally, robot morphologies are developed through simulations and calculations, and different control methods are applied afterwards. Assuming that simulations and predictions are simplified representations of our reality, how sure can roboticists be that the chosen morphology is the most adequate for the possible control choices in the real-world? Here we study the influence of the design parameters in the creation of a robot with a Bayesian morphology-control (MC) co-optimization process. A robot autonomously creates child robots from a set of possible design parameters and uses Bayesian Optimization (BO) to infer the best locomotion behavior from real world experiments. Then, we systematically change from an MC co-optimization to a control-only (C) optimization, which better represents the traditional way that robots are developed, to explore the trade-off between these two methods. We show that although C processes can greatly improve the behavior of poor morphologies, such agents are still outperformed by MC co-optimization results with as few as 25 iterations. Our findings, on one hand, suggest that BO should be used in the design process of robots for both morphological and control parameters to reach optimal performance, and on the other hand, point to the downfall of current design methods in face of new search techniques.
Spike-Based Bayesian-Hebbian Learning of Temporal Sequences
Lindén, Henrik; Lansner, Anders
2016-01-01
Many cognitive and motor functions are enabled by the temporal representation and processing of stimuli, but it remains an open issue how neocortical microcircuits can reliably encode and replay such sequences of information. To better understand this, a modular attractor memory network is proposed in which meta-stable sequential attractor transitions are learned through changes to synaptic weights and intrinsic excitabilities via the spike-based Bayesian Confidence Propagation Neural Network (BCPNN) learning rule. We find that the formation of distributed memories, embodied by increased periods of firing in pools of excitatory neurons, together with asymmetrical associations between these distinct network states, can be acquired through plasticity. The model’s feasibility is demonstrated using simulations of adaptive exponential integrate-and-fire model neurons (AdEx). We show that the learning and speed of sequence replay depends on a confluence of biophysically relevant parameters including stimulus duration, level of background noise, ratio of synaptic currents, and strengths of short-term depression and adaptation. Moreover, sequence elements are shown to flexibly participate multiple times in the sequence, suggesting that spiking attractor networks of this type can support an efficient combinatorial code. The model provides a principled approach towards understanding how multiple interacting plasticity mechanisms can coordinate hetero-associative learning in unison. PMID:27213810
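For orientation, the batch (non-spiking) form of the BCPNN rule sets weights to the log odds of co-activation, w_ij = log[P(x_i, x_j) / (P(x_i) P(x_j))]; the spike-based version in the paper replaces these probabilities with exponentially filtered traces. A minimal numpy sketch with invented activity patterns:

```python
import numpy as np

def bcpnn_weights(activity, eps=1e-4):
    """Batch BCPNN estimate from binary activity patterns (rows = samples).

    w_ij = log( P(x_i, x_j) / (P(x_i) P(x_j)) ),  b_i = log P(x_i);
    positive weights connect units that co-activate more often than chance."""
    p_i = np.clip(activity.mean(axis=0), eps, 1.0)
    p_ij = np.clip((activity[:, :, None] * activity[:, None, :]).mean(axis=0), eps, 1.0)
    return np.log(p_ij / np.outer(p_i, p_i)), np.log(p_i)

rng = np.random.default_rng(2)
patterns = rng.integers(0, 2, size=(200, 6)).astype(float)  # invented activity
W, bias = bcpnn_weights(patterns)
print(W.shape, bias.shape)  # (6, 6) weight matrix and per-unit bias
```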
Cross-validation to select Bayesian hierarchical models in phylogenetics.
Duchêne, Sebastián; Duchêne, David A; Di Giallonardo, Francesca; Eden, John-Sebastian; Geoghegan, Jemma L; Holt, Kathryn E; Ho, Simon Y W; Holmes, Edward C
2016-05-26
Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data. Accordingly, model selection has become an important component of phylogenetic analysis. Methods of model selection generally consider the likelihood of the data under the model in question. In the context of Bayesian phylogenetics, the most common approach involves estimating the marginal likelihood, which is typically done by integrating the likelihood across model parameters, weighted by the prior. Although this method is accurate, it is sensitive to the presence of improper priors. We explored an alternative approach based on cross-validation that is widely used in evolutionary analysis. This involves comparing models according to their predictive performance. We analysed simulated data and a range of viral and bacterial data sets using a cross-validation approach to compare a variety of molecular clock and demographic models. Our results show that cross-validation can be effective in distinguishing between strict- and relaxed-clock models and in identifying demographic models that allow growth in population size over time. In most of our empirical data analyses, the model selected using cross-validation was able to match that selected using marginal-likelihood estimation. The accuracy of cross-validation appears to improve with longer sequence data, particularly when distinguishing between relaxed-clock models. Cross-validation is a useful method for Bayesian phylogenetic model selection. This method can be readily implemented even when considering complex models where selecting an appropriate prior for all parameters may be difficult.
A Bayesian hierarchical diffusion model decomposition of performance in Approach–Avoidance Tasks
Krypotos, Angelos-Miltiadis; Beckers, Tom; Kindt, Merel; Wagenmakers, Eric-Jan
2015-01-01
Common methods for analysing response time (RT) tasks, frequently used across different disciplines of psychology, suffer from a number of limitations such as the failure to directly measure the underlying latent processes of interest and the inability to take into account the uncertainty associated with each individual's point estimate of performance. Here, we discuss a Bayesian hierarchical diffusion model and apply it to RT data. This model allows researchers to decompose performance into meaningful psychological processes and to account optimally for individual differences and commonalities, even with relatively sparse data. We highlight the advantages of the Bayesian hierarchical diffusion model decomposition by applying it to performance on Approach–Avoidance Tasks, widely used in the emotion and psychopathology literature. Model fits for two experimental data-sets demonstrate that the model performs well. The Bayesian hierarchical diffusion model overcomes important limitations of current analysis procedures and provides deeper insight in latent psychological processes of interest. PMID:25491372
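The generative model that the hierarchical Bayesian analysis inverts can be sketched with a simple forward Euler simulator of the diffusion process; the drift, boundary, and non-decision-time values below are arbitrary.

```python
import numpy as np

def simulate_ddm(n_trials, drift=0.25, boundary=1.0, ndt=0.3, dt=1e-3, rng=None):
    """Simulate a simple diffusion model: evidence starts midway between
    boundaries 0 and `boundary`, drifts at rate `drift` with unit-variance
    noise, and non-decision time `ndt` is added to the first-passage time."""
    rng = rng or np.random.default_rng()
    rts, choices = [], []
    for _ in range(n_trials):
        x, t = boundary / 2.0, 0.0
        while 0.0 < x < boundary:
            x += drift * dt + rng.normal(0.0, np.sqrt(dt))
            t += dt
        rts.append(t + ndt)
        choices.append(int(x >= boundary))   # 1 = upper boundary (e.g. "approach")
    return np.array(rts), np.array(choices)

rts, choices = simulate_ddm(500, rng=np.random.default_rng(3))
print(f"mean RT {rts.mean():.3f}s, P(upper) {choices.mean():.2f}")
```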
Inferring Phylogenetic Networks Using PhyloNet.
Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay
2018-07-01
PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.
Quantitative trait nucleotide analysis using Bayesian model selection.
Blangero, John; Goring, Harald H H; Kent, Jack W; Williams, Jeff T; Peterson, Charles P; Almasy, Laura; Dyer, Thomas D
2005-10-01
Although much attention has been given to statistical genetic methods for the initial localization and fine mapping of quantitative trait loci (QTLs), little methodological work has been done to date on the problem of statistically identifying the most likely functional polymorphisms using sequence data. In this paper we provide a general statistical genetic framework, called Bayesian quantitative trait nucleotide (BQTN) analysis, for assessing the likely functional status of genetic variants. The approach requires the initial enumeration of all genetic variants in a set of resequenced individuals. These polymorphisms are then typed in a large number of individuals (potentially in families), and marker variation is related to quantitative phenotypic variation using Bayesian model selection and averaging. For each sequence variant a posterior probability of effect is obtained and can be used to prioritize additional molecular functional experiments. An example of this quantitative nucleotide analysis is provided using the GAW12 simulated data. The results show that the BQTN method may be useful for choosing the most likely functional variants within a gene (or set of genes). We also include instructions on how to use our computer program, SOLAR, for association analysis and BQTN analysis.
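The flavor of Bayesian model selection and averaging over candidate variants can be sketched with BIC-based approximate Bayes factors; the BQTN machinery in SOLAR is more sophisticated, and the genotypes and single causal SNP below are simulated.

```python
import numpy as np
from itertools import combinations

def bic_linear(y, X):
    """BIC of an ordinary least-squares fit (Gaussian likelihood)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

rng = np.random.default_rng(4)
n, n_snps = 400, 5
G = rng.integers(0, 3, size=(n, n_snps)).astype(float)  # genotypes coded 0/1/2
y = 0.5 * G[:, 2] + rng.normal(0, 1, n)                  # SNP 2 is the functional one

# Enumerate all SNP subsets, weight each model by exp(-BIC/2) (an approximate
# Bayes factor), then sum the weights of models containing each SNP to get its
# posterior probability of effect.
models, log_w = [], []
for size in range(n_snps + 1):
    for subset in combinations(range(n_snps), size):
        X = np.column_stack([np.ones(n)] + [G[:, j] for j in subset])
        models.append(set(subset))
        log_w.append(-0.5 * bic_linear(y, X))
w = np.exp(np.array(log_w) - max(log_w))
w /= w.sum()
for j in range(n_snps):
    print(f"SNP {j}: posterior probability of effect = "
          f"{sum(wi for m, wi in zip(models, w) if j in m):.2f}")
```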
NASA Astrophysics Data System (ADS)
Jeffs, Brian D.; Christou, Julian C.
1998-09-01
This paper addresses post processing for resolution enhancement of sequences of short exposure adaptive optics (AO) images of space objects. The unknown residual blur is removed using Bayesian maximum a posteriori blind image restoration techniques. In the problem formulation, both the true image and the unknown blur psf's are represented by the flexible generalized Gaussian Markov random field (GGMRF) model. The GGMRF probability density function provides a natural mechanism for expressing available prior information about the image and blur. Incorporating such prior knowledge in the deconvolution optimization is crucial for the success of blind restoration algorithms. For example, space objects often contain sharp edge boundaries and geometric structures, while the residual blur psf in the corresponding partially corrected AO image is spectrally band limited, and exhibits smoothed, random, texture-like features on a peaked central core. By properly choosing parameters, GGMRF models can accurately represent both the blur psf and the object, and serve to regularize the deconvolution problem. These two GGMRF models also serve as discriminator functions to separate blur and object in the solution. Algorithm performance is demonstrated with examples from synthetic AO images. Results indicate significant resolution enhancement when applied to partially corrected AO images. An efficient computational algorithm is described.
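A minimal illustration of the GGMRF prior acting as an edge-preserving regularizer, applied here to denoising rather than the paper's blind deconvolution; the shape parameter p, weights, and test image are arbitrary choices.

```python
import numpy as np

def ggmrf_grad(x, p):
    """Gradient of the GGMRF prior energy sum |x_i - x_j|^p over 4-neighbour
    pairs; p=2 is Gaussian-smooth, p near 1 preserves sharp edges."""
    g = np.zeros_like(x)
    for axis in (0, 1):
        d = np.diff(x, axis=axis)
        dg = p * np.sign(d) * np.abs(d) ** (p - 1)
        pad_front = [(1, 0) if a == axis else (0, 0) for a in (0, 1)]
        pad_back = [(0, 1) if a == axis else (0, 0) for a in (0, 1)]
        g += np.pad(dg, pad_front) - np.pad(dg, pad_back)
    return g

def map_denoise(obs, p=1.2, lam=0.4, lr=0.1, steps=200):
    """Toy MAP estimate: quadratic data fidelity + lambda * GGMRF prior."""
    x = obs.copy()
    for _ in range(steps):
        x -= lr * ((x - obs) + lam * ggmrf_grad(x, p))
    return x

rng = np.random.default_rng(5)
truth = np.zeros((32, 32))
truth[8:24, 8:24] = 1.0                      # sharp-edged "space object"
noisy = truth + rng.normal(0, 0.3, truth.shape)
restored = map_denoise(noisy)
print(f"MSE noisy {np.mean((noisy - truth) ** 2):.3f} -> "
      f"restored {np.mean((restored - truth) ** 2):.3f}")
```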
Sequential Bayesian geoacoustic inversion for mobile and compact source-receiver configuration.
Carrière, Olivier; Hermand, Jean-Pierre
2012-04-01
Geoacoustic characterization of wide areas through inversion requires easily deployable configurations including free-drifting platforms, underwater gliders and autonomous vehicles, typically performing repeated transmissions during their course. In this paper, the inverse problem is formulated as sequential Bayesian filtering to take advantage of repeated transmission measurements. Nonlinear Kalman filters implement a random-walk model for geometry and environment and an acoustic propagation code in the measurement model. Data from the MREA/BP07 sea trials are tested, consisting of multitone and frequency-modulated signals (bands: 0.25-0.8 and 0.8-1.6 kHz) received on a shallow vertical array of four hydrophones spaced 5 m apart, drifting over 0.7-1.6 km range. Space- and time-coherent processing are applied to the respective signal types. Kalman filter outputs are compared to a sequence of global optimizations performed independently on each received signal. For both signal types, the sequential approach is more accurate but also more efficient. Due to frequency diversity, the processing of modulated signals produces a more stable tracking. Although an extended Kalman filter provides comparable estimates of the tracked parameters, the ensemble Kalman filter is necessary to properly assess uncertainty. In spite of mild range dependence and a simplified bottom model, all tracked geoacoustic parameters are consistent with high-resolution seismic profiling, core logging P-wave velocity, and previous inversion results with fixed geometries.
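The ensemble Kalman filter analysis step behind such tracking is short to write down; in the sketch below a two-parameter linear toy function stands in for the acoustic propagation code.

```python
import numpy as np

def enkf_update(ensemble, obs, forward, obs_var, rng):
    """One EnKF analysis step with perturbed observations.

    ensemble : (n_ens, n_state) prior samples, e.g. geoacoustic parameters
    forward  : maps a state vector to predicted data (the propagation code)"""
    n_ens = ensemble.shape[0]
    Hx = np.array([forward(m) for m in ensemble])        # predicted data
    A = ensemble - ensemble.mean(axis=0)
    HA = Hx - Hx.mean(axis=0)
    P_xy = A.T @ HA / (n_ens - 1)                        # state-data covariance
    P_yy = HA.T @ HA / (n_ens - 1) + obs_var * np.eye(obs.size)
    K = P_xy @ np.linalg.inv(P_yy)                       # Kalman gain
    perturbed = obs + rng.normal(0.0, np.sqrt(obs_var), (n_ens, obs.size))
    return ensemble + (perturbed - Hx) @ K.T

# Toy stand-in for the acoustic forward model: data = [2*c, c + h].
forward = lambda m: np.array([2.0 * m[0], m[0] + m[1]])
rng = np.random.default_rng(6)
ens = rng.normal([1500.0, 20.0], [50.0, 5.0], size=(100, 2))  # sound speed, depth
ens = enkf_update(ens, forward(np.array([1520.0, 18.0])), forward, 1.0, rng)
print("posterior mean:", ens.mean(axis=0).round(1))
```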
Efficient Implementation of MrBayes on Multi-GPU
Zhou, Jianfu; Liu, Xiaoguang; Wang, Gang
2013-01-01
MrBayes, using Metropolis-coupled Markov chain Monte Carlo (MCMCMC or (MC)3), is a popular program for Bayesian inference. As a leading method of using DNA data to infer phylogeny, the (MC)3 Bayesian algorithm and its improved and parallel versions are now not fast enough for biologists to analyze massive real-world DNA data. Recently, the graphics processor unit (GPU) has shown its power as a coprocessor (or rather, an accelerator) in many fields. This article describes an efficient implementation, a(MC)3 (aMCMCMC), of MrBayes (MC)3 on the compute unified device architecture (CUDA). By dynamically adjusting the task granularity to adapt to input data size and hardware configuration, it makes full use of GPU cores with different data sets. An adaptive method is also developed to split and combine DNA sequences to make full use of a large number of GPU cards. Furthermore, a new “node-by-node” task scheduling strategy is developed to improve concurrency, and several optimizing methods are used to reduce extra overhead. Experimental results show that a(MC)3 achieves up to 63× speedup over serial MrBayes on a single machine with one GPU card, and up to 170× speedup with four GPU cards, and up to 478× speedup with a 32-node GPU cluster. a(MC)3 is dramatically faster than all the previous (MC)3 algorithms and scales well to large GPU clusters. PMID:23493260
Ye, Qing; Pan, Hao; Liu, Changhua
2015-01-01
This research proposes a novel framework for simultaneous failure diagnosis of a final drive, comprising feature extraction, training of paired diagnostic models, generation of a decision threshold, and recognition of simultaneous failure modes. In the feature extraction module, wavelet packet transform and fuzzy entropy are adopted to reduce noise interference and extract representative features of each failure mode. Single-failure samples are used to construct probabilistic classifiers based on the paired sparse Bayesian extreme learning machine, which is trained only on single failure modes and inherits the high generalization ability and sparsity of the sparse Bayesian learning approach. To generate the optimal decision threshold that converts the probabilistic outputs of the classifiers into final simultaneous failure modes, this research uses samples containing both single and simultaneous failure modes together with the grid search method, which is superior to traditional techniques in global optimization. Compared with other frequently used diagnostic approaches based on support vector machines and probabilistic neural networks, experimental results based on the F1-measure verify that the diagnostic accuracy and efficiency of the proposed framework, both crucial for simultaneous failure diagnosis, are superior to those of the existing approaches. PMID:25722717
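The decision-threshold step reduces to a one-dimensional grid search maximizing micro-averaged F1 over multilabel outputs. In this sketch the probability matrix is invented and stands in for the outputs of the paired sparse Bayesian ELM classifiers.

```python
import numpy as np
from sklearn.metrics import f1_score

def grid_search_threshold(prob, labels, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the probability cut-off that maximizes micro-averaged F1 when
    converting per-mode probabilities into simultaneous failure decisions."""
    best_t, best_f1 = None, -1.0
    for t in grid:
        pred = (prob >= t).astype(int)        # a sample may trigger several modes
        f1 = f1_score(labels, pred, average="micro", zero_division=0)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Hypothetical classifier outputs for 6 samples x 3 failure modes.
prob = np.array([[0.9, 0.1, 0.2], [0.8, 0.7, 0.1], [0.2, 0.9, 0.1],
                 [0.1, 0.2, 0.8], [0.6, 0.1, 0.7], [0.1, 0.1, 0.2]])
labels = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 0],
                   [0, 0, 1], [1, 0, 1], [0, 0, 0]])
print(grid_search_threshold(prob, labels))
```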
Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing
2016-01-01
A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to obtain weak fault features under background noise; it is based on statistical hypothesis testing in the frequency domain, evaluating the similarity between a reference signal (noise signal) and the original signal and removing the components of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, an evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitivity evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitivity of symptom parameters (SPs) for condition diagnosis. In this way, SPs with high sensitivity for condition diagnosis can be selected. A three-layer DBN is developed to identify the condition of rotating machinery based on Bayesian Belief Network (BBN) theory. A condition diagnosis experiment on rolling element bearings demonstrates the effectiveness of the proposed method. PMID:26761006
Efficient Bayesian experimental design for contaminant source identification
NASA Astrophysics Data System (ADS)
Zhang, Jiangjiang; Zeng, Lingzao; Chen, Cheng; Chen, Dingjiang; Wu, Laosheng
2015-01-01
In this study, an efficient full Bayesian approach is developed for the optimal sampling well location design and source parameters identification of groundwater contaminants. An information measure, i.e., the relative entropy, is employed to quantify the information gain from concentration measurements in identifying unknown parameters. In this approach, the sampling locations that give the maximum expected relative entropy are selected as the optimal design. After the sampling locations are determined, a Bayesian approach based on Markov Chain Monte Carlo (MCMC) is used to estimate unknown parameters. In both the design and estimation, the contaminant transport equation is required to be solved many times to evaluate the likelihood. To reduce the computational burden, an interpolation method based on the adaptive sparse grid is utilized to construct a surrogate for the contaminant transport equation. The approximated likelihood can be evaluated directly from the surrogate, which greatly accelerates the design and estimation process. The accuracy and efficiency of our approach are demonstrated through numerical case studies. It is shown that the methods can be used to assist in both single sampling location and monitoring network design for contaminant source identification in groundwater.
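The design criterion, the expected relative entropy gained from a candidate measurement, can be approximated by Monte Carlo. The sketch below uses a toy one-dimensional forward model in place of the contaminant transport equation; the prior, noise level, and candidate locations are invented.

```python
import numpy as np
from scipy.stats import norm

def expected_relative_entropy(x, prior_s, noise_sd, rng, n_rep=50):
    """Monte Carlo estimate of the expected KL gain from one measurement at x.

    Toy forward model: concentration c(x; s) = exp(-(x - s)^2) with unknown
    source location s; larger expected KL means a more informative location."""
    kl = 0.0
    for _ in range(n_rep):
        s_true = rng.choice(prior_s)                          # hypothetical truth
        y = np.exp(-(x - s_true) ** 2) + rng.normal(0.0, noise_sd)
        like = norm.pdf(y, np.exp(-(x - prior_s) ** 2), noise_sd)
        w = like / like.sum()                                 # posterior weights
        kl += np.sum(w * np.log(np.maximum(w * w.size, 1e-300)))  # KL(post||prior)
    return kl / n_rep

rng = np.random.default_rng(7)
prior = rng.uniform(0.0, 10.0, 2000)                          # prior on source location
scores = {x: expected_relative_entropy(x, prior, 0.05, rng) for x in (1.0, 3.0, 5.0, 7.0)}
print("best sampling location:", max(scores, key=scores.get), scores)
```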
Zollanvari, Amin; Dougherty, Edward R
2016-12-01
In classification, prior knowledge is incorporated in a Bayesian framework by assuming that the feature-label distribution belongs to an uncertainty class of feature-label distributions governed by a prior distribution. A posterior distribution is then derived from the prior and the sample data. An optimal Bayesian classifier (OBC) minimizes the expected misclassification error relative to the posterior distribution. From an application perspective, prior construction is critical. The prior distribution is formed by mapping a set of mathematical relations among the features and labels, the prior knowledge, into a distribution governing the probability mass across the uncertainty class. In this paper, we consider prior knowledge in the form of stochastic differential equations (SDEs). We consider a vector SDE in integral form involving a drift vector and dispersion matrix. Having constructed the prior, we develop the optimal Bayesian classifier between two models and examine, via synthetic experiments, the effects of uncertainty in the drift vector and dispersion matrix. We apply the theory to a set of SDEs for the purpose of differentiating the evolutionary history between two species.
Bayesian assessment of the expected data impact on prediction confidence in optimal sampling design
NASA Astrophysics Data System (ADS)
Leube, P. C.; Geiges, A.; Nowak, W.
2012-02-01
Incorporating hydro(geo)logical data, such as head and tracer data, into stochastic models of (subsurface) flow and transport helps to reduce prediction uncertainty. Because of financial limitations for investigation campaigns, information needs toward modeling or prediction goals should be satisfied efficiently and rationally. Optimal design techniques find the best one among a set of investigation strategies. They optimize the expected impact of data on prediction confidence or related objectives prior to data collection. We introduce a new optimal design method, called PreDIA(gnosis) (Preposterior Data Impact Assessor). PreDIA derives the relevant probability distributions and measures of data utility within a fully Bayesian, generalized, flexible, and accurate framework. It extends the bootstrap filter (BF) and related frameworks to optimal design by marginalizing utility measures over the yet unknown data values. PreDIA is a strictly formal information-processing scheme free of linearizations. It works with arbitrary simulation tools, provides full flexibility concerning measurement types (linear, nonlinear, direct, indirect), allows for any desired task-driven formulations, and can account for various sources of uncertainty (e.g., heterogeneity, geostatistical assumptions, boundary conditions, measurement values, model structure uncertainty, a large class of model errors) via Bayesian geostatistics and model averaging. Existing methods fail to simultaneously provide these crucial advantages, which our method buys at relatively higher computational cost. We demonstrate the applicability and advantages of PreDIA over conventional linearized methods in a synthetic example of subsurface transport. In the example, we show that informative data is often invisible to linearized methods, which confuse zero correlation with statistical independence. Hence, PreDIA will often lead to substantially better sampling designs. Finally, we extend our example to specifically highlight the consideration of conceptual model uncertainty.
Jones, Matt; Love, Bradley C
2011-08-01
The prominence of Bayesian modeling of cognition has increased recently largely because of mathematical advances in specifying and deriving predictions from complex probabilistic models. Much of this research aims to demonstrate that cognitive behavior can be explained from rational principles alone, without recourse to psychological or neurological processes and representations. We note commonalities between this rational approach and other movements in psychology - namely, Behaviorism and evolutionary psychology - that set aside mechanistic explanations or make use of optimality assumptions. Through these comparisons, we identify a number of challenges that limit the rational program's potential contribution to psychological theory. Specifically, rational Bayesian models are significantly unconstrained, both because they are uninformed by a wide range of process-level data and because their assumptions about the environment are generally not grounded in empirical measurement. The psychological implications of most Bayesian models are also unclear. Bayesian inference itself is conceptually trivial, but strong assumptions are often embedded in the hypothesis sets and the approximation algorithms used to derive model predictions, without a clear delineation between psychological commitments and implementational details. Comparing multiple Bayesian models of the same task is rare, as is the realization that many Bayesian models recapitulate existing (mechanistic level) theories. Despite the expressive power of current Bayesian models, we argue they must be developed in conjunction with mechanistic considerations to offer substantive explanations of cognition. We lay out several means for such an integration, which take into account the representations on which Bayesian inference operates, as well as the algorithms and heuristics that carry it out. We argue this unification will better facilitate lasting contributions to psychological theory, avoiding the pitfalls that have plagued previous theoretical movements.
Ned B. Klopfenstein; Jane E. Stewart; Yuko Ota; John W. Hanna; Bryce A. Richardson; Amy L. Ross-Davis; Ruben D. Elias-Roman; Kari Korhonen; Nenad Keca; Eugenia Iturritxa; Dionicio Alvarado-Rosales; Halvor Solheim; Nicholas J. Brazee; Piotr Lakomy; Michelle R. Cleary; Eri Hasegawa; Taisei Kikuchi; Fortunato Garza-Ocanas; Panaghiotis Tsopelas; Daniel Rigling; Simone Prospero; Tetyana Tsykun; Jean A. Berube; Franck O. P. Stefani; Saeideh Jafarpour; Vladimir Antonin; Michal Tomsovsky; Geral I. McDonald; Stephen Woodward; Mee-Sook Kim
2017-01-01
Armillaria possesses several intriguing characteristics that have inspired wide interest in understanding phylogenetic relationships within and among species of this genus. Nuclear ribosomal DNA sequence-based analyses of Armillaria provide only limited information for phylogenetic studies among widely divergent taxa. More recent studies have shown that translation...
Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data
Van den Broeck, Guy; Mohan, Karthika; Choi, Arthur; Adnan …
2015-07-01
Dynamical foundations of the neural circuit for bayesian decision making.
Morita, Kenji
2009-07-01
On the basis of accumulating behavioral and neural evidence, it has recently been proposed that the brain neural circuits of humans and animals are equipped with several specific properties, which ensure that perceptual decision making implemented by the circuits can be nearly optimal in terms of Bayesian inference. Here, I introduce the basic ideas of such a proposal and discuss its implications from the standpoint of biophysical modeling developed in the framework of dynamical systems.
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations
Zhang, Yi; Ren, Jinchang; Jiang, Jianmin
2015-01-01
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions. PMID:26089862
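One simple way to realize the combination is to average the posterior probabilities of a Gaussian maximum likelihood classifier (here QDA) and a Platt-scaled SVM; the sketch below uses scikit-learn and its bundled breast cancer data, and the exact fusion rule in the paper may differ.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# MLC: Gaussian class-conditional maximum likelihood (here QDA);
# SVM with Platt scaling so that it also yields class probabilities.
mlc = QuadraticDiscriminantAnalysis(reg_param=0.1).fit(Xtr, ytr)
svm = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)).fit(Xtr, ytr)

p_combined = 0.5 * (mlc.predict_proba(Xte) + svm.predict_proba(Xte))
for name, p in [("MLC", mlc.predict_proba(Xte)),
                ("SVM", svm.predict_proba(Xte)),
                ("combined", p_combined)]:
    print(name, (p.argmax(axis=1) == yte).mean().round(3))
```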
Dynamic Bayesian wavelet transform: New methodology for extraction of repetitive transients
NASA Astrophysics Data System (ADS)
Wang, Dong; Tsui, Kwok-Leung
2017-05-01
Building on recent research, the dynamic Bayesian wavelet transform is proposed in this short communication as a new methodology for extracting the repetitive transients that reveal fault signatures hidden in rotating machines. The main idea of the dynamic Bayesian wavelet transform is to iteratively estimate posterior parameters of the wavelet transform via artificial observations and dynamic Bayesian inference. First, a prior wavelet parameter distribution can be established by one of many fast detection algorithms, such as the fast kurtogram, the improved kurtogram, the enhanced kurtogram, the sparsogram, the infogram, continuous wavelet transform, discrete wavelet transform, wavelet packets, multiwavelets, empirical wavelet transform, empirical mode decomposition, local mean decomposition, etc. Second, artificial observations can be constructed based on one of many metrics, such as kurtosis, the sparsity measurement, entropy, approximate entropy, the smoothness index, a synthesized criterion, etc., which are able to quantify repetitive transients. Finally, given artificial observations, the prior wavelet parameter distribution can be posteriorly updated over iterations by using dynamic Bayesian inference. More importantly, the proposed new methodology can be extended to establish the optimal parameters required by many other signal processing methods for extraction of repetitive transients.
Ghosh, Sujit K
2010-01-01
Bayesian methods are rapidly becoming popular tools for making statistical inference in various fields of science including biology, engineering, finance, and genetics. One of the key aspects of Bayesian inferential method is its logical foundation that provides a coherent framework to utilize not only empirical but also scientific information available to a researcher. Prior knowledge arising from scientific background, expert judgment, or previously collected data is used to build a prior distribution which is then combined with current data via the likelihood function to characterize the current state of knowledge using the so-called posterior distribution. Bayesian methods allow the use of models of complex physical phenomena that were previously too difficult to estimate (e.g., using asymptotic approximations). Bayesian methods offer a means of more fully understanding issues that are central to many practical problems by allowing researchers to build integrated models based on hierarchical conditional distributions that can be estimated even with limited amounts of data. Furthermore, advances in numerical integration methods, particularly those based on Monte Carlo methods, have made it possible to compute the optimal Bayes estimators. However, there is a reasonably wide gap between the background of the empirically trained scientists and the full weight of Bayesian statistical inference. Hence, one of the goals of this chapter is to bridge the gap by offering elementary to advanced concepts that emphasize linkages between standard approaches and full probability modeling via Bayesian methods.
The anatomy of choice: dopamine and decision-making
Friston, Karl; Schwartenbeck, Philipp; FitzGerald, Thomas; Moutoussis, Michael; Behrens, Timothy; Dolan, Raymond J.
2014-01-01
This paper considers goal-directed decision-making in terms of embodied or active inference. We associate bounded rationality with approximate Bayesian inference that optimizes a free energy bound on model evidence. Several constructs such as expected utility, exploration or novelty bonuses, softmax choice rules and optimism bias emerge as natural consequences of free energy minimization. Previous accounts of active inference have focused on predictive coding. In this paper, we consider variational Bayes as a scheme that the brain might use for approximate Bayesian inference. This scheme provides formal constraints on the computational anatomy of inference and action, which appear to be remarkably consistent with neuroanatomy. Active inference contextualizes optimal decision theory within embodied inference, where goals become prior beliefs. For example, expected utility theory emerges as a special case of free energy minimization, where the sensitivity or inverse temperature (associated with softmax functions and quantal response equilibria) has a unique and Bayes-optimal solution. Crucially, this sensitivity corresponds to the precision of beliefs about behaviour. The changes in precision during variational updates are remarkably reminiscent of empirical dopaminergic responses—and they may provide a new perspective on the role of dopamine in assimilating reward prediction errors to optimize decision-making. PMID:25267823
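The precision-weighted softmax choice rule at the center of this account is easy to state concretely; the utilities and precision values below are arbitrary.

```python
import numpy as np

def softmax_choice(utilities, precision):
    """Softmax (quantal response) choice rule: precision = inverse temperature.
    High precision -> near-deterministic utility maximization;
    low precision -> exploratory, near-uniform choice."""
    z = precision * np.asarray(utilities, float)
    z -= z.max()                       # numerical stability
    p = np.exp(z)
    return p / p.sum()

u = [1.0, 0.5, 0.0]                    # expected utilities of three actions
for gamma in (0.5, 2.0, 8.0):          # gamma plays the role of encoded precision
    print(gamma, np.round(softmax_choice(u, gamma), 3))
```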
Doubly Bayesian Analysis of Confidence in Perceptual Decision-Making.
Aitchison, Laurence; Bang, Dan; Bahrami, Bahador; Latham, Peter E
2015-10-01
Humans stand out from other animals in that they are able to explicitly report on the reliability of their internal operations. This ability, which is known as metacognition, is typically studied by asking people to report their confidence in the correctness of some decision. However, the computations underlying confidence reports remain unclear. In this paper, we present a fully Bayesian method for directly comparing models of confidence. Using a visual two-interval forced-choice task, we tested whether confidence reports reflect heuristic computations (e.g. the magnitude of sensory data) or Bayes optimal ones (i.e. how likely a decision is to be correct given the sensory data). In a standard design in which subjects were first asked to make a decision, and only then gave their confidence, subjects were mostly Bayes optimal. In contrast, in a less-commonly used design in which subjects indicated their confidence and decision simultaneously, they were roughly equally likely to use the Bayes optimal strategy or to use a heuristic but suboptimal strategy. Our results suggest that, while people's confidence reports can reflect Bayes optimal computations, even a small unusual twist or additional element of complexity can prevent optimality.
NASA Astrophysics Data System (ADS)
Feyen, Luc; Gorelick, Steven M.
2005-03-01
We propose a framework that combines simulation optimization with Bayesian decision analysis to evaluate the worth of hydraulic conductivity data for optimal groundwater resources management in ecologically sensitive areas. A stochastic simulation optimization management model is employed to plan regionally distributed groundwater pumping while preserving the hydroecological balance in wetland areas. Because predictions made by an aquifer model are uncertain, groundwater supply systems operate below maximum yield. Collecting data from the groundwater system can potentially reduce predictive uncertainty and increase safe water production. The price paid for improvement in water management is the cost of collecting the additional data. Efficient data collection using Bayesian decision analysis proceeds in three stages: (1) The prior analysis determines the optimal pumping scheme and profit from water sales on the basis of known information. (2) The preposterior analysis estimates the optimal measurement locations and evaluates whether each sequential measurement will be cost-effective before it is taken. (3) The posterior analysis then revises the prior optimal pumping scheme and consequent profit, given the new information. Stochastic simulation optimization employing a multiple-realization approach is used to determine the optimal pumping scheme in each of the three stages. The cost of new data must not exceed the expected increase in benefit obtained in optimal groundwater exploitation. An example based on groundwater management practices in Florida aimed at wetland protection showed that the cost of data collection more than paid for itself by enabling a safe and reliable increase in production.
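The three-stage logic amounts to an expected value of sample information (EVSI) calculation: collect the data only if the EVSI exceeds its cost. A Monte Carlo sketch with an invented two-action profit model in place of the groundwater management model:

```python
import numpy as np

rng = np.random.default_rng(10)
s = rng.normal(0.0, 1.0, 20000)          # prior sample of the uncertain state
noise_sd = 0.5
actions = ("pump_more", "pump_less")

def profit(action, s):                   # hypothetical profit model
    return 10.0 + 8.0 * s if action == "pump_more" else 12.0 + 0.0 * s

# (1) Prior analysis: best expected profit with current information only.
prior_value = max(profit(a, s).mean() for a in actions)

# (2) Preposterior analysis: observe y = s + noise first, then act optimally;
# approximate E_y[ max_a E(profit | y) ] by binning the simulated measurements.
y = s + rng.normal(0.0, noise_sd, s.size)
bin_idx = np.digitize(y, np.quantile(y, np.linspace(0, 1, 21)[1:-1]))
prepost_value = 0.0
for k in np.unique(bin_idx):
    m = bin_idx == k
    prepost_value += max(profit(a, s[m]).mean() for a in actions) * m.mean()

evsi = prepost_value - prior_value       # expected value of sample information
print(f"pay for the measurement only if it costs less than {evsi:.2f}")
```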
Lehikoinen, Annukka; Luoma, Emilia; Mäntyniemi, Samu; Kuikka, Sakari
2013-02-19
Oil transport has greatly increased in the Gulf of Finland over the years, and risks of an oil accident occurring have risen. Thus, an effective oil combating strategy is needed. We developed a Bayesian Network (BN) to examine the recovery efficiency and optimal disposition of the Finnish oil combating vessels in the Gulf of Finland (GoF), Eastern Baltic Sea. Four alternative home harbors, five accident points, and ten oil combating vessels were included in the model to find the optimal disposition policy that would maximize the recovery efficiency. With this composition, the placement of the oil combating vessels seems not to have a significant effect on the recovery efficiency. The process seems to be strongly controlled by certain random factors independent of human action, e.g. wave height and stranding time of the oil. Therefore, the success of oil combating is rather uncertain, so it is also important to develop activities that aim for preventing accidents. We found that the model developed is suitable for this type of multidecision optimization. The methodology, results, and practices are further discussed.
Exploiting range imagery: techniques and applications
NASA Astrophysics Data System (ADS)
Armbruster, Walter
2009-07-01
Practically no applications exist for which automatic processing of 2D intensity imagery can equal human visual perception. This is not the case for range imagery. The paper gives examples of 3D laser radar applications for which automatic data processing can exceed human visual cognition capabilities and describes basic processing techniques for attaining these results. The examples are drawn from the fields of helicopter obstacle avoidance, object detection in surveillance applications, object recognition at long range, multi-object tracking, and object re-identification in range image sequences. Processing times and recognition performances are summarized. The techniques used exploit the bijective continuity of the imaging process as well as its independence of object reflectivity, emissivity and illumination. This allows precise formulations of the probability distributions involved in figure-ground segmentation, feature-based object classification and model based object recognition. The probabilistic approach guarantees optimal solutions for single images and enables Bayesian learning in range image sequences. Finally, due to recent results in 3D-surface completion, no prior model libraries are required for recognizing and re-identifying objects of quite general object categories, opening the way to unsupervised learning and fully autonomous cognitive systems.
Bayesian estimation of differential transcript usage from RNA-seq data.
Papastamoulis, Panagiotis; Rattray, Magnus
2017-11-27
Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU), which targets changes in the relative within-gene expression of a transcript. The contribution of this paper is to (a) extend to the DTU context cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read based model and performs fully Bayesian inference by MCMC sampling on the space of latent states of each transcript per gene. BayesDRIMSeq is a count based model and estimates the Bayes Factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance to DRIMSeq in terms of precision/recall but offer better calibration of the false discovery rate.
Traffic Video Image Segmentation Model Based on Bayesian and Spatio-Temporal Markov Random Field
NASA Astrophysics Data System (ADS)
Zhou, Jun; Bao, Xu; Li, Dawei; Yin, Yongwen
2017-10-01
Traffic video is a dynamic image sequence whose background and foreground change constantly, which results in occlusion. In this case, general methods struggle to produce an accurate image segmentation. A segmentation algorithm based on Bayesian inference and a spatio-temporal Markov random field (ST-MRF) is put forward. It builds energy function models of the observation field and the label field for motion sequence images with the Markov property; then, following Bayes' rule, it exploits the interaction between the label field and the observation field, that is, the relationship between the label field's prior probability and the observation field's likelihood probability, to obtain the maximum a posteriori estimate of the label field's parameters. The ICM model is then used to extract the moving object, completing the segmentation. Finally, segmentation by ST-MRF alone and by Bayesian inference combined with ST-MRF were analyzed. Experimental results show that the segmentation time of the Bayesian ST-MRF algorithm is shorter than that of ST-MRF alone and its computational workload is small; especially in heavy-traffic dynamic scenes, the method achieves a better segmentation effect.
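The ICM extraction step can be sketched for a single frame with a binary Ising-style label field; the paper's spatio-temporal model also couples labels across frames, and the foreground mean, smoothness weight, and noise level here are invented.

```python
import numpy as np

def icm_segment(frame, background, fg_mean=1.0, beta=1.5, noise_sd=0.3, n_iter=10):
    """Iterated Conditional Modes for binary foreground/background labels.

    Per-pixel energy = Gaussian data term (observation field) + Ising
    smoothness term counting disagreeing 4-neighbours (label field).
    ICM flips each label to the locally minimal-energy choice."""
    diff = frame - background
    labels = (np.abs(diff) > fg_mean / 2).astype(int)      # initial guess
    for _ in range(n_iter):
        padded = np.pad(labels, 1)
        n_fg = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                padded[1:-1, :-2] + padded[1:-1, 2:])       # foreground neighbours
        e_bg = diff ** 2 / (2 * noise_sd ** 2) + beta * n_fg
        e_fg = (diff - fg_mean) ** 2 / (2 * noise_sd ** 2) + beta * (4 - n_fg)
        labels = (e_fg < e_bg).astype(int)
    return labels

rng = np.random.default_rng(8)
bg = np.zeros((40, 40))
frame = bg + rng.normal(0, 0.3, bg.shape)
frame[10:25, 15:30] += 1.0                                  # moving object
print(icm_segment(frame, bg).sum(), "pixels labelled foreground")
```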
Enhancements of Bayesian Blocks; Application to Large Light Curve Databases
NASA Technical Reports Server (NTRS)
Scargle, Jeff
2015-01-01
Bayesian Blocks are optimal piecewise constant representations (step function fits) of light curves. The simple algorithm implementing this idea, using dynamic programming, has been extended to include more data modes and fitness metrics, multivariate analysis, and data on the circle (Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations, Scargle, Norris, Jackson and Chiang 2013, ApJ, 764, 167), as well as new results on background subtraction and refinement of the procedure for precise timing of transient events in sparse data. Example demonstrations will include exploratory analysis of the Kepler light curve archive in a search for "star-tickling" signals from extraterrestrial civilizations. (The Cepheid Galactic Internet, Learned, Kudritzki, Pakvasa, and Zee, 2008, arXiv: 0809.0339; Walkowicz et al., in progress).
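Astropy ships an implementation of the cited algorithm, so a usage sketch is short; the light curve below is synthetic and the step location arbitrary.

```python
import numpy as np
from astropy.stats import bayesian_blocks

rng = np.random.default_rng(9)
t = np.sort(rng.uniform(0.0, 100.0, 300))          # hypothetical observation times
flux = np.where((t > 40) & (t < 60), 5.0, 1.0) + rng.normal(0.0, 0.5, t.size)

# The 'measures' fitness handles point measurements with Gaussian errors; the
# returned edges define the optimal piecewise-constant (step function) model.
edges = bayesian_blocks(t, flux, sigma=0.5, fitness="measures")
print(np.round(edges, 1))
```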
Attention in the predictive mind.
Ransom, Madeleine; Fazelpour, Sina; Mole, Christopher
2017-01-01
It has recently become popular to suggest that cognition can be explained as a process of Bayesian prediction error minimization. Some advocates of this view propose that attention should be understood as the optimization of expected precisions in the prediction-error signal (Clark, 2013, 2016; Feldman & Friston, 2010; Hohwy, 2012, 2013). This proposal successfully accounts for several attention-related phenomena. We claim that it cannot account for all of them, since there are certain forms of voluntary attention that it cannot accommodate. We therefore suggest that, although the theory of Bayesian prediction error minimization introduces some powerful tools for the explanation of mental phenomena, its advocates have been wrong to claim that Bayesian prediction error minimization is 'all the brain ever does'. Copyright © 2016 Elsevier Inc. All rights reserved.
PyClone: statistical inference of clonal population structure in cancer.
Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P
2014-04-01
We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.
Bayesian estimation of post-Messinian divergence times in Balearic Island lizards.
Brown, R P; Terrasa, B; Pérez-Mellado, V; Castro, J A; Hoskisson, P A; Picornell, A; Ramon, M M
2008-07-01
Phylogenetic relationships and timings of major cladogenesis events are investigated in the Balearic Island lizards Podarcis lilfordi and P. pityusensis using 2675 bp of mitochondrial and nuclear DNA sequences. Partitioned Bayesian and Maximum Parsimony analyses provided a well-resolved phylogeny with high node-support values. Bayesian MCMC estimation of node dates was investigated by comparing means of posterior distributions from different subsets of the sequence against the most robust analysis which used multiple partitions and allowed for rate heterogeneity among branches under a rate-drift model. Evolutionary rates were systematically underestimated and thus divergence times overestimated when sequences containing lower numbers of variable sites were used (based on ingroup node constraints). The following analyses allowed the best recovery of node times under the constant-rate (i.e., perfect clock) model: (i) all cytochrome b sequence (partitioned by codon position), (ii) cytochrome b (codon position 3 alone), (iii) NADH dehydrogenase (subunits 1 and 2; partitioned by codon position), (iv) cytochrome b and NADH dehydrogenase sequence together (six gene-codon partitions), (v) all unpartitioned sequence, (vi) a full multipartition analysis (nine partitions). Of these, only (iv) and (vi) performed well under the rate-drift model. These findings have significant implications for dating of recent divergence times in other taxa. The earliest P. lilfordi cladogenesis event (divergence of Menorcan populations) occurred before the end of the Pliocene, some 2.6 Ma. Subsequent events led to a West Mallorcan lineage (2.0 Ma ago), followed 1.2 Ma ago by divergence of populations from the southern part of the Cabrera archipelago from a widely-distributed group from north Cabrera, northern and southern Mallorcan islets. Divergence within P. pityusensis is more recent, with the main Ibiza and Formentera clades sharing a common ancestor about 1.0 Ma ago. Climatic and sea level changes are likely to have initiated cladogenesis, with lineages making secondary contact during periodic land-bridge formation. This oscillating cross-archipelago pattern in which ancient divergence is followed by repeated contact resembles that seen between East-West refugia populations from mainland Europe.
Bayesian Tracking of Emerging Epidemics Using Ensemble Optimal Statistical Interpolation
Cobb, Loren; Krishnamurthy, Ashok; Mandel, Jan; Beezley, Jonathan D.
2014-01-01
We present a preliminary test of the Ensemble Optimal Statistical Interpolation (EnOSI) method for the statistical tracking of an emerging epidemic, with a comparison to its popular relative for Bayesian data assimilation, the Ensemble Kalman Filter (EnKF). The spatial data for this test was generated by a spatial susceptible-infectious-removed (S-I-R) epidemic model of an airborne infectious disease. Both tracking methods in this test employed Poisson rather than Gaussian noise, so as to handle epidemic data more accurately. The EnOSI and EnKF tracking methods worked well on the main body of the simulated spatial epidemic, but the EnOSI was able to detect and track a distant secondary focus of infection that the EnKF missed entirely.
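For orientation, here is a sketch of the standard (Gaussian) EnKF analysis step against which such variants are built; the paper's methods replace the Gaussian observation model with Poisson noise:

    import numpy as np

    def enkf_update(X, H, y, R):
        # EnKF analysis step. X: (n_state, n_members) forecast ensemble,
        # H: (n_obs, n_state) observation operator, y: observation vector,
        # R: (n_obs, n_obs) observation-error covariance.
        A = X - X.mean(axis=1, keepdims=True)         # ensemble anomalies
        P = A @ A.T / (X.shape[1] - 1)                # sample forecast covariance
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
        # Perturbed observations keep the analysis spread statistically correct.
        Y = y[:, None] + np.random.multivariate_normal(
            np.zeros(len(y)), R, X.shape[1]).T
        return X + K @ (Y - H @ X)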
Ortega, Alonso; Labrenz, Stephan; Markowitsch, Hans J; Piefke, Martina
2013-01-01
In the last decade, different statistical techniques have been introduced to improve assessment of malingering-related poor effort. In this context, we have recently shown preliminary evidence that a Bayesian latent group model may help to optimize classification accuracy using a simulation research design. In the present study, we conducted two analyses. First, we evaluated how accurately this Bayesian approach can distinguish between participants answering in an honest way (honest response group) and participants feigning cognitive impairment (experimental malingering group). Second, we tested the accuracy of our model in differentiating between patients who had real cognitive deficits (cognitively impaired group) and participants who belonged to the experimental malingering group. All Bayesian analyses were conducted using the raw scores of a visual recognition forced-choice task (2AFC), the Test of Memory Malingering (TOMM, Trial 2), and the Word Memory Test (WMT, primary effort subtests). The first analysis showed 100% accuracy for the Bayesian model in distinguishing participants of both groups with all effort measures. The second analysis showed outstanding overall accuracy of the Bayesian model when estimates were obtained from the 2AFC and the TOMM raw scores. Diagnostic accuracy of the Bayesian model diminished when using the WMT total raw scores; despite this decrement, overall diagnostic accuracy can still be considered excellent. The most plausible explanation for the decrement is the low performance in verbal recognition and fluency tasks of some patients in the cognitively impaired group. Additionally, the Bayesian model provides individual estimates, p(z_i | D), of examinees' effort levels. In conclusion, both the high classification accuracy and the Bayesian individual estimates of effort may be very useful for clinicians when assessing effort in medico-legal settings.
How to deal with climate change uncertainty in the planning of engineering systems
NASA Astrophysics Data System (ADS)
Spackova, Olga; Dittes, Beatrice; Straub, Daniel
2016-04-01
The effect of extreme events such as floods on infrastructure and the built environment is associated with significant uncertainties: these include the uncertain effect of climate change, uncertainty in extreme-event frequency estimation due to limited historical data and imperfect models, and, not least, uncertainty about future socio-economic developments, which determine the damage potential. One option for dealing with these uncertainties is the use of adaptable (flexible) infrastructure that can easily be adjusted in the future without excessive costs. The challenge lies in quantifying the value of adaptability and in finding the optimal sequence of decisions. Is it worth building a (potentially more expensive) adaptable system that can be adjusted depending on future conditions? Or is it more cost-effective to make a conservative design that does not account for possible future changes to the system? What is the optimal timing of the decision to build or adjust the system? We develop a quantitative decision-support framework for evaluating alternative infrastructure designs under uncertainty, which: • probabilistically models the uncertain future (through a Bayesian approach) • includes the adaptability of the systems (the costs of future changes) • takes into account the fact that future decisions will also be made under uncertainty (using pre-posterior decision analysis) • identifies the optimal capacity and the optimal timing to build or adjust the infrastructure. Application of the decision framework is demonstrated on an example of flood mitigation planning in Bavaria.
NASA Astrophysics Data System (ADS)
Zhang, Peng; Peng, Jing; Sims, S. Richard F.
2005-05-01
In ATR applications, each feature is a convolution of an image with a filter. It is important to use the most discriminant features to produce compact representations. We propose two novel subspace methods for dimension reduction that address limitations of the Fukunaga-Koontz Transform (FKT). The first method, Scatter-FKT, assumes that the target class is relatively homogeneous, while clutter can be anything other than target and can occur anywhere. Thus, instead of estimating a clutter covariance matrix, Scatter-FKT computes a clutter scatter matrix that measures the spread of clutter from the target mean. We choose dimensions along which the difference in variation between target and clutter is most pronounced. When the target follows a Gaussian distribution, Scatter-FKT can be viewed as a generalization of FKT. The second method, Optimal Bayesian Subspace (OBS), is derived from the optimal Bayesian classifier. It selects dimensions such that the minimum Bayes error rate can be achieved. When both target and clutter follow Gaussian distributions, OBS computes optimal subspace representations. We compare our methods against FKT using character images as well as IR data.
Patel, Nitin R; Ankolekar, Suresh
2007-11-30
Classical approaches to clinical trial design ignore economic factors that determine economic viability of a new drug. We address the choice of sample size in Phase III trials as a decision theory problem using a hybrid approach that takes a Bayesian view from the perspective of a drug company and a classical Neyman-Pearson view from the perspective of regulatory authorities. We incorporate relevant economic factors in the analysis to determine the optimal sample size to maximize the expected profit for the company. We extend the analysis to account for risk by using a 'satisficing' objective function that maximizes the chance of meeting a management-specified target level of profit. We extend the models for single drugs to a portfolio of clinical trials and optimize the sample sizes to maximize the expected profit subject to budget constraints. Further, we address the portfolio risk and optimize the sample sizes to maximize the probability of achieving a given target of expected profit.
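A toy version of the company-side calculation (an illustrative normal-prior assurance computation, not the paper's full decision model; all numbers are invented):

    import numpy as np
    from scipy.stats import norm

    def expected_profit(n, prior_mean, prior_sd, sigma, alpha,
                        market_value, cost_per_patient, n_draws=10_000):
        # Prior-averaged power ("assurance") for a two-arm z-test,
        # times the market value, minus the trial's cost.
        rng = np.random.default_rng(0)        # fixed seed: same draws per call
        effects = rng.normal(prior_mean, prior_sd, n_draws)
        z = norm.ppf(1 - alpha)
        se = sigma * np.sqrt(2.0 / n)         # SE of the two-arm mean difference
        assurance = norm.cdf(effects / se - z).mean()
        return assurance * market_value - 2 * n * cost_per_patient

    # Scan per-arm sample sizes for the profit-maximising choice (toy numbers).
    best_n = max(range(50, 2001, 25), key=lambda n: expected_profit(
        n, 0.3, 0.2, 1.0, 0.025, 5e8, 2e4))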
A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction.
Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R; Buenrostro-Mariscal, Raymundo
2017-06-07
There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. Copyright © 2017 Montesinos-López et al.
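To convey the flavor of the variational updates, here is coordinate-ascent VB for the simplest conjugate case, a normal model with unknown mean and precision; this is only a didactic sketch, while the genomic G×E model applies the same logic to a far larger hierarchy:

    import numpy as np

    def vb_normal(y, s0=10.0, a0=1.0, b0=1.0, iters=100):
        # Coordinate-ascent VB for y_i ~ N(mu, 1/lam), mu ~ N(0, s0^2),
        # lam ~ Gamma(a0, b0), with factorised q(mu) q(lam).
        n = len(y)
        e_lam = a0 / b0                                 # initial E_q[lam]
        for _ in range(iters):
            v = 1.0 / (n * e_lam + 1.0 / s0**2)         # q(mu) variance
            m = v * e_lam * y.sum()                     # q(mu) mean
            a = a0 + 0.5 * n                            # q(lam) shape
            b = b0 + 0.5 * (((y - m) ** 2).sum() + n * v)  # q(lam) rate
            e_lam = a / b
        return m, v, a, b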
Renard, Bernhard Y.; Xu, Buote; Kirchner, Marc; Zickmann, Franziska; Winter, Dominic; Korten, Simone; Brattig, Norbert W.; Tzur, Amit; Hamprecht, Fred A.; Steen, Hanno
2012-01-01
Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis.
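The criterion driving BICEPS is, in its generic form,

    \mathrm{BIC} = k \ln n - 2 \ln \hat{L},

where \hat{L} is the maximized likelihood, k the number of free parameters (loosely, growing with each sequence modification the search permits), and n the number of observations; a candidate peptide explanation is retained only when its likelihood gain outweighs the complexity penalty.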
Feldman, Sanford H; Ntenda, Abraham M
2011-01-01
We used high-fidelity PCR to amplify 2 overlapping regions of the ribosomal gene complex from the rodent fur mite Myobia musculi. The amplicons encompassed a large portion of the mite's ribosomal gene complex, spanning 3128 nucleotides containing the entire 18S rRNA, internal transcribed spacer (ITS) 1, 5.8S rRNA, ITS2, and a portion of the 5'-end of the 28S rRNA. The 179-nucleotide 5.8S rRNA sequence of M. musculi was not conserved, so this region was identified by conservation of rRNA secondary structure. Maximum likelihood and Bayesian inference phylogenetic analyses were performed by using a multiple sequence alignment consisting of 1524 nucleotides of M. musculi 18S rRNA and homologous sequences from 42 prostigmatid mites and the tick Dermacentor andersoni. The phylograms produced by both methods were in agreement regarding terminal, secondary, and some tertiary phylogenetic relationships among mites. Bayesian inference discriminated most infraordinal relationships between Eleutherengona and Parasitengona mites in the suborder Anystina. Basal relationships between suborders Anystina and Eupodina historically determined by comparing differences in anatomic characteristics were less well-supported by our molecular analysis. Our results recapitulated similar 18S rRNA sequence analyses recently reported. Our study supports M. musculi as belonging to the suborder Anystina, infraorder Eleutherengona, and superfamily Cheyletoidea.
Schönberg, Anna; Theunert, Christoph; Li, Mingkun; Stoneking, Mark; Nasidze, Ivan
2011-09-01
To investigate the demographic history of human populations from the Caucasus and surrounding regions, we used high-throughput sequencing to generate 147 complete mtDNA genome sequences from random samples of individuals from three groups from the Caucasus (Armenians, Azeri and Georgians), and one group each from Iran and Turkey. Overall diversity is very high, with 144 different sequences that fall into 97 different haplogroups found among the 147 individuals. Bayesian skyline plots (BSPs) of population size change through time show a population expansion around 40-50 kya, followed by a constant population size, and then another expansion around 15-18 kya for the groups from the Caucasus and Iran. The BSP for Turkey differs the most from the others, with an increase from 35 to 50 kya followed by a prolonged period of constant population size, and no indication of a second period of growth. An approximate Bayesian computation approach was used to estimate divergence times between each pair of populations; the oldest divergence times were between Turkey and the other four groups from the South Caucasus and Iran (~400-600 generations), while the divergence time of the three Caucasus groups from each other was comparable to their divergence time from Iran (average of ~360 generations). These results illustrate the value of random sampling of complete mtDNA genome sequences that can be obtained with high-throughput sequencing platforms.
A Compensatory Approach to Optimal Selection with Mastery Scores. Research Report 94-2.
ERIC Educational Resources Information Center
van der Linden, Wim J.; Vos, Hans J.
This paper presents some Bayesian theories of simultaneous optimization of decision rules for test-based decisions. Simultaneous decision making arises when an institution has to make a series of selection, placement, or mastery decisions with respect to subjects from a population. An obvious example is the use of individualized instruction in…
A new Bayesian recursive technique for parameter estimation
NASA Astrophysics Data System (ADS)
Kaheil, Yasir H.; Gill, M. Kashif; McKee, Mac; Bastidas, Luis
2006-08-01
The performance of any model depends on how well its associated parameters are estimated. In the current application, a localized Bayesian recursive estimation (LOBARE) approach is devised for parameter estimation. The LOBARE methodology is an extension of the Bayesian recursive estimation (BARE) method. It is applied in this paper on two different types of models: an artificial intelligence (AI) model in the form of a support vector machine (SVM) application for forecasting soil moisture and a conceptual rainfall-runoff (CRR) model represented by the Sacramento soil moisture accounting (SAC-SMA) model. Support vector machines, based on statistical learning theory (SLT), represent the modeling task as a quadratic optimization problem and have already been used in various applications in hydrology. They require estimation of three parameters. SAC-SMA is a very well known model that estimates runoff. It has a 13-dimensional parameter space. In the LOBARE approach presented here, Bayesian inference is used in an iterative fashion to estimate the parameter space that will most likely enclose a best parameter set. This is done by narrowing the sampling space through updating the "parent" bounds based on their fitness. These bounds are actually the parameter sets that were selected by BARE runs on subspaces of the initial parameter space. The new approach results in faster convergence toward the optimal parameter set using minimum training/calibration data and fewer sets of parameter values. The efficacy of the localized methodology is also compared with the previously used BARE algorithm.
Winterton, Shaun L; Wiegmann, Brian M; Schlinger, Evert I
2007-06-01
The first formal analysis of phylogenetic relationships among small-headed flies (Acroceridae) is presented based on DNA sequence data from two ribosomal (16S and 28S) and two protein-encoding genes: the carbamoylphosphate synthase (CPS) domain of CAD (i.e., the rudimentary locus) and cytochrome oxidase I (COI). DNA sequences from 40 species in 22 genera of Acroceridae (representing all three subfamilies) were compared with outgroup exemplars from Nemestrinidae, Stratiomyidae, Tabanidae, and Xylophagidae. Parsimony and Bayesian simultaneous analyses of the full data set recover a well-resolved and strongly supported hypothesis of phylogenetic relationships for major lineages within the family. Molecular evidence supports the monophyly of the traditionally recognised subfamilies Philopotinae and Panopinae, but Acrocerinae are polyphyletic. Panopinae, sometimes considered "primitive" based on morphology and host-use, are always placed in a more derived position in the current study. Furthermore, these data support emerging morphological evidence that the type genus Acrocera Meigen, and its sister genus Sphaerops, are atypical acrocerids, comprising a sister lineage to all other Acroceridae. Based on the phylogeny generated in the simultaneous analysis, historical divergence times were estimated using Bayesian methodology constrained with fossil data. These estimates indicate Acroceridae likely evolved during the late Triassic but did not diversify greatly until the Cretaceous.
A cost minimisation and Bayesian inference model predicts startle reflex modulation across species.
Bach, Dominik R
2015-04-07
In many species, rapid defensive reflexes are paramount to escaping acute danger. These reflexes are modulated by the state of the environment. This is exemplified in fear-potentiated startle, a more vigorous startle response during conditioned anticipation of an unrelated threatening event. Extant explanations of this phenomenon build on descriptive models of underlying psychological states, or neural processes. Yet, they fail to predict invigorated startle during reward anticipation and instructed attention, and do not explain why startle reflex modulation evolved. Here, we fill this lacuna by developing a normative cost minimisation model based on Bayesian optimality principles. This model predicts the observed pattern of startle modification by rewards, punishments, instructed attention, and several other states. Moreover, the mathematical formalism furnishes predictions that can be tested experimentally. Comparing the model with existing data suggests a specific neural implementation of the underlying computations which yields close approximations to the optimal solution under most circumstances. This analysis puts startle modification into the framework of Bayesian decision theory and predictive coding, and illustrates the importance of an adaptive perspective to interpret defensive behaviour across species. Copyright © 2015 The Author. Published by Elsevier Ltd. All rights reserved.
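In generic Bayesian decision-theoretic form (a textbook statement, not the paper's specific cost function), such a model selects the action

    a^{*}(y) = \arg\min_{a} \int C(a, s)\, p(s \mid y)\, \mathrm{d}s,

the minimizer of expected cost under the posterior over hidden environmental states s given observations y; on this reading, startle vigour tracks the optimal action as the probability of threat varies.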
NASA Astrophysics Data System (ADS)
Gu, Xiaohui; Yang, Shaopu; Liu, Yongqiang; Hao, Rujiang
2018-06-01
Two of the most important signatures of repetitive transients in the vibration signals of a faulty rotating machine are impulsiveness and cyclostationarity. In the recently proposed infogram, time-domain and frequency-domain spectral negentropy were put forward to characterize these two aspects, respectively. However, in extending the infogram to Bayesian-inference-based optimal wavelet filtering, only one spectral negentropy was employed in identifying the informative frequency band. To overcome this drawback, a novel Pareto-based Bayesian approach is proposed in this paper. The Pareto-optimal solutions that simultaneously maximize the time-domain and frequency-domain spectral negentropy are utilized in estimating the posterior wavelet parameter distributions. Moreover, the relationship between the impulsive and cyclostationary signatures is established through domination, which helps balance the contributions of these two aspects rather than simply combining them with equal weights as in the infogram. Three case studies including simulated and experimental signals were investigated to illustrate the effectiveness of the proposed method under different noises and interferences. In addition, comparisons with the aforementioned peer methods were conducted to show its superiority and robustness in extracting repetitive transients.
Fragment virtual screening based on Bayesian categorization for discovering novel VEGFR-2 scaffolds.
Zhang, Yanmin; Jiao, Yu; Xiong, Xiao; Liu, Haichun; Ran, Ting; Xu, Jinxing; Lu, Shuai; Xu, Anyang; Pan, Jing; Qiao, Xin; Shi, Zhihao; Lu, Tao; Chen, Yadong
2015-11-01
The discovery of novel scaffolds against a specific target has long been one of the most significant but challenging goals in lead discovery. A scaffold that binds in important regions of the active pocket is more favorable as a starting point because scaffolds generally possess greater optimization possibilities. However, due to insufficient chemical-space diversity in available databases and the ineffectiveness of screening methods, discovering novel active scaffolds remains a great challenge. Given the complementary strengths and weaknesses of fragment-based drug design and traditional virtual screening (VS), we propose a fragment VS concept based on Bayesian categorization for the discovery of novel scaffolds. This work investigates the proposal through an application to the VEGFR-2 target. First, the scaffold and structural diversity of the chemical space of 10 compound databases were explicitly evaluated. Simultaneously, a robust Bayesian classification model was constructed for screening not only the compound databases but also their corresponding fragment databases. Although analysis of scaffold diversity demonstrated a very uneven distribution of scaffolds over molecules, the results showed that our Bayesian model performed better in screening fragments than molecules. Through a retrospective literature search, several generated fragments with relatively high Bayesian scores indeed exhibit VEGFR-2 biological activity, strongly supporting the effectiveness of fragment VS based on Bayesian categorization models. This investigation of Bayesian-based fragment VS further emphasizes the need to enrich the compound databases employed in lead discovery by amplifying their diversity with novel structures.
Bayesian Retrieval of Complete Posterior PDFs of Oceanic Rain Rate From Microwave Observations
NASA Technical Reports Server (NTRS)
Chiu, J. Christine; Petty, Grant W.
2005-01-01
This paper presents a new Bayesian algorithm for retrieving surface rain rate from Tropical Rainfall Measurements Mission (TRMM) Microwave Imager (TMI) over the ocean, along with validations against estimates from the TRMM Precipitation Radar (PR). The Bayesian approach offers a rigorous basis for optimally combining multichannel observations with prior knowledge. While other rain rate algorithms have been published that are based at least partly on Bayesian reasoning, this is believed to be the first self-contained algorithm that fully exploits Bayes Theorem to yield not just a single rain rate, but rather a continuous posterior probability distribution of rain rate. To advance our understanding of theoretical benefits of the Bayesian approach, we have conducted sensitivity analyses based on two synthetic datasets for which the true conditional and prior distribution are known. Results demonstrate that even when the prior and conditional likelihoods are specified perfectly, biased retrievals may occur at high rain rates. This bias is not the result of a defect of the Bayesian formalism but rather represents the expected outcome when the physical constraint imposed by the radiometric observations is weak, due to saturation effects. It is also suggested that the choice of the estimators and the prior information are both crucial to the retrieval. In addition, the performance of our Bayesian algorithm is found to be comparable to that of other benchmark algorithms in real-world applications, while having the additional advantage of providing a complete continuous posterior probability distribution of surface rain rate.
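The core of such a retrieval in discretised form (a minimal sketch; in the algorithm itself the prior and conditional densities come from observed rain statistics and radiative-transfer modelling):

    import numpy as np

    def posterior_rain_rate(rain_grid, prior, loglik):
        # Discretised Bayes rule on a rain-rate grid: p(R|T) is proportional
        # to p(T|R) p(R), given multichannel brightness temperatures T.
        post = prior * np.exp(loglik - loglik.max())  # stabilised likelihood
        post = post / post.sum()                      # normalise over the grid
        post_mean = (rain_grid * post).sum()          # posterior-mean estimator
        return post, post_mean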
Accurate prediction of protein–protein interactions from sequence alignments using a Bayesian method
Burger, Lukas; van Nimwegen, Erik
2008-01-01
Accurate and large-scale prediction of protein–protein interactions directly from amino-acid sequences is one of the great challenges in computational biology. Here we present a new Bayesian network method that predicts interaction partners using only multiple alignments of amino-acid sequences of interacting protein domains, without tunable parameters, and without the need for any training examples. We first apply the method to bacterial two-component systems and comprehensively reconstruct two-component signaling networks across all sequenced bacteria. Comparisons of our predictions with known interactions show that our method infers interaction partners genome-wide with high accuracy. To demonstrate the general applicability of our method we show that it also accurately predicts interaction partners in a recent dataset of polyketide synthases. Analysis of the predicted genome-wide two-component signaling networks shows that cognates (interacting kinase/regulator pairs, which lie adjacent on the genome) and orphans (which lie isolated) form two relatively independent components of the signaling network in each genome. In addition, while most genes are predicted to have only a small number of interaction partners, we find that 10% of orphans form a separate class of 'hub' nodes that distribute and integrate signals to and from up to tens of different interaction partners.
Characterizing the Nash equilibria of three-player Bayesian quantum games
NASA Astrophysics Data System (ADS)
Solmeyer, Neal; Balu, Radhakrishnan
2017-05-01
Quantum games with incomplete information can be studied within a Bayesian framework. We analyze games quantized within the EWL framework [Eisert, Wilkens, and Lewenstein, Phys Rev. Lett. 83, 3077 (1999)]. We solve for the Nash equilibria of a variety of two-player quantum games and compare the results to the solutions of the corresponding classical games. We then analyze Bayesian games where there is uncertainty about the player types in two-player conflicting interest games. The solutions to the Bayesian games are found to have a phase diagram-like structure where different equilibria exist in different parameter regions, depending both on the amount of uncertainty and the degree of entanglement. We find that in games where a Pareto-optimal solution is not a Nash equilibrium, it is possible for the quantized game to have an advantage over the classical version. In addition, we analyze the behavior of the solutions as the strategy choices approach an unrestricted operation. We find that some games have a continuum of solutions, bounded by the solutions of a simpler restricted game. A deeper understanding of Bayesian quantum game theory could lead to novel quantum applications in a multi-agent setting.
A Bayesian sequential design with adaptive randomization for 2-sided hypothesis test.
Yu, Qingzhao; Zhu, Lin; Zhu, Han
2017-11-01
Bayesian sequential and adaptive randomization designs are gaining popularity in clinical trials thanks to their potential to reduce the number of required participants and save resources. We propose a Bayesian sequential design with adaptive randomization rates so as to more efficiently assign newly recruited patients to treatment arms. In this paper, we consider 2-arm clinical trials. Patients are allocated to the 2 arms with a randomization rate chosen to achieve minimum variance for the test statistic. Algorithms are presented to calculate the optimal randomization rate, critical values, and power for the proposed design. Sensitivity analysis is implemented to check the influence on the design of changing the prior distributions. Simulation studies are used to compare the proposed method and traditional methods in terms of power and actual sample size. Simulations show that, when total sample size is fixed, the proposed design can attain greater power and/or a smaller actual sample size than the traditional Bayesian sequential design. Finally, we apply the proposed method to a real data set and compare the results with the Bayesian sequential design without adaptive randomization in terms of sample size; the proposed method further reduces the required sample size. Copyright © 2017 John Wiley & Sons, Ltd.
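One classical route to a minimum-variance rate is Neyman allocation; a sketch under an assumed two-arm continuous-outcome setting, where a Bayesian design would recompute the rate from posterior draws of the arm variances at each interim look:

    def neyman_allocation(sd1, sd2):
        # Randomisation rate r minimising Var(xbar1 - xbar2) for fixed total n:
        # Var = sd1^2/(r n) + sd2^2/((1 - r) n)  =>  r* = sd1 / (sd1 + sd2).
        return sd1 / (sd1 + sd2)

    # e.g. posterior means of the arm SDs at an interim look:
    r = neyman_allocation(1.4, 0.9)   # about 0.61 of new patients to arm 1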
Adkison, Milo D.; Peterman, R.M.
1996-01-01
Bayesian methods have been proposed to estimate optimal escapement goals, using both knowledge about physical determinants of salmon productivity and stock-recruitment data. The Bayesian approach has several advantages over many traditional methods for estimating stock productivity: it allows integration of information from diverse sources and provides a framework for decision-making that takes into account uncertainty reflected in the data. However, results can be critically dependent on details of implementation of this approach. For instance, unintended and unwarranted confidence about stock-recruitment relationships can arise if the range of relationships examined is too narrow, if too few discrete alternatives are considered, or if data are contradictory. This unfounded confidence can result in a suboptimal choice of a spawning escapement goal.
Efficient Bayesian experimental design for contaminant source identification
NASA Astrophysics Data System (ADS)
Zhang, J.; Zeng, L.
2013-12-01
In this study, an efficient fully Bayesian approach is developed for optimal sampling-well location design and source parameter identification of groundwater contaminants. An information measure, the relative entropy, is employed to quantify the information gain from indirect concentration measurements in identifying unknown source parameters such as release time, strength, and location. In this approach, the sampling location that gives the maximum relative entropy is selected as the optimal one. Once the sampling location is determined, a Bayesian approach based on Markov chain Monte Carlo (MCMC) is used to estimate the unknown source parameters. In both the design and the estimation, the contaminant transport equation must be solved many times to evaluate the likelihood. To reduce the computational burden, an interpolation method based on an adaptive sparse grid is utilized to construct a surrogate for the contaminant transport model. The approximated likelihood can be evaluated directly from the surrogate, which greatly accelerates the design and estimation process. The accuracy and efficiency of our approach are demonstrated through numerical case studies: posterior marginal densities of the unknown parameters are recovered markedly better from data collected at the designed location than at randomly chosen locations. Compared with traditional optimal design, which is based on a Gaussian linear assumption, the method developed in this study can cope with arbitrary nonlinearity. It can be used to assist in groundwater monitoring network design and in the identification of unknown contaminant sources.
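A nested Monte Carlo sketch of the expected-information-gain criterion (the callables simulate_obs and loglik are hypothetical stand-ins for the surrogate transport model):

    import numpy as np

    def expected_information_gain(prior_draws, simulate_obs, loglik, n_outer=200):
        # Monte Carlo estimate of E_y[ KL(posterior || prior) ] for one
        # candidate sampling location; prior_draws has shape (N, n_params).
        gains = []
        for _ in range(n_outer):
            theta = prior_draws[np.random.randint(len(prior_draws))]
            y = simulate_obs(theta)                     # synthetic measurement
            ll = np.array([loglik(y, t) for t in prior_draws])
            w = np.exp(ll - ll.max())
            w /= w.sum()                                # posterior weights
            # KL against the uniform prior weight 1/N on each draw:
            gains.append(np.sum(w * np.log(np.maximum(w, 1e-300) * len(w))))
        return float(np.mean(gains))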
Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad
2014-01-01
Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the BLAST-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data System. We found that type I error rates, i.e., incorrect taxonomic assignments with high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
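A sketch of the bootstrap-support idea behind such naïve Bayesian classifiers, in the style of the Wang et al. method (the classify callable is a hypothetical stand-in for the trained classifier, and the subsampling fraction is an assumption):

    import numpy as np

    def bootstrap_support(kmers, classify, n_boot=100, frac=1 / 8):
        # Repeatedly classify random subsets of a query's k-mers and report
        # how often the winning taxon recurs (its bootstrap support).
        hits = {}
        k = max(1, int(len(kmers) * frac))
        for _ in range(n_boot):
            sub = np.random.choice(kmers, size=k, replace=True)
            taxon = classify(sub)
            hits[taxon] = hits.get(taxon, 0) + 1
        best = max(hits, key=hits.get)
        return best, hits[best] / n_boot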
P300 Chinese input system based on Bayesian LDA.
Jin, Jing; Allison, Brendan Z; Brunner, Clemens; Wang, Bei; Wang, Xingyu; Zhang, Jianhua; Neuper, Christa; Pfurtscheller, Gert
2010-02-01
A brain-computer interface (BCI) is a new communication channel between humans and computers that translates brain activity into recognizable command and control signals. Attended events can evoke P300 potentials in the electroencephalogram. Hence, the P300 has been used in BCI systems to spell, control cursors or robotic devices, and other tasks. This paper introduces a novel P300 BCI to communicate Chinese characters. To improve classification accuracy, an optimization algorithm (particle swarm optimization, PSO) is used for channel selection (i.e., identifying the best electrode configuration). The effects of different electrode configurations on classification accuracy were tested by Bayesian linear discriminant analysis offline. The offline results from 11 subjects show that this new P300 BCI can effectively communicate Chinese characters and that the features extracted from the electrodes obtained by PSO yield good performance.
2017-01-01
Co-expression networks have long been used as a tool for investigating the molecular circuitry governing biological systems. However, most algorithms for constructing co-expression networks were developed in the microarray era, before high-throughput sequencing—with its unique statistical properties—became the norm for expression measurement. Here we develop Bayesian Relevance Networks, an algorithm that uses Bayesian reasoning about expression levels to account for the differing levels of uncertainty in expression measurements between highly- and lowly-expressed entities, and between samples with different sequencing depths. It combines data from groups of samples (e.g., replicates) to estimate group expression levels and confidence ranges. It then computes uncertainty-moderated estimates of cross-group correlations between entities, and uses permutation testing to assess their statistical significance. Using large scale miRNA data from The Cancer Genome Atlas, we show that our Bayesian update of the classical Relevance Networks algorithm provides improved reproducibility in co-expression estimates and lower false discovery rates in the resulting co-expression networks. Software is available at www.perkinslab.ca.
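A sketch of the permutation test at the heart of Relevance Networks; per the abstract, the Bayesian variant would first replace the raw profiles with uncertainty-moderated posterior estimates:

    import numpy as np

    def permutation_corr(x, y, n_perm=1000, seed=0):
        # Two-sided permutation p-value for a cross-group correlation.
        rng = np.random.default_rng(seed)
        r = np.corrcoef(x, y)[0, 1]
        null = np.array([np.corrcoef(rng.permutation(x), y)[0, 1]
                         for _ in range(n_perm)])
        return r, float(np.mean(np.abs(null) >= abs(r)))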
Mehrian, Mohammad; Guyot, Yann; Papantoniou, Ioannis; Olofsson, Simon; Sonnaert, Maarten; Misener, Ruth; Geris, Liesbet
2018-03-01
In regenerative medicine, computer models describing bioreactor processes can assist in designing optimal process conditions leading to robust and economically viable products. In this study, we started from a 3D mechanistic model describing the growth of neotissue, composed of cells and extracellular matrix, in a perfusion bioreactor set-up as influenced by the scaffold geometry, flow-induced shear stress, and a number of metabolic factors. Subsequently, we applied model reduction by reformulating the problem from a set of partial differential equations into a set of ordinary differential equations. The quality of the reduction step was assessed by comparing the reduced model's results to those of the mechanistic model and to dedicated experimental results. The obtained homogenized model is 10^5-fold faster than the 3D version, allowing the application of rigorous optimization techniques. Bayesian optimization was applied to find the medium-refreshment regime, in terms of frequency and percentage of medium replaced, that maximizes neotissue growth kinetics during 21 days of culture. The simulation results indicated that maximum neotissue growth occurs for a high frequency and a high medium-replacement percentage, a finding that is corroborated by reports in the literature. This study demonstrates an in silico strategy for bioprocess optimization paying particular attention to the reduction of the associated computational cost. © 2017 Wiley Periodicals, Inc.
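A sketch of how such a Bayesian optimization might be set up with the scikit-optimize library (reduced_model is a hypothetical stand-in for the homogenized bioreactor model, and the bounds are invented):

    from skopt import gp_minimize

    # `reduced_model` is assumed to return neotissue growth after 21 days
    # of culture for a given medium-refreshment regime.
    def neg_growth(params):
        refreshments_per_week, replaced_fraction = params
        return -reduced_model(refreshments_per_week, replaced_fraction)

    res = gp_minimize(neg_growth,
                      dimensions=[(1, 7),        # refreshments per week
                                  (0.1, 1.0)],   # fraction of medium replaced
                      n_calls=40, random_state=0)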
Bayesian image reconstruction - The pixon and optimal image modeling
NASA Technical Reports Server (NTRS)
Pina, R. K.; Puetter, R. C.
1993-01-01
In this paper we describe the optimal image model, maximum residual likelihood method (OptMRL) for image reconstruction. OptMRL is a Bayesian image reconstruction technique for removing point-spread function blurring. OptMRL uses both a goodness-of-fit criterion (GOF) and an 'image prior', i.e., a function which quantifies the a priori probability of the image. Unlike standard maximum entropy methods, which typically reconstruct the image on the data pixel grid, OptMRL varies the image model in order to find the optimal functional basis with which to represent the image. We show how an optimal basis for image representation can be selected and in doing so, develop the concept of the 'pixon' which is a generalized image cell from which this basis is constructed. By allowing both the image and the image representation to be variable, the OptMRL method greatly increases the volume of solution space over which the image is optimized. Hence the likelihood of the final reconstructed image is greatly increased. For the goodness-of-fit criterion, OptMRL uses the maximum residual likelihood probability distribution introduced previously by Pina and Puetter (1992). This GOF probability distribution, which is based on the spatial autocorrelation of the residuals, has the advantage that it ensures spatially uncorrelated image reconstruction residuals.
Demaio, Pablo H; Barfuss, Michael H J; Kiesling, Roberto; Till, Walter; Chiapella, Jorge O
2011-11-01
The South American genus Gymnocalycium (Cactoideae-Trichocereae) demonstrates how the sole use of morphological data in Cactaceae results in conflicts in assessing phylogeny, constructing a taxonomic system, and analyzing trends in the evolution of the genus. Molecular phylogenetic analysis was performed using parsimony and Bayesian methods on a 6195-bp data matrix of plastid DNA sequences (atpI-atpH, petL-psbE, trnK-matK, trnT-trnL-trnF) of 78 samples, including 52 species and infraspecific taxa representing all the subgenera of Gymnocalycium. We assessed morphological character evolution using likelihood methods to optimize characters on a Bayesian tree and to reconstruct possible ancestral states. The results of the phylogenetic study confirm the monophyly of the genus, while supporting overall the available infrageneric classification based on seed morphology. Analysis showed the subgenera Microsemineum and Macrosemineum to be polyphyletic and paraphyletic. Analysis of morphological characters showed a tendency toward reduction of stem size, reduction in quantity and hardiness of spines, increment of seed size, development of napiform roots, and change from juicy and colorful fruits to dry and green fruits. Gymnocalycium saglionis is the only species of Microsemineum and a new name is required to identify the clade including the remaining species of Microsemineum; we propose the name Scabrosemineum in agreement with seed morphology. Identifying morphological trends and environmental features allows for a better understanding of the events that might have influenced the diversification of the genus.
Bayesian networks in overlay recipe optimization
NASA Astrophysics Data System (ADS)
Binns, Lewis A.; Reynolds, Greg; Rigden, Timothy C.; Watkins, Stephen; Soroka, Andrew
2005-05-01
Currently, overlay measurements are characterized by "recipe", which defines both physical parameters such as focus, illumination et cetera, and also the software parameters such as algorithm to be used and regions of interest. Setting up these recipes requires both engineering time and wafer availability on an overlay tool, so reducing these requirements will result in higher tool productivity. One of the significant challenges to automating this process is that the parameters are highly and complexly correlated. At the same time, a high level of traceability and transparency is required in the recipe creation process, so a technique that maintains its decisions in terms of well defined physical parameters is desirable. Running time should be short, given the system (automatic recipe creation) is being implemented to reduce overheads. Finally, a failure of the system to determine acceptable parameters should be obvious, so a certainty metric is also desirable. The complex, nonlinear interactions make solution by an expert system difficult at best, especially in the verification of the resulting decision network. The transparency requirements tend to preclude classical neural networks and similar techniques. Genetic algorithms and other "global minimization" techniques require too much computational power (given system footprint and cost requirements). A Bayesian network, however, provides a solution to these requirements. Such a network, with appropriate priors, can be used during recipe creation / optimization not just to select a good set of parameters, but also to guide the direction of search, by evaluating the network state while only incomplete information is available. As a Bayesian network maintains an estimate of the probability distribution of nodal values, a maximum-entropy approach can be utilized to obtain a working recipe in a minimum or near-minimum number of steps. In this paper we discuss the potential use of a Bayesian network in such a capacity, reducing the amount of engineering intervention. We discuss the benefits of this approach, especially improved repeatability and traceability of the learning process, and quantification of uncertainty in decisions made. We also consider the problems associated with this approach, especially in detailed construction of network topology, validation of the Bayesian network and the recipes it generates, and issues arising from the integration of a Bayesian network with a complex multithreaded application; these primarily relate to maintaining Bayesian network and system architecture integrity.
Molecular diversity and evolutionary history of rabies virus strains circulating in the Balkans.
McElhinney, L M; Marston, D A; Freuling, C M; Cragg, W; Stankov, S; Lalosevic, D; Lalosevic, V; Müller, T; Fooks, A R
2011-09-01
Molecular studies of European classical rabies viruses (RABV) have revealed a number of geographically clustered lineages. To study the diversity of Balkan RABV, partial nucleoprotein (N) gene sequences were analysed from a unique panel of isolates (n = 210), collected from various hosts between 1972 and 2006. All of the Balkan isolates grouped within the European/Middle East Lineage, with the majority most closely related to East European strains. A number of RABV from Bosnia & Herzegovina and Montenegro, collected between 1986 and 2006, grouped with the West European strains, believed to be responsible for the rabies epizootic that spread throughout Europe in the latter half of the 20th Century. In contrast, no Serbian RABV belonged to this sublineage. However, a distinct group of Serbian fox RABV provided further evidence for the southwards wildlife-mediated movement of rabies from Hungary, Romania and Serbia into Bulgaria. To determine the optimal region for evolutionary analysis, partial, full and concatenated N-gene and glycoprotein (G) gene sequences were compared. Whilst both the divergence times and evolutionary rates were similar irrespective of genomic region, the 95 % highest probability density (HPD) limits were significantly reduced for full N-gene and concatenated NG-gene sequences compared with partial gene sequences. Bayesian coalescent analysis estimated the date of the most common recent ancestor of the Balkan RABV to be 1885 (95 % HPD, 1852-1913), and skyline plots suggested an expansion of the local viral population in 1980-1990, which coincides with the observed emergence of fox rabies in the region.
Multiple utility constrained multi-objective programs using Bayesian theory
NASA Astrophysics Data System (ADS)
Abbasian, Pooneh; Mahdavi-Amiri, Nezam; Fazlollahtabar, Hamed
2018-03-01
A utility function is an important tool for representing a decision maker's (DM's) preference. We adjoin utility functions to multi-objective optimization problems. In current studies, usually one utility function is used for each objective function, but situations may arise in which a single objective has multiple utility functions. Here, we consider a constrained multi-objective problem in which each objective has multiple utility functions. We induce the probability of the utilities for each objective function using Bayesian theory. Illustrative examples considering dependence and independence of variables are worked through to demonstrate the usefulness of the proposed model.
Bayesian design of decision rules for failure detection
NASA Technical Reports Server (NTRS)
Chow, E. Y.; Willsky, A. S.
1984-01-01
The formulation of the decision making process of a failure detection algorithm as a Bayes sequential decision problem provides a simple conceptualization of the decision rule design problem. As the optimal Bayes rule is not computable, a methodology that is based on the Bayesian approach and aimed at a reduced computational requirement is developed for designing suboptimal rules. A numerical algorithm is constructed to facilitate the design and performance evaluation of these suboptimal rules. The result of applying this design methodology to an example shows that this approach is potentially a useful one.
Bayesian Dose-Response Modeling in Sparse Data
NASA Astrophysics Data System (ADS)
Kim, Steven B.
This book discusses Bayesian dose-response modeling in small samples applied to two different settings. The first setting is early phase clinical trials, and the second setting is toxicology studies in cancer risk assessment. In early phase clinical trials, experimental units are humans who are actual patients. Prior to a clinical trial, opinions from multiple subject area experts are generally more informative than the opinion of a single expert, but we may face a dilemma when they have disagreeing prior opinions. In this regard, we consider compromising the disagreement and compare two different approaches for making a decision. In addition to combining multiple opinions, we also address balancing two levels of ethics in early phase clinical trials. The first level is individual-level ethics, which reflects the perspective of trial participants. The second level is population-level ethics, which reflects the perspective of future patients. We extensively compare two existing statistical methods which focus on each perspective and propose a new method which balances the two conflicting perspectives. In toxicology studies, experimental units are living animals. Here we focus on a potential non-monotonic dose-response relationship known as hormesis. Briefly, hormesis is a phenomenon characterized by a beneficial effect at low doses and a harmful effect at high doses. In cancer risk assessment, the estimation of a parameter known as a benchmark dose can be highly sensitive to a class of assumptions, monotonicity or hormesis. In this regard, we propose a robust approach which considers both monotonicity and hormesis as possibilities. In addition, we discuss statistical hypothesis testing for hormesis and consider various experimental designs for detecting hormesis based on Bayesian decision theory. Past experiments have not been optimally designed for testing for hormesis, and some Bayesian optimal designs may not be optimal under a wrong parametric assumption. In this regard, we consider a robust experimental design which does not require any parametric assumption.
ERIC Educational Resources Information Center
Vos, Hans J.
As part of a project formulating optimal rules for decision making in computer assisted instructional systems in which the computer is used as a decision support tool, an approach that simultaneously optimizes classification of students into two treatments, each followed by a mastery decision, is presented using the framework of Bayesian decision…
The anatomy of choice: active inference and agency.
Friston, Karl; Schwartenbeck, Philipp; Fitzgerald, Thomas; Moutoussis, Michael; Behrens, Timothy; Dolan, Raymond J
2013-01-01
This paper considers agency in the setting of embodied or active inference. In brief, we associate a sense of agency with prior beliefs about action and ask what sorts of beliefs underlie optimal behavior. In particular, we consider prior beliefs that action minimizes the Kullback-Leibler (KL) divergence between desired states and attainable states in the future. This allows one to formulate bounded rationality as approximate Bayesian inference that optimizes a free energy bound on model evidence. We show that constructs like expected utility, exploration bonuses, softmax choice rules and optimism bias emerge as natural consequences of this formulation. Previous accounts of active inference have focused on predictive coding and Bayesian filtering schemes for minimizing free energy. Here, we consider variational Bayes as an alternative scheme that provides formal constraints on the computational anatomy of inference and action; these constraints are remarkably consistent with neuroanatomy. Furthermore, this scheme contextualizes optimal decision theory and economic (utilitarian) formulations as pure inference problems. For example, expected utility theory emerges as a special case of free energy minimization, where the sensitivity or inverse temperature (of softmax functions and quantal response equilibria) has a unique, Bayes-optimal solution that minimizes free energy. This sensitivity corresponds to the precision of beliefs about behavior, such that attainable goals are afforded a higher precision or confidence. In turn, this means that optimal behavior entails a representation of confidence about outcomes that are under an agent's control.
The performance of matched-field track-before-detect methods using shallow-water Pacific data.
Tantum, Stacy L; Nolte, Loren W; Krolik, Jeffrey L; Harmanci, Kerem
2002-07-01
Matched-field track-before-detect processing, which extends the concept of matched-field processing to include modeling of the source dynamics, has recently emerged as a promising approach for maintaining the track of a moving source. In this paper, optimal Bayesian and minimum variance beamforming track-before-detect algorithms which incorporate a priori knowledge of the source dynamics in addition to the underlying uncertainties in the ocean environment are presented. A Markov model is utilized for the source motion as a means of capturing the stochastic nature of the source dynamics without assuming uniform motion. In addition, the relationship between optimal Bayesian track-before-detect processing and minimum variance track-before-detect beamforming is examined, revealing how an optimal tracking philosophy may be used to guide the modification of existing beamforming techniques to incorporate track-before-detect capabilities. Further, the benefits of implementing an optimal approach over conventional methods are illustrated through application of these methods to shallow-water Pacific data collected as part of the SWellEX-1 experiment. The results show that incorporating Markovian dynamics for the source motion provides marked improvement in the ability to maintain target track without the use of a uniform velocity hypothesis.
Dyvorne, Hadrien A.; Galea, Nicola; Nevers, Thomas; Fiel, M. Isabel; Carpenter, David; Wong, Edmund; Orton, Matthew; de Oliveira, Andre; Feiweier, Thorsten; Vachon, Marie-Louise; Babb, James S.
2013-01-01
Purpose: To optimize intravoxel incoherent motion (IVIM) diffusion-weighted (DW) imaging by estimating the effects of diffusion gradient polarity and breathing acquisition scheme on image quality, signal-to-noise ratio (SNR), IVIM parameters, and parameter reproducibility, as well as to investigate the potential of IVIM in the detection of hepatic fibrosis. Materials and Methods: In this institutional review board–approved prospective study, 20 subjects (seven healthy volunteers, 13 patients with hepatitis C virus infection; 14 men, six women; mean age, 46 years) underwent IVIM DW imaging with four sequences: (a) respiratory-triggered (RT) bipolar (BP) sequence, (b) RT monopolar (MP) sequence, (c) free-breathing (FB) BP sequence, and (d) FB MP sequence. Image quality scores were assessed for all sequences. A biexponential analysis with the Bayesian method yielded true diffusion coefficient (D), pseudodiffusion coefficient (D*), and perfusion fraction (PF) in liver parenchyma. Mixed-model analysis of variance was used to compare image quality, SNR, IVIM parameters, and interexamination variability between the four sequences, as well as the ability to differentiate areas of liver fibrosis from normal liver tissue. Results: Image quality with RT sequences was superior to that with FB acquisitions (P = .02) and was not affected by gradient polarity. SNR did not vary significantly between sequences. IVIM parameter reproducibility was moderate to excellent for PF and D, while it was less reproducible for D*. PF and D were both significantly lower in patients with hepatitis C virus than in healthy volunteers with the RT BP sequence (PF = 13.5% ± 5.3 [standard deviation] vs 9.2% ± 2.5, P = .038; D = [1.16 ± 0.07] × 10⁻³ mm²/sec vs [1.03 ± 0.1] × 10⁻³ mm²/sec, P = .006). Conclusion: The RT BP DW imaging sequence had the best results in terms of image quality, reproducibility, and ability to discriminate between healthy and fibrotic liver with biexponential fitting. © RSNA, 2012 PMID:23220895
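For context, the biexponential IVIM model referenced above expresses the normalized signal as S(b)/S0 = PF·exp(−b·D*) + (1 − PF)·exp(−b·D). The study used a Bayesian fit; as a rough sketch, a least-squares fit to synthetic data (b-values, noise level, and parameter bounds are all assumed here) could look like this:

```python
import numpy as np
from scipy.optimize import curve_fit

def ivim(b, s0, pf, d_star, d):
    # Biexponential IVIM signal model (standard form; the study fit it
    # with a Bayesian method, here we sketch a simple least-squares fit).
    return s0 * (pf * np.exp(-b * d_star) + (1 - pf) * np.exp(-b * d))

b = np.array([0, 15, 30, 45, 60, 75, 90, 180, 400, 800.0])  # s/mm^2, assumed
rng = np.random.default_rng(0)
true = (1.0, 0.10, 80e-3, 1.1e-3)            # S0, PF, D*, D (plausible values)
signal = ivim(b, *true) + rng.normal(0, 0.01, b.size)

p0 = (1.0, 0.1, 50e-3, 1e-3)
bounds = ([0, 0, 1e-3, 1e-4], [2, 0.5, 1.0, 3e-3])
popt, _ = curve_fit(ivim, b, signal, p0=p0, bounds=bounds)
print("PF=%.3f  D*=%.2e  D=%.2e" % (popt[1], popt[2], popt[3]))
```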
Cortical Hierarchies Perform Bayesian Causal Inference in Multisensory Perception
Rohe, Tim; Noppeney, Uta
2015-01-01
To form a veridical percept of the environment, the brain needs to integrate sensory signals from a common source but segregate those from independent sources. Thus, perception inherently relies on solving the “causal inference problem.” Behaviorally, humans solve this problem optimally as predicted by Bayesian Causal Inference; yet, the underlying neural mechanisms are unexplored. Combining psychophysics, Bayesian modeling, functional magnetic resonance imaging (fMRI), and multivariate decoding in an audiovisual spatial localization task, we demonstrate that Bayesian Causal Inference is performed by a hierarchy of multisensory processes in the human brain. At the bottom of the hierarchy, in auditory and visual areas, location is represented on the basis that the two signals are generated by independent sources (= segregation). At the next stage, in posterior intraparietal sulcus, location is estimated under the assumption that the two signals are from a common source (= forced fusion). Only at the top of the hierarchy, in anterior intraparietal sulcus, the uncertainty about the causal structure of the world is taken into account and sensory signals are combined as predicted by Bayesian Causal Inference. Characterizing the computational operations of signal interactions reveals the hierarchical nature of multisensory perception in human neocortex. It unravels how the brain accomplishes Bayesian Causal Inference, a statistical computation fundamental for perception and cognition. Our results demonstrate how the brain combines information in the face of uncertainty about the underlying causal structure of the world. PMID:25710328
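A minimal numerical sketch of Bayesian Causal Inference in the style used for such audiovisual tasks (Kording et al., 2007): compare the marginal likelihood of a single common source against that of two independent sources. All cue values, noise levels, and the common-cause prior are assumed for illustration:

```python
import numpy as np

s = np.linspace(-30, 30, 2001)                 # candidate source locations (deg)
ds = s[1] - s[0]
prior_s = np.exp(-s**2 / (2 * 10.0**2))
prior_s /= prior_s.sum() * ds                  # spatial prior, integrates to 1

def lik(x, sigma):                             # likelihood of cue x given s
    return np.exp(-(x - s)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

x_a, x_v = 6.0, -2.0                           # noisy auditory / visual cues
sig_a, sig_v, p_common = 4.0, 1.5, 0.5

like_c1 = np.sum(lik(x_a, sig_a) * lik(x_v, sig_v) * prior_s) * ds   # one source
like_c2 = np.sum(lik(x_a, sig_a) * prior_s) * ds \
        * np.sum(lik(x_v, sig_v) * prior_s) * ds                     # two sources
post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))
print("P(common cause | cues) = %.3f" % post_c1)
```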
New Stopping Criteria for Segmenting DNA Sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Wentian
2001-06-18
We propose a solution to the stopping-criterion problem in segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on the Bayesian information criterion in the model selection framework. When this criterion is applied to the telomere of S. cerevisiae and the complete sequence of E. coli, borders of biologically meaningful units were identified, and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength, which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.
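A toy version of BIC-guided segmentation, sketched under the usual multinomial model of base composition (not the paper's exact implementation): accept the best split point only if the two-segment model improves BIC over the one-segment model:

```python
import numpy as np
from math import log

def loglik(counts):
    # Maximized multinomial log-likelihood for one segment.
    n = counts.sum()
    nz = counts[counts > 0]
    return float((nz * np.log(nz / n)).sum())

def best_split_bic(seq):
    """Return (position, delta_BIC) for the best split of `seq` into two
    segments; segment only if delta_BIC > 0 (the stopping criterion)."""
    alphabet = sorted(set(seq))
    idx = {c: i for i, c in enumerate(alphabet)}
    x = np.array([idx[c] for c in seq])
    n, k = len(x), len(alphabet)
    total = np.bincount(x, minlength=k)
    ll0 = loglik(total)
    left = np.zeros(k, dtype=int)
    best = (None, -np.inf)
    for pos in range(1, n):
        left[x[pos - 1]] += 1
        ll1 = loglik(left) + loglik(total - left)
        # Two-segment model adds k-1 frequency parameters + 1 breakpoint.
        delta_bic = 2 * (ll1 - ll0) - k * log(n)
        if delta_bic > best[1]:
            best = (pos, delta_bic)
    return best

seq = "AAAAAAATATAAAT" + "GCGCGGGCGCGGGC"   # toy inhomogeneous sequence
print(best_split_bic(seq))                  # splits near the composition change
```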
Klopfenstein, Ned B; Stewart, Jane E; Ota, Yuko; Hanna, John W; Richardson, Bryce A; Ross-Davis, Amy L; Elías-Román, Rubén D; Korhonen, Kari; Keča, Nenad; Iturritxa, Eugenia; Alvarado-Rosales, Dionicio; Solheim, Halvor; Brazee, Nicholas J; Łakomy, Piotr; Cleary, Michelle R; Hasegawa, Eri; Kikuchi, Taisei; Garza-Ocañas, Fortunato; Tsopelas, Panaghiotis; Rigling, Daniel; Prospero, Simone; Tsykun, Tetyana; Bérubé, Jean A; Stefani, Franck O P; Jafarpour, Saeideh; Antonín, Vladimír; Tomšovský, Michal; McDonald, Geral I; Woodward, Stephen; Kim, Mee-Sook
2017-01-01
Armillaria possesses several intriguing characteristics that have inspired wide interest in understanding phylogenetic relationships within and among species of this genus. Nuclear ribosomal DNA sequence-based analyses of Armillaria provide only limited information for phylogenetic studies among widely divergent taxa. More recent studies have shown that translation elongation factor 1-α (tef1) sequences are highly informative for phylogenetic analysis of Armillaria species within diverse global regions. This study used Neighbor-net and coalescence-based Bayesian analyses to examine phylogenetic relationships of newly determined and existing tef1 sequences derived from diverse Armillaria species from across the Northern Hemisphere, with Southern Hemisphere Armillaria species included for reference. Based on the Bayesian analysis of tef1 sequences, Armillaria species from the Northern Hemisphere are generally contained within the following four superclades, which are named according to the specific epithet of the most frequently cited species within the superclade: (i) Socialis/Tabescens (exannulate) superclade including Eurasian A. ectypa, North American A. socialis (A. tabescens), and Eurasian A. socialis (A. tabescens) clades; (ii) Mellea superclade including undescribed annulate North American Armillaria sp. (Mexico) and four separate clades of A. mellea (Europe and Iran, eastern Asia, and two groups from North America); (iii) Gallica superclade including Armillaria Nag E (Japan), multiple clades of A. gallica (Asia and Europe), A. calvescens (eastern North America), A. cepistipes (North America), A. altimontana (western USA), A. nabsnona (North America and Japan), and at least two A. gallica clades (North America); and (iv) Solidipes/Ostoyae superclade including two A. solidipes/ostoyae clades (North America), A. gemina (eastern USA), A. solidipes/ostoyae (Eurasia), A. cepistipes (Europe and Japan), A. sinapina (North America and Japan), and A. borealis (Eurasia) clade 2. Of note is that A. borealis (Eurasia) clade 1 appears basal to the Solidipes/Ostoyae and Gallica superclades. The Neighbor-net analysis showed similar phylogenetic relationships. This study further demonstrates the utility of tef1 for global phylogenetic studies of Armillaria species and provides critical insights into multiple taxonomic issues that warrant further study.
A Systematic Bayesian Integration of Epidemiological and Genetic Data
Lau, Max S. Y.; Marion, Glenn; Streftaris, George; Gibson, Gavin
2015-01-01
Genetic sequence data on pathogens have great potential to inform inference of their transmission dynamics ultimately leading to better disease control. Where genetic change and disease transmission occur on comparable timescales additional information can be inferred via the joint analysis of such genetic sequence data and epidemiological observations based on clinical symptoms and diagnostic tests. Although recently introduced approaches represent substantial progress, for computational reasons they approximate genuine joint inference of disease dynamics and genetic change in the pathogen population, capturing partially the joint epidemiological-evolutionary dynamics. Improved methods are needed to fully integrate such genetic data with epidemiological observations, for achieving a more robust inference of the transmission tree and other key epidemiological parameters such as latent periods. Here, building on current literature, a novel Bayesian framework is proposed that infers simultaneously and explicitly the transmission tree and unobserved transmitted pathogen sequences. Our framework facilitates the use of realistic likelihood functions and enables systematic and genuine joint inference of the epidemiological-evolutionary process from partially observed outbreaks. Using simulated data it is shown that this approach is able to infer accurately joint epidemiological-evolutionary dynamics, even when pathogen sequences and epidemiological data are incomplete, and when sequences are available for only a fraction of exposures. These results also characterise and quantify the value of incomplete and partial sequence data, which has important implications for sampling design, and demonstrate the abilities of the introduced method to identify multiple clusters within an outbreak. The framework is used to analyse an outbreak of foot-and-mouth disease in the UK, enhancing current understanding of its transmission dynamics and evolutionary process. PMID:26599399
Higher-level phylogeny of paraneopteran insects inferred from mitochondrial genome sequences
Li, Hu; Shao, Renfu; Song, Nan; Song, Fan; Jiang, Pei; Li, Zhihong; Cai, Wanzhi
2015-01-01
Mitochondrial (mt) genome data have been proven to be informative for animal phylogenetic studies but may also suffer from systematic errors, due to the effects of accelerated substitution rate and compositional heterogeneity. We analyzed the mt genomes of 25 insect species from the four paraneopteran orders, aiming to better understand how accelerated substitution rate and compositional heterogeneity affect the inferences of the higher-level phylogeny of this diverse group of hemimetabolous insects. We found substantial heterogeneity in base composition and contrasting rates in nucleotide substitution among these paraneopteran insects, which complicate the inference of higher-level phylogeny. The phylogenies inferred with concatenated sequences of mt genes using maximum likelihood and Bayesian methods and homogeneous models failed to recover Psocodea and Hemiptera as monophyletic groups but grouped, instead, the taxa that had accelerated substitution rates together, including Sternorrhyncha (a suborder of Hemiptera), Thysanoptera, Phthiraptera and Liposcelididae (a family of Psocoptera). Bayesian inference with nucleotide sequences and heterogeneous models (CAT and CAT + GTR), however, recovered Psocodea, Thysanoptera and Hemiptera each as a monophyletic group. Within Psocodea, Liposcelididae is more closely related to Phthiraptera than to other species of Psocoptera. Furthermore, Thysanoptera was recovered as the sister group to Hemiptera. PMID:25704094
Chen, Zhijian; Craiu, Radu V; Bull, Shelley B
2014-11-01
In focused studies designed to follow up associations detected in a genome-wide association study (GWAS), investigators can proceed to fine-map a genomic region by targeted sequencing or dense genotyping of all variants in the region, aiming to identify a functional sequence variant. For the analysis of a quantitative trait, we consider a Bayesian approach to fine-mapping study design that incorporates stratification according to a promising GWAS tag SNP in the same region. Improved cost-efficiency can be achieved when the fine-mapping phase incorporates a two-stage design, with identification of a smaller set of more promising variants in a subsample taken in stage 1, followed by their evaluation in an independent stage 2 subsample. To avoid the potential negative impact of genetic model misspecification on inference, we incorporate genetic model selection based on posterior probabilities for each competing model. Our simulation study shows that, compared to simple random sampling that ignores genetic information from GWAS, tag-SNP-based stratified sample allocation methods reduce the number of variants continuing to stage 2 and are more likely to promote the functional sequence variant into confirmation studies. © 2014 WILEY PERIODICALS, INC.
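The genetic model selection step described above can be sketched with the standard large-sample shortcut that converts per-model BICs into approximate posterior model probabilities; the BIC values below are hypothetical:

```python
import numpy as np

def model_posteriors(bics, prior=None):
    """Approximate posterior model probabilities from per-model BICs,
    P(M | data) ~ exp(-BIC/2) * P(M), a standard large-sample shortcut."""
    bics = np.asarray(bics, dtype=float)
    prior = np.ones_like(bics) / bics.size if prior is None else np.asarray(prior)
    w = np.exp(-(bics - bics.min()) / 2) * prior   # shift BICs for stability
    return w / w.sum()

# Hypothetical BICs for additive, dominant, and recessive genetic models:
print(model_posteriors([1012.4, 1015.9, 1020.3]).round(3))
```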
Nonparametric Bayesian clustering to detect bipolar methylated genomic loci.
Wu, Xiaowei; Sun, Ming-An; Zhu, Hongxiao; Xie, Hehuang
2015-01-16
With recent developments in sequencing technology, a large number of genome-wide DNA methylation studies have generated massive amounts of bisulfite sequencing data. The analysis of DNA methylation patterns helps researchers understand epigenetic regulatory mechanisms. Highly variable methylation patterns reflect stochastic fluctuations in DNA methylation, whereas well-structured methylation patterns imply deterministic methylation events. Among these methylation patterns, bipolar patterns are important as they may originate from allele-specific methylation (ASM) or cell-specific methylation (CSM). Utilizing nonparametric Bayesian clustering followed by hypothesis testing, we have developed a novel statistical approach to identify bipolar methylated genomic regions in bisulfite sequencing data. Simulation studies demonstrate that the proposed method achieves good performance in terms of specificity and sensitivity. We used the method to analyze data from mouse brain and human blood methylomes. The bipolar methylated segments detected are highly consistent with the differentially methylated regions identified by using purified cell subsets. Bipolar DNA methylation often indicates epigenetic heterogeneity caused by ASM or CSM. With allele-specific events filtered out or appropriately taken into account, our proposed approach sheds light on the identification of cell-specific genes/pathways under strong epigenetic control in a heterogeneous cell population.
Kalil, Andre C; Sun, Junfeng
2014-10-01
To review Bayesian methodology and its utility to clinical decision making and research in the critical care field. Clinical, epidemiological, and biostatistical studies on Bayesian methods in PubMed and Embase from their inception to December 2013. Bayesian methods have been extensively used by a wide range of scientific fields, including astronomy, engineering, chemistry, genetics, physics, geology, paleontology, climatology, cryptography, linguistics, ecology, and computational sciences. The application of medical knowledge in clinical research is analogous to the application of medical knowledge in clinical practice. Bedside physicians have to make most diagnostic and treatment decisions on critically ill patients every day without clear-cut evidence-based medicine (more subjective than objective evidence). Similarly, clinical researchers have to make most decisions about trial design with limited available data. Bayesian methodology allows both subjective and objective aspects of knowledge to be formally measured and transparently incorporated into the design, execution, and interpretation of clinical trials. In addition, various degrees of knowledge and several hypotheses can be tested at the same time in a single clinical trial without the risk of multiplicity. Notably, the Bayesian technology is naturally suited for the interpretation of clinical trial findings for the individualized care of critically ill patients and for the optimization of public health policies. We propose that the application of the versatile Bayesian methodology in conjunction with conventional statistical methods is not only ripe for actual use in critical care clinical research but is also a necessary step to maximize the performance of clinical trials and their translation to the practice of critical care medicine.
Viral Linkage in HIV-1 Seroconverters and Their Partners in an HIV-1 Prevention Clinical Trial
Campbell, Mary S.; Mullins, James I.; Hughes, James P.; Celum, Connie; Wong, Kim G.; Raugi, Dana N.; Sorensen, Stefanie; Stoddard, Julia N.; Zhao, Hong; Deng, Wenjie; Kahle, Erin; Panteleeff, Dana; Baeten, Jared M.; McCutchan, Francine E.; Albert, Jan; Leitner, Thomas; Wald, Anna; Corey, Lawrence; Lingappa, Jairam R.
2011-01-01
Background Characterization of viruses in HIV-1 transmission pairs will help identify biological determinants of infectiousness and evaluate candidate interventions to reduce transmission. Although HIV-1 sequencing is frequently used to substantiate linkage between newly HIV-1 infected individuals and their sexual partners in epidemiologic and forensic studies, viral sequencing is seldom applied in HIV-1 prevention trials. The Partners in Prevention HSV/HIV Transmission Study (ClinicalTrials.gov #NCT00194519) was a prospective randomized placebo-controlled trial that enrolled serodiscordant heterosexual couples to determine the efficacy of genital herpes suppression in reducing HIV-1 transmission; as part of the study analysis, HIV-1 sequences were examined for genetic linkage between seroconverters and their enrolled partners. Methodology/Principal Findings We obtained partial consensus HIV-1 env and gag sequences from blood plasma for 151 transmission pairs and performed deep sequencing of env in some cases. We analyzed sequences with phylogenetic techniques and developed a Bayesian algorithm to evaluate the probability of linkage. For linkage, we required monophyletic clustering between enrolled partners' sequences and a Bayesian posterior probability of ≥50%. Adjudicators classified each seroconversion, finding 108 (71.5%) linked, 40 (26.5%) unlinked, and 3 (2.0%) indeterminate transmissions, with linkage determined by consensus env sequencing in 91 (84%). Male seroconverters had a higher frequency of unlinked transmissions than female seroconverters. The likelihood of transmission from the enrolled partner was related to time on study, with increasing numbers of unlinked transmissions occurring after longer observation periods. Finally, baseline viral load was found to be significantly higher among linked transmitters. Conclusions/Significance In this first use of HIV-1 sequencing to establish endpoints in a large clinical trial, more than one-fourth of transmissions were unlinked to the enrolled partner, illustrating the relevance of these methods in the design of future HIV-1 prevention trials in serodiscordant couples. A hierarchy of sequencing techniques, analysis methods, and expert adjudication contributed to the linkage determination process. PMID:21399681
How Much Can We Learn from a Single Chromatographic Experiment? A Bayesian Perspective.
Wiczling, Paweł; Kaliszan, Roman
2016-01-05
In this work, we proposed and investigated a Bayesian inference procedure to find the desired chromatographic conditions based on known analyte properties (lipophilicity, pKa, and polar surface area) using one preliminary experiment. A previously developed nonlinear mixed effect model was used to specify the prior information about a new analyte with known physicochemical properties. Further, the prior (no preliminary data) and posterior predictive distribution (prior + one experiment) were determined sequentially to search towards the desired separation. The following isocratic high-performance reversed-phase liquid chromatographic conditions were sought: (1) retention time of a single analyte within the range of 4-6 min and (2) baseline separation of two analytes with retention times within the range of 4-10 min. The empirical posterior Bayesian distribution of parameters was estimated using the "slice sampling" Markov Chain Monte Carlo (MCMC) algorithm implemented in Matlab. The simulations with artificial analytes and experimental data of ketoprofen and papaverine were used to test the proposed methodology. The simulation experiment showed that for a single analyte and for two randomly selected analytes, there is a 97% and 74% probability, respectively, of obtaining a successful chromatogram using no or one preliminary experiment. The desired separation for ketoprofen and papaverine was established based on a single experiment. It was confirmed that the search for a desired separation rarely requires a large number of chromatographic analyses, at least for a simple optimization problem. The proposed Bayesian-based optimization scheme is a powerful method of finding a desired chromatographic separation based on a small number of preliminary experiments.
A Bayesian framework for extracting human gait using strong prior knowledge.
Zhou, Ziheng; Prügel-Bennett, Adam; Damper, Robert I
2006-11-01
Extracting full-body motion of walking people from monocular video sequences in complex, real-world environments is an important and difficult problem, going beyond simple tracking, whose satisfactory solution demands an appropriate balance between use of prior knowledge and learning from data. We propose a consistent Bayesian framework for introducing strong prior knowledge into a system for extracting human gait. In this work, the strong prior is built from a simple articulated model having both time-invariant (static) and time-variant (dynamic) parameters. The model is easily modified to cater to situations such as walkers wearing clothing that obscures the limbs. The statistics of the parameters are learned from high-quality (indoor laboratory) data and the Bayesian framework then allows us to "bootstrap" to accurate gait extraction on the noisy images typical of cluttered, outdoor scenes. To achieve automatic fitting, we use a hidden Markov model to detect the phases of images in a walking cycle. We demonstrate our approach on silhouettes extracted from fronto-parallel ("sideways on") sequences of walkers under both high-quality indoor and noisy outdoor conditions. As well as high-quality data with synthetic noise and occlusions added, we also test walkers with rucksacks, skirts, and trench coats. Results are quantified in terms of chamfer distance and average pixel error between automatically extracted body points and corresponding hand-labeled points. No one part of the system is novel in itself, but the overall framework makes it feasible to extract gait from very much poorer quality image sequences than hitherto. This is confirmed by comparing person identification by gait using our method and a well-established baseline recognition algorithm.
Wang, Xulong; Philip, Vivek M.; Ananda, Guruprasad; White, Charles C.; Malhotra, Ankit; Michalski, Paul J.; Karuturi, Krishna R. Murthy; Chintalapudi, Sumana R.; Acklin, Casey; Sasner, Michael; Bennett, David A.; De Jager, Philip L.; Howell, Gareth R.; Carter, Gregory W.
2018-01-01
Recent technical and methodological advances have greatly enhanced genome-wide association studies (GWAS). The advent of low-cost, whole-genome sequencing facilitates high-resolution variant identification, and the development of linear mixed models (LMM) allows improved identification of putatively causal variants. While essential for correcting false positive associations due to sample relatedness and population stratification, LMMs have commonly been restricted to quantitative variables. However, phenotypic traits in association studies are often categorical, coded as binary case-control or ordered variables describing disease stages. To address these issues, we have devised a method for genomic association studies that implements a generalized LMM (GLMM) in a Bayesian framework, called Bayes-GLMM. Bayes-GLMM has four major features: (1) support of categorical, binary, and quantitative variables; (2) cohesive integration of previous GWAS results for related traits; (3) correction for sample relatedness by mixed modeling; and (4) model estimation by both Markov chain Monte Carlo sampling and maximal likelihood estimation. We applied Bayes-GLMM to the whole-genome sequencing cohort of the Alzheimer’s Disease Sequencing Project. This study contains 570 individuals from 111 families, each with Alzheimer’s disease diagnosed at one of four confidence levels. Using Bayes-GLMM we identified four variants in three loci significantly associated with Alzheimer’s disease. Two variants, rs140233081 and rs149372995, lie between PRKAR1B and PDGFA. The coded proteins are localized to the glial-vascular unit, and PDGFA transcript levels are associated with Alzheimer’s disease-related neuropathology. In summary, this work provides implementation of a flexible, generalized mixed-model approach in a Bayesian framework for association studies. PMID:29507048
Emerging Concepts of Data Integration in Pathogen Phylodynamics
Baele, Guy; Suchard, Marc A.; Rambaut, Andrew; Lemey, Philippe
2017-01-01
Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.] PMID:28173504
Cavagnaro, Daniel R; Myung, Jay I; Pitt, Mark A; Kujala, Janne V
2010-04-01
Discriminating among competing statistical models is a pressing issue for many experimentalists in the field of cognitive science. Resolving this issue begins with designing maximally informative experiments. To this end, the problem to be solved in adaptive design optimization is identifying experimental designs under which one can infer the underlying model in the fewest possible steps. When the models under consideration are nonlinear, as is often the case in cognitive science, this problem can be impossible to solve analytically without simplifying assumptions. However, as we show in this letter, a full solution can be found numerically with the help of a Bayesian computational trick derived from the statistics literature, which recasts the problem as a probability density simulation in which the optimal design is the mode of the density. We use a utility function based on mutual information and give three intuitive interpretations of the utility function in terms of Bayesian posterior estimates. As a proof of concept, we offer a simple example application to an experiment on memory retention.
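The mutual-information utility mentioned above can be computed directly for small discrete problems. A sketch for a binary-outcome experiment with two competing models (all probabilities are assumed): the design with the highest I(model; outcome) is the most informative one to run next.

```python
import numpy as np

def mutual_information(model_probs, prior):
    """I(model; outcome) for a binary-outcome experiment, where
    model_probs[m] is P(y=1 | model m, design) and prior[m] = P(model m)."""
    p = np.asarray(model_probs, dtype=float)
    prior = np.asarray(prior, dtype=float)
    mi = 0.0
    for py in (p, 1 - p):                      # outcomes y=1 and y=0
        marg = (prior * py).sum()              # P(y | design)
        nz = py > 0
        mi += (prior[nz] * py[nz] * np.log(py[nz] / marg)).sum()
    return mi

prior = [0.5, 0.5]   # two competing memory-retention models, equal prior
# P(recall) predicted by each model at three candidate lags (assumed numbers):
designs = {"lag=1": (0.90, 0.85), "lag=5": (0.60, 0.35), "lag=20": (0.20, 0.15)}
for name, probs in designs.items():
    print(name, "utility = %.4f" % mutual_information(probs, prior))
```

Here the lag=5 condition, where the two models' predictions diverge most, earns the highest utility.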
The Long Exercise Test in Periodic Paralysis: A Bayesian Analysis.
Simmons, Daniel B; Lanning, Julie; Cleland, James C; Puwanant, Araya; Twydell, Paul T; Griggs, Robert C; Tawil, Rabi; Logigian, Eric L
2018-05-12
The long exercise test (LET) is used to assess the diagnosis of periodic paralysis (PP), but LET methodology and normal "cut-off" values vary. To determine optimal LET methodology and cut-offs, we reviewed LET data (abductor digiti minimi (ADM) motor response amplitude, area) from 55 PP patients (32 genetically definite) and 125 controls. Receiver operating characteristic (ROC) curves were constructed and area-under-the-curve (AUC) calculated to compare 1) peak-to-nadir versus baseline-to-nadir methodologies, and 2) amplitude versus area decrements. Using Bayesian principles, optimal "cut-off" decrements that achieved 95% post-test probability of PP were calculated for various pre-test probabilities (PreTPs). AUC was highest for peak-to-nadir methodology and equal for amplitude and area decrements. For PreTP ≤50%, optimal decrement cut-offs (peak-to-nadir) were >40% (amplitude) or >50% (area). For confirmation of PP, our data endorse the diagnostic utility of peak-to-nadir LET methodology using 40% amplitude or 50% area decrement cut-offs for PreTPs ≤50%. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.
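The Bayesian step described above is ordinary diagnostic updating in odds form. A sketch, with sensitivity and specificity as hypothetical stand-ins for an LET operating point read off the ROC curve:

```python
def post_test_probability(pre_test_p, sensitivity, specificity):
    """Bayes' rule in odds form for a positive test result:
    posterior odds = prior odds * LR+, with LR+ = sens / (1 - spec)."""
    lr_pos = sensitivity / (1.0 - specificity)
    pre_odds = pre_test_p / (1.0 - pre_test_p)
    post_odds = pre_odds * lr_pos
    return post_odds / (1.0 + post_odds)

# Hypothetical operating point: 50% pre-test probability of periodic paralysis,
# 70% sensitivity, 98% specificity -> post-test probability ~0.972.
print(round(post_test_probability(0.50, 0.70, 0.98), 3))
```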
Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng
2017-05-10
Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to limitations of the existing methods, which either lack solid probabilistic criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as a proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
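A schematic of similarity-weighted taxonomic voting in the spirit of BLCA (the real tool weights hits by alignment-based posterior probabilities and attaches bootstrap confidence scores; the hits below are hypothetical):

```python
from collections import defaultdict

hits = [  # (taxonomy path, percent identity of the database hit to the query)
    (("Firmicutes", "Bacilli", "Lactobacillus", "L. gasseri"),   99.2),
    (("Firmicutes", "Bacilli", "Lactobacillus", "L. johnsonii"), 98.7),
    (("Firmicutes", "Bacilli", "Lactobacillus", "L. gasseri"),   97.9),
]
ranks = ("phylum", "class", "genus", "species")

for level, rank in enumerate(ranks):
    votes = defaultdict(float)
    for taxonomy, identity in hits:
        votes[taxonomy[level]] += identity      # weight each vote by similarity
    name, w = max(votes.items(), key=lambda kv: kv[1])
    print(f"{rank}: {name} ({100 * w / sum(votes.values()):.1f}% support)")
```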
Partial Planning Reinforcement Learning
2012-08-31
Tadepalli, Prasad; Fern, Alan (Oregon State University)
Subject terms: Reinforcement Learning, Bayesian Optimization, Active Learning, Action Model Learning, Decision Theoretic Assistance
Becker, Michael P I; Nitsch, Alexander M; Hewig, Johannes; Miltner, Wolfgang H R; Straube, Thomas
2016-12-01
Several regions of the frontal cortex interact with striatal and amygdala regions to mediate the evaluation of reward-related information and subsequent adjustment of response choices. Recent theories discuss the particular relevance of dorsal anterior cingulate cortex (dACC) for switching behavior; subsequently, ventromedial prefrontal cortex (VMPFC) is involved in mediating exploitative behaviors by tracking reward values unfolding after the behavioral switch. Amygdala, on the other hand, has been implicated in coding the valence of stimulus-outcome associations, and the ventral striatum (VS) has consistently been shown to code a reward prediction error (RPE). Here, we used fMRI data acquired in humans during a reversal task to parametrically model different sequences of positive feedback in order to unravel differential contributions of these brain regions to the tracking and exploitation of rewards. Parameters from an Optimal Bayesian Learner accurately predicted the divergent involvement of dACC and VMPFC during feedback processing: dACC signaled the first, but not later, presentations of positive feedback, while VMPFC coded trial-by-trial accumulations in reward value. Our results confirm that dACC carries a prominent confirmatory signal during processing of first positive feedback. Amygdala coded positive feedbacks more uniformly, while striatal regions were associated with RPE. Copyright © 2016 Elsevier Inc. All rights reserved.
Gaussian process surrogates for failure detection: A Bayesian experimental design approach
NASA Astrophysics Data System (ADS)
Wang, Hongqiao; Lin, Guang; Li, Jinglai
2016-05-01
An important task of uncertainty quantification is to identify the probability of undesired events, in particular, system failures, caused by various sources of uncertainties. In this work we consider the construction of Gaussian process surrogates for failure detection and failure probability estimation. In particular, we consider the situation that the underlying computer models are extremely expensive, and in this setting, determining the sampling points in the state space is of essential importance. We formulate the problem as an optimal experimental design for Bayesian inferences of the limit state (i.e., the failure boundary) and propose an efficient numerical scheme to solve the resulting optimization problem. In particular, the proposed limit-state inference method is capable of determining multiple sampling points at a time, and thus it is well suited for problems where multiple computer simulations can be performed in parallel. The accuracy and performance of the proposed method are demonstrated by both academic and practical examples.
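A single-point sketch of the idea (the paper's method selects multiple points per stage): fit a Gaussian process surrogate, then sample where the sign of the limit-state function is most ambiguous, a common acquisition criterion known as the "U" function in reliability analysis. The toy model, kernel, and budget are all assumptions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy "expensive" model g(x); failure is g(x) > 0, limit state is g(x) = 0.
def g(x):
    return np.sin(3 * x) + 0.5 * x - 0.2

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, 5).reshape(-1, 1)       # small initial design
y = g(X).ravel()
cand = np.linspace(-2, 2, 401).reshape(-1, 1)  # candidate sampling points

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X, y)
    mu, sd = gp.predict(cand, return_std=True)
    score = np.abs(mu) / np.maximum(sd, 1e-9)  # "U": sign ambiguity of g
    x_new = cand[np.argmin(score)]
    X = np.vstack([X, x_new])
    y = np.append(y, g(x_new))

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X, y)
p_fail = float((gp.predict(cand) > 0).mean())  # failure mass under uniform input
print("estimated P(failure) on [-2, 2]: %.3f" % p_fail)
```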
NASA Astrophysics Data System (ADS)
Pang, Guofei; Perdikaris, Paris; Cai, Wei; Karniadakis, George Em
2017-11-01
The fractional advection-dispersion equation (FADE) can accurately describe solute transport in groundwater, but its fractional order has to be determined a priori. Here, we employ multi-fidelity Bayesian optimization to obtain the fractional order under various conditions, and we obtain more accurate results compared to previously published data. Moreover, the present method is very efficient, as we use different levels of resolution to construct a stochastic surrogate model and quantify its uncertainty. We consider two different problem setups. In the first setup, we obtain variable fractional orders of the one-dimensional FADE, considering both synthetic and field data. In the second setup, we identify constant fractional orders of the two-dimensional FADE using synthetic data. We employ multi-resolution simulations using two-level and three-level Gaussian process regression models to construct the surrogates.
Goal-oriented Site Characterization in Hydrogeological Applications: An Overview
NASA Astrophysics Data System (ADS)
Nowak, W.; de Barros, F.; Rubin, Y.
2011-12-01
In this study, we address the importance of goal-oriented site characterization. Given the multiple sources of uncertainty in hydrogeological applications, the information needs of modeling, prediction and decision support should be satisfied with efficient and rational field campaigns. In this work, we provide an overview of an optimal sampling design framework based on Bayesian decision theory, statistical parameter inference and Bayesian model averaging. It optimizes the field sampling campaign around decisions on environmental performance metrics (e.g., risk or arrival times) while accounting for parametric and model uncertainty in the geostatistical characterization, in forcing terms, and for measurement error. The appealing aspects of the framework lie in its goal-oriented character and in its direct link to the confidence in a specified decision. We illustrate how these concepts can be applied in a human health risk problem where uncertainty from both hydrogeological and health parameters is accounted for.
Learning Collaborative Sparse Representation for Grayscale-Thermal Tracking.
Li, Chenglong; Cheng, Hui; Hu, Shiyi; Liu, Xiaobai; Tang, Jin; Lin, Liang
2016-09-27
Integrating multiple different yet complementary feature representations has been proven to be an effective way of boosting tracking performance. This paper investigates how to perform robust object tracking in challenging scenarios by adaptively incorporating information from grayscale and thermal videos, and proposes a novel collaborative algorithm for online tracking. In particular, an adaptive fusion scheme is proposed based on collaborative sparse representation in a Bayesian filtering framework. We jointly optimize sparse codes and the reliability weights of different modalities in an online way. In addition, this work contributes a comprehensive video benchmark, which includes 50 grayscale-thermal sequences and their ground truth annotations for tracking purposes. The videos are highly diverse, and the annotations were completed by a single person to guarantee consistency. Extensive experiments against other state-of-the-art trackers with both grayscale and grayscale-thermal inputs demonstrate the effectiveness of the proposed tracking approach. Through analyzing quantitative results, we also provide basic insights and potential future research directions in grayscale-thermal tracking.
Jaworska, Joanna; Harol, Artsiom; Kern, Petra S; Gerberick, G Frank
2011-01-01
There is an urgent need to develop data integration and testing strategy frameworks allowing interpretation of results from animal alternative test batteries. To this end, we developed a Bayesian Network Integrated Testing Strategy (BN ITS) with the goal to estimate skin sensitization hazard as a test case of previously developed concepts (Jaworska et al., 2010). The BN ITS combines in silico, in chemico, and in vitro data related to skin penetration, peptide reactivity, and dendritic cell activation, and guides testing strategy by Value of Information (VoI). The approach offers novel insights into testing strategies: there is no one best testing strategy, but the optimal sequence of tests depends on information at hand, and is chemical-specific. Thus, a single generic set of tests as a replacement strategy is unlikely to be most effective. BN ITS offers the possibility of evaluating the impact of generating additional data on the target information uncertainty reduction before testing is commenced.
Canedo, Clarissa; Haddad, Célio F B
2012-11-01
We present a phylogenetic hypothesis of the anuran clade Terrarana based on partial sequences of nuclear (Tyr and RAG1) and mitochondrial (12S, tRNA-Val, and 16S) genes, testing the monophyly of Ischnocnema and its species series. We performed maximum parsimony, maximum likelihood, and Bayesian inference analyses on 364 terminals: 11 outgroup terminals and 353 ingroup Terrarana terminals, including 139 Ischnocnema terminals (accounting for 29 of the 35 named Ischnocnema species) and 214 other Terrarana terminals within the families Brachycephalidae, Ceuthomantidae, Craugastoridae, and Eleutherodactylidae. Different optimality criteria produced similar results and mostly recovered the currently accepted families and genera. According to these topologies, Ischnocnema is not a monophyletic group. We propose new combinations for three species, relocating them to Pristimantis, and assign Eleutherodactylus bilineatus Bokermann, 1975 incertae sedis status within Holoadeninae. The rearrangements in Ischnocnema place it outside the northernmost Brazilian Atlantic rainforest, where the fauna of Terrarana comprises typical Amazonian genera. Copyright © 2012 Elsevier Inc. All rights reserved.
Fuzzy Naive Bayesian model for medical diagnostic decision support.
Wagholikar, Kavishwar B; Vijayraghavan, Sundararajan; Deshpande, Ashok W
2009-01-01
This work relates to the development of computational algorithms to provide decision support to physicians. The authors propose a Fuzzy Naive Bayesian (FNB) model for medical diagnosis, which extends the Fuzzy Bayesian approach proposed by Okuda. A physician-interview-based method is described to define an orthogonal fuzzy symptom information system, required to apply the model. For the purpose of elaboration and elicitation of characteristics, the algorithm is applied to a simple simulated dataset and compared with the conventional Naive Bayes (NB) approach. As a preliminary evaluation of FNB in a real-world scenario, the comparison is repeated on a real fuzzy dataset of 81 patients diagnosed with infectious diseases. The case study on the simulated dataset elucidates that FNB can outperform NB when diagnosing patients with imprecise, fuzzy information, on account of the following characteristics: (1) it can model the information that values of some attributes are semantically closer than values of other attributes, and (2) it offers a mechanism to temper exaggerations in patient information. Although the algorithm requires precise training data, its utility for fuzzy training data is argued for. This is supported by the case study on the infectious disease dataset, which indicates the superiority of FNB over NB for the infectious disease domain. Further case studies on large datasets are required to establish the utility of FNB.
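An illustrative (not Okuda's exact) fuzzy Naive Bayes computation: each symptom is reported as a fuzzy membership vector over its levels, and the class likelihood averages the conditional probabilities under those memberships. All numbers are invented for demonstration:

```python
import numpy as np

priors = {"flu": 0.6, "dengue": 0.4}
# P(symptom level | disease), levels = (none, mild, high):
p_fever = {"flu": (0.1, 0.5, 0.4), "dengue": (0.05, 0.25, 0.7)}
p_ache  = {"flu": (0.2, 0.6, 0.2), "dengue": (0.1, 0.3, 0.6)}

# Patient reports "fairly high fever, moderate ache" as fuzzy memberships:
obs_fever = (0.0, 0.3, 0.7)
obs_ache  = (0.1, 0.6, 0.3)

post = {}
for d, prior in priors.items():
    # Membership-weighted likelihood per symptom, multiplied naively:
    like = np.dot(obs_fever, p_fever[d]) * np.dot(obs_ache, p_ache[d])
    post[d] = prior * like
z = sum(post.values())
print({d: round(v / z, 3) for d, v in post.items()})
```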
Robust Bayesian Experimental Design for Conceptual Model Discrimination
NASA Astrophysics Data System (ADS)
Pham, H. V.; Tsai, F. T. C.
2015-12-01
A robust Bayesian optimal experimental design under uncertainty is presented to provide firm information for model discrimination using the smallest number of pumping and observation wells. Firm information is the maximum information about a system that can be guaranteed from an experimental design. The design is based on the Box-Hill expected entropy decrease (EED) before and after the experiment and on the Bayesian model averaging (BMA) framework. A max-min program is introduced to choose the robust design that maximizes the minimal Box-Hill EED, subject to the constraint that the highest expected posterior model probability satisfies a desired probability threshold. The EED is calculated by Gauss-Hermite quadrature. The BMA method is used to predict future observations and to quantify future observation uncertainty arising from conceptual and parametric uncertainties when calculating the EED. A Monte Carlo approach is adopted to quantify the uncertainty in the posterior model probabilities. The optimal experimental design is tested on a synthetic five-layer anisotropic confined aquifer. Nine conceptual groundwater models are constructed to reflect uncertain geological architecture and boundary conditions. High-performance computing is used to enumerate all possible design solutions in order to identify the most plausible groundwater model. Results highlight the impacts of scedasticity in future observation data as well as uncertainty sources on potential pumping and observation locations.
Bayesian Monte Carlo and Maximum Likelihood Approach for ...
Model uncertainty estimation and risk assessment are essential to environmental management and informed decision making on pollution mitigation strategies. In this study, we apply a probabilistic methodology that combines Bayesian Monte Carlo simulation and Maximum Likelihood estimation (BMCML) to calibrate a lake oxygen recovery model. We first derive an analytical solution of the differential equation governing lake-averaged oxygen dynamics as a function of time-variable wind speed. Statistical inferences on model parameters and predictive uncertainty are then drawn by Bayesian conditioning of the analytical solution on observed daily wind speed and oxygen concentration data obtained from an earlier study during two recovery periods on a eutrophic lake in upstate New York. The model is calibrated using oxygen recovery data for one year, and statistical inferences were validated using recovery data for another year. Compared with an essentially two-step regression-and-optimization approach, the BMCML results are more comprehensive and perform relatively better in predicting the observed temporal dissolved oxygen (DO) levels in the lake. BMCML also produced calibration and validation results comparable with those obtained using the popular Markov chain Monte Carlo (MCMC) technique, and it is computationally simpler and easier to implement than MCMC. Next, using the calibrated model, we derive an optimal relationship between the liquid film-transfer coefficient
The Dopaminergic Midbrain Encodes the Expected Certainty about Desired Outcomes
Schwartenbeck, Philipp; FitzGerald, Thomas H. B.; Mathys, Christoph; Dolan, Ray; Friston, Karl
2015-01-01
Dopamine plays a key role in learning; however, its exact function in decision making and choice remains unclear. Recently, we proposed a generic model based on active (Bayesian) inference wherein dopamine encodes the precision of beliefs about optimal policies. Put simply, dopamine discharges reflect the confidence that a chosen policy will lead to desired outcomes. We designed a novel task to test this hypothesis, where subjects played a “limited offer” game in a functional magnetic resonance imaging experiment. Subjects had to decide how long to wait for a high offer before accepting a low offer, with the risk of losing everything if they waited too long. Bayesian model comparison showed that behavior strongly supported active inference, based on surprise minimization, over classical utility maximization schemes. Furthermore, midbrain activity, encompassing dopamine projection neurons, was accurately predicted by trial-by-trial variations in model-based estimates of precision. Our findings demonstrate that human subjects infer both optimal policies and the precision of those inferences, and thus support the notion that humans perform hierarchical probabilistic Bayesian inference. In other words, subjects have to infer both what they should do as well as how confident they are in their choices, where confidence may be encoded by dopaminergic firing. PMID:25056572
Phylogenetically marking the limits of the genus Fusarium for post-Article 59 usage
USDA-ARS?s Scientific Manuscript database
Fusarium (Hypocreales, Nectriaceae) is one of the most important and systematically challenging groups of mycotoxigenic, plant pathogenic, and human pathogenic fungi. We conducted maximum likelihood (ML), maximum parsimony (MP) and Bayesian (B) analyses on partial nucleotide sequences of genes encod...
Duputel, Zacharie; Jiang, Junle; Jolivet, Romain; Simons, Mark; Rivera, Luis; Ampuero, Jean-Paul; Riel, Bryan; Owen, Susan E; Moore, Angelyn W; Samsonov, Sergey V; Ortega Culaciati, Francisco; Minson, Sarah E.
2016-01-01
The subduction zone in northern Chile is a well-identified seismic gap that last ruptured in 1877. On 1 April 2014, this region was struck by a large earthquake following a two-week-long series of foreshocks. This study combines a wide range of observations, including geodetic, tsunami, and seismic data, to produce a reliable kinematic slip model of the Mw=8.1 main shock and a static slip model of the Mw=7.7 aftershock. We use a novel Bayesian modeling approach that accounts for uncertainty in the Green's functions, both static and dynamic, while avoiding nonphysical regularization. The results reveal a sharp slip zone, more compact than previously thought, located downdip of the foreshock sequence and updip of high-frequency sources inferred by back-projection analysis. Both the main shock and the Mw=7.7 aftershock did not rupture to the trench and left most of the seismic gap unbroken, leaving open the possibility of a future large earthquake in the region.
Torres-Carvajal, Omar; Schulte, James A; Cadle, John E
2006-04-01
The South American iguanian lizard genus Stenocercus includes 54 species occurring mostly in the Andes and adjacent lowland areas from northern Venezuela and Colombia to central Argentina, at elevations of 0-4000 m. All previous phylogenetic analyses of Stenocercus, a genus long recognized as the sister taxon of the Tropidurus Group, have been characterized by small taxon or character sampling. In this study, we use mtDNA sequence data to perform phylogenetic analyses that include 32 species of Stenocercus and 12 outgroup taxa. Monophyly of this genus is strongly supported by maximum parsimony and Bayesian analyses. Evolutionary relationships within Stenocercus are further analyzed with a Bayesian implementation of a general mixture model, which accommodates variability in the pattern of evolution across sites. These analyses indicate a basal split of Stenocercus into two clades, one of which receives very strong statistical support. In addition, we test previous hypotheses using non-parametric and parametric statistical methods, and provide a phylogenetic classification for Stenocercus.
Mahardika, G N K; Dibia, N; Budayanti, N S; Susilawathi, N M; Subrata, K; Darwinata, A E; Wignall, F S; Richt, J A; Valdivia-Granda, W A; Sudewi, A A R
2014-06-01
The emergence of human and animal rabies in Bali since November 2008 has attracted local, national and international interest. The potential origin and time of introduction of rabies virus to Bali are described here. The nucleoprotein (N) gene of rabies virus from dog brain and human clinical specimens was sequenced using an automated DNA sequencer. Phylogenetic inference with Bayesian Markov Chain Monte Carlo (MCMC) analysis using the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) v. 1.7.5 software confirmed that the outbreak of rabies in Bali was caused by an Indonesian-lineage virus following a single introduction. The ancestor of the Bali viruses was a descendant of a virus from Kalimantan. Contact tracing showed that the introduction most likely occurred in early 2008. The introduction of rabies into a large unvaccinated dog population in Bali clearly demonstrates the risk of disease transmission and should lead government agencies to increased preparedness and sustained risk-reduction efforts to prevent such events in the future.
A Bayesian observer replicates convexity context effects in figure-ground perception.
Goldreich, Daniel; Peterson, Mary A
2012-01-01
Peterson and Salvagio (2008) demonstrated convexity context effects in figure-ground perception. Subjects shown displays consisting of unfamiliar alternating convex and concave regions identified the convex regions as foreground objects progressively more frequently as the number of regions increased; this occurred only when the concave regions were homogeneously colored. The origins of these effects have been unclear. Here, we present a two-free-parameter Bayesian observer that replicates convexity context effects. The Bayesian observer incorporates two plausible expectations regarding three-dimensional scenes: (1) objects tend to be convex rather than concave, and (2) backgrounds tend (more than foreground objects) to be homogeneously colored. The Bayesian observer estimates the probability that a depicted scene is three-dimensional, and that the convex regions are figures. It responds stochastically by sampling from its posterior distributions. Like human observers, the Bayesian observer shows convexity context effects only for images with homogeneously colored concave regions. With optimal parameter settings, it performs similarly to the average human subject on the four display types tested. We propose that object convexity and background color homogeneity are environmental regularities exploited by human visual perception; vision achieves figure-ground perception by interpreting ambiguous images in light of these and other expected regularities in natural scenes.
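To make the computation concrete, here is a minimal Python sketch of a two-hypothesis Bayesian observer in the spirit of the model described above; the prior values, function names, and the exact likelihood factorization are illustrative assumptions, not the paper's fitted parameters:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical values; the paper's two free parameters are fit to human data.
    P_CONVEX_IS_FIGURE = 0.6      # expectation 1: objects tend to be convex
    P_HOM_GIVEN_BACKGROUND = 0.9  # expectation 2: backgrounds tend to be homogeneous
    P_HOM_GIVEN_FIGURE = 0.5      # weaker color expectation for foreground regions

    def posterior_convex_is_figure(n_concave, concave_homogeneous):
        # Treat each concave region's color as independent evidence about
        # whether the concave regions form a single background surface.
        p_bg = P_HOM_GIVEN_BACKGROUND if concave_homogeneous else 1 - P_HOM_GIVEN_BACKGROUND
        like_bg = p_bg ** n_concave                  # convex regions are the figures
        like_fig = P_HOM_GIVEN_FIGURE ** n_concave   # concave regions are the figures
        num = P_CONVEX_IS_FIGURE * like_bg
        return num / (num + (1 - P_CONVEX_IS_FIGURE) * like_fig)

    def respond(n_concave, concave_homogeneous):
        # Stochastic response: sample from the posterior rather than maximize it.
        return rng.random() < posterior_convex_is_figure(n_concave, concave_homogeneous)

With homogeneously colored concave regions, the posterior (and hence the proportion of "convex is figure" responses) grows with the number of regions, reproducing the qualitative context effect.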
Gu, Hairong; Kim, Woojae; Hou, Fang; Lesmes, Luis Andres; Pitt, Mark A; Lu, Zhong-Lin; Myung, Jay I
2016-01-01
Measurement efficiency is of concern when a large number of observations are required to obtain reliable estimates for parametric models of vision. The standard entropy-based Bayesian adaptive testing procedures address this issue by selecting the most informative stimulus in sequential experimental trials; noninformative, diffuse priors are commonly used in those tests. Hierarchical adaptive design optimization (HADO; Kim, Pitt, Lu, Steyvers, & Myung, 2014) further improves the efficiency of the standard Bayesian adaptive testing procedures by constructing an informative prior using data from observers who have already participated in the experiment. The present study represents an empirical validation of HADO in estimating the human contrast sensitivity function. The results show that HADO significantly improves the accuracy and precision of parameter estimates, and therefore requires many fewer observations to obtain reliable inference about contrast sensitivity, compared to the quick contrast sensitivity function method (Lesmes, Lu, Baek, & Albright, 2010), which uses the standard Bayesian procedure. The improvement with HADO was maintained even when the prior was constructed from heterogeneous populations or a relatively small number of observers. The results of this case study support the conclusion that HADO can be used in Bayesian adaptive testing by replacing noninformative, diffuse priors with statistically justified informative priors without introducing unwanted bias.
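The core of such entropy-based adaptive testing is choosing, trial by trial, the stimulus that maximizes the expected reduction in posterior entropy. A minimal Python sketch over a one-parameter psychometric model follows; the grids, the Weibull-like function, and the diffuse starting prior are illustrative assumptions (HADO's contribution is to replace that diffuse prior with an informative, hierarchically constructed one):

    import numpy as np

    thetas = np.linspace(0.1, 1.0, 50)    # grid over a threshold parameter
    stimuli = np.linspace(0.05, 1.2, 40)  # candidate stimulus intensities
    prior = np.full(thetas.size, 1.0 / thetas.size)  # diffuse starting prior

    def p_correct(theta, x):
        # Illustrative Weibull-like psychometric function.
        return 1.0 - 0.5 * np.exp(-(x / theta) ** 3)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def next_stimulus(prior):
        # Pick the stimulus with maximal expected posterior-entropy reduction.
        gains = []
        for x in stimuli:
            lik = p_correct(thetas, x)       # P(correct | theta, x) on the grid
            m = np.sum(prior * lik)          # predictive probability of "correct"
            post_c = prior * lik / m
            post_w = prior * (1 - lik) / (1 - m)
            expected_H = m * entropy(post_c) + (1 - m) * entropy(post_w)
            gains.append(entropy(prior) - expected_H)
        return stimuli[int(np.argmax(gains))]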
Distributed multisensory integration in a recurrent network model through supervised learning
NASA Astrophysics Data System (ADS)
Wang, He; Wong, K. Y. Michael
Sensory integration between different modalities has been extensively studied. It has been suggested that the brain integrates signals from different modalities in a Bayesian-optimal way. However, how the Bayesian rule is implemented in a neural network remains under debate. In this work we propose a biologically plausible recurrent network model that can perform Bayesian multisensory integration after being trained by supervised learning. Our model is composed of two modules, one for each modality. We assume that each module is a recurrent network whose activity represents the posterior distribution of its stimulus. The feedforward input to each module is the likelihood for that modality. The two modules are integrated through cross-links, which are feedforward connections from the other modality, and reciprocal connections, which are recurrent connections between the modules. By stochastic gradient descent, we successfully trained the feedforward and recurrent coupling matrices simultaneously, both of which resemble the Mexican hat. We also find that more than one set of coupling matrices can approximate Bayes' theorem well. Specifically, reciprocal connections and cross-links compensate for each other if one of them is removed. Even though it is trained with two inputs, the network's performance with only one input is in good accordance with the prediction of Bayes' theorem.
Approximate Bayesian evaluations of measurement uncertainty
NASA Astrophysics Data System (ADS)
Possolo, Antonio; Bodnar, Olha
2018-04-01
The Guide to the Expression of Uncertainty in Measurement (GUM) includes formulas that produce an estimate of a scalar output quantity that is a function of several input quantities, and an approximate evaluation of the associated standard uncertainty. This contribution presents approximate, Bayesian counterparts of those formulas for the case where the output quantity is a parameter of the joint probability distribution of the input quantities, also taking into account any information about the value of the output quantity available prior to measurement expressed in the form of a probability distribution on the set of possible values for the measurand. The approximate Bayesian estimates and uncertainty evaluations that we present have a long history and illustrious pedigree, and provide sufficiently accurate approximations in many applications, yet are very easy to implement in practice. Differently from exact Bayesian estimates, which involve either (analytical or numerical) integrations, or Markov Chain Monte Carlo sampling, the approximations that we describe involve only numerical optimization and simple algebra. Therefore, they make Bayesian methods widely accessible to metrologists. We illustrate the application of the proposed techniques in several instances of measurement: isotopic ratio of silver in a commercial silver nitrate; odds of cryptosporidiosis in AIDS patients; height of a manometer column; mass fraction of chromium in a reference material; and potential-difference in a Zener voltage standard.
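A minimal sketch of the kind of computation the authors describe, assuming a scalar measurand: find the posterior mode by numerical optimization and take the curvature of the negative log-posterior at the mode as the inverse squared standard uncertainty (a Laplace-type approximation; the data, prior, and function names below are hypothetical):

    import numpy as np
    from scipy.optimize import minimize

    def laplace_estimate(neg_log_post, x0):
        # Posterior mode via numerical optimization ...
        res = minimize(neg_log_post, x0, method="BFGS")
        mode = res.x[0]
        # ... and curvature at the mode via a central finite difference.
        h = 1e-4 * max(1.0, abs(mode))
        curv = (neg_log_post([mode + h]) - 2 * neg_log_post([mode])
                + neg_log_post([mode - h])) / h**2
        return mode, 1.0 / np.sqrt(curv)

    # Gaussian likelihood for repeated indications plus a weak Gaussian prior.
    y = np.array([9.98, 10.02, 10.01, 9.97, 10.00])
    def neg_log_post(x):
        x = x[0]
        return np.sum((y - x) ** 2) / (2 * 0.02**2) + (x - 10.0) ** 2 / (2 * 1.0**2)

    estimate, std_uncertainty = laplace_estimate(neg_log_post, x0=[10.0])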
A Bayesian approach to the statistical analysis of device preference studies.
Fu, Haoda; Qu, Yongming; Zhu, Baojin; Huster, William
2012-01-01
Drug delivery devices are required to have excellent technical specifications to deliver drugs accurately, and in addition, the devices should provide a satisfactory experience to patients, because this can have a direct effect on drug compliance. To compare patients' experience with two devices, cross-over studies with patient-reported outcomes (PRO) as response variables are often used. Because of the strength of cross-over designs, each subject can directly compare the two devices by using the PRO variables, and variables indicating preference (preferring A, preferring B, or no preference) can easily be derived. Traditionally, frequentist methods have been used to analyze such preference data, but they have some limitations. Recently, Bayesian methods have come to be considered acceptable by the US Food and Drug Administration for designing and analyzing device studies. In this paper, we propose a Bayesian statistical method to analyze data from preference trials. We demonstrate that the new Bayesian estimator enjoys some optimality properties relative to the frequentist estimator. Copyright © 2012 John Wiley & Sons, Ltd.
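The paper does not publish its estimator in code, but the standard Bayesian treatment of three-category preference counts is a Dirichlet-multinomial model, sketched below in Python with hypothetical data:

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical counts: prefer device A, prefer device B, no preference.
    counts = np.array([58, 37, 25])
    alpha = np.ones(3)  # flat Dirichlet prior; informative priors also work

    # The posterior is Dirichlet(alpha + counts); summarize by Monte Carlo.
    samples = rng.dirichlet(alpha + counts, size=100_000)
    p_A_over_B = np.mean(samples[:, 0] > samples[:, 1])  # P(prefer A > prefer B | data)
    ci = np.percentile(samples[:, 0] - samples[:, 1], [2.5, 97.5])  # 95% credible interval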
Fourment, Mathieu; Holmes, Edward C
2014-07-24
Early methods for estimating divergence times from gene sequence data relied on the assumption of a molecular clock. More sophisticated methods were created to model rate variation, using auto-correlation of rates, local clocks, or the so-called "uncorrelated relaxed clock," in which substitution rates are assumed to be drawn from a parametric distribution. In the case of Bayesian inference methods, the impact of the prior on branching times is not clearly understood, and if the amount of data is limited the posterior can be strongly influenced by the prior. We develop a maximum likelihood method, Physher, that uses local or discrete clocks to estimate evolutionary rates and divergence times from heterochronous sequence data. Using two empirical data sets we show that our discrete-clock estimates are similar to those obtained by other methods, and that Physher outperformed some methods in estimating the root age of an influenza virus data set. A simulation analysis suggests that Physher can outperform a Bayesian method when the true topology contains two long branches below the root node, even when evolution is strongly clock-like. These results suggest it is advisable to use a variety of methods to estimate evolutionary rates and divergence times from heterochronous sequence data. Physher and the associated data sets used here are available online at http://code.google.com/p/physher/.
NASA Astrophysics Data System (ADS)
Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen
2018-07-01
Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference on these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing them to the observed data; this comparison in data space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper, we first use massive asymptotically optimal data compression to reduce the dimensionality of the data space to just one number per parameter, providing a natural and optimal framework for summary-statistic choice in likelihood-free inference. Second, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parametrized model for the joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference, and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate DELFI with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ~10^4 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological data sets.
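The compression step has a simple closed form for a Gaussian model with parameter-independent covariance: one summary per parameter, t = (dmu/dtheta)^T C^{-1} (d - mu), evaluated at a fiducial parameter point. A toy Python sketch follows; the linear model and numbers are our assumptions, not the paper's supernova analysis:

    import numpy as np

    def compress(d, mu, dmu_dtheta, Cinv):
        # Score compression: n_params numbers summarizing the whole data vector.
        return dmu_dtheta.T @ Cinv @ (d - mu)

    # Toy example: 1000 data points, 2 parameters (slope, intercept).
    x = np.linspace(0, 1, 1000)
    design = np.stack([x, np.ones_like(x)], axis=1)  # dmu/dtheta for a line
    Cinv = np.eye(1000) / 0.1**2                     # independent noise, sigma = 0.1
    theta_fid = np.array([1.0, 0.5])
    mu = design @ theta_fid
    d = mu + 0.1 * np.random.default_rng(2).normal(size=1000)
    t = compress(d, mu, design, Cinv)  # two numbers stand in for 1000 data points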
Bayesian Integration of Information in Hippocampal Place Cells
Madl, Tamas; Franklin, Stan; Chen, Ke; Montaldi, Daniela; Trappl, Robert
2014-01-01
Accurate spatial localization requires a mechanism that corrects for errors, which might arise from inaccurate sensory information or neuronal noise. In this paper, we propose that hippocampal place cells might implement such an error-correction mechanism by integrating different sources of information in an approximately Bayes-optimal fashion. We compare the predictions of our model with physiological data from rats. Our results suggest that useful predictions regarding the firing fields of place cells can be made based on a single underlying principle, Bayesian cue integration, and that such predictions are possible using a remarkably small number of model parameters. PMID:24603429
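For independent Gaussian cues, the Bayes-optimal integration the authors invoke reduces to a precision-weighted average, as in this short sketch (the numbers and cue labels are hypothetical):

    import numpy as np

    def integrate_cues(means, sigmas):
        # Precision-weighted fusion of independent Gaussian estimates;
        # the fused uncertainty is smaller than either input's.
        means = np.asarray(means, float)
        w = 1.0 / np.asarray(sigmas, float) ** 2
        return np.sum(w * means) / np.sum(w), np.sqrt(1.0 / np.sum(w))

    # E.g., a path-integration estimate and a visual landmark estimate.
    mean, sigma = integrate_cues(means=[0.42, 0.50], sigmas=[0.10, 0.05])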
A Prior for Neural Networks utilizing Enclosing Spheres for Normalization
NASA Astrophysics Data System (ADS)
v. Toussaint, U.; Gori, S.; Dose, V.
2004-11-01
Neural networks are famous for their advantageous flexibility in problems for which there is insufficient knowledge to set up a proper model. On the other hand, this flexibility can cause over-fitting and can hamper the generalization properties of neural networks. Many approaches to regularizing neural networks have been suggested, but most of them are based on ad hoc arguments. Employing the principle of transformation invariance, we derive a general prior, in accordance with Bayesian probability theory, for a class of feedforward networks. Optimal networks are determined by Bayesian model comparison, verifying the applicability of this approach.
Generalizability of Evidence-Based Assessment Recommendations for Pediatric Bipolar Disorder
Jenkins, Melissa M.; Youngstrom, Eric A.; Youngstrom, Jennifer Kogos; Feeny, Norah C.; Findling, Robert L.
2013-01-01
Bipolar disorder is frequently clinically diagnosed in youths who do not actually satisfy DSM-IV criteria, yet cases that would satisfy full DSM-IV criteria often go undetected clinically. Evidence-based assessment methods that incorporate Bayesian reasoning have demonstrated improved diagnostic accuracy and consistency; however, their clinical utility is largely unexplored. The present study examines the effectiveness of promising evidence-based decision making compared to the clinical gold standard. Participants were 562 youths, ages 5-17 and predominantly African American, drawn from a community mental health clinic. Research diagnoses combined semi-structured interviews with youths' psychiatric, developmental, and family mental health histories. Independent Bayesian estimates, which relied on published risk estimates from other samples, discriminated bipolar diagnoses (area under the curve = .75, p < .00005). The Bayesian estimates and confidence ratings correlated rs = .30. Agreement about an evidence-based assessment intervention "threshold model" (wait/assess/treat) had kappa = .24, p < .05. No potential moderators of agreement between the Bayesian estimates and confidence ratings, including type of bipolar illness, were significant. Bayesian risk estimates were highly correlated with logistic regression estimates using optimal sample weights, r = .81, p < .0005. Clinical and Bayesian approaches agree in terms of overall concordance and deciding the next clinical action, even when Bayesian predictions are based on published estimates from clinically and demographically different samples. Evidence-based assessment methods may be useful in settings that cannot routinely employ gold-standard assessments, and they may help decrease rates of overdiagnosis while promoting earlier identification of true cases. PMID:22004538
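The Bayesian updating used in such evidence-based assessment boils down to combining a base rate with published likelihood ratios on the odds scale. A sketch with hypothetical numbers (not the study's actual risk estimates):

    def posterior_probability(base_rate, likelihood_ratios):
        # Naive-Bayes-style update, assuming conditionally independent tests.
        odds = base_rate / (1.0 - base_rate)
        for lr in likelihood_ratios:
            odds *= lr
        return odds / (1.0 + odds)

    # Hypothetical: 6% clinic base rate, a positive parent-report screen with
    # LR+ = 7, and an unremarkable family history (LR = 0.8).
    p = posterior_probability(0.06, [7.0, 0.8])  # about 0.26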
Computational statistics using the Bayesian Inference Engine
NASA Astrophysics Data System (ADS)
Weinberg, Martin D.
2013-09-01
This paper introduces the Bayesian Inference Engine (BIE), a general, parallel, optimized software package for parameter inference and model selection. The package is motivated by the analysis needs of modern astronomical surveys and by the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree-based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible, object-oriented framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download instructions are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marzouk, Youssef
Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computational expense. This project intends to make rigorous predictive modeling *feasible* in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality-reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale *sequential* data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.
An adaptive response surface method for crashworthiness optimization
NASA Astrophysics Data System (ADS)
Shi, Lei; Yang, Ren-Jye; Zhu, Ping
2013-11-01
Response surface-based design optimization has been commonly used for optimizing large-scale design problems in the automotive industry. However, most response surface models are built by a limited number of design points without considering data uncertainty. In addition, the selection of a response surface in the literature is often arbitrary. This article uses a Bayesian metric to systematically select the best available response surface among several candidates in a library while considering data uncertainty. An adaptive, efficient response surface strategy, which minimizes the number of computationally intensive simulations, was developed for design optimization of large-scale complex problems. This methodology was demonstrated by a crashworthiness optimization example.
NASA Astrophysics Data System (ADS)
An, M.; Assumpcao, M.
2003-12-01
The joint inversion of receiver functions and surface waves is an effective way to diminish the influence of the strong tradeoff among parameters and of the different sensitivities to the model parameters in the respective inversions, but the inversion problem becomes more complex. Multi-objective problems can be much more complicated than single-objective inversions in model selection and optimization. If the objectives are conflicting, models can be ordered only partially; in this case, Pareto-optimal preference should be used to select solutions. On the other hand, an inversion that yields only a few optimal solutions cannot deal properly with the strong tradeoff between parameters, the uncertainties in the observations, the geophysical complexities, and even the incompetency of the inversion technique. The effective way is to retrieve the geophysical information statistically from many acceptable solutions, which requires more competent global algorithms. Recently proposed competent genetic algorithms are far superior to the conventional genetic algorithm and can solve hard problems quickly, reliably and accurately. In this work we used one of these competent genetic algorithms, the Bayesian Optimization Algorithm, as the main inverse procedure. This algorithm uses Bayesian networks to draw out inherited information and can use Pareto-optimal preference in the inversion. With this algorithm, the lithospheric structure of the Paraná basin is inverted to fit both the observed inter-station surface wave dispersion and the receiver functions.
Eckstein, Miguel P; Mack, Stephen C; Liston, Dorion B; Bogush, Lisa; Menzel, Randolf; Krauzlis, Richard J
2013-06-07
Visual attention is commonly studied by using visuo-spatial cues indicating probable locations of a target and assessing the effect of the validity of the cue on perceptual performance and its neural correlates. Here, we adapt a cueing task to measure spatial cueing effects on the decisions of honeybees and compare their behavior to that of humans and monkeys in a similarly structured two-alternative forced-choice perceptual task. Unlike the typical cueing paradigm, in which the stimulus strength remains unchanged within a block of trials, for the monkey and human studies we randomized the contrast of the signal to simulate more real-world conditions in which the organism is uncertain about the strength of the signal. A Bayesian ideal observer that weights sensory evidence from cued and uncued locations based on the cue validity to maximize overall performance is used as a benchmark of comparison against the three animals and other suboptimal models: probability matching, ignoring the cue, always following the cue, and an additive bias/single decision threshold model. We find that the cueing effect is pervasive across all three species but is smaller in size than that shown by the Bayesian ideal observer. Humans show a larger cueing effect than monkeys, and bees show the smallest effect. The cueing effect and overall performance of the honeybees allow rejection of models in which the bees ignore the cue, follow the cue while disregarding the stimuli to be discriminated, or adopt a probability-matching strategy. Stimulus strength uncertainty also reduces the theoretically predicted variation in cueing effect with stimulus strength of an optimal Bayesian observer and diminishes the size of the cueing effect when stimulus strength is low. A more biologically plausible model that includes an additive bias to the sensory response from the cued location, although not mathematically equivalent to the optimal observer for the case of stimulus strength uncertainty, can approximate the benefits of the more computationally complex optimal Bayesian model. We discuss the implications of our findings for the field's common conceptualization of covert visual attention in the cueing task and what aspects, if any, might be unique to humans. Copyright © 2013 Elsevier Ltd. All rights reserved.
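A minimal Python sketch of the ideal observer's decision rule under stimulus-strength uncertainty: marginalize the target likelihood over possible contrasts and weight each location by the cue validity (the contrast grid and unit-variance noise model are illustrative assumptions):

    import numpy as np
    from scipy.stats import norm

    def ideal_observer(x_cued, x_uncued, cue_validity, strengths, p_strength):
        def target_like(x):
            # P(response x | target here), marginalized over signal strength.
            return sum(p * norm.pdf(x, loc=s, scale=1.0)
                       for s, p in zip(strengths, p_strength))
        noise_like = norm.pdf  # P(response x | no target): standard normal
        post_cued = cue_validity * target_like(x_cued) * noise_like(x_uncued)
        post_uncued = (1 - cue_validity) * target_like(x_uncued) * noise_like(x_cued)
        return "cued" if post_cued >= post_uncued else "uncued"

    choice = ideal_observer(1.2, 0.4, cue_validity=0.8,
                            strengths=[0.5, 1.0, 2.0], p_strength=[1/3, 1/3, 1/3])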
Javadi, Firouzeh; Tun, Ye Tun; Kawase, Makoto; Guan, Kaiyun; Yamaguchi, Hirofumi
2011-08-01
The subgenus Ceratotropis in the genus Vigna is widely distributed from the Himalayan highlands to South, Southeast and East Asia. However, the interspecific and geographical relationships of its members are poorly understood. This study investigates the phylogeny and biogeography of the subgenus Ceratotropis using chloroplast DNA sequence data. Sequence data from four intergenic spacer regions (petA-psbJ, psbD-trnT, trnT-trnE and trnT-trnL) of chloroplast DNA, alone and in combination, were analysed using Bayesian and parsimony methods. Divergence times for major clades were estimated with penalized likelihood. Character evolution was examined by means of parsimony optimization and MacClade. Parsimony and Bayesian phylogenetic analyses of the combined data demonstrated well-resolved species relationships in which 18 Vigna species were divided into two major geographical clades: the East Asia-Southeast Asian clade and the Indian subcontinent clade. Within these two clades, three well-supported eco-geographical groups, temperate and subtropical (the East Asia-Southeast Asian clade) and tropical (the Indian subcontinent clade), are recognized. The temperate group consists of V. minima, V. nepalensis and V. angularis. The subtropical group comprises the V. nakashimae-V. riukiuensis-V. minima subgroup and the V. hirtella-V. exilis-V. umbellata subgroup. The tropical group contains two subgroups: the V. trinervia-V. reflexo-pilosa-V. trilobata subgroup and the V. mungo-V. grandiflora subgroup. An evolutionary rate analysis estimated the divergence time between the East Asia-Southeast Asia clade and the Indian subcontinent clade as 3·62 ± 0·3 million years, and that between the temperate and subtropical groups as 2·0 ± 0·2 million years. The findings provide an improved understanding of the interspecific relationships and the ecological and geographical phylogenetic structure of the subgenus Ceratotropis. The Quaternary diversification of the subgenus Ceratotropis implicates geographical dispersal in the south-eastern part of Asia, involving adaptation to climatic conditions after the collision of the Indian subcontinent with the Asian plate. The phylogenetic results indicate that epigeal germination is plesiomorphic and that the germination type evolved independently multiple times in this subgenus, implying its limited taxonomic utility.
The anatomy of choice: active inference and agency
Friston, Karl; Schwartenbeck, Philipp; FitzGerald, Thomas; Moutoussis, Michael; Behrens, Timothy; Dolan, Raymond J.
2013-01-01
This paper considers agency in the setting of embodied or active inference. In brief, we associate a sense of agency with prior beliefs about action and ask what sorts of beliefs underlie optimal behavior. In particular, we consider prior beliefs that action minimizes the Kullback–Leibler (KL) divergence between desired states and attainable states in the future. This allows one to formulate bounded rationality as approximate Bayesian inference that optimizes a free energy bound on model evidence. We show that constructs like expected utility, exploration bonuses, softmax choice rules and optimism bias emerge as natural consequences of this formulation. Previous accounts of active inference have focused on predictive coding and Bayesian filtering schemes for minimizing free energy. Here, we consider variational Bayes as an alternative scheme that provides formal constraints on the computational anatomy of inference and action—constraints that are remarkably consistent with neuroanatomy. Furthermore, this scheme contextualizes optimal decision theory and economic (utilitarian) formulations as pure inference problems. For example, expected utility theory emerges as a special case of free energy minimization, where the sensitivity or inverse temperature (of softmax functions and quantal response equilibria) has a unique and Bayes-optimal solution—that minimizes free energy. This sensitivity corresponds to the precision of beliefs about behavior, such that attainable goals are afforded a higher precision or confidence. In turn, this means that optimal behavior entails a representation of confidence about outcomes that are under an agent's control. PMID:24093015
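Schematically, and in our notation rather than the paper's, the resulting choice rule is the softmax

    \[
    P(a \mid m) \;=\; \frac{\exp(-\gamma\,\mathrm{KL}_a)}{\sum_{a'} \exp(-\gamma\,\mathrm{KL}_{a'})},
    \qquad
    \mathrm{KL}_a \;=\; \mathrm{KL}\big[\,Q(s' \mid a)\;\|\;P(s' \mid m)\,\big],
    \]

where Q(s'|a) is the predicted distribution over future states under action a, P(s'|m) encodes the desired (attainable) states under model m, and the inverse temperature gamma plays the role of the precision of beliefs about behavior.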
A Surrogate Approach to the Experimental Optimization of Multielement Airfoils
NASA Technical Reports Server (NTRS)
Otto, John C.; Landman, Drew; Patera, Anthony T.
1996-01-01
The incorporation of experimental test data into the optimization process is accomplished through the use of Bayesian-validated surrogates. In the surrogate approach, a surrogate for the experiment (e.g., a response surface) serves in the optimization process. The validation step of the framework provides a qualitative assessment of the surrogate quality and bounds the surrogate-for-experiment error on designs "near" surrogate-predicted optimal designs. The utility of the framework is demonstrated through its application to the experimental selection of the trailing-edge flap position to achieve a design lift coefficient for a three-element airfoil.
Resolving the tips of the tree of life: How much mitochondrial data do we need?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bonett, Ronald M.; Macey, J. Robert; Boore, Jeffrey L.
2005-04-29
Mitochondrial (mt) DNA sequences are used extensively to reconstruct evolutionary relationships among recently diverged animals, and have constituted the most widely used markers for species- and generic-level relationships for the last decade or more. However, most studies to date have employed relatively small portions of the mt genome. In contrast, complete mt genomes have primarily been used to investigate deep divergences, including several studies of the amount of mt sequence necessary to recover ancient relationships. We sequenced and analyzed 24 complete mt genomes from a group of salamander species exhibiting divergences typical of those in many species-level studies. We present the first comprehensive investigation of the amount of mt sequence data necessary to consistently recover the mt-genome tree at this level, using parsimony and Bayesian methods. Both methods of phylogenetic analysis yielded extremely similar results. A surprising number of well supported, yet conflicting, relationships were found in trees based on fragments of less than ~2000 nucleotides (nt), typical of the vast majority of the thousands of mt-based studies published to date. Large amounts of data (11,500+ nt) were necessary to consistently recover the whole mt-genome tree. Some relationships were consistently recovered with fragments of all sizes, but many nodes required the majority of the mt genome to stabilize, particularly those associated with short internal branches. Although moderate amounts of data (2000-3000 nt) were adequate to recover mt-based relationships for which most nodes were congruent with the whole mt-genome tree, many thousands of nucleotides were necessary to resolve rapid bursts of evolution. Recent advances in genomics are making collection of large amounts of sequence data highly feasible, and our results provide the basis for comparative studies of other closely related groups to optimize mt sequence sampling and phylogenetic resolution at the "tips" of the Tree of Life.
Content Structure as a Design Strategy Variable in Concept Acquisition.
ERIC Educational Resources Information Center
Tennyson, Robert D.; Tennyson, Carol L.
Three methods of sequencing coordinate concepts (simultaneous, collective, and successive) were investigated with a Bayesian, computer-based, adaptive control system. The data analysis showed that when coordinate concepts are taught simultaneously (contextually similar concepts presented at the same time), student performance is superior to either…
A Bayesian model averaging method for the derivation of reservoir operating rules
NASA Astrophysics Data System (ADS)
Zhang, Jingwen; Liu, Pan; Wang, Hao; Lei, Xiaohui; Zhou, Yanlai
2015-09-01
Because the intrinsic dynamics among optimal decision making, inflow processes and reservoir characteristics are complex, the functional forms of reservoir operating rules are usually determined subjectively. As a result, the uncertainty involved in selecting the form and/or model of reservoir operating rules must be analyzed and evaluated. In this study, we analyze the uncertainty of reservoir operating rules using the Bayesian model averaging (BMA) model. Three popular operating rules, namely piecewise linear regression, surface fitting and a least-squares support vector machine, are established based on the optimal deterministic reservoir operation. These individual models provide three member decisions for the BMA combination, enabling the 90% release interval to be estimated by Markov chain Monte Carlo simulation. A case study of China's Baise reservoir shows that: (1) the optimal deterministic reservoir operation, superior to any reservoir operating rules, provides the samples from which the rules are derived; (2) the least-squares support vector machine model is more effective than both piecewise linear regression and surface fitting; and (3) BMA outperforms any individual operating-rule model based on the optimal trajectories. The proposed model can thus reduce the uncertainty of operating rules, which is of great potential benefit in evaluating the confidence interval of decisions.
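A minimal Python sketch of the BMA combination step, assuming likelihood-proportional model weights with equal model priors (the paper estimates weights and the 90% interval via MCMC; the numbers below are hypothetical):

    import numpy as np

    def bma_predict(preds, log_likelihoods):
        # Weights proportional to each member model's likelihood; the BMA
        # prediction is the weighted mean of the member predictions.
        ll = np.asarray(log_likelihoods, float)
        w = np.exp(ll - ll.max())
        w /= w.sum()
        return w, np.tensordot(w, np.asarray(preds, float), axes=1)

    preds = [[120, 135, 128, 140, 150],   # piecewise linear regression
             [118, 137, 130, 138, 152],   # surface fitting
             [121, 134, 129, 141, 149]]   # least-squares support vector machine
    weights, combined = bma_predict(preds, log_likelihoods=[-210.4, -215.9, -206.1])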
Bayesian cross-entropy methodology for optimal design of validation experiments
NASA Astrophysics Data System (ADS)
Jiang, X.; Mahadevan, S.
2006-07-01
An important concern in the design of validation experiments is how to incorporate the mathematical model in the design in order to allow conclusive comparisons of model prediction with experimental output in model assessment. The classical experimental design methods are more suitable for phenomena discovery and may result in a subjective, expensive, time-consuming and ineffective design that may adversely impact these comparisons. In this paper, an integrated Bayesian cross-entropy methodology is proposed to perform the optimal design of validation experiments incorporating the computational model. The expected cross entropy, an information-theoretic distance between the distributions of model prediction and experimental observation, is defined as a utility function to measure the similarity of two distributions. A simulated annealing algorithm is used to find optimal values of input variables through minimizing or maximizing the expected cross entropy. The measured data after testing with the optimum input values are used to update the distribution of the experimental output using Bayes theorem. The procedure is repeated to adaptively design the required number of experiments for model assessment, each time ensuring that the experiment provides effective comparison for validation. The methodology is illustrated for the optimal design of validation experiments for a three-leg bolted joint structure and a composite helicopter rotor hub component.
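A stripped-down Python sketch of the optimization loop: simulated annealing over the experiment input, with the cross entropy of two Gaussians standing in for the expected cross-entropy utility (the toy model, step size, and cooling schedule are our assumptions, not the paper's):

    import numpy as np

    rng = np.random.default_rng(5)

    def anneal(objective, x0, bounds, n_iter=2000, T0=1.0):
        # Minimize objective(x) by simulated annealing within box bounds.
        x = np.array(x0, float)
        fx = objective(x)
        best, fbest = x.copy(), fx
        lo = [b[0] for b in bounds]
        hi = [b[1] for b in bounds]
        for k in range(n_iter):
            T = T0 * (1 - k / n_iter) + 1e-6                 # linear cooling
            cand = np.clip(x + rng.normal(0, 0.1, x.size), lo, hi)
            fc = objective(cand)
            if fc < fx or rng.random() < np.exp(-(fc - fx) / T):
                x, fx = cand, fc                             # accept the move
            if fx < fbest:
                best, fbest = x.copy(), fx
        return best, fbest

    # Toy utility: cross entropy between Gaussian model-predicted and
    # "observed" output distributions as a function of the input setting x.
    def utility(x):
        mu_model, mu_exp = 2.0 * x[0], 1.6 * x[0] + 0.2
        s2 = 0.5 ** 2
        return 0.5 * np.log(2 * np.pi * s2) + ((mu_model - mu_exp) ** 2 + s2) / (2 * s2)

    x_opt, u_opt = anneal(utility, x0=[0.1], bounds=[(0.0, 1.0)])  # x_opt near 0.5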
A Bayesian alternative for multi-objective ecohydrological model specification
NASA Astrophysics Data System (ADS)
Tang, Yating; Marshall, Lucy; Sharma, Ashish; Ajami, Hoori
2018-01-01
Recent studies have identified the importance of vegetation processes in terrestrial hydrologic systems. Process-based ecohydrological models combine hydrological, physical, biochemical and ecological processes of the catchments, and as such are generally more complex and parametric than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying the uncertainties in hydrological modeling with the development of Markov chain Monte Carlo (MCMC) techniques. The Bayesian approach offers an appealing alternative to traditional multi-objective hydrologic model calibrations by defining proper prior distributions that can be considered analogous to the ad-hoc weighting often prescribed in multi-objective calibration. Our study aims to develop appropriate prior distributions and likelihood functions that minimize the model uncertainties and bias within a Bayesian ecohydrological modeling framework based on a traditional Pareto-based model calibration technique. In our study, a Pareto-based multi-objective optimization and a formal Bayesian framework are implemented in a conceptual ecohydrological model that combines a hydrological model (HYMOD) and a modified Bucket Grassland Model (BGM). Simulations focused on one objective (streamflow/LAI) and multiple objectives (streamflow and LAI) with different emphasis defined via the prior distribution of the model error parameters. Results show more reliable outputs for both predicted streamflow and LAI using Bayesian multi-objective calibration with specified prior distributions for error parameters based on results from the Pareto front in the ecohydrological modeling. The methodology implemented here provides insight into the usefulness of multiobjective Bayesian calibration for ecohydrologic systems and the importance of appropriate prior distributions in such approaches.
Hip fracture in the elderly: a re-analysis of the EPIDOS study with causal Bayesian networks.
Caillet, Pascal; Klemm, Sarah; Ducher, Michel; Aussem, Alexandre; Schott, Anne-Marie
2015-01-01
Hip fractures commonly result in permanent disability, institutionalization or death in the elderly. Existing hip-fracture prediction tools are underused in clinical practice, partly due to their lack of intuitive interpretation. By use of a graphical layer, Bayesian network models could increase the attractiveness of fracture prediction tools. Our aim was to study the potential contribution of a causal Bayesian network in this clinical setting. A logistic regression was performed as a standard control approach to check the robustness of the causal Bayesian network approach. EPIDOS is a multicenter study, conducted in an ambulatory care setting in five French cities between 1992 and 1996 and updated in 2010. The study included 7598 women aged 75 years or older, in whom fractures were assessed quarterly during 4 years. A causal Bayesian network and a logistic regression were performed on the EPIDOS data to describe the major variables involved in hip fracture occurrence. Both models had similar association estimates and predictive performances. They identified gait speed and bone mineral density as the variables most involved in the fracture process. The causal Bayesian network showed that gait speed and bone mineral density were directly connected to fracture and seem to mediate the influence of all the other variables included in our model. The logistic regression approach detected multiple interactions involving psychotropic drug use, age and bone mineral density. Both approaches retrieved similar variables as predictors of hip fractures. However, the Bayesian network highlighted the whole web of relations among the variables involved in the analysis, suggesting a possible mechanism leading to hip fracture. According to the latter results, interventions focusing concomitantly on gait speed and bone mineral density may be necessary for optimal prevention of hip fracture occurrence in elderly people.
Learning Instance-Specific Predictive Models
Visweswaran, Shyam; Cooper, Gregory F.
2013-01-01
This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures and its performance was compared to that of several commonly used predictive algorithms, including naïve Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. PMID:25045325
A new method for E-government procurement using collaborative filtering and Bayesian approach.
Zhang, Shuai; Xi, Chengyu; Wang, Yan; Zhang, Wenyu; Chen, Yanhong
2013-01-01
Nowadays, as the Internet services increase faster than ever before, government systems are reinvented as E-government services. Therefore, government procurement sectors have to face challenges brought by the explosion of service information. This paper presents a novel method for E-government procurement (eGP) to search for the optimal procurement scheme (OPS). Item-based collaborative filtering and Bayesian approach are used to evaluate and select the candidate services to get the top-M recommendations such that the involved computation load can be alleviated. A trapezoidal fuzzy number similarity algorithm is applied to support the item-based collaborative filtering and Bayesian approach, since some of the services' attributes can be hardly expressed as certain and static values but only be easily represented as fuzzy values. A prototype system is built and validated with an illustrative example from eGP to confirm the feasibility of our approach.
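As one concrete possibility for the fuzzy-matching step, here is a short Python sketch of a common similarity measure for trapezoidal fuzzy numbers; the paper's exact formula may differ, and the attribute values below are hypothetical:

    import numpy as np

    def trapezoid_similarity(a, b):
        # a, b: the four defining points of two trapezoidal fuzzy numbers,
        # normalized to [0, 1]. Similarity = 1 - mean absolute difference.
        a = np.asarray(a, float)
        b = np.asarray(b, float)
        return float(max(0.0, 1.0 - np.abs(a - b).mean()))

    # E.g., a service's fuzzy "delivery time" vs. the buyer's requirement.
    s = trapezoid_similarity([0.2, 0.3, 0.5, 0.6], [0.25, 0.35, 0.5, 0.65])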
Applying Bayesian belief networks in rapid response situations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gibson, William L; Leishman, Deborah A.; Van Eeckhout, Edward
2008-01-01
The authors have developed an enhanced Bayesian analysis tool called the Integrated Knowledge Engine (IKE) for monitoring and surveillance. The enhancements are suited for rapid response situations where decisions must be made based on uncertain and incomplete evidence from many diverse and heterogeneous sources. The enhancements extend the probabilistic results of the traditional Bayesian analysis by (1) better quantifying uncertainty arising from model parameter uncertainty and uncertain evidence, (2) optimizing the collection of evidence to reach conclusions more quickly, and (3) allowing the analyst to determine the influence of the remaining evidence that cannot be obtained in the time allowed. These extended features give the analyst and decision maker a better comprehension of the adequacy of the acquired evidence and hence the quality of hurried decisions. The authors also describe two example systems in which the above features are highlighted.
Chen, Ray -Bing; Wang, Weichung; Jeff Wu, C. F.
2017-04-12
A numerical method, called OBSM, was recently proposed which employs overcomplete basis functions to achieve sparse representations. While the method can handle non-stationary responses without the need to invert large covariance matrices, it lacks the capability to quantify uncertainty in predictions. We address this issue by proposing a Bayesian approach which first imposes a normal prior on the large space of linear coefficients, then applies an MCMC algorithm to generate posterior samples for predictions. From these samples, Bayesian credible intervals can then be obtained to assess prediction uncertainty. A key application of the proposed method is the efficient construction of sequential designs. Several sequential design procedures with different infill criteria are proposed based on the generated posterior samples. Numerical studies show that the proposed schemes are capable of solving problems of positive point identification, optimization, and surrogate fitting.
Fonseca, Luiz Henrique M; Lohmann, Lúcia G
2018-06-01
Combining high-throughput sequencing data with amplicon sequences allows the reconstruction of robust phylogenies based on comprehensive sampling of characters and taxa. Here, we combine Next Generation Sequencing (NGS) and Sanger sequencing data to infer the phylogeny of the "Adenocalymma-Neojobertia" clade (Bignonieae, Bignoniaceae), a diverse lineage of Neotropical plants, using Maximum Likelihood and Bayesian approaches. We used NGS to obtain complete or nearly complete plastomes of members of this clade, leading to a final dataset with 54 individuals, representing 44 members of the ingroup and 10 outgroups. In addition, we obtained Sanger sequences of two plastid markers (ndhF and rpl32-trnL) for 44 individuals (43 ingroup and 1 outgroup) and the nuclear PepC for 64 individuals (63 ingroup and 1 outgroup). Our final dataset includes 87 individuals of the "Adenocalymma-Neojobertia" clade, representing 66 species (ca. 90% of the diversity), plus 11 outgroups. The plastid and nuclear datasets recovered congruent topologies and were combined. The combined analysis recovered a monophyletic "Adenocalymma-Neojobertia" clade and a paraphyletic Adenocalymma that also contained a monophyletic Neojobertia plus Pleonotoma albiflora. Relationships are strongly supported in all analyses, with most lineages within the "Adenocalymma-Neojobertia" clade receiving maximum posterior probabilities. Ancestral character-state reconstructions using Bayesian approaches identified six morphological synapomorphies of clades, namely prophyll type, petiole and petiolule articulation, tendril ramification, inflorescence ramification, calyx shape, and fruit wings. Other characters, such as habit, calyx cupular trichomes, corolla color, and corolla shape, evolved multiple times. These characters are putatively related to the diversification of the clade and can be further explored in diversification studies. Copyright © 2018 Elsevier Inc. All rights reserved.
A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks.
Zhou, Xiaobo; Wang, Xiaodong; Pal, Ranadip; Ivanov, Ivan; Bittner, Michael; Dougherty, Edward R
2004-11-22
We have hypothesized that the construction of transcriptional regulatory networks using a method that optimizes connectivity would lead to regulation consistent with biological expectations. A key expectation is that the hypothetical networks should produce a few, very strong attractors, highly similar to the original observations, mimicking biological state stability and determinism. Another central expectation is that, since biological control is expected to be distributed and mutually reinforcing, interpretation of the observations should lead to a very small number of connection schemes. We propose a fully Bayesian approach to constructing probabilistic gene regulatory networks (PGRNs) that emphasizes network topology. The method computes the possible parent sets of each gene, the corresponding predictors and the associated probabilities based on a nonlinear perceptron model, using a reversible-jump Markov chain Monte Carlo (MCMC) technique, and an MCMC method is employed to search the network configurations to find those with the highest Bayesian scores to construct the PGRN. The Bayesian method has been used to construct a PGRN based on the observed behavior of a set of genes whose expression patterns vary across a set of melanoma samples exhibiting two very different phenotypes with respect to cell motility and invasiveness. Key biological features have been faithfully reflected in the model. Its steady-state distribution contains attractors that are either identical or very similar to the states observed in the data, and many of the attractors are singletons, which mimics the biological propensity to stably occupy a given state. Most interestingly, the connectivity rules for the best-scoring generated networks constituting the PGRN are remarkably similar, as would be expected for a network operating on a distributed basis, with strong interactions between the components.
Bayesian parameter estimation for the Wnt pathway: an infinite mixture models approach.
Koutroumpas, Konstantinos; Ballarini, Paolo; Votsi, Irene; Cournède, Paul-Henry
2016-09-01
Likelihood-free methods, like Approximate Bayesian Computation (ABC), have been extensively used in model-based statistical inference with intractable likelihood functions. When combined with Sequential Monte Carlo (SMC) algorithms they constitute a powerful approach for parameter estimation and model selection of mathematical models of complex biological systems. A crucial step in the ABC-SMC algorithms, significantly affecting their performance, is the propagation of a set of parameter vectors through a sequence of intermediate distributions using Markov kernels. In this article, we employ Dirichlet process mixtures (DPMs) to design optimal transition kernels, and we present an ABC-SMC algorithm with DPM kernels. We illustrate the use of the proposed methodology using real data for the canonical Wnt signaling pathway. A multi-compartment model of the pathway is developed and compared to an existing model. The results indicate that DPMs are more efficient in the exploration of the parameter space and can significantly improve ABC-SMC performance. In comparison to alternative sampling schemes that are commonly used, the proposed approach can bring potential benefits in the estimation of complex multimodal distributions. The method is used to estimate the parameters and the initial state of two models of the Wnt pathway, and it is shown that the multi-compartment model fits the experimental data better. Python scripts for the Dirichlet process Gaussian mixture model and the Gibbs sampler are available at https://sites.google.com/site/kkoutroumpas/software. Contact: konstantinos.koutroumpas@ecp.fr. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
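For orientation, one ABC-SMC iteration looks roughly like the Python sketch below, here with a simple Gaussian perturbation kernel where the paper fits a Dirichlet-process mixture to the particle population instead (importance-weight updates are omitted for brevity, and the toy problem is ours):

    import numpy as np

    rng = np.random.default_rng(3)

    def abc_smc_step(particles, weights, epsilon, simulate, distance, observed):
        # Resample a particle, perturb it, and keep it if the simulated
        # summary lands within epsilon of the observed one.
        new_particles = []
        while len(new_particles) < len(particles):
            theta = particles[rng.choice(len(particles), p=weights)]
            theta_new = theta + rng.normal(0.0, 0.1)  # Gaussian kernel (DPM in paper)
            if distance(simulate(theta_new), observed) < epsilon:
                new_particles.append(theta_new)
        return np.array(new_particles)

    # Toy problem: infer the mean of a Gaussian from its sample mean.
    observed = 1.3
    simulate = lambda th: rng.normal(th, 1.0, size=50).mean()
    distance = lambda a, b: abs(a - b)
    particles = rng.normal(0.0, 2.0, size=200)        # draws from the prior
    weights = np.full(200, 1 / 200)
    particles = abc_smc_step(particles, weights, 0.3, simulate, distance, observed)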
NASA Technical Reports Server (NTRS)
Eckstein, Miguel P.; Abbey, Craig K.; Pham, Binh T.; Shimozaki, Steven S.
2004-01-01
Human performance in visual detection, discrimination, identification, and search tasks typically improves with practice. Psychophysical studies suggest that perceptual learning is mediated by an enhancement in the coding of the signal, and physiological studies suggest that it might be related to the plasticity in the weighting or selection of sensory units coding task relevant information (learning through attention optimization). We propose an experimental paradigm (optimal perceptual learning paradigm) to systematically study the dynamics of perceptual learning in humans by allowing comparisons to that of an optimal Bayesian algorithm and a number of suboptimal learning models. We measured improvement in human localization (eight-alternative forced-choice with feedback) performance of a target randomly sampled from four elongated Gaussian targets with different orientations and polarities and kept as a target for a block of four trials. The results suggest that the human perceptual learning can occur within a lapse of four trials (<1 min) but that human learning is slower and incomplete with respect to the optimal algorithm (23.3% reduction in human efficiency from the 1st-to-4th learning trials). The greatest improvement in human performance, occurring from the 1st-to-2nd learning trial, was also present in the optimal observer, and, thus reflects a property inherent to the visual task and not a property particular to the human perceptual learning mechanism. One notable source of human inefficiency is that, unlike the ideal observer, human learning relies more heavily on previous decisions than on the provided feedback, resulting in no human learning on trials following a previous incorrect localization decision. Finally, the proposed theory and paradigm provide a flexible framework for future studies to evaluate the optimality of human learning of other visual cues and/or sensory modalities.
Sparse-grid, reduced-basis Bayesian inversion: Nonaffine-parametric nonlinear equations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Peng, E-mail: peng@ices.utexas.edu; Schwab, Christoph, E-mail: christoph.schwab@sam.math.ethz.ch
2016-07-01
We extend the reduced basis (RB) accelerated Bayesian inversion methods for affine-parametric, linear operator equations which are considered in [16,17] to non-affine, nonlinear parametric operator equations. We generalize the analysis of sparsity of parametric forward solution maps in [20] and of Bayesian inversion in [48,49] to the fully discrete setting, including Petrov–Galerkin high-fidelity (“HiFi”) discretization of the forward maps. We develop adaptive, stochastic collocation based reduction methods for the efficient computation of reduced bases on the parametric solution manifold. The nonaffinity and nonlinearity with respect to (w.r.t.) the distributed, uncertain parameters and the unknown solution is collocated; specifically, by the so-called Empirical Interpolation Method (EIM). For the corresponding Bayesian inversion problems, computational efficiency is enhanced in two ways: first, expectations w.r.t. the posterior are computed by adaptive quadratures with dimension-independent convergence rates proposed in [49]; the present work generalizes [49] to account for the impact of the PG discretization in the forward maps on the convergence rates of the Quantities of Interest (QoI for short). Second, we propose to perform the Bayesian estimation only w.r.t. a parsimonious, RB approximation of the posterior density. Based on the approximation results in [49], the infinite-dimensional parametric, deterministic forward map and operator admit N-term RB and EIM approximations which converge at rates which depend only on the sparsity of the parametric forward map. In several numerical experiments, the proposed algorithms exhibit dimension-independent convergence rates which equal, at least, the currently known rate estimates for N-term approximation. We propose to accelerate Bayesian estimation by first offline construction of reduced basis surrogates of the Bayesian posterior density. The parsimonious surrogates can then be employed for online data assimilation and for Bayesian estimation. They also open a perspective for optimal experimental design.
A new software for deformation source optimization, the Bayesian Earthquake Analysis Tool (BEAT)
NASA Astrophysics Data System (ADS)
Vasyura-Bathke, H.; Dutta, R.; Jonsson, S.; Mai, P. M.
2017-12-01
Modern studies of crustal deformation and the related source estimation, including magmatic and tectonic sources, increasingly use non-linear optimization strategies to estimate geometric and/or kinematic source parameters, often considering geodetic and seismic data jointly. Bayesian inference is increasingly being used for estimating posterior distributions of deformation source model parameters, given measured/estimated/assumed data and model uncertainties. For instance, some studies consider uncertainties of a layered medium and propagate these into source parameter uncertainties, while others use informative priors to reduce the model parameter space. In addition, innovative sampling algorithms have been developed to efficiently explore the high-dimensional parameter spaces. Compared to earlier studies, these improvements have resulted in overall more robust source model parameter estimates that include uncertainties. However, the computational burden of these methods is high and estimation codes are rarely made available along with the published results. Even if the codes are accessible, it is usually challenging to assemble them into a single optimization framework as they are typically coded in different programming languages. Therefore, further progress and future applications of these methods/codes are hampered, while reproducibility and validation of results have become essentially impossible. In the spirit of providing open-access and modular codes to facilitate progress and reproducible research in deformation source estimation, we undertook the effort of developing BEAT, a python package that comprises all the above-mentioned features in one single programming environment. The package builds on the pyrocko seismological toolbox (www.pyrocko.org), and uses the pymc3 module for Bayesian statistical model fitting. BEAT is an open-source package (https://github.com/hvasbath/beat), and we encourage and solicit contributions to the project. Here, we present our strategy for developing BEAT and show application examples; in particular the effect of including the model prediction uncertainty of the velocity model on the following source optimizations: full moment tensor, Mogi source, and a moderate strike-slip earthquake.
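The flavor of such a Bayesian source estimation can be shown with a deliberately small stand-in: random-walk Metropolis sampling of a point-pressure (Mogi) source from synthetic surface displacements, assuming a Poisson ratio of 0.25 and a Gaussian error model (BEAT itself relies on pymc3, richer forward models and a far more careful uncertainty treatment):

import numpy as np

rng = np.random.default_rng(1)
nu = 0.25                                   # assumed Poisson ratio

def mogi_uz(r, depth, dV):                  # vertical surface displacement, Mogi point source
    return (1.0 - nu)*dV*depth/(np.pi*(r**2 + depth**2)**1.5)

r = np.linspace(1e3, 2e4, 30)               # station distances [m], invented geometry
obs = mogi_uz(r, 4e3, 1e6) + rng.normal(0, 1e-3, r.size)   # synthetic data, 1 mm noise
sig = 1e-3

def logpost(depth, dV):                     # flat priors inside broad bounds
    if not (1e3 < depth < 1e4 and 1e4 < dV < 1e7):
        return -np.inf
    return -0.5*np.sum(((obs - mogi_uz(r, depth, dV))/sig)**2)

theta = np.array([3e3, 8e5])
lp = logpost(*theta)
chain = []
for _ in range(20000):                      # random-walk Metropolis
    prop = theta + rng.normal(0, [100.0, 2e4])
    lp_p = logpost(*prop)
    if np.log(rng.random()) < lp_p - lp:
        theta, lp = prop, lp_p
    chain.append(theta.copy())
print(np.mean(chain[5000:], axis=0))        # posterior mean near depth=4e3 m, dV=1e6 m^3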
Blacksell, Stuart D.; Tanganuchitcharnchai, Ampai; Jintaworn, Suthatip; Kantipong, Pacharee; Richards, Allen L.; Day, Nicholas P. J.
2016-01-01
The enzyme-linked immunosorbent assay (ELISA) has been proposed as an alternative serologic diagnostic test to the indirect immunofluorescence assay (IFA) for scrub typhus. Here, we systematically determine the optimal sample dilution and cutoff optical density (OD) and estimate the accuracy of IgM ELISA using Bayesian latent class models (LCMs). Data from 135 patients with undifferentiated fever were reevaluated using Bayesian LCMs. Every patient was evaluated for the presence of an eschar and tested with a blood culture for Orientia tsutsugamushi, three different PCR assays, and an IgM IFA. The IgM ELISA was performed for every sample at sample dilutions from 1:100 to 1:102,400 using crude whole-cell antigens of the Karp, Kato, and Gilliam strains of O. tsutsugamushi developed by the Naval Medical Research Center. We used Bayesian LCMs to generate unbiased receiver operating characteristic curves and found that the sample dilution of 1:400 was optimal for the IgM ELISA. With the optimal cutoff OD of 1.474 at a sample dilution of 1:400, the IgM ELISA had a sensitivity of 85.7% (95% credible interval [CrI], 77.4% to 86.7%) and a specificity of 98.1% (95% CrI, 97.2% to 100%) using paired samples. For the ELISA, the OD could be determined objectively and quickly, in contrast to the reading of IFA slides, which was both subjective and labor-intensive. The IgM ELISA for scrub typhus has high diagnostic accuracy and is less subjective than the IgM IFA. We suggest that the IgM ELISA may be used as an alternative reference test to the IgM IFA for the serological diagnosis of scrub typhus. PMID:27008880
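The actual analysis is a Bayesian latent class model over several imperfect tests; as a rough, simplified stand-in for the cutoff-selection idea, the sketch below fits a two-component mixture to simulated log-OD values by EM and then picks the cutoff maximizing Youden's J (all numbers invented):

import numpy as np

rng = np.random.default_rng(2)
od = np.r_[rng.lognormal(-1.2, 0.4, 70), rng.lognormal(0.5, 0.3, 65)]  # non-cases, cases
x = np.log(od)
mu = np.array([x.min(), x.max()]); sd = np.array([1.0, 1.0]); w = np.array([0.5, 0.5])
for _ in range(200):                                   # EM for a 2-component Gaussian mixture
    pdf = w*np.exp(-(x[:, None] - mu)**2/(2*sd**2))/(sd*np.sqrt(2*np.pi))
    resp = pdf/pdf.sum(axis=1, keepdims=True)          # E-step: latent class responsibilities
    w = resp.mean(axis=0)                              # M-step
    mu = (resp*x[:, None]).sum(axis=0)/resp.sum(axis=0)
    sd = np.sqrt((resp*(x[:, None] - mu)**2).sum(axis=0)/resp.sum(axis=0))
is_case = resp[:, 1] > 0.5                             # component 1 has the higher mean
cuts = np.linspace(x.min(), x.max(), 200)
J = [(x > c)[is_case].mean() + (x <= c)[~is_case].mean() - 1 for c in cuts]
print("OD cutoff maximizing Youden's J:", round(float(np.exp(cuts[int(np.argmax(J))])), 3))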
Toward a molecular phylogeny for Peromyscus: evidence from mitochondrial cytochrome-b sequences
Bradley, Robert D.; Durish, Nevin D.; Rogers, Duke S.; Miller, Jacqueline R.; Engstrom, Mark D.; Kilpatrick, C. William
2009-01-01
One hundred DNA sequences from the mitochondrial cytochrome-b gene of 44 species of deer mice (Peromyscus, sensu stricto), 1 of Habromys, 1 of Isthmomys, 2 of Megadontomys, and the monotypic genera Neotomodon, Osgoodomys, and Podomys were used to develop a molecular phylogeny for Peromyscus. Phylogenetic analyses (maximum parsimony, maximum likelihood, and Bayesian inference) were conducted to evaluate alternative hypotheses concerning taxonomic arrangements (sensu stricto versus sensu lato) of the genus. In all analyses, monophyletic clades were obtained that corresponded to species groups proposed by previous authors; however, relationships among species groups generally were poorly resolved. The concept of the genus Peromyscus based on molecular data differed significantly from the most current taxonomic arrangement. Maximum-likelihood and Bayesian trees depicted strong support for a clade placing Habromys, Megadontomys, Neotomodon, Osgoodomys, and Podomys within Peromyscus. If Habromys, Megadontomys, Neotomodon, Osgoodomys, and Podomys are regarded as genera, then several species groups within Peromyscus (sensu stricto) should be elevated to generic rank. Isthmomys was associated with the genus Reithrodontomys; in turn this clade was sister to Baiomys, indicating a distant relationship of Isthmomys to Peromyscus. A formal taxonomic revision awaits synthesis of additional sequence data from nuclear markers together with inclusion of available allozymic and karyotypic data. PMID:19924266
SWARM: a scientific workflow for supporting Bayesian approaches to improve metabolic models.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shi, X.; Stevens, R.; Mathematics and Computer Science
2008-01-01
With the exponential growth of complete genome sequences, sequence analysis is becoming a powerful approach to building genome-scale metabolic models. These models can be used to study individual molecular components and their relationships, and eventually to study cells as systems. However, constructing genome-scale metabolic models manually is time-consuming and labor-intensive; as a result, far fewer genome-scale metabolic models are available than the hundreds of sequenced genomes. To tackle this problem, we designed SWARM, a scientific workflow that can be used to improve genome-scale metabolic models in a high-throughput fashion. SWARM deals with a range of issues including the integration of data across distributed resources, data format conversion, data updates, and data provenance. Taken together, SWARM streamlines the whole modeling process: extracting data from various resources, deriving training datasets to train a set of predictors, applying Bayesian techniques to assemble the predictors, inferring on the ensemble of predictors to fill in missing data, and eventually improving draft metabolic networks automatically. By enhancing metabolic model construction, SWARM enables scientists to generate many genome-scale metabolic models within a short period of time and with less effort.
Bayesian mixture analysis for metagenomic community profiling.
Morfopoulou, Sofia; Plagnol, Vincent
2015-09-15
Deep sequencing of clinical samples is now an established tool for the detection of infectious pathogens, with direct medical applications. The large amount of data generated provides an opportunity to detect species even at very low levels, provided that computational tools can effectively profile the relevant metagenomic communities. Data interpretation is complicated by the fact that short sequencing reads can match multiple organisms and by the lack of completeness of existing databases, in particular for viral pathogens. Here we present metaMix, a Bayesian mixture model framework for resolving complex metagenomic mixtures. We show that the use of parallel Markov chain Monte Carlo for the exploration of the species space enables the identification of the set of species most likely to contribute to the mixture. We demonstrate the greater accuracy of metaMix compared with relevant methods, particularly for profiling complex communities consisting of several related species. We designed metaMix specifically for the analysis of deep transcriptome sequencing datasets, with a focus on viral pathogen detection; however, the principles are generally applicable to all types of metagenomic mixtures. metaMix is implemented as a user-friendly R package, freely available on CRAN: http://cran.r-project.org/web/packages/metaMix. Contact: sofia.morfopoulou.10@ucl.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Kokaram, Anil C
2004-03-01
Image sequence restoration has been steadily gaining in importance with the increasing prevalence of visual digital media. The demand for content increases the pressure on archives to automate their restoration activities for preservation of the cultural heritage that they hold. There are many defects that affect archived visual material and one central issue is that of Dirt and Sparkle, or "Blotches." Research in archive restoration has been conducted for more than a decade and this paper places that material in context to highlight the advances made during that time. The paper also presents a new and simpler Bayesian framework that achieves joint processing of noise, missing data, and occlusion.
Meinzer, Caitlyn; Martin, Renee; Suarez, Jose I
2017-09-08
In phase II trials, the most efficacious dose is usually not known. Moreover, given limited resources, it is difficult to robustly identify a dose while also testing for a signal of efficacy that would support a phase III trial. Recent designs have sought to be more efficient by exploring multiple doses through the use of adaptive strategies. However, the added flexibility may potentially increase the risk of making incorrect assumptions and reduce the total amount of information available across the dose range as a function of imbalanced sample size. To balance these challenges, a novel placebo-controlled design is presented in which a restricted Bayesian response adaptive randomization (RAR) is used to allocate a majority of subjects to the optimal dose of active drug, defined as the dose with the lowest probability of poor outcome. However, the allocation between subjects who receive active drug or placebo is held constant to retain the maximum possible power for a hypothesis test of overall efficacy comparing the optimal dose to placebo. The design properties and optimization of the design are presented in the context of a phase II trial for subarachnoid hemorrhage. For a fixed total sample size, a trade-off exists between the ability to select the optimal dose and the probability of rejecting the null hypothesis. This relationship is modified by the allocation ratio between active and control subjects, the choice of RAR algorithm, and the number of subjects allocated to an initial fixed allocation period. While a responsive RAR algorithm improves the ability to select the correct dose, there is an increased risk of assigning more subjects to a worse arm as a function of ephemeral trends in the data. A subarachnoid treatment trial is used to illustrate how this design can be customized for specific objectives and available data. Bayesian adaptive designs are a flexible approach to addressing multiple questions surrounding the optimal dose for treatment efficacy within the context of limited resources. While the design is general enough to apply to many situations, future work is needed to address interim analyses and the incorporation of models for dose response.
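A minimal sketch of the restricted allocation rule described here, assuming Beta-binomial posteriors for the poor-outcome probability at each of four doses and an invented fixed placebo share (real designs also tune the RAR algorithm and the initial fixed-allocation period):

import numpy as np

rng = np.random.default_rng(3)
fail = np.array([12, 9, 7, 10])                 # poor outcomes per dose so far (made up)
n = np.array([30, 30, 30, 30])
draws = rng.beta(1 + fail, 1 + n - fail, size=(10000, 4))        # posterior draws of risk
p_best = np.bincount(draws.argmin(axis=1), minlength=4)/10000.0  # P(dose has lowest risk)
ctrl_frac = 0.35                                # fixed share kept on placebo for power
alloc = (1 - ctrl_frac)*p_best                  # RAR applied over active doses only
print("placebo:", ctrl_frac, "active dose allocation:", alloc.round(3))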
Liu, Fang; Eugenio, Evercita C
2018-04-01
Beta regression is an increasingly popular statistical technique in medical research for modeling outcomes that assume values in (0, 1), such as proportions and patient-reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review of beta regression and zoib regression covering the modeling, inferential, and computational aspects of both the likelihood-based and Bayesian approaches. We demonstrate via simulation studies the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing such values with values close to zero/one; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is in general computationally faster than the MCMC algorithms used in Bayesian inference, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm, especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.
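For concreteness, a sketch of the zoib log-likelihood under the common mean-precision parameterization of the beta density, with point masses p0 and p1 at the boundaries (parameter names are ours; real implementations link each piece to covariates):

import numpy as np
from scipy import stats

def zoib_loglik(y, p0, p1, mu, phi):
    """Zero-one-inflated beta: P(y=0)=p0, P(y=1)=p1, Beta(mu*phi, (1-mu)*phi) on (0,1)."""
    y = np.asarray(y, dtype=float)
    ll = np.log(p0)*np.sum(y == 0) + np.log(p1)*np.sum(y == 1)
    mid = y[(y > 0) & (y < 1)]
    ll += mid.size*np.log1p(-(p0 + p1))              # mass left for the continuous part
    ll += stats.beta.logpdf(mid, mu*phi, (1 - mu)*phi).sum()
    return ll

y = [0.0, 0.2, 0.35, 0.8, 1.0, 0.55]
print(zoib_loglik(y, p0=0.15, p1=0.10, mu=0.5, phi=5.0))

Maximizing this function (or placing priors on p0, p1, mu, phi and sampling) yields the likelihood-based and Bayesian fits the review compares; ad hoc replacement of the 0s and 1s with, e.g., 0.001 and 0.999 changes this likelihood and is what produces the biases reported above.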
Probabilistic inference using linear Gaussian importance sampling for hybrid Bayesian networks
NASA Astrophysics Data System (ADS)
Sun, Wei; Chang, K. C.
2005-05-01
Probabilistic inference for Bayesian networks is in general NP-hard using either exact algorithms or approximate methods. However, for very complex networks, only approximate methods such as stochastic sampling can provide a solution under a given time constraint. There are several simulation methods currently available. They include logic sampling (the first proposed stochastic method for Bayesian networks), the likelihood weighting algorithm (the most commonly used simulation method because of its simplicity and efficiency), the Markov blanket scoring method, and the importance sampling algorithm. In this paper, we first briefly review and compare these available simulation methods, and then propose an improved importance sampling algorithm, called the linear Gaussian importance sampling algorithm for general hybrid models (LGIS). LGIS is aimed at hybrid Bayesian networks consisting of both discrete and continuous random variables with arbitrary distributions. It uses a linear function and Gaussian additive noise to approximate the true conditional probability distribution of a continuous variable given both its parents and evidence in a Bayesian network. One of the most important features of the newly developed method is that it can adaptively learn the optimal importance function from previous samples. We test the inference performance of LGIS using a 16-node linear Gaussian model and a 6-node general hybrid model. The performance comparison with other well-known methods such as junction tree (JT) and likelihood weighting (LW) shows that LGIS is very promising.
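The baseline idea can be shown with plain likelihood weighting on a two-node hybrid toy network (a discrete parent and a linear Gaussian child with Gaussian evidence); this is the simple scheme LGIS improves on by adapting its importance function, and the numbers are invented:

import numpy as np

rng = np.random.default_rng(4)
N = 100000
d = rng.random(N) < 0.3                    # discrete node: P(D=1) = 0.3
x = rng.normal(np.where(d, 2.0, 0.0), 1.0) # continuous child: X | D ~ N(m_D, 1)
y_obs = 2.0                                # evidence: Y | X ~ N(X, 0.5^2), observed
w = np.exp(-0.5*((y_obs - x)/0.5)**2)      # importance weight = likelihood p(y | x)
print("P(D=1 | y=2) ~", np.sum(w*d)/np.sum(w))   # exact value is about 0.68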
Using Bayesian neural networks to classify forest scenes
NASA Astrophysics Data System (ADS)
Vehtari, Aki; Heikkonen, Jukka; Lampinen, Jouko; Juujarvi, Jouni
1998-10-01
We present results that compare the performance of Bayesian learning methods for neural networks on the task of classifying forest scenes into trees and background. The classification task is demanding due to the texture richness of the trees, occlusions of the forest scene objects, and diverse lighting conditions under operation. This makes it difficult to determine which image features are optimal for the classification. A natural way to proceed is to extract many different types of potentially suitable features and to evaluate their usefulness in later processing stages. One approach to cope with a large number of features is to use Bayesian methods to control the model complexity. Bayesian learning places a prior on the model parameters, combines it with the evidence from the training data, and then integrates over the resulting posterior to make predictions. With this method, we can use large networks and many features without fear of overfitting. For this classification task we compare two Bayesian learning methods for multi-layer perceptron (MLP) neural networks: (1) the evidence framework of MacKay, which uses a Gaussian approximation to the posterior weight distribution and maximizes the evidence with respect to the hyperparameters; and (2) the Markov Chain Monte Carlo (MCMC) method due to Neal, in which the posterior distribution of the network parameters is numerically integrated using MCMC. As baseline classifiers for comparison we use (3) an MLP early-stopping committee, (4) K-nearest-neighbor and (5) Classification And Regression Trees.
Bayesian-MCMC-based parameter estimation of stealth aircraft RCS models
NASA Astrophysics Data System (ADS)
Xia, Wei; Dai, Xiao-Xia; Feng, Yuan
2015-12-01
When modeling a stealth aircraft with low RCS (Radar Cross Section), conventional parameter estimation methods may cause a deviation from the actual distribution, owing to the fact that the characteristic parameters are estimated by directly calculating the statistics of the RCS. The Bayesian-Markov Chain Monte Carlo (Bayesian-MCMC) method is introduced herein to estimate the parameters so as to improve the fitting accuracy of the fluctuation models. The parameter estimations of the lognormal and the Legendre polynomial models are reformulated in the Bayesian framework. The MCMC algorithm is then adopted to calculate the parameter estimates. Numerical results show that the distribution curves obtained by the proposed method exhibit improved consistency with the actual ones, compared with those fitted by the conventional method. The fitting accuracy could be improved by no less than 25% for both fluctuation models, which implies that the Bayesian-MCMC method might be a good candidate among optimal parameter estimation methods for stealth aircraft RCS models. Project supported by the National Natural Science Foundation of China (Grant No. 61101173), the National Basic Research Program of China (Grant No. 613206), the National High Technology Research and Development Program of China (Grant No. 2012AA01A308), the State Scholarship Fund of the China Scholarship Council (CSC), the Oversea Academic Training Funds, and the University of Electronic Science and Technology of China (UESTC).
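A minimal sketch of the Bayesian-MCMC idea for one of the two models: random-walk Metropolis over the lognormal parameters given simulated RCS draws (priors, step sizes and data are invented; the paper's models and diagnostics are richer):

import numpy as np

rng = np.random.default_rng(5)
rcs = rng.lognormal(mean=-2.0, sigma=0.8, size=300)   # simulated low-RCS samples [m^2]

def logpost(mu, sig):                      # lognormal likelihood, weak (flat) priors
    if sig <= 0:
        return -np.inf
    z = (np.log(rcs) - mu)/sig
    return -rcs.size*np.log(sig) - 0.5*np.sum(z**2)

th = np.array([0.0, 1.0])
lp = logpost(*th)
samp = []
for _ in range(30000):
    prop = th + rng.normal(0, 0.05, 2)
    lp_p = logpost(*prop)
    if np.log(rng.random()) < lp_p - lp:
        th, lp = prop, lp_p
    samp.append(th.copy())
print(np.mean(samp[10000:], axis=0))       # posterior mean near the true (-2.0, 0.8)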
An assessment of Gallistel's (2012) rationalistic account of extinction phenomena.
Miller, Ralph R
2012-05-01
Gallistel (2012) asserts that animals use rationalistic reasoning (i.e., information theory and Bayesian inference) to make decisions that underlie select extinction phenomena. Rational processes are presumed to lead to evolutionarily optimal behavior. Thus, Gallistel's model is a type of optimality theory. But optimality theory is only a theory, a theory about an ideal organism, and its predictions frequently deviate appreciably from observed behavior of animals in the laboratory and the real world. That is, behavior of animals is often far from optimal, as is evident in many behavioral phenomena. Hence, appeals to optimality theory to explain, rather than illuminate, actual behavior are misguided. Copyright © 2012 Elsevier B.V. All rights reserved.
2014-01-01
Affinity capture of DNA methylation combined with high-throughput sequencing strikes a good balance between the high cost of whole genome bisulfite sequencing and the low coverage of methylation arrays. We present BayMeth, an empirical Bayes approach that uses a fully methylated control sample to transform observed read counts into regional methylation levels. In our model, inefficient capture can readily be distinguished from low methylation levels. BayMeth improves on existing methods, allows explicit modeling of copy number variation, and offers computationally efficient analytical mean and variance estimators. BayMeth is available in the Repitools Bioconductor package. PMID:24517713
Local backbone structure prediction of proteins
De Brevern, Alexandre G.; Benros, Cristina; Gautier, Romain; Valadié, Hélène; Hazout, Serge; Etchebest, Catherine
2004-01-01
A statistical analysis of the PDB structures has led us to define a new set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each defined by the (φ, ψ) dihedral angles of 5 consecutive residues. The amino acid distributions observed in sequence windows encompassing these PBs are used to predict, by a Bayesian approach, the local 3D structure of proteins from the sole knowledge of their sequences. LocPred is a software tool that allows users to submit a protein sequence and performs a prediction in terms of PBs. The prediction results are given both textually and graphically. PMID:15724288
Online Variational Bayesian Filtering-Based Mobile Target Tracking in Wireless Sensor Networks
Zhou, Bingpeng; Chen, Qingchun; Li, Tiffany Jing; Xiao, Pei
2014-01-01
The received signal strength (RSS)-based online tracking of a mobile node in wireless sensor networks (WSNs) is investigated in this paper. Firstly, a multi-layer dynamic Bayesian network (MDBN) is introduced to characterize the target mobility with either directional or undirected movement. In particular, it is proposed to employ the Wishart distribution to approximate the randomness of the time-varying RSS measurement precision due to the target movement. It is shown that the proposed MDBN offers a more general analysis model by incorporating the underlying statistical information of both the target movement and the observations, which can be utilized to improve the online tracking capability by exploiting Bayesian statistics. Secondly, based on the MDBN model, a mean-field variational Bayesian filtering (VBF) algorithm is developed to realize the online tracking of a mobile target in the presence of nonlinear observations and time-varying RSS precision, where the traditional Bayesian filtering scheme cannot be directly employed. Thirdly, a joint optimization between the real-time velocity and its prior expectation is proposed to enable online velocity tracking in the proposed online tracking scheme. Finally, the associated Bayesian Cramer–Rao Lower Bound (BCRLB) analysis and numerical simulations are conducted. Our analysis unveils that, by exploiting the potential state information via the general MDBN model, the proposed VBF algorithm provides a promising solution to the online tracking of a mobile node in WSNs. In addition, it is shown that the final tracking accuracy linearly scales with its expectation when the RSS measurement precision is time-varying. PMID:25393784
Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel
2012-09-25
Most Bayesian models for the analysis of complex traits are not analytically tractable, and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that arise from serial computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Markov chain Monte Carlo algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point, including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, and some variants are discussed as well. Features and strategies of parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models; this not only leads to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.
Zhao, Wei; Cella, Massimo; Della Pasqua, Oscar; Burger, David; Jacqz-Aigrain, Evelyne
2012-04-01
Abacavir is used to treat HIV infection in both adults and children. The recommended paediatric dose is 8 mg kg(-1) twice daily up to a maximum of 300 mg twice daily. Weight was identified as the central covariate influencing the pharmacokinetics of abacavir in children. A population pharmacokinetic model was developed to describe both once and twice daily pharmacokinetic profiles of abacavir in infants and toddlers. The standard dosage regimen is associated with large interindividual variability in abacavir concentrations. A maximum a posteriori probability Bayesian estimator of AUC(0–t) based on three time points (0, 1 or 2, and 3 h) is proposed to support AUC-targeted individualized therapy in infants and toddlers. The aims were to develop a population pharmacokinetic model for abacavir in HIV-infected infants and toddlers that describes both once and twice daily pharmacokinetic profiles, to identify covariates that explain variability, and to propose optimal time points for optimizing the AUC-targeted dosage and individualizing therapy. The pharmacokinetics of abacavir was described from the plasma concentrations of 23 patients using nonlinear mixed-effects modelling (NONMEM) software. A two-compartment model with first-order absorption and elimination was developed. The final model was validated using bootstrap, visual predictive check and normalized prediction distribution errors. The Bayesian estimator was validated using the cross-validation and simulation-estimation methods. The typical population pharmacokinetic parameters and relative standard errors (RSE) were: apparent systemic clearance (CL) 13.4 l h−1 (RSE 6.3%), apparent central volume of distribution 4.94 l (RSE 28.7%), apparent peripheral volume of distribution 8.12 l (RSE 14.2%), apparent intercompartmental clearance 1.25 l h−1 (RSE 16.9%) and absorption rate constant 0.758 h−1 (RSE 5.8%). The covariate analysis identified weight as the individual factor influencing the apparent oral clearance: CL = 13.4 × (weight/12)^1.14. The maximum a posteriori probability Bayesian estimator, based on three concentrations measured at 0, 1 or 2, and 3 h after drug intake, allowed prediction of individual AUC(0–t). The population pharmacokinetic model developed for abacavir in HIV-infected infants and toddlers accurately described both once and twice daily pharmacokinetic profiles. The maximum a posteriori probability Bayesian estimator of AUC(0–t) was developed from the final model and can be used routinely to optimize individual dosing. © 2011 The Authors. British Journal of Clinical Pharmacology © 2011 The British Pharmacological Society.
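A simplified sketch of a MAP Bayesian estimator of this kind, using a one-compartment oral model in place of the paper's two-compartment model; the population values are loosely based on the typicals quoted above, while the variabilities and the three concentration measurements are invented:

import numpy as np
from scipy.optimize import minimize

dose, F = 96.0, 1.0                        # hypothetical 8 mg/kg dose for a 12 kg child [mg]
pop = dict(CL=13.4, V=13.0, ka=0.76)       # typical values (volumes lumped, simplification)
omega = dict(CL=0.3, V=0.4, ka=0.3)        # assumed between-subject SDs (log scale)
t_obs = np.array([1.0, 2.0, 3.0])          # sparse sampling times [h]
c_obs = np.array([2.4, 1.8, 1.2])          # made-up concentrations [mg/l]
sig = 0.15                                 # assumed residual error (log scale)

def conc(t, CL, V, ka):                    # one-compartment oral model
    ke = CL/V
    return F*dose*ka/(V*(ka - ke))*(np.exp(-ke*t) - np.exp(-ka*t))

def neg_logpost(eta):                      # eta = individual log-deviations from pop values
    CL, V, ka = (pop[k]*np.exp(e) for k, e in zip(("CL", "V", "ka"), eta))
    ll = np.sum((np.log(c_obs) - np.log(conc(t_obs, CL, V, ka)))**2)/(2*sig**2)
    prior = sum(e**2/(2*omega[k]**2) for k, e in zip(("CL", "V", "ka"), eta))
    return ll + prior

eta = minimize(neg_logpost, np.zeros(3)).x
CL_i = pop["CL"]*np.exp(eta[0])
print("MAP CL:", round(CL_i, 2), "l/h, AUC(0-inf) ~", round(F*dose/CL_i, 2), "mg h/l")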
Optimal visuotactile integration for velocity discrimination of self-hand movements
Chancel, M.; Blanchard, C.; Guerraz, M.; Montagnini, A.
2016-01-01
Illusory hand movements can be elicited by a textured disk or a visual pattern rotating under one's hand, while proprioceptive inputs convey immobility information (Blanchard C, Roll R, Roll JP, Kavounoudias A. PLoS One 8: e62475, 2013). Here, we investigated whether visuotactile integration can optimize velocity discrimination of illusory hand movements in line with Bayesian predictions. We induced illusory movements in 15 volunteers by visual and/or tactile stimulation delivered at six angular velocities. Participants had to compare hand illusion velocities with a 5°/s hand reference movement in an alternative forced choice paradigm. Results showed that the discrimination threshold decreased in the visuotactile condition compared with the unimodal (visual or tactile) conditions, reflecting better bimodal discrimination. The perceptual strength (gain) of the illusions also increased: the stimulation required to give rise to a 5°/s illusory movement was slower in the visuotactile condition than in either of the two unimodal conditions. The maximum likelihood estimation model satisfactorily predicted the improved discrimination threshold but not the increase in gain. When we added a zero-centered prior, reflecting immobility information, the Bayesian model did predict the gain increase but systematically overestimated it. Interestingly, the predicted gains better fitted the visuotactile performance when proprioceptive noise was generated by covibrating antagonist wrist muscles. These findings show that kinesthetic information of visual and tactile origins is optimally integrated to improve velocity discrimination of self-hand movements. However, a Bayesian model alone could not fully describe the illusory phenomenon, pointing to the crucial importance of the omnipresent muscle proprioceptive cues relative to other sensory cues for kinesthesia. PMID:27385802
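The benchmark here is the standard maximum-likelihood (reliability-weighted) cue-combination rule; writing \hat{v}_V, \hat{v}_T for the unimodal velocity estimates and \sigma_V^2, \sigma_T^2 for their variances (symbols ours, chosen for illustration), the model predicts

\[
\hat{v}_{VT} = \frac{\sigma_T^2\,\hat{v}_V + \sigma_V^2\,\hat{v}_T}{\sigma_V^2 + \sigma_T^2},
\qquad
\sigma_{VT}^2 = \frac{\sigma_V^2\,\sigma_T^2}{\sigma_V^2 + \sigma_T^2} < \min(\sigma_V^2, \sigma_T^2),
\]

which accounts for the improved bimodal discrimination threshold. Adding the zero-centered (immobility) prior with variance \sigma_p^2 shrinks the combined estimate,

\[
\hat{v}_{\mathrm{post}} = \frac{\sigma_p^2}{\sigma_p^2 + \sigma_{VT}^2}\,\hat{v}_{VT},
\]

so a stronger stimulus is needed to reach a criterion perceived velocity; this shrinkage is the mechanism by which the authors' Bayesian model produces a gain change.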
Variational Gaussian approximation for Poisson data
NASA Astrophysics Data System (ADS)
Arridge, Simon R.; Ito, Kazufumi; Jin, Bangti; Zhang, Chen
2018-02-01
The Poisson model is frequently employed to describe count data, but in a Bayesian context it leads to an analytically intractable posterior probability distribution. In this work, we analyze a variational Gaussian approximation to the posterior distribution arising from the Poisson model with a Gaussian prior. This is achieved by seeking an optimal Gaussian distribution minimizing the Kullback-Leibler divergence from the posterior distribution to the approximation, or equivalently maximizing the lower bound for the model evidence. We derive an explicit expression for the lower bound, and show the existence and uniqueness of the optimal Gaussian approximation. The lower bound functional can be viewed as a variant of classical Tikhonov regularization that penalizes also the covariance. Then we develop an efficient alternating direction maximization algorithm for solving the optimization problem, and analyze its convergence. We discuss strategies for reducing the computational complexity via low rank structure of the forward operator and the sparsity of the covariance. Further, as an application of the lower bound, we discuss hierarchical Bayesian modeling for selecting the hyperparameter in the prior distribution, and propose a monotonically convergent algorithm for determining the hyperparameter. We present extensive numerical experiments to illustrate the Gaussian approximation and the algorithms.
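To make the bound concrete in one tractable case (assuming, for illustration, a log-link intensity \lambda_i = \exp((Ax)_i) and prior \mathcal{N}(x_0, \Gamma); the paper's exact likelihood may be parameterized differently), the lower bound over Gaussians q = \mathcal{N}(m, C) reads

\[
F(m, C) = \sum_i \Big[ y_i (Am)_i - \exp\big((Am)_i + \tfrac{1}{2}(A C A^{\top})_{ii}\big) - \log y_i! \Big]
- \mathrm{KL}\big(\mathcal{N}(m, C)\,\|\,\mathcal{N}(x_0, \Gamma)\big),
\]

with the Gaussian Kullback-Leibler term

\[
\mathrm{KL} = \tfrac{1}{2}\Big[\operatorname{tr}(\Gamma^{-1}C) + (m - x_0)^{\top}\Gamma^{-1}(m - x_0) - n + \log\tfrac{\det \Gamma}{\det C}\Big],
\]

using E[e^{a^{\top}x}] = e^{a^{\top}m + a^{\top}Ca/2} for x \sim \mathcal{N}(m, C). The quadratic penalty on m and the trace/log-determinant penalty on C are what give the bound its Tikhonov-like flavor, penalizing the covariance as well as the mean.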
Moran, Rosalyn J; Symmonds, Mkael; Dolan, Raymond J; Friston, Karl J
2014-01-01
The aging brain shows a progressive loss of neuropil, which is accompanied by subtle changes in neuronal plasticity, sensory learning and memory. Neurophysiologically, aging attenuates evoked responses, including the mismatch negativity (MMN). This is accompanied by a shift in cortical responsivity from sensory (posterior) regions to executive (anterior) regions, which has been interpreted as a compensatory response for cognitive decline. Theoretical neurobiology offers a simpler explanation for all of these effects: from a Bayesian perspective, as the brain is progressively optimized to model its world, its complexity will decrease. A corollary of this complexity reduction is an attenuation of Bayesian updating or sensory learning. Here we confirmed this hypothesis using magnetoencephalographic recordings of the mismatch negativity elicited in a large cohort of human subjects, in their third to ninth decade. Employing dynamic causal modeling to assay the synaptic mechanisms underlying these non-invasive recordings, we found a selective age-related attenuation of the synaptic connectivity changes that underpin rapid sensory learning. In contrast, baseline synaptic connectivity strengths were consistently strong over the decades. Our findings suggest that the lifetime accrual of sensory experience optimizes functional brain architectures to enable efficient and generalizable predictions of the world.
The relationships of the Euparkeriidae and the rise of Archosauria
NASA Astrophysics Data System (ADS)
Sookias, Roland B.
2016-03-01
For the first time, a phylogenetic analysis including all putative euparkeriid taxa is conducted, using a large data matrix analysed with maximum parsimony and Bayesian analysis. Using parsimony, the putative euparkeriid Dorosuchus neoetus from Russia is the sister taxon to Archosauria + Phytosauria. Euparkeria capensis is placed one node further from the crown, and forms a euparkeriid clade with the Chinese taxa Halazhaisuchus qiaoensis and `Turfanosuchus shageduensis' and the Polish taxon Osmolskina czatkowicensis. Using Bayesian methods, Osmolskina and Halazhaisuchus are sister taxa within Euparkeriidae, in turn sister to `Turfanosuchus shageduensis' and then Euparkeria capensis. Dorosuchus is placed in a polytomy with Euparkeriidae and Archosauria + Phytosauria. Although conclusions remain tentative owing to low node support and incompleteness, a broad phylogenetic position close to the base of Archosauria is confirmed for all putative euparkeriids, and the ancestor of Archosauria + Phytosauria is optimized as similar to euparkeriids in its morphology. Ecomorphological characters and traits are optimized onto the maximum parsimony strict consensus phylogeny presented here, using squared-change parsimony. This optimization indicates that the ancestral archosaur was probably similar in many respects to euparkeriids, being relatively small, terrestrial, carnivorous and showing relatively cursorial limb morphology; this Bauplan may have underlain the exceptional radiation and success of crown Archosauria.
Bayesian state space models for dynamic genetic network construction across multiple tissues.
Liang, Yulan; Kelemen, Arpad
2016-08-01
Construction of gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research for complex diseases, and estimating the dynamic changes of temporal correlations and non-stationarity is key to this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge, inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix as time-variant and include temporal correlation structures in the covariance matrix estimation of the multivariate Bayesian state space models. The unevenly spaced short time courses with unseen time points are treated as hidden state variables. Hierarchical Bayesian approaches with various prior and hyper-prior models, using Markov chain Monte Carlo and Gibbs sampling algorithms, are used to estimate the model parameters and the hidden state variables. We apply the proposed hierarchical Bayesian state space models to multi-tissue (liver, skeletal muscle, and kidney) Affymetrix time course datasets following corticosteroid (CS) drug administration. Both simulation and real data analysis results show that the genomic changes over time and the gene-gene interactions in response to CS treatment can be well captured by the proposed models. The proposed dynamic hierarchical Bayesian state space modeling approach could be expanded and applied to other large-scale genomic data, such as next-generation sequencing (NGS) data combined with real-time, time-varying electronic health records (EHR), for more comprehensive and robust systematic and network-based analysis, in order to transform big biomedical data into predictions and diagnostics for precision medicine and personalized healthcare with better decision making and patient outcomes.
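The hierarchical MCMC machinery above builds on the basic forward recursion for a state space model with time-variant matrices; a minimal Kalman filter sketch of that recursion (all matrices and data invented, and none of the hierarchical priors included):

import numpy as np

def kalman_tv(y, F_t, H_t, Q, R, m0, P0):
    """Forward filter for x_t = F_t x_{t-1} + w_t, y_t = H_t x_t + v_t."""
    m, P, means = m0, P0, []
    for t, y_obs in enumerate(y):
        m, P = F_t[t] @ m, F_t[t] @ P @ F_t[t].T + Q       # predict
        S = H_t[t] @ P @ H_t[t].T + R
        K = P @ H_t[t].T @ np.linalg.inv(S)                # Kalman gain
        m = m + K @ (y_obs - H_t[t] @ m)                   # update with observation
        P = (np.eye(m.size) - K @ H_t[t]) @ P
        means.append(m.copy())
    return np.array(means)

T, n = 20, 2
rng = np.random.default_rng(6)
F_t = [np.eye(n)*(0.9 + 0.01*t) for t in range(T)]         # hypothetical time-variant dynamics
H_t = [np.eye(n) for _ in range(T)]
y = rng.standard_normal((T, n))
m_filt = kalman_tv(y, F_t, H_t, 0.1*np.eye(n), 0.5*np.eye(n), np.zeros(n), np.eye(n))
print(m_filt[-1])

In the paper's setting, F_t, H_t and the covariances are themselves given priors and sampled by MCMC, and the unseen time points are carried along as hidden states rather than fixed inputs.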
Confidence as Bayesian Probability: From Neural Origins to Behavior.
Meyniel, Florent; Sigman, Mariano; Mainen, Zachary F
2015-10-07
Research on confidence spreads across several sub-fields of psychology and neuroscience. Here, we explore how a definition of confidence as Bayesian probability can unify these viewpoints. This computational view entails that there are distinct forms in which confidence is represented and used in the brain, including distributional confidence, pertaining to neural representations of probability distributions, and summary confidence, pertaining to scalar summaries of those distributions. Summary confidence is, normatively, derived or "read out" from distributional confidence. Neural implementations of readout will trade off optimality versus flexibility of routing across brain systems, allowing confidence to serve diverse cognitive functions. Copyright © 2015 Elsevier Inc. All rights reserved.
Bayesian identification of acoustic impedance in treated ducts.
Buot de l'Épine, Y; Chazot, J-D; Ville, J-M
2015-07-01
The noise reduction of a liner placed in the nacelle of a turbofan engine is still difficult to predict due to the lack of knowledge of its acoustic impedance that depends on grazing flow profile, mode order, and sound pressure level. An eduction method, based on a Bayesian approach, is presented here to adjust an impedance model of the liner from sound pressures measured in a rectangular treated duct under multimodal propagation and flow. The cost function is regularized with prior information provided by Guess's [J. Sound Vib. 40, 119-137 (1975)] impedance of a perforated plate. The multi-parameter optimization is achieved with an Evolutionary-Markov-Chain-Monte-Carlo algorithm.
IMAGINE: Interstellar MAGnetic field INference Engine
NASA Astrophysics Data System (ADS)
Steininger, Theo
2018-03-01
IMAGINE (Interstellar MAGnetic field INference Engine) performs inference on generic parametric models of the Galaxy. The modular open source framework uses highly optimized tools and technology such as the MultiNest sampler (ascl:1109.006) and the information field theory framework NIFTy (ascl:1302.013) to create an instance of the Milky Way based on a set of parameters for physical observables, using Bayesian statistics to judge the mismatch between measured data and model prediction. The flexibility of the IMAGINE framework allows for simple refitting for newly available data sets and makes state-of-the-art Bayesian methods easily accessible particularly for random components of the Galactic magnetic field.
A Bayesian Account of Visual-Vestibular Interactions in the Rod-and-Frame Task.
Alberts, Bart B G T; de Brouwer, Anouk J; Selen, Luc P J; Medendorp, W Pieter
2016-01-01
Panoramic visual cues, as generated by the objects in the environment, provide the brain with important information about gravity direction. To derive an optimal, i.e., Bayesian, estimate of gravity direction, the brain must combine panoramic information with gravity information detected by the vestibular system. Here, we examined the individual sensory contributions to this estimate psychometrically. We asked human subjects to judge the orientation (clockwise or counterclockwise relative to gravity) of a briefly flashed luminous rod, presented within an oriented square frame (rod-in-frame). Vestibular contributions were manipulated by tilting the subject's head, whereas visual contributions were manipulated by changing the viewing distance of the rod and frame. Results show a cyclical modulation of the frame-induced bias in perceived verticality across a 90° range of frame orientations. The magnitude of this bias decreased significantly with larger viewing distance, as if visual reliability was reduced. Biases increased significantly when the head was tilted, as if vestibular reliability was reduced. A Bayesian optimal integration model, with distinct vertical and horizontal panoramic weights, a gain factor to allow for visual reliability changes, and ocular counterroll in response to head tilt, provided a good fit to the data. We conclude that subjects flexibly weigh visual panoramic and vestibular information based on their orientation-dependent reliability, resulting in the observed verticality biases and the associated response variabilities.
Ma, Wei Ji; Zhou, Xiang; Ross, Lars A; Foxe, John J; Parra, Lucas C
2009-01-01
Watching a speaker's facial movements can dramatically enhance our ability to comprehend words, especially in noisy environments. From a general doctrine of combining information from different sensory modalities (the principle of inverse effectiveness), one would expect that the visual signals would be most effective at the highest levels of auditory noise. In contrast, we find, in accord with a recent paper, that visual information improves performance more at intermediate levels of auditory noise than at the highest levels, and we show that a novel visual stimulus containing only temporal information does the same. We present a Bayesian model of optimal cue integration that can explain these conflicts. In this model, words are regarded as points in a multidimensional space and word recognition is a probabilistic inference process. When the dimensionality of the feature space is low, the Bayesian model predicts inverse effectiveness; when the dimensionality is high, the enhancement is maximal at intermediate auditory noise levels. When the auditory and visual stimuli differ slightly in high noise, the model makes a counterintuitive prediction: as sound quality increases, the proportion of reported words corresponding to the visual stimulus should first increase and then decrease. We confirm this prediction in a behavioral experiment. We conclude that auditory-visual speech perception obeys the same notion of optimality previously observed only for simple multisensory stimuli.
NASA Astrophysics Data System (ADS)
Sun, Weiwei; Ma, Jun; Yang, Gang; Du, Bo; Zhang, Liangpei
2017-06-01
A new Bayesian method named Poisson Nonnegative Matrix Factorization with Parameter Subspace Clustering Constraint (PNMF-PSCC) has been presented to extract endmembers from Hyperspectral Imagery (HSI). First, the method integrates the linear spectral mixture model with the Bayesian framework and formulates endmember extraction as a Bayesian inference problem. Second, the Parameter Subspace Clustering Constraint (PSCC) is incorporated into the statistical program to consider the clustering of all pixels in the parameter subspace. The PSCC enlarges differences among ground objects and helps find endmembers with smaller spectrum divergences. Meanwhile, the PNMF-PSCC method utilizes the Poisson distribution as the prior knowledge of spectral signals to better explain the quantum nature of light in the imaging spectrometer. Third, the optimization problem of PNMF-PSCC is formulated as maximizing the joint density via the Maximum A Posteriori (MAP) estimator. The program is finally solved by iteratively optimizing two sub-problems via the Alternating Direction Method of Multipliers (ADMM) framework and the FURTHESTSUM initialization scheme. Five state-of-the-art methods are implemented for comparison with the performance of PNMF-PSCC on both synthetic and real HSI datasets. Experimental results show that PNMF-PSCC outperforms all five methods in Spectral Angle Distance (SAD) and Root-Mean-Square Error (RMSE), and in particular it identifies good endmembers for ground objects with smaller spectrum divergences.
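PNMF-PSCC itself is solved by ADMM with the clustering constraint; as a baseline illustration of the Poisson NMF ingredient alone, the classical Lee-Seung multiplicative updates for the KL/Poisson objective look like this (synthetic stand-in data, no PSCC term):

import numpy as np

rng = np.random.default_rng(7)
X = rng.poisson(5, size=(100, 40)).astype(float)   # stand-in for an HSI pixel matrix
k = 4
W = rng.random((100, k)) + 0.1
H = rng.random((k, 40)) + 0.1
for _ in range(500):
    WH = W @ H
    W *= ((X/WH) @ H.T) / H.sum(axis=1)            # multiplicative update for W
    WH = W @ H
    H *= (W.T @ (X/WH)) / W.sum(axis=0)[:, None]   # multiplicative update for H
WH = W @ H
mask = X > 0
kl = (X[mask]*np.log(X[mask]/WH[mask])).sum() - X.sum() + WH.sum()
print("KL/Poisson divergence after fitting:", round(float(kl), 2))

These updates monotonically decrease the KL divergence, which is equivalent to maximizing the Poisson likelihood of X given WH; the paper's contribution is the MAP formulation that adds the Poisson prior view and the subspace clustering penalty on top of this basic factorization.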
Evolution of the cerebellum as a neuronal machine for Bayesian state estimation
NASA Astrophysics Data System (ADS)
Paulin, M. G.
2005-09-01
The cerebellum evolved in association with the electric sense and vestibular sense of the earliest vertebrates. Accurate information provided by these sensory systems would have been essential for precise control of orienting behavior in predation. A simple model shows that individual spikes in electrosensory primary afferent neurons can be interpreted as measurements of prey location. Using this result, I construct a computational neural model in which the spatial distribution of spikes in a secondary electrosensory map forms a Monte Carlo approximation to the Bayesian posterior distribution of prey locations given the sense data. The neural circuit that emerges naturally to perform this task resembles the cerebellar-like hindbrain electrosensory filtering circuitry of sharks and other electrosensory vertebrates. The optimal filtering mechanism can be extended to handle dynamical targets observed from a dynamical platform; that is, to construct an optimal dynamical state estimator using spiking neurons. This may provide a generic model of cerebellar computation. Vertebrate motion-sensing neurons have specific fractional-order dynamical characteristics that allow Bayesian state estimators to be implemented elegantly and efficiently, using simple operations with asynchronous pulses, i.e. spikes. The computational neural models described in this paper represent a novel kind of particle filter, using spikes as particles. The models are specific and make testable predictions about computational mechanisms in cerebellar circuitry, while providing a plausible explanation of cerebellar contributions to aspects of motor control, perception and cognition.
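The spikes-as-particles proposal can be related to a standard bootstrap particle filter; the sketch below is the generic Monte Carlo state estimator (with real-valued particles rather than spikes) for a 1-D random-walk target, invented numbers throughout:

import numpy as np

rng = np.random.default_rng(8)
T, N = 50, 2000
true_x = np.cumsum(rng.normal(0, 0.5, T))      # target trajectory (random walk)
obs = true_x + rng.normal(0, 1.0, T)           # noisy sensory measurements
parts = rng.normal(0, 1.0, N)                  # particle cloud ~ prior
est = []
for z in obs:
    parts = parts + rng.normal(0, 0.5, N)      # propagate through the motion model
    w = np.exp(-0.5*(z - parts)**2)            # weight by observation likelihood
    w /= w.sum()
    parts = rng.choice(parts, size=N, p=w)     # resample: cloud ~ posterior
    est.append(parts.mean())
print("mean squared tracking error:", round(float(np.mean((np.array(est) - true_x)**2)), 3))

In the model described above, the "particles" are spikes in the secondary sensory map, so the weighting and resampling steps are implemented by neural dynamics rather than explicit arithmetic.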
Molecular epidemiology of Powassan virus in North America.
Pesko, Kendra N; Torres-Perez, Fernando; Hjelle, Brian L; Ebel, Gregory D
2010-11-01
Powassan virus (POW) is a tick-borne flavivirus distributed in Canada, the northern USA and the Primorsky region of Russia. POW is the only tick-borne flavivirus endemic to the western hemisphere, where it is transmitted mainly between Ixodes cookei and groundhogs (Marmota monax). Deer tick virus (DTV), a genotype of POW that has been frequently isolated from deer ticks (Ixodes scapularis), appears to be maintained in an enzootic cycle between these ticks and white-footed mice (Peromyscus leucopus). DTV has been isolated from ticks in several regions of North America, including the upper Midwest and the eastern seaboard. The incidence of human disease due to POW is apparently increasing. Previous analyses of tick-borne flaviviruses endemic to North America have been limited to relatively short genome fragments. We therefore assessed the evolutionary dynamics of POW using newly generated complete and partial genome sequences. Maximum-likelihood and Bayesian phylogenetic inferences showed two well-supported, reciprocally monophyletic lineages corresponding to POW and DTV. Bayesian skyline plots based on year-of-sampling data indicated no significant population size change for either virus lineage. Statistical model-based selection analyses showed evidence of purifying selection in both lineages. Positive selection was detected in NS-5 sequences for both lineages and in envelope sequences for POW. Our findings confirm that POW and DTV sequences are relatively stable over time, which suggests strong evolutionary constraint, and support field observations that suggest that tick-borne flavivirus populations are extremely stable in enzootic foci.
Bulashevska, Alla; Eils, Roland
2006-06-14
The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location of a given protein when only its amino acid sequence is known. Although many efforts have been made to predict subcellular location from sequence information only, further research is needed to improve prediction accuracy. A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six different datasets, among them a Gram-negative bacteria dataset, data for discriminating outer membrane proteins, and an apoptosis proteins dataset. We observed that our method can predict subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the prediction accuracy of classes with few training sequences and is therefore useful for datasets with an imbalanced distribution of classes. This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and, as empirical results indicate, competitive with previously reported approaches in terms of prediction accuracy. The code for the software is available upon request.
Modeling human decision making behavior in supervisory control
NASA Technical Reports Server (NTRS)
Tulga, M. K.; Sheridan, T. B.
1977-01-01
An optimal decision control model was developed, based primarily on a dynamic programming algorithm that considers all available task possibilities, charts an optimal trajectory, commits to the first step (i.e., follows the optimal trajectory during the next time period), and then iterates the calculation. A Bayesian estimator was included that estimates the tasks which might occur in the immediate future and provides this information to the dynamic programming routine. Preliminary trials comparing the human subjects' performance to that of the optimal model show great similarity, but indicate that the human skips certain movements that require a quick change in strategy.
NASA Astrophysics Data System (ADS)
Renes, Joseph M.
2017-10-01
We extend the recent bounds of Sason and Verdú relating Rényi entropy and Bayesian hypothesis testing (arXiv:1701.01974) to the quantum domain and show that they have a number of different applications. First, we obtain a sharper bound relating the optimal probability of correctly distinguishing elements of an ensemble of states to that of the pretty good measurement, and an analogous bound for optimal and pretty good entanglement recovery. Second, we obtain bounds relating optimal guessing and entanglement recovery to the fidelity of the state with a product state, which then leads to tight tripartite uncertainty and monogamy relations.
Bayesian Inference of High-Dimensional Dynamical Ocean Models
NASA Astrophysics Data System (ADS)
Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.
2015-12-01
This presentation addresses a holistic set of challenges in high-dimensional Bayesian nonlinear ocean estimation: (i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); (ii) assimilate data using Bayes' law with these pdfs; (iii) predict the future data that optimally reduce uncertainties; and (iv) rank the known model formulations and learn new ones. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.
Miao, Minmin; Zeng, Hong; Wang, Aimin; Zhao, Changsen; Liu, Feixiang
2017-02-15
Common spatial pattern (CSP) is the most widely used method in motor imagery based brain-computer interface (BCI) systems. In the conventional CSP algorithm, pairs of eigenvectors corresponding to both extreme eigenvalues are selected to construct the optimal spatial filter. In addition, an appropriate selection of subject-specific time segments and frequency bands plays an important role in its successful application. This study proposes to optimize spatial-frequency-temporal patterns for discriminative feature extraction. Spatial optimization is implemented by channel selection and by finding discriminative spatial filters adaptively on each time-frequency segment. A novel Discernibility of Feature Sets (DFS) criterion is designed for spatial filter optimization. In addition, discriminative features located in multiple time-frequency segments are selected automatically by the proposed sparse time-frequency segment common spatial pattern (STFSCSP) method, which exploits sparse regression for significant feature selection. Finally, a weight determined by the sparse coefficient is assigned to each selected CSP feature and we propose a Weighted Naïve Bayesian Classifier (WNBC) for classification. Experimental results on two public EEG datasets demonstrate that optimizing spatial-frequency-temporal patterns in a data-driven manner for discriminative feature extraction greatly improves classification performance. The proposed method gives significantly better classification accuracies in comparison with several competing methods in the literature. The proposed approach is a promising candidate for future BCI systems. Copyright © 2016 Elsevier B.V. All rights reserved.
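A compact way to see the conventional CSP step this method builds on: spatial filters come from a generalized eigendecomposition of the two class covariances, keeping eigenvector pairs at both extremes (synthetic data; the paper's contribution is the sparse spatial-frequency-temporal selection layered on top of this):

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(9)
X1 = rng.standard_normal((30, 8, 200))         # trials x channels x samples, class 1
X2 = rng.standard_normal((30, 8, 200))         # class 2
X1[:, 0] *= 3.0                                # class 1 carries extra variance on channel 0

def mean_cov(X):
    return np.mean([x @ x.T / np.trace(x @ x.T) for x in X], axis=0)

C1, C2 = mean_cov(X1), mean_cov(X2)
evals, V = eigh(C1, C1 + C2)                   # generalized eigendecomposition
W = np.c_[V[:, :2], V[:, -2:]]                 # filters for both extreme eigenvalues
feat = lambda x: np.log(np.var(W.T @ x, axis=1))   # log-variance CSP features per trial
print(feat(X1[0]).round(2), feat(X2[0]).round(2))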
Ancient DNA sequence revealed by error-correcting codes.
Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo
2015-07-10
A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.
Chaillon, Antoine; Nakazawa, Masato; Wertheim, Joel O; Little, Susan J; Smith, Davey M; Mehta, Sanjay R; Gianella, Sara
2017-11-01
During primary HIV infection, the presence of minority drug resistance mutations (DRM) may be a consequence of sexual transmission, de novo mutations, or technical errors in identification. Baseline blood samples were collected from 24 HIV-infected antiretroviral-naive, genetically and epidemiologically linked source and recipient partners shortly after the recipient's estimated date of infection. An additional 32 longitudinal samples were available from 11 recipients. Deep sequencing of HIV reverse transcriptase (RT) was performed (Roche/454), and the sequences were screened for nucleoside and nonnucleoside RT inhibitor DRM. The likelihood of sexual transmission and persistence of DRM was assessed using Bayesian-based statistical modeling. While the majority of DRM (>20%) were consistently transmitted from source to recipient, the probability of detecting a minority DRM in the recipient was not increased when the same minority DRM was detected in the source (Bayes factor [BF] = 6.37). Longitudinal analyses revealed an exponential decay of DRM (BF = 0.05) while genetic diversity increased. Our analysis revealed no substantial evidence for sexual transmission of minority DRM (BF = 0.02). The presence of minority DRM during early infection, followed by a rapid decay, is consistent with the "mutation-selection balance" hypothesis, in which deleterious mutations are more efficiently purged later during HIV infection when the larger effective population size allows more efficient selection. Future studies using more recent sequencing technologies that are less prone to single-base errors should confirm these results by applying a similar Bayesian framework in other clinical settings. IMPORTANCE The advent of sensitive sequencing platforms has led to an increased identification of minority drug resistance mutations (DRM), including among antiretroviral therapy-naive HIV-infected individuals. While transmission of DRM may impact future therapy options for newly infected individuals, the clinical significance of the detection of minority DRM remains controversial. In the present study, we applied deep-sequencing techniques within a Bayesian hierarchical framework to a cohort of 24 transmission pairs to investigate whether minority DRM detected shortly after transmission were the consequence of (i) sexual transmission from the source, (ii) de novo emergence shortly after infection followed by viral selection and evolution, or (iii) technical errors/limitations of deep-sequencing methods. We found no clear evidence to support the sexual transmission of minority resistant variants, and our results suggested that minor resistant variants may emerge de novo shortly after transmission, when the small effective population size limits efficient purge by natural selection. Copyright © 2017 American Society for Microbiology.
Fuster-Parra, P; García-Mas, A; Ponseti, F J; Leo, F M
2015-04-01
The purpose of this paper was to discover the relationships among 22 relevant psychological features in semi-professional football players in order to study team performance and collective efficacy via a Bayesian network (BN). The paper includes optimization of team performance and collective efficacy using the intercausal reasoning pattern, which constitutes a very common pattern in human reasoning. The BN is used to make inferences regarding our problem, yielding several conclusions, among them: maximizing team performance causes a decrease in collective efficacy, and when team performance achieves its minimum value it causes an increase in moderate/high values of collective efficacy. Similarly, we may reason by optimizing team collective efficacy instead. The BN also allows us to determine the features that have the strongest influence on performance and those that most influence collective efficacy. From the BN, two different coaching styles were differentiated taking into account the local Markov property: training leadership and autocratic leadership. Copyright © 2014 Elsevier B.V. All rights reserved.
Approximate Bayesian Computation by Subset Simulation using hierarchical state-space models
NASA Astrophysics Data System (ADS)
Vakilzadeh, Majid K.; Huang, Yong; Beck, James L.; Abrahamsson, Thomas
2017-02-01
A new multi-level Markov Chain Monte Carlo algorithm for Approximate Bayesian Computation, ABC-SubSim, has recently appeared that exploits the Subset Simulation method for efficient rare-event simulation. ABC-SubSim adaptively creates a nested decreasing sequence of data-approximating regions in the output space that correspond to increasingly closer approximations of the observed output vector in this output space. At each level, multiple samples of the model parameter vector are generated by a component-wise Metropolis algorithm so that the predicted output corresponding to each parameter value falls in the current data-approximating region. Theoretically, if continued to the limit, the sequence of data-approximating regions would converge onto the observed output vector and the approximate posterior distributions, which are conditional on the data-approximating region, would become exact, but this is not practically feasible. In this paper we study the performance of the ABC-SubSim algorithm for Bayesian updating of the parameters of dynamical systems using a general hierarchical state-space model. We note that the ABC methodology gives an approximate posterior distribution that actually corresponds to an exact posterior where a uniformly distributed combined measurement and modeling error is added. We also note that ABC algorithms have a problem with learning the uncertain error variances in a stochastic state-space model, and so we treat them as nuisance parameters and analytically integrate them out of the posterior distribution. In addition, the statistical efficiency of the original ABC-SubSim algorithm is improved by developing a novel strategy to regulate the proposal variance for the component-wise Metropolis algorithm at each level. We demonstrate that Self-regulated ABC-SubSim is well suited for Bayesian system identification by first applying it successfully to model updating of a two degree-of-freedom linear structure for three cases: globally identifiable, locally identifiable and unidentifiable model classes, and then to model updating of a two degree-of-freedom nonlinear structure with Duffing nonlinearities in its interstory force-deflection relationship.
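The nested shrinking of data-approximating regions can be illustrated with a deliberately simplified sampler; the toy forward model, flat prior, fixed acceptance fraction, and proposal scale below are assumptions of the sketch, and the real ABC-SubSim adapts these quantities through Subset Simulation.

```python
# Toy multi-level ABC: at each level, keep the samples closest to the
# observed output, shrink the tolerance, and diffuse the survivors
# with Metropolis moves constrained to the current region.
import numpy as np

rng = np.random.default_rng(1)
y_obs = 2.0                               # observed output
prior = lambda m: rng.uniform(-5, 5, m)   # flat prior on scalar theta
model = lambda th: th + 0.1 * rng.standard_normal(th.shape)  # noisy forward model

n, keep = 2000, 0.1                       # samples per level, kept fraction
theta = prior(n)
for level in range(4):
    dist = np.abs(model(theta) - y_obs)
    eps = np.quantile(dist, keep)         # shrink the data-approximating region
    seeds = theta[dist <= eps]
    theta = np.repeat(seeds, int(n / len(seeds)) + 1)[:n]
    # Component-wise Metropolis move, rejected if it leaves the region
    prop = theta + 0.5 * rng.standard_normal(n)
    ok = (np.abs(prop) <= 5) & (np.abs(model(prop) - y_obs) <= eps)
    theta = np.where(ok, prop, theta)
print("approx. posterior mean:", theta.mean(), "| final tolerance:", eps)
```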
BayesMotif: de novo protein sorting motif discovery from impure datasets.
Hu, Jianjun; Zhang, Fan
2010-01-18
Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals is amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motifs in which a highly conserved anchor is present along with a less conserved motif region. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also show that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs, which may help to overcome the limitations of the PWM (position weight matrix) motif model.
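A minimal sketch of the underlying scoring idea, assuming a position-specific Bayesian sequence model with pseudocounts: terminal windows are scored against a log-odds matrix, and the lowest-scoring sequences are dropped, mimicking one pass of the false-positive removal loop. The toy peptides and window length are invented for illustration.

```python
# Position weight matrix with pseudocounts, scored against a uniform
# background; one removal pass keeps the best-scoring sequences.
import numpy as np

ALPHA = "ACDEFGHIKLMNPQRSTVWY"

def log_odds(windows, bg=1.0 / 20, pseudo=1.0):
    """Log-odds matrix from aligned windows of equal length."""
    L = len(windows[0])
    counts = np.full((L, 20), pseudo)
    for w in windows:
        for i, aa in enumerate(w):
            counts[i, ALPHA.index(aa)] += 1
    pwm = counts / counts.sum(axis=1, keepdims=True)
    return np.log(pwm / bg)

def best_window_score(seq, lo, L=6):
    return max(sum(lo[i, ALPHA.index(seq[s + i])] for i in range(L))
               for s in range(len(seq) - L + 1))

seqs = ["MKKLLSAG", "MKKILTAG", "GATTACAG", "MKKVLSAA"]   # toy N-termini
lo = log_odds([s[:6] for s in seqs])
scores = {s: best_window_score(s, lo) for s in seqs}
# One false-positive removal pass: drop the lowest scorer, then refit
kept = sorted(scores, key=scores.get, reverse=True)[:3]
print("kept after removal pass:", kept)
```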
What is value—accumulated reward or evidence?
Friston, Karl; Adams, Rick; Montague, Read
2012-01-01
Why are you reading this abstract? In some sense, your answer will cast the exercise as valuable—but what is value? In what follows, we suggest that value is evidence or, more exactly, log Bayesian evidence. This implies that a sufficient explanation for valuable behavior is the accumulation of evidence for internal models of our world. This contrasts with normative models of optimal control and reinforcement learning, which assume the existence of a value function that explains behavior, where (somewhat tautologically) behavior maximizes value. In this paper, we consider an alternative formulation—active inference—that replaces policies in normative models with prior beliefs about the (future) states agents should occupy. This enables optimal behavior to be cast purely in terms of inference: where agents sample their sensorium to maximize the evidence for their generative model of hidden states in the world, and minimize their uncertainty about those states. Crucially, this formulation resolves the tautology inherent in normative models and allows one to consider how prior beliefs are themselves optimized in a hierarchical setting. We illustrate these points by showing that any optimal policy can be specified with prior beliefs in the context of Bayesian inference. We then show how these prior beliefs are themselves prescribed by an imperative to minimize uncertainty. This formulation explains the saccadic eye movements required to read this text and defines the value of the visual sensations you are soliciting.
Gálvez, Akemi; Iglesias, Andrés; Cabellos, Luis
2014-01-01
The problem of data fitting is very important in many theoretical and applied fields. In this paper, we consider the problem of optimizing a weighted Bayesian energy functional for data fitting by using global-support approximating curves. By global-support curves we mean curves expressed as a linear combination of basis functions whose support is the whole domain of the problem, as opposed to other common approaches in CAD/CAM and computer graphics driven by piecewise functions (such as B-splines and NURBS) that provide local control of the shape of the curve. Our method applies a powerful nature-inspired metaheuristic algorithm called cuckoo search, introduced recently to solve optimization problems. A major advantage of this method is its simplicity: cuckoo search requires only two parameters, many fewer than other metaheuristic approaches, so the parameter tuning becomes a very simple task. The paper shows that this new approach can be successfully used to solve our optimization problem. To check the performance of our approach, it has been applied to five illustrative examples of different types, including open and closed 2D and 3D curves that exhibit challenging features, such as cusps and self-intersections. Our results show that the method performs pretty well, being able to solve our minimization problem in an astonishingly straightforward way.
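The cuckoo-search idea can be sketched in a few lines: candidate solutions ("nests") take heavy-tailed steps around the current best, and a fraction pa of the nests is abandoned each generation. The Cauchy step standing in for a Lévy flight and the quadratic least-squares energy replacing the paper's weighted Bayesian functional are assumptions of this illustration.

```python
# Toy cuckoo search minimizing a least-squares curve-fitting energy.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
y = 3.0 * x**2 - 2.0 * x + 0.5 + 0.05 * rng.standard_normal(50)

def energy(c):  # toy energy for the fit y ~ c0*x^2 + c1*x + c2
    return np.sum((y - (c[0] * x**2 + c[1] * x + c[2]))**2)

n_nests, pa, dim = 15, 0.25, 3
nests = rng.uniform(-5, 5, (n_nests, dim))
fit = np.array([energy(c) for c in nests])
for _ in range(500):
    best = nests[fit.argmin()]
    # Heavy-tailed (Cauchy) step standing in for a Levy flight
    new = nests + 0.01 * rng.standard_cauchy((n_nests, dim)) * (nests - best)
    new_fit = np.array([energy(c) for c in new])
    improve = new_fit < fit
    nests[improve], fit[improve] = new[improve], new_fit[improve]
    # Abandon a fraction pa of nests and rebuild them at random
    abandon = rng.random(n_nests) < pa
    nests[abandon] = rng.uniform(-5, 5, (abandon.sum(), dim))
    fit[abandon] = [energy(c) for c in nests[abandon]]
print("recovered coefficients:", nests[fit.argmin()])
```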
NASA Astrophysics Data System (ADS)
Harmening, Corinna; Neuner, Hans
2016-09-01
Due to the establishment of the terrestrial laser scanner, the analysis strategies in engineering geodesy change from pointwise approaches to areal ones. These areal analysis strategies are commonly built on the modelling of the acquired point clouds. Freeform curves and surfaces like B-spline curves/surfaces are one possible approach to obtain space-continuous information. A variety of parameters determines the B-spline's appearance; the B-spline's complexity is mostly determined by the number of control points. Usually, this number of control points is chosen quite arbitrarily by intuitive trial-and-error procedures. In this paper, the Akaike Information Criterion and the Bayesian Information Criterion are investigated with regard to a justified and reproducible choice of the optimal number of control points of B-spline curves. Additionally, we develop a method which is based on the structural risk minimization of statistical learning theory. Unlike the Akaike and the Bayesian Information Criteria, this method does not use the number of parameters as the complexity measure of the approximating functions but their Vapnik-Chervonenkis dimension. Furthermore, it is also valid for non-linear models. Thus, the three methods differ in their target function to be minimized and consequently in their definition of optimality. The present paper will be continued by a second paper dealing with the choice of the optimal number of control points of B-spline surfaces.
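The information-criterion comparison can be made concrete with a toy fit; here a polynomial basis stands in for the B-spline basis (so the parameter count plays the role of the number of control points), and the Gaussian noise model is an assumption of the sketch.

```python
# Pick model complexity by AIC/BIC on a noisy curve.
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

def aic_bic(n_params):
    coeff = np.polyfit(x, y, n_params - 1)
    rss = np.sum((y - np.polyval(coeff, x))**2)
    ll = -0.5 * x.size * np.log(rss / x.size)  # Gaussian log-likelihood (up to const.)
    aic = 2 * n_params - 2 * ll
    bic = n_params * np.log(x.size) - 2 * ll
    return aic, bic

scores = {p: aic_bic(p) for p in range(2, 15)}
best_aic = min(scores, key=lambda p: scores[p][0])
best_bic = min(scores, key=lambda p: scores[p][1])
print("AIC picks", best_aic, "parameters; BIC picks", best_bic)
```

As expected, BIC's stronger penalty typically selects the same or a smaller model than AIC.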
(Meta)genomic insights into the pathogenome of Cellulosimicrobium cellulans
Sharma, Anukriti; Gilbert, Jack A.; Lal, Rup
2016-05-06
Despite having serious clinical manifestations, Cellulosimicrobium cellulans remains under-reported, with only three genome sequences available at the time of writing. Genome sequences of C. cellulans LMG16121, C. cellulans J36 and Cellulosimicrobium sp. strain MM were used to determine the distribution of pathogenicity islands (PAIs) across C. cellulans, which revealed 49 potential marker genes with known association to human infections, e.g. the Fic and VbhA toxin-antitoxin system. Oligonucleotide composition-based analysis of orthologous proteins (n = 791) across the three genomes revealed a significant negative correlation (P < 0.05) between the frequency of optimal codons (Fopt) and gene G+C content, highlighting the G+C-biased gene conversion (gBGC) effect across Cellulosimicrobium strains. Bayesian molecular-clock analysis performed on three virulent PAI proteins (Fic; D-alanyl-D-alanine-carboxypeptidase; transposase) dated the divergence event at 300 million years ago from the most recent common ancestor. Synteny-based annotation of hypothetical proteins highlighted gene transfers from non-pathogenic bacteria as a key factor in the evolution of PAIs. Additionally, deciphering the metagenomic islands using strain MM's genome with environmental data from the site of isolation (hot-spring biofilm) revealed (an)aerobic respiration as a population segregation factor across the in situ cohorts. Furthermore, using reference genomes and metagenomic data, our results highlight the emergence and evolution of PAIs in the genus Cellulosimicrobium.
A transcriptome approach to ecdysozoan phylogeny.
Borner, Janus; Rehm, Peter; Schill, Ralph O; Ebersberger, Ingo; Burmester, Thorsten
2014-11-01
The monophyly of Ecdysozoa, which comprise the molting phyla, has received strong support from several lines of evidence. However, the internal relationships of Ecdysozoa are still contended. We generated expressed sequence tags from a priapulid (penis worm), a kinorhynch (mud dragon), a tardigrade (water bear) and five chelicerate taxa by 454 transcriptome sequencing. A multigene alignment was assembled from 63 taxa, which, after matrix optimization, comprised 24,249 amino acid positions with high data density (2.6% gaps, 19.1% missing data). Phylogenetic analyses employing various models support the monophyly of Ecdysozoa. A clade combining Priapulida and Kinorhyncha (i.e. Scalidophora) was recovered as the earliest branch among Ecdysozoa. We conclude that Cycloneuralia, a taxon erected to combine Priapulida, Kinorhyncha and Nematoda (and others), are paraphyletic. Rather, Arthropoda (including Onychophora) are allied with Nematoda and Tardigrada. Within Arthropoda, we found strong support for most clades, including monophyletic Mandibulata and Pancrustacea. The phylogeny within the Euchelicerata remained largely unresolved. There is conflicting evidence on the position of tardigrades: while Bayesian and maximum likelihood analyses of only slowly evolving genes recovered Tardigrada as a sister group to Arthropoda, analyses of the full data set, and of subsets containing genes evolving at fast and intermediate rates, identified a clade of Tardigrada and Nematoda. Notably, the latter topology is also supported by the analyses of indel patterns. Copyright © 2014 Elsevier Inc. All rights reserved.
Ding, Hui-Hui; Chao, Yi-Shan; Callado, John Rey; Dong, Shi-Yong
2014-11-01
In this study we provide a phylogeny for the pantropical fern genus Tectaria, with emphasis on the Old World species, based on sequences of five plastid regions (atpB, ndhF plus ndhF-trnL, rbcL, rps16-matK plus matK, and trnL-F). Maximum parsimony, maximum likelihood, and Bayesian inference are used to analyze 115 individuals, representing ca. 56 species of Tectaria s.l. and 36 species of ten related genera. The results strongly support the monophyly of Tectaria in a broad sense, in which Ctenitopsis, Hemigramma, Heterogonium, Psomiocarpa, Quercifilix, Stenosemia, and Tectaridium should be submerged. Such a broadly circumscribed Tectaria is supported by the arising pattern of veinlets and the base chromosome number (x=40). Four primary clades are well resolved within Tectaria, one from the Neotropics (T. trifoliata clade) and three from the Old World (T. subtriphylla clade, Ctenitopsis clade, and T. crenata clade). The Tectaria crenata clade is the largest, including six subclades. Of the genera previously recognized as tectarioid ferns, Ctenitis, Lastreopsis, and Pleocnemia are confirmed to be members of Dryopteridaceae, while Pteridrys and Triplophyllum are supported in Tectariaceae. To infer morphological evolution, 13 commonly used characters are optimized on the resulting phylogenetic trees and, as a result, all prove to be homoplastic in Tectaria. Copyright © 2014 Elsevier Inc. All rights reserved.
Assessing spatial variation of corn response to irrigation using a bayesian semiparametric model
USDA-ARS?s Scientific Manuscript database
Spatial irrigation of agricultural crops using site-specific variable-rate irrigation (VRI) systems is beginning to have wide-spread acceptance. However, optimizing the management of these VRI systems to conserve natural resources and increase profitability requires an understanding of the spatial ...
The second molecular epidemiological study of HIV infection in Mongolia between 2010 and 2016.
Jagdagsuren, Davaalkham; Hayashida, Tsunefusa; Takano, Misao; Gombo, Erdenetuya; Zayasaikhan, Setsen; Kanayama, Naomi; Tsuchiya, Kiyoto; Oka, Shinichi
2017-01-01
Our previous 2005-2009 molecular epidemiological study in Mongolia identified a hot spot of HIV-1 transmission in men who have sex with men (MSM). To control the infection, we have collaborated with NGOs to promote safer sex and HIV testing since mid-2010. In this study, we carried out a second molecular epidemiological survey between 2010 and 2016 to determine the status of HIV-1 infection in Mongolia. The study included 143 new cases of HIV-1 infection. Viral RNA was extracted from stocked plasma samples and sequenced for the pol and env regions using the Sanger method. Near-full-length sequencing using MiSeq was performed in 3 patients who were suspected to be infected with recombinant HIV-1. Phylogenetic analysis was performed using the neighbor-joining method and the Bayesian Markov chain Monte Carlo method. MSM was the main transmission route in both the previous and current studies. However, the heterosexual route has shown a significant increase in recent years. Phylogenetic analysis documented three taxa: Mongolian B, Korean B, and CRF51_01B, the former two of which were also observed in the previous study. CRF51_01B, which originated from Singapore and Malaysia, was confirmed by near-full-length sequencing. Although these strains were mainly detected in MSM, they were also found in increasing numbers of heterosexual males and females. Bayesian phylogenetic analysis estimated that CRF51_01B was transmitted into Mongolia around the early 2000s. An extended Bayesian skyline plot showed a rapid increase in the effective population size of the Mongolian B cluster around 2004 and of the CRF51_01B cluster around 2011. HIV-1 infection might expand to the general population in Mongolia. Our study documented a new cluster of HIV-1 transmission, enhancing our understanding of the epidemiological status of HIV-1 in Mongolia.
NASA Astrophysics Data System (ADS)
Ha, Taesung
A probabilistic risk assessment (PRA) was conducted for a loss of coolant accident (LOCA) in the McMaster Nuclear Reactor (MNR). A level 1 PRA was completed, including event sequence modeling, system modeling, and quantification. To support the quantification of the accident sequences identified, data analysis using the Bayesian method and human reliability analysis (HRA) using the accident sequence evaluation procedure (ASEP) approach were performed. Since human performance in research reactors is significantly different from that in power reactors, a time-oriented HRA model (reliability physics model) was applied for the human error probability (HEP) estimation of the core relocation. This model is based on two competing random variables: phenomenological time and performance time. The response surface and direct Monte Carlo simulation with Latin Hypercube sampling were applied for estimating the phenomenological time, whereas the performance time was obtained from interviews with operators. An appropriate probability distribution for the phenomenological time was assigned by statistical goodness-of-fit tests. The human error probability (HEP) for the core relocation was estimated from these two competing quantities: phenomenological time and the operators' performance time. The sensitivity of each probability distribution in human reliability estimation was investigated. In order to quantify the uncertainty in the predicted HEPs, a Bayesian approach was selected due to its capability of incorporating uncertainties in the model itself and in the parameters of that model. The HEP from the current time-oriented model was compared with that from the ASEP approach. Both results were used to evaluate the sensitivity of alternative human reliability modeling for the manual core relocation in the LOCA risk model. This exercise demonstrated the applicability of a reliability physics model supplemented with a Bayesian approach for modeling human reliability and its potential usefulness for quantifying model uncertainty as sensitivity analysis in the PRA model.
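The reliability-physics estimate reduces to the probability that the operators' performance time exceeds the phenomenological time, which a short Monte Carlo makes concrete; the Weibull and lognormal distributions and their parameters below are purely illustrative assumptions, not the fitted distributions of the study.

```python
# HEP as P(performance time > phenomenological time) by Monte Carlo.
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
t_phenom = 30.0 * rng.weibull(2.0, n)                # time until core damage (min)
t_perform = rng.lognormal(np.log(12), 0.5, n)        # operator action time (min)
hep = np.mean(t_perform > t_phenom)                  # human error probability
print(f"estimated HEP = {hep:.4f}")
```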
Phylodynamics of classical swine fever virus with emphasis on Ecuadorian strains.
Garrido Haro, A D; Barrera Valle, M; Acosta, A; J Flores, F
2018-06-01
Classical swine fever virus (CSFV) is a Pestivirus from the Flaviviridae family that affects pigs worldwide and is endemic in several Latin American countries. However, there are still some countries in the region, including Ecuador, for which CSFV molecular information is lacking. To better understand the epidemiology of CSFV in the Americas, sequences from CSFVs from Ecuador were generated and a phylodynamic analysis of the virus was performed. Sequences for the full-length glycoprotein E2 gene of twenty field isolates were obtained and, along with sequences from strains previously described in the Americas and from the most representative strains worldwide, were used to analyse the phylodynamics of the virus. Bayesian methods were used to test several molecular clock and demographic models. A calibrated ultrametric tree and a Bayesian skyline were constructed, and codons under positive selection involved in immune escape were detected. The best model according to Bayes factors was the strict molecular clock and Bayesian skyline model, which shows that CSFV has an evolution rate of 3.2 × 10⁻⁴ substitutions per site per year. The model estimates the origin of CSFV in the mid-1500s. There is a strong spatial structure for CSFV in the Americas, indicating that the virus is moving mainly through neighbouring countries. The genetic diversity of CSFV has increased constantly since its appearance, with a slight decrease in the mid-twentieth century, which coincides with eradication campaigns in North America. Even though there is no evidence of strong directional evolution of the E2 gene in CSFV, codons 713, 761, 762 and 975 appear to be positively selected and could be related to virulence or pathogenesis. These results reveal how CSFV has spread and evolved since it first appeared in the Americas and provide important information for attaining the goal of eradication of this virus in Latin America. © 2018 Blackwell Verlag GmbH.
Tomasello, Salvatore; Álvarez, Inés; Vargas, Pablo; Oberprieler, Christoph
2015-01-01
The present study provides results of multi-species coalescent species tree analyses of DNA sequences sampled from multiple nuclear and plastid regions to infer the phylogenetic relationships among the members of the subtribe Leucanthemopsidinae (Compositae, Anthemideae), to which, besides the annual Castrilanthemum debeauxii (Degen, Hervier & É.Rev.) Vogt & Oberp., one of the rarest flowering plant species of the Iberian Peninsula, two other unispecific genera (Hymenostemma, Prolongoa) and the polyploid complex of the genus Leucanthemopsis belong. Based on sequence information from two single- to low-copy nuclear regions (C16, D35, characterised by Chapman et al. (2007)), the multi-copy region of the nrDNA internal transcribed spacer regions ITS1 and ITS2, and two intergenic spacer regions of the cpDNA, gene trees were reconstructed using Bayesian inference methods. For the reconstruction of a multi-locus species tree we applied three different methods: (a) analysis of concatenated sequences using Bayesian inference (MrBayes), (b) a tree reconciliation approach by minimizing the number of deep coalescences (PhyloNet), and (c) a coalescent-based species-tree method in a Bayesian framework ((∗)BEAST). All three species tree reconstruction methods unequivocally support the close relationship of the subtribe with the hitherto unclassified genus Phalacrocarpum, the sister-group relationship of Castrilanthemum with the three remaining genera of the subtribe, and the further sister-group relationship of the clade of Hymenostemma+Prolongoa with a monophyletic genus Leucanthemopsis. Dating of the (∗)BEAST phylogeny supports the long-lasting (Early Miocene, 15-22 Ma) taxonomical independence and the switch from the plesiomorphic perennial to the apomorphic annual life-form assumed for the Castrilanthemum lineage that may have occurred not earlier than the Pliocene (3 Ma), when the establishment of a Mediterranean climate with summer droughts triggered evolution towards annuality. Copyright © 2014 Elsevier Inc. All rights reserved.
Cinelli, Mattia; Sun, Yuxin; Best, Katharine; Heather, James M; Reich-Zeliger, Shlomit; Shifrut, Eric; Friedman, Nir; Shawe-Taylor, John; Chain, Benny
2017-04-01
Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study we use high throughput sequencing to compare the global changes in T cell receptor β chain complementarity determining region 3 (CDR3β) sequences following immunization with ovalbumin administered with complete Freund's adjuvant (CFA) or CFA alone. The CDR3β sequences were deconstructed into short stretches of overlapping contiguous amino acids. The motifs were ranked according to a one-dimensional Bayesian classifier score comparing their frequency in the repertoires of the two immunization classes. The top-ranking motifs were selected and used to create feature vectors which were used to train a support vector machine. The support vector machine achieved high classification scores in a leave-one-out validation test, reaching >90% in some cases. The study describes a novel two-stage classification strategy combining a one-dimensional Bayesian classifier with a support vector machine. Using this approach we demonstrate that the frequency of a small number of linear motifs three amino acids in length can accurately identify a CD4 T cell response to ovalbumin against a background response to the complex mixture of antigens which characterize complete Freund's adjuvant. The sequence data are available at www.ncbi.nlm.nih.gov/sra/?term=SRP075893. The Decombinator package is available at github.com/innate2adaptive/Decombinator. The R package e1071 is available at the CRAN repository https://cran.r-project.org/web/packages/e1071/index.html. b.chain@ucl.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
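A toy version of the two-stage strategy, with invented CDR3-like strings and a simple class-contrast score standing in for the paper's one-dimensional Bayesian classifier, might look as follows.

```python
# Stage 1: rank 3-mers by a class-contrast score.
# Stage 2: train an SVM on frequencies of the top-ranked motifs.
import numpy as np
from collections import Counter
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

def kmers(seq, k=3):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

repertoires = ["CASSLGQG", "CASSLGGG", "CASRTGEL", "CASRTGDL",
               "CASSQETQ", "CASSQDTQ", "CASGLAGG", "CASGLSGG"]
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])     # toy OVA+CFA vs CFA-alone labels

counts = [Counter(kmers(r)) for r in repertoires]
vocab = sorted(set().union(*counts))
freq = np.array([[c[m] for m in vocab] for c in counts], dtype=float)
freq /= freq.sum(axis=1, keepdims=True)

# Simple one-dimensional score: contrast of class-mean motif frequencies
score = np.abs(freq[labels == 1].mean(0) - freq[labels == 0].mean(0))
top = np.argsort(score)[::-1][:10]              # keep the top-ranked motifs
acc = cross_val_score(SVC(kernel="linear"), freq[:, top], labels,
                      cv=LeaveOneOut()).mean()
print(f"leave-one-out accuracy: {acc:.2f}")
```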
Optimization of global model composed of radial basis functions using the term-ranking approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cai, Peng; Tao, Chao, E-mail: taochao@nju.edu.cn; Liu, Xiao-Jun
2014-03-15
A term-ranking method is put forward to optimize the global model composed of radial basis functions in order to improve its predictability. The effectiveness of the proposed method is examined by numerical simulation and experimental data. Numerical simulations indicate that this method can significantly lengthen the prediction time and decrease the Bayesian information criterion of the model. The application to a real voice signal shows that the optimized global model can capture more of the predictable component in chaos-like voice data and simultaneously reduce the predictable component (periodic pitch) in the residual signal.
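The term-ranking idea can be sketched as greedy selection of candidate radial-basis terms while the Bayesian information criterion keeps decreasing; the centers, kernel width, and toy series below are assumptions of this illustration.

```python
# Greedy BIC-driven selection of radial-basis-function terms.
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0, 4 * np.pi, 300)
y = np.sin(t) + 0.05 * rng.standard_normal(t.size)

centers = np.linspace(t.min(), t.max(), 25)
Phi = np.exp(-(t[:, None] - centers[None, :])**2 / (2 * 0.5**2))  # candidate terms

def bic(cols):
    w, *_ = np.linalg.lstsq(Phi[:, cols], y, rcond=None)
    rss = np.sum((y - Phi[:, cols] @ w)**2)
    return t.size * np.log(rss / t.size) + len(cols) * np.log(t.size)

chosen, pool = [], list(range(25))
while pool:
    best = min(pool, key=lambda j: bic(chosen + [j]))
    if chosen and bic(chosen + [best]) >= bic(chosen):
        break                                   # BIC no longer decreases
    chosen.append(best)
    pool.remove(best)
print("terms kept:", sorted(chosen))
```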
Epstein, F H; Mugler, J P; Brookeman, J R
1994-02-01
A number of pulse sequence techniques, including magnetization-prepared gradient echo (MP-GRE), segmented GRE, and hybrid RARE, employ a relatively large number of variable pulse sequence parameters and acquire the image data during a transient signal evolution. These sequences have recently been proposed and/or used for clinical applications in the brain, spine, liver, and coronary arteries. Thus, the need for a method of deriving optimal pulse sequence parameter values for this class of sequences now exists. Due to the complexity of these sequences, conventional optimization approaches, such as applying differential calculus to signal difference equations, are inadequate. We have developed a general framework for adapting the simulated annealing algorithm to pulse sequence parameter value optimization, and applied this framework to the specific case of optimizing the white matter-gray matter signal difference for a T1-weighted variable flip angle 3D MP-RAGE sequence. Using our algorithm, the values of 35 sequence parameters, including the magnetization-preparation RF pulse flip angle and delay time, 32 flip angles in the variable flip angle gradient-echo acquisition sequence, and the magnetization recovery time, were derived. Optimized 3D MP-RAGE achieved up to a 130% increase in white matter-gray matter signal difference compared with optimized 3D RF-spoiled FLASH with the same total acquisition time. The simulated annealing approach was effective at deriving optimal parameter values for a specific 3D MP-RAGE imaging objective, and may be useful for other imaging objectives and sequences in this general class.
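A generic simulated-annealing loop of the kind adapted here is easy to sketch; the stand-in objective below replaces the Bloch-simulated white matter-gray matter signal difference, and the bounds and cooling schedule are assumptions of the illustration.

```python
# Simulated annealing over a 35-dimensional parameter vector.
import numpy as np

rng = np.random.default_rng(6)

def objective(p):  # stand-in for the simulated signal-difference objective
    return -np.sum((p - 0.3)**2) + 0.1 * np.sin(10 * p).sum()

p = rng.uniform(0, 1, 35)            # e.g. 35 normalized sequence parameters
f, T = objective(p), 1.0
for _ in range(20_000):
    q = np.clip(p + 0.05 * rng.standard_normal(p.size), 0, 1)
    g = objective(q)
    # Accept uphill moves always, downhill moves with probability e^(dE/T)
    if g > f or rng.random() < np.exp((g - f) / T):
        p, f = q, g
    T *= 0.9997                      # geometric cooling schedule
print("best objective found:", f)
```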
A BAYESIAN APPROACH TO DERIVING AGES OF INDIVIDUAL FIELD WHITE DWARFS
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Malley, Erin M.; Von Hippel, Ted; Van Dyk, David A., E-mail: ted.vonhippel@erau.edu, E-mail: dvandyke@imperial.ac.uk
2013-09-20
We apply a self-consistent and robust Bayesian statistical approach to determine the ages, distances, and zero-age main sequence (ZAMS) masses of 28 field DA white dwarfs (WDs) with ages of approximately 4-8 Gyr. Our technique requires only quality optical and near-infrared photometry to derive ages with <15% uncertainties, generally with little sensitivity to our choice of modern initial-final mass relation. We find that age, distance, and ZAMS mass are correlated in a manner that is too complex to be captured by traditional error propagation techniques. We further find that the posterior distributions of age are often asymmetric, indicating that the standard approach to deriving WD ages can yield misleading results.
A solution to the static frame validation challenge problem using Bayesian model selection
Grigoriu, M. D.; Field, R. V.
2007-12-23
Within this paper, we provide a solution to the static frame validation challenge problem (see this issue) in a manner that is consistent with the guidelines provided by the Validation Challenge Workshop tasking document. The static frame problem is constructed such that variability in material properties is known to be the only source of uncertainty in the system description, but there is ignorance on the type of model that best describes this variability. Hence both types of uncertainty, aleatoric and epistemic, are present and must be addressed. Our approach is to consider a collection of competing probabilistic models for the material properties, and calibrate these models to the information provided; models of different levels of complexity and numerical efficiency are included in the analysis. A Bayesian formulation is used to select the optimal model from the collection, which is then used for the regulatory assessment. Lastly, Bayesian credible intervals are used to provide a measure of confidence in our regulatory assessment.
NASA Astrophysics Data System (ADS)
Sadegh, Mojtaba; Ragno, Elisa; AghaKouchak, Amir
2017-06-01
We present a newly developed Multivariate Copula Analysis Toolbox (MvCAT) which includes a wide range of copula families with different levels of complexity. MvCAT employs a Bayesian framework with a residual-based Gaussian likelihood function for inferring copula parameters and estimating the underlying uncertainties. The contribution of this paper is threefold: (a) providing a Bayesian framework to approximate the predictive uncertainties of fitted copulas, (b) introducing a hybrid-evolution Markov Chain Monte Carlo (MCMC) approach designed for numerical estimation of the posterior distribution of copula parameters, and (c) enabling the community to explore a wide range of copulas and evaluate them relative to the fitting uncertainties. We show that the commonly used local optimization methods for copula parameter estimation often get trapped in local minima. The proposed method, however, addresses this limitation and improves the description of the dependence structure. MvCAT also enables evaluation of uncertainties relative to the length of record, which is fundamental to a wide range of applications such as multivariate frequency analysis.
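The residual-based Gaussian likelihood idea can be illustrated with a single-parameter copula and a plain random-walk Metropolis sampler (MvCAT's hybrid-evolution MCMC is more elaborate); the Clayton family, simulated data, and proposal scale are assumptions of this sketch.

```python
# Metropolis sampling of a Clayton copula parameter with a Gaussian
# likelihood on residuals between empirical and parametric copulas.
import numpy as np

rng = np.random.default_rng(7)
n = 300
z = rng.standard_normal((n, 2))
z[:, 1] = 0.7 * z[:, 0] + 0.3 * z[:, 1]                    # dependent toy data
u = (np.argsort(np.argsort(z, axis=0), axis=0) + 0.5) / n  # pseudo-observations

def clayton_cdf(u1, u2, th):
    return np.maximum(u1**-th + u2**-th - 1.0, 1e-12)**(-1.0 / th)

emp = np.array([np.mean((u[:, 0] <= a) & (u[:, 1] <= b)) for a, b in u])

def log_like(th):
    r = emp - clayton_cdf(u[:, 0], u[:, 1], th)
    return -0.5 * n * np.log(np.mean(r**2))  # Gaussian residuals, sigma profiled out

th, ll, chain = 1.0, log_like(1.0), []
for _ in range(5000):
    prop = th + 0.2 * rng.standard_normal()
    if prop > 0:                              # Clayton requires theta > 0
        llp = log_like(prop)
        if np.log(rng.random()) < llp - ll:
            th, ll = prop, llp
    chain.append(th)
print("posterior mean theta ~", np.mean(chain[1000:]))
```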
A Bayesian Approach for Sensor Optimisation in Impact Identification
Mallardo, Vincenzo; Sharif Khodaei, Zahra; Aliabadi, Ferri M. H.
2016-01-01
This paper presents a Bayesian approach for optimizing the position of sensors aimed at impact identification in composite structures under operational conditions. The uncertainty in the sensor data has been represented by statistical distributions of the recorded signals. An optimisation strategy based on the genetic algorithm is proposed to find the best sensor combination aimed at locating impacts on composite structures. A Bayesian-based objective function is adopted in the optimisation procedure as an indicator of the performance of meta-models developed for different sensor combinations to locate various impact events. To represent a real structure under operational load and to increase the reliability of the Structural Health Monitoring (SHM) system, the probability of malfunctioning sensors is included in the optimisation. The reliability and the robustness of the procedure is tested with experimental and numerical examples. Finally, the proposed optimisation algorithm is applied to a composite stiffened panel for both the uniform and non-uniform probability of impact occurrence.
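A genetic algorithm over sensor subsets can be sketched as below; the coverage-style fitness is a simple stand-in for the paper's Bayesian meta-model objective, and the panel geometry and GA settings are invented for illustration.

```python
# GA over subsets of admissible sensor positions on a unit panel.
import numpy as np

rng = np.random.default_rng(8)
sites = rng.uniform(0, 1, (200, 2))          # candidate impact locations
spots = rng.uniform(0, 1, (30, 2))           # admissible sensor positions
n_sensors, pop_n = 6, 40

def fitness(mask):
    d = np.linalg.norm(sites[:, None, :] - spots[None, mask, :], axis=2)
    return -d.min(axis=1).mean()             # minimize mean nearest-sensor distance

pop = [rng.choice(30, n_sensors, replace=False) for _ in range(pop_n)]
for _ in range(60):
    scores = np.array([fitness(m) for m in pop])
    parents = [pop[i] for i in np.argsort(scores)[-pop_n // 2:]]
    children = []
    for _ in range(pop_n - len(parents)):
        a, b = rng.choice(len(parents), 2, replace=False)
        genes = np.union1d(parents[a], parents[b])      # crossover gene pool
        child = rng.choice(genes, n_sensors, replace=False)
        if rng.random() < 0.2:                          # mutate one sensor slot
            new_spot = rng.integers(30)
            if new_spot not in child:
                child[rng.integers(n_sensors)] = new_spot
        children.append(child)
    pop = parents + children
best = max(pop, key=fitness)
print("chosen sensor indices:", sorted(best))
```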
NASA Astrophysics Data System (ADS)
Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.
2015-03-01
We present Π4U, an extensible framework for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.
Evaluation and Design of Genome-Wide CRISPR/SpCas9 Knockout Screens
Hart, Traver; Tong, Amy Hin Yan; Chan, Katie; Van Leeuwen, Jolanda; Seetharaman, Ashwin; Aregger, Michael; Chandrashekhar, Megha; Hustedt, Nicole; Seth, Sahil; Noonan, Avery; Habsid, Andrea; Sizova, Olga; Nedyalkova, Lyudmila; Climie, Ryan; Tworzyanski, Leanne; Lawson, Keith; Sartori, Maria Augusta; Alibeh, Sabriyeh; Tieu, David; Masud, Sanna; Mero, Patricia; Weiss, Alexander; Brown, Kevin R.; Usaj, Matej; Billmann, Maximilian; Rahman, Mahfuzur; Costanzo, Michael; Myers, Chad L.; Andrews, Brenda J.; Boone, Charles; Durocher, Daniel; Moffat, Jason
2017-01-01
The adaptation of CRISPR/SpCas9 technology to mammalian cell lines is transforming the study of human functional genomics. Pooled libraries of CRISPR guide RNAs (gRNAs) targeting human protein-coding genes and encoded in viral vectors have been used to systematically create gene knockouts in a variety of human cancer and immortalized cell lines, in an effort to identify whether these knockouts cause cellular fitness defects. Previous work has shown that CRISPR screens are more sensitive and specific than pooled-library shRNA screens in similar assays, but currently there exists significant variability across CRISPR library designs and experimental protocols. In this study, we reanalyze 17 genome-scale knockout screens in human cell lines from three research groups, using three different genome-scale gRNA libraries. Using the Bayesian Analysis of Gene Essentiality algorithm to identify essential genes, we refine and expand our previously defined set of human core essential genes from 360 to 684 genes. We use this expanded set of reference core essential genes, CEG2, plus empirical data from six CRISPR knockout screens to guide the design of a sequence-optimized gRNA library, the Toronto KnockOut version 3.0 (TKOv3) library. We then demonstrate the high effectiveness of the library relative to reference sets of essential and nonessential genes, as well as other screens using similar approaches. The optimized TKOv3 library, combined with the CEG2 reference set, provide an efficient, highly optimized platform for performing and assessing gene knockout screens in human cell lines.
On the Uncertainty in Single Molecule Fluorescent Lifetime and Energy Emission Measurements
NASA Technical Reports Server (NTRS)
Brown, Emery N.; Zhang, Zhenhua; McCollom, Alex D.
1996-01-01
Time-correlated single photon counting has recently been combined with mode-locked picosecond pulsed excitation to measure the fluorescent lifetimes and energy emissions of single molecules in a flow stream. Maximum likelihood (ML) and least squares methods agree and are optimal when the number of detected photons is large, however, in single molecule fluorescence experiments the number of detected photons can be less than 20, 67 percent of those can be noise, and the detection time is restricted to 10 nanoseconds. Under the assumption that the photon signal and background noise are two independent inhomogeneous Poisson processes, we derive the exact joint arrival time probability density of the photons collected in a single counting experiment performed in the presence of background noise. The model obviates the need to bin experimental data for analysis, and makes it possible to analyze formally the effect of background noise on the photon detection experiment using both ML or Bayesian methods. For both methods we derive the joint and marginal probability densities of the fluorescent lifetime and fluorescent emission. The ML and Bayesian methods are compared in an analysis of simulated single molecule fluorescence experiments of Rhodamine 110 using different combinations of expected background noise and expected fluorescence emission. While both the ML or Bayesian procedures perform well for analyzing fluorescence emissions, the Bayesian methods provide more realistic measures of uncertainty in the fluorescent lifetimes. The Bayesian methods would be especially useful for measuring uncertainty in fluorescent lifetime estimates in current single molecule flow stream experiments where the expected fluorescence emission is low. Both the ML and Bayesian algorithms can be automated for applications in molecular biology.
Heuristics as Bayesian inference under extreme priors.
Parpart, Paula; Jones, Matt; Love, Bradley C
2018-05-01
Simple heuristics are often regarded as tractable decision strategies because they ignore a great deal of information in the input data. One puzzle is why heuristics can outperform full-information models, such as linear regression, which make full use of the available information. These "less-is-more" effects, in which a relatively simpler model outperforms a more complex model, are prevalent throughout cognitive science, and are frequently argued to demonstrate an inherent advantage of simplifying computation or ignoring information. In contrast, we show at the computational level (where algorithmic restrictions are set aside) that it is never optimal to discard information. Through a formal Bayesian analysis, we prove that popular heuristics, such as tallying and take-the-best, are formally equivalent to Bayesian inference under the limit of infinitely strong priors. Varying the strength of the prior yields a continuum of Bayesian models with the heuristics at one end and ordinary regression at the other. Critically, intermediate models perform better across all our simulations, suggesting that down-weighting information with the appropriate prior is preferable to entirely ignoring it. Rather than because of their simplicity, our analyses suggest heuristics perform well because they implement strong priors that approximate the actual structure of the environment. We end by considering how new heuristics could be derived by infinitely strengthening the priors of other Bayesian models. These formal results have implications for work in psychology, machine learning and economics. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
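The continuum described here can be reproduced with penalized regression whose Gaussian prior is centered on unit "tallying" weights: lam = 0 recovers ordinary least squares and lam -> infinity recovers the heuristic. The simulated environment below is an assumption of the sketch.

```python
# Penalized regression interpolating between OLS and a tallying heuristic.
import numpy as np

rng = np.random.default_rng(9)
n_train, n_test, d = 20, 500, 5
w_true = np.array([1.2, 0.9, 0.7, 0.4, 0.2])
X = rng.standard_normal((n_train + n_test, d))
y = X @ w_true + rng.standard_normal(n_train + n_test)
Xtr, ytr, Xte, yte = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

w0 = np.ones(d)                          # tallying: equal unit weights
def fit(lam):
    # argmin ||y - Xw||^2 + lam * ||w - w0||^2  (Gaussian prior centered on w0)
    A = Xtr.T @ Xtr + lam * np.eye(d)
    return np.linalg.solve(A, Xtr.T @ ytr + lam * w0)

for lam in [0.0, 1.0, 10.0, 100.0, 1e6]:
    w = fit(lam)
    mse = np.mean((yte - Xte @ w)**2)
    print(f"lam={lam:10.1f}  test MSE={mse:.3f}")
```

With small training sets, an intermediate lam typically beats both endpoints, mirroring the paper's "less-is-more" analysis.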
Selecting the selector: Comparison of update rules for discrete global optimization
Theiler, James; Zimmer, Beate G.
2017-05-24
In this paper, we compare some well-known Bayesian global optimization methods in four distinct regimes, corresponding to high and low levels of measurement noise and to high and low levels of “quenched noise” (which term we use to describe the roughness of the function we are trying to optimize). We isolate the two stages of this optimization in terms of a “regressor,” which fits a model to the data measured so far, and a “selector,” which identifies the next point to be measured. Finally, the focus of this paper is to investigate the choice of selector when the regressor is well matched to the data.
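The regressor/selector split can be made explicit in a few lines: a Gaussian-process regressor is fit to the points measured so far, and an expected-improvement selector picks the next measurement. The kernel, noise level, and test function are assumptions of this sketch.

```python
# GP regressor + expected-improvement selector on a 1-D toy problem.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(10)
f = lambda x: np.sin(3 * x) + 0.3 * x            # function to maximize
grid = np.linspace(0, 5, 200)
k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :])**2 / 0.3**2)  # RBF kernel

X = list(rng.uniform(0, 5, 3))                   # initial design
y = [f(x) + 0.1 * rng.standard_normal() for x in X]
for _ in range(15):
    Xa, ya = np.array(X), np.array(y)
    K = k(Xa, Xa) + 0.1**2 * np.eye(len(Xa))     # regressor: GP with noise term
    alpha = np.linalg.solve(K, ya)
    mu = k(grid, Xa) @ alpha                     # posterior mean on the grid
    var = 1.0 - np.einsum('ij,ji->i', k(grid, Xa), np.linalg.solve(K, k(Xa, grid)))
    sd = np.sqrt(np.maximum(var, 1e-12))
    # Selector: expected improvement over the best observation so far
    imp = mu - ya.max()
    ei = imp * norm.cdf(imp / sd) + sd * norm.pdf(imp / sd)
    x_next = grid[int(np.argmax(ei))]
    X.append(x_next)
    y.append(f(x_next) + 0.1 * rng.standard_normal())
print("best measured point:", X[int(np.argmax(y))])
```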
Modeling Statistical Insensitivity: Sources of Suboptimal Behavior
ERIC Educational Resources Information Center
Gagliardi, Annie; Feldman, Naomi H.; Lidz, Jeffrey
2017-01-01
Children acquiring languages with noun classes (grammatical gender) have ample statistical information available that characterizes the distribution of nouns into these classes, but their use of this information to classify novel nouns differs from the predictions made by an optimal Bayesian classifier. We use rational analysis to investigate the…
Categorical Biases in Spatial Memory: The Role of Certainty
ERIC Educational Resources Information Center
Holden, Mark P.; Newcombe, Nora S.; Shipley, Thomas F.
2015-01-01
Memories for spatial locations often show systematic errors toward the central value of the surrounding region. The Category Adjustment (CA) model suggests that this bias is due to a Bayesian combination of categorical and metric information, which offers an optimal solution under conditions of uncertainty (Huttenlocher, Hedges, & Duncan,…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Portone, Teresa; Niederhaus, John Henry; Sanchez, Jason James
This report introduces the concepts of Bayesian model selection, which provides a systematic means of calibrating and selecting an optimal model to represent a phenomenon. This has many potential applications, including for comparing constitutive models. The ideas described herein are applied to a model selection problem between different yield models for hardened steel under extreme loading conditions.
Gomes, Laise de Azevedo; Moraes, Pablo Henrique Gonçalves; do Nascimento, Luciana de Cássia Silva; O'Dwyer, Lucia Helena; Nunes, Márcio Roberto Teixeira; Rossi, Adriana Dos Reis Ponce; Aguiar, Délia Cristina Figueira; Gonçalves, Evonnildo Costa
2016-10-01
This study aimed to optimize molecular methods for detecting DNA of Hepatozoon spp. as well as to identify the phylogenetic relationships of Hepatozoon strains naturally infecting domestic dogs in Belém, Pará, northern Brazil. Blood samples were collected from 138 dogs and screened for Hepatozoon spp. using a new nested PCR assay. Positive samples were subjected to genetic characterization based on amplification and sequencing of approximately 670 bp of the Hepatozoon spp. 18S rRNA. Of the positive dogs, four shared the haplotype Belém 01, one dog presented the haplotype Belém 02 and two dogs shared the haplotype Belém 03. A Bayesian inference indicates that haplotypes Belém 01 and Belém 02 are phylogenetically related to H. canis, while Belém 03 is related to H. americanum. Overall, based on the first molecular evidence of H. americanum in Brazilian domestic dogs, the proposed protocol may improve the epidemiological investigation of canine hepatozoonosis. Copyright © 2016 Elsevier GmbH. All rights reserved.
Li, Yunhai; Lee, Kee Khoon; Walsh, Sean; Smith, Caroline; Hadingham, Sophie; Sorefan, Karim; Cawley, Gavin; Bevan, Michael W
2006-03-01
Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, approximately 70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.
Bulashevska, Alla; Stein, Martin; Jackson, David; Eils, Roland
2009-12-01
Accurate computational methods that can help to predict the biological function of a protein from its sequence are of great interest to research biologists and pharmaceutical companies. One approach to inferring the function of proteins is to predict the interactions between proteins and other molecules. In this work, we propose a machine learning method that uses the primary sequence of a domain to predict its propensity for interaction with small molecules. By curating the Pfam database with respect to the small molecule binding ability of its component domains, we have constructed a dataset of small molecule binding and non-binding domains. This dataset was then used as a training set to learn a Bayesian classifier that distinguishes members of each class. The domain sequences of both classes are modelled with Markov chains. In a jack-knife test, our classification procedure achieved predictive accuracies of 77.2% and 66.7% for the binding and non-binding classes, respectively. We demonstrate the applicability of our classifier by using it to identify previously unknown small molecule binding domains. Our predictions are available as supplementary material and can provide very useful information to drug discovery specialists. Given the ubiquitous and essential role small molecules play in biological processes, our method is important for identifying pharmaceutically relevant components of complete proteomes. The software is available from the author upon request.
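A minimal version of such a classifier, assuming first-order Markov chains and invented training peptides, can be written directly; prediction compares the log-likelihood of a query sequence under the two class models.

```python
# First-order Markov chain classifier over amino-acid sequences.
import numpy as np

ALPHA = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(ALPHA)}

def train_chain(seqs, pseudo=1.0):
    """Log transition probabilities with pseudocount smoothing."""
    T = np.full((20, 20), pseudo)
    for s in seqs:
        for a, b in zip(s, s[1:]):
            T[IDX[a], IDX[b]] += 1
    return np.log(T / T.sum(axis=1, keepdims=True))

def log_lik(seq, logT):
    return sum(logT[IDX[a], IDX[b]] for a, b in zip(seq, seq[1:]))

binding = ["MKTAYIAKQR", "MKTAHIAKQR", "MKSAYIAKQL"]   # toy training sets
nonbind = ["GGSGGSGGSG", "GGTGGSGGAG", "GGSGGAGGSG"]
T_pos, T_neg = train_chain(binding), train_chain(nonbind)

query = "MKTAYLAKQR"
pred = "binding" if log_lik(query, T_pos) > log_lik(query, T_neg) else "non-binding"
print(query, "->", pred)
```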
NASA Astrophysics Data System (ADS)
Omi, Takahiro; Ogata, Yosihiko; Hirata, Yoshito; Aihara, Kazuyuki
2015-04-01
Because aftershock occurrences can cause significant seismic risks for a considerable time after the main shock, prospective forecasting of intermediate-term aftershock activity as soon as possible is important. The epidemic-type aftershock sequence (ETAS) model with the maximum likelihood estimate effectively reproduces general aftershock activity, including secondary or higher-order aftershocks, and can be employed for such forecasting. However, because we cannot always expect accurate parameter estimation from incomplete early aftershock data in which many events are missing, forecasting based on a single estimated parameter set (plug-in forecasting) can frequently perform poorly. We therefore propose Bayesian forecasting, which combines the forecasts of the ETAS model over the various probable parameter sets given the data. By conducting forecasting tests of one-month aftershock activity based on the first day of data after the main shock, as an example of early intermediate-term forecasting, we show that the Bayesian forecasting performs better than the plug-in forecasting on average in terms of the log-likelihood score. Furthermore, to improve forecasting of large aftershocks, we apply a nonparametric (NP) model using magnitude data from the learning period and compare its forecasting performance with that of the Gutenberg-Richter (G-R) formula. We show that the NP forecast performs better than the G-R formula in some cases but worse in others. Robust forecasting can therefore be obtained by employing an ensemble forecast that combines the two complementary forecasts. Our proposed method is useful for a stable, unbiased intermediate-term assessment of aftershock probabilities.
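The contrast between plug-in and Bayesian forecasting can be sketched as follows. The Omori-Utsu rate is used here as a simplified stand-in for the full ETAS model, and the "posterior" parameter samples are synthetic rather than estimated from real early aftershock data.

```python
import numpy as np

def omori_rate(t, K, c, p):
    return K / (t + c) ** p

rng = np.random.default_rng(1)
# Hypothetical posterior samples for (K, c, p), e.g. from MCMC on 1-day data.
K = rng.lognormal(np.log(100.0), 0.3, 1000)
c = rng.lognormal(np.log(0.05), 0.3, 1000)
p = rng.normal(1.1, 0.05, 1000)

t = np.linspace(1.0, 30.0, 291)                       # days after the main shock
plugin = omori_rate(t, K.mean(), c.mean(), p.mean())  # single "best" parameter set
bayes = omori_rate(t[:, None], K, c, p).mean(axis=1)  # forecast averaged over samples
print("day-30 rate, plug-in vs Bayesian:", plugin[-1].round(2), bayes[-1].round(2))
```

Because the rate is a nonlinear function of the parameters, the averaged (Bayesian) forecast generally differs from the plug-in forecast, which is the effect the study exploits.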
Subbotin, Sergei A; Ragsdale, Erik J; Mullens, Teresa; Roberts, Philip A; Mundo-Ocampo, Manuel; Baldwin, James G
2008-08-01
The root lesion nematodes of the genus Pratylenchus Filipjev, 1936 are migratory endoparasites of plant roots, considered among the most widespread and important nematode parasites of a variety of crops. We obtained partial sequences of the D2 and D3 expansion segments of 28S rRNA and of 18S rRNA from 31 populations belonging to 11 valid and two unidentified species of root lesion nematodes and five outgroup taxa. These datasets were analyzed using maximum parsimony and Bayesian inference. The alignments were generated using secondary structure models for these molecules and analyzed with Bayesian inference under both standard models and a complex model treating helices under the doublet model and loops and bulges under the general time reversible model. The phylogenetic informativeness of morphological characters was tested by reconstructing their histories on the rRNA-based trees using parallel parsimony and Bayesian approaches. Phylogenetic and sequence analyses of the 28S D2-D3 dataset, with 145 accessions for 28 species, and the 18S dataset, with 68 accessions for 15 species, confirmed across large numbers of geographically diverse isolates that most classical morphospecies are monophyletic. Phylogenetic analyses revealed at least six distinct major clades of the examined Pratylenchus species, and these clades are generally congruent with those defined by characters derived from lip patterns, numbers of lip annules, and spermatheca shape. The morphological results suggest the need for sophisticated character discovery and analysis for morphology-based phylogenetics in nematodes.
ERIC Educational Resources Information Center
Dawson, Colin; Gerken, LouAnn
2011-01-01
While many constraints on learning must be relatively experience-independent, past experience provides a rich source of guidance for subsequent learning. Discovering structure in some domain can inform a learner's future hypotheses about that domain. If a general property accounts for particular sub-patterns, a rational learner should not…
Community of Priors: A Bayesian Approach to Consensus Building
ERIC Educational Resources Information Center
Hara, Motoaki
2010-01-01
Despite having drawn from empirical evidence and cumulative prior expertise in the formulation of research questions as well as study design, each study is treated as a stand-alone product rather than positioned within a sequence of cumulative evidence. While results of prior studies are typically cited within the body of prior literature review,…
Salas-Leiva, Dayana E; Meerow, Alan W; Calonje, Michael; Griffith, M Patrick; Francisco-Ortega, Javier; Nakamura, Kyoko; Stevenson, Dennis W; Lewis, Carl E; Namoff, Sandra
2013-11-01
Despite a recent new classification, a stable phylogeny for the cycads has been elusive, particularly regarding resolution of Bowenia, Stangeria and Dioon. In this study, five single-copy nuclear genes (SCNGs) are applied to the phylogeny of the order Cycadales. The specific aim is to evaluate several gene tree-species tree reconciliation approaches for developing an accurate phylogeny of the order, to contrast them with concatenated parsimony analysis and to resolve the erstwhile problematic phylogenetic position of these three genera. DNA sequences of five SCNGs were obtained for 20 cycad species representing all ten genera of Cycadales. These were analysed with parsimony, maximum likelihood (ML) and three Bayesian methods of gene tree-species tree reconciliation, using Cycas as the outgroup. A calibrated date estimation was developed with Bayesian methods, and biogeographic analysis was also conducted. Concatenated parsimony, ML and three species tree inference methods resolve exactly the same tree topology with high support at most nodes. Dioon and Bowenia are the first and second branches of Cycadales after Cycas, respectively, followed by an encephalartoid clade (Macrozamia-Lepidozamia-Encephalartos), which is sister to a zamioid clade, of which Ceratozamia is the first branch, and in which Stangeria is sister to Microcycas and Zamia. A single, well-supported phylogenetic hypothesis of the generic relationships of the Cycadales is presented. However, massive extinction events inferred from the fossil record that eliminated broader ancestral distributions within Zamiaceae compromise accurate optimization of ancestral biogeographical areas for that hypothesis. While major lineages of Cycadales are ancient, crown ages of all modern genera are no older than 12 million years, supporting a recent hypothesis of mostly Miocene radiations. This phylogeny can contribute to an accurate infrafamilial classification of Zamiaceae.
Lorenz, Romy; Monti, Ricardo Pio; Violante, Inês R.; Anagnostopoulos, Christoforos; Faisal, Aldo A.; Montana, Giovanni; Leech, Robert
2016-01-01
Functional neuroimaging typically explores how a particular task activates a set of brain regions. Importantly though, the same neural system can be activated by inherently different tasks. To date, there is no approach available that systematically explores whether and how distinct tasks probe the same neural system. Here, we propose and validate an alternative framework, the Automatic Neuroscientist, which turns the standard fMRI approach on its head. We use real-time fMRI in combination with modern machine-learning techniques to automatically design the optimal experiment to evoke a desired target brain state. In this work, we present two proof-of-principle studies involving perceptual stimuli. In both studies optimization algorithms of varying complexity were employed; the first involved a stochastic approximation method while the second incorporated a more sophisticated Bayesian optimization technique. In the first study, we achieved convergence for the hypothesized optimum in 11 out of 14 runs in less than 10 min. Results of the second study showed how our closed-loop framework accurately and with high efficiency estimated the underlying relationship between stimuli and neural responses for each subject in one to two runs, with each run lasting 6.3 min. Moreover, we demonstrate that using only the first run produced a reliable solution at the group level. Supporting simulation analyses provided evidence on the robustness of the Bayesian optimization approach for scenarios with low contrast-to-noise ratio. This framework is generalizable to numerous applications, ranging from optimizing stimuli in neuroimaging pilot studies to tailoring clinical rehabilitation therapy to patients and can be used with multiple imaging modalities in humans and animals. PMID:26804778
Barcoding Neotropical birds: assessing the impact of nonmonophyly in a highly diverse group.
Chaves, Bárbara R N; Chaves, Anderson V; Nascimento, Augusto C A; Chevitarese, Juliana; Vasconcelos, Marcelo F; Santos, Fabrício R
2015-07-01
In this study, we verified the power of DNA barcodes to discriminate Neotropical birds using Bayesian tree reconstructions of a total of 7404 COI sequences from 1521 species, including 55 Brazilian species with no previous barcode data. We found that 10.4% of species were nonmonophyletic, most likely due to inaccurate taxonomy, incomplete lineage sorting or hybridization. At least 0.5% of the sequences (2.5% of the sampled species) retrieved from GenBank were associated with database errors (poor-quality sequences, NuMTs, misidentification or unnoticed hybridization). Paraphyletic species (5.8% of the total) can be related to rapid speciation events leading to nonreciprocal monophyly between recently diverged sister species, or to absence of synapomorphies in the small COI region analysed. We also performed two series of genetic distance calculations under the K2P model for intraspecific and interspecific comparisons: the first included all COI sequences, and the second included only monophyletic taxa observed in the Bayesian trees. As expected, the mean and median pairwise distances were smaller for intraspecific than for interspecific comparisons. However, there was no precise 'barcode gap', which was shown to be larger in the monophyletic taxon data set than for the data from all species, as expected. Our results indicated that although database errors may explain some of the difficulties in the species discrimination of Neotropical birds, distance-based barcode assignment may also be compromised because of the high diversity of bird species and more complex speciation events in the Neotropics. © 2014 John Wiley & Sons Ltd.
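For reference, the K2P distance used in these intra- and interspecific comparisons can be computed directly from an aligned sequence pair. The sequences below are short toy examples rather than COI barcodes.

```python
import math

PURINES = {"A", "G"}

def k2p(seq1, seq2):
    """Kimura two-parameter distance for an aligned pair of DNA sequences."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    # Transitions: purine<->purine or pyrimidine<->pyrimidine substitutions.
    ts = sum(1 for a, b in pairs if a != b and (a in PURINES) == (b in PURINES))
    # Transversions: purine<->pyrimidine substitutions.
    tv = sum(1 for a, b in pairs if a != b and (a in PURINES) != (b in PURINES))
    P, Q = ts / n, tv / n
    # d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))

print(k2p("ACGTACGTAC", "ACGTACGCAC"))  # one transition (T->C) at 1 of 10 sites
```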
Zhou, Xiaoming; Chan, Paul K. S.; Tam, John S.; Tang, Julian W.
2011-01-01
Background Hepatitis C virus (HCV) 6a accounts for 23.6% of all HCV infections in the general population and 58.5% among intravenous drug users in Hong Kong. However, the geographical origin of this highly predominant HCV subgenotype is largely unknown. This study explores a hypothesis for one possible transmission route of HCV 6a to Hong Kong. Methods NS5A sequences were derived from 26 HCV 6a samples collected over a five-year period (1999–2004) from epidemiologically unrelated patients in Hong Kong. Partial NS5A sequences (513 bp, from nt 6728 to 7240) were used for Bayesian coalescent analysis to reconstruct the evolutionary history of HCV infections in Hong Kong with the BEAST v1.3 program. A rooted phylogenetic tree was drawn for these sequences by alignment with reference Vietnamese sequences. Demographic data were obtained from "The Statistic Yearbooks of Hong Kong". Results Bayesian coalescent analysis showed that the rapid increase in 6a infections, more than 90-fold in Hong Kong from 1986 to 1994, correlated with two peaks of Vietnamese immigration to Hong Kong between 1978 and 1997. The second peak, from 1987 through 1997, overlapped with the rapid increase of HCV 6a occurrence in Hong Kong. Phylogenetic analyses further revealed that HCV 6a strains from Vietnam may be ancestral to their Hong Kong counterparts. Conclusions The high predominance of HCV 6a infections in Hong Kong was possibly associated with Vietnamese immigration during 1987–1997. PMID:21931867
An experimental phylogeny to benchmark ancestral sequence reconstruction
Randall, Ryan N.; Radford, Caelan E.; Roof, Kelsey A.; Natarajan, Divya K.; Gaucher, Eric A.
2016-01-01
Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution. One criticism of the approach is an inability to validate its algorithms within a biological context, as opposed to a computer simulation. Here we build an experimental phylogeny using the gene of a single red fluorescent protein to address this criticism. The evolved phylogeny consists of 19 operational taxonomic units (leaves) and 17 ancestral bifurcations (nodes) that display a wide variety of fluorescent phenotypes. The 19 leaves then serve as 'modern' sequences that we subject to ASR analyses using various algorithms and benchmark against the known ancestral genotypes and ancestral phenotypes. We confirm computer simulations showing that all algorithms infer ancient sequences with high accuracy, yet we also reveal wide variation in the phenotypes encoded by incorrectly inferred sequences. Specifically, Bayesian methods incorporating rate variation significantly outperform the maximum parsimony criterion in phenotypic accuracy. Subsampling of extant sequences had a minor effect on the inference of ancestral sequences. PMID:27628687
Improvement of Storm Forecasts Using Gridded Bayesian Linear Regression for Northeast United States
NASA Astrophysics Data System (ADS)
Yang, J.; Astitha, M.; Schwartz, C. S.
2017-12-01
Bayesian linear regression (BLR) is a post-processing technique in which regression coefficients are derived and used to correct raw forecasts based on pairs of observation-model values. This study presents the development and application of gridded Bayesian linear regression (GBLR) as a new post-processing technique to improve numerical weather prediction (NWP) of rain and wind storm forecasts over the northeastern United States. Ten control variables produced from ten ensemble members of the National Center for Atmospheric Research (NCAR) real-time prediction system are used for the GBLR model. In the GBLR framework, leave-one-storm-out cross-validation is utilized to study the performance of the post-processing technique on a database composed of 92 storms. To estimate the regression coefficients of the GBLR, optimization procedures that minimize the systematic and random error of predicted atmospheric variables (wind speed, precipitation, etc.) are implemented for the modeled-observed pairs of training storms. The regression coefficients calculated for meteorological stations of the National Weather Service are interpolated back to the model domain. An analysis of forecast improvements based on error reductions during the storms demonstrates the value of the GBLR approach. This presentation will also illustrate how the variances are optimized for the training partition in GBLR and discuss the verification strategy for grid points where no observations are available. The new post-processing technique is successful in improving wind speed and precipitation storm forecasts using past event-based data and has the potential to be implemented in real time.
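A minimal sketch of the BLR post-processing idea, on synthetic observation-model pairs: coefficients relating raw forecasts to observations are learned under a Gaussian prior, then applied to correct a new raw forecast. The prior and noise variances are illustrative; the study additionally interpolates station coefficients back to the model grid.

```python
import numpy as np

rng = np.random.default_rng(2)
raw = rng.uniform(5, 25, 200)                      # raw forecast wind speeds (m/s)
obs = 0.8 * raw + 1.5 + rng.normal(0, 1.0, 200)    # matched "observations"

X = np.column_stack([np.ones_like(raw), raw])      # intercept + raw forecast
tau2, sigma2 = 10.0, 1.0                           # prior variance, noise variance
# Posterior mean of coefficients: (X'X/sigma2 + I/tau2)^-1 X'y / sigma2
A = X.T @ X / sigma2 + np.eye(2) / tau2
beta = np.linalg.solve(A, X.T @ obs / sigma2)

new_raw = 18.0
print("corrected forecast:", round(beta[0] + beta[1] * new_raw, 2))
```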
Nonlinear dynamical modes of climate variability: from curves to manifolds
NASA Astrophysics Data System (ADS)
Gavrilov, Andrey; Mukhin, Dmitry; Loskutov, Evgeny; Feigin, Alexander
2016-04-01
The necessity of efficient dimensionality reduction methods that capture the dynamical properties of a system from observed data is evident. A recent study showed that nonlinear dynamical mode (NDM) expansion is able to solve this problem and provide adequate phase variables in climate data analysis [1]. A single NDM is a logical extension of a linear spatio-temporal structure (such as an empirical orthogonal function pattern): it is constructed as a nonlinear transformation of a hidden scalar time series to the space of observed variables, i.e., a projection of the observed dataset onto a nonlinear curve. Both the hidden time series and the parameters of the curve are learned simultaneously using a Bayesian approach. The only prior information about the hidden signal is the assumption of its smoothness. The optimal nonlinearity degree and smoothness are found using the Bayesian evidence technique. In this work we extend the approach further and look for vector hidden signals instead of scalar ones, with the same smoothness restriction. As a result we resolve multidimensional manifolds instead of sums of curves. The dimension of the hidden manifold is also optimized using Bayesian evidence. The efficiency of the extension is demonstrated on model examples, and results of application to climate data are presented and discussed. The study is supported by the Government of the Russian Federation (agreement #14.Z50.31.0033 with the Institute of Applied Physics of RAS). 1. Mukhin, D., Gavrilov, A., Feigin, A., Loskutov, E., & Kurths, J. (2015). Principal nonlinear dynamical modes of climate variability. Scientific Reports, 5, 15510. http://doi.org/10.1038/srep15510
Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui
2017-01-01
Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction and is an important field in systems biology. Traditional Bayesian networks have high computational complexity, and their network structure scoring models use a single feature; information-based approaches cannot identify the direction of regulation. To make up for these shortcomings, this paper presents a novel hybrid learning method (DBNCS), based on dynamic Bayesian networks (DBNs), to construct multiple time-delayed GRNs for the first time, combining a comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm to learn network structure profiles, i.e., to construct the search space. Redundant regulations are then removed using a recursive optimization (RO) algorithm, thereby reducing the false-positive rate. Next, the network structure profiles are decomposed without loss into a set of cliques, which significantly reduces the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and to search for the optimal network structure. The performance of the DBNCS algorithm is evaluated on benchmark GRN datasets from the DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results demonstrate the soundness of the algorithm design and its outstanding performance in reconstructing GRNs. PMID:29113310
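The flavour of the first DBNCS step can be sketched with a much simpler lagged-dependence screen. Absolute lagged correlation is used here as a cheap stand-in for the conditional mutual inclusive information of CMI2NI, and the expression time series are simulated with one planted regulation.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_genes = 100, 5
expr = rng.normal(size=(T, n_genes))
expr[1:, 2] += 0.8 * expr[:-1, 0]            # planted regulation: gene 0 -> gene 2

def lag_score(x, y, lag=1):
    """Absolute correlation between x(t) and y(t+lag)."""
    return abs(np.corrcoef(x[:-lag], y[lag:])[0, 1])

edges = [(i, j, lag_score(expr[:, i], expr[:, j]))
         for i in range(n_genes) for j in range(n_genes) if i != j]
edges.sort(key=lambda e: -e[2])
print(edges[:3])   # the planted edge (0, 2, ...) should rank near the top
```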
NASA Astrophysics Data System (ADS)
Davis, A. D.; Huan, X.; Heimbach, P.; Marzouk, Y.
2017-12-01
Borehole data are essential for calibrating ice sheet models. However, field expeditions for acquiring borehole data are often time-consuming, expensive, and dangerous. It is thus essential to plan the sampling locations that maximize the value of data while minimizing costs and risks. We present an uncertainty quantification (UQ) workflow based on a rigorous probabilistic framework to achieve these objectives. First, we employ an optimal experimental design (OED) procedure to compute borehole locations that yield the highest expected information gain. We take into account practical considerations of location accessibility (e.g., proximity to research sites, terrain, and ice velocity may affect the feasibility of drilling) and robustness (e.g., real-time constraints such as weather may force researchers to drill at sub-optimal locations near those originally planned) by incorporating a penalty reflecting accessibility as well as sensitivity to deviations from the optimal locations. Next, we extract vertical temperature profiles from these boreholes and formulate a Bayesian inverse problem to reconstruct past surface temperatures. Using a model of temperature advection/diffusion, the top boundary condition (corresponding to surface temperatures) is calibrated via efficient Markov chain Monte Carlo (MCMC). The overall procedure can then be iterated to choose new optimal borehole locations for the next expeditions. Through this work, we demonstrate powerful UQ methods for designing experiments, calibrating models, making predictions, and assessing sensitivity, all performed under an uncertain environment. We develop a theoretical framework as well as practical software within an intuitive workflow, and illustrate their usefulness for combining data and models for environmental and climate research.
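A minimal sketch of the design step, reduced to a linear-Gaussian toy problem where the expected information gain (EIG) has a closed form: each candidate location measures an unknown scalar (a past surface temperature, say) with a location-dependent sensitivity, and a penalty term models accessibility. Sensitivities, noise levels, and penalties are invented.

```python
import numpy as np

prior_var, noise_var = 4.0, 1.0
# Hypothetical sensitivity of the borehole temperature profile to the unknown
# at each of five candidate drilling locations.
sensitivity = np.array([0.2, 0.9, 1.5, 0.7, 0.4])

# For a linear-Gaussian model, EIG = 0.5 * log(1 + s^2 * prior_var / noise_var).
eig = 0.5 * np.log(1 + sensitivity**2 * prior_var / noise_var)

# Penalize hard-to-access sites, as the workflow above advocates.
access_penalty = np.array([0.0, 0.1, 0.8, 0.2, 0.1])
score = eig - access_penalty
print("best location:", int(np.argmax(score)), score.round(3))
```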
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Heng, E-mail: hengli@mdanderson.org; Zhu, X. Ronald; Zhang, Xiaodong
Purpose: To develop and validate a novel delivery strategy for reducing the respiratory motion-induced dose uncertainty of spot-scanning proton therapy. Methods and Materials: The spot delivery sequence was optimized to reduce dose uncertainty. The effectiveness of the delivery sequence optimization was evaluated using measurements and patient simulation. One hundred ninety-one 2-dimensional measurements using different delivery sequences of a single-layer uniform pattern were obtained with a detector array on a 1-dimensional moving platform. Intensity modulated proton therapy plans were generated for 10 lung cancer patients, and dose uncertainties for different delivery sequences were evaluated by simulation. Results: Without delivery sequence optimization, the maximum absolute dose error can be up to 97.2% in a single measurement, whereas the optimized delivery sequence results in a maximum absolute dose error of ≤11.8%. In patient simulation, the optimized delivery sequence reduces the mean of fractional maximum absolute dose error compared with the regular delivery sequence by 3.3% to 10.6% (32.5-68.0% relative reduction) for different patients. Conclusions: Optimizing the delivery sequence can reduce dose uncertainty due to respiratory motion in spot-scanning proton therapy, assuming the 4-dimensional CT is a true representation of the patients' breathing patterns.
Wavelet-Bayesian inference of cosmic strings embedded in the cosmic microwave background
NASA Astrophysics Data System (ADS)
McEwen, J. D.; Feeney, S. M.; Peiris, H. V.; Wiaux, Y.; Ringeval, C.; Bouchet, F. R.
2017-12-01
Cosmic strings are a well-motivated extension to the standard cosmological model and could induce a subdominant component in the anisotropies of the cosmic microwave background (CMB), in addition to the standard inflationary component. The detection of strings, while observationally challenging, would provide a direct probe of physics at very high-energy scales. We develop a framework for cosmic string inference from observations of the CMB made over the celestial sphere, performing a Bayesian analysis in wavelet space where the string-induced CMB component has distinct statistical properties to the standard inflationary component. Our wavelet-Bayesian framework provides a principled approach to compute the posterior distribution of the string tension Gμ and the Bayesian evidence ratio comparing the string model to the standard inflationary model. Furthermore, we present a technique to recover an estimate of any string-induced CMB map embedded in observational data. Using Planck-like simulations, we demonstrate the application of our framework and evaluate its performance. The method is sensitive to Gμ ∼ 5 × 10⁻⁷ for Nambu-Goto string simulations that include an integrated Sachs-Wolfe contribution only and do not include any recombination effects, before any parameters of the analysis are optimized. The sensitivity of the method compares favourably with other techniques applied to the same simulations.
NASA Astrophysics Data System (ADS)
Kiyan, Duygu; Rath, Volker; Delhaye, Robert
2017-04-01
The frequency- and time-domain airborne electromagnetic (AEM) data collected under the Tellus projects of the Geological Survey of Ireland (GSI) represent a wealth of information on the multi-dimensional electrical structure of Ireland's near-surface. Our project, funded by the GSI under the framework of their Short Call Research Programme, aims to develop and implement inverse techniques based on various Bayesian methods for these densely sampled data. We have developed a highly flexible toolbox in Python for the one-dimensional inversion of AEM data along the flight lines. The computational core is an adapted frequency- and time-domain forward modelling engine derived from the well-tested open-source code AirBeo, which was developed by the CSIRO (Australia) and the AMIRA consortium. Three different inversion methods have been implemented: (i) Tikhonov-type inversion including optimal regularisation methods (Aster et al., 2012; Zhdanov, 2015); (ii) Bayesian MAP inversion in parameter and data space (e.g. Tarantola, 2005); and (iii) full Bayesian inversion with Markov Chain Monte Carlo (Sambridge and Mosegaard, 2002; Mosegaard and Sambridge, 2002), all including different forms of spatial constraints. The methods have been tested on synthetic and field data. This contribution will introduce the toolbox and present case studies on the AEM data from the Tellus projects.
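A minimal sketch of the Tikhonov option on a toy linear forward model: damped least squares with a second-difference roughness operator. The operator and data are synthetic; the actual toolbox wraps the AirBeo frequency- and time-domain forward code rather than a linear map.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30                                   # layers of a 1D conductivity model
G = rng.normal(size=(20, n))             # toy linear forward operator
m_true = np.sin(np.linspace(0, 3, n))
d = G @ m_true + rng.normal(0, 0.05, 20)

L = np.diff(np.eye(n), 2, axis=0)        # second-difference roughness operator
lam = 1.0                                # regularization weight
# Solve (G'G + lam L'L) m = G'd
m_est = np.linalg.solve(G.T @ G + lam * L.T @ L, G.T @ d)
print("model misfit:", round(np.linalg.norm(m_est - m_true), 3))
```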
Bayesian integration and non-linear feedback control in a full-body motor task.
Stevenson, Ian H; Fernandes, Hugo L; Vilares, Iris; Wei, Kunlin; Körding, Konrad P
2009-12-01
A large number of experiments have asked to what degree human reaching movements can be understood as being close to optimal in a statistical sense. However, little is known about whether these principles are relevant for other classes of movements. Here we analyzed movement in a task that is similar to surfing or snowboarding. Human subjects stand on a force plate that measures their center of pressure. This center of pressure affects the acceleration of a cursor that is displayed in a noisy fashion (as a cloud of dots) on a projection screen while the subject is incentivized to keep the cursor close to a fixed position. We find that salient aspects of observed behavior are well-described by optimal control models where a Bayesian estimation model (Kalman filter) is combined with an optimal controller (either a Linear-Quadratic-Regulator or Bang-bang controller). We find evidence that subjects integrate information over time taking into account uncertainty. However, behavior in this continuous steering task appears to be a highly non-linear function of the visual feedback. While the nervous system appears to implement Bayes-like mechanisms for a full-body, dynamic task, it may additionally take into account the specific costs and constraints of the task.
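A minimal sketch of the model class considered above, under invented dynamics, noise, and cost parameters: a Kalman filter tracks a 1D cursor's state from noisy position observations, and a Linear-Quadratic Regulator acts on the filtered estimate (certainty-equivalent control).

```python
import numpy as np
from scipy.linalg import solve_discrete_are

dt = 0.05
A = np.array([[1, dt], [0, 1]])            # position/velocity dynamics
B = np.array([[0], [dt]])                  # control sets acceleration
C = np.array([[1.0, 0.0]])                 # noisy position observations
Q, R = np.diag([1.0, 0.1]), np.array([[0.01]])   # LQR state/control costs
W, V = 1e-4 * np.eye(2), np.array([[0.05]])      # process/observation noise

# LQR gain from the discrete algebraic Riccati equation.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

rng = np.random.default_rng(5)
x = np.array([1.0, 0.0])                   # true state, starts off-target
xh, S = np.zeros(2), np.eye(2)             # filter estimate and covariance
for _ in range(100):
    u = -K @ xh                            # act on the estimate, not the state
    x = A @ x + B.flatten() * u + rng.multivariate_normal([0, 0], W)
    y = C @ x + rng.normal(0, np.sqrt(V[0, 0]))
    # Kalman predict / update
    xh, S = A @ xh + B.flatten() * u, A @ S @ A.T + W
    G = S @ C.T @ np.linalg.inv(C @ S @ C.T + V)
    xh = xh + (G @ (y - C @ xh)).flatten()
    S = (np.eye(2) - G @ C) @ S
print("final position:", round(x[0], 3))   # should be near 0
```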
Bayesian segmentation of atrium wall using globally-optimal graph cuts on 3D meshes.
Veni, Gopalkrishna; Fu, Zhisong; Awate, Suyash P; Whitaker, Ross T
2013-01-01
Efficient segmentation of the left atrium (LA) wall from delayed enhancement MRI is challenging due to inconsistent contrast, combined with noise, and high variation in atrial shape and size. We present a surface-detection method that is capable of extracting the atrial wall by computing an optimal a-posteriori estimate. This estimation is done on a set of nested meshes, constructed from an ensemble of segmented training images, and graph cuts on an associated multi-column, proper-ordered graph. The graph/mesh is a part of a template/model that has an associated set of learned intensity features. When this mesh is overlaid onto a test image, it produces a set of costs which lead to an optimal segmentation. The 3D mesh has an associated weighted, directed multi-column graph with edges that encode smoothness and inter-surface penalties. Unlike previous graph-cut methods that impose hard constraints on the surface properties, the proposed method follows from a Bayesian formulation resulting in soft penalties on spatial variation of the cuts through the mesh. The novelty of this method also lies in the construction of proper-ordered graphs on complex shapes for choosing among distinct classes of base shapes for automatic LA segmentation. We evaluate the proposed segmentation framework on simulated and clinical cardiac MRI.
Esfahani, Mohammad Shahrokh; Dougherty, Edward R
2015-01-01
Phenotype classification via genomic data is hampered by small sample sizes that negatively impact classifier design. Utilization of prior biological knowledge in conjunction with training data can improve both classifier design and error estimation via the construction of the optimal Bayesian classifier. In the genomic setting, gene/protein signaling pathways provide a key source of biological knowledge. Although these pathways are neither complete nor regulatory, and have no timing associated with them, they are capable of constraining the set of possible models representing the underlying interaction between molecules. The aim of this paper is to provide a framework and the mathematical tools to transform signaling pathways into prior probabilities governing uncertainty classes of feature-label distributions used in classifier design. Structural motifs extracted from the signaling pathways are mapped to a set of constraints on a prior probability on a Multinomial distribution. Since the Dirichlet distribution is the conjugate prior of the Multinomial, we propose optimization paradigms to estimate the parameters of a Dirichlet distribution in the Bayesian setting. The performance of the proposed methods is tested on two widely studied pathways: the mammalian cell cycle and a p53 pathway model.
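A toy two-gene sketch of the core idea: a pathway motif ("A activates B") is turned into an inequality constraint on the mean of a Dirichlet prior over the joint binary states, and the Dirichlet parameters are then chosen by constrained optimization. Both the encoding and the confidence-maximizing objective are illustrative stand-ins for the optimization paradigms developed in the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Joint states of (A, B): 00, 01, 10, 11; alpha parameterizes Dirichlet(alpha).
def neg_prior_strength(alpha):
    return -alpha.sum()            # maximize total concentration (prior confidence)

def activation_constraint(alpha):
    p = alpha / alpha.sum()        # Dirichlet mean over the four states
    # "A activates B": agreeing states (00, 11) outweigh disagreeing ones by 0.3.
    return (p[0] + p[3]) - (p[1] + p[2]) - 0.3

res = minimize(neg_prior_strength, x0=np.ones(4),
               constraints=[{"type": "ineq", "fun": activation_constraint}],
               bounds=[(0.1, 10.0)] * 4)
print("alpha:", res.x.round(2), " mean:", (res.x / res.x.sum()).round(3))
```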
Applications of Bayesian spectrum representation in acoustics
NASA Astrophysics Data System (ADS)
Botts, Jonathan M.
This dissertation utilizes a Bayesian inference framework to enhance the solution of inverse problems where the forward model maps to acoustic spectra. A Bayesian solution to filter design inverts acoustic spectra to pole-zero locations of a discrete-time filter model. Spatial sound field analysis with a spherical microphone array is a data analysis problem that requires inversion of spatio-temporal spectra to directions of arrival. As with many inverse problems, a probabilistic analysis results in richer solutions than can be achieved with ad-hoc methods. In the filter design problem, the Bayesian inversion results in globally optimal coefficient estimates as well as an estimate of the most concise filter capable of representing the given spectrum, within a single framework. This approach is demonstrated on synthetic spectra, head-related transfer function spectra, and measured acoustic reflection spectra. The Bayesian model-based analysis of spatial room impulse responses is presented as an analogous problem with an equally rich solution. The model selection mechanism provides an estimate of the number of arrivals, which is necessary to properly infer the directions of simultaneous arrivals. Although spectrum-inversion problems are fairly ubiquitous, the scope of this dissertation has been limited to these two and derivative problems. The Bayesian approach to filter design is demonstrated on an artificial spectrum to illustrate the model comparison mechanism and then on measured head-related transfer functions to show the potential range of application. Coupled with sampling methods, the Bayesian approach is shown to outperform least-squares filter design methods commonly used in commercial software, confirming the need for a global search of the parameter space. The resulting designs are shown to be comparable to those that result from global optimization methods, but the Bayesian approach has the added advantage of a filter length estimate within the same unified framework. The application to reflection data is useful for representing frequency-dependent impedance boundaries in finite difference acoustic simulations. Furthermore, since the filter transfer function is a parametric model, it can be modified to incorporate arbitrary frequency weighting and account for the band-limited nature of measured reflection spectra. Finally, the model is modified to compensate for dispersive error in the finite difference simulation during the filter design process. Stemming from the filter boundary problem, the implementation of pressure sources in finite difference simulation is addressed in order to assure that schemes properly converge. A class of parameterized source functions is proposed and shown to offer straightforward control of residual error in the simulation. Guided by the notion that the solution to be approximated affects the approximation error, sources are designed which reduce residual dispersive error to the size of round-off errors. The early part of a room impulse response can be characterized by a series of isolated plane waves. Measured with an array of microphones, plane waves map to a directional response of the array or a spatial intensity map. Probabilistic inversion of this response results in estimates of the number and directions of image source arrivals. The model-based inversion is shown to avoid ambiguities associated with peak-finding or inspection of the spatial intensity map.
For this problem, determining the number of arrivals in a given frame is critical for properly inferring the state of the sound field. This analysis is effectively compression of the spatial room response, which is useful for analysis or encoding of the spatial sound field. Parametric, model-based formulations of these problems enhance the solution in all cases, and a Bayesian interpretation provides a principled approach to model comparison and parameter estimation.
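A minimal sketch of spectrum-to-filter inversion in this spirit: fit the coefficients of a small pole-zero (biquad) model so that its magnitude response matches a target spectrum, via nonlinear least squares. The target spectrum is a toy one-pole response, and plain least squares stands in for the full Bayesian model comparison over filter orders described in the dissertation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.signal import freqz

w = np.linspace(0.01, np.pi, 256)                  # frequencies, rad/sample
target = 1.0 / np.abs(1 - 0.9 * np.exp(-1j * w))   # toy one-pole target spectrum

def residual(x):
    b = x[:2]                                      # numerator (zeros)
    a = np.concatenate([[1.0], x[2:]])             # denominator (poles)
    _, h = freqz(b, a, worN=w)
    return np.abs(h) - target

fit = least_squares(residual, x0=[1.0, 0.0, -0.5, 0.0])
print("fitted b0, b1, a1, a2:", fit.x.round(3))    # should approach 1, 0, -0.9, 0
```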
Wellehan, James F.X.; Pessier, Allan P.; Archer, Linda L.; Childress, April L.; Jacobson, Elliott R.; Tesh, Robert B.
2012-01-01
Rhabdoviruses infect a variety of hosts, including non-avian reptiles. Consensus PCR techniques were used to obtain partial RNA-dependent RNA polymerase gene sequences from five rhabdoviruses of South American lizards: Marco, Chaco, Timbo, Sena Madureira, and a rhabdovirus from a caiman lizard (Dracaena guianensis). The caiman lizard rhabdovirus formed inclusions in erythrocytes, which may be a route for infecting hematophagous insects. This is the first information on the behavior of a rhabdovirus in squamates. We also obtained sequence from two rhabdoviruses of Australian lizards, confirming previous Charleville virus sequence and finding that, unlike a previous sequence report but in agreement with serologic reports, Almpiwar virus is clearly distinct from Charleville virus. Bayesian and maximum likelihood phylogenetic analyses revealed that most known rhabdoviruses of squamates cluster in the Almpiwar subgroup. The exception is Marco virus, which is found in the Hart Park group. PMID:22397930
Predicting Flavonoid UGT Regioselectivity
Jackson, Rhydon; Knisley, Debra; McIntosh, Cecilia; Pfeiffer, Phillip
2011-01-01
Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of flavonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities. PMID:21747849
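A minimal sketch of the index-sequence classification idea: residue subsequences are encoded as numeric index sequences and classified by nearest neighbor under a distance between the encoded series (plain Euclidean here; the study also uses time-series distances). The index table, sequences, and regioselectivity labels are toy placeholders.

```python
import numpy as np

# Toy hydrophobicity-style index over a subset of residues.
hydro = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "G": -0.4,
         "L": 3.8, "K": -3.9, "S": -0.8, "T": -0.7, "V": 4.2}

def encode(seq):
    return np.array([hydro[a] for a in seq])

def dist(a, b):
    return np.linalg.norm(encode(a) - encode(b))

# Hypothetical training subsequences with invented regioselectivity labels.
train = {"ALVGTS": "3-O", "RKNDSG": "7-O", "ALVGKS": "3-O", "RKNDTG": "7-O"}
query = "ALVGSS"
nearest = min(train, key=lambda s: dist(s, query))
print("predicted regioselectivity:", train[nearest])
```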
Salehi, Mojtaba; Bahreininejad, Ardeshir
2011-08-01
Optimization of process planning is considered the key technology for computer-aided process planning, which is a rather complex and difficult procedure. A good process plan of a part is built up from two elements: (1) the optimized sequence of the operations of the part; and (2) the optimized selection of the machine, cutting tool and Tool Access Direction (TAD) for each operation. In the present work, process planning is divided into preliminary planning and secondary/detailed planning. In the preliminary stage, the feasible sequences are generated based on an analysis of order and clustering constraints, as a compulsive constraint aggregation in operation sequencing, using an intelligent searching strategy. Then, in the detailed planning stage, the genetic algorithm, which prunes the initial feasible sequences, yields the optimized operation sequence and the optimized selection of the machine, cutting tool and TAD for each operation, based on optimization constraints as an additive constraint aggregation. The main contribution of this work is the simultaneous optimization of the sequence of the operations of the part and of the machine, cutting tool and TAD selected for each operation, using intelligent search and a genetic algorithm.
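A minimal sketch of GA-based operation sequencing under precedence constraints: chromosomes are priority vectors, decoded by repeatedly picking the highest-priority operation whose prerequisites are complete. The operations, precedence graph, and machine-change cost are invented; the paper's formulation additionally selects machines, tools, and TADs.

```python
import random

random.seed(0)
n_ops = 8
prec = {3: [0], 4: [1], 5: [2, 3], 6: [4], 7: [5, 6]}   # op: prerequisite ops
machine = [0, 1, 0, 1, 0, 1, 0, 1]                      # machine used by each op

def decode(priority):
    """Turn a priority vector into a precedence-feasible sequence."""
    done, seq = set(), []
    while len(seq) < n_ops:
        ready = [o for o in range(n_ops) if o not in done
                 and all(p in done for p in prec.get(o, []))]
        o = max(ready, key=lambda o: priority[o])
        done.add(o); seq.append(o)
    return seq

def cost(priority):   # minimize machine changes along the decoded sequence
    s = decode(priority)
    return sum(machine[a] != machine[b] for a, b in zip(s, s[1:]))

pop = [[random.random() for _ in range(n_ops)] for _ in range(40)]
for _ in range(50):
    pop.sort(key=cost)
    elite = pop[:10]
    children = []
    for _ in range(30):
        a, b = random.sample(elite, 2)
        child = [random.choice(g) for g in zip(a, b)]   # uniform crossover
        child[random.randrange(n_ops)] = random.random()  # mutation
        children.append(child)
    pop = elite + children
pop.sort(key=cost)
print("best sequence:", decode(pop[0]), "machine changes:", cost(pop[0]))
```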
On the adaptive daily forecasting of seismic aftershock hazard
NASA Astrophysics Data System (ADS)
Ebrahimian, Hossein; Jalayer, Fatemeh; Asprone, Domenico; Lombardi, Anna Maria; Marzocchi, Warner; Prota, Andrea; Manfredi, Gaetano
2013-04-01
Post-earthquake ground motion hazard assessment is a fundamental initial step towards time-dependent seismic risk assessment for buildings in a post-main-shock environment. Operative forecasting of seismic aftershock hazard therefore forms a viable support basis for decision-making regarding search and rescue, inspection, repair, and re-occupation after a main shock. Arguably, an adaptive procedure for integrating the aftershock occurrence rate together with suitable ground motion prediction relations is key to Probabilistic Seismic Aftershock Hazard Assessment (PSAHA). In the short term, seismic hazard may vary significantly (Jordan et al., 2011), particularly after the occurrence of a high-magnitude earthquake. Hence, PSAHA requires a reliable model that is able to track the time evolution of the earthquake occurrence rates together with suitable ground motion prediction relations. This work focuses on providing adaptive daily forecasts of the mean daily rate of exceeding various spectral acceleration values (the aftershock hazard). Two well-established earthquake occurrence models suitable for daily seismicity forecasts associated with the evolution of an aftershock sequence, namely the modified Omori aftershock model and the Epidemic Type Aftershock Sequence (ETAS) model, are adopted. The parameters of the modified Omori model are updated on a daily basis using Bayesian updating based on the data provided by the ongoing aftershock sequence, following the methodology originally proposed by Jalayer et al. (2011). Bayesian updating is also used to provide sequence-based parameter estimates for a given ground motion prediction model, i.e., the aftershock events in an ongoing sequence are exploited in order to adaptively update the parameters of an existing ground motion prediction model. As a numerical example, the mean daily rates of exceeding specific spectral acceleration values are estimated adaptively for the L'Aquila 2009 aftershock catalog. The parameters of the modified Omori model are estimated adaptively, each elapsed day, using Bayesian updating based on the aftershock events that had already taken place, with the Italian generic sequence (Lolli and Gasperini 2003) as prior information. For the ETAS model, the real-time daily forecast of the spatio-temporal evolution of the L'Aquila sequence provided to the Italian Civil Protection for managing the emergency (Marzocchi and Lombardi, 2009) is utilized. Moreover, the parameters of the ground motion prediction relation proposed by Sabetta and Pugliese (1996) are updated adaptively and on a daily basis using Bayesian updating based on the ongoing aftershock sequence. Finally, the forecasted daily rates of exceeding (first-mode) spectral acceleration values are compared with observed rates of exceedance calculated from the waveforms that were actually recorded. References Jalayer, F., Asprone, D., Prota, A., Manfredi, G. (2011). A decision support system for post-earthquake reliability assessment of structures subjected to after-shocks: an application to L'Aquila earthquake, 2009. Bull. Earthquake Eng. 9(4) 997-1014. Jordan, T.H., Chen Y-T., Gasparini P., Madariaga R., Main I., Marzocchi W., Papadopoulos G., Sobolev G., Yamaoka K., and J. Zschau (2011). Operational earthquake forecasting: State of knowledge and guidelines for implementation, Ann. Geophys. 54(4) 315-391, doi 10.4401/ag-5350. Lolli, B., and P. Gasperini (2003).
Aftershocks hazard in Italy part I: Estimation of time-magnitude distribution model parameters and computation of probabilities of occurrence. Journal of Seismology 7(2) 235-257. Marzocchi, W., and A.M. Lombardi (2009). Real-time forecasting following a damaging earthquake. Geophys. Res. Lett. 36, L21302, doi 10.1029/2009GL040233. Sabetta, F., and A. Pugliese (1996). Estimation of response spectra and simulation of nonstationary earthquake ground motions. Bull. Seismol. Soc. Am. 86(2) 337-352.
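A minimal sketch of the daily Bayesian-updating ingredient: a coarse grid posterior over the modified-Omori parameters (K, p), with c fixed and a flat prior, computed from the aftershock times observed so far. The catalog is synthetic; the paper instead uses the L'Aquila sequence with a generic-sequence prior and full parameter uncertainty.

```python
import numpy as np

rng = np.random.default_rng(6)
# Synthetic aftershock times (days) from rate K/(t+c)^p, via thinning.
K_true, c, p_true, T = 80.0, 0.05, 1.1, 5.0
tt = np.linspace(1e-3, T, 200_000)
keep = rng.random(tt.size) < (K_true / (tt + c) ** p_true) * (tt[1] - tt[0])
events = tt[keep]

# Grid avoids p = 1 exactly, where the closed-form integral changes form.
Kg, pg = np.meshgrid(np.linspace(40, 120, 81), np.linspace(0.9, 1.3, 80))
N, S = events.size, np.log(events + c).sum()

def expected_count(K, p):   # integral of K/(t+c)^p over (0, T)
    return K * ((T + c) ** (1 - p) - c ** (1 - p)) / (1 - p)

# Inhomogeneous-Poisson log likelihood: sum of log rates minus expected count.
loglik = N * np.log(Kg) - pg * S - expected_count(Kg, pg)
post = np.exp(loglik - loglik.max())
post /= post.sum()
i, j = np.unravel_index(post.argmax(), post.shape)
print("MAP estimates: K =", round(Kg[i, j], 1), " p =", round(pg[i, j], 3))
```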
Reasoning and choice in the Monty Hall Dilemma (MHD): implications for improving Bayesian reasoning
Tubau, Elisabet; Aguilar-Lleyda, David; Johnson, Eric D.
2015-01-01
The Monty Hall Dilemma (MHD) is a two-step decision problem involving counterintuitive conditional probabilities. The first choice is made among three equally probable options, whereas the second choice takes place after the elimination of one of the non-selected options which does not hide the prize. Differing from most Bayesian problems, statistical information in the MHD has to be inferred, either by learning outcome probabilities or by reasoning from the presented sequence of events. This often leads to suboptimal decisions and erroneous probability judgments. Specifically, decision makers commonly develop a wrong intuition that the final probabilities are equally distributed, together with a preference for their first choice. Several studies have shown that repeated practice enhances sensitivity to the different reward probabilities, but does not facilitate correct Bayesian reasoning. However, modest improvements in probability judgments have been observed after guided explanations. To explain these dissociations, the present review focuses on two types of causes producing the observed biases: emotion-based choice biases and cognitive limitations in understanding probabilistic information. Among the latter, we identify a crucial cause of the universal difficulty in overcoming the equiprobability illusion: incomplete representation of prior and conditional probabilities. We conclude that repeated practice and/or high incentives can be effective for overcoming choice biases, but promoting an adequate partitioning of possibilities seems necessary for overcoming cognitive illusions and improving Bayesian reasoning. PMID:25873906
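For readers who want to see the counterintuitive probabilities directly, here is a short simulation of the MHD showing that switching wins about 2/3 of the time, against the equiprobability intuition of 1/2.

```python
import random

random.seed(1)

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        pick = random.randrange(3)
        # Host opens a non-chosen door that does not hide the prize.
        opened = next(d for d in range(3) if d != pick and d != prize)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == prize
    return wins / trials

print("stay:", play(False), " switch:", play(True))   # ~0.33 vs ~0.67
```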
Gerber, Brian D.; Kendall, William L.; Hooten, Mevin B.; Dubovsky, James A.; Drewien, Roderick C.
2015-01-01
Prediction is fundamental to scientific enquiry and application; however, ecologists tend to favour explanatory modelling. We discuss a predictive modelling framework to evaluate ecological hypotheses and to explore novel/unobserved environmental scenarios to assist conservation and management decision-makers. We apply this framework to develop an optimal predictive model for juvenile (<1 year old) sandhill crane Grus canadensis recruitment of the Rocky Mountain Population (RMP). We consider spatial climate predictors motivated by hypotheses of how drought across multiple time-scales and spring/summer weather affects recruitment. Our predictive modelling framework focuses on developing a single model that includes all relevant predictor variables, regardless of collinearity. This model is then optimized for prediction by controlling model complexity using a data-driven approach that marginalizes or removes irrelevant predictors from the model. Specifically, we highlight two approaches of statistical regularization, Bayesian least absolute shrinkage and selection operator (LASSO) and ridge regression. Our optimal predictive Bayesian LASSO and ridge regression models were similar and on average 37% superior in predictive accuracy to an explanatory modelling approach. Our predictive models confirmed a priori hypotheses that drought and cold summers negatively affect juvenile recruitment in the RMP. The effects of long-term drought can be alleviated by short-term wet spring–summer months; however, the alleviation of long-term drought has a much greater positive effect on juvenile recruitment. The number of freezing days and snowpack during the summer months can also negatively affect recruitment, while spring snowpack has a positive effect. Breeding habitat, mediated through climate, is a limiting factor on population growth of sandhill cranes in the RMP, which could become more limiting with a changing climate (i.e. increased drought). These effects are likely not unique to cranes. The alteration of hydrological patterns and water levels by drought may impact many migratory, wetland nesting birds in the Rocky Mountains and beyond. Generalizable predictive models (trained by out-of-sample fit and based on ecological hypotheses) are needed by conservation and management decision-makers. Statistical regularization improves predictions and provides a general framework for fitting models with a large number of predictors, even those with collinearity, to simultaneously identify an optimal predictive model while conducting rigorous Bayesian model selection. Our framework is important for understanding population dynamics under a changing climate and has direct applications for making harvest and habitat management decisions.
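A minimal sketch of the regularized-prediction idea on synthetic, deliberately collinear covariates: scikit-learn's BayesianRidge (ridge-type shrinkage with hyperparameters learned from the data) is compared out-of-sample against ordinary least squares. The data are invented stand-ins for the drought/weather predictors and recruitment index.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 40                                   # few "years" of observations
base = rng.normal(size=(n, 3))           # three underlying climate signals
# Fifteen collinear predictors: five noisy copies of each signal.
X = np.column_stack([base + 0.3 * rng.normal(size=(n, 3)) for _ in range(5)])
y = base[:, 0] - 0.5 * base[:, 1] + 0.3 * rng.normal(size=n)   # recruitment index

for model in (LinearRegression(), BayesianRidge()):
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(type(model).__name__, round(mse, 3))   # shrinkage should predict better
```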
AN OPTIMAL MAINTENANCE MANAGEMENT MODEL FOR AIRPORT CONCRETE PAVEMENT
NASA Astrophysics Data System (ADS)
Shimomura, Taizo; Fujimori, Yuji; Kaito, Kiyoyuki; Obama, Kengo; Kobayashi, Kiyoshi
In this paper, an optimal management model is formulated for performance-based rehabilitation/maintenance contracts for airport concrete pavement, in which two types of life-cycle cost risk, ground consolidation risk and concrete depreciation risk, are explicitly considered. A non-homogeneous Markov chain model is formulated to represent the deterioration processes of concrete pavement, which are conditional upon the ground consolidation processes. An optimal non-homogeneous Markov decision model with multiple types of risk is presented to design the optimal rehabilitation/maintenance plans, together with a methodology for revising those plans based upon monitoring data via Bayesian updating rules. The validity of the methodology presented in this paper is examined through case studies carried out for the H airport.
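A minimal sketch of the Markov-chain ingredient of such a model: pavement-condition probabilities are propagated through a deterioration transition matrix whose worsening probability is updated from inspection data in a simple Beta-Binomial Bayesian step. The states, prior, and inspection counts are invented, and the sketch omits the decision (action-selection) layer.

```python
import numpy as np

# Condition states 0 (new) .. 3 (failed); q = per-period worsening probability.
def transition(q):
    P = np.eye(4) * (1 - q)
    for i in range(3):
        P[i, i + 1] = q      # move one state worse with probability q
    P[3, 3] = 1.0            # failed is absorbing
    return P

# Beta prior on q, updated with monitoring data: 7 worsenings in 30 observed pairs.
a, b = 2.0, 8.0
a_post, b_post = a + 7, b + (30 - 7)
q_hat = a_post / (a_post + b_post)   # posterior-mean worsening probability

state = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(10):
    state = state @ transition(q_hat)
print("state distribution after 10 periods:", state.round(3))
```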
Late Holocene volcanic activity and environmental change in Highland Guatemala
NASA Astrophysics Data System (ADS)
Lohse, Jon C.; Hamilton, W. Derek; Brenner, Mark; Curtis, Jason; Inomata, Takeshi; Morgan, Molly; Cardona, Karla; Aoyama, Kazuo; Yonenobu, Hitoshi
2018-07-01
We present a record of late Holocene volcanic eruptions with elemental data for a sequence of sampled tephras from Lake Amatitlan in Highland Guatemala. Our tephrochronology is anchored by a Bayesian P_Sequence age-depth model based on multiple AMS radiocarbon dates. We compare our record against a previously published study from the same area to understand the record of volcanism and environmental changes. This work has implications for understanding the effects of climate and other environmental changes that may be related to the emission of volcanic aerosols at local, regional and global scales.
Rage against the Machine: Evaluation Metrics in the 21st Century
ERIC Educational Resources Information Center
Yang, Charles
2017-01-01
I review the classic literature in generative grammar and Marr's three-level program for cognitive science to defend the Evaluation Metric as a psychological theory of language learning. Focusing on well-established facts of language variation, change, and use, I argue that optimal statistical principles embodied in Bayesian inference models are…
A Bayesian Tutoring System for Newtonian Mechanics: Can It Adapt to Different Learners?
ERIC Educational Resources Information Center
Pek, Peng-Kiat; Poh, Kim-Leng
2004-01-01
Newtonian mechanics is a core module in technology courses, but is difficult for many students to learn. Computerized tutoring can assist the teachers to provide individualized instruction. This article presents the application of decision theory to develop a tutoring system, "iTutor", to select optimal tutoring actions under uncertainty of…
NASA Astrophysics Data System (ADS)
Sheng, Zheng
2013-02-01
The estimation of lower atmospheric refractivity from radar sea clutter (RFC) is a complicated nonlinear optimization problem. This paper treats the RFC problem in a Bayesian framework. It uses the unbiased Markov Chain Monte Carlo (MCMC) sampling technique, which can provide accurate posterior probability distributions of the estimated refractivity parameters, with an electromagnetic split-step fast Fourier transform terrain parabolic equation propagation model inside the Bayesian inversion framework. In contrast to a global optimization algorithm, the Bayesian-MCMC approach yields not only approximate solutions but also the probability distributions of those solutions, that is, an uncertainty analysis of the solutions. The Bayesian-MCMC algorithm is applied to both simulated and real radar sea-clutter data. Reference data are taken to be the simulation data, and refractivity profiles are obtained using a helicopter. The inversion algorithm is assessed (i) by comparing the estimated refractivity profiles with the assumed simulation profiles and the helicopter sounding data, and (ii) by examining the one-dimensional (1D) and two-dimensional (2D) posterior probability distributions of the solutions.
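A minimal sketch of the Bayesian-MCMC idea on a toy problem: a Metropolis sampler recovers a single refractivity-like parameter from noisy data, returning a posterior distribution rather than a single optimum. The exponential forward model is an invented stand-in for the split-step parabolic equation propagation model.

```python
import numpy as np

rng = np.random.default_rng(8)

def forward(m, x):                    # toy stand-in for the propagation model
    return np.exp(-m * x)

x = np.linspace(0, 5, 50)
data = forward(0.7, x) + rng.normal(0, 0.02, x.size)

def log_post(m):
    if not 0 < m < 5:                 # flat prior on (0, 5)
        return -np.inf
    r = data - forward(m, x)
    return -0.5 * np.sum(r**2) / 0.02**2

m, lp, chain = 1.0, log_post(1.0), []
for _ in range(5000):
    m_new = m + rng.normal(0, 0.05)   # random-walk proposal
    lp_new = log_post(m_new)
    if np.log(rng.random()) < lp_new - lp:   # Metropolis accept/reject
        m, lp = m_new, lp_new
    chain.append(m)
post = np.array(chain[1000:])         # discard burn-in
print("posterior mean / sd:", round(post.mean(), 3), round(post.std(), 4))
```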
Jalem, Randy; Kanamori, Kenta; Takeuchi, Ichiro; Nakayama, Masanobu; Yamasaki, Hisatsugu; Saito, Toshiya
2018-04-11
Safe and robust batteries are urgently needed as power sources for electric vehicles, and there is growing interest in fabricating them with solid electrolytes. Materials search by density functional theory (DFT) methods offers great promise for finding new solid electrolytes, but the evaluation is computationally expensive, particularly for the ion migration property. In this work, we propose a Bayesian-optimization-driven, DFT-based approach to efficiently screen for compounds with low ion migration energies ([Formula: see text]). We demonstrated this on 318 tavorite-type Li- and Na-containing compounds. We found that the scheme requires only ~30% of the total DFT-[Formula: see text] evaluations on average to recover the optimal compound ~90% of the time. Its recovery performance for desired compounds in the tavorite search space (i.e., for [Formula: see text] < 0.3 eV) is ~2× that of random search. Our approach offers a promising way to address computational bottlenecks in large-scale materials screening for fast ionic conductors.
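The screening loop can be sketched as standard Gaussian-process Bayesian optimization. Everything below (the descriptors, the synthetic migration-energy function, and the lower-confidence-bound acquisition) is a hypothetical stand-in, since the paper's surrogate and acquisition choices are not spelled out here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical descriptor matrix for 318 candidate compounds and their
# (normally unknown) migration energies; each "evaluation" stands in for
# a costly DFT calculation.
X = rng.standard_normal((318, 4))
true_Em = 0.5 + 0.3 * np.sin(X @ [1.0, -0.5, 0.2, 0.7]) \
          + 0.05 * rng.standard_normal(318)

def rbf(A, B, ell=1.0):
    d2 = ((A[:, None, :] - B[None, :, :])**2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

evaluated = list(rng.choice(318, 5, replace=False))   # small random seed set
for step in range(30):
    Xe, ye = X[evaluated], true_Em[evaluated]
    K = rbf(Xe, Xe) + 1e-6 * np.eye(len(evaluated))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ye - ye.mean()))
    Ks = rbf(X, Xe)
    mu = ye.mean() + Ks @ alpha                       # GP posterior mean
    v = np.linalg.solve(L, Ks.T)
    var = np.clip(1.0 - (v**2).sum(0), 1e-9, None)    # GP posterior variance
    # Lower confidence bound: prefer low predicted energy, high uncertainty.
    acq = mu - 2.0 * np.sqrt(var)
    acq[evaluated] = np.inf                           # never re-evaluate
    evaluated.append(int(np.argmin(acq)))

print("best energy found:", true_Em[evaluated].min(),
      "after", len(evaluated), "simulated DFT calls")
```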
Processing of angular motion and gravity information through an internal model.
Laurens, Jean; Straumann, Dominik; Hess, Bernhard J M
2010-09-01
The vestibular organs in the base of the skull provide important information about head orientation and motion in space. Previous studies have suggested that both angular velocity information from the semicircular canals and information about head orientation and translation from the otolith organs are centrally processed in an internal model of head motion, using the principles of optimal estimation. This concept has been successfully applied to model behavioral responses to classical vestibular motion paradigms. This study measured the dynamics of the vestibulo-ocular reflex (VOR) during post-rotatory tilt, tilt during optokinetic afternystagmus, and off-vertical axis rotation. The influence of otolith signals on the VOR was systematically varied by using a series of tilt angles. We found that the time constants of the responses varied almost identically as a function of gravity across these paradigms. We show that Bayesian modeling could predict the experimental results in an accurate and consistent manner. In contrast to other approaches, the Bayesian model also provides a plausible explanation of why these vestibulo-ocular motor responses occur as a consequence of an internal process of optimal motion estimation.
Autonomic Closure for Turbulent Flows Using Approximate Bayesian Computation
NASA Astrophysics Data System (ADS)
Doronina, Olga; Christopher, Jason; Hamlington, Peter; Dahm, Werner
2017-11-01
Autonomic closure is a new technique for achieving fully adaptive and physically accurate closure of coarse-grained turbulent flow governing equations, such as those solved in large eddy simulations (LES). Although autonomic closure has been shown in recent a priori tests to more accurately represent unclosed terms than do dynamic versions of traditional LES models, the computational cost of the approach makes it challenging to implement for simulations of practical turbulent flows at realistically high Reynolds numbers. The optimization step used in the approach introduces large matrices that must be inverted and is highly memory intensive. In order to reduce memory requirements, here we propose to use approximate Bayesian computation (ABC) in place of the optimization step, thereby yielding a computationally-efficient implementation of autonomic closure that trades memory-intensive for processor-intensive computations. The latter challenge can be overcome as co-processors such as general purpose graphical processing units become increasingly available on current generation petascale and exascale supercomputers. In this work, we outline the formulation of ABC-enabled autonomic closure and present initial results demonstrating the accuracy and computational cost of the approach.
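A minimal picture of how ABC replaces the optimization step: draw candidate closure parameters from a prior, run a cheap forward evaluation, and accept draws whose summary statistics fall within a tolerance of the reference values. The one-parameter model and statistics below are toy stand-ins for the test-filtered LES quantities.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy reference statistics standing in for the test-filtered quantities
# that closure coefficients are matched against in a priori tests.
theta_true = 0.17
reference_stat = theta_true * np.array([1.0, 2.0, 4.0])

def simulate_stat(theta):
    # Cheap stand-in "forward evaluation" with observational noise.
    return theta * np.array([1.0, 2.0, 4.0]) + 0.01 * rng.standard_normal(3)

accepted = []
eps = 0.05                              # ABC tolerance
while len(accepted) < 200:
    theta = rng.uniform(0.0, 0.5)       # prior over the closure parameter
    if np.linalg.norm(simulate_stat(theta) - reference_stat) < eps:
        accepted.append(theta)

post = np.array(accepted)
print("ABC posterior: %.3f +/- %.3f" % (post.mean(), post.std()))
```

Because only forward evaluations and comparisons are needed, no large optimization matrices are formed, which is the memory saving the abstract describes.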
Optimal predictions in everyday cognition: the wisdom of individuals or crowds?
Mozer, Michael C; Pashler, Harold; Homaei, Hadjar
2008-10-01
Griffiths and Tenenbaum (2006) asked individuals to make predictions about the duration or extent of everyday events (e.g., cake baking times), and reported that predictions were optimal, employing Bayesian inference based on veridical prior distributions. Although the predictions conformed strikingly to statistics of the world, they reflect averages over many individuals. On the conjecture that the accuracy of the group response is chiefly a consequence of aggregating across individuals, we constructed simple, heuristic approximations to the Bayesian model premised on the hypothesis that individuals have access merely to a sample of k instances drawn from the relevant distribution. The accuracy of the group response reported by Griffiths and Tenenbaum could be accounted for by supposing that individuals each utilize only two instances. Moreover, the variability of the group data is more consistent with this small-sample hypothesis than with the hypothesis that people utilize veridical or nearly veridical representations of the underlying prior distributions. Our analyses lead to a qualitatively different view of how individuals reason from past experience than the view espoused by Griffiths and Tenenbaum. 2008 Cognitive Science Society, Inc.
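The small-sample hypothesis is easy to simulate. The sketch below, under an assumed gamma prior standing in for the everyday-event distributions, compares the optimal posterior-median prediction with the average of many k = 2 "individuals"; the two need not coincide exactly, but aggregation visibly smooths individual variability.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed stand-in prior: total durations (minutes) with a skewed shape
# loosely resembling the everyday quantities in the original study.
prior_pop = rng.gamma(shape=9.0, scale=12.0, size=100_000)

t = 60.0                                    # elapsed time observed so far
conditional = prior_pop[prior_pop > t]      # durations consistent with t

optimal = np.median(conditional)            # full-knowledge posterior median
# Each "individual" recalls only k = 2 instances and reports their median;
# the crowd response is the average over many such individuals.
crowd = np.mean([np.median(rng.choice(conditional, size=2))
                 for _ in range(500)])
print("optimal: %.0f   average of 500 k=2 individuals: %.0f"
      % (optimal, crowd))
```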
Hamiltonian Monte Carlo acceleration using surrogate functions with random bases.
Zhang, Cheng; Shahbaba, Babak; Zhao, Hongkai
2017-11-01
For big data analysis, the high computational cost of Bayesian methods often limits their application in practice. In recent years, there have been many attempts to improve the computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo method, namely, Hamiltonian Monte Carlo. The key idea is to explore and exploit the structure and regularity in parameter space of the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm, which converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the-art methods.
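The core trick, running the leapfrog dynamics on a cheap surrogate gradient while keeping the Metropolis correction on the exact target so the chain still converges to the right distribution, can be sketched as follows. The correlated-Gaussian target and the deliberately imperfect scaled surrogate are illustrative; the paper builds its surrogate from random bases plus an optimization step.

```python
import numpy as np

rng = np.random.default_rng(4)

# "Expensive" log-target: a correlated Gaussian stands in for the real model.
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
P = np.linalg.inv(Sigma)
log_target = lambda q: -0.5 * q @ P @ q

# Cheap surrogate gradient: here just a scaled, imperfect approximation.
grad_surrogate = lambda q: -0.8 * (P @ q)

def hmc_step(q, eps=0.1, L=20):
    p = rng.standard_normal(2)
    q_new, p_new = q.copy(), p.copy()
    for _ in range(L):                       # leapfrog on surrogate dynamics
        p_new = p_new + 0.5 * eps * grad_surrogate(q_new)
        q_new = q_new + eps * p_new
        p_new = p_new + 0.5 * eps * grad_surrogate(q_new)
    # Metropolis correction uses the exact target, keeping the chain unbiased.
    log_a = (log_target(q_new) - 0.5 * p_new @ p_new) \
          - (log_target(q) - 0.5 * p @ p)
    return q_new if np.log(rng.random()) < log_a else q

q, draws = np.zeros(2), []
for _ in range(5000):
    q = hmc_step(q)
    draws.append(q)
print(np.cov(np.array(draws).T))             # should approach Sigma
```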
A controllable sensor management algorithm capable of learning
NASA Astrophysics Data System (ADS)
Osadciw, Lisa A.; Veeramacheneni, Kalyan K.
2005-03-01
Sensor management technology progress is challenged by the geographic space it spans, the heterogeneity of the sensors, and the real-time timeframes within which plans controlling the assets are executed. This paper presents a new sensor management paradigm and demonstrates its application in a sensor management algorithm designed for a biometric access control system. This approach consists of an artificial intelligence (AI) algorithm focused on uncertainty measures, which makes the high-level decisions to reduce uncertainties and interfaces with the user, integrated cohesively with a bottom-up evolutionary algorithm, which optimizes the sensor network's operation as determined by the AI algorithm. The sensor management algorithm presented is composed of a Bayesian network (the AI algorithm component) and a swarm optimization algorithm (the evolutionary algorithm). Thus, the algorithm can change its own performance goals in real time and will modify its own decisions based on observed measures within the sensor network. The definitions of the measures, as well as the Bayesian network, determine the robustness of the algorithm and its utility in reacting dynamically to changes in the global system.
Zeng, Xueqiang; Luo, Gang
2017-12-01
Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that compared to a state-of-the-art automatic selection method, our method can significantly reduce search time, classification error rate, and the standard deviation of the error rate due to randomization. This is major progress towards enabling fast turnaround in identifying the high-quality solutions required by many machine learning-based clinical data analysis tasks.
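A simplified sketch of the progressive-sampling idea: evaluate candidate algorithm/hyper-parameter configurations on growing data samples and discard poor performers early, so expensive full-data training is reserved for promising configurations. The elimination rule below is a plain successive-halving stand-in; the paper couples the progressive sampling with Bayesian optimization to propose new configurations.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical search space: (algorithm, hyper-parameter value) pairs.
configs = [("knn", k) for k in (1, 5, 15)] \
        + [("ridge", a) for a in (0.1, 1.0, 10.0)]

def validation_error(config, n):
    """Stand-in for training/validating `config` on n sampled records;
    a real evaluation's cost grows with n, hence we start small."""
    algo, hp = config
    base = {"knn": 0.25, "ridge": 0.20}[algo] + 0.01 * abs(np.log10(hp))
    return base + rng.normal(0.0, 1.0 / np.sqrt(n))   # noise shrinks with n

survivors = list(configs)
for n in (100, 1_000, 10_000):                        # progressive sampling
    scores = {c: validation_error(c, n) for c in survivors}
    cutoff = np.median(list(scores.values()))
    survivors = [c for c in survivors if scores[c] <= cutoff]

print("shortlisted for full-data evaluation:", survivors)
```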
Rosenthal, Elisabeth A; Ranchalis, Jane; Crosslin, David R; Burt, Amber; Brunzell, John D; Motulsky, Arno G; Nickerson, Deborah A; Wijsman, Ellen M; Jarvik, Gail P
2013-12-05
Hypertriglyceridemia (HTG) is a heritable risk factor for cardiovascular disease. Investigating the genetics of HTG may identify new drug targets. There are ~35 known single-nucleotide variants (SNVs) that explain only ~10% of variation in triglyceride (TG) level. Because of the genetic heterogeneity of HTG, a family study design is optimal for identification of rare genetic variants with large effect size because the same mutation can be observed in many relatives and cosegregation with TG can be tested. We considered HTG in a five-generation family of European American descent (n = 121), ascertained for familial combined hyperlipidemia. By using Bayesian Markov chain Monte Carlo joint oligogenic linkage and association analysis, we detected linkage to chromosomes 7 and 17. Whole-exome sequence data revealed shared, highly conserved, private missense SNVs in both SLC25A40 on chr7 and PLD2 on chr17. Jointly, these SNVs explained 49% of the genetic variance in TG; however, only the SLC25A40 SNV was significantly associated with TG (p = 0.0001). This SNV, c.374A>G, causes a highly disruptive p.Tyr125Cys substitution just outside the second helical transmembrane region of the SLC25A40 inner mitochondrial membrane transport protein. Whole-gene testing in subjects from the Exome Sequencing Project confirmed the association between TG and SLC25A40 rare, highly conserved, coding variants (p = 0.03). These results suggest a previously undescribed pathway for HTG and illustrate the power of large pedigrees in the search for rare, causal variants. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Liang, Li-Jung; Weiss, Robert E; Redelings, Benjamin; Suchard, Marc A
2009-10-01
Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BALI-PHY; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.
Hu, Weiming; Tian, Guodong; Kang, Yongxin; Yuan, Chunfeng; Maybank, Stephen
2017-09-25
In this paper, a new nonparametric Bayesian model called the dual sticky hierarchical Dirichlet process hidden Markov model (HDP-HMM) is proposed for mining activities from a collection of time series data such as trajectories. All the time series data are clustered. Each cluster of time series data, corresponding to a motion pattern, is modeled by an HMM. Our model postulates a set of HMMs that share a common set of states (topics in an analogy with topic models for document processing), but have unique transition distributions. For the application to motion trajectory modeling, topics correspond to motion activities. The learnt topics are clustered into atomic activities which are assigned predicates. We propose a Bayesian inference method to decompose a given trajectory into a sequence of atomic activities. On combining the learnt sources and sinks, semantic motion regions, and the learnt sequence of atomic activities, the action represented by the trajectory can be described in natural language in as automatic a way as possible. The effectiveness of our dual sticky HDP-HMM is validated on several trajectory datasets. The effectiveness of the natural language descriptions for motions is demonstrated on the vehicle trajectories extracted from a traffic scene.
The Empirical Distribution of Singletons for Geographic Samples of DNA Sequences.
Cubry, Philippe; Vigouroux, Yves; François, Olivier
2017-01-01
Rare variants are important for drawing inference about past demographic events in a species' history. A singleton is a rare variant for which genetic variation is carried by a unique chromosome in a sample. How singletons are distributed across geographic space provides a local measure of genetic diversity that can be measured at the individual level. Here, we define the empirical distribution of singletons in a sample of chromosomes as the proportion of the total number of singletons that each chromosome carries, and we present a theoretical background for studying this distribution. Next, we use computer simulations to evaluate the potential for the empirical distribution of singletons to provide a description of genetic diversity across geographic space. In a Bayesian framework, we show that the empirical distribution of singletons leads to accurate estimates of the geographic origin of range expansions. We apply the Bayesian approach to estimating the origin of the cultivated plant species Pennisetum glaucum [L.] R. Br. (pearl millet) in Africa, and find support for range expansion having started from Northern Mali. Overall, we report that the empirical distribution of singletons is a useful measure for analyzing results of sequencing projects based on large-scale sampling of individuals across geographic space.
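The statistic itself is straightforward to compute from a haplotype matrix, as the sketch below shows; the simulated allele frequencies are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical sample: rows = chromosomes, columns = polymorphic sites,
# entries = derived-allele indicators; per-site frequencies are skewed
# toward rare variants, as in real sequence data.
freqs = rng.beta(0.5, 10.0, size=500)
H = (rng.random((20, 500)) < freqs).astype(int)

derived_counts = H.sum(axis=0)
singleton_sites = derived_counts == 1     # variant carried by one chromosome

# Empirical distribution of singletons: the share of all singletons carried
# by each chromosome (sums to 1 across the sample).
per_chrom = H[:, singleton_sites].sum(axis=1)
empirical_dist = per_chrom / per_chrom.sum()
print(empirical_dist)
```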
Numerical optimization using flow equations.
Punk, Matthias
2014-12-01
We develop a method for multidimensional optimization using flow equations. This method is based on homotopy continuation in combination with a maximum entropy approach. Extrema of the optimizing functional correspond to fixed points of the flow equation. While ideas based on Bayesian inference such as the maximum entropy method always depend on a prior probability, the additional step in our approach is to perform a continuous update of the prior during the homotopy flow. The prior probability thus enters the flow equation only as an initial condition. We demonstrate the applicability of this optimization method for two paradigmatic problems in theoretical condensed matter physics: numerical analytic continuation from imaginary to real frequencies and finding (variational) ground states of frustrated (quantum) Ising models with random or long-range antiferromagnetic interactions.
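A one-dimensional caricature of the scheme: the prior functional is continuously deformed into the target functional while the estimate follows the gradient flow, so the prior enters only as an initial condition and fixed points at the end of the flow are extrema of the target. The functionals and step sizes below are illustrative; the paper's maximum-entropy structure and continuous prior update are richer.

```python
import numpy as np

# Target functional: a tilted double well, so plain gradient descent is
# sensitive to the starting point.
F  = lambda x: (x**2 - 1.0)**2 + 0.3 * x
dF = lambda x: 4.0 * x * (x**2 - 1.0) + 0.3

# "Prior" functional: its minimum encodes the initial guess x0.
x0 = 0.2
dF0 = lambda x: 2.0 * (x - x0)

x, steps = x0, 20000
for i in range(steps):
    s = i / steps                 # homotopy parameter flows from 0 to 1
    # Flow equation: x relaxes toward the instantaneous extremum of the
    # interpolated functional; at s = 1, fixed points are extrema of F.
    x -= 1e-3 * ((1.0 - s) * dF0(x) + s * dF(x))

print("flow ends at x = %.3f with F(x) = %.3f" % (x, F(x)))
```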
Reliability of a Bayesian network to predict an elevated aldosterone-to-renin ratio.
Ducher, Michel; Mounier-Véhier, Claire; Lantelme, Pierre; Vaisse, Bernard; Baguet, Jean-Philippe; Fauvel, Jean-Pierre
2015-05-01
Resistant hypertension is common, mainly idiopathic, but sometimes related to primary aldosteronism; thus, most hypertension specialists recommend screening for primary aldosteronism. The aim was to optimize, from simple clinical and biological characteristics, the selection of patients whose aldosterone-to-renin ratio (ARR) is elevated. Data from consecutive patients referred between 1 June 2008 and 30 May 2009 were collected retrospectively from the institutional registers of five French 'European excellence hypertension centres'. Patients were included if they had at least one of: onset of hypertension before age 40 years, resistant hypertension, history of hypokalaemia, efficient treatment by spironolactone, and potassium supplementation. An ARR > 32 ng/L and aldosterone > 160 ng/L in patients treated without agents altering the renin-angiotensin system was considered elevated. A Bayesian network and stepwise logistic regression were used to predict an elevated ARR. Of 334 patients, 89 were excluded (31 for incomplete data, 32 for taking agents that alter the renin-angiotensin system and 26 for other reasons). Among the 245 included patients, 110 had an elevated ARR. Sensitivity reached 100% or 63.3% using the Bayesian network or logistic regression, respectively, and specificity reached 89.6% or 67.2%, respectively. The area under the receiver-operating-characteristic curve obtained with the Bayesian network was significantly higher than that obtained by stepwise regression (0.93±0.02 vs. 0.70±0.03; P<0.001). In hypertension centres, the Bayesian network efficiently detected patients with an elevated ARR. An external validation study is required before use in primary clinical settings. Copyright © 2015 Elsevier Masson SAS. All rights reserved.
Prediction and assimilation of surf-zone processes using a Bayesian network: Part I: Forward models
Plant, Nathaniel G.; Holland, K. Todd
2011-01-01
Prediction of coastal processes, including waves, currents, and sediment transport, can be obtained from a variety of detailed geophysical-process models with many simulations showing significant skill. This capability supports a wide range of research and applied efforts that can benefit from accurate numerical predictions. However, the predictions are only as accurate as the data used to drive the models and, given the large temporal and spatial variability of the surf zone, inaccuracies in data are unavoidable such that useful predictions require corresponding estimates of uncertainty. We demonstrate how a Bayesian-network model can be used to provide accurate predictions of wave-height evolution in the surf zone given very sparse and/or inaccurate boundary-condition data. The approach is based on a formal treatment of a data-assimilation problem that takes advantage of significant reduction of the dimensionality of the model system. We demonstrate that predictions of a detailed geophysical model of the wave evolution are reproduced accurately using a Bayesian approach. In this surf-zone application, forward prediction skill was 83%, and uncertainties in the model inputs were accurately transferred to uncertainty in output variables. We also demonstrate that if modeling uncertainties were not conveyed to the Bayesian network (i.e., perfect data or model were assumed), then overly optimistic prediction uncertainties were computed. More consistent predictions and uncertainties were obtained by including model-parameter errors as a source of input uncertainty. Improved predictions (skill of 90%) were achieved because the Bayesian network simultaneously estimated optimal parameters while predicting wave heights.
A Comparison of FPGA and GPGPU Designs for Bayesian Occupancy Filters.
Medina, Luis; Diez-Ochoa, Miguel; Correal, Raul; Cuenca-Asensi, Sergio; Serrano, Alejandro; Godoy, Jorge; Martínez-Álvarez, Antonio; Villagra, Jorge
2017-11-11
Grid-based perception techniques in the automotive sector, based on fusing information from different sensors to achieve robust perception of the environment, are proliferating in the industry. However, one of the main drawbacks of these techniques is the prohibitively high computing performance traditionally required by embedded automotive systems. In this work, the capabilities of new computing architectures that embed these algorithms are assessed in a real car. The paper compares two ad hoc optimized designs of the Bayesian Occupancy Filter: one for General Purpose Graphics Processing Units (GPGPU) and the other for Field-Programmable Gate Arrays (FPGA). The resulting implementations are compared in terms of development effort, accuracy and performance, using datasets from a realistic simulator and from a real automated vehicle.
Nonparametric Bayesian Dictionary Learning for Analysis of Noisy and Incomplete Images
Zhou, Mingyuan; Chen, Haojun; Paisley, John; Ren, Lu; Li, Lingbo; Xing, Zhengming; Dunson, David; Sapiro, Guillermo; Carin, Lawrence
2013-01-01
Nonparametric Bayesian methods are considered for recovery of imagery based upon compressive, incomplete, and/or noisy measurements. A truncated beta-Bernoulli process is employed to infer an appropriate dictionary for the data under test and also for image recovery. In the context of compressive sensing, significant improvements in image recovery are manifested using learned dictionaries, relative to using standard orthonormal image expansions. The compressive-measurement projections are also optimized for the learned dictionary. Additionally, we consider simpler (incomplete) measurements, defined by measuring a subset of image pixels, uniformly selected at random. Spatial interrelationships within imagery are exploited through use of the Dirichlet and probit stick-breaking processes. Several example results are presented, with comparisons to other methods in the literature. PMID:21693421
Decentralized Bayesian search using approximate dynamic programming methods.
Zhao, Yijia; Patek, Stephen D; Beling, Peter A
2008-08-01
We consider decentralized Bayesian search problems that involve a team of multiple autonomous agents searching for targets on a network of search points while operating under the following constraints: 1) interagent communication is limited; 2) the agents do not have the opportunity to agree in advance on how to resolve equivalent but incompatible strategies; and 3) each agent lacks the ability to control or predict with certainty the actions of the other agents. We formulate the multiagent search-path-planning problem as a decentralized optimal control problem and introduce approximate dynamic programming heuristics that can be implemented in a decentralized fashion. After establishing some analytical properties of the heuristics, we present computational results for a search problem involving two agents on a 5 × 5 grid.
Extreme-Scale Bayesian Inference for Uncertainty Quantification of Complex Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Biros, George
Uncertainty quantification (UQ), that is, quantifying uncertainties in complex mathematical models and their large-scale computational implementations, is widely viewed as one of the outstanding challenges facing the field of CS&E over the coming decade. The EUREKA project set out to address the most difficult class of UQ problems: those for which both the underlying PDE model and the uncertain parameters are of extreme scale. In the project we worked on these extreme-scale challenges in the following four areas: 1. Scalable parallel algorithms for sampling and characterizing the posterior distribution that exploit the structure of the underlying PDEs and parameter-to-observable map. These include structure-exploiting versions of the randomized maximum likelihood method, which aims to overcome the intractability of employing conventional MCMC methods for solving extreme-scale Bayesian inversion problems by appealing to and adapting ideas from large-scale PDE-constrained optimization, which have been very successful at exploring high-dimensional spaces. 2. Scalable parallel algorithms for construction of prior and likelihood functions based on learning methods and non-parametric density estimation. Constructing problem-specific priors remains a critical challenge in Bayesian inference, and more so in high dimensions. Another challenge is the construction of likelihood functions that capture unmodeled couplings between observations and parameters. We created parallel algorithms for non-parametric density estimation using high-dimensional N-body methods and combined them with supervised learning techniques for the construction of priors and likelihood functions. 3. Bayesian inadequacy models, which augment physics models with stochastic models that represent their imperfections. The success of the Bayesian inference framework depends on the ability to represent the uncertainty due to imperfections of the mathematical model of the phenomena of interest. This is a central challenge in UQ, especially for large-scale models; we developed the mathematical tools to address it in the context of extreme-scale problems. 4. Parallel scalable algorithms for Bayesian optimal experimental design (OED). Bayesian inversion yields quantified uncertainties in the model parameters, which can be propagated forward through the model to yield uncertainty in outputs of interest. This opens the way for designing new experiments to reduce the uncertainties in the model parameters and model predictions. Such experimental design problems have been intractable for large-scale problems using conventional methods; we created OED algorithms that exploit the structure of the PDE model and the parameter-to-output map to overcome these challenges. Parallel algorithms for these four problems were created, analyzed, prototyped, implemented, tuned, and scaled up for leading-edge supercomputers, including UT-Austin's own 10-petaflops Stampede system, ANL's Mira system, and ORNL's Titan system. While our focus was on fundamental mathematical/computational methods and algorithms, we assessed our methods on model problems derived from several DOE mission applications, including multiscale mechanics and ice sheet dynamics.
Brayanov, Jordan B.
2010-01-01
Which is heavier: a pound of lead or a pound of feathers? This classic trick question belies a simple but surprising truth: when lifted, the pound of lead feels heavier—a phenomenon known as the size–weight illusion. To estimate the weight of an object, our CNS combines two imperfect sources of information: a prior expectation, based on the object's appearance, and direct sensory information from lifting it. Bayes' theorem (or Bayes' law) defines the statistically optimal way to combine multiple information sources for maximally accurate estimation. Here we asked whether the mechanisms for combining these information sources produce statistically optimal weight estimates for both perceptions and actions. We first studied the ability of subjects to hold one hand steady when the other removed an object from it, under conditions in which sensory information about the object's weight sometimes conflicted with prior expectations based on its size. Since the ability to steady the supporting hand depends on the generation of a motor command that accounts for lift timing and object weight, hand motion can be used to gauge biases in weight estimation by the motor system. We found that these motor system weight estimates reflected the integration of prior expectations with real-time proprioceptive information in a Bayesian, statistically optimal fashion that discounted unexpected sensory information. This produces a motor size–weight illusion that consistently biases weight estimates toward prior expectations. In contrast, when subjects compared the weights of two objects, their perceptions defied Bayes' law, exaggerating the value of unexpected sensory information. This produces a perceptual size–weight illusion that biases weight perceptions away from prior expectations. We term this effect “anti-Bayesian” because the bias is opposite that seen in Bayesian integration. Our findings suggest that two fundamentally different strategies for the integration of prior expectations with sensory information coexist in the nervous system for weight estimation. PMID:20089821
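For the motor-system result, the statistically optimal combination the authors describe is, for a Gaussian prior and Gaussian sensory likelihood, the familiar precision-weighted average; a minimal sketch with made-up numbers follows.

```python
def fuse(prior_mu, prior_var, sense_mu, sense_var):
    """Precision-weighted (minimum-variance) combination of a Gaussian
    prior with Gaussian sensory evidence: the Bayes-optimal estimate."""
    w = (1.0 / prior_var) / (1.0 / prior_var + 1.0 / sense_var)
    mu = w * prior_mu + (1.0 - w) * sense_mu
    var = 1.0 / (1.0 / prior_var + 1.0 / sense_var)
    return mu, var

# A large box sets a prior expectation of ~2.0 kg, but proprioception during
# the lift suggests ~1.0 kg; the optimal estimate is pulled toward the prior,
# discounting the unexpected sensory evidence as in the motor-system data.
mu, var = fuse(prior_mu=2.0, prior_var=0.25, sense_mu=1.0, sense_var=0.5)
print("weight estimate: %.2f kg (variance %.2f)" % (mu, var))
```

The perceptual illusion the authors report is the opposite bias, an estimate pushed away from the prior, which no weighting w between 0 and 1 in this scheme can produce; hence the "anti-Bayesian" label.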
Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization.
Bauer, Markus; Klau, Gunnar W; Reinert, Knut
2007-07-27
The discovery of functional non-coding RNA sequences has led to an increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail at computing reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account. We present a graph-based representation for sequence-structure alignments, which we model as an integer linear program (ILP). We sketch how we compute an optimal or near-optimal solution to the ILP using methods from combinatorial optimization, and present results on a recently published benchmark set for RNA alignments. The implementation of our algorithm yields better alignments in terms of two published scores than the other programs that we tested: This is especially the case with an increasing number of input sequences. Our program LARA is freely available for academic purposes from http://www.planet-lisa.net.
Yu, Fang; Chen, Ming-Hui; Kuo, Lynn; Talbott, Heather; Davis, John S
2015-08-07
Recently, the Bayesian method has become more popular for analyzing high-dimensional gene expression data, as it allows us to borrow information across different genes and provides powerful estimators for evaluating gene expression levels. It is crucial to develop a simple but efficient gene selection algorithm for detecting differentially expressed (DE) genes based on the Bayesian estimators. In this paper, by extending the two-criterion idea of Chen et al. (Chen M-H, Ibrahim JG, Chi Y-Y. A new class of mixture models for differential gene expression in DNA microarray data. J Stat Plan Inference. 2008;138:387-404), we propose two new gene selection algorithms for general Bayesian models and name these new methods the confident difference criterion methods. One is based on the standardized differences between two mean expression values among genes; the other adds the differences between two variances to it. The proposed confident difference criterion methods first evaluate the posterior probability of a gene having different gene expressions between competitive samples and then declare a gene to be DE if the posterior probability is large. The theoretical connection between the proposed first method, based on the means, and the Bayes factor approach proposed by Yu et al. (Yu F, Chen M-H, Kuo L. Detecting differentially expressed genes using calibrated Bayes factors. Statistica Sinica. 2008;18:783-802) is established under the normal-normal model with equal variances between two samples. The empirical performance of the proposed methods is examined and compared to that of several existing methods via several simulations. The results from these simulation studies show that the proposed confident difference criterion methods outperform the existing methods when comparing gene expressions across different conditions for both microarray studies and sequence-based high-throughput studies. A real dataset is used to further demonstrate the proposed methodology. In the real data application, the confident difference criterion methods successfully identified more clinically important DE genes than the other methods. The confident difference criterion methods proposed in this paper provide a new, efficient approach for both microarray studies and sequence-based high-throughput studies to identify differentially expressed genes.
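From the description in the abstract, the first (means-based) criterion reduces to a simple computation over posterior draws; the standardization, threshold, and draws below are illustrative assumptions rather than the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(7)

def confident_difference(mu1, mu2, sigma, delta=1.0, threshold=0.95):
    """Means-based criterion sketched from the abstract: declare a gene DE
    if the posterior probability that the standardized mean difference
    exceeds delta is large. Arguments are arrays of posterior draws."""
    z = np.abs(mu1 - mu2) / sigma
    return np.mean(z > delta) > threshold

# Illustrative posterior draws for a single gene under two conditions.
n = 4000
mu_control   = rng.normal(5.0, 0.10, n)
mu_treatment = rng.normal(5.8, 0.10, n)
sigma        = np.abs(rng.normal(0.5, 0.05, n))
print("declare DE:", confident_difference(mu_control, mu_treatment, sigma))
```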
Guidugli, Lucia; Shimelis, Hermela; Masica, David L; Pankratz, Vernon S; Lipton, Gary B; Singh, Namit; Hu, Chunling; Monteiro, Alvaro N A; Lindor, Noralane M; Goldgar, David E; Karchin, Rachel; Iversen, Edwin S; Couch, Fergus J
2018-01-17
Many variants of uncertain significance (VUS) have been identified in BRCA2 through clinical genetic testing. VUS pose a significant clinical challenge because the contribution of these variants to cancer risk has not been determined. We conducted a comprehensive assessment of VUS in the BRCA2 C-terminal DNA binding domain (DBD) by using a validated functional assay of BRCA2 homologous recombination (HR) DNA-repair activity and defined a classifier of variant pathogenicity. Among 139 variants evaluated, 54 had ≥99% probability of pathogenicity, and 73 had ≥95% probability of neutrality. Functional assay results were compared with predictions of variant pathogenicity from the Align-GVGD protein-sequence-based prediction algorithm, which has been used for variant classification. Relative to the HR assay, Align-GVGD significantly (p < 0.05) over-predicted pathogenic variants. We subsequently combined functional and Align-GVGD prediction results in a Bayesian hierarchical model (VarCall) to estimate the overall probability of pathogenicity for each VUS. In addition, to predict the effects of all other BRCA2 DBD variants and to prioritize variants for functional studies, we used the endoPhenotype-Optimized Sequence Ensemble (ePOSE) algorithm to train classifiers for BRCA2 variants by using data from the HR functional assay. Together, the results show that systematic functional assays in combination with in silico predictors of pathogenicity provide robust tools for clinical annotation of BRCA2 VUS. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
An efficient method for model refinement in diffuse optical tomography
NASA Astrophysics Data System (ADS)
Zirak, A. R.; Khademi, M.
2007-11-01
Diffuse optical tomography (DOT) is a non-linear, ill-posed, boundary-value optimization problem that necessitates regularization. Bayesian methods are also suitable because the measurement data are sparse and correlated. In such problems, which are solved with iterative methods, the solution space must be kept small for stabilization and better convergence. These constraints lead to an extensive, overdetermined system of equations, whose model error must be refined by model-retrieval criteria, especially total least squares (TLS). TLS, however, is limited to linear systems, a condition not met when applying traditional Bayesian methods. This paper presents an efficient method for model refinement using regularized total least squares (RTLS) for the linearized DOT problem, with a maximum a posteriori (MAP) estimator and a Tikhonov regularizer. This is done by combining Bayesian and regularization tools as preconditioner matrices, applying them to the equations, and then applying RTLS to the resulting linear equations. The preconditioning matrices are guided by patient-specific information as well as a priori knowledge gained from the training set. Simulation results illustrate that the proposed method improves image-reconstruction performance and localizes abnormalities well.
High-throughput Bayesian Network Learning using Heterogeneous Multicore Computers
Linderman, Michael D.; Athalye, Vivek; Meng, Teresa H.; Asadi, Narges Bani; Bruggner, Robert; Nolan, Garry P.
2017-01-01
Aberrant intracellular signaling plays an important role in many diseases. The causal structure of signal transduction networks can be modeled as Bayesian networks (BNs) and computationally learned from experimental data. However, learning the structure of BNs is an NP-hard problem that, even with fast heuristics, is too time-consuming for large, clinically important networks (20–50 nodes). In this paper, we present a novel graphics processing unit (GPU)-accelerated implementation of a Markov chain Monte Carlo-based algorithm for learning BNs that is up to 7.5-fold faster than current general-purpose processor (GPP)-based implementations. The GPU-based implementation is just one of several implementations within the larger application, each optimized for a different input or machine configuration. We describe the methodology we use to build an extensible application, assembled from these variants, that can target a broad range of heterogeneous systems, e.g., GPUs and multicore GPPs. Specifically, we show how we use the Merge programming model to efficiently integrate, test and intelligently select among the different potential implementations. PMID:28819655
Efficient Posterior Probability Mapping Using Savage-Dickey Ratios
Penny, William D.; Ridgway, Gerard R.
2013-01-01
Statistical Parametric Mapping (SPM) is the dominant paradigm for mass-univariate analysis of neuroimaging data. More recently, a Bayesian approach termed Posterior Probability Mapping (PPM) has been proposed as an alternative. PPM offers two advantages: (i) inferences can be made about effect size thus lending a precise physiological meaning to activated regions, (ii) regions can be declared inactive. This latter facility is most parsimoniously provided by PPMs based on Bayesian model comparisons. To date these comparisons have been implemented by an Independent Model Optimization (IMO) procedure which separately fits null and alternative models. This paper proposes a more computationally efficient procedure based on Savage-Dickey approximations to the Bayes factor, and Taylor-series approximations to the voxel-wise posterior covariance matrices. Simulations show the accuracy of this Savage-Dickey-Taylor (SDT) method to be comparable to that of IMO. Results on fMRI data show excellent agreement between SDT and IMO for second-level models, and reasonable agreement for first-level models. This Savage-Dickey test is a Bayesian analogue of the classical SPM-F and allows users to implement model comparison in a truly interactive manner. PMID:23533640
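The Savage-Dickey identity states that, for a point null nested in the alternative, the Bayes factor equals the ratio of posterior to prior density at the null value. The conjugate-normal sketch below, with made-up voxel data so both densities are available in closed form, illustrates the computation; the paper additionally uses Taylor-series approximations to the voxel-wise posterior covariances.

```python
import numpy as np

def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu)**2 / var) / np.sqrt(2.0 * np.pi * var)

# Conjugate-normal voxel model: effect theta ~ N(0, tau2) a priori,
# observations y_i ~ N(theta, s2), so the posterior is available in
# closed form and Savage-Dickey needs no sampling at all.
tau2, s2 = 1.0, 0.5
y = np.array([0.95, 1.10, 0.70, 1.20, 0.85])   # made-up effect samples

n = len(y)
post_var = 1.0 / (1.0 / tau2 + n / s2)
post_mu = post_var * y.sum() / s2

# Savage-Dickey: BF01 = posterior density / prior density at theta = 0.
bf01 = normal_pdf(0.0, post_mu, post_var) / normal_pdf(0.0, 0.0, tau2)
print("BF01 = %.3f (values << 1 favour an effect at this voxel)" % bf01)
```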
Bayesian nonparametric dictionary learning for compressed sensing MRI.
Huang, Yue; Paisley, John; Lin, Qin; Ding, Xinghao; Fu, Xueyang; Zhang, Xiao-Ping
2014-12-01
We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRIs) from highly undersampled k-space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and the patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov chain Monte Carlo for the Bayesian model, and use the alternating direction method of multipliers for efficiently performing total variation minimization. We present empirical results on several MRIs, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.
Molecular epidemiology of hepatitis B virus in Misiones, Argentina.
Mojsiejczuk, Laura Noelia; Torres, Carolina; Sevic, Ina; Badano, Inés; Malan, Richard; Flichman, Diego Martin; Liotta, Domingo Javier; Campos, Rodolfo Hector
2016-10-01
Hepatitis B virus (HBV) infection is a major public health problem worldwide. The aims of this study were to describe the molecular epidemiology of HBV in the Province of Misiones, Argentina, and to estimate the phylodynamics of the main groups in a Bayesian coalescent framework. To this end, partial or complete genome sequences were obtained from 52 blood donor candidates. The phylogenetic analysis based on partial sequences of the S/P region showed a predominance of genotype D (65.4%), followed by genotype F (30.8%) and, as a minority, genotype A (3.8%). At the subgenotype level, circulation of subgenotypes D3 (42.3%), D2 (13.5%), F1b (11.5%) and F4 (9.6%) was mainly identified. The Bayesian coalescent analysis of 29 complete genome sequences for the main groups revealed that subgenotypes D2 and D3 had several introductions to the region, with ancestors dating back from 1921 to 1969 and diversification events until the late '70s. Genotype F in Misiones has a more recent history; subgenotype F4 isolates were intermixed with sequences from Argentina and neighboring countries, and only one significant cluster, dated back to 1994, was observed. Subgenotype F1b isolates exhibited low genetic distance and formed a closely related monophyletic cluster, suggesting a very recent introduction. In conclusion, the phylogenetic and coalescent analyses showed that the European genotype D has a higher circulation, a longer history of diversification and may be responsible for the largest proportion of chronic HBV infections in the Province of Misiones. Genotype F, especially subgenotype F1b, had a more recent introduction, and its diversification in the last 20 years might be related to its involvement in new transmission events. Copyright © 2016 Elsevier B.V. All rights reserved.
Epidemic history of hepatitis C virus infection in two remote communities in Nigeria, West Africa.
Forbi, Joseph C; Purdy, Michael A; Campo, David S; Vaughan, Gilberto; Dimitrova, Zoya E; Ganova-Raeva, Lilia M; Xia, Guo-Liang; Khudyakov, Yury E
2012-07-01
We investigated the molecular epidemiology and population dynamics of HCV infection among indigenes of two semi-isolated communities in North-Central Nigeria. Despite remoteness and isolation, ~15% of the population had serological or molecular markers of hepatitis C virus (HCV) infection. Phylogenetic analysis of the NS5b sequences obtained from 60 HCV-infected residents showed that HCV variants belonged to genotype 1 (n=51; 85%) and genotype 2 (n=9; 15%). All sequences were unique and intermixed in the phylogenetic tree with HCV sequences from people infected in other West African countries. The high-throughput 454 pyrosequencing of the HCV hypervariable region 1 and an empirical threshold error correction algorithm were used to evaluate intra-host heterogeneity of HCV strains of genotype 1 (n=43) and genotype 2 (n=6) from residents of the communities. Analysis revealed only rare detectable intermixing of HCV intra-host variants among residents. Identification of genetically close HCV variants among all known groups of relatives suggests common intra-familial HCV transmission in the communities. Applying Bayesian coalescent analysis to the NS5b sequences, the most recent common ancestors for genotype 1 and 2 variants were estimated to have existed 675 and 286 years ago, respectively. Bayesian skyline plots suggest that HCV lineages of both genotypes identified in the Nigerian communities experienced epidemic growth for 200-300 years until the mid-20th century. The data suggest a massive introduction of numerous HCV variants to the communities during the 20th century, against the background of a dynamic evolutionary history of the hepatitis C epidemic in Nigeria over the past three centuries.
Jayachandran, Devaraj; Laínez-Aguirre, José; Rundell, Ann; Vik, Terry; Hannemann, Robert; Reklaitis, Gintaras; Ramkrishna, Doraiswami
2015-01-01
6-Mercaptopurine (6-MP) is one of the key drugs in the treatment of many pediatric cancers, autoimmune diseases and inflammatory bowel disease. 6-MP is a prodrug, converted to an active metabolite, 6-thioguanine nucleotide (6-TGN), through an enzymatic reaction involving thiopurine methyltransferase (TPMT). Pharmacogenomic variation observed in the TPMT enzyme produces significant variation in drug response among the patient population. Despite 6-MP's widespread use and the observed variation in treatment response, efforts at quantitative optimization of dose regimens for individual patients are limited. In addition, research efforts devoted to pharmacogenomics to predict clinical responses are proving far from ideal. In this work, we present a Bayesian population modeling approach to develop a pharmacological model for 6-MP metabolism in humans. In the face of scarce data in clinical settings, a global-sensitivity-analysis-based model reduction approach is used to minimize the parameter space. For accurate estimation of sensitive parameters, robust optimal experimental design based on D-optimality criteria was exploited. With the patient-specific model, a model predictive control algorithm is used to optimize the dose scheduling with the objective of maintaining the 6-TGN concentration within its therapeutic window. More importantly, for the first time, we show how the incorporation of information from different levels of the biological chain of response (i.e., gene expression, enzyme phenotype, drug phenotype) plays a critical role in determining the uncertainty in predicting the therapeutic target. The model and the control approach can be utilized in the clinical setting to individualize 6-MP dosing based on the patient's ability to metabolize the drug, instead of the traditional standard-dose-for-all approach. PMID:26226448
Dynamic Denoising of Tracking Sequences
Michailovich, Oleg; Tannenbaum, Allen
2009-01-01
In this paper, we describe an approach to the problem of simultaneously enhancing image sequences and tracking the objects of interest represented by the latter. The enhancement part of the algorithm is based on Bayesian wavelet denoising, which has been chosen due to its exceptional ability to incorporate diverse a priori information into the process of image recovery. In particular, we demonstrate that, in dynamic settings, useful statistical priors can come both from some reasonable assumptions on the properties of the image to be enhanced as well as from the images that have already been observed before the current scene. Using such priors forms the main contribution of the present paper which is the proposal of the dynamic denoising as a tool for simultaneously enhancing and tracking image sequences. Within the proposed framework, the previous observations of a dynamic scene are employed to enhance its present observation. The mechanism that allows the fusion of the information within successive image frames is Bayesian estimation, while transferring the useful information between the images is governed by a Kalman filter that is used for both prediction and estimation of the dynamics of tracked objects. Therefore, in this methodology, the processes of target tracking and image enhancement “collaborate” in an interlacing manner, rather than being applied separately. The dynamic denoising is demonstrated on several examples of SAR imagery. The results demonstrated in this paper indicate a number of advantages of the proposed dynamic denoising over “static” approaches, in which the tracking images are enhanced independently of each other. PMID:18482881
Christoforides, Alexis; Carpten, John D; Weiss, Glen J; Demeure, Michael J; Von Hoff, Daniel D; Craig, David W
2013-05-04
The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) in order to study and characterize malignant tumors with unprecedented resolution. In particular for cancer, one is often trying to identify somatic mutations--changes specific to a tumor and not within an individual's germline. However, false positive and false negative detections often result from lack of sufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the erroneous classification of germline variation as tumor-specific. We have developed a generalized Bayesian analysis framework for matched tumor/normal samples with the purpose of identifying tumor-specific alterations such as single nucleotide mutations, small insertions/deletions, and structural variation. We describe our methodology, and discuss its application to other types of paired-tissue analysis such as the detection of loss of heterozygosity as well as allelic imbalance. We also demonstrate the high level of sensitivity and specificity in discovering simulated somatic mutations, for various combinations of a) genomic coverage and b) emulated heterogeneity. We present a Java-based implementation of our methods named Seurat, which is made available for free academic use. We have demonstrated and reported on the discovery of different types of somatic change by applying Seurat to an experimentally-derived cancer dataset using our methods; and have discussed considerations and practices regarding the accurate detection of somatic events in cancer genomes. Seurat is available at https://sites.google.com/site/seuratsomatic.
Intelligent fault recognition strategy based on adaptive optimized multiple centers
NASA Astrophysics Data System (ADS)
Zheng, Bo; Li, Yan-Feng; Huang, Hong-Zhong
2018-06-01
For recognition principles based on a single optimized center, one important issue is that data with a nonlinear separatrix cannot be recognized accurately. In order to solve this problem, a novel recognition strategy based on adaptively optimized multiple centers is proposed in this paper. This strategy recognizes data sets with a nonlinear separatrix by using multiple centers. Meanwhile, priority levels are introduced into the multi-objective optimization, covering recognition accuracy, the quantity of optimized centers, and the distance relationship. According to the characteristics of the various data, the priority levels are adjusted to set the quantity of optimized centers adaptively while keeping the original accuracy. The proposed method is compared with other methods, including the support vector machine (SVM), neural networks, and the Bayesian classifier. The results demonstrate that the proposed strategy has the same or even better recognition ability on data with different distribution characteristics.
A computer program for uncertainty analysis integrating regression and Bayesian methods
Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary
2014-01-01
This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.
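Once DREAM has produced chains, the credible-interval computation itself is a one-liner over the pooled post-burn-in draws, as this sketch (with synthetic chains in place of UCODE output) illustrates.

```python
import numpy as np

rng = np.random.default_rng(8)

# Stand-in for DREAM output: several parallel chains of posterior draws
# for one model parameter (UCODE_2014 would supply these).
chains = rng.normal(2.3, 0.4, size=(5, 2000))   # 5 chains x 2000 draws
draws = chains[:, 500:].ravel()                 # drop burn-in, pool chains

# 95% equal-tailed Bayesian credible interval: unlike the linear and
# nonlinear confidence intervals, no Gaussian-error or smoothness
# assumptions are required, only enough (parallelizable) model runs.
lo, hi = np.percentile(draws, [2.5, 97.5])
print("95%% credible interval: [%.2f, %.2f]" % (lo, hi))
```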
Rational Irrationality: Modeling Climate Change Belief Polarization Using Bayesian Networks.
Cook, John; Lewandowsky, Stephan
2016-01-01
Belief polarization is said to occur when two people respond to the same evidence by updating their beliefs in opposite directions. This response is considered to be "irrational" because it involves contrary updating, a form of belief updating that appears to violate normatively optimal responding, as for example dictated by Bayes' theorem. In light of much evidence that people are capable of normatively optimal behavior, belief polarization presents a puzzling exception. We show that Bayesian networks, or Bayes nets, can simulate rational belief updating. When fit to experimental data, Bayes nets can help identify the factors that contribute to polarization. We present a study into belief updating concerning the reality of climate change in response to information about the scientific consensus on anthropogenic global warming (AGW). The study used representative samples of Australian and U.S. participants. Among Australians, consensus information partially neutralized the influence of worldview, with free-market supporters showing a greater increase in acceptance of human-caused global warming relative to free-market opponents. In contrast, while consensus information overall had a positive effect on perceived consensus among U.S. participants, there was a reduction in perceived consensus and acceptance of human-caused global warming for strong supporters of unregulated free markets. Fitting a Bayes net model to the data indicated that under a Bayesian framework, free-market support is a significant driver of beliefs about climate change and trust in climate scientists. Further, active distrust of climate scientists among a small number of U.S. conservatives drives contrary updating in response to consensus information among this particular group. Copyright © 2016 Cognitive Science Society, Inc.
Saleh, Mohammad I
2017-11-01
Pegylated interferon α-2a (PEG-IFN-α-2a) is an antiviral drug used for the treatment of chronic hepatitis C virus (HCV) infection. This study describes the population pharmacokinetics of PEG-IFN-α-2a in hepatitis C patients using a Bayesian approach. A possible association between patient characteristics and pharmacokinetic parameters is also explored. A Bayesian population pharmacokinetic modeling approach, using WinBUGS version 1.4.3, was applied to a cohort of patients (n = 292) with chronic HCV infection. Data were obtained from two phase III studies (NV15942 and NV15801) sponsored by Hoffmann-La Roche. Demographic and clinical information were evaluated as possible predictors of pharmacokinetic parameters during model development. A one-compartment model with an additive error best fitted the data, and a total of 2271 PEG-IFN-α-2a measurements from 292 subjects were analyzed using the proposed population pharmacokinetic model. Sex was identified as a predictor of PEG-IFN-α-2a clearance, and baseline hemoglobin level was identified as a predictor of PEG-IFN-α-2a volume of distribution. A population pharmacokinetic model of PEG-IFN-α-2a in patients with chronic HCV infection was presented in this study. The proposed model can be used to optimize PEG-IFN-α-2a dosing in patients with chronic HCV infection. Optimal dose selection is important to maximize response and/or to avoid potential side effects such as thrombocytopenia and neutropenia.
Ekins, Sean; Freundlich, Joel S.; Hobrath, Judith V.; White, E. Lucile; Reynolds, Robert C
2013-01-01
Purpose Tuberculosis treatments need to be shorter and overcome drug resistance. Our previous large scale phenotypic high-throughput screening against Mycobacterium tuberculosis (Mtb) has identified 737 active compounds and thousands that are inactive. We have used these data for building computational models as an approach to minimize the number of compounds tested. Methods A cheminformatics clustering approach followed by Bayesian machine learning models (based on publicly available Mtb screening data) was used to illustrate that application of these models for screening set selections can enrich the hit rate. Results In order to explore chemical diversity around active cluster scaffolds of the dose-response hits obtained from our previous Mtb screens, a set of 1924 commercially available molecules was selected and evaluated for antitubercular activity and cytotoxicity using Vero, THP-1 and HepG2 cell lines, with 4.3%, 4.2% and 2.7% hit rates, respectively. We demonstrate that models incorporating antitubercular and cytotoxicity data in Vero cells can significantly enrich the selection of non-toxic actives compared to random selection. Across all cell lines, the Molecular Libraries Small Molecule Repository (MLSMR) and cytotoxicity model identified ~10% of the hits in the top 1% screened (>10-fold enrichment). We also showed that seven out of nine Mtb active compounds from different academic published studies and eight out of eleven Mtb active compounds from a pharmaceutical screen (GSK) would have been identified by these Bayesian models. Conclusion Combining clustering and Bayesian models represents a useful strategy for compound prioritization and hit-to-lead optimization of antitubercular agents. PMID:24132686
NASA Technical Reports Server (NTRS)
Wheeler, Ward C.
2003-01-01
The problem of determining the minimum cost hypothetical ancestral sequences for a given cladogram is known to be NP-complete (Wang and Jiang, 1994). Traditionally, point estimations of hypothetical ancestral sequences have been used to obtain heuristic upper bounds on cladogram cost. These include procedures with such diverse approaches as non-additive optimization of multiple sequence alignment, direct optimization (Wheeler, 1996), and fixed-state character optimization (Wheeler, 1999). A method is proposed here which, by extending fixed-state character optimization, replaces the estimation process with a search. This form of optimization examines a diversity of potential state solutions for cost-efficient hypothetical ancestral sequences and can result in substantially more parsimonious cladograms. Additionally, such an approach can be applied to other NP-complete phylogenetic optimization problems such as genomic break-point analysis. © 2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.
A framework for quantifying and optimizing the value of seismic monitoring of infrastructure
NASA Astrophysics Data System (ADS)
Omenzetter, Piotr
2017-04-01
This paper outlines a framework for quantifying and optimizing the value of information from structural health monitoring (SHM) technology deployed on large infrastructure, which may sustain damage in a series of earthquakes (the main shock and the aftershocks). The evolution of the damage state of the infrastructure without or with SHM is presented as a time-dependent, stochastic, discrete-state, observable and controllable nonlinear dynamical system. Pre-posterior Bayesian analysis and the decision tree are used for quantifying and optimizing the value of SHM information. An optimization problem is then formulated to decide on the adoption of SHM and to optimally manage the usage and operations of the possibly damaged infrastructure and its repair schedule using the information from SHM. The objective function to minimize is the expected total cost or risk.
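In its simplest form, pre-posterior analysis prices SHM information as the reduction in expected cost between deciding on the prior damage belief alone and deciding after observing a noisy monitoring outcome. The following sketch uses a two-state damage model with invented costs and detection rates, purely to illustrate the calculation, not the paper's infrastructure model.

```python
# Hypothetical two-state example: structure is Damaged with prior probability p_d.
# Decision: repair (cost c_r) or keep operating (expected failure cost p * c_f).
p_d, c_r, c_f = 0.2, 1.0, 10.0

def expected_cost(p):
    # Optimal action under belief p = P(damaged): take the cheaper branch.
    return min(c_r, p * c_f)

# Without SHM: decide on the prior alone.
cost_prior = expected_cost(p_d)

# With SHM: a binary alarm with assumed hit rate 0.95 and false-alarm rate 0.10.
hit, fa = 0.95, 0.10
p_alarm = hit * p_d + fa * (1 - p_d)
post_alarm = hit * p_d / p_alarm              # Bayes update after "alarm"
post_quiet = (1 - hit) * p_d / (1 - p_alarm)  # Bayes update after "no alarm"
cost_shm = p_alarm * expected_cost(post_alarm) + (1 - p_alarm) * expected_cost(post_quiet)

print(cost_prior - cost_shm)  # value of information: the most SHM is worth paying for
```

Extending this one-shot calculation over sequences of earthquakes and repair decisions is exactly where the decision tree and the controlled dynamical-system formulation above come in.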
Mossadegh, Somayyeh; He, Shan; Parker, Paul
2016-05-01
Various injury severity scores exist for trauma; it is known that they do not correlate accurately to military injuries. A promising anatomical scoring system for blast pelvic and perineal injury led to the development of an improved scoring system using machine-learning techniques. An unbiased genetic algorithm selected optimal anatomical and physiological parameters from 118 military cases. A Naïve Bayesian model was built using the proposed parameters to predict the probability of survival. Ten-fold cross-validation was employed to evaluate its performance. Our model significantly outperformed the Injury Severity Score (ISS), Trauma ISS, New ISS, and the Revised Trauma Score in virtually all areas: positive predictive value 0.8941, specificity 0.9027, accuracy 0.9056, and area under the curve 0.9059. A two-sample t test showed that the predictive performance of the proposed scoring system was significantly better than that of the other systems (p < 0.001). With limited resources and the simplest of Bayesian methodologies, we have demonstrated that the Naïve Bayesian model performed significantly better in virtually all areas assessed by current scoring systems used for trauma. This is encouraging and highlights that more can be done to improve trauma systems, not only for our military injured but also for civilian trauma victims. Reprint & Copyright © 2016 Association of Military Surgeons of the U.S.
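As a hedged sketch of the modelling step described here (the genetic-algorithm feature selection is omitted), a Naive Bayes survival classifier with 10-fold cross-validation can be set up in a few lines; the data below are synthetic placeholders, not the 118 military cases.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(118, 6))  # selected anatomical/physiological parameters (synthetic)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=118) > 0).astype(int)  # survival

# 10-fold cross-validated area under the ROC curve, as in the paper's evaluation.
auc = cross_val_score(GaussianNB(), X, y, cv=10, scoring="roc_auc")
print(auc.mean())
```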
Ennouri, Karim; Ayed, Rayda Ben; Hassen, Hanen Ben; Mazzarello, Maura; Ottaviani, Ennio
2015-12-01
Bacillus thuringiensis (Bt) is a Gram-positive bacterium. The entomopathogenic activity of Bt is related to the existence of the crystal consisting of protoxins, also called delta-endotoxins. In order to optimize and explain the production of delta-endotoxins by Bacillus thuringiensis kurstaki, we studied seven medium components, soybean meal, starch, KH₂PO₄, K₂HPO₄, FeSO₄, MnSO₄, and MgSO₄, and their relationships with the concentration of delta-endotoxins using an experimental design (Plackett-Burman design) and Bayesian network modelling. The effects of the ingredients of the culture medium on delta-endotoxin production were estimated. The developed model showed that different medium components are important for Bacillus thuringiensis fermentation. The most important factors influencing the production of delta-endotoxins are FeSO₄, K₂HPO₄, starch and soybean meal. Soybean meal, K₂HPO₄, KH₂PO₄ and starch showed a positive effect on delta-endotoxin production, whereas FeSO₄ and MnSO₄ had the opposite effect. The developed model, based on Bayesian techniques, can automatically learn emerging patterns in the data to serve in the prediction of delta-endotoxin concentrations. The model constructed in the present study implies that an experimental design (Plackett-Burman) combined with Bayesian network modelling can be used to identify the variables that affect delta-endotoxin variation.
Constraining coastal change: A morpho-sedimentological concept to infer sea-level oscillation
NASA Astrophysics Data System (ADS)
Mauz, Barbara; Shen, Zhixiong
2016-04-01
One of the responders to Milankovitch-scale climate changes is sea level which, in turn, is a driver of coastal change. In the literature, the sedimentary sequences representing coastal change are often linked to high sea-level stands, to intermediate sea-level positions or to regressive shorelines. We note apparent contradictions that indicate a lack of concept and inconsistent usage of sea level-related terms. To overcome this, we combine an integrated morpho-sedimentological concept for microtidal, mid-latitudinal coasts with chronologies based on Bayesian statistics. The concept regards the coastal sedimentary system as a depositional complex consisting of shallow-marine, aeolian and alluvial facies. These facies are in juxtaposition and respond simultaneously to external forcing. Bayesian statistics constrains the timing of the sequence based on optical or radiocarbon ages. Here, we present the site Hergla, located on the North African coast of the central Mediterranean Sea, as a case study to illustrate how the approach helps eliminate contradictions. The site has been cited frequently for confirming the hypothesis of a global two-peak sea-level highstand during the last interglacial (MIS 5e). The ~2 km cliff exposure at Hergla was surveyed, mapped, logged and sampled for further describing the sediments and their depositional environment through thin sections and Bayesian modelling of optical ages. Using our concept based on sequence stratigraphy tools, the section is interpreted as representing a coastal barrier with two bounding surfaces in the succession. Both surfaces mark the falling sea level of, first, MIS 5e and, second, MIS 5a and hence bound the falling stage systems tract of a forced regression. Part of the deposits between the two surfaces is pulled up onto the shoulder of a small rising horst, and the associated tectonic event coincided with the MIS 5a sea-level rise, locally enhancing the accommodation space for a second foreshore environment. Our presentation will provide the theoretical background of the concept and critically discuss the global dataset for last interglacial sea-level oscillations using both the stratigraphic record and age distributions.
Renner, Susanne S; Zhang, Li-Bing
2004-06-01
Pistia stratiotes (water lettuce) and Lemna (duckweeds) are the only free-floating aquatic Araceae. The geographic origin and phylogenetic placement of these unrelated aroids present long-standing problems because of their highly modified reproductive structures and wide geographical distributions. We sampled chloroplast (trnL-trnF and rpl20-rps12 spacers, trnL intron) and mitochondrial sequences (nad1 b/c intron) for all genera implicated as close relatives of Pistia by morphological, restriction site, and sequencing data, and present a hypothesis about its geographic origin based on the consensus of trees obtained from the combined data, using Bayesian, maximum likelihood, parsimony, and distance analyses. Of the 14 genera closest to Pistia, only Alocasia, Arisaema, and Typhonium are species-rich, and the latter two were studied previously, facilitating the choice of representatives that span the roots of these genera. Results indicate that Pistia and the Seychelles endemic Protarum sechellarum are the basalmost branches in a grade comprising the tribes Colocasieae (Ariopsis, Steudnera, Remusatia, Alocasia, Colocasia), Arisaemateae (Arisaema, Pinellia), and Areae (Arum, Biarum, Dracunculus, Eminium, Helicodiceros, Theriophonum, Typhonium). Unexpectedly, all Areae genera are embedded in Typhonium, which throws new light on the geographic history of Areae. A Bayesian analysis of divergence times that explores the effects of multiple fossil and geological calibration points indicates that the Pistia lineage is 90 to 76 million years (my) old. The oldest fossils of the Pistia clade, though not Pistia itself, are 45-my-old leaves from Germany; the closest outgroup, Peltandreae (comprising a few species in Florida, the Mediterranean, and Madagascar), is known from 60-my-old leaves from Europe, Kazakhstan, North Dakota, and Tennessee. Based on the geographic ranges of close relatives, Pistia likely originated in the Tethys region, with Protarum then surviving on the Seychelles, which became isolated from Madagascar and India in the Late Cretaceous (85 my ago). Pistia and Protarum provide striking examples of ancient lineages that appear to have survived in unique or isolated habitats.
Simulation of Optimal Decision-Making Under the Impacts of Climate Change.
Møller, Lea Ravnkilde; Drews, Martin; Larsen, Morten Andreas Dahl
2017-07-01
Climate change transforms the conditions of existing agricultural practices, prompting farmers to continuously evaluate their agricultural strategies, e.g., towards optimising revenue. In this light, this paper presents a framework for applying Bayesian updating to simulate decision-making, reaction patterns and updating of beliefs among farmers in a developing country when faced with the complexity of adapting agricultural systems to climate change. We apply the approach to a case study from Ghana, where farmers seek to decide on the most profitable of three agricultural systems (dryland crops, irrigated crops and livestock) by continuously updating beliefs relative to realised trajectories of climate (change), represented by projections of temperature and precipitation. The climate data are based on combinations of output from three global/regional climate model combinations and two future scenarios (RCP4.5 and RCP8.5), representing moderate and minimal greenhouse gas reduction policies, respectively. The results indicate that the climate scenario (input) holds a significant influence on the development of beliefs, net revenues and thereby optimal farming practices. Further, despite uncertainties in the underlying net revenue functions, the study shows that when the beliefs of the farmer (decision-maker) oppose the development of the realised climate, the Bayesian methodology allows for simulating an adjustment of such beliefs when improved information becomes available. The framework can, therefore, help facilitate the optimal choice between agricultural systems considering the influence of climate change.
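The belief-updating loop itself is compact: beliefs over the candidate systems are multiplied by the likelihood of each season's realised net revenue and renormalised, so the posterior after one season becomes the prior for the next. A minimal sketch with invented revenue models and observations (not the paper's net revenue functions):

```python
import numpy as np
from scipy.stats import norm

systems = ["dryland", "irrigated", "livestock"]
mu = {"dryland": 1.0, "irrigated": 1.4, "livestock": 1.1}  # assumed mean revenues
belief = np.array([1 / 3, 1 / 3, 1 / 3])                   # uniform prior over systems

for revenue in [1.3, 1.5, 1.2]:  # observed seasonal net revenues (synthetic)
    lik = np.array([norm.pdf(revenue, loc=mu[s], scale=0.3) for s in systems])
    belief *= lik
    belief /= belief.sum()       # posterior becomes next season's prior

print(dict(zip(systems, belief.round(3))))  # belief concentrates on "irrigated"
```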
Bayesian Networks Predict Neuronal Transdifferentiation.
Ainsworth, Richard I; Ai, Rizi; Ding, Bo; Li, Nan; Zhang, Kai; Wang, Wei
2018-05-30
We employ the language of Bayesian networks to systematically construct gene-regulation topologies from deep-sequencing single-nucleus RNA-Seq data for human neurons. From the perspective of the cell-state potential landscape, we identify attractors that correspond closely to different neuron subtypes. Attractors are also recovered for cell states from an independent data set, confirming our model's accurate description of global genetic regulation across differing cell types of the neocortex (not included in the training data). Our model recovers experimentally confirmed genetic regulations, and community analysis reveals genetic associations in common pathways. Via a comprehensive scan of all theoretical three-gene perturbations of gene knockout and overexpression, we discover novel neuronal transdifferentiation recipes (including perturbations of SATB2, GAD1, POU6F2 and ADARB2) for excitatory projection neuron and inhibitory interneuron subtypes. Copyright © 2018, G3: Genes, Genomes, Genetics.
Optimal control design of turbo spin‐echo sequences with applications to parallel‐transmit systems
Hoogduin, Hans; Hajnal, Joseph V.; van den Berg, Cornelis A. T.; Luijten, Peter R.; Malik, Shaihan J.
2016-01-01
Purpose The design of turbo spin‐echo sequences is modeled as a dynamic optimization problem which includes the case of inhomogeneous transmit radiofrequency fields. This problem is efficiently solved by optimal control techniques making it possible to design patient‐specific sequences online. Theory and Methods The extended phase graph formalism is employed to model the signal evolution. The design problem is cast as an optimal control problem and an efficient numerical procedure for its solution is given. The numerical and experimental tests address standard multiecho sequences and pTx configurations. Results Standard, analytically derived flip angle trains are recovered by the numerical optimal control approach. New sequences are designed where constraints on radiofrequency total and peak power are included. In the case of parallel transmit application, the method is able to calculate the optimal echo train for two‐dimensional and three‐dimensional turbo spin echo sequences in the order of 10 s with a single central processing unit (CPU) implementation. The image contrast is maintained through the whole field of view despite inhomogeneities of the radiofrequency fields. Conclusion The optimal control design sheds new light on the sequence design process and makes it possible to design sequences in an online, patient‐specific fashion. Magn Reson Med 77:361–373, 2017. © 2016 The Authors Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine PMID:26800383
Sandoval-Castellanos, Edson; Palkopoulou, Eleftheria; Dalén, Love
2014-01-01
Inference of population demographic history has vastly improved in recent years due to a number of technological and theoretical advances including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods due to its simple theoretical foundation and exceptional flexibility. However, limited availability of user-friendly programs that perform ABC analysis renders it difficult to implement, and hence programming skills are frequently required. In addition, there is limited availability of programs able to deal with heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although providing specific features that improve inference from datasets with heterochronous data, BaySICS also has several capabilities making it a suitable tool for analysing contemporary genetic datasets. Those capabilities include joint analysis of independent tables, a graphical interface and the implementation of Markov-chain Monte Carlo without likelihoods.
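The core of rejection-ABC, the simplest flavour of what such programs implement, fits in a few lines: draw parameters from the prior, simulate data, and keep draws whose summary statistic lands close to the observed one. The sketch below uses a toy Poisson stand-in for a coalescent simulator; names and tolerances are illustrative, and this is not BaySICS's algorithm in detail.

```python
import numpy as np

def abc_rejection(observed_stat, simulate, prior_draw, eps, n_draws=20_000,
                  rng=np.random.default_rng(1)):
    """Keep prior draws whose simulated summary statistic is within eps of the data."""
    kept = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        if abs(simulate(theta, rng) - observed_stat) < eps:
            kept.append(theta)
    return np.array(kept)  # a sample from the approximate posterior

# Toy stand-in: mean pairwise difference of 30 sequences grows with theta.
posterior = abc_rejection(
    observed_stat=5.0,
    simulate=lambda th, rng: rng.poisson(th, size=30).mean(),
    prior_draw=lambda rng: rng.uniform(0, 20),
    eps=0.25,
)
print(posterior.mean(), posterior.std(), len(posterior))
```

Bayes factors for model comparison then follow from the relative acceptance rates of competing simulators under equal prior model weights.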
The Extrapolation of Elementary Sequences
NASA Technical Reports Server (NTRS)
Laird, Philip; Saul, Ronald
1992-01-01
We study sequence extrapolation as a stream-learning problem. Input examples are a stream of data elements of the same type (integers, strings, etc.), and the problem is to construct a hypothesis that both explains the observed sequence of examples and extrapolates the rest of the stream. A primary objective -- and one that distinguishes this work from previous extrapolation algorithms -- is that the same algorithm be able to extrapolate sequences over a variety of different types, including integers, strings, and trees. We define a generous family of constructive data types, and define as our learning bias a stream language called elementary stream descriptions. We then give an algorithm that extrapolates elementary descriptions over constructive datatypes and prove that it learns correctly. For freely-generated types, we prove a polynomial time bound on descriptions of bounded complexity. An especially interesting feature of this work is the ability to provide quantitative measures of confidence in competing hypotheses, using a Bayesian model of prediction.
Papasotiropoulos, Vasilis; Klossa-Kilia, Elena; Alahiotis, Stamatis N; Kilias, George
2007-08-01
Mitochondrial DNA sequence analysis has been used to explore genetic differentiation and phylogenetic relationships among five species of the Mugilidae family, Mugil cephalus, Chelon labrosus, Liza aurata, Liza ramada, and Liza saliens. DNA was isolated from samples originating from the Messolongi Lagoon in Greece. Three mtDNA segments (12S rRNA, 16S rRNA, and COI) were PCR-amplified and sequenced. Sequencing analysis revealed that the greatest genetic differentiation was observed between M. cephalus and all the other species studied, while C. labrosus and L. aurata were the closest taxa. Dendrograms obtained by the neighbor-joining method and Bayesian inference analysis exhibited the same topology. According to this topology, M. cephalus is the most distinct species and the remaining taxa are clustered together, with C. labrosus and L. aurata forming a single group. The latter result brings into question the monophyletic origin of the genus Liza.
Armstrong, Miles R; Husmeier, Dirk; Phillips, Mark S; Blok, Vivian C
2007-06-01
The discovery that the potato cyst nematode Globodera pallida has a multipartite mitochondrial DNA (mtDNA) composed, at least in part, of six small circular mtDNAs (scmtDNAs) raised a number of questions concerning the population-level processes that might act on such a complex genome. Here we report our observations on the distribution of some scmtDNAs among a sample of European and South American G. pallida populations. We describe the occurrence of sequence variants of scmtDNA IV in population P4A from South America and show that particular sequence variants are common to the individuals within a single cyst. Evidence for recombination of sequence variants of scmtDNA IV in P4A is also reported. The mosaic structure of P4A scmtDNA IV sequences was revealed using several detection methods, and recombination breakpoints were independently detected by maximum likelihood and Bayesian MCMC methods.
Rybarczyk-Mydłowska, Katarzyna; Maboreke, Hazel Ruvimbo; van Megen, Hanny; van den Elsen, Sven; Mooyman, Paul; Smant, Geert; Bakker, Jaap; Helder, Johannes
2012-11-21
Plant parasitic nematodes are unusual Metazoans as they are equipped with genes that allow for symbiont-independent degradation of plant cell walls. Among the cell wall-degrading enzymes, glycoside hydrolase family 5 (GHF5) cellulases are relatively well characterized, especially for high impact parasites such as root-knot and cyst nematodes. Interestingly, ancestors of extant nematodes most likely acquired these GHF5 cellulases from a prokaryote donor by one or multiple lateral gene transfer events. To obtain insight into the origin of GHF5 cellulases among evolutionarily advanced members of the order Tylenchida, cellulase biodiversity data from less distal family members were collected and analyzed. Single nematodes were used to obtain (partial) genomic sequences of cellulases from representatives of the genera Meloidogyne, Pratylenchus, Hirschmanniella and Globodera. Combined Bayesian analysis of ≈ 100 cellulase sequences revealed three types of catalytic domains (A, B, and C). Represented by 84 sequences, type B is numerically dominant, and the overall topology of the type B catalytic domain tree shows a remarkable resemblance to trees based on neutral (= pathogenicity-unrelated) small subunit ribosomal DNA sequences. Bayesian analysis further suggested a sister relationship between the lesion nematode Pratylenchus thornei and all type B cellulases from root-knot nematodes. Yet, the relationship between the three catalytic domain types remained unclear. Superposition of intron data onto the cellulase tree suggests that types B and C are related, and together distinct from type A, which is characterized by two unique introns. All Tylenchida members investigated here harbored one or multiple GHF5 cellulases. Three types of catalytic domains are distinguished, and the presence of at least two types is relatively common among plant parasitic Tylenchida. Analysis of the coding sequences of cellulases suggests that root-knot and cyst nematodes did not acquire this gene directly by lateral gene transfer. More likely, these genes were passed on by ancestors of a family nowadays known as the Pratylenchidae.
Cinelli, Mattia; Sun, Yuxin; Best, Katharine; Heather, James M.; Reich-Zeliger, Shlomit; Shifrut, Eric; Friedman, Nir; Shawe-Taylor, John; Chain, Benny
2017-01-01
Abstract Motivation: Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study we use high throughput sequencing to compare the global changes in T cell receptor β chain complementarity determining region 3 (CDR3β) sequences following immunization with ovalbumin administered with complete Freund's adjuvant (CFA) or CFA alone. Results: The CDR3β sequences were deconstructed into short stretches of overlapping contiguous amino acids. The motifs were ranked according to a one-dimensional Bayesian classifier score comparing their frequency in the repertoires of the two immunization classes. The top ranking motifs were selected and used to create feature vectors which were used to train a support vector machine. The support vector machine achieved high classification scores in a leave-one-out validation test, reaching >90% in some cases. Summary: The study describes a novel two-stage classification strategy combining a one-dimensional Bayesian classifier with a support vector machine. Using this approach we demonstrate that the frequency of a small number of linear motifs three amino acids in length can accurately identify a CD4 T cell response to ovalbumin against a background response to the complex mixture of antigens which characterize complete Freund's adjuvant. Availability and implementation: The sequence data is available at www.ncbi.nlm.nih.gov/sra/?term=SRP075893. The Decombinator package is available at github.com/innate2adaptive/Decombinator. The R package e1071 is available at the CRAN repository https://cran.r-project.org/web/packages/e1071/index.html. Contact: b.chain@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28073756
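A hedged sketch of stage one of such a pipeline: rank overlapping 3-mers by a smoothed log-odds score between the two immunization classes; frequencies of the top-ranked motifs then form the feature vectors passed to a support vector machine. Data, smoothing, and scoring details here are illustrative, not the paper's exact classifier.

```python
from collections import Counter
import numpy as np

def kmer_counts(seqs, k=3):
    # Pool overlapping k-mer counts across all CDR3 sequences in a repertoire.
    c = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            c[s[i:i + k]] += 1
    return c

def rank_motifs(repertoire_a, repertoire_b, n_top=20, alpha=1.0):
    ca, cb = kmer_counts(repertoire_a), kmer_counts(repertoire_b)
    vocab = sorted(set(ca) | set(cb))
    ta, tb = sum(ca.values()), sum(cb.values())
    # Laplace-smoothed log-odds of seeing each motif in class A vs class B.
    score = {m: np.log((ca[m] + alpha) / (ta + alpha * len(vocab)))
                - np.log((cb[m] + alpha) / (tb + alpha * len(vocab)))
             for m in vocab}
    return sorted(vocab, key=lambda m: abs(score[m]), reverse=True)[:n_top]

# Tiny toy repertoires; real inputs are thousands of sequenced CDR3β chains.
print(rank_motifs(["CASSLGGYEQY", "CASSPDRGYEQY"], ["CASRTGELFF", "CASSQETQYF"]))
```

Per-sample frequency vectors over the returned motifs can then be fed to, e.g., sklearn.svm.SVC for the second stage.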
Perspective: Stochastic magnetic devices for cognitive computing
NASA Astrophysics Data System (ADS)
Roy, Kaushik; Sengupta, Abhronil; Shim, Yong
2018-06-01
Stochastic switching of nanomagnets can potentially enable probabilistic cognitive hardware consisting of noisy neural and synaptic components. Furthermore, computational paradigms inspired from the Ising computing model require stochasticity for achieving near-optimality in solutions to various types of combinatorial optimization problems such as the Graph Coloring Problem or the Travelling Salesman Problem. Achieving optimal solutions to such problems is computationally exhaustive, and natural annealing is required to arrive at near-optimal solutions. Stochastic switching of devices also finds use in applications involving Deep Belief Networks and Bayesian inference. In this article, we provide a multi-disciplinary perspective across the stack of devices, circuits, and algorithms to illustrate how the stochastic switching dynamics of spintronic devices in the presence of thermal noise can provide a direct mapping to the computational units of such probabilistic intelligent systems.
Development of Scoring Functions for Antibody Sequence Assessment and Optimization
Seeliger, Daniel
2013-01-01
Antibody development is still associated with substantial risks and difficulties, as single mutations can radically change molecule properties like thermodynamic stability, solubility or viscosity. Since antibody generation methodologies cannot select and optimize for molecule properties which are important for biotechnological applications, careful sequence analysis and optimization is necessary to develop antibodies that fulfil the ambitious requirements of future drugs. While efforts to capture the physical principles of undesired molecule properties from first principles are becoming increasingly powerful, the wealth of publicly available antibody sequences provides an alternative way to develop early assessment strategies for antibodies using a statistical approach, which is the objective of this paper. Here, publicly available sequences were used to develop heuristic potentials for the framework regions of heavy and light chains of antibodies of human and murine origin. The potentials take into account position-dependent probabilities of individual amino acids but also conditional probabilities which are inevitable for sequence assessment and optimization. It is shown that the potentials derived from human sequences clearly distinguish between human sequences and sequences from mice and, hence, can be used as a measure of humanness which compares a given sequence with the phenotypic pool of human sequences instead of comparing sequence identities to germline genes. Following this line, it is demonstrated that, using the developed potentials, humanization of an antibody can be described as a simple mathematical optimization problem and that the in-silico generated framework variants closely resemble native sequences in terms of predicted immunogenicity. PMID:24204701
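The position-dependent part of such a potential is straightforward to estimate from an aligned sequence pool: count amino-acid frequencies per framework position, smooth them, and score a candidate sequence by its summed log-probabilities. The sketch below covers only that marginal term; the conditional (pairwise) probabilities the paper emphasizes are omitted, and all inputs are placeholders.

```python
import numpy as np
from collections import defaultdict

def position_potential(aligned_seqs, alpha=1.0):
    """Laplace-smoothed per-position amino-acid log-probabilities
    from equal-length aligned framework sequences."""
    length = len(aligned_seqs[0])
    counts = [defaultdict(float) for _ in range(length)]
    for s in aligned_seqs:
        for i, aa in enumerate(s):
            counts[i][aa] += 1.0
    n = len(aligned_seqs)
    return [{aa: np.log((c + alpha) / (n + 20 * alpha)) for aa, c in pos.items()}
            for pos in counts]

def humanness_score(seq, potential, floor=np.log(1e-4)):
    # Sum of per-position log-probabilities; residues never seen get a floor value.
    return sum(potential[i].get(aa, floor) for i, aa in enumerate(seq))

pot = position_potential(["EVQLVE", "EVQLLE", "QVQLVE"])  # toy human pool
print(humanness_score("EVQLVE", pot) > humanness_score("DIKLVE", pot))  # True
```

Humanization as optimization then amounts to searching for the mutation set that maximizes this score subject to constraints that preserve the paratope.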
A novel gammaherpesvirus in a large flying fox (Pteropus vampyrus) with blepharitis.
Paige Brock, A; Cortés-Hinojosa, Galaxia; Plummer, Caryn E; Conway, Julia A; Roff, Shannon R; Childress, April L; Wellehan, James F X
2013-05-01
A novel gammaherpesvirus was identified in a large flying fox (Pteropus vampyrus) with conjunctivitis, blepharitis, and meibomianitis by nested polymerase chain reaction and sequencing. Polymerase chain reaction amplification and sequencing of 472 base pairs of the DNA-dependent DNA polymerase gene were used to identify a novel herpesvirus. Bayesian and maximum likelihood phylogenetic analyses indicated that the virus is a member of the genus Percavirus in the subfamily Gammaherpesvirinae. Additional research is needed regarding the association of this virus with conjunctivitis and other ocular pathology. This virus may be useful as a biomarker of stress and may be a useful model of virus recrudescence in Pteropus spp.
Mammoth and Elephant Phylogenetic Relationships: Mammut americanum, the Missing Outgroup
Orlando, Ludovic; Hänni, Catherine; Douady, Christophe J.
2007-01-01
At the morphological level, the woolly mammoth has most often been considered the sister-species of Asian elephants, but at the DNA level, different studies have found support for proximity with African elephants. Recent reports have increased the available sequence data and apparently solved the discrepancy, finding mammoths to be most closely related to Asian elephants. However, we demonstrate here that the three competing topologies have similar likelihood, Bayesian and parsimony supports. The analysis further suggests the inadequacy of using Sirenia or Hyracoidea as outgroups. We therefore argue that orthologous sequences from the extinct American mastodon will be required to definitively solve this long-standing question. PMID:19430604
A Comparison of FPGA and GPGPU Designs for Bayesian Occupancy Filters
Medina, Luis; Diez-Ochoa, Miguel; Correal, Raul; Cuenca-Asensi, Sergio; Godoy, Jorge; Martínez-Álvarez, Antonio
2017-01-01
Grid-based perception techniques in the automotive sector based on fusing information from different sensors and their robust perceptions of the environment are proliferating in the industry. However, one of the main drawbacks of these techniques is the traditionally prohibitive, high computing performance that is required for embedded automotive systems. In this work, the capabilities of new computing architectures that embed these algorithms are assessed in a real car. The paper compares two ad hoc optimized designs of the Bayesian Occupancy Filter; one for General Purpose Graphics Processing Unit (GPGPU) and the other for Field-Programmable Gate Array (FPGA). The resulting implementations are compared in terms of development effort, accuracy and performance, using datasets from a realistic simulator and from a real automated vehicle. PMID:29137137
Trust from the past: Bayesian Personalized Ranking based Link Prediction in Knowledge Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Baichuan; Choudhury, Sutanay; Al-Hasan, Mohammad
2016-02-01
Estimating the confidence for a link is a critical task for Knowledge Graph construction. Link prediction, or predicting the likelihood of a link in a knowledge graph based on prior state, is a key research direction within this area. We propose a Latent Feature Embedding based link recommendation model for the prediction task and utilize a Bayesian Personalized Ranking based optimization technique for learning models for each predicate. Experimental results on large-scale knowledge bases such as YAGO2 show that our approach achieves substantially higher performance than several state-of-the-art approaches. Furthermore, we also study the performance of the link prediction algorithm in terms of topological properties of the Knowledge Graph and present a linear regression model to reason about its expected level of accuracy.
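The optimization idea is pairwise: embeddings are nudged so that an observed triple scores above a corrupted (unobserved) one under the BPR logistic loss. A minimal stochastic-gradient sketch for one predicate, with a plain dot-product scorer (dimensions, rates, and the scorer are illustrative assumptions, not the paper's full model):

```python
import numpy as np

def bpr_update(E, s, o_pos, o_neg, lr=0.05, reg=0.01):
    """One SGD step on -log sigmoid(score(s,o_pos) - score(s,o_neg)),
    with score(s, o) = E[s] @ E[o] and L2 regularization."""
    es, ep, en = E[s].copy(), E[o_pos].copy(), E[o_neg].copy()
    x = es @ ep - es @ en          # margin between observed and corrupted link
    g = 1.0 / (1.0 + np.exp(x))    # = sigmoid(-x), the loss gradient weight
    E[s]     += lr * (g * (ep - en) - reg * es)
    E[o_pos] += lr * (g * es - reg * ep)
    E[o_neg] += lr * (-g * es - reg * en)

rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(1000, 32))  # 1000 entities, 32-dim latent features
bpr_update(E, s=0, o_pos=1, o_neg=2)        # (s, o_pos) observed; (s, o_neg) sampled
```

Link confidence for a candidate triple is then read off as a monotone function of the learned score.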
NASA Astrophysics Data System (ADS)
Vrugt, J. A.
2012-12-01
In the past decade much progress has been made in the treatment of uncertainty in earth systems modeling. Whereas initial approaches focused mostly on quantification of parameter and predictive uncertainty, recent methods attempt to disentangle the effects of parameter, forcing (input) data, model structural and calibration data errors. In this talk I will highlight some of our recent work involving theory, concepts and applications of Bayesian parameter and/or state estimation. In particular, new methods for sequential Monte Carlo (SMC) and Markov Chain Monte Carlo (MCMC) simulation will be presented, with emphasis on massively parallel distributed computing and quantification of model structural errors. The theoretical and numerical developments will be illustrated using model-data synthesis problems in hydrology, hydrogeology and geophysics.
Wellehan, James F X; Pessier, Allan P; Archer, Linda L; Childress, April L; Jacobson, Elliott R; Tesh, Robert B
2012-08-17
Rhabdoviruses infect a variety of hosts, including non-avian reptiles. Consensus PCR techniques were used to obtain partial RNA-dependent RNA polymerase gene sequence from five rhabdoviruses of South American lizards; Marco, Chaco, Timbo, Sena Madureira, and a rhabdovirus from a caiman lizard (Dracaena guianensis). The caiman lizard rhabdovirus formed inclusions in erythrocytes, which may be a route for infecting hematophagous insects. This is the first information on behavior of a rhabdovirus in squamates. We also obtained sequence from two rhabdoviruses of Australian lizards, confirming previous Charleville virus sequence and finding that, unlike a previous sequence report but in agreement with serologic reports, Almpiwar virus is clearly distinct from Charleville virus. Bayesian and maximum likelihood phylogenetic analysis revealed that most known rhabdoviruses of squamates cluster in the Almpiwar subgroup. The exception is Marco virus, which is found in the Hart Park group. Copyright © 2012 Elsevier B.V. All rights reserved.
Optimal digital dynamical decoupling for general decoherence via Walsh modulation
NASA Astrophysics Data System (ADS)
Qi, Haoyu; Dowling, Jonathan P.; Viola, Lorenza
2017-11-01
We provide a general framework for constructing digital dynamical decoupling sequences based on Walsh modulation—applicable to arbitrary qubit decoherence scenarios. By establishing equivalence between decoupling design based on Walsh functions and on concatenated projections, we identify a family of optimal Walsh sequences, which can be exponentially more efficient, in terms of the required total pulse number, for fixed cancellation order, than known digital sequences based on concatenated design. Optimal sequences for a given cancellation order are highly non-unique—their performance depending sensitively on the control path. We provide an analytic upper bound to the achievable decoupling error and show how sequences within the optimal Walsh family can substantially outperform concatenated decoupling in principle, while respecting realistic timing constraints.
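Digital sequences of this family can be generated directly from the Walsh functions: control pulses are applied wherever the chosen Walsh function switches sign. A small sketch of Paley-ordered Walsh functions built from products of Rademacher functions (the indexing convention and grid are illustrative, not the paper's construction):

```python
import numpy as np

def walsh(n, num_points):
    """Sample Walsh function w_n (Paley ordering) on a dyadic grid of [0, 1):
    w_n(t) is the product of Rademacher functions r_{j+1}(t) over the set bits j
    of n, where r_{j+1}(t) flips sign on each interval of width 2**-(j+1)."""
    t = (np.arange(num_points) + 0.5) / num_points
    out = np.ones(num_points)
    j = 0
    while (n >> j) > 0:
        if (n >> j) & 1:
            out *= np.where(np.floor(t * 2 ** (j + 1)) % 2 == 0, 1.0, -1.0)
        j += 1
    return out

# w_3 = r_1 * r_2 has sign pattern + + - - - - + + on eighths of [0, 1):
# its flips at t = 1/4 and t = 3/4 mark the pulse times of a 2-pulse sequence.
print(walsh(3, 8))
```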
The solution space of sorting by DCJ.
Braga, Marília D V; Stoye, Jens
2010-09-01
In genome rearrangements, the double cut and join (DCJ) operation, introduced by Yancopoulos et al. in 2005, allows one to represent most rearrangement events that could happen in multichromosomal genomes, such as inversions, translocations, fusions, and fissions. No restriction on the genome structure considering linear and circular chromosomes is imposed. An advantage of this general model is that it leads to considerable algorithmic simplifications compared to other genome rearrangement models. Recently, several works concerning the DCJ operation have been published, and in particular, an algorithm was proposed to find an optimal DCJ sequence for sorting one genome into another one. Here we study the solution space of this problem and give an easy-to-compute formula that corresponds to the exact number of optimal DCJ sorting sequences for a particular subset of instances of the problem. We also give an algorithm to count the number of optimal sorting sequences for any instance of the problem. Another interesting result is the demonstration of the possibility of obtaining one optimal sorting sequence by properly replacing any pair of consecutive operations in another optimal sequence. As a consequence, any optimal sorting sequence can be obtained from one other by applying such replacements successively, but the problem of finding the shortest number of replacements between two sorting sequences is still open.
"New turns from old STaRs": enhancing the capabilities of forensic short tandem repeat analysis.
Phillips, Christopher; Gelabert-Besada, Miguel; Fernandez-Formoso, Luis; García-Magariños, Manuel; Santos, Carla; Fondevila, Manuel; Ballard, David; Syndercombe Court, Denise; Carracedo, Angel; Lareu, Maria Victoria
2014-11-01
The field of research and development of forensic STR genotyping remains active, innovative, and focused on continuous improvements. A series of recent developments, including the introduction of a sixth dye, have brought expanded STR multiplex sizes while maintaining sensitivity to typical forensic DNA. New supplementary kits complementing the core STRs have also helped improve analysis of challenging identification cases such as distant pairwise relationships in deficient pedigrees. This article gives an overview of several recent key developments in forensic STR analysis: availability of expanded core STR kits and supplementary STRs, short-amplicon mini-STRs offering practical options for highly degraded DNA, Y-STR enhancements made from the identification of rapidly mutating loci, and enhanced analysis of genetic ancestry by analyzing 32-STR profiles with a Bayesian forensic classifier originally developed for SNP population data. As well as providing scope for genotyping larger numbers of STRs optimized for forensic applications, the launch of compact next-generation sequencing systems provides considerable potential for genotyping the sizeable proportion of nucleotide variation existing in forensic STRs, which currently escapes detection with CE. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Integral modeling of human eyes: from anatomy to visual response
NASA Astrophysics Data System (ADS)
Navarro, Rafael
2006-02-01
Three basic stages towards the global modeling of the eye are presented. In the first stage, an adequate choice of the basic geometrical model, a general ellipsoid in this case, permits fitting the typical "melon" shape of the cornea in a natural way with minimum complexity. In addition, it facilitates extracting most of its optically relevant parameters, such as the position and orientation of its optical axis in 3D space, the paraxial and overall refractive power, the amount and axis of astigmatism, etc. In the second stage, this geometrical model, along with optical design and optimization tools, is applied to build customized optical models of individual eyes, able to reproduce the measured wave aberration with high fidelity. Finally, we put together a sequence of schematic but functionally realistic models of the different stages of image acquisition, coding and analysis in the visual system, along with a probabilistic Bayesian maximum a posteriori identification approach. This permitted us to build a realistic simulation of all the essential processes involved in a visual acuity clinical exam. It is remarkable that at all three levels it has been possible for the models to predict the experimental data with high accuracy.
Ecological statistics of Gestalt laws for the perceptual organization of contours.
Elder, James H; Goldberg, Richard M
2002-01-01
Although numerous studies have measured the strength of visual grouping cues for controlled psychophysical stimuli, little is known about the statistical utility of these various cues for natural images. In this study, we conducted experiments in which human participants trace perceived contours in natural images. These contours are automatically mapped to sequences of discrete tangent elements detected in the image. By examining relational properties between pairs of successive tangents on these traced curves, and between randomly selected pairs of tangents, we are able to estimate the likelihood distributions required to construct an optimal Bayesian model for contour grouping. We employed this novel methodology to investigate the inferential power of three classical Gestalt cues for contour grouping: proximity, good continuation, and luminance similarity. The study yielded a number of important results: (1) these cues, when appropriately defined, are approximately uncorrelated, suggesting a simple factorial model for statistical inference; (2) moderate image-to-image variation of the statistics indicates the utility of general probabilistic models for perceptual organization; (3) these cues differ greatly in their inferential power, proximity being by far the most powerful; and (4) statistical modeling of the proximity cue indicates a scale-invariant power law in close agreement with prior psychophysics.
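Because the cues turn out to be approximately uncorrelated (result 1), the Bayesian grouping model factorizes: the posterior odds that two tangents lie on the same contour are the prior odds multiplied by one likelihood ratio per cue. A hedged sketch of that factorial combination, with all densities and numbers invented for illustration rather than taken from the measured ecological statistics:

```python
import numpy as np
from scipy.stats import expon

def posterior_same_contour(cues, lik_same, lik_rand, prior_odds=1e-2):
    """cues: measured cue values; lik_same/lik_rand: per-cue densities under
    'successive tangents on a curve' vs 'random pair of tangents'."""
    log_odds = np.log(prior_odds)
    for name, value in cues.items():
        log_odds += np.log(lik_same[name](value)) - np.log(lik_rand[name](value))
    return 1.0 / (1.0 + np.exp(-log_odds))  # posterior probability of grouping

# Proximity only: small gaps are typical along curves, large gaps for random pairs.
lik_same = {"gap": expon(scale=2.0).pdf}    # assumed density, same-curve pairs
lik_rand = {"gap": expon(scale=40.0).pdf}   # assumed density, random pairs
print(posterior_same_contour({"gap": 1.5}, lik_same, lik_rand))
```

Adding good-continuation and luminance-similarity terms is just two more entries in each dictionary, which is exactly what the factorial (uncorrelated-cue) finding licenses.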
Gerber, Brian D; Kendall, William L; Hooten, Mevin B; Dubovsky, James A; Drewien, Roderick C
2015-09-01
1. Prediction is fundamental to scientific enquiry and application; however, ecologists tend to favour explanatory modelling. We discuss a predictive modelling framework to evaluate ecological hypotheses and to explore novel/unobserved environmental scenarios to assist conservation and management decision-makers. We apply this framework to develop an optimal predictive model for juvenile (<1 year old) sandhill crane Grus canadensis recruitment of the Rocky Mountain Population (RMP). We consider spatial climate predictors motivated by hypotheses of how drought across multiple time-scales and spring/summer weather affects recruitment. 2. Our predictive modelling framework focuses on developing a single model that includes all relevant predictor variables, regardless of collinearity. This model is then optimized for prediction by controlling model complexity using a data-driven approach that marginalizes or removes irrelevant predictors from the model. Specifically, we highlight two approaches of statistical regularization, Bayesian least absolute shrinkage and selection operator (LASSO) and ridge regression. 3. Our optimal predictive Bayesian LASSO and ridge regression models were similar and on average 37% superior in predictive accuracy to an explanatory modelling approach. Our predictive models confirmed a priori hypotheses that drought and cold summers negatively affect juvenile recruitment in the RMP. The effects of long-term drought can be alleviated by short-term wet spring-summer months; however, the alleviation of long-term drought has a much greater positive effect on juvenile recruitment. The number of freezing days and snowpack during the summer months can also negatively affect recruitment, while spring snowpack has a positive effect. 4. Breeding habitat, mediated through climate, is a limiting factor on population growth of sandhill cranes in the RMP, which could become more limiting with a changing climate (i.e. increased drought). These effects are likely not unique to cranes. The alteration of hydrological patterns and water levels by drought may impact many migratory, wetland nesting birds in the Rocky Mountains and beyond. 5. Generalizable predictive models (trained by out-of-sample fit and based on ecological hypotheses) are needed by conservation and management decision-makers. Statistical regularization improves predictions and provides a general framework for fitting models with a large number of predictors, even those with collinearity, to simultaneously identify an optimal predictive model while conducting rigorous Bayesian model selection. Our framework is important for understanding population dynamics under a changing climate and has direct applications for making harvest and habitat management decisions. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
Optimism as a Prior Belief about the Probability of Future Reward
Kalra, Aditi; Seriès, Peggy
2014-01-01
Optimists hold positive a priori beliefs about the future. In Bayesian statistical theory, a priori beliefs can be overcome by experience. However, optimistic beliefs can at times appear surprisingly resistant to evidence, suggesting that optimism might also influence how new information is selected and learned. Here, we use a novel Pavlovian conditioning task, embedded in a normative framework, to directly assess how trait optimism, as classically measured using self-report questionnaires, influences choices between visual targets as learning about their association with reward progresses. We find that trait optimism relates to an a priori belief about the likelihood of rewards, but not losses, in our task. Critically, this positive belief behaves like a probabilistic prior, i.e., its influence diminishes with increasing experience. Contrary to findings in the literature related to unrealistic optimism and self-beliefs, it does not appear to influence the iterative learning process directly. PMID:24853098
Mycofier: a new machine learning-based classifier for fungal ITS sequences.
Delgado-Serrano, Luisa; Restrepo, Silvia; Bustos, Jose Ricardo; Zambrano, Maria Mercedes; Anzola, Juan Manuel
2016-08-11
The taxonomic and phylogenetic classification based on sequence analysis of the ITS1 genomic region has become a crucial component of fungal ecology and diversity studies. Nowadays, there is no accurate alignment-free classification tool for fungal ITS1 sequences for large environmental surveys. This study describes the development of a machine learning-based classifier for the taxonomic assignment of fungal ITS1 sequences at the genus level. A fungal ITS1 sequence database was built using curated data, and training and test sets were generated from it. A Naïve Bayesian classifier was built using features from the primary sequence, with an accuracy of 87% in the classification at the genus level. The final model was based on a Naïve Bayes algorithm using ITS1 sequences from 510 fungal genera. This classifier, denoted Mycofier, provides classification accuracy similar to BLASTN, but it is trained on curated data, is alignment-independent and more efficient, and fills the need for an accurate classification tool for large fungal ITS1 datasets. The software and source code for Mycofier are freely available at https://github.com/ldelgado-serrano/mycofier.git.
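In the same spirit, an alignment-free genus classifier can be sketched by turning each ITS1 sequence into a bag of overlapping k-mers and fitting a multinomial Naive Bayes model; sequences and labels below are placeholders, not Mycofier's curated training database.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def to_kmers(seq, k=5):
    # Represent a sequence as space-separated overlapping k-mers ("words").
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

train_seqs = ["ACGTACGTGGCAACGT", "TTGGACGTACCATTGG"]  # ITS1 sequences (placeholder)
train_genus = ["Fusarium", "Aspergillus"]              # genus labels (placeholder)

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit([to_kmers(s) for s in train_seqs], train_genus)
print(clf.predict([to_kmers("ACGTACGTGGCAACGA")]))
```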
Xiao, Hu; Cui, Rongxin; Xu, Demin
2018-06-01
This paper presents a cooperative multiagent search algorithm to solve the problem of searching for a target on a 2-D plane under multiple constraints. A Bayesian framework is used to update the local probability density functions (PDFs) of the target when the agents obtain observation information. To obtain the global PDF used for decision making, a sampling-based logarithmic opinion pool algorithm is proposed to fuse the local PDFs, and a particle sampling approach is used to represent the continuous PDF. Then the Gaussian mixture model (GMM) is applied to reconstitute the global PDF from the particles, and a weighted expectation maximization algorithm is presented to estimate the parameters of the GMM. Furthermore, we propose an optimization objective which aims to guide agents to find the target with less resource consumption while simultaneously keeping the resource consumption of each agent balanced. To this end, a utility function-based optimization problem is put forward and solved by a gradient-based approach. Several contrastive simulations demonstrate that, compared with other existing approaches, the proposed one uses fewer overall resources and shows a better performance in balancing resource consumption.
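The fusion step itself is simple once every agent's local PDF is evaluated on a shared particle set: a weighted logarithmic opinion pool multiplies the local densities raised to the agents' weights and renormalises. A minimal sketch (weights, particle values, and numerical guards are illustrative):

```python
import numpy as np

def log_opinion_pool(local_pdfs, weights):
    """Fuse local PDFs evaluated on a common particle set:
    p(x) proportional to prod_i p_i(x)**w_i."""
    logp = np.sum(weights[:, None] * np.log(np.maximum(local_pdfs, 1e-300)), axis=0)
    p = np.exp(logp - logp.max())  # subtract the max for numerical stability
    return p / p.sum()

# Two agents, five particles: the pool sharpens agreement and discounts conflict.
local = np.array([[0.10, 0.40, 0.30, 0.15, 0.05],
                  [0.05, 0.45, 0.35, 0.10, 0.05]])
print(log_opinion_pool(local, np.array([0.5, 0.5])))
```

A Gaussian mixture fitted to the pooled particle weights (e.g., by weighted EM) then gives the continuous global PDF used for the utility optimization.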
The Bayesian group lasso for confounded spatial data
Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.
2017-01-01
Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels of multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can have higher and less variable predictive accuracy based on out-of-sample data when compared to the standard SGLMM.
Zhu, Tianqi; Dos Reis, Mario; Yang, Ziheng
2015-03-01
Genetic sequence data provide information about the distances between species or branch lengths in a phylogeny, but not about the absolute divergence times or the evolutionary rates directly. Bayesian methods for dating species divergences estimate times and rates by assigning priors on them. In particular, the prior on times (node ages on the phylogeny) incorporates information in the fossil record to calibrate the molecular tree. Because times and rates are confounded, our posterior time estimates will not approach point values even if an infinite amount of sequence data are used in the analysis. In a previous study we developed a finite-sites theory to characterize the uncertainty in Bayesian divergence time estimation in analysis of large but finite sequence data sets under a strict molecular clock. As most modern clock dating analyses use more than one locus and are conducted under relaxed clock models, here we extend the theory to the case of relaxed clock analysis of data from multiple loci (site partitions). Uncertainty in posterior time estimates is partitioned into three sources: sampling errors in the estimates of branch lengths in the tree for each locus due to limited sequence length, variation of substitution rates among lineages and among loci, and uncertainty in fossil calibrations. Using a simple but analogous estimation problem involving the multivariate normal distribution, we predict that as the number of loci (L) goes to infinity, the variance in posterior time estimates decreases and approaches the infinite-data limit at the rate of 1/L, and the limit is independent of the number of sites in the sequence alignment. We then confirmed the predictions by using computer simulation on phylogenies of two or three species, and by analyzing a real genomic data set for six primate species. Our results suggest that with the fossil calibrations fixed, analyzing multiple loci or site partitions is the most effective way to improve the precision of posterior time estimation. However, even if a huge amount of sequence data is analyzed, considerable uncertainty will persist in time estimates. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society of Systematic Biologists.
Vrancken, Bram; Lemey, Philippe; Rambaut, Andrew; Bedford, Trevor; Longdon, Ben; Günthard, Huldrych F.; Suchard, Marc A.
2014-01-01
Phylogenetic signal quantifies the degree to which resemblance in continuously valued traits reflects phylogenetic relatedness. Measures of phylogenetic signal are widely used in ecological and evolutionary research, and are recently gaining traction in viral evolutionary studies. Standard estimators of phylogenetic signal frequently condition on data summary statistics of the repeated trait observations and fixed phylogenetic trees, resulting in information loss and potential bias. To incorporate the observation process and phylogenetic uncertainty in a model-based approach, we develop a novel Bayesian inference method to simultaneously estimate the evolutionary history and phylogenetic signal from molecular sequence data and repeated multivariate traits. Our approach builds upon a phylogenetic diffusion framework that models continuous trait evolution as a Brownian motion process and incorporates Pagel's λ transformation parameter to estimate dependence among traits. We provide a computationally efficient inference implementation in the BEAST software package. We evaluate the performance of the Bayesian estimator of phylogenetic signal on synthetic data against standard estimators, and demonstrate the use of our coherent framework to address several virus-host evolutionary questions, including virulence heritability for HIV, antigenic evolution in influenza and HIV, and Drosophila sensitivity to sigma virus infection. Finally, we discuss model extensions that will make useful contributions to our flexible framework for simultaneously studying sequence and trait evolution. PMID:25780554
The complete mitochondrial genomes of five Eimeria species infecting domestic rabbits.
Liu, Guo-Hua; Tian, Si-Qin; Cui, Ping; Fang, Su-Fang; Wang, Chun-Ren; Zhu, Xing-Quan
2015-12-01
Rabbit coccidiosis, caused by members of the genus Eimeria, can have an enormous economic impact worldwide, but the genetics, epidemiology and biology of these parasites remain poorly understood. In the present study, we sequenced and annotated the complete mitochondrial (mt) genomes of five Eimeria species that commonly infect domestic rabbits. The complete mt genomes of Eimeria intestinalis, Eimeria flavescens, Eimeria media, Eimeria vejdovskyi and Eimeria irresidua were 6261 bp, 6258 bp, 6168 bp, 6254 bp and 6259 bp in length, respectively. All of the mt genomes consist of 3 protein-coding genes (cytb, cox1, and cox3), 14 gene fragments for the large subunit (LSU) rRNA and 11 gene fragments for the small subunit (SSU) rRNA, but no transfer RNA (tRNA) genes. The gene order of the mt genomes is similar to that of Plasmodium, but distinct from Haemosporida and Theileria. Phylogenetic analyses of the full nucleotide sequences using Bayesian analysis revealed that the monophyly of the Eimeria of rabbits was strongly supported, with high Bayesian posterior probabilities. These data provide novel mtDNA markers for studying the population genetics and molecular epidemiology of these Eimeria species, and should have implications for the molecular diagnosis, prevention and control of coccidiosis in rabbits. Copyright © 2015 Elsevier Inc. All rights reserved.
Does History Repeat Itself? Wavelets and the Phylodynamics of Influenza A
Tom, Jennifer A.; Sinsheimer, Janet S.; Suchard, Marc A.
2012-01-01
Unprecedented global surveillance of viruses will result in massive sequence data sets that require new statistical methods. These data sets press the limits of Bayesian phylogenetics as the high-dimensional parameters that comprise a phylogenetic tree increase the already sizable computational burden of these techniques. This burden often results in partitioning the data set, for example, by gene, and inferring the evolutionary dynamics of each partition independently, a compromise that results in stratified analyses that depend only on data within a given partition. However, parameter estimates inferred from these stratified models are likely strongly correlated, considering they rely on data from a single data set. To overcome this shortfall, we exploit the existing Monte Carlo realizations from stratified Bayesian analyses to efficiently estimate a nonparametric hierarchical wavelet-based model and learn about the time-varying parameters of effective population size that reflect levels of genetic diversity across all partitions simultaneously. Our methods are applied to complete genome influenza A sequences that span 13 years. We find that broad peaks and trends, as opposed to seasonal spikes, in the effective population size history distinguish individual segments from the complete genome. We also address hypotheses regarding intersegment dynamics within a formal statistical framework that accounts for correlation between segment-specific parameters. PMID:22160768
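The hierarchical wavelet idea can be caricatured in a few lines: shrink each segment's wavelet coefficients toward the coefficients pooled across segments, so that broad trends shared by all partitions are reinforced while segment-specific noise is damped. The sketch below is a hypothetical stand-in, not the paper's estimator; the shared trend, noise level, wavelet family, and fixed shrinkage weight are all assumptions (the paper learns the pooling from the Monte Carlo output itself).

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
t = np.linspace(0.0, 13.0, 128)              # 13 years on a dyadic grid
shared = np.sin(2 * np.pi * t / 5.0)         # assumed broad shared trend
segments = [shared + 0.4 * rng.normal(size=t.size) for _ in range(8)]

# decompose each segment's log-Ne-like trajectory into wavelet coefficients
coeffs = [pywt.wavedec(s, "db4") for s in segments]
pooled = [np.mean([c[j] for c in coeffs], axis=0) for j in range(len(coeffs[0]))]

# hierarchical shrinkage: pull each segment 70% toward the pooled
# coefficients (the weight 0.7 is an arbitrary assumption)
w = 0.7
smoothed = [pywt.waverec([w * pooled[j] + (1.0 - w) * c[j] for j in range(len(c))], "db4")
            for c in coeffs]

print("rms error, raw     :", np.std(segments[0] - shared))
print("rms error, shrunk  :", np.std(smoothed[0] - shared))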
Montazerhodjat, Vahid; Chaudhuri, Shomesh E; Sargent, Daniel J; Lo, Andrew W
2017-09-14
Randomized clinical trials (RCTs) currently apply the same statistical threshold of alpha = 2.5% for controlling for false-positive results or type 1 error, regardless of the burden of disease or patient preferences. Is there an objective and systematic framework for designing RCTs that incorporates these considerations on a case-by-case basis? To apply Bayesian decision analysis (BDA) to cancer therapeutics to choose an alpha and sample size that minimize the potential harm to current and future patients under both null and alternative hypotheses. We used the National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) database and data from the 10 clinical trials of the Alliance for Clinical Trials in Oncology. The NCI SEER database was used because it is the most comprehensive cancer database in the United States. The Alliance trial data were used owing to their quality and breadth, and because of the expertise in these trials of one of us (D.J.S.). The NCI SEER and Alliance data have already been thoroughly vetted. Computations were replicated independently by 2 coauthors and reviewed by all coauthors. Our prior hypothesis was that an alpha of 2.5% would not minimize the overall expected harm to current and future patients for the most deadly cancers, and that a less conservative alpha may be necessary. Our primary study outcomes involve measuring the potential harm to patients under both null and alternative hypotheses using NCI and Alliance data, and then computing BDA-optimal type 1 error rates and sample sizes for oncology RCTs. We computed BDA-optimal parameters for the 23 most common cancer sites using NCI data, and for the 10 Alliance clinical trials. For RCTs involving therapies for cancers with short survival times, no existing treatments, and low prevalence, the BDA-optimal type 1 error rates were much higher than the traditional 2.5%. For cancers with longer survival times, existing treatments, and high prevalence, the corresponding BDA-optimal error rates were much lower, in some cases even lower than 2.5%. Bayesian decision analysis is a systematic, objective, transparent, and repeatable process for deciding the outcomes of RCTs that explicitly incorporates burden of disease and patient preferences.
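The core trade-off can be sketched in a few lines of Python (an illustration with assumed harms, prior, effect size, and sample size, not the authors' model): expected harm is a weighted sum of false approvals under the null and withheld benefits under the alternative, and the harm-minimizing alpha moves above or below 2.5% as the cost of missing an effective therapy rises or falls.

```python
import numpy as np
from scipy.stats import norm

n = 300                   # patients per arm (assumed)
effect = 0.25             # standardized benefit under H1 (assumed)
p_h1 = 0.5                # prior probability the therapy works (assumed)
harm_fp = 1.0             # per-patient harm of approving an ineffective drug
harm_fn = 3.0             # per-patient harm of rejecting an effective one;
                          # larger for deadly cancers with no alternatives

def expected_harm(alpha):
    z = norm.ppf(1.0 - alpha)                       # one-sided rejection threshold
    power = 1.0 - norm.cdf(z - effect * np.sqrt(n / 2.0))
    return ((1.0 - p_h1) * alpha * harm_fp          # false approval under H0
            + p_h1 * (1.0 - power) * harm_fn)       # missed benefit under H1

alphas = np.linspace(1e-4, 0.2, 2000)
best = alphas[np.argmin([expected_harm(a) for a in alphas])]
print(f"harm-minimizing one-sided alpha: {best:.3f} (vs the conventional 0.025)")
```

With these assumed numbers the optimum lands well above 2.5%, mirroring the paper's finding for deadly, option-poor cancers; raising harm_fp or the prevalence-weighted null weight pushes it back down.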
Drummond, Alexei J; Nicholls, Geoff K; Rodrigo, Allen G; Solomon, Wiremu
2002-01-01
Molecular sequences obtained at different sampling times from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here, we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences. PMID:12136032
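A drastically simplified toy of the coalescent component of such an analysis is sketched below (assumed: a known genealogy reduced to its inter-coalescent waiting times, a flat prior on the scaled population size theta, and a random-walk Metropolis sampler); the paper's method additionally integrates over genealogies, sequence data, and sampling times.

```python
import numpy as np

rng = np.random.default_rng(42)
n, theta_true = 20, 5.0
k = np.arange(n, 1, -1)                      # lineage counts n, n-1, ..., 2
pairs = k * (k - 1) / 2.0                    # coalescence-rate multipliers
waits = rng.exponential(theta_true / pairs)  # simulated waiting times

def loglik(theta):
    # while k lineages remain, the waiting time ~ Exponential(C(k,2)/theta)
    if theta <= 0.0:
        return -np.inf
    return np.sum(np.log(pairs / theta) - pairs * waits / theta)

# random-walk Metropolis under a flat prior on theta > 0
theta, chain = 1.0, []
for _ in range(20000):
    prop = theta + rng.normal(0.0, 0.5)
    if np.log(rng.uniform()) < loglik(prop) - loglik(theta):
        theta = prop
    chain.append(theta)
chain = np.array(chain[5000:])               # drop burn-in
print(f"posterior mean theta = {chain.mean():.2f} (truth {theta_true})")
```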
NASA Astrophysics Data System (ADS)
Leube, Philipp; Geiges, Andreas; Nowak, Wolfgang
2010-05-01
Incorporating hydrogeological data, such as head and tracer data, into stochastic models of subsurface flow and transport helps to reduce prediction uncertainty. Given the limited financial resources available for a data acquisition campaign, information needs toward the prediction goal should be satisfied in an efficient and task-specific manner. To find the best among a set of candidate designs, an objective function is commonly evaluated that measures the expected impact of the data on prediction confidence, prior to their collection. An appropriate approach to this task should be stochastically rigorous, master non-linear dependencies between data, parameters and model predictions, and allow for a wide variety of data types. Existing methods fail to fulfill all of these requirements simultaneously. For this reason, we introduce a new method, denoted CLUE (Cross-bred Likelihood Uncertainty Estimator), that derives the essential distributions and measures of data utility within a generalized, flexible and accurate framework. The method makes use of Bayesian GLUE (Generalized Likelihood Uncertainty Estimator) and extends it to an optimal design method by marginalizing over the yet-unknown data values. Operating in a purely Bayesian Monte-Carlo framework, CLUE is a strictly formal information-processing scheme free of linearizations. It provides full flexibility with respect to the type of measurements (linear, non-linear, direct, indirect) and accounts for almost arbitrary sources of uncertainty (e.g., heterogeneity, geostatistical assumptions, boundary conditions, model concepts) via stochastic simulation and Bayesian model averaging. This helps to minimize the strength and impact of subjective prior assumptions that would be hard to defend before data collection. Our study focuses on evaluating two different uncertainty measures: (i) the expected conditional variance and (ii) the expected relative entropy of a given prediction goal. The applicability and advantages are shown in a synthetic example in which a contaminant source poses a threat to a drinking-water well in an aquifer, with uncertainty in the geostatistical parameters, boundary conditions and hydraulic gradient. The two measures evaluate the sensitivity of (1) the overall prediction confidence and (2) the probability of exceeding a legal regulatory threshold value to the choice of sampling locations.
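The preposterior logic behind such data-worth estimates can be sketched as follows (a schematic toy with an assumed lognormal conductivity prior, an assumed head-like observable, and GLUE-style likelihood weighting; not the CLUE implementation): each sampled prior realization in turn plays the role of truth, generates a synthetic observation, and the likelihood-weighted prediction variance is averaged over these yet-unknown data values.

```python
import numpy as np

rng = np.random.default_rng(7)
M = 5000
K = rng.lognormal(0.0, 1.0, size=M)           # prior ensemble of conductivities
pred = 1.0 / K                                # prediction goal, e.g. arrival time
head = np.log(K) + 0.1 * rng.normal(size=M)   # what a head sensor would "see"
sigma_obs = 0.3                               # measurement-error sd (assumed)

def expected_conditional_variance(sim_data, n_outer=200):
    # marginalize over the yet-unknown data value: each chosen prior
    # realization acts as truth and generates a synthetic observation
    ecv = 0.0
    for i in rng.choice(M, size=n_outer, replace=False):
        obs = sim_data[i] + sigma_obs * rng.normal()
        w = np.exp(-0.5 * ((sim_data - obs) / sigma_obs) ** 2)  # GLUE-style weights
        w /= w.sum()
        mu = np.sum(w * pred)
        ecv += np.sum(w * (pred - mu) ** 2)
    return ecv / n_outer

print(f"prior prediction variance     : {pred.var():.3f}")
print(f"expected conditional variance : {expected_conditional_variance(head):.3f}")
```

Ranking candidate measurement types or locations by this expected conditional variance, before any data exist, is the design criterion (i) described above.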
NASA's Human Mission to a Near-Earth Asteroid: Landing on a Moving Target
NASA Technical Reports Server (NTRS)
Smith, Jeffrey H.; Lincoln, William P.; Weisbin, Charles R.
2011-01-01
This paper describes a Bayesian approach for comparing the productivity and cost-risk tradeoffs of sending versus not sending one or more robotic surveyor missions prior to a human mission to land on an asteroid. The expected value of sample information based on productivity, combined with parametric variation of the prior probability that an asteroid would be found suitable for landing, was used to assess the optimal number of spacecraft and asteroids to survey. The analysis supports the value of surveyor missions to asteroids and indicates that a single launch sending two spacecraft simultaneously to two independent asteroids appears optimal.
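A back-of-envelope version of this value-of-information calculation is sketched below (all payoffs, the prior, and the surveyor's error rates are assumed numbers, not values from the study): the expected value of sample information (EVSI) is the gain from deciding after a noisy survey over the best uninformed decision, and a surveyor mission is worthwhile when this gain exceeds its cost.

```python
p_suitable = 0.4          # prior that the asteroid is suitable (assumed)
sens, spec = 0.90, 0.85   # surveyor hit / correct-rejection rates (assumed)
v_success, v_failure = 100.0, -80.0   # payoffs of landing (assumed units)
v_abort = 0.0                         # payoff of not sending the crew

def best_value(p):
    # best decision at belief p: send the crew, or abort
    return max(p * v_success + (1.0 - p) * v_failure, v_abort)

# probability of a positive survey result and the posteriors it induces
p_pos = sens * p_suitable + (1.0 - spec) * (1.0 - p_suitable)
post_pos = sens * p_suitable / p_pos
post_neg = (1.0 - sens) * p_suitable / (1.0 - p_pos)

value_with_survey = p_pos * best_value(post_pos) + (1.0 - p_pos) * best_value(post_neg)
evsi = value_with_survey - best_value(p_suitable)
print(f"EVSI = {evsi:.1f}: send the surveyor if it costs less than this")
```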
Stochastic Optimal Prediction with Application to Averaged Euler Equations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bell, John; Chorin, Alexandre J.; Crutchfield, William
Optimal prediction (OP) methods compensate for a lack of resolution in the numerical solution of complex problems through the use of an invariant measure as a prior measure in the Bayesian sense. In first-order OP, unresolved information is approximated by its conditional expectation with respect to the invariant measure. In higher-order OP, unresolved information is approximated by a stochastic estimator, leading to a system of random or stochastic differential equations. We explain the ideas through a simple example, and then apply them to the solution of the averaged Euler equations in two space dimensions.
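A minimal sketch of the first-order idea, using an assumed linear stochastic system rather than the averaged Euler equations: sample the invariant measure by long simulation, form the conditional expectation E[y|x] of the unresolved variable given the resolved one, and close the equation for x with that estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
a, dt = 0.8, 1e-3

# full system (assumed): dx = (-x + a*y) dt, dy = (-y + a*x) dt + sqrt(2) dW;
# sample its Gaussian invariant measure by a long simulation
x = y = 0.0
xs, ys = [], []
for i in range(400000):
    x, y = (x + (-x + a * y) * dt,
            y + (-y + a * x) * dt + np.sqrt(2.0 * dt) * rng.normal())
    if i % 20 == 0:
        xs.append(x)
        ys.append(y)
xs, ys = np.array(xs), np.array(ys)

# Gaussian measure => the conditional expectation E[y | x] is linear in x
slope = np.cov(xs, ys)[0, 1] / xs.var()

# first-order OP: replace y by E[y | x], closing the equation for x alone
xr = 2.0
for _ in range(3000):
    xr += (-xr + a * slope * xr) * dt
print(f"E[y|x] ~ {slope:.2f} x;  x(t=3) under the OP closure = {xr:.3f}")
```

The closed equation tracks the ensemble-average behavior of the resolved variable; the scatter around that average is what the higher-order, stochastic-estimator variant of OP is designed to capture.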