Rediscovery of Good-Turing estimators via Bayesian nonparametrics.
Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye
2016-03-01
The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library. © 2015, The International Biometric Society.
A Bayesian Nonparametric Approach to Test Equating
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2009-01-01
A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are…
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
2016-05-31
and included explosives such as TATP, HMTD, RDX, RDX, ammonium nitrate , potassium perchlorate, potassium nitrate , sugar, and TNT. The approach...Distribution Unlimited UU UU UU UU 31-05-2016 15-Apr-2014 14-Jan-2015 Final Report: Technical Topic 3.2.2. d Bayesian and Non- parametric Statistics...of Papers published in non peer-reviewed journals: Final Report: Technical Topic 3.2.2. d Bayesian and Non-parametric Statistics: Integration of Neural
Bayesian non-parametric inference for stochastic epidemic models using Gaussian Processes.
Xu, Xiaoguang; Kypraios, Theodore; O'Neill, Philip D
2016-10-01
This paper considers novel Bayesian non-parametric methods for stochastic epidemic models. Many standard modeling and data analysis methods use underlying assumptions (e.g. concerning the rate at which new cases of disease will occur) which are rarely challenged or tested in practice. To relax these assumptions, we develop a Bayesian non-parametric approach using Gaussian Processes, specifically to estimate the infection process. The methods are illustrated with both simulated and real data sets, the former illustrating that the methods can recover the true infection process quite well in practice, and the latter illustrating that the methods can be successfully applied in different settings. © The Author 2016. Published by Oxford University Press.
A Bayesian nonparametric approach to dynamical noise reduction
NASA Astrophysics Data System (ADS)
Kaloudis, Konstantinos; Hatjispyros, Spyridon J.
2018-06-01
We propose a Bayesian nonparametric approach for the noise reduction of a given chaotic time series contaminated by dynamical noise, based on Markov Chain Monte Carlo methods. The underlying unknown noise process (possibly) exhibits heavy tailed behavior. We introduce the Dynamic Noise Reduction Replicator model with which we reconstruct the unknown dynamic equations and in parallel we replicate the dynamics under reduced noise level dynamical perturbations. The dynamic noise reduction procedure is demonstrated specifically in the case of polynomial maps. Simulations based on synthetic time series are presented.
Accurate Biomass Estimation via Bayesian Adaptive Sampling
NASA Technical Reports Server (NTRS)
Wheeler, Kevin R.; Knuth, Kevin H.; Castle, Joseph P.; Lvov, Nikolay
2005-01-01
The following concepts were introduced: a) Bayesian adaptive sampling for solving biomass estimation; b) Characterization of MISR Rahman model parameters conditioned upon MODIS landcover. c) Rigorous non-parametric Bayesian approach to analytic mixture model determination. d) Unique U.S. asset for science product validation and verification.
Ponciano, José Miguel
2017-11-22
Using a nonparametric Bayesian approach Palacios and Minin (2013) dramatically improved the accuracy, precision of Bayesian inference of population size trajectories from gene genealogies. These authors proposed an extension of a Gaussian Process (GP) nonparametric inferential method for the intensity function of non-homogeneous Poisson processes. They found that not only the statistical properties of the estimators were improved with their method, but also, that key aspects of the demographic histories were recovered. The authors' work represents the first Bayesian nonparametric solution to this inferential problem because they specify a convenient prior belief without a particular functional form on the population trajectory. Their approach works so well and provides such a profound understanding of the biological process, that the question arises as to how truly "biology-free" their approach really is. Using well-known concepts of stochastic population dynamics, here I demonstrate that in fact, Palacios and Minin's GP model can be cast as a parametric population growth model with density dependence and environmental stochasticity. Making this link between population genetics and stochastic population dynamics modeling provides novel insights into eliciting biologically meaningful priors for the trajectory of the effective population size. The results presented here also bring novel understanding of GP as models for the evolution of a trait. Thus, the ecological principles foundation of Palacios and Minin (2013)'s prior adds to the conceptual and scientific value of these authors' inferential approach. I conclude this note by listing a series of insights brought about by this connection with Ecology. Copyright © 2017 The Author. Published by Elsevier Inc. All rights reserved.
A Bayesian nonparametric method for prediction in EST analysis
Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor
2007-01-01
Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445
Kharroubi, Samer A; Brazier, John E; McGhee, Sarah
2013-01-01
This article reports on the findings from applying a recently described approach to modeling health state valuation data and the impact of the respondent characteristics on health state valuations. The approach applies a nonparametric model to estimate a Bayesian six-dimensional health state short form (derived from short-form 36 health survey) health state valuation algorithm. A sample of 197 states defined by the six-dimensional health state short form (derived from short-form 36 health survey)has been valued by a representative sample of the Hong Kong general population by using standard gamble. The article reports the application of the nonparametric model and compares it to the original model estimated by using a conventional parametric random effects model. The two models are compared theoretically and in terms of empirical performance. Advantages of the nonparametric model are that it can be used to predict scores in populations with different distributions of characteristics than observed in the survey sample and that it allows for the impact of respondent characteristics to vary by health state (while ensuring that full health passes through unity). The results suggest an important age effect with sex, having some effect, but the remaining covariates having no discernible effect. The nonparametric Bayesian model is argued to be more theoretically appropriate than previously used parametric models. Furthermore, it is more flexible to take into account the impact of covariates. Copyright © 2013, International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc.
Nonparametric Bayesian predictive distributions for future order statistics
Richard A. Johnson; James W. Evans; David W. Green
1999-01-01
We derive the predictive distribution for a specified order statistic, determined from a future random sample, under a Dirichlet process prior. Two variants of the approach are treated and some limiting cases studied. A practical application to monitoring the strength of lumber is discussed including choices of prior expectation and comparisons made to a Bayesian...
Nonparametric Bayesian Modeling for Automated Database Schema Matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferragut, Erik M; Laska, Jason A
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
A Bayesian Nonparametric Approach to Image Super-Resolution.
Polatkan, Gungor; Zhou, Mingyuan; Carin, Lawrence; Blei, David; Daubechies, Ingrid
2015-02-01
Super-resolution methods form high-resolution images from low-resolution images. In this paper, we develop a new Bayesian nonparametric model for super-resolution. Our method uses a beta-Bernoulli process to learn a set of recurring visual patterns, called dictionary elements, from the data. Because it is nonparametric, the number of elements found is also determined from the data. We test the results on both benchmark and natural images, comparing with several other models from the research literature. We perform large-scale human evaluation experiments to assess the visual quality of the results. In a first implementation, we use Gibbs sampling to approximate the posterior. However, this algorithm is not feasible for large-scale data. To circumvent this, we then develop an online variational Bayes (VB) algorithm. This algorithm finds high quality dictionaries in a fraction of the time needed by the Gibbs sampler.
Bayesian estimation of the discrete coefficient of determination.
Chen, Ting; Braga-Neto, Ulisses M
2016-12-01
The discrete coefficient of determination (CoD) measures the nonlinear interaction between discrete predictor and target variables and has had far-reaching applications in Genomic Signal Processing. Previous work has addressed the inference of the discrete CoD using classical parametric and nonparametric approaches. In this paper, we introduce a Bayesian framework for the inference of the discrete CoD. We derive analytically the optimal minimum mean-square error (MMSE) CoD estimator, as well as a CoD estimator based on the Optimal Bayesian Predictor (OBP). For the latter estimator, exact expressions for its bias, variance, and root-mean-square (RMS) are given. The accuracy of both Bayesian CoD estimators with non-informative and informative priors, under fixed or random parameters, is studied via analytical and numerical approaches. We also demonstrate the application of the proposed Bayesian approach in the inference of gene regulatory networks, using gene-expression data from a previously published study on metastatic melanoma.
Bayesian dynamic mediation analysis.
Huang, Jing; Yuan, Ying
2017-12-01
Most existing methods for mediation analysis assume that mediation is a stationary, time-invariant process, which overlooks the inherently dynamic nature of many human psychological processes and behavioral activities. In this article, we consider mediation as a dynamic process that continuously changes over time. We propose Bayesian multilevel time-varying coefficient models to describe and estimate such dynamic mediation effects. By taking the nonparametric penalized spline approach, the proposed method is flexible and able to accommodate any shape of the relationship between time and mediation effects. Simulation studies show that the proposed method works well and faithfully reflects the true nature of the mediation process. By modeling mediation effect nonparametrically as a continuous function of time, our method provides a valuable tool to help researchers obtain a more complete understanding of the dynamic nature of the mediation process underlying psychological and behavioral phenomena. We also briefly discuss an alternative approach of using dynamic autoregressive mediation model to estimate the dynamic mediation effect. The computer code is provided to implement the proposed Bayesian dynamic mediation analysis. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Spectral decompositions of multiple time series: a Bayesian non-parametric approach.
Macaro, Christian; Prado, Raquel
2014-01-01
We consider spectral decompositions of multiple time series that arise in studies where the interest lies in assessing the influence of two or more factors. We write the spectral density of each time series as a sum of the spectral densities associated to the different levels of the factors. We then use Whittle's approximation to the likelihood function and follow a Bayesian non-parametric approach to obtain posterior inference on the spectral densities based on Bernstein-Dirichlet prior distributions. The prior is strategically important as it carries identifiability conditions for the models and allows us to quantify our degree of confidence in such conditions. A Markov chain Monte Carlo (MCMC) algorithm for posterior inference within this class of frequency-domain models is presented.We illustrate the approach by analyzing simulated and real data via spectral one-way and two-way models. In particular, we present an analysis of functional magnetic resonance imaging (fMRI) brain responses measured in individuals who participated in a designed experiment to study pain perception in humans.
Bayesian Unimodal Density Regression for Causal Inference
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2011-01-01
Karabatsos and Walker (2011) introduced a new Bayesian nonparametric (BNP) regression model. Through analyses of real and simulated data, they showed that the BNP regression model outperforms other parametric and nonparametric regression models of common use, in terms of predictive accuracy of the outcome (dependent) variable. The other,…
NASA Astrophysics Data System (ADS)
Agapiou, Sergios; Burger, Martin; Dashti, Masoumeh; Helin, Tapio
2018-04-01
We consider the inverse problem of recovering an unknown functional parameter u in a separable Banach space, from a noisy observation vector y of its image through a known possibly non-linear map {{\\mathcal G}} . We adopt a Bayesian approach to the problem and consider Besov space priors (see Lassas et al (2009 Inverse Problems Imaging 3 87-122)), which are well-known for their edge-preserving and sparsity-promoting properties and have recently attracted wide attention especially in the medical imaging community. Our key result is to show that in this non-parametric setup the maximum a posteriori (MAP) estimates are characterized by the minimizers of a generalized Onsager-Machlup functional of the posterior. This is done independently for the so-called weak and strong MAP estimates, which as we show coincide in our context. In addition, we prove a form of weak consistency for the MAP estimators in the infinitely informative data limit. Our results are remarkable for two reasons: first, the prior distribution is non-Gaussian and does not meet the smoothness conditions required in previous research on non-parametric MAP estimates. Second, the result analytically justifies existing uses of the MAP estimate in finite but high dimensional discretizations of Bayesian inverse problems with the considered Besov priors.
A Bayesian Beta-Mixture Model for Nonparametric IRT (BBM-IRT)
ERIC Educational Resources Information Center
Arenson, Ethan A.; Karabatsos, George
2017-01-01
Item response models typically assume that the item characteristic (step) curves follow a logistic or normal cumulative distribution function, which are strictly monotone functions of person test ability. Such assumptions can be overly-restrictive for real item response data. We propose a simple and more flexible Bayesian nonparametric IRT model…
Advanced Imaging Methods for Long-Baseline Optical Interferometry
NASA Astrophysics Data System (ADS)
Le Besnerais, G.; Lacour, S.; Mugnier, L. M.; Thiebaut, E.; Perrin, G.; Meimon, S.
2008-11-01
We address the data processing methods needed for imaging with a long baseline optical interferometer. We first describe parametric reconstruction approaches and adopt a general formulation of nonparametric image reconstruction as the solution of a constrained optimization problem. Within this framework, we present two recent reconstruction methods, Mira and Wisard, representative of the two generic approaches for dealing with the missing phase information. Mira is based on an implicit approach and a direct optimization of a Bayesian criterion while Wisard adopts a self-calibration approach and an alternate minimization scheme inspired from radio-astronomy. Both methods can handle various regularization criteria. We review commonly used regularization terms and introduce an original quadratic regularization called ldquosoft support constraintrdquo that favors the object compactness. It yields images of quality comparable to nonquadratic regularizations on the synthetic data we have processed. We then perform image reconstructions, both parametric and nonparametric, on astronomical data from the IOTA interferometer, and discuss the respective roles of parametric and nonparametric approaches for optical interferometric imaging.
Analyzing Single-Molecule Time Series via Nonparametric Bayesian Inference
Hines, Keegan E.; Bankston, John R.; Aldrich, Richard W.
2015-01-01
The ability to measure the properties of proteins at the single-molecule level offers an unparalleled glimpse into biological systems at the molecular scale. The interpretation of single-molecule time series has often been rooted in statistical mechanics and the theory of Markov processes. While existing analysis methods have been useful, they are not without significant limitations including problems of model selection and parameter nonidentifiability. To address these challenges, we introduce the use of nonparametric Bayesian inference for the analysis of single-molecule time series. These methods provide a flexible way to extract structure from data instead of assuming models beforehand. We demonstrate these methods with applications to several diverse settings in single-molecule biophysics. This approach provides a well-constrained and rigorously grounded method for determining the number of biophysical states underlying single-molecule data. PMID:25650922
Discriminative Bayesian Dictionary Learning for Classification.
Akhtar, Naveed; Shafait, Faisal; Mian, Ajmal
2016-12-01
We propose a Bayesian approach to learn discriminative dictionaries for sparse representation of data. The proposed approach infers probability distributions over the atoms of a discriminative dictionary using a finite approximation of Beta Process. It also computes sets of Bernoulli distributions that associate class labels to the learned dictionary atoms. This association signifies the selection probabilities of the dictionary atoms in the expansion of class-specific data. Furthermore, the non-parametric character of the proposed approach allows it to infer the correct size of the dictionary. We exploit the aforementioned Bernoulli distributions in separately learning a linear classifier. The classifier uses the same hierarchical Bayesian model as the dictionary, which we present along the analytical inference solution for Gibbs sampling. For classification, a test instance is first sparsely encoded over the learned dictionary and the codes are fed to the classifier. We performed experiments for face and action recognition; and object and scene-category classification using five public datasets and compared the results with state-of-the-art discriminative sparse representation approaches. Experiments show that the proposed Bayesian approach consistently outperforms the existing approaches.
ERIC Educational Resources Information Center
Finch, Holmes; Edwards, Julianne M.
2016-01-01
Standard approaches for estimating item response theory (IRT) model parameters generally work under the assumption that the latent trait being measured by a set of items follows the normal distribution. Estimation of IRT parameters in the presence of nonnormal latent traits has been shown to generate biased person and item parameter estimates. A…
Why preferring parametric forecasting to nonparametric methods?
Jabot, Franck
2015-05-07
A recent series of papers by Charles T. Perretti and collaborators have shown that nonparametric forecasting methods can outperform parametric methods in noisy nonlinear systems. Such a situation can arise because of two main reasons: the instability of parametric inference procedures in chaotic systems which can lead to biased parameter estimates, and the discrepancy between the real system dynamics and the modeled one, a problem that Perretti and collaborators call "the true model myth". Should ecologists go on using the demanding parametric machinery when trying to forecast the dynamics of complex ecosystems? Or should they rely on the elegant nonparametric approach that appears so promising? It will be here argued that ecological forecasting based on parametric models presents two key comparative advantages over nonparametric approaches. First, the likelihood of parametric forecasting failure can be diagnosed thanks to simple Bayesian model checking procedures. Second, when parametric forecasting is diagnosed to be reliable, forecasting uncertainty can be estimated on virtual data generated with the fitted to data parametric model. In contrast, nonparametric techniques provide forecasts with unknown reliability. This argumentation is illustrated with the simple theta-logistic model that was previously used by Perretti and collaborators to make their point. It should convince ecologists to stick to standard parametric approaches, until methods have been developed to assess the reliability of nonparametric forecasting. Copyright © 2015 Elsevier Ltd. All rights reserved.
Wilcox, Thomas P; Zwickl, Derrick J; Heath, Tracy A; Hillis, David M
2002-11-01
Four New World genera of dwarf boas (Exiliboa, Trachyboa, Tropidophis, and Ungaliophis) have been placed by many systematists in a single group (traditionally called Tropidophiidae). However, the monophyly of this group has been questioned in several studies. Moreover, the overall relationships among basal snake lineages, including the placement of the dwarf boas, are poorly understood. We obtained mtDNA sequence data for 12S, 16S, and intervening tRNA-val genes from 23 species of snakes representing most major snake lineages, including all four genera of New World dwarf boas. We then examined the phylogenetic position of these species by estimating the phylogeny of the basal snakes. Our phylogenetic analysis suggests that New World dwarf boas are not monophyletic. Instead, we find Exiliboa and Ungaliophis to be most closely related to sand boas (Erycinae), boas (Boinae), and advanced snakes (Caenophidea), whereas Tropidophis and Trachyboa form an independent clade that separated relatively early in snake radiation. Our estimate of snake phylogeny differs significantly in other ways from some previous estimates of snake phylogeny. For instance, pythons do not cluster with boas and sand boas, but instead show a strong relationship with Loxocemus and Xenopeltis. Additionally, uropeltids cluster strongly with Cylindrophis, and together are embedded in what has previously been considered the macrostomatan radiation. These relationships are supported by both bootstrapping (parametric and nonparametric approaches) and Bayesian analysis, although Bayesian support values are consistently higher than those obtained from nonparametric bootstrapping. Simulations show that Bayesian support values represent much better estimates of phylogenetic accuracy than do nonparametric bootstrap support values, at least under the conditions of our study. Copyright 2002 Elsevier Science (USA)
Nonparametric Bayesian models through probit stick-breaking processes
Rodríguez, Abel; Dunson, David B.
2013-01-01
We describe a novel class of Bayesian nonparametric priors based on stick-breaking constructions where the weights of the process are constructed as probit transformations of normal random variables. We show that these priors are extremely flexible, allowing us to generate a great variety of models while preserving computational simplicity. Particular emphasis is placed on the construction of rich temporal and spatial processes, which are applied to two problems in finance and ecology. PMID:24358072
Nonparametric Bayesian models through probit stick-breaking processes.
Rodríguez, Abel; Dunson, David B
2011-03-01
We describe a novel class of Bayesian nonparametric priors based on stick-breaking constructions where the weights of the process are constructed as probit transformations of normal random variables. We show that these priors are extremely flexible, allowing us to generate a great variety of models while preserving computational simplicity. Particular emphasis is placed on the construction of rich temporal and spatial processes, which are applied to two problems in finance and ecology.
Nonparametric Bayesian inference of the microcanonical stochastic block model
NASA Astrophysics Data System (ADS)
Peixoto, Tiago P.
2017-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.
Marginally specified priors for non-parametric Bayesian estimation
Kessler, David C.; Hoff, Peter D.; Dunson, David B.
2014-01-01
Summary Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a parameter but will have real information about functionals of the parameter, such as the population mean or variance. The paper proposes a new framework for non-parametric Bayes inference in which the prior distribution for a possibly infinite dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a non-parametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard non-parametric prior distributions in common use and inherit the large support of the standard priors on which they are based. Additionally, posterior approximations under these informative priors can generally be made via minor adjustments to existing Markov chain approximation algorithms for standard non-parametric prior distributions. We illustrate the use of such priors in the context of multivariate density estimation using Dirichlet process mixture models, and in the modelling of high dimensional sparse contingency tables. PMID:25663813
Bayesian decoding using unsorted spikes in the rat hippocampus
Layton, Stuart P.; Chen, Zhe; Wilson, Matthew A.
2013-01-01
A fundamental task in neuroscience is to understand how neural ensembles represent information. Population decoding is a useful tool to extract information from neuronal populations based on the ensemble spiking activity. We propose a novel Bayesian decoding paradigm to decode unsorted spikes in the rat hippocampus. Our approach uses a direct mapping between spike waveform features and covariates of interest and avoids accumulation of spike sorting errors. Our decoding paradigm is nonparametric, encoding model-free for representing stimuli, and extracts information from all available spikes and their waveform features. We apply the proposed Bayesian decoding algorithm to a position reconstruction task for freely behaving rats based on tetrode recordings of rat hippocampal neuronal activity. Our detailed decoding analyses demonstrate that our approach is efficient and better utilizes the available information in the nonsortable hash than the standard sorting-based decoding algorithm. Our approach can be adapted to an online encoding/decoding framework for applications that require real-time decoding, such as brain-machine interfaces. PMID:24089403
Bayesian hierarchical functional data analysis via contaminated informative priors.
Scarpa, Bruno; Dunson, David B
2009-09-01
A variety of flexible approaches have been proposed for functional data analysis, allowing both the mean curve and the distribution about the mean to be unknown. Such methods are most useful when there is limited prior information. Motivated by applications to modeling of temperature curves in the menstrual cycle, this article proposes a flexible approach for incorporating prior information in semiparametric Bayesian analyses of hierarchical functional data. The proposed approach is based on specifying the distribution of functions as a mixture of a parametric hierarchical model and a nonparametric contamination. The parametric component is chosen based on prior knowledge, while the contamination is characterized as a functional Dirichlet process. In the motivating application, the contamination component allows unanticipated curve shapes in unhealthy menstrual cycles. Methods are developed for posterior computation, and the approach is applied to data from a European fecundability study.
Nonparametric Bayesian clustering to detect bipolar methylated genomic loci.
Wu, Xiaowei; Sun, Ming-An; Zhu, Hongxiao; Xie, Hehuang
2015-01-16
With recent development in sequencing technology, a large number of genome-wide DNA methylation studies have generated massive amounts of bisulfite sequencing data. The analysis of DNA methylation patterns helps researchers understand epigenetic regulatory mechanisms. Highly variable methylation patterns reflect stochastic fluctuations in DNA methylation, whereas well-structured methylation patterns imply deterministic methylation events. Among these methylation patterns, bipolar patterns are important as they may originate from allele-specific methylation (ASM) or cell-specific methylation (CSM). Utilizing nonparametric Bayesian clustering followed by hypothesis testing, we have developed a novel statistical approach to identify bipolar methylated genomic regions in bisulfite sequencing data. Simulation studies demonstrate that the proposed method achieves good performance in terms of specificity and sensitivity. We used the method to analyze data from mouse brain and human blood methylomes. The bipolar methylated segments detected are found highly consistent with the differentially methylated regions identified by using purified cell subsets. Bipolar DNA methylation often indicates epigenetic heterogeneity caused by ASM or CSM. With allele-specific events filtered out or appropriately taken into account, our proposed approach sheds light on the identification of cell-specific genes/pathways under strong epigenetic control in a heterogeneous cell population.
Network structure exploration in networks with node attributes
NASA Astrophysics Data System (ADS)
Chen, Yi; Wang, Xiaolong; Bu, Junzhao; Tang, Buzhou; Xiang, Xin
2016-05-01
Complex networks provide a powerful way to represent complex systems and have been widely studied during the past several years. One of the most important tasks of network analysis is to detect structures (also called structural regularities) embedded in networks by determining group number and group partition. Most of network structure exploration models only consider network links. However, in real world networks, nodes may have attributes that are useful for network structure exploration. In this paper, we propose a novel Bayesian nonparametric (BNP) model to explore structural regularities in networks with node attributes, called Bayesian nonparametric attribute (BNPA) model. This model does not only take full advantage of both links between nodes and node attributes for group partition via shared hidden variables, but also determine group number automatically via the Bayesian nonparametric theory. Experiments conducted on a number of real and synthetic networks show that our BNPA model is able to automatically explore structural regularities in networks with node attributes and is competitive with other state-of-the-art models.
A Comparison of Japan and U.K. SF-6D Health-State Valuations Using a Non-Parametric Bayesian Method.
Kharroubi, Samer A
2015-08-01
There is interest in the extent to which valuations of health may differ between different countries and cultures, but few studies have compared preference values of health states obtained in different countries. We sought to estimate and compare two directly elicited valuations for SF-6D health states between the Japan and U.K. general adult populations using Bayesian methods. We analysed data from two SF-6D valuation studies where, using similar standard gamble protocols, values for 241 and 249 states were elicited from representative samples of the Japan and U.K. general adult populations, respectively. We estimate a function applicable across both countries that explicitly accounts for the differences between them, and is estimated using data from both countries. The results suggest that differences in SF-6D health-state valuations between the Japan and U.K. general populations are potentially important. The magnitude of these country-specific differences in health-state valuation depended, however, in a complex way on the levels of individual dimensions. The new Bayesian non-parametric method is a powerful approach for analysing data from multiple nationalities or ethnic groups, to understand the differences between them and potentially to estimate the underlying utility functions more efficiently.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Bayesian Nonparametric Inference – Why and How
Müller, Peter; Mitra, Riten
2013-01-01
We review inference under models with nonparametric Bayesian (BNP) priors. The discussion follows a set of examples for some common inference problems. The examples are chosen to highlight problems that are challenging for standard parametric inference. We discuss inference for density estimation, clustering, regression and for mixed effects models with random effects distributions. While we focus on arguing for the need for the flexibility of BNP models, we also review some of the more commonly used BNP models, thus hopefully answering a bit of both questions, why and how to use BNP. PMID:24368932
Bhattacharya, Abhishek; Dunson, David B.
2012-01-01
This article considers a broad class of kernel mixture density models on compact metric spaces and manifolds. Following a Bayesian approach with a nonparametric prior on the location mixing distribution, sufficient conditions are obtained on the kernel, prior and the underlying space for strong posterior consistency at any continuous density. The prior is also allowed to depend on the sample size n and sufficient conditions are obtained for weak and strong consistency. These conditions are verified on compact Euclidean spaces using multivariate Gaussian kernels, on the hypersphere using a von Mises-Fisher kernel and on the planar shape space using complex Watson kernels. PMID:22984295
Palacios, Julia A; Minin, Vladimir N
2013-03-01
Changes in population size influence genetic diversity of the population and, as a result, leave a signature of these changes in individual genomes in the population. We are interested in the inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as a glue between the population demographic history and genomic sequences. It turns out that only the times of genealogical lineage coalescences contain information about population size dynamics. Viewing these coalescent times as a point process, estimating population size trajectories is equivalent to estimating a conditional intensity of this point process. Therefore, our inverse problem is similar to estimating an inhomogeneous Poisson process intensity function. We demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics under the coalescent. We compare our Gaussian process (GP) approach to one of the state-of-the-art Gaussian Markov random field (GMRF) methods for estimating population trajectories. Using simulated data, we demonstrate that our method has better accuracy and precision. Next, we analyze two genealogies reconstructed from real sequences of hepatitis C and human Influenza A viruses. In both cases, we recover more believed aspects of the viral demographic histories than the GMRF approach. We also find that our GP method produces more reasonable uncertainty estimates than the GMRF method. Copyright © 2013, The International Biometric Society.
Kharroubi, Samer A; Brazier, John E; McGhee, Sarah
2014-06-01
There is interest in the extent to which valuations of health may differ between different countries and cultures, but few studies have compared preference values of health states obtained in different countries. The present study applies a nonparametric model to estimate and compare two HK and UK standard gamble values for six-dimensional health state short form (derived from short-form 36 health survey) (SF-6D) health states using Bayesian methods. The data set is the HK and UK SF-6D valuation studies in which two samples of 197 and 249 states defined by the SF-6D were valued by representative samples of the HK and UK general populations, respectively, both using the standard gamble technique. We estimated a function applicable across both countries that explicitly accounts for the differences between them, and is estimated using the data from both countries. The results suggest that differences in SF-6D health state valuations between the UK and HK general populations are potentially important. In particular, the valuations of Hong Kong were meaningfully higher than those of the United Kingdom for most of the selected SF-6D health states. The magnitude of these country-specific differences in health state valuation depended, however, in a complex way on the levels of individual dimensions. The new Bayesian nonparametric method is a powerful approach for analyzing data from multiple nationalities or ethnic groups to understand the differences between them and potentially to estimate the underlying utility functions more efficiently. Copyright © 2014 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Bayesian nonparametric dictionary learning for compressed sensing MRI.
Huang, Yue; Paisley, John; Lin, Qin; Ding, Xinghao; Fu, Xueyang; Zhang, Xiao-Ping
2014-12-01
We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRIs) from highly undersampled k -space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov chain Monte Carlo for the Bayesian model, and use the alternating direction method of multipliers for efficiently performing total variation minimization. We present empirical results on several MRI, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.
Nonparametric analysis of Minnesota spruce and aspen tree data and LANDSAT data
NASA Technical Reports Server (NTRS)
Scott, D. W.; Jee, R.
1984-01-01
The application of nonparametric methods in data-intensive problems faced by NASA is described. The theoretical development of efficient multivariate density estimators and the novel use of color graphics workstations are reviewed. The use of nonparametric density estimates for data representation and for Bayesian classification are described and illustrated. Progress in building a data analysis system in a workstation environment is reviewed and preliminary runs presented.
Chen, Bo; Chen, Minhua; Paisley, John; Zaas, Aimee; Woods, Christopher; Ginsburg, Geoffrey S; Hero, Alfred; Lucas, Joseph; Dunson, David; Carin, Lawrence
2010-11-09
Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of per-forming nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.
Nonparametric Bayesian Context Learning for Buried Threat Detection
2012-01-01
scans of field-collected GPR data collected on dirt (top) and gravel (bottom) lanes at an Eastern US test site. . . . . . . . . . 47 2.11 Plot of ...partition the data into M components, they differ in the information used to partition the data. First, approaches that as- sume independence of observations...and the advantages and disadvantages of using each are discussed. Furthermore, the merits of incorporating spatial information are also highlighted
Comparing nonparametric Bayesian tree priors for clonal reconstruction of tumors.
Deshwar, Amit G; Vembu, Shankar; Morris, Quaid
2015-01-01
Statistical machine learning methods, especially nonparametric Bayesian methods, have become increasingly popular to infer clonal population structure of tumors. Here we describe the treeCRP, an extension of the Chinese restaurant process (CRP), a popular construction used in nonparametric mixture models, to infer the phylogeny and genotype of major subclonal lineages represented in the population of cancer cells. We also propose new split-merge updates tailored to the subclonal reconstruction problem that improve the mixing time of Markov chains. In comparisons with the tree-structured stick breaking prior used in PhyloSub, we demonstrate superior mixing and running time using the treeCRP with our new split-merge procedures. We also show that given the same number of samples, TSSB and treeCRP have similar ability to recover the subclonal structure of a tumor…
Nonparametric weighted stochastic block models
NASA Astrophysics Data System (ADS)
Peixoto, Tiago P.
2018-01-01
We present a Bayesian formulation of weighted stochastic block models that can be used to infer the large-scale modular structure of weighted networks, including their hierarchical organization. Our method is nonparametric, and thus does not require the prior knowledge of the number of groups or other dimensions of the model, which are instead inferred from data. We give a comprehensive treatment of different kinds of edge weights (i.e., continuous or discrete, signed or unsigned, bounded or unbounded), as well as arbitrary weight transformations, and describe an unsupervised model selection approach to choose the best network description. We illustrate the application of our method to a variety of empirical weighted networks, such as global migrations, voting patterns in congress, and neural connections in the human brain.
Akhtar, Naveed; Mian, Ajmal
2017-10-03
We present a principled approach to learn a discriminative dictionary along a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size--the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which, we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.
Tau-REx: A new look at the retrieval of exoplanetary atmospheres
NASA Astrophysics Data System (ADS)
Waldmann, Ingo
2014-11-01
The field of exoplanetary spectroscopy is as fast moving as it is new. With an increasing amount of space and ground based instruments obtaining data on a large set of extrasolar planets we are indeed entering the era of exoplanetary characterisation. Permanently at the edge of instrument feasibility, it is as important as it is difficult to find the most optimal and objective methodologies to analysing and interpreting current data. This is particularly true for smaller and fainter Earth and Super-Earth type planets.For low to mid signal to noise (SNR) observations, we are prone to two sources of biases: 1) Prior selection in the data reduction and analysis; 2) Prior constraints on the spectral retrieval. In Waldmann et al. (2013), Morello et al. (2014) and Waldmann (2012, 2014) we have shown a prior-free approach to data analysis based on non-parametric machine learning techniques. Following these approaches we will present a new take on the spectral retrieval of extrasolar planets. Tau-REx (tau-retrieval of exoplanets) is a new line-by-line, atmospheric retrieval framework. In the past the decision on what opacity sources go into an atmospheric model were usually user defined. Manual input can lead to model biases and poor convergence of the atmospheric model to the data. In Tau-REx we have set out to solve this. Through custom built pattern recognition software, Tau-REx is able to rapidly identify the most likely atmospheric opacities from a large number of possible absorbers/emitters (ExoMol or HiTran data bases) and non-parametrically constrain the prior space for the Bayesian retrieval. Unlike other (MCMC based) techniques, Tau-REx is able to fully integrate high-dimensional log-likelihood spaces and to calculate the full Bayesian Evidence of the atmospheric models. We achieve this through a combination of Nested Sampling and a high degree of code parallelisation. This allows for an exact and unbiased Bayesian model selection and a fully mapping of potential model-data degeneracies. Together with non-parametric data de-trending of exoplanetary spectra, we can reach an un- precedented level of objectivity in our atmospheric characterisation of these foreign worlds.
Moscoso del Prado Martín, Fermín
2013-12-01
I introduce the Bayesian assessment of scaling (BAS), a simple but powerful Bayesian hypothesis contrast methodology that can be used to test hypotheses on the scaling regime exhibited by a sequence of behavioral data. Rather than comparing parametric models, as typically done in previous approaches, the BAS offers a direct, nonparametric way to test whether a time series exhibits fractal scaling. The BAS provides a simpler and faster test than do previous methods, and the code for making the required computations is provided. The method also enables testing of finely specified hypotheses on the scaling indices, something that was not possible with the previously available methods. I then present 4 simulation studies showing that the BAS methodology outperforms the other methods used in the psychological literature. I conclude with a discussion of methodological issues on fractal analyses in experimental psychology. PsycINFO Database Record (c) 2014 APA, all rights reserved.
On the Bayesian Nonparametric Generalization of IRT-Type Models
ERIC Educational Resources Information Center
San Martin, Ernesto; Jara, Alejandro; Rolin, Jean-Marie; Mouchart, Michel
2011-01-01
We study the identification and consistency of Bayesian semiparametric IRT-type models, where the uncertainty on the abilities' distribution is modeled using a prior distribution on the space of probability measures. We show that for the semiparametric Rasch Poisson counts model, simple restrictions ensure the identification of a general…
Fan, Yue; Wang, Xiao; Peng, Qinke
2017-01-01
Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.
Nonparametric Bayesian Segmentation of a Multivariate Inhomogeneous Space-Time Poisson Process.
Ding, Mingtao; He, Lihan; Dunson, David; Carin, Lawrence
2012-12-01
A nonparametric Bayesian model is proposed for segmenting time-evolving multivariate spatial point process data. An inhomogeneous Poisson process is assumed, with a logistic stick-breaking process (LSBP) used to encourage piecewise-constant spatial Poisson intensities. The LSBP explicitly favors spatially contiguous segments, and infers the number of segments based on the observed data. The temporal dynamics of the segmentation and of the Poisson intensities are modeled with exponential correlation in time, implemented in the form of a first-order autoregressive model for uniformly sampled discrete data, and via a Gaussian process with an exponential kernel for general temporal sampling. We consider and compare two different inference techniques: a Markov chain Monte Carlo sampler, which has relatively high computational complexity; and an approximate and efficient variational Bayesian analysis. The model is demonstrated with a simulated example and a real example of space-time crime events in Cincinnati, Ohio, USA.
Kharroubi, Samer A
2017-10-06
Valuations of health state descriptors such as EQ-5D or SF6D have been conducted in different countries. There is a scope to make use of the results in one country as informative priors to help with the analysis of a study in another, for this to enable better estimation to be obtained in the new country than analyzing its data separately. Data from 2 EQ-5D valuation studies were analyzed using the time trade-off technique, where values for 42 health states were devised from representative samples of the UK and US populations. A Bayesian non-parametric approach has been applied to predict the health utilities of the US population, where the UK results were used as informative priors in the model to improve their estimation. The findings showed that employing additional information from the UK data helped in the production of US utility estimates much more precisely than would have been possible using the US study data alone. It is very plausible that this method would serve useful in countries where the conduction of large evaluation studies is not very feasible.
2010-01-01
Background Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Results Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of per-forming nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Conclusions Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data. PMID:21062443
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
Kwak, Sehyun; Svensson, J; Brix, M; Ghim, Y-C
2016-02-01
A Bayesian model of the emission spectrum of the JET lithium beam has been developed to infer the intensity of the Li I (2p-2s) line radiation and associated uncertainties. The detected spectrum for each channel of the lithium beam emission spectroscopy system is here modelled by a single Li line modified by an instrumental function, Bremsstrahlung background, instrumental offset, and interference filter curve. Both the instrumental function and the interference filter curve are modelled with non-parametric Gaussian processes. All free parameters of the model, the intensities of the Li line, Bremsstrahlung background, and instrumental offset, are inferred using Bayesian probability theory with a Gaussian likelihood for photon statistics and electronic background noise. The prior distributions of the free parameters are chosen as Gaussians. Given these assumptions, the intensity of the Li line and corresponding uncertainties are analytically available using a Bayesian linear inversion technique. The proposed approach makes it possible to extract the intensity of Li line without doing a separate background subtraction through modulation of the Li beam.
ERIC Educational Resources Information Center
Si, Yajuan; Reiter, Jerome P.
2013-01-01
In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…
A Bayesian approach to model structural error and input variability in groundwater modeling
NASA Astrophysics Data System (ADS)
Xu, T.; Valocchi, A. J.; Lin, Y. F. F.; Liang, F.
2015-12-01
Effective water resource management typically relies on numerical models to analyze groundwater flow and solute transport processes. Model structural error (due to simplification and/or misrepresentation of the "true" environmental system) and input forcing variability (which commonly arises since some inputs are uncontrolled or estimated with high uncertainty) are ubiquitous in groundwater models. Calibration that overlooks errors in model structure and input data can lead to biased parameter estimates and compromised predictions. We present a fully Bayesian approach for a complete assessment of uncertainty for spatially distributed groundwater models. The approach explicitly recognizes stochastic input and uses data-driven error models based on nonparametric kernel methods to account for model structural error. We employ exploratory data analysis to assist in specifying informative prior for error models to improve identifiability. The inference is facilitated by an efficient sampling algorithm based on DREAM-ZS and a parameter subspace multiple-try strategy to reduce the required number of forward simulations of the groundwater model. We demonstrate the Bayesian approach through a synthetic case study of surface-ground water interaction under changing pumping conditions. It is found that explicit treatment of errors in model structure and input data (groundwater pumping rate) has substantial impact on the posterior distribution of groundwater model parameters. Using error models reduces predictive bias caused by parameter compensation. In addition, input variability increases parametric and predictive uncertainty. The Bayesian approach allows for a comparison among the contributions from various error sources, which could inform future model improvement and data collection efforts on how to best direct resources towards reducing predictive uncertainty.
Nonparametric Bayesian Dictionary Learning for Analysis of Noisy and Incomplete Images
Zhou, Mingyuan; Chen, Haojun; Paisley, John; Ren, Lu; Li, Lingbo; Xing, Zhengming; Dunson, David; Sapiro, Guillermo; Carin, Lawrence
2013-01-01
Nonparametric Bayesian methods are considered for recovery of imagery based upon compressive, incomplete, and/or noisy measurements. A truncated beta-Bernoulli process is employed to infer an appropriate dictionary for the data under test and also for image recovery. In the context of compressive sensing, significant improvements in image recovery are manifested using learned dictionaries, relative to using standard orthonormal image expansions. The compressive-measurement projections are also optimized for the learned dictionary. Additionally, we consider simpler (incomplete) measurements, defined by measuring a subset of image pixels, uniformly selected at random. Spatial interrelationships within imagery are exploited through use of the Dirichlet and probit stick-breaking processes. Several example results are presented, with comparisons to other methods in the literature. PMID:21693421
Cure modeling in real-time prediction: How much does it help?
Ying, Gui-Shuang; Zhang, Qiang; Lan, Yu; Li, Yimei; Heitjan, Daniel F
2017-08-01
Various parametric and nonparametric modeling approaches exist for real-time prediction in time-to-event clinical trials. Recently, Chen (2016 BMC Biomedical Research Methodology 16) proposed a prediction method based on parametric cure-mixture modeling, intending to cover those situations where it appears that a non-negligible fraction of subjects is cured. In this article we apply a Weibull cure-mixture model to create predictions, demonstrating the approach in RTOG 0129, a randomized trial in head-and-neck cancer. We compare the ultimate realized data in RTOG 0129 to interim predictions from a Weibull cure-mixture model, a standard Weibull model without a cure component, and a nonparametric model based on the Bayesian bootstrap. The standard Weibull model predicted that events would occur earlier than the Weibull cure-mixture model, but the difference was unremarkable until late in the trial when evidence for a cure became clear. Nonparametric predictions often gave undefined predictions or infinite prediction intervals, particularly at early stages of the trial. Simulations suggest that cure modeling can yield better-calibrated prediction intervals when there is a cured component, or the appearance of a cured component, but at a substantial cost in the average width of the intervals. Copyright © 2017 Elsevier Inc. All rights reserved.
Long-range dismount activity classification: LODAC
NASA Astrophysics Data System (ADS)
Garagic, Denis; Peskoe, Jacob; Liu, Fang; Cuevas, Manuel; Freeman, Andrew M.; Rhodes, Bradley J.
2014-06-01
Continuous classification of dismount types (including gender, age, ethnicity) and their activities (such as walking, running) evolving over space and time is challenging. Limited sensor resolution (often exacerbated as a function of platform standoff distance) and clutter from shadows in dense target environments, unfavorable environmental conditions, and the normal properties of real data all contribute to the challenge. The unique and innovative aspect of our approach is a synthesis of multimodal signal processing with incremental non-parametric, hierarchical Bayesian machine learning methods to create a new kind of target classification architecture. This architecture is designed from the ground up to optimally exploit correlations among the multiple sensing modalities (multimodal data fusion) and rapidly and continuously learns (online self-tuning) patterns of distinct classes of dismounts given little a priori information. This increases classification performance in the presence of challenges posed by anti-access/area denial (A2/AD) sensing. To fuse multimodal features, Long-range Dismount Activity Classification (LODAC) develops a novel statistical information theoretic approach for multimodal data fusion that jointly models multimodal data (i.e., a probabilistic model for cross-modal signal generation) and discovers the critical cross-modal correlations by identifying components (features) with maximal mutual information (MI) which is efficiently estimated using non-parametric entropy models. LODAC develops a generic probabilistic pattern learning and classification framework based on a new class of hierarchical Bayesian learning algorithms for efficiently discovering recurring patterns (classes of dismounts) in multiple simultaneous time series (sensor modalities) at multiple levels of feature granularity.
Bayesian approach for three-dimensional aquifer characterization at the Hanford 300 Area
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murakami, Haruko; Chen, X.; Hahn, Melanie S.
2010-10-21
This study presents a stochastic, three-dimensional characterization of a heterogeneous hydraulic conductivity field within DOE's Hanford 300 Area site, Washington, by assimilating large-scale, constant-rate injection test data with small-scale, three-dimensional electromagnetic borehole flowmeter (EBF) measurement data. We first inverted the injection test data to estimate the transmissivity field, using zeroth-order temporal moments of pressure buildup curves. We applied a newly developed Bayesian geostatistical inversion framework, the method of anchored distributions (MAD), to obtain a joint posterior distribution of geostatistical parameters and local log-transmissivities at multiple locations. The unique aspects of MAD that make it suitable for this purpose are itsmore » ability to integrate multi-scale, multi-type data within a Bayesian framework and to compute a nonparametric posterior distribution. After we combined the distribution of transmissivities with depth-discrete relative-conductivity profile from EBF data, we inferred the three-dimensional geostatistical parameters of the log-conductivity field, using the Bayesian model-based geostatistics. Such consistent use of the Bayesian approach throughout the procedure enabled us to systematically incorporate data uncertainty into the final posterior distribution. The method was tested in a synthetic study and validated using the actual data that was not part of the estimation. Results showed broader and skewed posterior distributions of geostatistical parameters except for the mean, which suggests the importance of inferring the entire distribution to quantify the parameter uncertainty.« less
Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.
Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo
2017-01-01
Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.
McCarron, C Elizabeth; Pullenayegum, Eleanor M; Thabane, Lehana; Goeree, Ron; Tarride, Jean-Eric
2013-04-01
Bayesian methods have been proposed as a way of synthesizing all available evidence to inform decision making. However, few practical applications of the use of Bayesian methods for combining patient-level data (i.e., trial) with additional evidence (e.g., literature) exist in the cost-effectiveness literature. The objective of this study was to compare a Bayesian cost-effectiveness analysis using informative priors to a standard non-Bayesian nonparametric method to assess the impact of incorporating additional information into a cost-effectiveness analysis. Patient-level data from a previously published nonrandomized study were analyzed using traditional nonparametric bootstrap techniques and bivariate normal Bayesian models with vague and informative priors. Two different types of informative priors were considered to reflect different valuations of the additional evidence relative to the patient-level data (i.e., "face value" and "skeptical"). The impact of using different distributions and valuations was assessed in a sensitivity analysis. Models were compared in terms of incremental net monetary benefit (INMB) and cost-effectiveness acceptability frontiers (CEAFs). The bootstrapping and Bayesian analyses using vague priors provided similar results. The most pronounced impact of incorporating the informative priors was the increase in estimated life years in the control arm relative to what was observed in the patient-level data alone. Consequently, the incremental difference in life years originally observed in the patient-level data was reduced, and the INMB and CEAF changed accordingly. The results of this study demonstrate the potential impact and importance of incorporating additional information into an analysis of patient-level data, suggesting this could alter decisions as to whether a treatment should be adopted and whether more information should be acquired.
Pressman, Alice R; Avins, Andrew L; Hubbard, Alan; Satariano, William A
2011-07-01
There is a paucity of literature comparing Bayesian analytic techniques with traditional approaches for analyzing clinical trials using real trial data. We compared Bayesian and frequentist group sequential methods using data from two published clinical trials. We chose two widely accepted frequentist rules, O'Brien-Fleming and Lan-DeMets, and conjugate Bayesian priors. Using the nonparametric bootstrap, we estimated a sampling distribution of stopping times for each method. Because current practice dictates the preservation of an experiment-wise false positive rate (Type I error), we approximated these error rates for our Bayesian and frequentist analyses with the posterior probability of detecting an effect in a simulated null sample. Thus for the data-generated distribution represented by these trials, we were able to compare the relative performance of these techniques. No final outcomes differed from those of the original trials. However, the timing of trial termination differed substantially by method and varied by trial. For one trial, group sequential designs of either type dictated early stopping of the study. In the other, stopping times were dependent upon the choice of spending function and prior distribution. Results indicate that trialists ought to consider Bayesian methods in addition to traditional approaches for analysis of clinical trials. Though findings from this small sample did not demonstrate either method to consistently outperform the other, they did suggest the need to replicate these comparisons using data from varied clinical trials in order to determine the conditions under which the different methods would be most efficient. Copyright © 2011 Elsevier Inc. All rights reserved.
Pressman, Alice R.; Avins, Andrew L.; Hubbard, Alan; Satariano, William A.
2014-01-01
Background There is a paucity of literature comparing Bayesian analytic techniques with traditional approaches for analyzing clinical trials using real trial data. Methods We compared Bayesian and frequentist group sequential methods using data from two published clinical trials. We chose two widely accepted frequentist rules, O'Brien–Fleming and Lan–DeMets, and conjugate Bayesian priors. Using the nonparametric bootstrap, we estimated a sampling distribution of stopping times for each method. Because current practice dictates the preservation of an experiment-wise false positive rate (Type I error), we approximated these error rates for our Bayesian and frequentist analyses with the posterior probability of detecting an effect in a simulated null sample. Thus for the data-generated distribution represented by these trials, we were able to compare the relative performance of these techniques. Results No final outcomes differed from those of the original trials. However, the timing of trial termination differed substantially by method and varied by trial. For one trial, group sequential designs of either type dictated early stopping of the study. In the other, stopping times were dependent upon the choice of spending function and prior distribution. Conclusions Results indicate that trialists ought to consider Bayesian methods in addition to traditional approaches for analysis of clinical trials. Though findings from this small sample did not demonstrate either method to consistently outperform the other, they did suggest the need to replicate these comparisons using data from varied clinical trials in order to determine the conditions under which the different methods would be most efficient. PMID:21453792
Bayesian isotonic density regression
Wang, Lianming; Dunson, David B.
2011-01-01
Density regression models allow the conditional distribution of the response given predictors to change flexibly over the predictor space. Such models are much more flexible than nonparametric mean regression models with nonparametric residual distributions, and are well supported in many applications. A rich variety of Bayesian methods have been proposed for density regression, but it is not clear whether such priors have full support so that any true data-generating model can be accurately approximated. This article develops a new class of density regression models that incorporate stochastic-ordering constraints which are natural when a response tends to increase or decrease monotonely with a predictor. Theory is developed showing large support. Methods are developed for hypothesis testing, with posterior computation relying on a simple Gibbs sampler. Frequentist properties are illustrated in a simulation study, and an epidemiology application is considered. PMID:22822259
Semiparametric time varying coefficient model for matched case-crossover studies.
Ortega-Villa, Ana Maria; Kim, Inyoung; Kim, H
2017-03-15
In matched case-crossover studies, it is generally accepted that the covariates on which a case and associated controls are matched cannot exert a confounding effect on independent predictors included in the conditional logistic regression model. This is because any stratum effect is removed by the conditioning on the fixed number of sets of the case and controls in the stratum. Hence, the conditional logistic regression model is not able to detect any effects associated with the matching covariates by stratum. However, some matching covariates such as time often play an important role as an effect modification leading to incorrect statistical estimation and prediction. Therefore, we propose three approaches to evaluate effect modification by time. The first is a parametric approach, the second is a semiparametric penalized approach, and the third is a semiparametric Bayesian approach. Our parametric approach is a two-stage method, which uses conditional logistic regression in the first stage and then estimates polynomial regression in the second stage. Our semiparametric penalized and Bayesian approaches are one-stage approaches developed by using regression splines. Our semiparametric one stage approach allows us to not only detect the parametric relationship between the predictor and binary outcomes, but also evaluate nonparametric relationships between the predictor and time. We demonstrate the advantage of our semiparametric one-stage approaches using both a simulation study and an epidemiological example of a 1-4 bi-directional case-crossover study of childhood aseptic meningitis with drinking water turbidity. We also provide statistical inference for the semiparametric Bayesian approach using Bayes Factors. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Genome-wide regression and prediction with the BGLR statistical package.
Pérez, Paulino; de los Campos, Gustavo
2014-10-01
Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis. Copyright © 2014 by the Genetics Society of America.
Bayesian analysis of energy and count rate data for detection of low count rate radioactive sources
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klumpp, John
We propose a radiation detection system which generates its own discrete sampling distribution based on past measurements of background. The advantage to this approach is that it can take into account variations in background with respect to time, location, energy spectra, detector-specific characteristics (i.e. different efficiencies at different count rates and energies), etc. This would therefore be a 'machine learning' approach, in which the algorithm updates and improves its characterization of background over time. The system would have a 'learning mode,' in which it measures and analyzes background count rates, and a 'detection mode,' in which it compares measurements frommore » an unknown source against its unique background distribution. By characterizing and accounting for variations in the background, general purpose radiation detectors can be improved with little or no increase in cost. The statistical and computational techniques to perform this kind of analysis have already been developed. The necessary signal analysis can be accomplished using existing Bayesian algorithms which account for multiple channels, multiple detectors, and multiple time intervals. Furthermore, Bayesian machine-learning techniques have already been developed which, with trivial modifications, can generate appropriate decision thresholds based on the comparison of new measurements against a nonparametric sampling distribution. (authors)« less
Bayesian Nonparametric Prediction and Statistical Inference
1989-09-07
Kadane, J. (1980), "Bayesian decision theory and the sim- plification of models," in Evaluation of Econometric Models, J. Kmenta and J. Ramsey , eds...the random model and weighted least squares regression," in Evaluation of Econometric Models, ed. by J. Kmenta and J. Ramsey , Academic Press, 197-217...likelihood function. On the other hand, H. Jeffreys’s theory of hypothesis testing covers the most important situations in which the prior is not diffuse. See
Howard, Réka; Carriquiry, Alicia L.; Beavis, William D.
2014-01-01
Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including Nadaraya-Watson estimator, reproducing kernel Hilbert space, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., proportion of phenotypic variability, had the second greatest impact on estimates of accuracy and MSE. PMID:24727289
Gasbarra, Dario; Arjas, Elja; Vehtari, Aki; Slama, Rémy; Keiding, Niels
2015-10-01
This paper was inspired by the studies of Niels Keiding and co-authors on estimating the waiting time-to-pregnancy (TTP) distribution, and in particular on using the current duration design in that context. In this design, a cross-sectional sample of women is collected from those who are currently attempting to become pregnant, and then by recording from each the time she has been attempting. Our aim here is to study the identifiability and the estimation of the waiting time distribution on the basis of current duration data. The main difficulty in this stems from the fact that very short waiting times are only rarely selected into the sample of current durations, and this renders their estimation unstable. We introduce here a Bayesian method for this estimation problem, prove its asymptotic consistency, and compare the method to some variants of the non-parametric maximum likelihood estimators, which have been used previously in this context. The properties of the Bayesian estimation method are studied also empirically, using both simulated data and TTP data on current durations collected by Slama et al. (Hum Reprod 27(5):1489-1498, 2012).
Bayesian nonparametric regression with varying residual density
Pati, Debdeep; Dunson, David B.
2013-01-01
We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications. PMID:24465053
Bayesian nonparametric adaptive control using Gaussian processes.
Chowdhary, Girish; Kingravi, Hassan A; How, Jonathan P; Vela, Patricio A
2015-03-01
Most current model reference adaptive control (MRAC) methods rely on parametric adaptive elements, in which the number of parameters of the adaptive element are fixed a priori, often through expert judgment. An example of such an adaptive element is radial basis function networks (RBFNs), with RBF centers preallocated based on the expected operating domain. If the system operates outside of the expected operating domain, this adaptive element can become noneffective in capturing and canceling the uncertainty, thus rendering the adaptive controller only semiglobal in nature. This paper investigates a Gaussian process-based Bayesian MRAC architecture (GP-MRAC), which leverages the power and flexibility of GP Bayesian nonparametric models of uncertainty. The GP-MRAC does not require the centers to be preallocated, can inherently handle measurement noise, and enables MRAC to handle a broader set of uncertainties, including those that are defined as distributions over functions. We use stochastic stability arguments to show that GP-MRAC guarantees good closed-loop performance with no prior domain knowledge of the uncertainty. Online implementable GP inference methods are compared in numerical simulations against RBFN-MRAC with preallocated centers and are shown to provide better tracking and improved long-term learning.
Predicting Market Impact Costs Using Nonparametric Machine Learning Models.
Park, Saerom; Lee, Jaewook; Son, Youngdoo
2016-01-01
Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance.
Predicting Market Impact Costs Using Nonparametric Machine Learning Models
Park, Saerom; Lee, Jaewook; Son, Youngdoo
2016-01-01
Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance. PMID:26926235
Torres-Carvajal, Omar; Schulte, James A; Cadle, John E
2006-04-01
The South American iguanian lizard genus Stenocercus includes 54 species occurring mostly in the Andes and adjacent lowland areas from northern Venezuela and Colombia to central Argentina at elevations of 0-4000m. Small taxon or character sampling has characterized all phylogenetic analyses of Stenocercus, which has long been recognized as sister taxon to the Tropidurus Group. In this study, we use mtDNA sequence data to perform phylogenetic analyses that include 32 species of Stenocercus and 12 outgroup taxa. Monophyly of this genus is strongly supported by maximum parsimony and Bayesian analyses. Evolutionary relationships within Stenocercus are further analyzed with a Bayesian implementation of a general mixture model, which accommodates variability in the pattern of evolution across sites. These analyses indicate a basal split of Stenocercus into two clades, one of which receives very strong statistical support. In addition, we test previous hypotheses using non-parametric and parametric statistical methods, and provide a phylogenetic classification for Stenocercus.
Modeling Non-Gaussian Time Series with Nonparametric Bayesian Model.
Xu, Zhiguang; MacEachern, Steven; Xu, Xinyi
2015-02-01
We present a class of Bayesian copula models whose major components are the marginal (limiting) distribution of a stationary time series and the internal dynamics of the series. We argue that these are the two features with which an analyst is typically most familiar, and hence that these are natural components with which to work. For the marginal distribution, we use a nonparametric Bayesian prior distribution along with a cdf-inverse cdf transformation to obtain large support. For the internal dynamics, we rely on the traditionally successful techniques of normal-theory time series. Coupling the two components gives us a family of (Gaussian) copula transformed autoregressive models. The models provide coherent adjustments of time scales and are compatible with many extensions, including changes in volatility of the series. We describe basic properties of the models, show their ability to recover non-Gaussian marginal distributions, and use a GARCH modification of the basic model to analyze stock index return series. The models are found to provide better fit and improved short-range and long-range predictions than Gaussian competitors. The models are extensible to a large variety of fields, including continuous time models, spatial models, models for multiple series, models driven by external covariate streams, and non-stationary models.
Understanding Past Population Dynamics: Bayesian Coalescent-Based Modeling with Covariates
Gill, Mandev S.; Lemey, Philippe; Bennett, Shannon N.; Biek, Roman; Suchard, Marc A.
2016-01-01
Effective population size characterizes the genetic variability in a population and is a parameter of paramount importance in population genetics and evolutionary biology. Kingman’s coalescent process enables inference of past population dynamics directly from molecular sequence data, and researchers have developed a number of flexible coalescent-based models for Bayesian nonparametric estimation of the effective population size as a function of time. Major goals of demographic reconstruction include identifying driving factors of effective population size, and understanding the association between the effective population size and such factors. Building upon Bayesian nonparametric coalescent-based approaches, we introduce a flexible framework that incorporates time-varying covariates that exploit Gaussian Markov random fields to achieve temporal smoothing of effective population size trajectories. To approximate the posterior distribution, we adapt efficient Markov chain Monte Carlo algorithms designed for highly structured Gaussian models. Incorporating covariates into the demographic inference framework enables the modeling of associations between the effective population size and covariates while accounting for uncertainty in population histories. Furthermore, it can lead to more precise estimates of population dynamics. We apply our model to four examples. We reconstruct the demographic history of raccoon rabies in North America and find a significant association with the spatiotemporal spread of the outbreak. Next, we examine the effective population size trajectory of the DENV-4 virus in Puerto Rico along with viral isolate count data and find similar cyclic patterns. We compare the population history of the HIV-1 CRF02_AG clade in Cameroon with HIV incidence and prevalence data and find that the effective population size is more reflective of incidence rate. Finally, we explore the hypothesis that the population dynamics of musk ox during the Late Quaternary period were related to climate change. [Coalescent; effective population size; Gaussian Markov random fields; phylodynamics; phylogenetics; population genetics. PMID:27368344
NASA Astrophysics Data System (ADS)
Meresescu, Alina G.; Kowalski, Matthieu; Schmidt, Frédéric; Landais, François
2018-06-01
The Water Residence Time distribution is the equivalent of the impulse response of a linear system allowing the propagation of water through a medium, e.g. the propagation of rain water from the top of the mountain towards the aquifers. We consider the output aquifer levels as the convolution between the input rain levels and the Water Residence Time, starting with an initial aquifer base level. The estimation of Water Residence Time is important for a better understanding of hydro-bio-geochemical processes and mixing properties of wetlands used as filters in ecological applications, as well as protecting fresh water sources for wells from pollutants. Common methods of estimating the Water Residence Time focus on cross-correlation, parameter fitting and non-parametric deconvolution methods. Here we propose a 1D full-deconvolution, regularized, non-parametric inverse problem algorithm that enforces smoothness and uses constraints of causality and positivity to estimate the Water Residence Time curve. Compared to Bayesian non-parametric deconvolution approaches, it has a fast runtime per test case; compared to the popular and fast cross-correlation method, it produces a more precise Water Residence Time curve even in the case of noisy measurements. The algorithm needs only one regularization parameter to balance between smoothness of the Water Residence Time and accuracy of the reconstruction. We propose an approach on how to automatically find a suitable value of the regularization parameter from the input data only. Tests on real data illustrate the potential of this method to analyze hydrological datasets.
Bayesian Modeling of Temporal Coherence in Videos for Entity Discovery and Summarization.
Mitra, Adway; Biswas, Soma; Bhattacharyya, Chiranjib
2017-03-01
A video is understood by users in terms of entities present in it. Entity Discovery is the task of building appearance model for each entity (e.g., a person), and finding all its occurrences in the video. We represent a video as a sequence of tracklets, each spanning 10-20 frames, and associated with one entity. We pose Entity Discovery as tracklet clustering, and approach it by leveraging Temporal Coherence (TC): the property that temporally neighboring tracklets are likely to be associated with the same entity. Our major contributions are the first Bayesian nonparametric models for TC at tracklet-level. We extend Chinese Restaurant Process (CRP) to TC-CRP, and further to Temporally Coherent Chinese Restaurant Franchise (TC-CRF) to jointly model entities and temporal segments using mixture components and sparse distributions. For discovering persons in TV serial videos without meta-data like scripts, these methods show considerable improvement over state-of-the-art approaches to tracklet clustering in terms of clustering accuracy, cluster purity and entity coverage. The proposed methods can perform online tracklet clustering on streaming videos unlike existing approaches, and can automatically reject false tracklets. Finally we discuss entity-driven video summarization- where temporal segments of the video are selected based on the discovered entities, to create a semantically meaningful summary.
Statistical modelling of networked human-automation performance using working memory capacity.
Ahmed, Nisar; de Visser, Ewart; Shaw, Tyler; Mohamed-Ameen, Amira; Campbell, Mark; Parasuraman, Raja
2014-01-01
This study examines the challenging problem of modelling the interaction between individual attentional limitations and decision-making performance in networked human-automation system tasks. Analysis of real experimental data from a task involving networked supervision of multiple unmanned aerial vehicles by human participants shows that both task load and network message quality affect performance, but that these effects are modulated by individual differences in working memory (WM) capacity. These insights were used to assess three statistical approaches for modelling and making predictions with real experimental networked supervisory performance data: classical linear regression, non-parametric Gaussian processes and probabilistic Bayesian networks. It is shown that each of these approaches can help designers of networked human-automated systems cope with various uncertainties in order to accommodate future users by linking expected operating conditions and performance from real experimental data to observable cognitive traits like WM capacity. Practitioner Summary: Working memory (WM) capacity helps account for inter-individual variability in operator performance in networked unmanned aerial vehicle supervisory tasks. This is useful for reliable performance prediction near experimental conditions via linear models; robust statistical prediction beyond experimental conditions via Gaussian process models and probabilistic inference about unknown task conditions/WM capacities via Bayesian network models.
Uncertainty Estimates of Psychoacoustic Thresholds Obtained from Group Tests
NASA Technical Reports Server (NTRS)
Rathsam, Jonathan; Christian, Andrew
2016-01-01
Adaptive psychoacoustic test methods, in which the next signal level depends on the response to the previous signal, are the most efficient for determining psychoacoustic thresholds of individual subjects. In many tests conducted in the NASA psychoacoustic labs, the goal is to determine thresholds representative of the general population. To do this economically, non-adaptive testing methods are used in which three or four subjects are tested at the same time with predetermined signal levels. This approach requires us to identify techniques for assessing the uncertainty in resulting group-average psychoacoustic thresholds. In this presentation we examine the Delta Method of frequentist statistics, the Generalized Linear Model (GLM), the Nonparametric Bootstrap, a frequentist method, and Markov Chain Monte Carlo Posterior Estimation and a Bayesian approach. Each technique is exercised on a manufactured, theoretical dataset and then on datasets from two psychoacoustics facilities at NASA. The Delta Method is the simplest to implement and accurate for the cases studied. The GLM is found to be the least robust, and the Bootstrap takes the longest to calculate. The Bayesian Posterior Estimate is the most versatile technique examined because it allows the inclusion of prior information.
A Non-parametric Cutout Index for Robust Evaluation of Identified Proteins*
Serang, Oliver; Paulo, Joao; Steen, Hanno; Steen, Judith A.
2013-01-01
This paper proposes a novel, automated method for evaluating sets of proteins identified using mass spectrometry. The remaining peptide-spectrum match score distributions of protein sets are compared to an empirical absent peptide-spectrum match score distribution, and a Bayesian non-parametric method reminiscent of the Dirichlet process is presented to accurately perform this comparison. Thus, for a given protein set, the process computes the likelihood that the proteins identified are correctly identified. First, the method is used to evaluate protein sets chosen using different protein-level false discovery rate (FDR) thresholds, assigning each protein set a likelihood. The protein set assigned the highest likelihood is used to choose a non-arbitrary protein-level FDR threshold. Because the method can be used to evaluate any protein identification strategy (and is not limited to mere comparisons of different FDR thresholds), we subsequently use the method to compare and evaluate multiple simple methods for merging peptide evidence over replicate experiments. The general statistical approach can be applied to other types of data (e.g. RNA sequencing) and generalizes to multivariate problems. PMID:23292186
A Rational Analysis of the Acquisition of Multisensory Representations
ERIC Educational Resources Information Center
Yildirim, Ilker; Jacobs, Robert A.
2012-01-01
How do people learn multisensory, or amodal, representations, and what consequences do these representations have for perceptual performance? We address this question by performing a rational analysis of the problem of learning multisensory representations. This analysis makes use of a Bayesian nonparametric model that acquires latent multisensory…
A Bayesian Nonparametric Meta-Analysis Model
ERIC Educational Resources Information Center
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G.
2015-01-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall…
A Bayesian Semiparametric Latent Variable Model for Mixed Responses
ERIC Educational Resources Information Center
Fahrmeir, Ludwig; Raach, Alexander
2007-01-01
In this paper we introduce a latent variable model (LVM) for mixed ordinal and continuous responses, where covariate effects on the continuous latent variables are modelled through a flexible semiparametric Gaussian regression model. We extend existing LVMs with the usual linear covariate effects by including nonparametric components for nonlinear…
Automatic discovery of cell types and microcircuitry from neural connectomics
Jonas, Eric; Kording, Konrad
2015-01-01
Neural connectomics has begun producing massive amounts of data, necessitating new analysis methods to discover the biological and computational structure. It has long been assumed that discovering neuron types and their relation to microcircuitry is crucial to understanding neural function. Here we developed a non-parametric Bayesian technique that identifies neuron types and microcircuitry patterns in connectomics data. It combines the information traditionally used by biologists in a principled and probabilistically coherent manner, including connectivity, cell body location, and the spatial distribution of synapses. We show that the approach recovers known neuron types in the retina and enables predictions of connectivity, better than simpler algorithms. It also can reveal interesting structure in the nervous system of Caenorhabditis elegans and an old man-made microprocessor. Our approach extracts structural meaning from connectomics, enabling new approaches of automatically deriving anatomical insights from these emerging datasets. DOI: http://dx.doi.org/10.7554/eLife.04250.001 PMID:25928186
Automatic discovery of cell types and microcircuitry from neural connectomics
Jonas, Eric; Kording, Konrad
2015-04-30
Neural connectomics has begun producing massive amounts of data, necessitating new analysis methods to discover the biological and computational structure. It has long been assumed that discovering neuron types and their relation to microcircuitry is crucial to understanding neural function. Here we developed a non-parametric Bayesian technique that identifies neuron types and microcircuitry patterns in connectomics data. It combines the information traditionally used by biologists in a principled and probabilistically coherent manner, including connectivity, cell body location, and the spatial distribution of synapses. We show that the approach recovers known neuron types in the retina and enables predictions of connectivity,more » better than simpler algorithms. It also can reveal interesting structure in the nervous system of Caenorhabditis elegans and an old man-made microprocessor. Our approach extracts structural meaning from connectomics, enabling new approaches of automatically deriving anatomical insights from these emerging datasets.« less
Automatic discovery of cell types and microcircuitry from neural connectomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jonas, Eric; Kording, Konrad
Neural connectomics has begun producing massive amounts of data, necessitating new analysis methods to discover the biological and computational structure. It has long been assumed that discovering neuron types and their relation to microcircuitry is crucial to understanding neural function. Here we developed a non-parametric Bayesian technique that identifies neuron types and microcircuitry patterns in connectomics data. It combines the information traditionally used by biologists in a principled and probabilistically coherent manner, including connectivity, cell body location, and the spatial distribution of synapses. We show that the approach recovers known neuron types in the retina and enables predictions of connectivity,more » better than simpler algorithms. It also can reveal interesting structure in the nervous system of Caenorhabditis elegans and an old man-made microprocessor. Our approach extracts structural meaning from connectomics, enabling new approaches of automatically deriving anatomical insights from these emerging datasets.« less
Bayesian deconvolution and quantification of metabolites in complex 1D NMR spectra using BATMAN.
Hao, Jie; Liebeke, Manuel; Astle, William; De Iorio, Maria; Bundy, Jacob G; Ebbels, Timothy M D
2014-01-01
Data processing for 1D NMR spectra is a key bottleneck for metabolomic and other complex-mixture studies, particularly where quantitative data on individual metabolites are required. We present a protocol for automated metabolite deconvolution and quantification from complex NMR spectra by using the Bayesian automated metabolite analyzer for NMR (BATMAN) R package. BATMAN models resonances on the basis of a user-controllable set of templates, each of which specifies the chemical shifts, J-couplings and relative peak intensities for a single metabolite. Peaks are allowed to shift position slightly between spectra, and peak widths are allowed to vary by user-specified amounts. NMR signals not captured by the templates are modeled non-parametrically by using wavelets. The protocol covers setting up user template libraries, optimizing algorithmic input parameters, improving prior information on peak positions, quality control and evaluation of outputs. The outputs include relative concentration estimates for named metabolites together with associated Bayesian uncertainty estimates, as well as the fit of the remainder of the spectrum using wavelets. Graphical diagnostics allow the user to examine the quality of the fit for multiple spectra simultaneously. This approach offers a workflow to analyze large numbers of spectra and is expected to be useful in a wide range of metabolomics studies.
Tenan, Matthew S; Tweedell, Andrew J; Haynes, Courtney A
2017-12-01
The onset of muscle activity, as measured by electromyography (EMG), is a commonly applied metric in biomechanics. Intramuscular EMG is often used to examine deep musculature and there are currently no studies examining the effectiveness of algorithms for intramuscular EMG onset. The present study examines standard surface EMG onset algorithms (linear envelope, Teager-Kaiser Energy Operator, and sample entropy) and novel algorithms (time series mean-variance analysis, sequential/batch processing with parametric and nonparametric methods, and Bayesian changepoint analysis). Thirteen male and 5 female subjects had intramuscular EMG collected during isolated biceps brachii and vastus lateralis contractions, resulting in 103 trials. EMG onset was visually determined twice by 3 blinded reviewers. Since the reliability of visual onset was high (ICC (1,1) : 0.92), the mean of the 6 visual assessments was contrasted with the algorithmic approaches. Poorly performing algorithms were stepwise eliminated via (1) root mean square error analysis, (2) algorithm failure to identify onset/premature onset, (3) linear regression analysis, and (4) Bland-Altman plots. The top performing algorithms were all based on Bayesian changepoint analysis of rectified EMG and were statistically indistinguishable from visual analysis. Bayesian changepoint analysis has the potential to produce more reliable, accurate, and objective intramuscular EMG onset results than standard methodologies.
Quantum-Like Bayesian Networks for Modeling Decision Making
Moreira, Catarina; Wichert, Andreas
2016-01-01
In this work, we explore an alternative quantum structure to perform quantum probabilistic inferences to accommodate the paradoxical findings of the Sure Thing Principle. We propose a Quantum-Like Bayesian Network, which consists in replacing classical probabilities by quantum probability amplitudes. However, since this approach suffers from the problem of exponential growth of quantum parameters, we also propose a similarity heuristic that automatically fits quantum parameters through vector similarities. This makes the proposed model general and predictive in contrast to the current state of the art models, which cannot be generalized for more complex decision scenarios and that only provide an explanatory nature for the observed paradoxes. In the end, the model that we propose consists in a nonparametric method for estimating inference effects from a statistical point of view. It is a statistical model that is simpler than the previous quantum dynamic and quantum-like models proposed in the literature. We tested the proposed network with several empirical data from the literature, mainly from the Prisoner's Dilemma game and the Two Stage Gambling game. The results obtained show that the proposed quantum Bayesian Network is a general method that can accommodate violations of the laws of classical probability theory and make accurate predictions regarding human decision-making in these scenarios. PMID:26858669
A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2013-01-01
The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…
Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A
2018-01-30
Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
A program for the Bayesian Neural Network in the ROOT framework
NASA Astrophysics Data System (ADS)
Zhong, Jiahang; Huang, Run-Sheng; Lee, Shih-Chang
2011-12-01
We present a Bayesian Neural Network algorithm implemented in the TMVA package (Hoecker et al., 2007 [1]), within the ROOT framework (Brun and Rademakers, 1997 [2]). Comparing to the conventional utilization of Neural Network as discriminator, this new implementation has more advantages as a non-parametric regression tool, particularly for fitting probabilities. It provides functionalities including cost function selection, complexity control and uncertainty estimation. An example of such application in High Energy Physics is shown. The algorithm is available with ROOT release later than 5.29. Program summaryProgram title: TMVA-BNN Catalogue identifier: AEJX_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJX_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: BSD license No. of lines in distributed program, including test data, etc.: 5094 No. of bytes in distributed program, including test data, etc.: 1,320,987 Distribution format: tar.gz Programming language: C++ Computer: Any computer system or cluster with C++ compiler and UNIX-like operating system Operating system: Most UNIX/Linux systems. The application programs were thoroughly tested under Fedora and Scientific Linux CERN. Classification: 11.9 External routines: ROOT package version 5.29 or higher ( http://root.cern.ch) Nature of problem: Non-parametric fitting of multivariate distributions Solution method: An implementation of Neural Network following the Bayesian statistical interpretation. Uses Laplace approximation for the Bayesian marginalizations. Provides the functionalities of automatic complexity control and uncertainty estimation. Running time: Time consumption for the training depends substantially on the size of input sample, the NN topology, the number of training iterations, etc. For the example in this manuscript, about 7 min was used on a PC/Linux with 2.0 GHz processors.
BROCCOLI: Software for fast fMRI analysis on many-core CPUs and GPUs
Eklund, Anders; Dufort, Paul; Villani, Mattias; LaConte, Stephen
2014-01-01
Analysis of functional magnetic resonance imaging (fMRI) data is becoming ever more computationally demanding as temporal and spatial resolutions improve, and large, publicly available data sets proliferate. Moreover, methodological improvements in the neuroimaging pipeline, such as non-linear spatial normalization, non-parametric permutation tests and Bayesian Markov Chain Monte Carlo approaches, can dramatically increase the computational burden. Despite these challenges, there do not yet exist any fMRI software packages which leverage inexpensive and powerful graphics processing units (GPUs) to perform these analyses. Here, we therefore present BROCCOLI, a free software package written in OpenCL (Open Computing Language) that can be used for parallel analysis of fMRI data on a large variety of hardware configurations. BROCCOLI has, for example, been tested with an Intel CPU, an Nvidia GPU, and an AMD GPU. These tests show that parallel processing of fMRI data can lead to significantly faster analysis pipelines. This speedup can be achieved on relatively standard hardware, but further, dramatic speed improvements require only a modest investment in GPU hardware. BROCCOLI (running on a GPU) can perform non-linear spatial normalization to a 1 mm3 brain template in 4–6 s, and run a second level permutation test with 10,000 permutations in about a minute. These non-parametric tests are generally more robust than their parametric counterparts, and can also enable more sophisticated analyses by estimating complicated null distributions. Additionally, BROCCOLI includes support for Bayesian first-level fMRI analysis using a Gibbs sampler. The new software is freely available under GNU GPL3 and can be downloaded from github (https://github.com/wanderine/BROCCOLI/). PMID:24672471
ERIC Educational Resources Information Center
Dimitrov, Dimiter M.
2007-01-01
The validation of cognitive attributes required for correct answers on binary test items or tasks has been addressed in previous research through the integration of cognitive psychology and psychometric models using parametric or nonparametric item response theory, latent class modeling, and Bayesian modeling. All previous models, each with their…
Lobach, Iryna; Mallick, Bani; Carroll, Raymond J
2011-01-01
Case-control studies are widely used to detect gene-environment interactions in the etiology of complex diseases. Many variables that are of interest to biomedical researchers are difficult to measure on an individual level, e.g. nutrient intake, cigarette smoking exposure, long-term toxic exposure. Measurement error causes bias in parameter estimates, thus masking key features of data and leading to loss of power and spurious/masked associations. We develop a Bayesian methodology for analysis of case-control studies for the case when measurement error is present in an environmental covariate and the genetic variable has missing data. This approach offers several advantages. It allows prior information to enter the model to make estimation and inference more precise. The environmental covariates measured exactly are modeled completely nonparametrically. Further, information about the probability of disease can be incorporated in the estimation procedure to improve quality of parameter estimates, what cannot be done in conventional case-control studies. A unique feature of the procedure under investigation is that the analysis is based on a pseudo-likelihood function therefore conventional Bayesian techniques may not be technically correct. We propose an approach using Markov Chain Monte Carlo sampling as well as a computationally simple method based on an asymptotic posterior distribution. Simulation experiments demonstrated that our method produced parameter estimates that are nearly unbiased even for small sample sizes. An application of our method is illustrated using a population-based case-control study of the association between calcium intake with the risk of colorectal adenoma development.
Nonparametric Bayesian models for a spatial covariance.
Reich, Brian J; Fuentes, Montserrat
2012-01-01
A crucial step in the analysis of spatial data is to estimate the spatial correlation function that determines the relationship between a spatial process at two locations. The standard approach to selecting the appropriate correlation function is to use prior knowledge or exploratory analysis, such as a variogram analysis, to select the correct parametric correlation function. Rather that selecting a particular parametric correlation function, we treat the covariance function as an unknown function to be estimated from the data. We propose a flexible prior for the correlation function to provide robustness to the choice of correlation function. We specify the prior for the correlation function using spectral methods and the Dirichlet process prior, which is a common prior for an unknown distribution function. Our model does not require Gaussian data or spatial locations on a regular grid. The approach is demonstrated using a simulation study as well as an analysis of California air pollution data.
Reconstructing the interaction between dark energy and dark matter using Gaussian processes
NASA Astrophysics Data System (ADS)
Yang, Tao; Guo, Zong-Kuan; Cai, Rong-Gen
2015-06-01
We present a nonparametric approach to reconstruct the interaction between dark energy and dark matter directly from SNIa Union 2.1 data using Gaussian processes, which is a fully Bayesian approach for smoothing data. In this method, once the equation of state (w ) of dark energy is specified, the interaction can be reconstructed as a function of redshift. For the decaying vacuum energy case with w =-1 , the reconstructed interaction is consistent with the standard Λ CDM model, namely, there is no evidence for the interaction. This also holds for the constant w cases from -0.9 to -1.1 and for the Chevallier-Polarski-Linder (CPL) parametrization case. If the equation of state deviates obviously from -1 , the reconstructed interaction exists at 95% confidence level. This shows the degeneracy between the interaction and the equation of state of dark energy when they get constraints from the observational data.
Inhomogeneous Poisson process rate function inference from dead-time limited observations.
Verma, Gunjan; Drost, Robert J
2017-05-01
The estimation of an inhomogeneous Poisson process (IHPP) rate function from a set of process observations is an important problem arising in optical communications and a variety of other applications. However, because of practical limitations of detector technology, one is often only able to observe a corrupted version of the original process. In this paper, we consider how inference of the rate function is affected by dead time, a period of time after the detection of an event during which a sensor is insensitive to subsequent IHPP events. We propose a flexible nonparametric Bayesian approach to infer an IHPP rate function given dead-time limited process realizations. Simulation results illustrate the effectiveness of our inference approach and suggest its ability to extend the utility of existing sensor technology by permitting more accurate inference on signals whose observations are dead-time limited. We apply our inference algorithm to experimentally collected optical communications data, demonstrating the practical utility of our approach in the context of channel modeling and validation.
Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara
2017-01-01
In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most of common models to analyze such complex longitudinal data are based on mean-regression, which fails to provide efficient estimates due to outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish a Bayesian joint models that accounts for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.
NASA Astrophysics Data System (ADS)
Kozoderov, V. V.; Kondranin, T. V.; Dmitriev, E. V.
2017-12-01
The basic model for the recognition of natural and anthropogenic objects using their spectral and textural features is described in the problem of hyperspectral air-borne and space-borne imagery processing. The model is based on improvements of the Bayesian classifier that is a computational procedure of statistical decision making in machine-learning methods of pattern recognition. The principal component method is implemented to decompose the hyperspectral measurements on the basis of empirical orthogonal functions. Application examples are shown of various modifications of the Bayesian classifier and Support Vector Machine method. Examples are provided of comparing these classifiers and a metrical classifier that operates on finding the minimal Euclidean distance between different points and sets in the multidimensional feature space. A comparison is also carried out with the " K-weighted neighbors" method that is close to the nonparametric Bayesian classifier.
10th Conference on Bayesian Nonparametrics
2016-05-08
RETURN YOUR FORM TO THE ABOVE ADDRESS. North Carolina State University 2701 Sullivan Drive Admin Srvcs III, Box 7514 Raleigh, NC 27695 -7514 ABSTRACT...the conference. The findings from the conference is widely disseminated. The conference web site displays slides of the talks presented in the...being published by the Electronic Journal of Statistics consisting of about 20 papers read at the conference. The conference web site displays
Robust point matching via vector field consensus.
Jiayi Ma; Ji Zhao; Jinwen Tian; Yuille, Alan L; Zhuowen Tu
2014-04-01
In this paper, we propose an efficient algorithm, called vector field consensus, for establishing robust point correspondences between two sets of points. Our algorithm starts by creating a set of putative correspondences which can contain a very large number of false correspondences, or outliers, in addition to a limited number of true correspondences (inliers). Next, we solve for correspondence by interpolating a vector field between the two point sets, which involves estimating a consensus of inlier points whose matching follows a nonparametric geometrical constraint. We formulate this a maximum a posteriori (MAP) estimation of a Bayesian model with hidden/latent variables indicating whether matches in the putative set are outliers or inliers. We impose nonparametric geometrical constraints on the correspondence, as a prior distribution, using Tikhonov regularizers in a reproducing kernel Hilbert space. MAP estimation is performed by the EM algorithm which by also estimating the variance of the prior model (initialized to a large value) is able to obtain good estimates very quickly (e.g., avoiding many of the local minima inherent in this formulation). We illustrate this method on data sets in 2D and 3D and demonstrate that it is robust to a very large number of outliers (even up to 90%). We also show that in the special case where there is an underlying parametric geometrical model (e.g., the epipolar line constraint) that we obtain better results than standard alternatives like RANSAC if a large number of outliers are present. This suggests a two-stage strategy, where we use our nonparametric model to reduce the size of the putative set and then apply a parametric variant of our approach to estimate the geometric parameters. Our algorithm is computationally efficient and we provide code for others to use it. In addition, our approach is general and can be applied to other problems, such as learning with a badly corrupted training data set.
Pattern recognition for passive polarimetric data using nonparametric classifiers
NASA Astrophysics Data System (ADS)
Thilak, Vimal; Saini, Jatinder; Voelz, David G.; Creusere, Charles D.
2005-08-01
Passive polarization based imaging is a useful tool in computer vision and pattern recognition. A passive polarization imaging system forms a polarimetric image from the reflection of ambient light that contains useful information for computer vision tasks such as object detection (classification) and recognition. Applications of polarization based pattern recognition include material classification and automatic shape recognition. In this paper, we present two target detection algorithms for images captured by a passive polarimetric imaging system. The proposed detection algorithms are based on Bayesian decision theory. In these approaches, an object can belong to one of any given number classes and classification involves making decisions that minimize the average probability of making incorrect decisions. This minimum is achieved by assigning an object to the class that maximizes the a posteriori probability. Computing a posteriori probabilities requires estimates of class conditional probability density functions (likelihoods) and prior probabilities. A Probabilistic neural network (PNN), which is a nonparametric method that can compute Bayes optimal boundaries, and a -nearest neighbor (KNN) classifier, is used for density estimation and classification. The proposed algorithms are applied to polarimetric image data gathered in the laboratory with a liquid crystal-based system. The experimental results validate the effectiveness of the above algorithms for target detection from polarimetric data.
Semiparametric regression during 2003–2007*
Ruppert, David; Wand, M.P.; Carroll, Raymond J.
2010-01-01
Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application. PMID:20305800
Vexler, Albert; Tanajian, Hovig; Hutson, Alan D
In practice, parametric likelihood-ratio techniques are powerful statistical tools. In this article, we propose and examine novel and simple distribution-free test statistics that efficiently approximate parametric likelihood ratios to analyze and compare distributions of K groups of observations. Using the density-based empirical likelihood methodology, we develop a Stata package that applies to a test for symmetry of data distributions and compares K -sample distributions. Recognizing that recent statistical software packages do not sufficiently address K -sample nonparametric comparisons of data distributions, we propose a new Stata command, vxdbel, to execute exact density-based empirical likelihood-ratio tests using K samples. To calculate p -values of the proposed tests, we use the following methods: 1) a classical technique based on Monte Carlo p -value evaluations; 2) an interpolation technique based on tabulated critical values; and 3) a new hybrid technique that combines methods 1 and 2. The third, cutting-edge method is shown to be very efficient in the context of exact-test p -value computations. This Bayesian-type method considers tabulated critical values as prior information and Monte Carlo generations of test statistic values as data used to depict the likelihood function. In this case, a nonparametric Bayesian method is proposed to compute critical values of exact tests.
Yang, Jingjing; Cox, Dennis D; Lee, Jong Soo; Ren, Peng; Choi, Taeryon
2017-12-01
Functional data are defined as realizations of random functions (mostly smooth functions) varying over a continuum, which are usually collected on discretized grids with measurement errors. In order to accurately smooth noisy functional observations and deal with the issue of high-dimensional observation grids, we propose a novel Bayesian method based on the Bayesian hierarchical model with a Gaussian-Wishart process prior and basis function representations. We first derive an induced model for the basis-function coefficients of the functional data, and then use this model to conduct posterior inference through Markov chain Monte Carlo methods. Compared to the standard Bayesian inference that suffers serious computational burden and instability in analyzing high-dimensional functional data, our method greatly improves the computational scalability and stability, while inheriting the advantage of simultaneously smoothing raw observations and estimating the mean-covariance functions in a nonparametric way. In addition, our method can naturally handle functional data observed on random or uncommon grids. Simulation and real studies demonstrate that our method produces similar results to those obtainable by the standard Bayesian inference with low-dimensional common grids, while efficiently smoothing and estimating functional data with random and high-dimensional observation grids when the standard Bayesian inference fails. In conclusion, our method can efficiently smooth and estimate high-dimensional functional data, providing one way to resolve the curse of dimensionality for Bayesian functional data analysis with Gaussian-Wishart processes. © 2017, The International Biometric Society.
A Kolmogorov-Smirnov test for the molecular clock based on Bayesian ensembles of phylogenies
Antoneli, Fernando; Passos, Fernando M.; Lopes, Luciano R.
2018-01-01
Divergence date estimates are central to understand evolutionary processes and depend, in the case of molecular phylogenies, on tests of molecular clocks. Here we propose two non-parametric tests of strict and relaxed molecular clocks built upon a framework that uses the empirical cumulative distribution (ECD) of branch lengths obtained from an ensemble of Bayesian trees and well known non-parametric (one-sample and two-sample) Kolmogorov-Smirnov (KS) goodness-of-fit test. In the strict clock case, the method consists in using the one-sample Kolmogorov-Smirnov (KS) test to directly test if the phylogeny is clock-like, in other words, if it follows a Poisson law. The ECD is computed from the discretized branch lengths and the parameter λ of the expected Poisson distribution is calculated as the average branch length over the ensemble of trees. To compensate for the auto-correlation in the ensemble of trees and pseudo-replication we take advantage of thinning and effective sample size, two features provided by Bayesian inference MCMC samplers. Finally, it is observed that tree topologies with very long or very short branches lead to Poisson mixtures and in this case we propose the use of the two-sample KS test with samples from two continuous branch length distributions, one obtained from an ensemble of clock-constrained trees and the other from an ensemble of unconstrained trees. Moreover, in this second form the test can also be applied to test for relaxed clock models. The use of a statistically equivalent ensemble of phylogenies to obtain the branch lengths ECD, instead of one consensus tree, yields considerable reduction of the effects of small sample size and provides a gain of power. PMID:29300759
Liao, J. G.; Mcmurry, Timothy; Berg, Arthur
2014-01-01
Empirical Bayes methods have been extensively used for microarray data analysis by modeling the large number of unknown parameters as random effects. Empirical Bayes allows borrowing information across genes and can automatically adjust for multiple testing and selection bias. However, the standard empirical Bayes model can perform poorly if the assumed working prior deviates from the true prior. This paper proposes a new rank-conditioned inference in which the shrinkage and confidence intervals are based on the distribution of the error conditioned on rank of the data. Our approach is in contrast to a Bayesian posterior, which conditions on the data themselves. The new method is almost as efficient as standard Bayesian methods when the working prior is close to the true prior, and it is much more robust when the working prior is not close. In addition, it allows a more accurate (but also more complex) non-parametric estimate of the prior to be easily incorporated, resulting in improved inference. The new method’s prior robustness is demonstrated via simulation experiments. Application to a breast cancer gene expression microarray dataset is presented. Our R package rank.Shrinkage provides a ready-to-use implementation of the proposed methodology. PMID:23934072
Burroughs, N J; Pillay, D; Mutimer, D
1999-01-01
Bayesian analysis using a virus dynamics model is demonstrated to facilitate hypothesis testing of patterns in clinical time-series. Our Markov chain Monte Carlo implementation demonstrates that the viraemia time-series observed in two sets of hepatitis B patients on antiviral (lamivudine) therapy, chronic carriers and liver transplant patients, are significantly different, overcoming clinical trial design differences that question the validity of non-parametric tests. We show that lamivudine-resistant mutants grow faster in transplant patients than in chronic carriers, which probably explains the differences in emergence times and failure rates between these two sets of patients. Incorporation of dynamic models into Bayesian parameter analysis is of general applicability in medical statistics. PMID:10643081
Bayesian Geostatistical Modeling of Malaria Indicator Survey Data in Angola
Gosoniu, Laura; Veta, Andre Mia; Vounatsou, Penelope
2010-01-01
The 2006–2007 Angola Malaria Indicator Survey (AMIS) is the first nationally representative household survey in the country assessing coverage of the key malaria control interventions and measuring malaria-related burden among children under 5 years of age. In this paper, the Angolan MIS data were analyzed to produce the first smooth map of parasitaemia prevalence based on contemporary nationwide empirical data in the country. Bayesian geostatistical models were fitted to assess the effect of interventions after adjusting for environmental, climatic and socio-economic factors. Non-linear relationships between parasitaemia risk and environmental predictors were modeled by categorizing the covariates and by employing two non-parametric approaches, the B-splines and the P-splines. The results of the model validation showed that the categorical model was able to better capture the relationship between parasitaemia prevalence and the environmental factors. Model fit and prediction were handled within a Bayesian framework using Markov chain Monte Carlo (MCMC) simulations. Combining estimates of parasitaemia prevalence with the number of children under we obtained estimates of the number of infected children in the country. The population-adjusted prevalence ranges from in Namibe province to in Malanje province. The odds of parasitaemia in children living in a household with at least ITNs per person was by 41% lower (CI: 14%, 60%) than in those with fewer ITNs. The estimates of the number of parasitaemic children produced in this paper are important for planning and implementing malaria control interventions and for monitoring the impact of prevention and control activities. PMID:20351775
NASA Astrophysics Data System (ADS)
Omi, Takahiro; Ogata, Yosihiko; Hirata, Yoshito; Aihara, Kazuyuki
2015-04-01
Because aftershock occurrences can cause significant seismic risks for a considerable time after the main shock, prospective forecasting of the intermediate-term aftershock activity as soon as possible is important. The epidemic-type aftershock sequence (ETAS) model with the maximum likelihood estimate effectively reproduces general aftershock activity including secondary or higher-order aftershocks and can be employed for the forecasting. However, because we cannot always expect the accurate parameter estimation from incomplete early aftershock data where many events are missing, such forecasting using only a single estimated parameter set (plug-in forecasting) can frequently perform poorly. Therefore, we here propose Bayesian forecasting that combines the forecasts by the ETAS model with various probable parameter sets given the data. By conducting forecasting tests of 1 month period aftershocks based on the first 1 day data after the main shock as an example of the early intermediate-term forecasting, we show that the Bayesian forecasting performs better than the plug-in forecasting on average in terms of the log-likelihood score. Furthermore, to improve forecasting of large aftershocks, we apply a nonparametric (NP) model using magnitude data during the learning period and compare its forecasting performance with that of the Gutenberg-Richter (G-R) formula. We show that the NP forecast performs better than the G-R formula in some cases but worse in other cases. Therefore, robust forecasting can be obtained by employing an ensemble forecast that combines the two complementary forecasts. Our proposed method is useful for a stable unbiased intermediate-term assessment of aftershock probabilities.
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations
Zhang, Yi; Ren, Jinchang; Jiang, Jianmin
2015-01-01
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions. PMID:26089862
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.
Zhang, Yi; Ren, Jinchang; Jiang, Jianmin
2015-01-01
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.
Bayesian Local Contamination Models for Multivariate Outliers
Page, Garritt L.; Dunson, David B.
2013-01-01
In studies where data are generated from multiple locations or sources it is common for there to exist observations that are quite unlike the majority. Motivated by the application of establishing a reference value in an inter-laboratory setting when outlying labs are present, we propose a local contamination model that is able to accommodate unusual multivariate realizations in a flexible way. The proposed method models the process level of a hierarchical model using a mixture with a parametric component and a possibly nonparametric contamination. Much of the flexibility in the methodology is achieved by allowing varying random subsets of the elements in the lab-specific mean vectors to be allocated to the contamination component. Computational methods are developed and the methodology is compared to three other possible approaches using a simulation study. We apply the proposed method to a NIST/NOAA sponsored inter-laboratory study which motivated the methodological development. PMID:24363465
Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E
2014-06-01
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to develop the statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.
Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.
2017-01-01
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to develop the statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs. PMID:29200608
Nonparametric Bayesian inference for mean residual life functions in survival analysis.
Poynor, Valerie; Kottas, Athanasios
2018-01-19
Modeling and inference for survival analysis problems typically revolves around different functions related to the survival distribution. Here, we focus on the mean residual life (MRL) function, which provides the expected remaining lifetime given that a subject has survived (i.e. is event-free) up to a particular time. This function is of direct interest in reliability, medical, and actuarial fields. In addition to its practical interpretation, the MRL function characterizes the survival distribution. We develop general Bayesian nonparametric inference for MRL functions built from a Dirichlet process mixture model for the associated survival distribution. The resulting model for the MRL function admits a representation as a mixture of the kernel MRL functions with time-dependent mixture weights. This model structure allows for a wide range of shapes for the MRL function. Particular emphasis is placed on the selection of the mixture kernel, taken to be a gamma distribution, to obtain desirable properties for the MRL function arising from the mixture model. The inference method is illustrated with a data set of two experimental groups and a data set involving right censoring. The supplementary material available at Biostatistics online provides further results on empirical performance of the model, using simulated data examples. © The Author 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Bayesian Approaches for Model and Multi-mission Satellites Data Fusion
NASA Astrophysics Data System (ADS)
Khaki, M., , Dr; Forootan, E.; Awange, J.; Kuhn, M.
2017-12-01
Traditionally, data assimilation is formulated as a Bayesian approach that allows one to update model simulations using new incoming observations. This integration is necessary due to the uncertainty in model outputs, which mainly is the result of several drawbacks, e.g., limitations in accounting for the complexity of real-world processes, uncertainties of (unknown) empirical model parameters, and the absence of high resolution (both spatially and temporally) data. Data assimilation, however, requires knowledge of the physical process of a model, which may be either poorly described or entirely unavailable. Therefore, an alternative method is required to avoid this dependency. In this study we present a novel approach which can be used in hydrological applications. A non-parametric framework based on Kalman filtering technique is proposed to improve hydrological model estimates without using a model dynamics. Particularly, we assesse Kalman-Taken formulations that take advantage of the delay coordinate method to reconstruct nonlinear dynamics in the absence of the physical process. This empirical relationship is then used instead of model equations to integrate satellite products with model outputs. We use water storage variables from World-Wide Water Resources Assessment (W3RA) simulations and update them using data known as the Gravity Recovery And Climate Experiment (GRACE) terrestrial water storage (TWS) and also surface soil moisture data from the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) over Australia for the period of 2003 to 2011. The performance of the proposed integration method is compared with data obtained from the more traditional assimilation scheme using the Ensemble Square-Root Filter (EnSRF) filtering technique (Khaki et al., 2017), as well as by evaluating them against ground-based soil moisture and groundwater observations within the Murray-Darling Basin.
Bartlett, Jonathan W; Keogh, Ruth H
2018-06-01
Bayesian approaches for handling covariate measurement error are well established and yet arguably are still relatively little used by researchers. For some this is likely due to unfamiliarity or disagreement with the Bayesian inferential paradigm. For others a contributory factor is the inability of standard statistical packages to perform such Bayesian analyses. In this paper, we first give an overview of the Bayesian approach to handling covariate measurement error, and contrast it with regression calibration, arguably the most commonly adopted approach. We then argue why the Bayesian approach has a number of statistical advantages compared to regression calibration and demonstrate that implementing the Bayesian approach is usually quite feasible for the analyst. Next, we describe the closely related maximum likelihood and multiple imputation approaches and explain why we believe the Bayesian approach to generally be preferable. We then empirically compare the frequentist properties of regression calibration and the Bayesian approach through simulation studies. The flexibility of the Bayesian approach to handle both measurement error and missing data is then illustrated through an analysis of data from the Third National Health and Nutrition Examination Survey.
Implementation of Instrumental Variable Bounds for Data Missing Not at Random.
Marden, Jessica R; Wang, Linbo; Tchetgen, Eric J Tchetgen; Walter, Stefan; Glymour, M Maria; Wirth, Kathleen E
2018-05-01
Instrumental variables are routinely used to recover a consistent estimator of an exposure causal effect in the presence of unmeasured confounding. Instrumental variable approaches to account for nonignorable missing data also exist but are less familiar to epidemiologists. Like instrumental variables for exposure causal effects, instrumental variables for missing data rely on exclusion restriction and instrumental variable relevance assumptions. Yet these two conditions alone are insufficient for point identification. For estimation, researchers have invoked a third assumption, typically involving fairly restrictive parametric constraints. Inferences can be sensitive to these parametric assumptions, which are typically not empirically testable. The purpose of our article is to discuss another approach for leveraging a valid instrumental variable. Although the approach is insufficient for nonparametric identification, it can nonetheless provide informative inferences about the presence, direction, and magnitude of selection bias, without invoking a third untestable parametric assumption. An important contribution of this article is an Excel spreadsheet tool that can be used to obtain empirical evidence of selection bias and calculate bounds and corresponding Bayesian 95% credible intervals for a nonidentifiable population proportion. For illustrative purposes, we used the spreadsheet tool to analyze HIV prevalence data collected by the 2007 Zambia Demographic and Health Survey (DHS).
Halliday, David M; Senik, Mohd Harizal; Stevenson, Carl W; Mason, Rob
2016-08-01
The ability to infer network structure from multivariate neuronal signals is central to computational neuroscience. Directed network analyses typically use parametric approaches based on auto-regressive (AR) models, where networks are constructed from estimates of AR model parameters. However, the validity of using low order AR models for neurophysiological signals has been questioned. A recent article introduced a non-parametric approach to estimate directionality in bivariate data, non-parametric approaches are free from concerns over model validity. We extend the non-parametric framework to include measures of directed conditional independence, using scalar measures that decompose the overall partial correlation coefficient summatively by direction, and a set of functions that decompose the partial coherence summatively by direction. A time domain partial correlation function allows both time and frequency views of the data to be constructed. The conditional independence estimates are conditioned on a single predictor. The framework is applied to simulated cortical neuron networks and mixtures of Gaussian time series data with known interactions. It is applied to experimental data consisting of local field potential recordings from bilateral hippocampus in anaesthetised rats. The framework offers a non-parametric approach to estimation of directed interactions in multivariate neuronal recordings, and increased flexibility in dealing with both spike train and time series data. The framework offers a novel alternative non-parametric approach to estimate directed interactions in multivariate neuronal recordings, and is applicable to spike train and time series data. Copyright © 2016 Elsevier B.V. All rights reserved.
Extreme-Scale Bayesian Inference for Uncertainty Quantification of Complex Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Biros, George
Uncertainty quantification (UQ)—that is, quantifying uncertainties in complex mathematical models and their large-scale computational implementations—is widely viewed as one of the outstanding challenges facing the field of CS&E over the coming decade. The EUREKA project set to address the most difficult class of UQ problems: those for which both the underlying PDE model as well as the uncertain parameters are of extreme scale. In the project we worked on these extreme-scale challenges in the following four areas: 1. Scalable parallel algorithms for sampling and characterizing the posterior distribution that exploit the structure of the underlying PDEs and parameter-to-observable map. Thesemore » include structure-exploiting versions of the randomized maximum likelihood method, which aims to overcome the intractability of employing conventional MCMC methods for solving extreme-scale Bayesian inversion problems by appealing to and adapting ideas from large-scale PDE-constrained optimization, which have been very successful at exploring high-dimensional spaces. 2. Scalable parallel algorithms for construction of prior and likelihood functions based on learning methods and non-parametric density estimation. Constructing problem-specific priors remains a critical challenge in Bayesian inference, and more so in high dimensions. Another challenge is construction of likelihood functions that capture unmodeled couplings between observations and parameters. We will create parallel algorithms for non-parametric density estimation using high dimensional N-body methods and combine them with supervised learning techniques for the construction of priors and likelihood functions. 3. Bayesian inadequacy models, which augment physics models with stochastic models that represent their imperfections. The success of the Bayesian inference framework depends on the ability to represent the uncertainty due to imperfections of the mathematical model of the phenomena of interest. This is a central challenge in UQ, especially for large-scale models. We propose to develop the mathematical tools to address these challenges in the context of extreme-scale problems. 4. Parallel scalable algorithms for Bayesian optimal experimental design (OED). Bayesian inversion yields quantified uncertainties in the model parameters, which can be propagated forward through the model to yield uncertainty in outputs of interest. This opens the way for designing new experiments to reduce the uncertainties in the model parameters and model predictions. Such experimental design problems have been intractable for large-scale problems using conventional methods; we will create OED algorithms that exploit the structure of the PDE model and the parameter-to-output map to overcome these challenges. Parallel algorithms for these four problems were created, analyzed, prototyped, implemented, tuned, and scaled up for leading-edge supercomputers, including UT-Austin’s own 10 petaflops Stampede system, ANL’s Mira system, and ORNL’s Titan system. While our focus is on fundamental mathematical/computational methods and algorithms, we will assess our methods on model problems derived from several DOE mission applications, including multiscale mechanics and ice sheet dynamics.« less
Single molecule force spectroscopy at high data acquisition: A Bayesian nonparametric analysis
NASA Astrophysics Data System (ADS)
Sgouralis, Ioannis; Whitmore, Miles; Lapidus, Lisa; Comstock, Matthew J.; Pressé, Steve
2018-03-01
Bayesian nonparametrics (BNPs) are poised to have a deep impact in the analysis of single molecule data as they provide posterior probabilities over entire models consistent with the supplied data, not just model parameters of one preferred model. Thus they provide an elegant and rigorous solution to the difficult problem encountered when selecting an appropriate candidate model. Nevertheless, BNPs' flexibility to learn models and their associated parameters from experimental data is a double-edged sword. Most importantly, BNPs are prone to increasing the complexity of the estimated models due to artifactual features present in time traces. Thus, because of experimental challenges unique to single molecule methods, naive application of available BNP tools is not possible. Here we consider traces with time correlations and, as a specific example, we deal with force spectroscopy traces collected at high acquisition rates. While high acquisition rates are required in order to capture dwells in short-lived molecular states, in this setup, a slow response of the optical trap instrumentation (i.e., trapped beads, ambient fluid, and tethering handles) distorts the molecular signals introducing time correlations into the data that may be misinterpreted as true states by naive BNPs. Our adaptation of BNP tools explicitly takes into consideration these response dynamics, in addition to drift and noise, and makes unsupervised time series analysis of correlated single molecule force spectroscopy measurements possible, even at acquisition rates similar to or below the trap's response times.
Examining the evidence for dynamical dark energy.
Zhao, Gong-Bo; Crittenden, Robert G; Pogosian, Levon; Zhang, Xinmin
2012-10-26
We apply a new nonparametric Bayesian method for reconstructing the evolution history of the equation of state w of dark energy, based on applying a correlated prior for w(z), to a collection of cosmological data. We combine the latest supernova (SNLS 3 year or Union 2.1), cosmic microwave background, redshift space distortion, and the baryonic acoustic oscillation measurements (including BOSS, WiggleZ, and 6dF) and find that the cosmological constant appears consistent with current data, but that a dynamical dark energy model which evolves from w<-1 at z~0.25 to w>-1 at higher redshift is mildly favored. Estimates of the Bayesian evidence show little preference between the cosmological constant model and the dynamical model for a range of correlated prior choices. Looking towards future data, we find that the best fit models for current data could be well distinguished from the ΛCDM model by observations such as Planck and Euclid-like surveys.
A Bayesian Measurment Error Model for Misaligned Radiographic Data
Lennox, Kristin P.; Glascoe, Lee G.
2013-09-06
An understanding of the inherent variability in micro-computed tomography (micro-CT) data is essential to tasks such as statistical process control and the validation of radiographic simulation tools. The data present unique challenges to variability analysis due to the relatively low resolution of radiographs, and also due to minor variations from run to run which can result in misalignment or magnification changes between repeated measurements of a sample. Positioning changes artificially inflate the variability of the data in ways that mask true physical phenomena. We present a novel Bayesian nonparametric regression model that incorporates both additive and multiplicative measurement error inmore » addition to heteroscedasticity to address this problem. We also use this model to assess the effects of sample thickness and sample position on measurement variability for an aluminum specimen. Supplementary materials for this article are available online.« less
Bayesian Nonparametric Statistical Inference for Shock Models and Wear Processes.
1979-12-01
Naval Research under Contract N00014-75-C-0781 and the National Science Foundation under Grant MCS78-01422 with the University of California...SUPPLEMENTARY NOTES Also supported by the National Science Foundation under Grant MCS78-01422. It. 96Y WORDS MOCa’t"u a’ iVWae" side if n*0eaem7 imW~ 149001 b Wek...Barlow and Proschan (1975), among others. The analogy of the shock model in risk and acturial analysis has been given by BUhlmann (1970, Chapter 2
Benchmark dose analysis via nonparametric regression modeling
Piegorsch, Walter W.; Xiong, Hui; Bhattacharya, Rabi N.; Lin, Lizhen
2013-01-01
Estimation of benchmark doses (BMDs) in quantitative risk assessment traditionally is based upon parametric dose-response modeling. It is a well-known concern, however, that if the chosen parametric model is uncertain and/or misspecified, inaccurate and possibly unsafe low-dose inferences can result. We describe a nonparametric approach for estimating BMDs with quantal-response data based on an isotonic regression method, and also study use of corresponding, nonparametric, bootstrap-based confidence limits for the BMD. We explore the confidence limits’ small-sample properties via a simulation study, and illustrate the calculations with an example from cancer risk assessment. It is seen that this nonparametric approach can provide a useful alternative for BMD estimation when faced with the problem of parametric model uncertainty. PMID:23683057
NASA Astrophysics Data System (ADS)
Vittal, H.; Singh, Jitendra; Kumar, Pankaj; Karmakar, Subhankar
2015-06-01
In watershed management, flood frequency analysis (FFA) is performed to quantify the risk of flooding at different spatial locations and also to provide guidelines for determining the design periods of flood control structures. The traditional FFA was extensively performed by considering univariate scenario for both at-site and regional estimation of return periods. However, due to inherent mutual dependence of the flood variables or characteristics [i.e., peak flow (P), flood volume (V) and flood duration (D), which are random in nature], analysis has been further extended to multivariate scenario, with some restrictive assumptions. To overcome the assumption of same family of marginal density function for all flood variables, the concept of copula has been introduced. Although, the advancement from univariate to multivariate analyses drew formidable attention to the FFA research community, the basic limitation was that the analyses were performed with the implementation of only parametric family of distributions. The aim of the current study is to emphasize the importance of nonparametric approaches in the field of multivariate FFA; however, the nonparametric distribution may not always be a good-fit and capable of replacing well-implemented multivariate parametric and multivariate copula-based applications. Nevertheless, the potential of obtaining best-fit using nonparametric distributions might be improved because such distributions reproduce the sample's characteristics, resulting in more accurate estimations of the multivariate return period. Hence, the current study shows the importance of conjugating multivariate nonparametric approach with multivariate parametric and copula-based approaches, thereby results in a comprehensive framework for complete at-site FFA. Although the proposed framework is designed for at-site FFA, this approach can also be applied to regional FFA because regional estimations ideally include at-site estimations. The framework is based on the following steps: (i) comprehensive trend analysis to assess nonstationarity in the observed data; (ii) selection of the best-fit univariate marginal distribution with a comprehensive set of parametric and nonparametric distributions for the flood variables; (iii) multivariate frequency analyses with parametric, copula-based and nonparametric approaches; and (iv) estimation of joint and various conditional return periods. The proposed framework for frequency analysis is demonstrated using 110 years of observed data from Allegheny River at Salamanca, New York, USA. The results show that for both univariate and multivariate cases, the nonparametric Gaussian kernel provides the best estimate. Further, we perform FFA for twenty major rivers over continental USA, which shows for seven rivers, all the flood variables followed nonparametric Gaussian kernel; whereas for other rivers, parametric distributions provide the best-fit either for one or two flood variables. Thus the summary of results shows that the nonparametric method cannot substitute the parametric and copula-based approaches, but should be considered during any at-site FFA to provide the broadest choices for best estimation of the flood return periods.
Prior Design for Dependent Dirichlet Processes: An Application to Marathon Modeling
F. Pradier, Melanie; J. R. Ruiz, Francisco; Perez-Cruz, Fernando
2016-01-01
This paper presents a novel application of Bayesian nonparametrics (BNP) for marathon data modeling. We make use of two well-known BNP priors, the single-p dependent Dirichlet process and the hierarchical Dirichlet process, in order to address two different problems. First, we study the impact of age, gender and environment on the runners’ performance. We derive a fair grading method that allows direct comparison of runners regardless of their age and gender. Unlike current grading systems, our approach is based not only on top world records, but on the performances of all runners. The presented methodology for comparison of densities can be adopted in many other applications straightforwardly, providing an interesting perspective to build dependent Dirichlet processes. Second, we analyze the running patterns of the marathoners in time, obtaining information that can be valuable for training purposes. We also show that these running patterns can be used to predict finishing time given intermediate interval measurements. We apply our models to New York City, Boston and London marathons. PMID:26821155
Kharroubi, Samer A; O'Hagan, Anthony; Brazier, John E
2010-07-10
Cost-effectiveness analysis of alternative medical treatments relies on having a measure of effectiveness, and many regard the quality adjusted life year (QALY) to be the current 'gold standard.' In order to compute QALYs, we require a suitable system for describing a person's health state, and a utility measure to value the quality of life associated with each possible state. There are a number of different health state descriptive systems, and we focus here on one known as the EQ-5D. Data for estimating utilities for different health states have a number of features that mean care is necessary in statistical modelling.There is interest in the extent to which valuations of health may differ between different countries and cultures, but few studies have compared preference values of health states obtained from different countries. This article applies a nonparametric model to estimate and compare EQ-5D health state valuation data obtained from two countries using Bayesian methods. The data set is the US and UK EQ-5D valuation studies where a sample of 42 states defined by the EQ-5D was valued by representative samples of the general population from each country using the time trade-off technique. We estimate a utility function across both countries which explicitly accounts for the differences between them, and is estimated using the data from both countries. The article discusses the implications of these results for future applications of the EQ-5D and for further work in this field. Copyright 2010 John Wiley & Sons, Ltd.
A Bayesian Hierarchical Modeling Approach to Predicting Flow in Ungauged Basins
NASA Astrophysics Data System (ADS)
Gronewold, A.; Alameddine, I.; Anderson, R. M.
2009-12-01
Recent innovative approaches to identifying and applying regression-based relationships between land use patterns (such as increasing impervious surface area and decreasing vegetative cover) and rainfall-runoff model parameters represent novel and promising improvements to predicting flow from ungauged basins. In particular, these approaches allow for predicting flows under uncertain and potentially variable future conditions due to rapid land cover changes, variable climate conditions, and other factors. Despite the broad range of literature on estimating rainfall-runoff model parameters, however, the absence of a robust set of modeling tools for identifying and quantifying uncertainties in (and correlation between) rainfall-runoff model parameters represents a significant gap in current hydrological modeling research. Here, we build upon a series of recent publications promoting novel Bayesian and probabilistic modeling strategies for quantifying rainfall-runoff model parameter estimation uncertainty. Our approach applies alternative measures of rainfall-runoff model parameter joint likelihood (including Nash-Sutcliffe efficiency, among others) to simulate samples from the joint parameter posterior probability density function. We then use these correlated samples as response variables in a Bayesian hierarchical model with land use coverage data as predictor variables in order to develop a robust land use-based tool for forecasting flow in ungauged basins while accounting for, and explicitly acknowledging, parameter estimation uncertainty. We apply this modeling strategy to low-relief coastal watersheds of Eastern North Carolina, an area representative of coastal resource waters throughout the world because of its sensitive embayments and because of the abundant (but currently threatened) natural resources it hosts. Consequently, this area is the subject of several ongoing studies and large-scale planning initiatives, including those conducted through the United States Environmental Protection Agency (USEPA) total maximum daily load (TMDL) program, as well as those addressing coastal population dynamics and sea level rise. Our approach has several advantages, including the propagation of parameter uncertainty through a nonparametric probability distribution which avoids common pitfalls of fitting parameters and model error structure to a predetermined parametric distribution function. In addition, by explicitly acknowledging correlation between model parameters (and reflecting those correlations in our predictive model) our model yields relatively efficient prediction intervals (unlike those in the current literature which are often unnecessarily large, and may lead to overly-conservative management actions). Finally, our model helps improve understanding of the rainfall-runoff process by identifying model parameters (and associated catchment attributes) which are most sensitive to current and future land use change patterns. Disclaimer: Although this work was reviewed by EPA and approved for publication, it may not necessarily reflect official Agency policy.
Ensemble transcript interaction networks: a case study on Alzheimer's disease.
Armañanzas, Rubén; Larrañaga, Pedro; Bielza, Concha
2012-10-01
Systems biology techniques are a topic of recent interest within the neurological field. Computational intelligence (CI) addresses this holistic perspective by means of consensus or ensemble techniques ultimately capable of uncovering new and relevant findings. In this paper, we propose the application of a CI approach based on ensemble Bayesian network classifiers and multivariate feature subset selection to induce probabilistic dependences that could match or unveil biological relationships. The research focuses on the analysis of high-throughput Alzheimer's disease (AD) transcript profiling. The analysis is conducted from two perspectives. First, we compare the expression profiles of hippocampus subregion entorhinal cortex (EC) samples of AD patients and controls. Second, we use the ensemble approach to study four types of samples: EC and dentate gyrus (DG) samples from both patients and controls. Results disclose transcript interaction networks with remarkable structures and genes not directly related to AD by previous studies. The ensemble is able to identify a variety of transcripts that play key roles in other neurological pathologies. Classical statistical assessment by means of non-parametric tests confirms the relevance of the majority of the transcripts. The ensemble approach pinpoints key metabolic mechanisms that could lead to new findings in the pathogenesis and development of AD. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Mercury⊕: An evidential reasoning image classifier
NASA Astrophysics Data System (ADS)
Peddle, Derek R.
1995-12-01
MERCURY⊕ is a multisource evidential reasoning classification software system based on the Dempster-Shafer theory of evidence. The design and implementation of this software package is described for improving the classification and analysis of multisource digital image data necessary for addressing advanced environmental and geoscience applications. In the remote-sensing context, the approach provides a more appropriate framework for classifying modern, multisource, and ancillary data sets which may contain a large number of disparate variables with different statistical properties, scales of measurement, and levels of error which cannot be handled using conventional Bayesian approaches. The software uses a nonparametric, supervised approach to classification, and provides a more objective and flexible interface to the evidential reasoning framework using a frequency-based method for computing support values from training data. The MERCURY⊕ software package has been implemented efficiently in the C programming language, with extensive use made of dynamic memory allocation procedures and compound linked list and hash-table data structures to optimize the storage and retrieval of evidence in a Knowledge Look-up Table. The software is complete with a full user interface and runs under Unix, Ultrix, VAX/VMS, MS-DOS, and Apple Macintosh operating system. An example of classifying alpine land cover and permafrost active layer depth in northern Canada is presented to illustrate the use and application of these ideas.
ERIC Educational Resources Information Center
Kantabutra, Sangchan
2009-01-01
This paper examines urban-rural effects on public upper-secondary school efficiency in northern Thailand. In the study, efficiency was measured by a nonparametric technique, data envelopment analysis (DEA). Urban-rural effects were examined through a Mann-Whitney nonparametric statistical test. Results indicate that urban schools appear to have…
A Two-Step Bayesian Approach for Propensity Score Analysis: Simulations and Case Study
ERIC Educational Resources Information Center
Kaplan, David; Chen, Jianshen
2012-01-01
A two-step Bayesian propensity score approach is introduced that incorporates prior information in the propensity score equation and outcome equation without the problems associated with simultaneous Bayesian propensity score approaches. The corresponding variance estimators are also provided. The two-step Bayesian propensity score is provided for…
Constraining geostatistical models with hydrological data to improve prediction realism
NASA Astrophysics Data System (ADS)
Demyanov, V.; Rojas, T.; Christie, M.; Arnold, D.
2012-04-01
Geostatistical models reproduce spatial correlation based on the available on site data and more general concepts about the modelled patters, e.g. training images. One of the problem of modelling natural systems with geostatistics is in maintaining realism spatial features and so they agree with the physical processes in nature. Tuning the model parameters to the data may lead to geostatistical realisations with unrealistic spatial patterns, which would still honour the data. Such model would result in poor predictions, even though although fit the available data well. Conditioning the model to a wider range of relevant data provide a remedy that avoid producing unrealistic features in spatial models. For instance, there are vast amounts of information about the geometries of river channels that can be used in describing fluvial environment. Relations between the geometrical channel characteristics (width, depth, wave length, amplitude, etc.) are complex and non-parametric and are exhibit a great deal of uncertainty, which is important to propagate rigorously into the predictive model. These relations can be described within a Bayesian approach as multi-dimensional prior probability distributions. We propose a way to constrain multi-point statistics models with intelligent priors obtained from analysing a vast collection of contemporary river patterns based on previously published works. We applied machine learning techniques, namely neural networks and support vector machines, to extract multivariate non-parametric relations between geometrical characteristics of fluvial channels from the available data. An example demonstrates how ensuring geological realism helps to deliver more reliable prediction of a subsurface oil reservoir in a fluvial depositional environment.
Investigation of a Nonparametric Procedure for Assessing Goodness-of-Fit in Item Response Theory
ERIC Educational Resources Information Center
Wells, Craig S.; Bolt, Daniel M.
2008-01-01
Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…
Motavalli, Mostafa; Whitney, G Adam; Dennis, James E; Mansour, Joseph M
2013-12-01
A previously developed novel imaging technique for determining the depth dependent properties of cartilage in simple shear is implemented. Shear displacement is determined from images of deformed lines photobleached on a sample, and shear strain is obtained from the derivative of the displacement. We investigated the feasibility of an alternative systematic approach to numerical differentiation for computing the shear strain that is based on fitting a continuous function to the shear displacement. Three models for a continuous shear displacement function are evaluated: polynomials, cubic splines, and non-parametric locally weighted scatter plot curves. Four independent approaches are then applied to identify the best-fit model and the accuracy of the first derivative. One approach is based on the Akaiki Information Criteria, and the Bayesian Information Criteria. The second is based on a method developed to smooth and differentiate digitized data from human motion. The third method is based on photobleaching a predefined circular area with a specific radius. Finally, we integrate the shear strain and compare it with the total shear deflection of the sample measured experimentally. Results show that 6th and 7th order polynomials are the best models for the shear displacement and its first derivative. In addition, failure of tissue-engineered cartilage, consistent with previous results, demonstrates the qualitative value of this imaging approach. © 2013 Elsevier Ltd. All rights reserved.
Out-of-Sample Extensions for Non-Parametric Kernel Methods.
Pan, Binbin; Chen, Wen-Sheng; Chen, Bo; Xu, Chen; Lai, Jianhuang
2017-02-01
Choosing suitable kernels plays an important role in the performance of kernel methods. Recently, a number of studies were devoted to developing nonparametric kernels. Without assuming any parametric form of the target kernel, nonparametric kernel learning offers a flexible scheme to utilize the information of the data, which may potentially characterize the data similarity better. The kernel methods using nonparametric kernels are referred to as nonparametric kernel methods. However, many nonparametric kernel methods are restricted to transductive learning, where the prediction function is defined only over the data points given beforehand. They have no straightforward extension for the out-of-sample data points, and thus cannot be applied to inductive learning. In this paper, we show how to make the nonparametric kernel methods applicable to inductive learning. The key problem of out-of-sample extension is how to extend the nonparametric kernel matrix to the corresponding kernel function. A regression approach in the hyper reproducing kernel Hilbert space is proposed to solve this problem. Empirical results indicate that the out-of-sample performance is comparable to the in-sample performance in most cases. Experiments on face recognition demonstrate the superiority of our nonparametric kernel method over the state-of-the-art parametric kernel methods.
Robust, Adaptive Functional Regression in Functional Mixed Model Framework.
Zhu, Hongxiao; Brown, Philip J; Morris, Jeffrey S
2011-09-01
Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets.
Robust, Adaptive Functional Regression in Functional Mixed Model Framework
Zhu, Hongxiao; Brown, Philip J.; Morris, Jeffrey S.
2012-01-01
Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets. PMID:22308015
Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data
Swihart, Bruce J.; Caffo, Brian S.; Crainiceanu, Ciprian; Punjabi, Naresh M.
2013-01-01
Bayesian Poisson log-linear multilevel models scalable to epidemiological studies are proposed to investigate population variability in sleep state transition rates. Hierarchical random effects are used to account for pairings of subjects and repeated measures within those subjects, as comparing diseased to non-diseased subjects while minimizing bias is of importance. Essentially, non-parametric piecewise constant hazards are estimated and smoothed, allowing for time-varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming exponentially distributed survival times. Such re-derivation allows synthesis of two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed. Supplementary material includes the analyzed data set as well as the code for a reproducible analysis. PMID:22241689
Covariate Balance in Bayesian Propensity Score Approaches for Observational Studies
ERIC Educational Resources Information Center
Chen, Jianshen; Kaplan, David
2015-01-01
Bayesian alternatives to frequentist propensity score approaches have recently been proposed. However, few studies have investigated their covariate balancing properties. This article compares a recently developed two-step Bayesian propensity score approach to the frequentist approach with respect to covariate balance. The effects of different…
ERIC Educational Resources Information Center
Sueiro, Manuel J.; Abad, Francisco J.
2011-01-01
The distance between nonparametric and parametric item characteristic curves has been proposed as an index of goodness of fit in item response theory in the form of a root integrated squared error index. This article proposes to use the posterior distribution of the latent trait as the nonparametric model and compares the performance of an index…
An Assessment of the Nonparametric Approach for Evaluating the Fit of Item Response Models
ERIC Educational Resources Information Center
Liang, Tie; Wells, Craig S.; Hambleton, Ronald K.
2014-01-01
As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…
Moving beyond qualitative evaluations of Bayesian models of cognition.
Hemmer, Pernille; Tauber, Sean; Steyvers, Mark
2015-06-01
Bayesian models of cognition provide a powerful way to understand the behavior and goals of individuals from a computational point of view. Much of the focus in the Bayesian cognitive modeling approach has been on qualitative model evaluations, where predictions from the models are compared to data that is often averaged over individuals. In many cognitive tasks, however, there are pervasive individual differences. We introduce an approach to directly infer individual differences related to subjective mental representations within the framework of Bayesian models of cognition. In this approach, Bayesian data analysis methods are used to estimate cognitive parameters and motivate the inference process within a Bayesian cognitive model. We illustrate this integrative Bayesian approach on a model of memory. We apply the model to behavioral data from a memory experiment involving the recall of heights of people. A cross-validation analysis shows that the Bayesian memory model with inferred subjective priors predicts withheld data better than a Bayesian model where the priors are based on environmental statistics. In addition, the model with inferred priors at the individual subject level led to the best overall generalization performance, suggesting that individual differences are important to consider in Bayesian models of cognition.
Bayard, David S.; Neely, Michael
2016-01-01
An experimental design approach is presented for individualized therapy in the special case where the prior information is specified by a nonparametric (NP) population model. Here, a nonparametric model refers to a discrete probability model characterized by a finite set of support points and their associated weights. An important question arises as to how to best design experiments for this type of model. Many experimental design methods are based on Fisher Information or other approaches originally developed for parametric models. While such approaches have been used with some success across various applications, it is interesting to note that they largely fail to address the fundamentally discrete nature of the nonparametric model. Specifically, the problem of identifying an individual from a nonparametric prior is more naturally treated as a problem of classification, i.e., to find a support point that best matches the patient’s behavior. This paper studies the discrete nature of the NP experiment design problem from a classification point of view. Several new insights are provided including the use of Bayes Risk as an information measure, and new alternative methods for experiment design. One particular method, denoted as MMopt (Multiple-Model Optimal), will be examined in detail and shown to require minimal computation while having distinct advantages compared to existing approaches. Several simulated examples, including a case study involving oral voriconazole in children, are given to demonstrate the usefulness of MMopt in pharmacokinetics applications. PMID:27909942
Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji
2017-01-01
We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392
Conditional adaptive Bayesian spectral analysis of nonstationary biomedical time series.
Bruce, Scott A; Hall, Martica H; Buysse, Daniel J; Krafty, Robert T
2018-03-01
Many studies of biomedical time series signals aim to measure the association between frequency-domain properties of time series and clinical and behavioral covariates. However, the time-varying dynamics of these associations are largely ignored due to a lack of methods that can assess the changing nature of the relationship through time. This article introduces a method for the simultaneous and automatic analysis of the association between the time-varying power spectrum and covariates, which we refer to as conditional adaptive Bayesian spectrum analysis (CABS). The procedure adaptively partitions the grid of time and covariate values into an unknown number of approximately stationary blocks and nonparametrically estimates local spectra within blocks through penalized splines. CABS is formulated in a fully Bayesian framework, in which the number and locations of partition points are random, and fit using reversible jump Markov chain Monte Carlo techniques. Estimation and inference averaged over the distribution of partitions allows for the accurate analysis of spectra with both smooth and abrupt changes. The proposed methodology is used to analyze the association between the time-varying spectrum of heart rate variability and self-reported sleep quality in a study of older adults serving as the primary caregiver for their ill spouse. © 2017, The International Biometric Society.
A Two-Step Bayesian Approach for Propensity Score Analysis: Simulations and Case Study.
Kaplan, David; Chen, Jianshen
2012-07-01
A two-step Bayesian propensity score approach is introduced that incorporates prior information in the propensity score equation and outcome equation without the problems associated with simultaneous Bayesian propensity score approaches. The corresponding variance estimators are also provided. The two-step Bayesian propensity score is provided for three methods of implementation: propensity score stratification, weighting, and optimal full matching. Three simulation studies and one case study are presented to elaborate the proposed two-step Bayesian propensity score approach. Results of the simulation studies reveal that greater precision in the propensity score equation yields better recovery of the frequentist-based treatment effect. A slight advantage is shown for the Bayesian approach in small samples. Results also reveal that greater precision around the wrong treatment effect can lead to seriously distorted results. However, greater precision around the correct treatment effect parameter yields quite good results, with slight improvement seen with greater precision in the propensity score equation. A comparison of coverage rates for the conventional frequentist approach and proposed Bayesian approach is also provided. The case study reveals that credible intervals are wider than frequentist confidence intervals when priors are non-informative.
Rajwa, Bartek; Wallace, Paul K.; Griffiths, Elizabeth A.; Dundar, Murat
2017-01-01
Objective Flow cytometry (FC) is a widely acknowledged technology in diagnosis of acute myeloid leukemia (AML) and has been indispensable in determining progression of the disease. Although FC plays a key role as a post-therapy prognosticator and evaluator of therapeutic efficacy, the manual analysis of cytometry data is a barrier to optimization of reproducibility and objectivity. This study investigates the utility of our recently introduced non-parametric Bayesian framework in accurately predicting the direction of change in disease progression in AML patients using FC data. Methods The highly flexible non-parametric Bayesian model based on the infinite mixture of infinite Gaussian mixtures is used for jointly modeling data from multiple FC samples to automatically identify functionally distinct cell populations and their local realizations. Phenotype vectors are obtained by characterizing each sample by the proportions of recovered cell populations, which are in turn used to predict the direction of change in disease progression for each patient. Results We used 200 diseased and non-diseased immunophenotypic panels for training and tested the system with 36 additional AML cases collected at multiple time points. The proposed framework identified the change in direction of disease progression with accuracies of 90% (9 out of 10) for relapsing cases and 100% (26 out of 26) for the remaining cases. Conclusions We believe that these promising results are an important first step towards the development of automated predictive systems for disease monitoring and continuous response evaluation. Significance Automated measurement and monitoring of therapeutic response is critical not only for objective evaluation of disease status prognosis but also for timely assessment of treatment strategies. PMID:27416585
Advances in Bayesian Modeling in Educational Research
ERIC Educational Resources Information Center
Levy, Roy
2016-01-01
In this article, I provide a conceptually oriented overview of Bayesian approaches to statistical inference and contrast them with frequentist approaches that currently dominate conventional practice in educational research. The features and advantages of Bayesian approaches are illustrated with examples spanning several statistical modeling…
A Nonparametric Approach for Assessing Goodness-of-Fit of IRT Models in a Mixed Format Test
ERIC Educational Resources Information Center
Liang, Tie; Wells, Craig S.
2015-01-01
Investigating the fit of a parametric model plays a vital role in validating an item response theory (IRT) model. An area that has received little attention is the assessment of multiple IRT models used in a mixed-format test. The present study extends the nonparametric approach, proposed by Douglas and Cohen (2001), to assess model fit of three…
NASA Astrophysics Data System (ADS)
Hastuti, S.; Harijono; Murtini, E. S.; Fibrianto, K.
2018-03-01
This current study is aimed to investigate the use of parametric and non-parametric approach for sensory RATA (Rate-All-That-Apply) method. Ledre as Bojonegoro unique local food product was used as point of interest, in which 319 panelists were involved in the study. The result showed that ledre is characterized as easy-crushed texture, sticky in mouth, stingy sensation and easy to swallow. It has also strong banana flavour with brown in colour. Compared to eggroll and semprong, ledre has more variances in terms of taste as well the roll length. As RATA questionnaire is designed to collect categorical data, non-parametric approach is the common statistical procedure. However, similar results were also obtained as parametric approach, regardless the fact of non-normal distributed data. Thus, it suggests that parametric approach can be applicable for consumer study with large number of respondents, even though it may not satisfy the assumption of ANOVA (Analysis of Variances).
NASA Astrophysics Data System (ADS)
Sumantari, Y. D.; Slamet, I.; Sugiyanto
2017-06-01
Semiparametric regression is a statistical analysis method that consists of parametric and nonparametric regression. There are various approach techniques in nonparametric regression. One of the approach techniques is spline. Central Java is one of the most densely populated province in Indonesia. Population density in this province can be modeled by semiparametric regression because it consists of parametric and nonparametric component. Therefore, the purpose of this paper is to determine the factors that in uence population density in Central Java using the semiparametric spline regression model. The result shows that the factors which in uence population density in Central Java is Family Planning (FP) active participants and district minimum wage.
Broadband Processing in a Noisy Shallow Ocean Environment: A Particle Filtering Approach
Candy, J. V.
2016-04-14
Here we report that when a broadband source propagates sound in a shallow ocean the received data can become quite complicated due to temperature-related sound-speed variations and therefore a highly dispersive environment. Noise and uncertainties disrupt this already chaotic environment even further because disturbances propagate through the same inherent acoustic channel. The broadband (signal) estimation/detection problem can be decomposed into a set of narrowband solutions that are processed separately and then combined to achieve more enhancement of signal levels than that available from a single frequency, thereby allowing more information to be extracted leading to a more reliable source detection.more » A Bayesian solution to the broadband modal function tracking, pressure-field enhancement, and source detection problem is developed that leads to nonparametric estimates of desired posterior distributions enabling the estimation of useful statistics and an improved processor/detector. In conclusion, to investigate the processor capabilities, we synthesize an ensemble of noisy, broadband, shallow-ocean measurements to evaluate its overall performance using an information theoretical metric for the preprocessor and the receiver operating characteristic curve for the detector.« less
Discriminative Hierarchical K-Means Tree for Large-Scale Image Classification.
Chen, Shizhi; Yang, Xiaodong; Tian, Yingli
2015-09-01
A key challenge in large-scale image classification is how to achieve efficiency in terms of both computation and memory without compromising classification accuracy. The learning-based classifiers achieve the state-of-the-art accuracies, but have been criticized for the computational complexity that grows linearly with the number of classes. The nonparametric nearest neighbor (NN)-based classifiers naturally handle large numbers of categories, but incur prohibitively expensive computation and memory costs. In this brief, we present a novel classification scheme, i.e., discriminative hierarchical K-means tree (D-HKTree), which combines the advantages of both learning-based and NN-based classifiers. The complexity of the D-HKTree only grows sublinearly with the number of categories, which is much better than the recent hierarchical support vector machines-based methods. The memory requirement is the order of magnitude less than the recent Naïve Bayesian NN-based approaches. The proposed D-HKTree classification scheme is evaluated on several challenging benchmark databases and achieves the state-of-the-art accuracies, while with significantly lower computation cost and memory requirement.
Application of Bayesian Approach in Cancer Clinical Trial
Bhattacharjee, Atanu
2014-01-01
The application of Bayesian approach in clinical trials becomes more useful over classical method. It is beneficial from design to analysis phase. The straight forward statement is possible to obtain through Bayesian about the drug treatment effect. Complex computational problems are simple to handle with Bayesian techniques. The technique is only feasible to performing presence of prior information of the data. The inference is possible to establish through posterior estimates. However, some limitations are present in this method. The objective of this work was to explore the several merits and demerits of Bayesian approach in cancer research. The review of the technique will be helpful for the clinical researcher involved in the oncology to explore the limitation and power of Bayesian techniques. PMID:29147387
NASA Astrophysics Data System (ADS)
Thelen, Brian J.; Xique, Ismael J.; Burns, Joseph W.; Goley, G. Steven; Nolan, Adam R.; Benson, Jonathan W.
2017-04-01
In Bayesian decision theory, there has been a great amount of research into theoretical frameworks and information- theoretic quantities that can be used to provide lower and upper bounds for the Bayes error. These include well-known bounds such as Chernoff, Battacharrya, and J-divergence. Part of the challenge of utilizing these various metrics in practice is (i) whether they are "loose" or "tight" bounds, (ii) how they might be estimated via either parametric or non-parametric methods, and (iii) how accurate the estimates are for limited amounts of data. In general what is desired is a methodology for generating relatively tight lower and upper bounds, and then an approach to estimate these bounds efficiently from data. In this paper, we explore the so-called triangle divergence which has been around for a while, but was recently made more prominent in some recent research on non-parametric estimation of information metrics. Part of this work is motivated by applications for quantifying fundamental information content in SAR/LIDAR data, and to help in this, we have developed a flexible multivariate modeling framework based on multivariate Gaussian copula models which can be combined with the triangle divergence framework to quantify this information, and provide approximate bounds on Bayes error. In this paper we present an overview of the bounds, including those based on triangle divergence and verify that under a number of multivariate models, the upper and lower bounds derived from triangle divergence are significantly tighter than the other common bounds, and often times, dramatically so. We also propose some simple but effective means for computing the triangle divergence using Monte Carlo methods, and then discuss estimation of the triangle divergence from empirical data based on Gaussian Copula models.
A neural network approach to cloud classification
NASA Technical Reports Server (NTRS)
Lee, Jonathan; Weger, Ronald C.; Sengupta, Sailes K.; Welch, Ronald M.
1990-01-01
It is shown that, using high-spatial-resolution data, very high cloud classification accuracies can be obtained with a neural network approach. A texture-based neural network classifier using only single-channel visible Landsat MSS imagery achieves an overall cloud identification accuracy of 93 percent. Cirrus can be distinguished from boundary layer cloudiness with an accuracy of 96 percent, without the use of an infrared channel. Stratocumulus is retrieved with an accuracy of 92 percent, cumulus at 90 percent. The use of the neural network does not improve cirrus classification accuracy. Rather, its main effect is in the improved separation between stratocumulus and cumulus cloudiness. While most cloud classification algorithms rely on linear parametric schemes, the present study is based on a nonlinear, nonparametric four-layer neural network approach. A three-layer neural network architecture, the nonparametric K-nearest neighbor approach, and the linear stepwise discriminant analysis procedure are compared. A significant finding is that significantly higher accuracies are attained with the nonparametric approaches using only 20 percent of the database as training data, compared to 67 percent of the database in the linear approach.
Coalescent methods for estimating phylogenetic trees.
Liu, Liang; Yu, Lili; Kubatko, Laura; Pearl, Dennis K; Edwards, Scott V
2009-10-01
We review recent models to estimate phylogenetic trees under the multispecies coalescent. Although the distinction between gene trees and species trees has come to the fore of phylogenetics, only recently have methods been developed that explicitly estimate species trees. Of the several factors that can cause gene tree heterogeneity and discordance with the species tree, deep coalescence due to random genetic drift in branches of the species tree has been modeled most thoroughly. Bayesian approaches to estimating species trees utilizes two likelihood functions, one of which has been widely used in traditional phylogenetics and involves the model of nucleotide substitution, and the second of which is less familiar to phylogeneticists and involves the probability distribution of gene trees given a species tree. Other recent parametric and nonparametric methods for estimating species trees involve parsimony criteria, summary statistics, supertree and consensus methods. Species tree approaches are an appropriate goal for systematics, appear to work well in some cases where concatenation can be misleading, and suggest that sampling many independent loci will be paramount. Such methods can also be challenging to implement because of the complexity of the models and computational time. In addition, further elaboration of the simplest of coalescent models will be required to incorporate commonly known issues such as deviation from the molecular clock, gene flow and other genetic forces.
MODEL-FREE MULTI-PROBE LENSING RECONSTRUCTION OF CLUSTER MASS PROFILES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Umetsu, Keiichi
2013-05-20
Lens magnification by galaxy clusters induces characteristic spatial variations in the number counts of background sources, amplifying their observed fluxes and expanding the area of sky, the net effect of which, known as magnification bias, depends on the intrinsic faint-end slope of the source luminosity function. The bias is strongly negative for red galaxies, dominated by the geometric area distortion, whereas it is mildly positive for blue galaxies, enhancing the blue counts toward the cluster center. We generalize the Bayesian approach of Umetsu et al. for reconstructing projected cluster mass profiles, by incorporating multiple populations of background sources for magnification-biasmore » measurements and combining them with complementary lens-distortion measurements, effectively breaking the mass-sheet degeneracy and improving the statistical precision of cluster mass measurements. The approach can be further extended to include strong-lensing projected mass estimates, thus allowing for non-parametric absolute mass determinations in both the weak and strong regimes. We apply this method to our recent CLASH lensing measurements of MACS J1206.2-0847, and demonstrate how combining multi-probe lensing constraints can improve the reconstruction of cluster mass profiles. This method will also be useful for a stacked lensing analysis, combining all lensing-related effects in the cluster regime, for a definitive determination of the averaged mass profile.« less
QUANTIFYING ALTERNATIVE SPLICING FROM PAIRED-END RNA-SEQUENCING DATA.
Rossell, David; Stephan-Otto Attolini, Camille; Kroiss, Manuel; Stöcker, Almond
2014-03-01
RNA-sequencing has revolutionized biomedical research and, in particular, our ability to study gene alternative splicing. The problem has important implications for human health, as alternative splicing may be involved in malfunctions at the cellular level and multiple diseases. However, the high-dimensional nature of the data and the existence of experimental biases pose serious data analysis challenges. We find that the standard data summaries used to study alternative splicing are severely limited, as they ignore a substantial amount of valuable information. Current data analysis methods are based on such summaries and are hence sub-optimal. Further, they have limited flexibility in accounting for technical biases. We propose novel data summaries and a Bayesian modeling framework that overcome these limitations and determine biases in a non-parametric, highly flexible manner. These summaries adapt naturally to the rapid improvements in sequencing technology. We provide efficient point estimates and uncertainty assessments. The approach allows to study alternative splicing patterns for individual samples and can also be the basis for downstream analyses. We found a several fold improvement in estimation mean square error compared popular approaches in simulations, and substantially higher consistency between replicates in experimental data. Our findings indicate the need for adjusting the routine summarization and analysis of alternative splicing RNA-seq studies. We provide a software implementation in the R package casper.
NASA Astrophysics Data System (ADS)
Li, L.; Xu, C.-Y.; Engeland, K.
2012-04-01
With respect to model calibration, parameter estimation and analysis of uncertainty sources, different approaches have been used in hydrological models. Bayesian method is one of the most widely used methods for uncertainty assessment of hydrological models, which incorporates different sources of information into a single analysis through Bayesian theorem. However, none of these applications can well treat the uncertainty in extreme flows of hydrological models' simulations. This study proposes a Bayesian modularization method approach in uncertainty assessment of conceptual hydrological models by considering the extreme flows. It includes a comprehensive comparison and evaluation of uncertainty assessments by a new Bayesian modularization method approach and traditional Bayesian models using the Metropolis Hasting (MH) algorithm with the daily hydrological model WASMOD. Three likelihood functions are used in combination with traditional Bayesian: the AR (1) plus Normal and time period independent model (Model 1), the AR (1) plus Normal and time period dependent model (Model 2) and the AR (1) plus multi-normal model (Model 3). The results reveal that (1) the simulations derived from Bayesian modularization method are more accurate with the highest Nash-Sutcliffe efficiency value, and (2) the Bayesian modularization method performs best in uncertainty estimates of entire flows and in terms of the application and computational efficiency. The study thus introduces a new approach for reducing the extreme flow's effect on the discharge uncertainty assessment of hydrological models via Bayesian. Keywords: extreme flow, uncertainty assessment, Bayesian modularization, hydrological model, WASMOD
Harrison, Jay M; Breeze, Matthew L; Harrigan, George G
2011-08-01
Statistical comparisons of compositional data generated on genetically modified (GM) crops and their near-isogenic conventional (non-GM) counterparts typically rely on classical significance testing. This manuscript presents an introduction to Bayesian methods for compositional analysis along with recommendations for model validation. The approach is illustrated using protein and fat data from two herbicide tolerant GM soybeans (MON87708 and MON87708×MON89788) and a conventional comparator grown in the US in 2008 and 2009. Guidelines recommended by the US Food and Drug Administration (FDA) in conducting Bayesian analyses of clinical studies on medical devices were followed. This study is the first Bayesian approach to GM and non-GM compositional comparisons. The evaluation presented here supports a conclusion that a Bayesian approach to analyzing compositional data can provide meaningful and interpretable results. We further describe the importance of method validation and approaches to model checking if Bayesian approaches to compositional data analysis are to be considered viable by scientists involved in GM research and regulation. Copyright © 2011 Elsevier Inc. All rights reserved.
Tenan, Matthew S; Tweedell, Andrew J; Haynes, Courtney A
2017-01-01
The timing of muscle activity is a commonly applied analytic method to understand how the nervous system controls movement. This study systematically evaluates six classes of standard and statistical algorithms to determine muscle onset in both experimental surface electromyography (EMG) and simulated EMG with a known onset time. Eighteen participants had EMG collected from the biceps brachii and vastus lateralis while performing a biceps curl or knee extension, respectively. Three established methods and three statistical methods for EMG onset were evaluated. Linear envelope, Teager-Kaiser energy operator + linear envelope and sample entropy were the established methods evaluated while general time series mean/variance, sequential and batch processing of parametric and nonparametric tools, and Bayesian changepoint analysis were the statistical techniques used. Visual EMG onset (experimental data) and objective EMG onset (simulated data) were compared with algorithmic EMG onset via root mean square error and linear regression models for stepwise elimination of inferior algorithms. The top algorithms for both data types were analyzed for their mean agreement with the gold standard onset and evaluation of 95% confidence intervals. The top algorithms were all Bayesian changepoint analysis iterations where the parameter of the prior (p0) was zero. The best performing Bayesian algorithms were p0 = 0 and a posterior probability for onset determination at 60-90%. While existing algorithms performed reasonably, the Bayesian changepoint analysis methodology provides greater reliability and accuracy when determining the singular onset of EMG activity in a time series. Further research is needed to determine if this class of algorithms perform equally well when the time series has multiple bursts of muscle activity.
Mathematical models for nonparametric inferences from line transect data
Burnham, K.P.; Anderson, D.R.
1976-01-01
A general mathematical theory of line transects is develoepd which supplies a framework for nonparametric density estimation based on either right angle or sighting distances. The probability of observing a point given its right angle distance (y) from the line is generalized to an arbitrary function g(y). Given only that g(O) = 1, it is shown there are nonparametric approaches to density estimation using the observed right angle distances. The model is then generalized to include sighting distances (r). Let f(y/r) be the conditional distribution of right angle distance given sighting distance. It is shown that nonparametric estimation based only on sighting distances requires we know the transformation of r given by f(O/r).
Pant, Sanjay; Lombardi, Damiano
2015-10-01
A new approach for assessing parameter identifiability of dynamical systems in a Bayesian setting is presented. The concept of Shannon entropy is employed to measure the inherent uncertainty in the parameters. The expected reduction in this uncertainty is seen as the amount of information one expects to gain about the parameters due to the availability of noisy measurements of the dynamical system. Such expected information gain is interpreted in terms of the variance of a hypothetical measurement device that can measure the parameters directly, and is related to practical identifiability of the parameters. If the individual parameters are unidentifiable, correlation between parameter combinations is assessed through conditional mutual information to determine which sets of parameters can be identified together. The information theoretic quantities of entropy and information are evaluated numerically through a combination of Monte Carlo and k-nearest neighbour methods in a non-parametric fashion. Unlike many methods to evaluate identifiability proposed in the literature, the proposed approach takes the measurement-noise into account and is not restricted to any particular noise-structure. Whilst computationally intensive for large dynamical systems, it is easily parallelisable and is non-intrusive as it does not necessitate re-writing of the numerical solvers of the dynamical system. The application of such an approach is presented for a variety of dynamical systems--ranging from systems governed by ordinary differential equations to partial differential equations--and, where possible, validated against results previously published in the literature. Copyright © 2015 Elsevier Inc. All rights reserved.
Combined non-parametric and parametric approach for identification of time-variant systems
NASA Astrophysics Data System (ADS)
Dziedziech, Kajetan; Czop, Piotr; Staszewski, Wieslaw J.; Uhl, Tadeusz
2018-03-01
Identification of systems, structures and machines with variable physical parameters is a challenging task especially when time-varying vibration modes are involved. The paper proposes a new combined, two-step - i.e. non-parametric and parametric - modelling approach in order to determine time-varying vibration modes based on input-output measurements. Single-degree-of-freedom (SDOF) vibration modes from multi-degree-of-freedom (MDOF) non-parametric system representation are extracted in the first step with the use of time-frequency wavelet-based filters. The second step involves time-varying parametric representation of extracted modes with the use of recursive linear autoregressive-moving-average with exogenous inputs (ARMAX) models. The combined approach is demonstrated using system identification analysis based on the experimental mass-varying MDOF frame-like structure subjected to random excitation. The results show that the proposed combined method correctly captures the dynamics of the analysed structure, using minimum a priori information on the model.
An Instructional Module on Mokken Scale Analysis
ERIC Educational Resources Information Center
Wind, Stefanie A.
2017-01-01
Mokken scale analysis (MSA) is a probabilistic-nonparametric approach to item response theory (IRT) that can be used to evaluate fundamental measurement properties with less strict assumptions than parametric IRT models. This instructional module provides an introduction to MSA as a probabilistic-nonparametric framework in which to explore…
Bayesian flood forecasting methods: A review
NASA Astrophysics Data System (ADS)
Han, Shasha; Coulibaly, Paulin
2017-08-01
Over the past few decades, floods have been seen as one of the most common and largely distributed natural disasters in the world. If floods could be accurately forecasted in advance, then their negative impacts could be greatly minimized. It is widely recognized that quantification and reduction of uncertainty associated with the hydrologic forecast is of great importance for flood estimation and rational decision making. Bayesian forecasting system (BFS) offers an ideal theoretic framework for uncertainty quantification that can be developed for probabilistic flood forecasting via any deterministic hydrologic model. It provides suitable theoretical structure, empirically validated models and reasonable analytic-numerical computation method, and can be developed into various Bayesian forecasting approaches. This paper presents a comprehensive review on Bayesian forecasting approaches applied in flood forecasting from 1999 till now. The review starts with an overview of fundamentals of BFS and recent advances in BFS, followed with BFS application in river stage forecasting and real-time flood forecasting, then move to a critical analysis by evaluating advantages and limitations of Bayesian forecasting methods and other predictive uncertainty assessment approaches in flood forecasting, and finally discusses the future research direction in Bayesian flood forecasting. Results show that the Bayesian flood forecasting approach is an effective and advanced way for flood estimation, it considers all sources of uncertainties and produces a predictive distribution of the river stage, river discharge or runoff, thus gives more accurate and reliable flood forecasts. Some emerging Bayesian forecasting methods (e.g. ensemble Bayesian forecasting system, Bayesian multi-model combination) were shown to overcome limitations of single model or fixed model weight and effectively reduce predictive uncertainty. In recent years, various Bayesian flood forecasting approaches have been developed and widely applied, but there is still room for improvements. Future research in the context of Bayesian flood forecasting should be on assimilation of various sources of newly available information and improvement of predictive performance assessment methods.
USDA-ARS?s Scientific Manuscript database
Parametric non-linear regression (PNR) techniques commonly are used to develop weed seedling emergence models. Such techniques, however, require statistical assumptions that are difficult to meet. To examine and overcome these limitations, we compared PNR with a nonparametric estimation technique. F...
Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo
2018-01-01
This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555
A guide to Bayesian model selection for ecologists
Hooten, Mevin B.; Hobbs, N.T.
2015-01-01
The steady upward trend in the use of model selection and Bayesian methods in ecological research has made it clear that both approaches to inference are important for modern analysis of models and data. However, in teaching Bayesian methods and in working with our research colleagues, we have noticed a general dissatisfaction with the available literature on Bayesian model selection and multimodel inference. Students and researchers new to Bayesian methods quickly find that the published advice on model selection is often preferential in its treatment of options for analysis, frequently advocating one particular method above others. The recent appearance of many articles and textbooks on Bayesian modeling has provided welcome background on relevant approaches to model selection in the Bayesian framework, but most of these are either very narrowly focused in scope or inaccessible to ecologists. Moreover, the methodological details of Bayesian model selection approaches are spread thinly throughout the literature, appearing in journals from many different fields. Our aim with this guide is to condense the large body of literature on Bayesian approaches to model selection and multimodel inference and present it specifically for quantitative ecologists as neutrally as possible. We also bring to light a few important and fundamental concepts relating directly to model selection that seem to have gone unnoticed in the ecological literature. Throughout, we provide only a minimal discussion of philosophy, preferring instead to examine the breadth of approaches as well as their practical advantages and disadvantages. This guide serves as a reference for ecologists using Bayesian methods, so that they can better understand their options and can make an informed choice that is best aligned with their goals for inference.
Varadhan, Ravi; Wang, Sue-Jane
2016-01-01
Treatment effect heterogeneity is a well-recognized phenomenon in randomized controlled clinical trials. In this paper, we discuss subgroup analyses with prespecified subgroups of clinical or biological importance. We explore various alternatives to the naive (the traditional univariate) subgroup analyses to address the issues of multiplicity and confounding. Specifically, we consider a model-based Bayesian shrinkage (Bayes-DS) and a nonparametric, empirical Bayes shrinkage approach (Emp-Bayes) to temper the optimism of traditional univariate subgroup analyses; a standardization approach (standardization) that accounts for correlation between baseline covariates; and a model-based maximum likelihood estimation (MLE) approach. The Bayes-DS and Emp-Bayes methods model the variation in subgroup-specific treatment effect rather than testing the null hypothesis of no difference between subgroups. The standardization approach addresses the issue of confounding in subgroup analyses. The MLE approach is considered only for comparison in simulation studies as the “truth” since the data were generated from the same model. Using the characteristics of a hypothetical large outcome trial, we perform simulation studies and articulate the utilities and potential limitations of these estimators. Simulation results indicate that Bayes-DS and Emp-Bayes can protect against optimism present in the naïve approach. Due to its simplicity, the naïve approach should be the reference for reporting univariate subgroup-specific treatment effect estimates from exploratory subgroup analyses. Standardization, although it tends to have a larger variance, is suggested when it is important to address the confounding of univariate subgroup effects due to correlation between baseline covariates. The Bayes-DS approach is available as an R package (DSBayes). PMID:26485117
Mathematical models for non-parametric inferences from line transect data
Burnham, K.P.; Anderson, D.R.
1976-01-01
A general mathematical theory of line transects is developed which supplies a framework for nonparametric density estimation based on either right angle or sighting distances. The probability of observing a point given its right angle distance (y) from the line is generalized to an arbitrary function g(y). Given only that g(0) = 1, it is shown there are nonparametric approaches to density estimation using the observed right angle distances. The model is then generalized to include sighting distances (r). Let f(y I r) be the conditional distribution of right angle distance given sighting distance. It is shown that nonparametric estimation based only on sighting distances requires we know the transformation of r given by f(0 I r).
Two Approaches to Calibration in Metrology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Campanelli, Mark
2014-04-01
Inferring mathematical relationships with quantified uncertainty from measurement data is common to computational science and metrology. Sufficient knowledge of measurement process noise enables Bayesian inference. Otherwise, an alternative approach is required, here termed compartmentalized inference, because collection of uncertain data and model inference occur independently. Bayesian parameterized model inference is compared to a Bayesian-compatible compartmentalized approach for ISO-GUM compliant calibration problems in renewable energy metrology. In either approach, model evidence can help reduce model discrepancy.
Nonparametric method for failures diagnosis in the actuating subsystem of aircraft control system
NASA Astrophysics Data System (ADS)
Terentev, M. N.; Karpenko, S. S.; Zybin, E. Yu; Kosyanchuk, V. V.
2018-02-01
In this paper we design a nonparametric method for failures diagnosis in the aircraft control system that uses the measurements of the control signals and the aircraft states only. It doesn’t require a priori information of the aircraft model parameters, training or statistical calculations, and is based on analytical nonparametric one-step-ahead state prediction approach. This makes it possible to predict the behavior of unidentified and failure dynamic systems, to weaken the requirements to control signals, and to reduce the diagnostic time and problem complexity.
NASA Astrophysics Data System (ADS)
Rajabi, Mohammad Mahdi; Ataie-Ashtiani, Behzad
2016-05-01
Bayesian inference has traditionally been conceived as the proper framework for the formal incorporation of expert knowledge in parameter estimation of groundwater models. However, conventional Bayesian inference is incapable of taking into account the imprecision essentially embedded in expert provided information. In order to solve this problem, a number of extensions to conventional Bayesian inference have been introduced in recent years. One of these extensions is 'fuzzy Bayesian inference' which is the result of integrating fuzzy techniques into Bayesian statistics. Fuzzy Bayesian inference has a number of desirable features which makes it an attractive approach for incorporating expert knowledge in the parameter estimation process of groundwater models: (1) it is well adapted to the nature of expert provided information, (2) it allows to distinguishably model both uncertainty and imprecision, and (3) it presents a framework for fusing expert provided information regarding the various inputs of the Bayesian inference algorithm. However an important obstacle in employing fuzzy Bayesian inference in groundwater numerical modeling applications is the computational burden, as the required number of numerical model simulations often becomes extremely exhaustive and often computationally infeasible. In this paper, a novel approach of accelerating the fuzzy Bayesian inference algorithm is proposed which is based on using approximate posterior distributions derived from surrogate modeling, as a screening tool in the computations. The proposed approach is first applied to a synthetic test case of seawater intrusion (SWI) in a coastal aquifer. It is shown that for this synthetic test case, the proposed approach decreases the number of required numerical simulations by an order of magnitude. Then the proposed approach is applied to a real-world test case involving three-dimensional numerical modeling of SWI in Kish Island, located in the Persian Gulf. An expert elicitation methodology is developed and applied to the real-world test case in order to provide a road map for the use of fuzzy Bayesian inference in groundwater modeling applications.
Iima, Mami; Kataoka, Masako; Kanao, Shotaro; Onishi, Natsuko; Kawai, Makiko; Ohashi, Akane; Sakaguchi, Rena; Toi, Masakazu; Togashi, Kaori
2018-05-01
Purpose To investigate the performance of integrated approaches that combined intravoxel incoherent motion (IVIM) and non-Gaussian diffusion parameters compared with the Breast Imaging and Reporting Data System (BI-RADS) to establish multiparameter thresholds scores or probabilities by using Bayesian analysis to distinguish malignant from benign breast lesions and their correlation with molecular prognostic factors. Materials and Methods Between May 2013 and March 2015, 411 patients were prospectively enrolled and 199 patients (allocated to training [n = 99] and validation [n = 100] sets) were included in this study. IVIM parameters (flowing blood volume fraction [fIVIM] and pseudodiffusion coefficient [D*]) and non-Gaussian diffusion parameters (theoretical apparent diffusion coefficient [ADC] at b value of 0 sec/mm 2 [ADC 0 ] and kurtosis [K]) by using IVIM and kurtosis models were estimated from diffusion-weighted image series (16 b values up to 2500 sec/mm 2 ), as well as a synthetic ADC (sADC) calculated by using b values of 200 and 1500 (sADC 200-1500 ) and a standard ADC calculated by using b values of 0 and 800 sec/mm 2 (ADC 0-800 ). The performance of two diagnostic approaches (combined parameter thresholds and Bayesian analysis) combining IVIM and diffusion parameters was evaluated and compared with BI-RADS performance. The Mann-Whitney U test and a nonparametric multiple comparison test were used to compare their performance to determine benignity or malignancy and as molecular prognostic biomarkers and subtypes of breast cancer. Results Significant differences were found between malignant and benign breast lesions for IVIM and non-Gaussian diffusion parameters (ADC 0 , K, fIVIM, fIVIM · D*, sADC 200-1500, and ADC 0-800 ; P < .05). Sensitivity and specificity for the validation set by radiologists A and B were as follows: sensitivity, 94.7% and 89.5%, and specificity, 75.0% and 79.2% for sADC 200-1500 , respectively; sensitivity, 94.7% and 96.1%, and specificity, 75.0% and 66.7%, for the combined thresholds approach, respectively; sensitivity, 92.1% and 92.1%, and specificity, 83.3% and 66.7%, for Bayesian analysis, respectively; and sensitivity and specificity, 100% and 79.2%, for BI-RADS, respectively. The significant difference in values of sADC 200-1500 in progesterone receptor status (P = .002) was noted. sADC 200-1500 was significantly different between histologic subtypes (P = .006). Conclusion Approaches that combined various IVIM and non-Gaussian diffusion MR imaging parameters may provide BI-RADS-equivalent scores almost comparable to BI-RADS categories without the use of contrast agents. Non-Gaussian diffusion parameters also differed by biologic prognostic factors. © RSNA, 2017 Online supplemental material is available for this article.
Dynamic Bayesian Network Modeling of Game Based Diagnostic Assessments. CRESST Report 837
ERIC Educational Resources Information Center
Levy, Roy
2014-01-01
Digital games offer an appealing environment for assessing student proficiencies, including skills and misconceptions in a diagnostic setting. This paper proposes a dynamic Bayesian network modeling approach for observations of student performance from an educational video game. A Bayesian approach to model construction, calibration, and use in…
Astrophysical data analysis with information field theory
NASA Astrophysics Data System (ADS)
Enßlin, Torsten
2014-12-01
Non-parametric imaging and data analysis in astrophysics and cosmology can be addressed by information field theory (IFT), a means of Bayesian, data based inference on spatially distributed signal fields. IFT is a statistical field theory, which permits the construction of optimal signal recovery algorithms. It exploits spatial correlations of the signal fields even for nonlinear and non-Gaussian signal inference problems. The alleviation of a perception threshold for recovering signals of unknown correlation structure by using IFT will be discussed in particular as well as a novel improvement on instrumental self-calibration schemes. IFT can be applied to many areas. Here, applications in in cosmology (cosmic microwave background, large-scale structure) and astrophysics (galactic magnetism, radio interferometry) are presented.
A Bayesian approach to multivariate measurement system assessment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamada, Michael Scott
This article considers system assessment for multivariate measurements and presents a Bayesian approach to analyzing gauge R&R study data. The evaluation of variances for univariate measurement becomes the evaluation of covariance matrices for multivariate measurements. The Bayesian approach ensures positive definite estimates of the covariance matrices and easily provides their uncertainty. Furthermore, various measurement system assessment criteria are easily evaluated. The approach is illustrated with data from a real gauge R&R study as well as simulated data.
A Bayesian approach to multivariate measurement system assessment
Hamada, Michael Scott
2016-07-01
This article considers system assessment for multivariate measurements and presents a Bayesian approach to analyzing gauge R&R study data. The evaluation of variances for univariate measurement becomes the evaluation of covariance matrices for multivariate measurements. The Bayesian approach ensures positive definite estimates of the covariance matrices and easily provides their uncertainty. Furthermore, various measurement system assessment criteria are easily evaluated. The approach is illustrated with data from a real gauge R&R study as well as simulated data.
Semisupervised learning using Bayesian interpretation: application to LS-SVM.
Adankon, Mathias M; Cheriet, Mohamed; Biem, Alain
2011-04-01
Bayesian reasoning provides an ideal basis for representing and manipulating uncertain knowledge, with the result that many interesting algorithms in machine learning are based on Bayesian inference. In this paper, we use the Bayesian approach with one and two levels of inference to model the semisupervised learning problem and give its application to the successful kernel classifier support vector machine (SVM) and its variant least-squares SVM (LS-SVM). Taking advantage of Bayesian interpretation of LS-SVM, we develop a semisupervised learning algorithm for Bayesian LS-SVM using our approach based on two levels of inference. Experimental results on both artificial and real pattern recognition problems show the utility of our method.
NASA Astrophysics Data System (ADS)
Li, Lu; Xu, Chong-Yu; Engeland, Kolbjørn
2013-04-01
SummaryWith respect to model calibration, parameter estimation and analysis of uncertainty sources, various regression and probabilistic approaches are used in hydrological modeling. A family of Bayesian methods, which incorporates different sources of information into a single analysis through Bayes' theorem, is widely used for uncertainty assessment. However, none of these approaches can well treat the impact of high flows in hydrological modeling. This study proposes a Bayesian modularization uncertainty assessment approach in which the highest streamflow observations are treated as suspect information that should not influence the inference of the main bulk of the model parameters. This study includes a comprehensive comparison and evaluation of uncertainty assessments by our new Bayesian modularization method and standard Bayesian methods using the Metropolis-Hastings (MH) algorithm with the daily hydrological model WASMOD. Three likelihood functions were used in combination with standard Bayesian method: the AR(1) plus Normal model independent of time (Model 1), the AR(1) plus Normal model dependent on time (Model 2) and the AR(1) plus Multi-normal model (Model 3). The results reveal that the Bayesian modularization method provides the most accurate streamflow estimates measured by the Nash-Sutcliffe efficiency and provide the best in uncertainty estimates for low, medium and entire flows compared to standard Bayesian methods. The study thus provides a new approach for reducing the impact of high flows on the discharge uncertainty assessment of hydrological models via Bayesian method.
Bayesian structural equation modeling in sport and exercise psychology.
Stenling, Andreas; Ivarsson, Andreas; Johnson, Urban; Lindwall, Magnus
2015-08-01
Bayesian statistics is on the rise in mainstream psychology, but applications in sport and exercise psychology research are scarce. In this article, the foundations of Bayesian analysis are introduced, and we will illustrate how to apply Bayesian structural equation modeling in a sport and exercise psychology setting. More specifically, we contrasted a confirmatory factor analysis on the Sport Motivation Scale II estimated with the most commonly used estimator, maximum likelihood, and a Bayesian approach with weakly informative priors for cross-loadings and correlated residuals. The results indicated that the model with Bayesian estimation and weakly informative priors provided a good fit to the data, whereas the model estimated with a maximum likelihood estimator did not produce a well-fitting model. The reasons for this discrepancy between maximum likelihood and Bayesian estimation are discussed as well as potential advantages and caveats with the Bayesian approach.
Watanabe, Hiroyuki; Miyazaki, Hiroyasu
2006-01-01
Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or masking the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fitness over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fitness of the nonparametric regression model to the QT-RR relationship was compared with four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correlation models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models did not fit, the nonparametric regression model improved the fitting at both fast and slow heart rates. The nonparametric regression model is the more flexible method compared with the parametric method. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
Tweedell, Andrew J.; Haynes, Courtney A.
2017-01-01
The timing of muscle activity is a commonly applied analytic method to understand how the nervous system controls movement. This study systematically evaluates six classes of standard and statistical algorithms to determine muscle onset in both experimental surface electromyography (EMG) and simulated EMG with a known onset time. Eighteen participants had EMG collected from the biceps brachii and vastus lateralis while performing a biceps curl or knee extension, respectively. Three established methods and three statistical methods for EMG onset were evaluated. Linear envelope, Teager-Kaiser energy operator + linear envelope and sample entropy were the established methods evaluated while general time series mean/variance, sequential and batch processing of parametric and nonparametric tools, and Bayesian changepoint analysis were the statistical techniques used. Visual EMG onset (experimental data) and objective EMG onset (simulated data) were compared with algorithmic EMG onset via root mean square error and linear regression models for stepwise elimination of inferior algorithms. The top algorithms for both data types were analyzed for their mean agreement with the gold standard onset and evaluation of 95% confidence intervals. The top algorithms were all Bayesian changepoint analysis iterations where the parameter of the prior (p0) was zero. The best performing Bayesian algorithms were p0 = 0 and a posterior probability for onset determination at 60–90%. While existing algorithms performed reasonably, the Bayesian changepoint analysis methodology provides greater reliability and accuracy when determining the singular onset of EMG activity in a time series. Further research is needed to determine if this class of algorithms perform equally well when the time series has multiple bursts of muscle activity. PMID:28489897
Bayesian Factor Analysis as a Variable Selection Problem: Alternative Priors and Consequences
Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric
2016-01-01
Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, a Bayesian structural equation modeling (BSEM) approach (Muthén & Asparouhov, 2012) has been proposed as a way to explore the presence of cross-loadings in CFA models. We show that the issue of determining factor loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov’s approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike and slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set (Byrne, 2012; Pettegrew & Wolf, 1982) is used to demonstrate our approach. PMID:27314566
Marmarelis, Vasilis Z.; Berger, Theodore W.
2009-01-01
Parametric and non-parametric modeling methods are combined to study the short-term plasticity (STP) of synapses in the central nervous system (CNS). The nonlinear dynamics of STP are modeled by means: (1) previously proposed parametric models based on mechanistic hypotheses and/or specific dynamical processes, and (2) non-parametric models (in the form of Volterra kernels) that transforms the presynaptic signals into postsynaptic signals. In order to synergistically use the two approaches, we estimate the Volterra kernels of the parametric models of STP for four types of synapses using synthetic broadband input–output data. Results show that the non-parametric models accurately and efficiently replicate the input–output transformations of the parametric models. Volterra kernels provide a general and quantitative representation of the STP. PMID:18506609
Particle identification in ALICE: a Bayesian approach
NASA Astrophysics Data System (ADS)
Adam, J.; Adamová, D.; Aggarwal, M. M.; Aglieri Rinella, G.; Agnello, M.; Agrawal, N.; Ahammed, Z.; Ahmad, S.; Ahn, S. U.; Aiola, S.; Akindinov, A.; Alam, S. N.; Albuquerque, D. S. D.; Aleksandrov, D.; Alessandro, B.; Alexandre, D.; Alfaro Molina, R.; Alici, A.; Alkin, A.; Almaraz, J. R. M.; Alme, J.; Alt, T.; Altinpinar, S.; Altsybeev, I.; Alves Garcia Prado, C.; Andrei, C.; Andronic, A.; Anguelov, V.; Antičić, T.; Antinori, F.; Antonioli, P.; Aphecetche, L.; Appelshäuser, H.; Arcelli, S.; Arnaldi, R.; Arnold, O. W.; Arsene, I. C.; Arslandok, M.; Audurier, B.; Augustinus, A.; Averbeck, R.; Azmi, M. D.; Badalà, A.; Baek, Y. W.; Bagnasco, S.; Bailhache, R.; Bala, R.; Balasubramanian, S.; Baldisseri, A.; Baral, R. C.; Barbano, A. M.; Barbera, R.; Barile, F.; Barnaföldi, G. G.; Barnby, L. S.; Barret, V.; Bartalini, P.; Barth, K.; Bartke, J.; Bartsch, E.; Basile, M.; Bastid, N.; Basu, S.; Bathen, B.; Batigne, G.; Batista Camejo, A.; Batyunya, B.; Batzing, P. C.; Bearden, I. G.; Beck, H.; Bedda, C.; Behera, N. K.; Belikov, I.; Bellini, F.; Bello Martinez, H.; Bellwied, R.; Belmont, R.; Belmont-Moreno, E.; Belyaev, V.; Benacek, P.; Bencedi, G.; Beole, S.; Berceanu, I.; Bercuci, A.; Berdnikov, Y.; Berenyi, D.; Bertens, R. A.; Berzano, D.; Betev, L.; Bhasin, A.; Bhat, I. R.; Bhati, A. K.; Bhattacharjee, B.; Bhom, J.; Bianchi, L.; Bianchi, N.; Bianchin, C.; Bielčík, J.; Bielčíková, J.; Bilandzic, A.; Biro, G.; Biswas, R.; Biswas, S.; Bjelogrlic, S.; Blair, J. T.; Blau, D.; Blume, C.; Bock, F.; Bogdanov, A.; Bøggild, H.; Boldizsár, L.; Bombara, M.; Book, J.; Borel, H.; Borissov, A.; Borri, M.; Bossú, F.; Botta, E.; Bourjau, C.; Braun-Munzinger, P.; Bregant, M.; Breitner, T.; Broker, T. A.; Browning, T. A.; Broz, M.; Brucken, E. J.; Bruna, E.; Bruno, G. E.; Budnikov, D.; Buesching, H.; Bufalino, S.; Buncic, P.; Busch, O.; Buthelezi, Z.; Butt, J. B.; Buxton, J. T.; Cabala, J.; Caffarri, D.; Cai, X.; Caines, H.; Calero Diaz, L.; Caliva, A.; Calvo Villar, E.; Camerini, P.; Carena, F.; Carena, W.; Carnesecchi, F.; Castillo Castellanos, J.; Castro, A. J.; Casula, E. A. R.; Ceballos Sanchez, C.; Cepila, J.; Cerello, P.; Cerkala, J.; Chang, B.; Chapeland, S.; Chartier, M.; Charvet, J. L.; Chattopadhyay, S.; Chattopadhyay, S.; Chauvin, A.; Chelnokov, V.; Cherney, M.; Cheshkov, C.; Cheynis, B.; Chibante Barroso, V.; Chinellato, D. D.; Cho, S.; Chochula, P.; Choi, K.; Chojnacki, M.; Choudhury, S.; Christakoglou, P.; Christensen, C. H.; Christiansen, P.; Chujo, T.; Chung, S. U.; Cicalo, C.; Cifarelli, L.; Cindolo, F.; Cleymans, J.; Colamaria, F.; Colella, D.; Collu, A.; Colocci, M.; Conesa Balbastre, G.; Conesa del Valle, Z.; Connors, M. E.; Contreras, J. G.; Cormier, T. M.; Corrales Morales, Y.; Cortés Maldonado, I.; Cortese, P.; Cosentino, M. R.; Costa, F.; Crochet, P.; Cruz Albino, R.; Cuautle, E.; Cunqueiro, L.; Dahms, T.; Dainese, A.; Danisch, M. C.; Danu, A.; Das, D.; Das, I.; Das, S.; Dash, A.; Dash, S.; De, S.; De Caro, A.; de Cataldo, G.; de Conti, C.; de Cuveland, J.; De Falco, A.; De Gruttola, D.; De Marco, N.; De Pasquale, S.; Deisting, A.; Deloff, A.; Dénes, E.; Deplano, C.; Dhankher, P.; Di Bari, D.; Di Mauro, A.; Di Nezza, P.; Diaz Corchero, M. A.; Dietel, T.; Dillenseger, P.; Divià, R.; Djuvsland, Ø.; Dobrin, A.; Domenicis Gimenez, D.; Dönigus, B.; Dordic, O.; Drozhzhova, T.; Dubey, A. K.; Dubla, A.; Ducroux, L.; Dupieux, P.; Ehlers, R. J.; Elia, D.; Endress, E.; Engel, H.; Epple, E.; Erazmus, B.; Erdemir, I.; Erhardt, F.; Espagnon, B.; Estienne, M.; Esumi, S.; Eum, J.; Evans, D.; Evdokimov, S.; Eyyubova, G.; Fabbietti, L.; Fabris, D.; Faivre, J.; Fantoni, A.; Fasel, M.; Feldkamp, L.; Feliciello, A.; Feofilov, G.; Ferencei, J.; Fernández Téllez, A.; Ferreiro, E. G.; Ferretti, A.; Festanti, A.; Feuillard, V. J. G.; Figiel, J.; Figueredo, M. A. S.; Filchagin, S.; Finogeev, D.; Fionda, F. M.; Fiore, E. M.; Fleck, M. G.; Floris, M.; Foertsch, S.; Foka, P.; Fokin, S.; Fragiacomo, E.; Francescon, A.; Frankenfeld, U.; Fronze, G. G.; Fuchs, U.; Furget, C.; Furs, A.; Fusco Girard, M.; Gaardhøje, J. J.; Gagliardi, M.; Gago, A. M.; Gallio, M.; Gangadharan, D. R.; Ganoti, P.; Gao, C.; Garabatos, C.; Garcia-Solis, E.; Gargiulo, C.; Gasik, P.; Gauger, E. F.; Germain, M.; Gheata, A.; Gheata, M.; Ghosh, P.; Ghosh, S. K.; Gianotti, P.; Giubellino, P.; Giubilato, P.; Gladysz-Dziadus, E.; Glässel, P.; Goméz Coral, D. M.; Gomez Ramirez, A.; Gonzalez, A. S.; Gonzalez, V.; González-Zamora, P.; Gorbunov, S.; Görlich, L.; Gotovac, S.; Grabski, V.; Grachov, O. A.; Graczykowski, L. K.; Graham, K. L.; Grelli, A.; Grigoras, A.; Grigoras, C.; Grigoriev, V.; Grigoryan, A.; Grigoryan, S.; Grinyov, B.; Grion, N.; Gronefeld, J. M.; Grosse-Oetringhaus, J. F.; Grosso, R.; Guber, F.; Guernane, R.; Guerzoni, B.; Gulbrandsen, K.; Gunji, T.; Gupta, A.; Gupta, R.; Haake, R.; Haaland, Ø.; Hadjidakis, C.; Haiduc, M.; Hamagaki, H.; Hamar, G.; Hamon, J. C.; Harris, J. W.; Harton, A.; Hatzifotiadou, D.; Hayashi, S.; Heckel, S. T.; Hellbär, E.; Helstrup, H.; Herghelegiu, A.; Herrera Corral, G.; Hess, B. A.; Hetland, K. F.; Hillemanns, H.; Hippolyte, B.; Horak, D.; Hosokawa, R.; Hristov, P.; Humanic, T. J.; Hussain, N.; Hussain, T.; Hutter, D.; Hwang, D. S.; Ilkaev, R.; Inaba, M.; Incani, E.; Ippolitov, M.; Irfan, M.; Ivanov, M.; Ivanov, V.; Izucheev, V.; Jacazio, N.; Jacobs, P. M.; Jadhav, M. B.; Jadlovska, S.; Jadlovsky, J.; Jahnke, C.; Jakubowska, M. J.; Jang, H. J.; Janik, M. A.; Jayarathna, P. H. S. Y.; Jena, C.; Jena, S.; Jimenez Bustamante, R. T.; Jones, P. G.; Jusko, A.; Kalinak, P.; Kalweit, A.; Kamin, J.; Kang, J. H.; Kaplin, V.; Kar, S.; Karasu Uysal, A.; Karavichev, O.; Karavicheva, T.; Karayan, L.; Karpechev, E.; Kebschull, U.; Keidel, R.; Keijdener, D. L. D.; Keil, M.; Mohisin Khan, M.; Khan, P.; Khan, S. A.; Khanzadeev, A.; Kharlov, Y.; Kileng, B.; Kim, D. W.; Kim, D. J.; Kim, D.; Kim, H.; Kim, J. S.; Kim, M.; Kim, S.; Kim, T.; Kirsch, S.; Kisel, I.; Kiselev, S.; Kisiel, A.; Kiss, G.; Klay, J. L.; Klein, C.; Klein, J.; Klein-Bösing, C.; Klewin, S.; Kluge, A.; Knichel, M. L.; Knospe, A. G.; Kobdaj, C.; Kofarago, M.; Kollegger, T.; Kolojvari, A.; Kondratiev, V.; Kondratyeva, N.; Kondratyuk, E.; Konevskikh, A.; Kopcik, M.; Kostarakis, P.; Kour, M.; Kouzinopoulos, C.; Kovalenko, O.; Kovalenko, V.; Kowalski, M.; Koyithatta Meethaleveedu, G.; Králik, I.; Kravčáková, A.; Krivda, M.; Krizek, F.; Kryshen, E.; Krzewicki, M.; Kubera, A. M.; Kučera, V.; Kuhn, C.; Kuijer, P. G.; Kumar, A.; Kumar, J.; Kumar, L.; Kumar, S.; Kurashvili, P.; Kurepin, A.; Kurepin, A. B.; Kuryakin, A.; Kweon, M. J.; Kwon, Y.; La Pointe, S. L.; La Rocca, P.; Ladron de Guevara, P.; Lagana Fernandes, C.; Lakomov, I.; Langoy, R.; Lara, C.; Lardeux, A.; Lattuca, A.; Laudi, E.; Lea, R.; Leardini, L.; Lee, G. R.; Lee, S.; Lehas, F.; Lemmon, R. C.; Lenti, V.; Leogrande, E.; León Monzón, I.; León Vargas, H.; Leoncino, M.; Lévai, P.; Li, S.; Li, X.; Lien, J.; Lietava, R.; Lindal, S.; Lindenstruth, V.; Lippmann, C.; Lisa, M. A.; Ljunggren, H. M.; Lodato, D. F.; Loenne, P. I.; Loginov, V.; Loizides, C.; Lopez, X.; López Torres, E.; Lowe, A.; Luettig, P.; Lunardon, M.; Luparello, G.; Lutz, T. H.; Maevskaya, A.; Mager, M.; Mahajan, S.; Mahmood, S. M.; Maire, A.; Majka, R. D.; Malaev, M.; Maldonado Cervantes, I.; Malinina, L.; Mal'Kevich, D.; Malzacher, P.; Mamonov, A.; Manko, V.; Manso, F.; Manzari, V.; Marchisone, M.; Mareš, J.; Margagliotti, G. V.; Margotti, A.; Margutti, J.; Marín, A.; Markert, C.; Marquard, M.; Martin, N. A.; Martin Blanco, J.; Martinengo, P.; Martínez, M. I.; Martínez García, G.; Martinez Pedreira, M.; Mas, A.; Masciocchi, S.; Masera, M.; Masoni, A.; Mastroserio, A.; Matyja, A.; Mayer, C.; Mazer, J.; Mazzoni, M. A.; Mcdonald, D.; Meddi, F.; Melikyan, Y.; Menchaca-Rocha, A.; Meninno, E.; Mercado Pérez, J.; Meres, M.; Miake, Y.; Mieskolainen, M. M.; Mikhaylov, K.; Milano, L.; Milosevic, J.; Mischke, A.; Mishra, A. N.; Miśkowiec, D.; Mitra, J.; Mitu, C. M.; Mohammadi, N.; Mohanty, B.; Molnar, L.; Montaño Zetina, L.; Montes, E.; Moreira De Godoy, D. A.; Moreno, L. A. P.; Moretto, S.; Morreale, A.; Morsch, A.; Muccifora, V.; Mudnic, E.; Mühlheim, D.; Muhuri, S.; Mukherjee, M.; Mulligan, J. D.; Munhoz, M. G.; Munzer, R. H.; Murakami, H.; Murray, S.; Musa, L.; Musinsky, J.; Naik, B.; Nair, R.; Nandi, B. K.; Nania, R.; Nappi, E.; Naru, M. U.; Natal da Luz, H.; Nattrass, C.; Navarro, S. R.; Nayak, K.; Nayak, R.; Nayak, T. K.; Nazarenko, S.; Nedosekin, A.; Nellen, L.; Ng, F.; Nicassio, M.; Niculescu, M.; Niedziela, J.; Nielsen, B. S.; Nikolaev, S.; Nikulin, S.; Nikulin, V.; Noferini, F.; Nomokonov, P.; Nooren, G.; Noris, J. C. C.; Norman, J.; Nyanin, A.; Nystrand, J.; Oeschler, H.; Oh, S.; Oh, S. K.; Ohlson, A.; Okatan, A.; Okubo, T.; Olah, L.; Oleniacz, J.; Oliveira Da Silva, A. C.; Oliver, M. H.; Onderwaater, J.; Oppedisano, C.; Orava, R.; Oravec, M.; Ortiz Velasquez, A.; Oskarsson, A.; Otwinowski, J.; Oyama, K.; Ozdemir, M.; Pachmayer, Y.; Pagano, D.; Pagano, P.; Paić, G.; Pal, S. K.; Pan, J.; Pandey, A. K.; Papikyan, V.; Pappalardo, G. S.; Pareek, P.; Park, W. J.; Parmar, S.; Passfeld, A.; Paticchio, V.; Patra, R. N.; Paul, B.; Pei, H.; Peitzmann, T.; Pereira Da Costa, H.; Peresunko, D.; Pérez Lara, C. E.; Perez Lezama, E.; Peskov, V.; Pestov, Y.; Petráček, V.; Petrov, V.; Petrovici, M.; Petta, C.; Piano, S.; Pikna, M.; Pillot, P.; Pimentel, L. O. D. L.; Pinazza, O.; Pinsky, L.; Piyarathna, D. B.; Płoskoń, M.; Planinic, M.; Pluta, J.; Pochybova, S.; Podesta-Lerma, P. L. M.; Poghosyan, M. G.; Polichtchouk, B.; Poljak, N.; Poonsawat, W.; Pop, A.; Porteboeuf-Houssais, S.; Porter, J.; Pospisil, J.; Prasad, S. K.; Preghenella, R.; Prino, F.; Pruneau, C. A.; Pshenichnov, I.; Puccio, M.; Puddu, G.; Pujahari, P.; Punin, V.; Putschke, J.; Qvigstad, H.; Rachevski, A.; Raha, S.; Rajput, S.; Rak, J.; Rakotozafindrabe, A.; Ramello, L.; Rami, F.; Raniwala, R.; Raniwala, S.; Räsänen, S. S.; Rascanu, B. T.; Rathee, D.; Read, K. F.; Redlich, K.; Reed, R. J.; Rehman, A.; Reichelt, P.; Reidt, F.; Ren, X.; Renfordt, R.; Reolon, A. R.; Reshetin, A.; Reygers, K.; Riabov, V.; Ricci, R. A.; Richert, T.; Richter, M.; Riedler, P.; Riegler, W.; Riggi, F.; Ristea, C.; Rocco, E.; Rodríguez Cahuantzi, M.; Rodriguez Manso, A.; Røed, K.; Rogochaya, E.; Rohr, D.; Röhrich, D.; Ronchetti, F.; Ronflette, L.; Rosnet, P.; Rossi, A.; Roukoutakis, F.; Roy, A.; Roy, C.; Roy, P.; Rubio Montero, A. J.; Rui, R.; Russo, R.; Ryabinkin, E.; Ryabov, Y.; Rybicki, A.; Saarinen, S.; Sadhu, S.; Sadovsky, S.; Šafařík, K.; Sahlmuller, B.; Sahoo, P.; Sahoo, R.; Sahoo, S.; Sahu, P. K.; Saini, J.; Sakai, S.; Saleh, M. A.; Salzwedel, J.; Sambyal, S.; Samsonov, V.; Šándor, L.; Sandoval, A.; Sano, M.; Sarkar, D.; Sarkar, N.; Sarma, P.; Scapparone, E.; Scarlassara, F.; Schiaua, C.; Schicker, R.; Schmidt, C.; Schmidt, H. R.; Schuchmann, S.; Schukraft, J.; Schulc, M.; Schutz, Y.; Schwarz, K.; Schweda, K.; Scioli, G.; Scomparin, E.; Scott, R.; Šefčík, M.; Seger, J. E.; Sekiguchi, Y.; Sekihata, D.; Selyuzhenkov, I.; Senosi, K.; Senyukov, S.; Serradilla, E.; Sevcenco, A.; Shabanov, A.; Shabetai, A.; Shadura, O.; Shahoyan, R.; Shahzad, M. I.; Shangaraev, A.; Sharma, A.; Sharma, M.; Sharma, M.; Sharma, N.; Sheikh, A. I.; Shigaki, K.; Shou, Q.; Shtejer, K.; Sibiriak, Y.; Siddhanta, S.; Sielewicz, K. M.; Siemiarczuk, T.; Silvermyr, D.; Silvestre, C.; Simatovic, G.; Simonetti, G.; Singaraju, R.; Singh, R.; Singha, S.; Singhal, V.; Sinha, B. C.; Sinha, T.; Sitar, B.; Sitta, M.; Skaali, T. B.; Slupecki, M.; Smirnov, N.; Snellings, R. J. M.; Snellman, T. W.; Song, J.; Song, M.; Song, Z.; Soramel, F.; Sorensen, S.; Souza, R. D. de; Sozzi, F.; Spacek, M.; Spiriti, E.; Sputowska, I.; Spyropoulou-Stassinaki, M.; Stachel, J.; Stan, I.; Stankus, P.; Stenlund, E.; Steyn, G.; Stiller, J. H.; Stocco, D.; Strmen, P.; Suaide, A. A. P.; Sugitate, T.; Suire, C.; Suleymanov, M.; Suljic, M.; Sultanov, R.; Šumbera, M.; Sumowidagdo, S.; Szabo, A.; Szanto de Toledo, A.; Szarka, I.; Szczepankiewicz, A.; Szymanski, M.; Tabassam, U.; Takahashi, J.; Tambave, G. J.; Tanaka, N.; Tarhini, M.; Tariq, M.; Tarzila, M. G.; Tauro, A.; Tejeda Muñoz, G.; Telesca, A.; Terasaki, K.; Terrevoli, C.; Teyssier, B.; Thäder, J.; Thakur, D.; Thomas, D.; Tieulent, R.; Timmins, A. R.; Toia, A.; Trogolo, S.; Trombetta, G.; Trubnikov, V.; Trzaska, W. H.; Tsuji, T.; Tumkin, A.; Turrisi, R.; Tveter, T. S.; Ullaland, K.; Uras, A.; Usai, G. L.; Utrobicic, A.; Vala, M.; Valencia Palomo, L.; Vallero, S.; Van Der Maarel, J.; Van Hoorne, J. W.; van Leeuwen, M.; Vanat, T.; Vande Vyvre, P.; Varga, D.; Vargas, A.; Vargyas, M.; Varma, R.; Vasileiou, M.; Vasiliev, A.; Vauthier, A.; Vechernin, V.; Veen, A. M.; Veldhoen, M.; Velure, A.; Vercellin, E.; Vergara Limón, S.; Vernet, R.; Verweij, M.; Vickovic, L.; Viesti, G.; Viinikainen, J.; Vilakazi, Z.; Villalobos Baillie, O.; Villatoro Tello, A.; Vinogradov, A.; Vinogradov, L.; Vinogradov, Y.; Virgili, T.; Vislavicius, V.; Viyogi, Y. P.; Vodopyanov, A.; Völkl, M. A.; Voloshin, K.; Voloshin, S. A.; Volpe, G.; von Haller, B.; Vorobyev, I.; Vranic, D.; Vrláková, J.; Vulpescu, B.; Wagner, B.; Wagner, J.; Wang, H.; Wang, M.; Watanabe, D.; Watanabe, Y.; Weber, M.; Weber, S. G.; Weiser, D. F.; Wessels, J. P.; Westerhoff, U.; Whitehead, A. M.; Wiechula, J.; Wikne, J.; Wilk, G.; Wilkinson, J.; Williams, M. C. S.; Windelband, B.; Winn, M.; Yang, H.; Yang, P.; Yano, S.; Yasin, Z.; Yin, Z.; Yokoyama, H.; Yoo, I.-K.; Yoon, J. H.; Yurchenko, V.; Yushmanov, I.; Zaborowska, A.; Zaccolo, V.; Zaman, A.; Zampolli, C.; Zanoli, H. J. C.; Zaporozhets, S.; Zardoshti, N.; Zarochentsev, A.; Závada, P.; Zaviyalov, N.; Zbroszczyk, H.; Zgura, I. S.; Zhalov, M.; Zhang, H.; Zhang, X.; Zhang, Y.; Zhang, C.; Zhang, Z.; Zhao, C.; Zhigareva, N.; Zhou, D.; Zhou, Y.; Zhou, Z.; Zhu, H.; Zhu, J.; Zichichi, A.; Zimmermann, A.; Zimmermann, M. B.; Zinovjev, G.; Zyzak, M.
2016-05-01
We present a Bayesian approach to particle identification (PID) within the ALICE experiment. The aim is to more effectively combine the particle identification capabilities of its various detectors. After a brief explanation of the adopted methodology and formalism, the performance of the Bayesian PID approach for charged pions, kaons and protons in the central barrel of ALICE is studied. PID is performed via measurements of specific energy loss ( d E/d x) and time of flight. PID efficiencies and misidentification probabilities are extracted and compared with Monte Carlo simulations using high-purity samples of identified particles in the decay channels K0S → π-π+, φ→ K-K+, and Λ→ p π- in p-Pb collisions at √{s_{NN}}=5.02 TeV. In order to thoroughly assess the validity of the Bayesian approach, this methodology was used to obtain corrected pT spectra of pions, kaons, protons, and D0 mesons in pp collisions at √{s}=7 TeV. In all cases, the results using Bayesian PID were found to be consistent with previous measurements performed by ALICE using a standard PID approach. For the measurement of D0 → K-π+, it was found that a Bayesian PID approach gave a higher signal-to-background ratio and a similar or larger statistical significance when compared with standard PID selections, despite a reduced identification efficiency. Finally, we present an exploratory study of the measurement of Λc+ → p K-π+ in pp collisions at √{s}=7 TeV, using the Bayesian approach for the identification of its decay products.
Esposito, Michael H.; Lee, Hedwig; Hicken, Margret T.; Porter, Lauren C.; Herting, Jerald R.
2017-01-01
A rapidly growing literature has documented the adverse social, economic and, recently, health impacts of experiencing incarceration in the United States. Despite the insights that this work has provided in consistently documenting the deleterious effects of incarceration, little is known about the specific timing of criminal justice contact and early health consequences during the transition from adolescence to adulthood—a critical period in the life course, particularly for the development of poor health. Previous literature on the role of incarceration has also been hampered by the difficulties of parsing out the influence that incarceration exerts on health from the social and economic confounding forces that are linked to both criminal justice contact and health. This paper addresses these two gaps in the literature by examining the association between incarceration and health in the United States during the transition to adulthood, and by using an analytic approach that better isolates the association of incarceration with health from the multitude of confounders which could be alternatively driving this association. In this endeavor, we make use of variable-rich data from The National Longitudinal Study of Adolescent to Adult Health (n = 10,785) and a non-parametric Bayesian machine learning technique- Bayesian Additive Regression Trees. Our results suggest that the experience of incarceration at this stage of the life course increases the probability of depression, adversely affects the perception of general health status, but has no effect on the probability of developing hypertension in early adulthood. These findings signal that incarceration in emerging adulthood is an important stressor that can have immediate implications for mental and general health in early adulthood, and may help to explain long lasting implications incarceration has for health across the life course. PMID:28781613
Estimating extreme river discharges in Europe through a Bayesian network
NASA Astrophysics Data System (ADS)
Paprotny, Dominik; Morales-Nápoles, Oswaldo
2017-06-01
Large-scale hydrological modelling of flood hazards requires adequate extreme discharge data. In practise, models based on physics are applied alongside those utilizing only statistical analysis. The former require enormous computational power, while the latter are mostly limited in accuracy and spatial coverage. In this paper we introduce an alternate, statistical approach based on Bayesian networks (BNs), a graphical model for dependent random variables. We use a non-parametric BN to describe the joint distribution of extreme discharges in European rivers and variables representing the geographical characteristics of their catchments. Annual maxima of daily discharges from more than 1800 river gauges (stations with catchment areas ranging from 1.4 to 807 000 km2) were collected, together with information on terrain, land use and local climate. The (conditional) correlations between the variables are modelled through copulas, with the dependency structure defined in the network. The results show that using this method, mean annual maxima and return periods of discharges could be estimated with an accuracy similar to existing studies using physical models for Europe and better than a comparable global statistical model. Performance of the model varies slightly between regions of Europe, but is consistent between different time periods, and remains the same in a split-sample validation. Though discharge prediction under climate change is not the main scope of this paper, the BN was applied to a large domain covering all sizes of rivers in the continent both for present and future climate, as an example. Results show substantial variation in the influence of climate change on river discharges. The model can be used to provide quick estimates of extreme discharges at any location for the purpose of obtaining input information for hydraulic modelling.
Dalle Carbonare, S; Folli, F; Patrini, E; Giudici, P; Bellazzi, R
2013-01-01
The increasing demand of health care services and the complexity of health care delivery require Health Care Organizations (HCOs) to approach clinical risk management through proper methods and tools. An important aspect of risk management is to exploit the analysis of medical injuries compensation claims in order to reduce adverse events and, at the same time, to optimize the costs of health insurance policies. This work provides a probabilistic method to estimate the risk level of a HCO by computing quantitative risk indexes from medical injury compensation claims. Our method is based on the estimate of a loss probability distribution from compensation claims data through parametric and non-parametric modeling and Monte Carlo simulations. The loss distribution can be estimated both on the whole dataset and, thanks to the application of a Bayesian hierarchical model, on stratified data. The approach allows to quantitatively assessing the risk structure of the HCO by analyzing the loss distribution and deriving its expected value and percentiles. We applied the proposed method to 206 cases of injuries with compensation requests collected from 1999 to the first semester of 2007 by the HCO of Lodi, in the Northern part of Italy. We computed the risk indexes taking into account the different clinical departments and the different hospitals involved. The approach proved to be useful to understand the HCO risk structure in terms of frequency, severity, expected and unexpected loss related to adverse events.
Surface Estimation, Variable Selection, and the Nonparametric Oracle Property.
Storlie, Curtis B; Bondell, Howard D; Reich, Brian J; Zhang, Hao Helen
2011-04-01
Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting.
Surface Estimation, Variable Selection, and the Nonparametric Oracle Property
Storlie, Curtis B.; Bondell, Howard D.; Reich, Brian J.; Zhang, Hao Helen
2010-01-01
Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting. PMID:21603586
Zhao, Zhibiao
2011-06-01
We address the nonparametric model validation problem for hidden Markov models with partially observable variables and hidden states. We achieve this goal by constructing a nonparametric simultaneous confidence envelope for transition density function of the observable variables and checking whether the parametric density estimate is contained within such an envelope. Our specification test procedure is motivated by a functional connection between the transition density of the observable variables and the Markov transition kernel of the hidden states. Our approach is applicable for continuous time diffusion models, stochastic volatility models, nonlinear time series models, and models with market microstructure noise.
Kruschke, John K; Liddell, Torrin M
2018-02-01
In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.
NASA Astrophysics Data System (ADS)
Gogu, C.; Haftka, R.; LeRiche, R.; Molimard, J.; Vautrin, A.; Sankar, B.
2008-11-01
The basic formulation of the least squares method, based on the L2 norm of the misfit, is still widely used today for identifying elastic material properties from experimental data. An alternative statistical approach is the Bayesian method. We seek here situations with significant difference between the material properties found by the two methods. For a simple three bar truss example we illustrate three such situations in which the Bayesian approach leads to more accurate results: different magnitude of the measurements, different uncertainty in the measurements and correlation among measurements. When all three effects add up, the Bayesian approach can have a large advantage. We then compared the two methods for identification of elastic constants from plate vibration natural frequencies.
Gaussian process regression for forecasting battery state of health
NASA Astrophysics Data System (ADS)
Richardson, Robert R.; Osborne, Michael A.; Howey, David A.
2017-07-01
Accurately predicting the future capacity and remaining useful life of batteries is necessary to ensure reliable system operation and to minimise maintenance costs. The complex nature of battery degradation has meant that mechanistic modelling of capacity fade has thus far remained intractable; however, with the advent of cloud-connected devices, data from cells in various applications is becoming increasingly available, and the feasibility of data-driven methods for battery prognostics is increasing. Here we propose Gaussian process (GP) regression for forecasting battery state of health, and highlight various advantages of GPs over other data-driven and mechanistic approaches. GPs are a type of Bayesian non-parametric method, and hence can model complex systems whilst handling uncertainty in a principled manner. Prior information can be exploited by GPs in a variety of ways: explicit mean functions can be used if the functional form of the underlying degradation model is available, and multiple-output GPs can effectively exploit correlations between data from different cells. We demonstrate the predictive capability of GPs for short-term and long-term (remaining useful life) forecasting on a selection of capacity vs. cycle datasets from lithium-ion cells.
Bayesian deconvolution of [corrected] fMRI data using bilinear dynamical systems.
Makni, Salima; Beckmann, Christian; Smith, Steve; Woolrich, Mark
2008-10-01
In Penny et al. [Penny, W., Ghahramani, Z., Friston, K.J. 2005. Bilinear dynamical systems. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360(1457) 983-993], a particular case of the Linear Dynamical Systems (LDSs) was used to model the dynamic behavior of the BOLD response in functional MRI. This state-space model, called bilinear dynamical system (BDS), is used to deconvolve the fMRI time series in order to estimate the neuronal response induced by the different stimuli of the experimental paradigm. The BDS model parameters are estimated using an expectation-maximization (EM) algorithm proposed by Ghahramani and Hinton [Ghahramani, Z., Hinton, G.E. 1996. Parameter Estimation for Linear Dynamical Systems. Technical Report, Department of Computer Science, University of Toronto]. In this paper we introduce modifications to the BDS model in order to explicitly model the spatial variations of the haemodynamic response function (HRF) in the brain using a non-parametric approach. While in Penny et al. [Penny, W., Ghahramani, Z., Friston, K.J. 2005. Bilinear dynamical systems. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360(1457) 983-993] the relationship between neuronal activation and fMRI signals is formulated as a first-order convolution with a kernel expansion using basis functions (typically two or three), in this paper, we argue in favor of a spatially adaptive GLM in which a local non-parametric estimation of the HRF is performed. Furthermore, in order to overcome the overfitting problem typically associated with simple EM estimates, we propose a full Variational Bayes (VB) solution to infer the BDS model parameters. We demonstrate the usefulness of our model which is able to estimate both the neuronal activity and the haemodynamic response function in every voxel of the brain. We first examine the behavior of this approach when applied to simulated data with different temporal and noise features. As an example we will show how this method can be used to improve interpretability of estimates from an independent component analysis (ICA) analysis of fMRI data. We finally demonstrate its use on real fMRI data in one slice of the brain.
Bayesian generalized linear mixed modeling of Tuberculosis using informative priors.
Ojo, Oluwatobi Blessing; Lougue, Siaka; Woldegerima, Woldegebriel Assefa
2017-01-01
TB is rated as one of the world's deadliest diseases and South Africa ranks 9th out of the 22 countries with hardest hit of TB. Although many pieces of research have been carried out on this subject, this paper steps further by inculcating past knowledge into the model, using Bayesian approach with informative prior. Bayesian statistics approach is getting popular in data analyses. But, most applications of Bayesian inference technique are limited to situations of non-informative prior, where there is no solid external information about the distribution of the parameter of interest. The main aim of this study is to profile people living with TB in South Africa. In this paper, identical regression models are fitted for classical and Bayesian approach both with non-informative and informative prior, using South Africa General Household Survey (GHS) data for the year 2014. For the Bayesian model with informative prior, South Africa General Household Survey dataset for the year 2011 to 2013 are used to set up priors for the model 2014.
Daniel Goodman’s empirical approach to Bayesian statistics
Gerrodette, Tim; Ward, Eric; Taylor, Rebecca L.; Schwarz, Lisa K.; Eguchi, Tomoharu; Wade, Paul; Himes Boor, Gina
2016-01-01
Bayesian statistics, in contrast to classical statistics, uses probability to represent uncertainty about the state of knowledge. Bayesian statistics has often been associated with the idea that knowledge is subjective and that a probability distribution represents a personal degree of belief. Dr. Daniel Goodman considered this viewpoint problematic for issues of public policy. He sought to ground his Bayesian approach in data, and advocated the construction of a prior as an empirical histogram of “similar” cases. In this way, the posterior distribution that results from a Bayesian analysis combined comparable previous data with case-specific current data, using Bayes’ formula. Goodman championed such a data-based approach, but he acknowledged that it was difficult in practice. If based on a true representation of our knowledge and uncertainty, Goodman argued that risk assessment and decision-making could be an exact science, despite the uncertainties. In his view, Bayesian statistics is a critical component of this science because a Bayesian analysis produces the probabilities of future outcomes. Indeed, Goodman maintained that the Bayesian machinery, following the rules of conditional probability, offered the best legitimate inference from available data. We give an example of an informative prior in a recent study of Steller sea lion spatial use patterns in Alaska.
Numerical study on the sequential Bayesian approach for radioactive materials detection
NASA Astrophysics Data System (ADS)
Qingpei, Xiang; Dongfeng, Tian; Jianyu, Zhu; Fanhua, Hao; Ge, Ding; Jun, Zeng
2013-01-01
A new detection method, based on the sequential Bayesian approach proposed by Candy et al., offers new horizons for the research of radioactive detection. Compared with the commonly adopted detection methods incorporated with statistical theory, the sequential Bayesian approach offers the advantages of shorter verification time during the analysis of spectra that contain low total counts, especially in complex radionuclide components. In this paper, a simulation experiment platform implanted with the methodology of sequential Bayesian approach was developed. Events sequences of γ-rays associating with the true parameters of a LaBr3(Ce) detector were obtained based on an events sequence generator using Monte Carlo sampling theory to study the performance of the sequential Bayesian approach. The numerical experimental results are in accordance with those of Candy. Moreover, the relationship between the detection model and the event generator, respectively represented by the expected detection rate (Am) and the tested detection rate (Gm) parameters, is investigated. To achieve an optimal performance for this processor, the interval of the tested detection rate as a function of the expected detection rate is also presented.
Prior approval: the growth of Bayesian methods in psychology.
Andrews, Mark; Baguley, Thom
2013-02-01
Within the last few years, Bayesian methods of data analysis in psychology have proliferated. In this paper, we briefly review the history or the Bayesian approach to statistics, and consider the implications that Bayesian methods have for the theory and practice of data analysis in psychology.
2017-09-01
efficacy of statistical post-processing methods downstream of these dynamical model components with a hierarchical multivariate Bayesian approach to...Bayesian hierarchical modeling, Markov chain Monte Carlo methods , Metropolis algorithm, machine learning, atmospheric prediction 15. NUMBER OF PAGES...scale processes. However, this dissertation explores the efficacy of statistical post-processing methods downstream of these dynamical model components
A Bayesian Approach to Person Fit Analysis in Item Response Theory Models. Research Report.
ERIC Educational Resources Information Center
Glas, Cees A. W.; Meijer, Rob R.
A Bayesian approach to the evaluation of person fit in item response theory (IRT) models is presented. In a posterior predictive check, the observed value on a discrepancy variable is positioned in its posterior distribution. In a Bayesian framework, a Markov Chain Monte Carlo procedure can be used to generate samples of the posterior distribution…
ERIC Educational Resources Information Center
Wang, Lijuan; McArdle, John J.
2008-01-01
The main purpose of this research is to evaluate the performance of a Bayesian approach for estimating unknown change points using Monte Carlo simulations. The univariate and bivariate unknown change point mixed models were presented and the basic idea of the Bayesian approach for estimating the models was discussed. The performance of Bayesian…
Bayesian networks improve causal environmental ...
Rule-based weight of evidence approaches to ecological risk assessment may not account for uncertainties and generally lack probabilistic integration of lines of evidence. Bayesian networks allow causal inferences to be made from evidence by including causal knowledge about the problem, using this knowledge with probabilistic calculus to combine multiple lines of evidence, and minimizing biases in predicting or diagnosing causal relationships. Too often, sources of uncertainty in conventional weight of evidence approaches are ignored that can be accounted for with Bayesian networks. Specifying and propagating uncertainties improve the ability of models to incorporate strength of the evidence in the risk management phase of an assessment. Probabilistic inference from a Bayesian network allows evaluation of changes in uncertainty for variables from the evidence. The network structure and probabilistic framework of a Bayesian approach provide advantages over qualitative approaches in weight of evidence for capturing the impacts of multiple sources of quantifiable uncertainty on predictions of ecological risk. Bayesian networks can facilitate the development of evidence-based policy under conditions of uncertainty by incorporating analytical inaccuracies or the implications of imperfect information, structuring and communicating causal issues through qualitative directed graph formulations, and quantitatively comparing the causal power of multiple stressors on value
MUMPS Based Integration of Disparate Computer-Assisted Medical Diagnosis Modules
1989-12-12
modules use a Bayesian approach, while the Opthalmology module uses a Rule Based approach. In the current effort, MUMPS is used to develop an...Abdominal and Chest Pain modules use a Bayesian approach, while the Opthalmology module uses a Rule Based approach. In the current effort, MUMPS is used
Nonparametric model validations for hidden Markov models with applications in financial econometrics
Zhao, Zhibiao
2011-01-01
We address the nonparametric model validation problem for hidden Markov models with partially observable variables and hidden states. We achieve this goal by constructing a nonparametric simultaneous confidence envelope for transition density function of the observable variables and checking whether the parametric density estimate is contained within such an envelope. Our specification test procedure is motivated by a functional connection between the transition density of the observable variables and the Markov transition kernel of the hidden states. Our approach is applicable for continuous time diffusion models, stochastic volatility models, nonlinear time series models, and models with market microstructure noise. PMID:21750601
Bayesian Networks Improve Causal Environmental Assessments for Evidence-Based Policy.
Carriger, John F; Barron, Mace G; Newman, Michael C
2016-12-20
Rule-based weight of evidence approaches to ecological risk assessment may not account for uncertainties and generally lack probabilistic integration of lines of evidence. Bayesian networks allow causal inferences to be made from evidence by including causal knowledge about the problem, using this knowledge with probabilistic calculus to combine multiple lines of evidence, and minimizing biases in predicting or diagnosing causal relationships. Too often, sources of uncertainty in conventional weight of evidence approaches are ignored that can be accounted for with Bayesian networks. Specifying and propagating uncertainties improve the ability of models to incorporate strength of the evidence in the risk management phase of an assessment. Probabilistic inference from a Bayesian network allows evaluation of changes in uncertainty for variables from the evidence. The network structure and probabilistic framework of a Bayesian approach provide advantages over qualitative approaches in weight of evidence for capturing the impacts of multiple sources of quantifiable uncertainty on predictions of ecological risk. Bayesian networks can facilitate the development of evidence-based policy under conditions of uncertainty by incorporating analytical inaccuracies or the implications of imperfect information, structuring and communicating causal issues through qualitative directed graph formulations, and quantitatively comparing the causal power of multiple stressors on valued ecological resources. These aspects are demonstrated through hypothetical problem scenarios that explore some major benefits of using Bayesian networks for reasoning and making inferences in evidence-based policy.
Bivariate discrete beta Kernel graduation of mortality data.
Mazza, Angelo; Punzo, Antonio
2015-07-01
Various parametric/nonparametric techniques have been proposed in literature to graduate mortality data as a function of age. Nonparametric approaches, as for example kernel smoothing regression, are often preferred because they do not assume any particular mortality law. Among the existing kernel smoothing approaches, the recently proposed (univariate) discrete beta kernel smoother has been shown to provide some benefits. Bivariate graduation, over age and calendar years or durations, is common practice in demography and actuarial sciences. In this paper, we generalize the discrete beta kernel smoother to the bivariate case, and we introduce an adaptive bandwidth variant that may provide additional benefits when data on exposures to the risk of death are available; furthermore, we outline a cross-validation procedure for bandwidths selection. Using simulations studies, we compare the bivariate approach proposed here with its corresponding univariate formulation and with two popular nonparametric bivariate graduation techniques, based on Epanechnikov kernels and on P-splines. To make simulations realistic, a bivariate dataset, based on probabilities of dying recorded for the US males, is used. Simulations have confirmed the gain in performance of the new bivariate approach with respect to both the univariate and the bivariate competitors.
NASA Astrophysics Data System (ADS)
Prahutama, Alan; Suparti; Wahyu Utami, Tiani
2018-03-01
Regression analysis is an analysis to model the relationship between response variables and predictor variables. The parametric approach to the regression model is very strict with the assumption, but nonparametric regression model isn’t need assumption of model. Time series data is the data of a variable that is observed based on a certain time, so if the time series data wanted to be modeled by regression, then we should determined the response and predictor variables first. Determination of the response variable in time series is variable in t-th (yt), while the predictor variable is a significant lag. In nonparametric regression modeling, one developing approach is to use the Fourier series approach. One of the advantages of nonparametric regression approach using Fourier series is able to overcome data having trigonometric distribution. In modeling using Fourier series needs parameter of K. To determine the number of K can be used Generalized Cross Validation method. In inflation modeling for the transportation sector, communication and financial services using Fourier series yields an optimal K of 120 parameters with R-square 99%. Whereas if it was modeled by multiple linear regression yield R-square 90%.
Dynamic Bayesian network modeling for longitudinal brain morphometry
Chen, Rong; Resnick, Susan M; Davatzikos, Christos; Herskovits, Edward H
2011-01-01
Identifying interactions among brain regions from structural magnetic-resonance images presents one of the major challenges in computational neuroanatomy. We propose a Bayesian data-mining approach to the detection of longitudinal morphological changes in the human brain. Our method uses a dynamic Bayesian network to represent evolving inter-regional dependencies. The major advantage of dynamic Bayesian network modeling is that it can represent complicated interactions among temporal processes. We validated our approach by analyzing a simulated atrophy study, and found that this approach requires only a small number of samples to detect the ground-truth temporal model. We further applied dynamic Bayesian network modeling to a longitudinal study of normal aging and mild cognitive impairment — the Baltimore Longitudinal Study of Aging. We found that interactions among regional volume-change rates for the mild cognitive impairment group are different from those for the normal-aging group. PMID:21963916
Hu, Weiming; Tian, Guodong; Kang, Yongxin; Yuan, Chunfeng; Maybank, Stephen
2017-09-25
In this paper, a new nonparametric Bayesian model called the dual sticky hierarchical Dirichlet process hidden Markov model (HDP-HMM) is proposed for mining activities from a collection of time series data such as trajectories. All the time series data are clustered. Each cluster of time series data, corresponding to a motion pattern, is modeled by an HMM. Our model postulates a set of HMMs that share a common set of states (topics in an analogy with topic models for document processing), but have unique transition distributions. For the application to motion trajectory modeling, topics correspond to motion activities. The learnt topics are clustered into atomic activities which are assigned predicates. We propose a Bayesian inference method to decompose a given trajectory into a sequence of atomic activities. On combining the learnt sources and sinks, semantic motion regions, and the learnt sequence of atomic activities, the action represented by the trajectory can be described in natural language in as automatic a way as possible. The effectiveness of our dual sticky HDP-HMM is validated on several trajectory datasets. The effectiveness of the natural language descriptions for motions is demonstrated on the vehicle trajectories extracted from a traffic scene.
Variations on Bayesian Prediction and Inference
2016-05-09
inference 2.2.1 Background There are a number of statistical inference problems that are not generally formulated via a full probability model...problem of inference about an unknown parameter, the Bayesian approach requires a full probability 1. REPORT DATE (DD-MM-YYYY) 4. TITLE AND...the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood which can be an obstacle
Nonparametric regression applied to quantitative structure-activity relationships
Constans; Hirst
2000-03-01
Several nonparametric regressors have been applied to modeling quantitative structure-activity relationship (QSAR) data. The simplest regressor, the Nadaraya-Watson, was assessed in a genuine multivariate setting. Other regressors, the local linear and the shifted Nadaraya-Watson, were implemented within additive models--a computationally more expedient approach, better suited for low-density designs. Performances were benchmarked against the nonlinear method of smoothing splines. A linear reference point was provided by multilinear regression (MLR). Variable selection was explored using systematic combinations of different variables and combinations of principal components. For the data set examined, 47 inhibitors of dopamine beta-hydroxylase, the additive nonparametric regressors have greater predictive accuracy (as measured by the mean absolute error of the predictions or the Pearson correlation in cross-validation trails) than MLR. The use of principal components did not improve the performance of the nonparametric regressors over use of the original descriptors, since the original descriptors are not strongly correlated. It remains to be seen if the nonparametric regressors can be successfully coupled with better variable selection and dimensionality reduction in the context of high-dimensional QSARs.
A Bayesian Approach to More Stable Estimates of Group-Level Effects in Contextual Studies.
Zitzmann, Steffen; Lüdtke, Oliver; Robitzsch, Alexander
2015-01-01
Multilevel analyses are often used to estimate the effects of group-level constructs. However, when using aggregated individual data (e.g., student ratings) to assess a group-level construct (e.g., classroom climate), the observed group mean might not provide a reliable measure of the unobserved latent group mean. In the present article, we propose a Bayesian approach that can be used to estimate a multilevel latent covariate model, which corrects for the unreliable assessment of the latent group mean when estimating the group-level effect. A simulation study was conducted to evaluate the choice of different priors for the group-level variance of the predictor variable and to compare the Bayesian approach with the maximum likelihood approach implemented in the software Mplus. Results showed that, under problematic conditions (i.e., small number of groups, predictor variable with a small ICC), the Bayesian approach produced more accurate estimates of the group-level effect than the maximum likelihood approach did.
Bayesian generalized linear mixed modeling of Tuberculosis using informative priors
Woldegerima, Woldegebriel Assefa
2017-01-01
TB is rated as one of the world’s deadliest diseases and South Africa ranks 9th out of the 22 countries with hardest hit of TB. Although many pieces of research have been carried out on this subject, this paper steps further by inculcating past knowledge into the model, using Bayesian approach with informative prior. Bayesian statistics approach is getting popular in data analyses. But, most applications of Bayesian inference technique are limited to situations of non-informative prior, where there is no solid external information about the distribution of the parameter of interest. The main aim of this study is to profile people living with TB in South Africa. In this paper, identical regression models are fitted for classical and Bayesian approach both with non-informative and informative prior, using South Africa General Household Survey (GHS) data for the year 2014. For the Bayesian model with informative prior, South Africa General Household Survey dataset for the year 2011 to 2013 are used to set up priors for the model 2014. PMID:28257437
CytoBayesJ: software tools for Bayesian analysis of cytogenetic radiation dosimetry data.
Ainsbury, Elizabeth A; Vinnikov, Volodymyr; Puig, Pedro; Maznyk, Nataliya; Rothkamm, Kai; Lloyd, David C
2013-08-30
A number of authors have suggested that a Bayesian approach may be most appropriate for analysis of cytogenetic radiation dosimetry data. In the Bayesian framework, probability of an event is described in terms of previous expectations and uncertainty. Previously existing, or prior, information is used in combination with experimental results to infer probabilities or the likelihood that a hypothesis is true. It has been shown that the Bayesian approach increases both the accuracy and quality assurance of radiation dose estimates. New software entitled CytoBayesJ has been developed with the aim of bringing Bayesian analysis to cytogenetic biodosimetry laboratory practice. CytoBayesJ takes a number of Bayesian or 'Bayesian like' methods that have been proposed in the literature and presents them to the user in the form of simple user-friendly tools, including testing for the most appropriate model for distribution of chromosome aberrations and calculations of posterior probability distributions. The individual tools are described in detail and relevant examples of the use of the methods and the corresponding CytoBayesJ software tools are given. In this way, the suitability of the Bayesian approach to biological radiation dosimetry is highlighted and its wider application encouraged by providing a user-friendly software interface and manual in English and Russian. Copyright © 2013 Elsevier B.V. All rights reserved.
Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall
2016-01-01
Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis discusses also several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach is (1) much more flexible in terms of modeling effort, and (2) it offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.
Fienen, Michael N.; D'Oria, Marco; Doherty, John E.; Hunt, Randall J.
2013-01-01
The application bgaPEST is a highly parameterized inversion software package implementing the Bayesian Geostatistical Approach in a framework compatible with the parameter estimation suite PEST. Highly parameterized inversion refers to cases in which parameters are distributed in space or time and are correlated with one another. The Bayesian aspect of bgaPEST is related to Bayesian probability theory in which prior information about parameters is formally revised on the basis of the calibration dataset used for the inversion. Conceptually, this approach formalizes the conditionality of estimated parameters on the specific data and model available. The geostatistical component of the method refers to the way in which prior information about the parameters is used. A geostatistical autocorrelation function is used to enforce structure on the parameters to avoid overfitting and unrealistic results. Bayesian Geostatistical Approach is designed to provide the smoothest solution that is consistent with the data. Optionally, users can specify a level of fit or estimate a balance between fit and model complexity informed by the data. Groundwater and surface-water applications are used as examples in this text, but the possible uses of bgaPEST extend to any distributed parameter applications.
The Bayesian approach to reporting GSR analysis results: some first-hand experiences
NASA Astrophysics Data System (ADS)
Charles, Sebastien; Nys, Bart
2010-06-01
The use of Bayesian principles in the reporting of forensic findings has been a matter of interest for some years. Recently, also the GSR community is gradually exploring the advantages of this method, or rather approach, for writing reports. Since last year, our GSR group is adapting reporting procedures to the use of Bayesian principles. The police and magistrates find the reports more directly accessible and useful in their part of the criminal investigation. In the lab we find that, through applying the Bayesian principles, unnecessary analyses can be eliminated and thus time can be freed on the instruments.
Cox regression analysis with missing covariates via nonparametric multiple imputation.
Hsu, Chiu-Hsieh; Yu, Mandi
2018-01-01
We consider the situation of estimating Cox regression in which some covariates are subject to missing, and there exists additional information (including observed event time, censoring indicator and fully observed covariates) which may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missing probabilities. For each missing covariate observation, these two working models are used to define a nearest neighbor imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon the completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and the predictive mean matching imputation (PMM) method. We show that all approaches can reduce bias due to non-ignorable missing mechanism. The proposed nonparametric imputation method is robust to mis-specification of either one of the two working models and robust to mis-specification of the link function of the two working models. In contrast, the PMM method is sensitive to misspecification of the covariates included in imputation. The AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from Surveillance, Epidemiology and End Results (SEER) Program.
Bayesian Exploratory Factor Analysis
Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.; Piatek, Rémi
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates from a high dimensional set of psychological measurements. PMID:25431517
NASA Astrophysics Data System (ADS)
Berliner, M.
2017-12-01
Bayesian statistical decision theory offers a natural framework for decision-policy making in the presence of uncertainty. Key advantages of the approach include efficient incorporation of information and observations. However, in complicated settings it is very difficult, perhaps essentially impossible, to formalize the mathematical inputs needed in the approach. Nevertheless, using the approach as a template is useful for decision support; that is, organizing and communicating our analyses. Bayesian hierarchical modeling is valuable in quantifying and managing uncertainty such cases. I review some aspects of the idea emphasizing statistical model development and use in the context of sea-level rise.
Is probabilistic bias analysis approximately Bayesian?
MacLehose, Richard F.; Gustafson, Paul
2011-01-01
Case-control studies are particularly susceptible to differential exposure misclassification when exposure status is determined following incident case status. Probabilistic bias analysis methods have been developed as ways to adjust standard effect estimates based on the sensitivity and specificity of exposure misclassification. The iterative sampling method advocated in probabilistic bias analysis bears a distinct resemblance to a Bayesian adjustment; however, it is not identical. Furthermore, without a formal theoretical framework (Bayesian or frequentist), the results of a probabilistic bias analysis remain somewhat difficult to interpret. We describe, both theoretically and empirically, the extent to which probabilistic bias analysis can be viewed as approximately Bayesian. While the differences between probabilistic bias analysis and Bayesian approaches to misclassification can be substantial, these situations often involve unrealistic prior specifications and are relatively easy to detect. Outside of these special cases, probabilistic bias analysis and Bayesian approaches to exposure misclassification in case-control studies appear to perform equally well. PMID:22157311
Molitor, John
2012-03-01
Bayesian methods have seen an increase in popularity in a wide variety of scientific fields, including epidemiology. One of the main reasons for their widespread application is the power of the Markov chain Monte Carlo (MCMC) techniques generally used to fit these models. As a result, researchers often implicitly associate Bayesian models with MCMC estimation procedures. However, Bayesian models do not always require Markov-chain-based methods for parameter estimation. This is important, as MCMC estimation methods, while generally quite powerful, are complex and computationally expensive and suffer from convergence problems related to the manner in which they generate correlated samples used to estimate probability distributions for parameters of interest. In this issue of the Journal, Cole et al. (Am J Epidemiol. 2012;175(5):368-375) present an interesting paper that discusses non-Markov-chain-based approaches to fitting Bayesian models. These methods, though limited, can overcome some of the problems associated with MCMC techniques and promise to provide simpler approaches to fitting Bayesian models. Applied researchers will find these estimation approaches intuitively appealing and will gain a deeper understanding of Bayesian models through their use. However, readers should be aware that other non-Markov-chain-based methods are currently in active development and have been widely published in other fields.
The Psychology of Bayesian Reasoning
2014-10-21
The psychology of Bayesian reasoning David R. Mandel* Socio-Cognitive Systems Section, Defence Research and Development Canada and Department...belief revision, subjective probability, human judgment, psychological methods. Most psychological research on Bayesian reasoning since the 1970s has...attention to some important problems with the conventional approach to studying Bayesian reasoning in psychology that has been dominant since the
Bayesian Just-So Stories in Psychology and Neuroscience
ERIC Educational Resources Information Center
Bowers, Jeffrey S.; Davis, Colin J.
2012-01-01
According to Bayesian theories in psychology and neuroscience, minds and brains are (near) optimal in solving a wide range of tasks. We challenge this view and argue that more traditional, non-Bayesian approaches are more promising. We make 3 main arguments. First, we show that the empirical evidence for Bayesian theories in psychology is weak.…
Teaching Bayesian Statistics in a Health Research Methodology Program
ERIC Educational Resources Information Center
Pullenayegum, Eleanor M.; Thabane, Lehana
2009-01-01
Despite the appeal of Bayesian methods in health research, they are not widely used. This is partly due to a lack of courses in Bayesian methods at an appropriate level for non-statisticians in health research. Teaching such a course can be challenging because most statisticians have been taught Bayesian methods using a mathematical approach, and…
Model-free quantification of dynamic PET data using nonparametric deconvolution
Zanderigo, Francesca; Parsey, Ramin V; Todd Ogden, R
2015-01-01
Dynamic positron emission tomography (PET) data are usually quantified using compartment models (CMs) or derived graphical approaches. Often, however, CMs either do not properly describe the tracer kinetics, or are not identifiable, leading to nonphysiologic estimates of the tracer binding. The PET data are modeled as the convolution of the metabolite-corrected input function and the tracer impulse response function (IRF) in the tissue. Using nonparametric deconvolution methods, it is possible to obtain model-free estimates of the IRF, from which functionals related to tracer volume of distribution and binding may be computed, but this approach has rarely been applied in PET. Here, we apply nonparametric deconvolution using singular value decomposition to simulated and test–retest clinical PET data with four reversible tracers well characterized by CMs ([11C]CUMI-101, [11C]DASB, [11C]PE2I, and [11C]WAY-100635), and systematically compare reproducibility, reliability, and identifiability of various IRF-derived functionals with that of traditional CMs outcomes. Results show that nonparametric deconvolution, completely free of any model assumptions, allows for estimates of tracer volume of distribution and binding that are very close to the estimates obtained with CMs and, in some cases, show better test–retest performance than CMs outcomes. PMID:25873427
A Sparse Bayesian Approach for Forward-Looking Superresolution Radar Imaging
Zhang, Yin; Zhang, Yongchao; Huang, Yulin; Yang, Jianyu
2017-01-01
This paper presents a sparse superresolution approach for high cross-range resolution imaging of forward-looking scanning radar based on the Bayesian criterion. First, a novel forward-looking signal model is established as the product of the measurement matrix and the cross-range target distribution, which is more accurate than the conventional convolution model. Then, based on the Bayesian criterion, the widely-used sparse regularization is considered as the penalty term to recover the target distribution. The derivation of the cost function is described, and finally, an iterative expression for minimizing this function is presented. Alternatively, this paper discusses how to estimate the single parameter of Gaussian noise. With the advantage of a more accurate model, the proposed sparse Bayesian approach enjoys a lower model error. Meanwhile, when compared with the conventional superresolution methods, the proposed approach shows high cross-range resolution and small location error. The superresolution results for the simulated point target, scene data, and real measured data are presented to demonstrate the superior performance of the proposed approach. PMID:28604583
Sparse Event Modeling with Hierarchical Bayesian Kernel Methods
2016-01-05
SECURITY CLASSIFICATION OF: The research objective of this proposal was to develop a predictive Bayesian kernel approach to model count data based on...several predictive variables. Such an approach, which we refer to as the Poisson Bayesian kernel model , is able to model the rate of occurrence of...which adds specificity to the model and can make nonlinear data more manageable. Early results show that the 1. REPORT DATE (DD-MM-YYYY) 4. TITLE
Identification of transmissivity fields using a Bayesian strategy and perturbative approach
NASA Astrophysics Data System (ADS)
Zanini, Andrea; Tanda, Maria Giovanna; Woodbury, Allan D.
2017-10-01
The paper deals with the crucial problem of the groundwater parameter estimation that is the basis for efficient modeling and reclamation activities. A hierarchical Bayesian approach is developed: it uses the Akaike's Bayesian Information Criteria in order to estimate the hyperparameters (related to the covariance model chosen) and to quantify the unknown noise variance. The transmissivity identification proceeds in two steps: the first, called empirical Bayesian interpolation, uses Y* (Y = lnT) observations to interpolate Y values on a specified grid; the second, called empirical Bayesian update, improve the previous Y estimate through the addition of hydraulic head observations. The relationship between the head and the lnT has been linearized through a perturbative solution of the flow equation. In order to test the proposed approach, synthetic aquifers from literature have been considered. The aquifers in question contain a variety of boundary conditions (both Dirichelet and Neuman type) and scales of heterogeneities (σY2 = 1.0 and σY2 = 5.3). The estimated transmissivity fields were compared to the true one. The joint use of Y* and head measurements improves the estimation of Y considering both degrees of heterogeneity. Even if the variance of the strong transmissivity field can be considered high for the application of the perturbative approach, the results show the same order of approximation of the non-linear methods proposed in literature. The procedure allows to compute the posterior probability distribution of the target quantities and to quantify the uncertainty in the model prediction. Bayesian updating has advantages related both to the Monte-Carlo (MC) and non-MC approaches. In fact, as the MC methods, Bayesian updating allows computing the direct posterior probability distribution of the target quantities and as non-MC methods it has computational times in the order of seconds.
A Bayesian Approach for Image Segmentation with Shape Priors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Hang; Yang, Qing; Parvin, Bahram
2008-06-20
Color and texture have been widely used in image segmentation; however, their performance is often hindered by scene ambiguities, overlapping objects, or missingparts. In this paper, we propose an interactive image segmentation approach with shape prior models within a Bayesian framework. Interactive features, through mouse strokes, reduce ambiguities, and the incorporation of shape priors enhances quality of the segmentation where color and/or texture are not solely adequate. The novelties of our approach are in (i) formulating the segmentation problem in a well-de?ned Bayesian framework with multiple shape priors, (ii) ef?ciently estimating parameters of the Bayesian model, and (iii) multi-object segmentationmore » through user-speci?ed priors. We demonstrate the effectiveness of our method on a set of natural and synthetic images.« less
Editorial: Bayesian benefits for child psychology and psychiatry researchers.
Oldehinkel, Albertine J
2016-09-01
For many scientists, performing statistical tests has become an almost automated routine. However, p-values are frequently used and interpreted incorrectly; and even when used appropriately, p-values tend to provide answers that do not match researchers' questions and hypotheses well. Bayesian statistics present an elegant and often more suitable alternative. The Bayesian approach has rarely been applied in child psychology and psychiatry research so far, but the development of user-friendly software packages and tutorials has placed it well within reach now. Because Bayesian analyses require a more refined definition of hypothesized probabilities of possible outcomes than the classical approach, going Bayesian may offer the additional benefit of sparkling the development and refinement of theoretical models in our field. © 2016 Association for Child and Adolescent Mental Health.
Bayesian randomized clinical trials: From fixed to adaptive design.
Yin, Guosheng; Lam, Chi Kin; Shi, Haolun
2017-08-01
Randomized controlled studies are the gold standard for phase III clinical trials. Using α-spending functions to control the overall type I error rate, group sequential methods are well established and have been dominating phase III studies. Bayesian randomized design, on the other hand, can be viewed as a complement instead of competitive approach to the frequentist methods. For the fixed Bayesian design, the hypothesis testing can be cast in the posterior probability or Bayes factor framework, which has a direct link to the frequentist type I error rate. Bayesian group sequential design relies upon Bayesian decision-theoretic approaches based on backward induction, which is often computationally intensive. Compared with the frequentist approaches, Bayesian methods have several advantages. The posterior predictive probability serves as a useful and convenient tool for trial monitoring, and can be updated at any time as the data accrue during the trial. The Bayesian decision-theoretic framework possesses a direct link to the decision making in the practical setting, and can be modeled more realistically to reflect the actual cost-benefit analysis during the drug development process. Other merits include the possibility of hierarchical modeling and the use of informative priors, which would lead to a more comprehensive utilization of information from both historical and longitudinal data. From fixed to adaptive design, we focus on Bayesian randomized controlled clinical trials and make extensive comparisons with frequentist counterparts through numerical studies. Copyright © 2017 Elsevier Inc. All rights reserved.
A local approach for focussed Bayesian fusion
NASA Astrophysics Data System (ADS)
Sander, Jennifer; Heizmann, Michael; Goussev, Igor; Beyerer, Jürgen
2009-04-01
Local Bayesian fusion approaches aim to reduce high storage and computational costs of Bayesian fusion which is separated from fixed modeling assumptions. Using the small world formalism, we argue why this proceeding is conform with Bayesian theory. Then, we concentrate on the realization of local Bayesian fusion by focussing the fusion process solely on local regions that are task relevant with a high probability. The resulting local models correspond then to restricted versions of the original one. In a previous publication, we used bounds for the probability of misleading evidence to show the validity of the pre-evaluation of task specific knowledge and prior information which we perform to build local models. In this paper, we prove the validity of this proceeding using information theoretic arguments. For additional efficiency, local Bayesian fusion can be realized in a distributed manner. Here, several local Bayesian fusion tasks are evaluated and unified after the actual fusion process. For the practical realization of distributed local Bayesian fusion, software agents are predestinated. There is a natural analogy between the resulting agent based architecture and criminal investigations in real life. We show how this analogy can be used to improve the efficiency of distributed local Bayesian fusion additionally. Using a landscape model, we present an experimental study of distributed local Bayesian fusion in the field of reconnaissance, which highlights its high potential.
ERIC Educational Resources Information Center
Yuan, Ying; MacKinnon, David P.
2009-01-01
In this article, we propose Bayesian analysis of mediation effects. Compared with conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian…
Bayesian data analysis for newcomers.
Kruschke, John K; Liddell, Torrin M
2018-02-01
This article explains the foundational concepts of Bayesian data analysis using virtually no mathematical notation. Bayesian ideas already match your intuitions from everyday reasoning and from traditional data analysis. Simple examples of Bayesian data analysis are presented that illustrate how the information delivered by a Bayesian analysis can be directly interpreted. Bayesian approaches to null-value assessment are discussed. The article clarifies misconceptions about Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions and explain how they are not a liability but an important asset. We discuss the relation of Bayesian data analysis to Bayesian models of mind, and we briefly discuss what methodological problems Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear sense of how Bayesian data analysis works and the sort of information it delivers, and why that information is so intuitive and useful for drawing conclusions from data.
Zonta, Zivko J; Flotats, Xavier; Magrí, Albert
2014-08-01
The procedure commonly used for the assessment of the parameters included in activated sludge models (ASMs) relies on the estimation of their optimal value within a confidence region (i.e. frequentist inference). Once optimal values are estimated, parameter uncertainty is computed through the covariance matrix. However, alternative approaches based on the consideration of the model parameters as probability distributions (i.e. Bayesian inference), may be of interest. The aim of this work is to apply (and compare) both Bayesian and frequentist inference methods when assessing uncertainty for an ASM-type model, which considers intracellular storage and biomass growth, simultaneously. Practical identifiability was addressed exclusively considering respirometric profiles based on the oxygen uptake rate and with the aid of probabilistic global sensitivity analysis. Parameter uncertainty was thus estimated according to both the Bayesian and frequentist inferential procedures. Results were compared in order to evidence the strengths and weaknesses of both approaches. Since it was demonstrated that Bayesian inference could be reduced to a frequentist approach under particular hypotheses, the former can be considered as a more generalist methodology. Hence, the use of Bayesian inference is encouraged for tackling inferential issues in ASM environments.
O'Sullivan, Finbarr; Muzi, Mark; Spence, Alexander M; Mankoff, David M; O'Sullivan, Janet N; Fitzgerald, Niall; Newman, George C; Krohn, Kenneth A
2009-06-01
Kinetic analysis is used to extract metabolic information from dynamic positron emission tomography (PET) uptake data. The theory of indicator dilutions, developed in the seminal work of Meier and Zierler (1954), provides a probabilistic framework for representation of PET tracer uptake data in terms of a convolution between an arterial input function and a tissue residue. The residue is a scaled survival function associated with tracer residence in the tissue. Nonparametric inference for the residue, a deconvolution problem, provides a novel approach to kinetic analysis-critically one that is not reliant on specific compartmental modeling assumptions. A practical computational technique based on regularized cubic B-spline approximation of the residence time distribution is proposed. Nonparametric residue analysis allows formal statistical evaluation of specific parametric models to be considered. This analysis needs to properly account for the increased flexibility of the nonparametric estimator. The methodology is illustrated using data from a series of cerebral studies with PET and fluorodeoxyglucose (FDG) in normal subjects. Comparisons are made between key functionals of the residue, tracer flux, flow, etc., resulting from a parametric (the standard two-compartment of Phelps et al. 1979) and a nonparametric analysis. Strong statistical evidence against the compartment model is found. Primarily these differences relate to the representation of the early temporal structure of the tracer residence-largely a function of the vascular supply network. There are convincing physiological arguments against the representations implied by the compartmental approach but this is the first time that a rigorous statistical confirmation using PET data has been reported. The compartmental analysis produces suspect values for flow but, notably, the impact on the metabolic flux, though statistically significant, is limited to deviations on the order of 3%-4%. The general advantage of the nonparametric residue analysis is the ability to provide a valid kinetic quantitation in the context of studies where there may be heterogeneity or other uncertainty about the accuracy of a compartmental model approximation of the tissue residue.
Logistic Stick-Breaking Process
Ren, Lu; Du, Lan; Carin, Lawrence; Dunson, David B.
2013-01-01
A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. PMID:25258593
BAYESIAN METHODS FOR REGIONAL-SCALE EUTROPHICATION MODELS. (R830887)
We demonstrate a Bayesian classification and regression tree (CART) approach to link multiple environmental stressors to biological responses and quantify uncertainty in model predictions. Such an approach can: (1) report prediction uncertainty, (2) be consistent with the amou...
Bayesian approach for counting experiment statistics applied to a neutrino point source analysis
NASA Astrophysics Data System (ADS)
Bose, D.; Brayeur, L.; Casier, M.; de Vries, K. D.; Golup, G.; van Eijndhoven, N.
2013-12-01
In this paper we present a model independent analysis method following Bayesian statistics to analyse data from a generic counting experiment and apply it to the search for neutrinos from point sources. We discuss a test statistic defined following a Bayesian framework that will be used in the search for a signal. In case no signal is found, we derive an upper limit without the introduction of approximations. The Bayesian approach allows us to obtain the full probability density function for both the background and the signal rate. As such, we have direct access to any signal upper limit. The upper limit derivation directly compares with a frequentist approach and is robust in the case of low-counting observations. Furthermore, it allows also to account for previous upper limits obtained by other analyses via the concept of prior information without the need of the ad hoc application of trial factors. To investigate the validity of the presented Bayesian approach, we have applied this method to the public IceCube 40-string configuration data for 10 nearby blazars and we have obtained a flux upper limit, which is in agreement with the upper limits determined via a frequentist approach. Furthermore, the upper limit obtained compares well with the previously published result of IceCube, using the same data set.
Probabilistic Damage Characterization Using the Computationally-Efficient Bayesian Approach
NASA Technical Reports Server (NTRS)
Warner, James E.; Hochhalter, Jacob D.
2016-01-01
This work presents a computationally-ecient approach for damage determination that quanti es uncertainty in the provided diagnosis. Given strain sensor data that are polluted with measurement errors, Bayesian inference is used to estimate the location, size, and orientation of damage. This approach uses Bayes' Theorem to combine any prior knowledge an analyst may have about the nature of the damage with information provided implicitly by the strain sensor data to form a posterior probability distribution over possible damage states. The unknown damage parameters are then estimated based on samples drawn numerically from this distribution using a Markov Chain Monte Carlo (MCMC) sampling algorithm. Several modi cations are made to the traditional Bayesian inference approach to provide signi cant computational speedup. First, an ecient surrogate model is constructed using sparse grid interpolation to replace a costly nite element model that must otherwise be evaluated for each sample drawn with MCMC. Next, the standard Bayesian posterior distribution is modi ed using a weighted likelihood formulation, which is shown to improve the convergence of the sampling process. Finally, a robust MCMC algorithm, Delayed Rejection Adaptive Metropolis (DRAM), is adopted to sample the probability distribution more eciently. Numerical examples demonstrate that the proposed framework e ectively provides damage estimates with uncertainty quanti cation and can yield orders of magnitude speedup over standard Bayesian approaches.
Le Bihan, Nicolas; Margerin, Ludovic
2009-07-01
In this paper, we present a nonparametric method to estimate the heterogeneity of a random medium from the angular distribution of intensity of waves transmitted through a slab of random material. Our approach is based on the modeling of forward multiple scattering using compound Poisson processes on compact Lie groups. The estimation technique is validated through numerical simulations based on radiative transfer theory.
A linear programming approach to characterizing norm bounded uncertainty from experimental data
NASA Technical Reports Server (NTRS)
Scheid, R. E.; Bayard, D. S.; Yam, Y.
1991-01-01
The linear programming spectral overbounding and factorization (LPSOF) algorithm, an algorithm for finding a minimum phase transfer function of specified order whose magnitude tightly overbounds a specified nonparametric function of frequency, is introduced. This method has direct application to transforming nonparametric uncertainty bounds (available from system identification experiments) into parametric representations required for modern robust control design software (i.e., a minimum-phase transfer function multiplied by a norm-bounded perturbation).
[Bayesian approach for the cost-effectiveness evaluation of healthcare technologies].
Berchialla, Paola; Gregori, Dario; Brunello, Franco; Veltri, Andrea; Petrinco, Michele; Pagano, Eva
2009-01-01
The development of Bayesian statistical methods for the assessment of the cost-effectiveness of health care technologies is reviewed. Although many studies adopt a frequentist approach, several authors have advocated the use of Bayesian methods in health economics. Emphasis has been placed on the advantages of the Bayesian approach, which include: (i) the ability to make more intuitive and meaningful inferences; (ii) the ability to tackle complex problems, such as allowing for the inclusion of patients who generate no cost, thanks to the availability of powerful computational algorithms; (iii) the importance of a full use of quantitative and structural prior information to produce realistic inferences. Much literature comparing the cost-effectiveness of two treatments is based on the incremental cost-effectiveness ratio. However, new methods are arising with the purpose of decision making. These methods are based on a net benefits approach. In the present context, the cost-effectiveness acceptability curves have been pointed out to be intrinsically Bayesian in their formulation. They plot the probability of a positive net benefit against the threshold cost of a unit increase in efficacy.A case study is presented in order to illustrate the Bayesian statistics in the cost-effectiveness analysis. Emphasis is placed on the cost-effectiveness acceptability curves. Advantages and disadvantages of the method described in this paper have been compared to frequentist methods and discussed.
Wu, Wei Mo; Wang, Jia Qiang; Cao, Qi; Wu, Jia Ping
2017-02-01
Accurate prediction of soil organic carbon (SOC) distribution is crucial for soil resources utilization and conservation, climate change adaptation, and ecosystem health. In this study, we selected a 1300 m×1700 m solonchak sampling area in northern Tarim Basin, Xinjiang, China, and collected a total of 144 soil samples (5-10 cm). The objectives of this study were to build a Baye-sian geostatistical model to predict SOC content, and to assess the performance of the Bayesian model for the prediction of SOC content by comparing with other three geostatistical approaches [ordinary kriging (OK), sequential Gaussian simulation (SGS), and inverse distance weighting (IDW)]. In the study area, soil organic carbon contents ranged from 1.59 to 9.30 g·kg -1 with a mean of 4.36 g·kg -1 and a standard deviation of 1.62 g·kg -1 . Sample semivariogram was best fitted by an exponential model with the ratio of nugget to sill being 0.57. By using the Bayesian geostatistical approach, we generated the SOC content map, and obtained the prediction variance, upper 95% and lower 95% of SOC contents, which were then used to evaluate the prediction uncertainty. Bayesian geostatistical approach performed better than that of the OK, SGS and IDW, demonstrating the advantages of Bayesian approach in SOC prediction.
Kerschbamer, Rudolf
2015-05-01
This paper proposes a geometric delineation of distributional preference types and a non-parametric approach for their identification in a two-person context. It starts with a small set of assumptions on preferences and shows that this set (i) naturally results in a taxonomy of distributional archetypes that nests all empirically relevant types considered in previous work; and (ii) gives rise to a clean experimental identification procedure - the Equality Equivalence Test - that discriminates between archetypes according to core features of preferences rather than properties of specific modeling variants. As a by-product the test yields a two-dimensional index of preference intensity.
Krafty, Robert T; Rosen, Ori; Stoffer, David S; Buysse, Daniel J; Hall, Martica H
2017-01-01
This article considers the problem of analyzing associations between power spectra of multiple time series and cross-sectional outcomes when data are observed from multiple subjects. The motivating application comes from sleep medicine, where researchers are able to non-invasively record physiological time series signals during sleep. The frequency patterns of these signals, which can be quantified through the power spectrum, contain interpretable information about biological processes. An important problem in sleep research is drawing connections between power spectra of time series signals and clinical characteristics; these connections are key to understanding biological pathways through which sleep affects, and can be treated to improve, health. Such analyses are challenging as they must overcome the complicated structure of a power spectrum from multiple time series as a complex positive-definite matrix-valued function. This article proposes a new approach to such analyses based on a tensor-product spline model of Cholesky components of outcome-dependent power spectra. The approach exibly models power spectra as nonparametric functions of frequency and outcome while preserving geometric constraints. Formulated in a fully Bayesian framework, a Whittle likelihood based Markov chain Monte Carlo (MCMC) algorithm is developed for automated model fitting and for conducting inference on associations between outcomes and spectral measures. The method is used to analyze data from a study of sleep in older adults and uncovers new insights into how stress and arousal are connected to the amount of time one spends in bed.
Review of Reliability-Based Design Optimization Approach and Its Integration with Bayesian Method
NASA Astrophysics Data System (ADS)
Zhang, Xiangnan
2018-03-01
A lot of uncertain factors lie in practical engineering, such as external load environment, material property, geometrical shape, initial condition, boundary condition, etc. Reliability method measures the structural safety condition and determine the optimal design parameter combination based on the probabilistic theory. Reliability-based design optimization (RBDO) is the most commonly used approach to minimize the structural cost or other performance under uncertainty variables which combines the reliability theory and optimization. However, it cannot handle the various incomplete information. The Bayesian approach is utilized to incorporate this kind of incomplete information in its uncertainty quantification. In this paper, the RBDO approach and its integration with Bayesian method are introduced.
Staid, Andrea; Watson, Jean -Paul; Wets, Roger J. -B.; ...
2017-07-11
Forecasts of available wind power are critical in key electric power systems operations planning problems, including economic dispatch and unit commitment. Such forecasts are necessarily uncertain, limiting the reliability and cost effectiveness of operations planning models based on a single deterministic or “point” forecast. A common approach to address this limitation involves the use of a number of probabilistic scenarios, each specifying a possible trajectory of wind power production, with associated probability. We present and analyze a novel method for generating probabilistic wind power scenarios, leveraging available historical information in the form of forecasted and corresponding observed wind power timemore » series. We estimate non-parametric forecast error densities, specifically using epi-spline basis functions, allowing us to capture the skewed and non-parametric nature of error densities observed in real-world data. We then describe a method to generate probabilistic scenarios from these basis functions that allows users to control for the degree to which extreme errors are captured.We compare the performance of our approach to the current state-of-the-art considering publicly available data associated with the Bonneville Power Administration, analyzing aggregate production of a number of wind farms over a large geographic region. Finally, we discuss the advantages of our approach in the context of specific power systems operations planning problems: stochastic unit commitment and economic dispatch. Here, our methodology is embodied in the joint Sandia – University of California Davis Prescient software package for assessing and analyzing stochastic operations strategies.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Staid, Andrea; Watson, Jean -Paul; Wets, Roger J. -B.
Forecasts of available wind power are critical in key electric power systems operations planning problems, including economic dispatch and unit commitment. Such forecasts are necessarily uncertain, limiting the reliability and cost effectiveness of operations planning models based on a single deterministic or “point” forecast. A common approach to address this limitation involves the use of a number of probabilistic scenarios, each specifying a possible trajectory of wind power production, with associated probability. We present and analyze a novel method for generating probabilistic wind power scenarios, leveraging available historical information in the form of forecasted and corresponding observed wind power timemore » series. We estimate non-parametric forecast error densities, specifically using epi-spline basis functions, allowing us to capture the skewed and non-parametric nature of error densities observed in real-world data. We then describe a method to generate probabilistic scenarios from these basis functions that allows users to control for the degree to which extreme errors are captured.We compare the performance of our approach to the current state-of-the-art considering publicly available data associated with the Bonneville Power Administration, analyzing aggregate production of a number of wind farms over a large geographic region. Finally, we discuss the advantages of our approach in the context of specific power systems operations planning problems: stochastic unit commitment and economic dispatch. Here, our methodology is embodied in the joint Sandia – University of California Davis Prescient software package for assessing and analyzing stochastic operations strategies.« less
Revealing transient strain in geodetic data with Gaussian process regression
NASA Astrophysics Data System (ADS)
Hines, T. T.; Hetland, E. A.
2018-03-01
Transient strain derived from global navigation satellite system (GNSS) data can be used to detect and understand geophysical processes such as slow slip events and post-seismic deformation. Here we propose using Gaussian process regression (GPR) as a tool for estimating transient strain from GNSS data. GPR is a non-parametric, Bayesian method for interpolating scattered data. In our approach, we assume a stochastic prior model for transient displacements. The prior describes how much we expect transient displacements to covary spatially and temporally. A posterior estimate of transient strain is obtained by differentiating the posterior transient displacements, which are formed by conditioning the prior with the GNSS data. As a demonstration, we use GPR to detect transient strain resulting from slow slip events in the Pacific Northwest. Maximum likelihood methods are used to constrain a prior model for transient displacements in this region. The temporal covariance of our prior model is described by a compact Wendland covariance function, which significantly reduces the computational burden that can be associated with GPR. Our results reveal the spatial and temporal evolution of strain from slow slip events. We verify that the transient strain estimated with GPR is in fact geophysical signal by comparing it to the seismic tremor that is associated with Pacific Northwest slow slip events.
Rational approximations to rational models: alternative algorithms for category learning.
Sanborn, Adam N; Griffiths, Thomas L; Navarro, Daniel J
2010-10-01
Rational models of cognition typically consider the abstract computational problems posed by the environment, assuming that people are capable of optimally solving those problems. This differs from more traditional formal models of cognition, which focus on the psychological processes responsible for behavior. A basic challenge for rational models is thus explaining how optimal solutions can be approximated by psychological processes. We outline a general strategy for answering this question, namely to explore the psychological plausibility of approximation algorithms developed in computer science and statistics. In particular, we argue that Monte Carlo methods provide a source of rational process models that connect optimal solutions to psychological processes. We support this argument through a detailed example, applying this approach to Anderson's (1990, 1991) rational model of categorization (RMC), which involves a particularly challenging computational problem. Drawing on a connection between the RMC and ideas from nonparametric Bayesian statistics, we propose 2 alternative algorithms for approximate inference in this model. The algorithms we consider include Gibbs sampling, a procedure appropriate when all stimuli are presented simultaneously, and particle filters, which sequentially approximate the posterior distribution with a small number of samples that are updated as new data become available. Applying these algorithms to several existing datasets shows that a particle filter with a single particle provides a good description of human inferences.
Occupancy mapping and surface reconstruction using local Gaussian processes with Kinect sensors.
Kim, Soohwan; Kim, Jonghyuk
2013-10-01
Although RGB-D sensors have been successfully applied to visual SLAM and surface reconstruction, most of the applications aim at visualization. In this paper, we propose a noble method of building continuous occupancy maps and reconstructing surfaces in a single framework for both navigation and visualization. Particularly, we apply a Bayesian nonparametric approach, Gaussian process classification, to occupancy mapping. However, it suffers from high-computational complexity of O(n(3))+O(n(2)m), where n and m are the numbers of training and test data, respectively, limiting its use for large-scale mapping with huge training data, which is common with high-resolution RGB-D sensors. Therefore, we partition both training and test data with a coarse-to-fine clustering method and apply Gaussian processes to each local clusters. In addition, we consider Gaussian processes as implicit functions, and thus extract iso-surfaces from the scalar fields, continuous occupancy maps, using marching cubes. By doing that, we are able to build two types of map representations within a single framework of Gaussian processes. Experimental results with 2-D simulated data show that the accuracy of our approximated method is comparable to previous work, while the computational time is dramatically reduced. We also demonstrate our method with 3-D real data to show its feasibility in large-scale environments.
NASA Astrophysics Data System (ADS)
Liao, Meng; To, Quy-Dong; Léonard, Céline; Monchiet, Vincent
2018-03-01
In this paper, we use the molecular dynamics simulation method to study gas-wall boundary conditions. Discrete scattering information of gas molecules at the wall surface is obtained from collision simulations. The collision data can be used to identify the accommodation coefficients for parametric wall models such as Maxwell and Cercignani-Lampis scattering kernels. Since these scattering kernels are based on a limited number of accommodation coefficients, we adopt non-parametric statistical methods to construct the kernel to overcome these issues. Different from parametric kernels, the non-parametric kernels require no parameter (i.e. accommodation coefficients) and no predefined distribution. We also propose approaches to derive directly the Navier friction and Kapitza thermal resistance coefficients as well as other interface coefficients associated with moment equations from the non-parametric kernels. The methods are applied successfully to systems composed of CH4 or CO2 and graphite, which are of interest to the petroleum industry.
Nonparametric Simulation of Signal Transduction Networks with Semi-Synchronized Update
Nassiri, Isar; Masoudi-Nejad, Ali; Jalili, Mahdi; Moeini, Ali
2012-01-01
Simulating signal transduction in cellular signaling networks provides predictions of network dynamics by quantifying the changes in concentration and activity-level of the individual proteins. Since numerical values of kinetic parameters might be difficult to obtain, it is imperative to develop non-parametric approaches that combine the connectivity of a network with the response of individual proteins to signals which travel through the network. The activity levels of signaling proteins computed through existing non-parametric modeling tools do not show significant correlations with the observed values in experimental results. In this work we developed a non-parametric computational framework to describe the profile of the evolving process and the time course of the proportion of active form of molecules in the signal transduction networks. The model is also capable of incorporating perturbations. The model was validated on four signaling networks showing that it can effectively uncover the activity levels and trends of response during signal transduction process. PMID:22737250
Local kernel nonparametric discriminant analysis for adaptive extraction of complex structures
NASA Astrophysics Data System (ADS)
Li, Quanbao; Wei, Fajie; Zhou, Shenghan
2017-05-01
The linear discriminant analysis (LDA) is one of popular means for linear feature extraction. It usually performs well when the global data structure is consistent with the local data structure. Other frequently-used approaches of feature extraction usually require linear, independence, or large sample condition. However, in real world applications, these assumptions are not always satisfied or cannot be tested. In this paper, we introduce an adaptive method, local kernel nonparametric discriminant analysis (LKNDA), which integrates conventional discriminant analysis with nonparametric statistics. LKNDA is adept in identifying both complex nonlinear structures and the ad hoc rule. Six simulation cases demonstrate that LKNDA have both parametric and nonparametric algorithm advantages and higher classification accuracy. Quartic unilateral kernel function may provide better robustness of prediction than other functions. LKNDA gives an alternative solution for discriminant cases of complex nonlinear feature extraction or unknown feature extraction. At last, the application of LKNDA in the complex feature extraction of financial market activities is proposed.
Bayesian cloud detection for MERIS, AATSR, and their combination
NASA Astrophysics Data System (ADS)
Hollstein, A.; Fischer, J.; Carbajal Henken, C.; Preusker, R.
2014-11-01
A broad range of different of Bayesian cloud detection schemes is applied to measurements from the Medium Resolution Imaging Spectrometer (MERIS), the Advanced Along-Track Scanning Radiometer (AATSR), and their combination. The cloud masks were designed to be numerically efficient and suited for the processing of large amounts of data. Results from the classical and naive approach to Bayesian cloud masking are discussed for MERIS and AATSR as well as for their combination. A sensitivity study on the resolution of multidimensional histograms, which were post-processed by Gaussian smoothing, shows how theoretically insufficient amounts of truth data can be used to set up accurate classical Bayesian cloud masks. Sets of exploited features from single and derived channels are numerically optimized and results for naive and classical Bayesian cloud masks are presented. The application of the Bayesian approach is discussed in terms of reproducing existing algorithms, enhancing existing algorithms, increasing the robustness of existing algorithms, and on setting up new classification schemes based on manually classified scenes.
Bayesian cloud detection for MERIS, AATSR, and their combination
NASA Astrophysics Data System (ADS)
Hollstein, A.; Fischer, J.; Carbajal Henken, C.; Preusker, R.
2015-04-01
A broad range of different of Bayesian cloud detection schemes is applied to measurements from the Medium Resolution Imaging Spectrometer (MERIS), the Advanced Along-Track Scanning Radiometer (AATSR), and their combination. The cloud detection schemes were designed to be numerically efficient and suited for the processing of large numbers of data. Results from the classical and naive approach to Bayesian cloud masking are discussed for MERIS and AATSR as well as for their combination. A sensitivity study on the resolution of multidimensional histograms, which were post-processed by Gaussian smoothing, shows how theoretically insufficient numbers of truth data can be used to set up accurate classical Bayesian cloud masks. Sets of exploited features from single and derived channels are numerically optimized and results for naive and classical Bayesian cloud masks are presented. The application of the Bayesian approach is discussed in terms of reproducing existing algorithms, enhancing existing algorithms, increasing the robustness of existing algorithms, and on setting up new classification schemes based on manually classified scenes.
A Bayesian test for Hardy–Weinberg equilibrium of biallelic X-chromosomal markers
Puig, X; Ginebra, J; Graffelman, J
2017-01-01
The X chromosome is a relatively large chromosome, harboring a lot of genetic information. Much of the statistical analysis of X-chromosomal information is complicated by the fact that males only have one copy. Recently, frequentist statistical tests for Hardy–Weinberg equilibrium have been proposed specifically for dealing with markers on the X chromosome. Bayesian test procedures for Hardy–Weinberg equilibrium for the autosomes have been described, but Bayesian work on the X chromosome in this context is lacking. This paper gives the first Bayesian approach for testing Hardy–Weinberg equilibrium with biallelic markers at the X chromosome. Marginal and joint posterior distributions for the inbreeding coefficient in females and the male to female allele frequency ratio are computed, and used for statistical inference. The paper gives a detailed account of the proposed Bayesian test, and illustrates it with data from the 1000 Genomes project. In that implementation, a novel approach to tackle multiple testing from a Bayesian perspective through posterior predictive checks is used. PMID:28900292
Bayesian demography 250 years after Bayes
Bijak, Jakub; Bryant, John
2016-01-01
Bayesian statistics offers an alternative to classical (frequentist) statistics. It is distinguished by its use of probability distributions to describe uncertain quantities, which leads to elegant solutions to many difficult statistical problems. Although Bayesian demography, like Bayesian statistics more generally, is around 250 years old, only recently has it begun to flourish. The aim of this paper is to review the achievements of Bayesian demography, address some misconceptions, and make the case for wider use of Bayesian methods in population studies. We focus on three applications: demographic forecasts, limited data, and highly structured or complex models. The key advantages of Bayesian methods are the ability to integrate information from multiple sources and to describe uncertainty coherently. Bayesian methods also allow for including additional (prior) information next to the data sample. As such, Bayesian approaches are complementary to many traditional methods, which can be productively re-expressed in Bayesian terms. PMID:26902889
A General and Flexible Approach to Estimating the Social Relations Model Using Bayesian Methods
ERIC Educational Resources Information Center
Ludtke, Oliver; Robitzsch, Alexander; Kenny, David A.; Trautwein, Ulrich
2013-01-01
The social relations model (SRM) is a conceptual, methodological, and analytical approach that is widely used to examine dyadic behaviors and interpersonal perception within groups. This article introduces a general and flexible approach to estimating the parameters of the SRM that is based on Bayesian methods using Markov chain Monte Carlo…
Using Bayesian analysis in repeated preclinical in vivo studies for a more effective use of animals.
Walley, Rosalind; Sherington, John; Rastrick, Joe; Detrait, Eric; Hanon, Etienne; Watt, Gillian
2016-05-01
Whilst innovative Bayesian approaches are increasingly used in clinical studies, in the preclinical area Bayesian methods appear to be rarely used in the reporting of pharmacology data. This is particularly surprising in the context of regularly repeated in vivo studies where there is a considerable amount of data from historical control groups, which has potential value. This paper describes our experience with introducing Bayesian analysis for such studies using a Bayesian meta-analytic predictive approach. This leads naturally either to an informative prior for a control group as part of a full Bayesian analysis of the next study or using a predictive distribution to replace a control group entirely. We use quality control charts to illustrate study-to-study variation to the scientists and describe informative priors in terms of their approximate effective numbers of animals. We describe two case studies of animal models: the lipopolysaccharide-induced cytokine release model used in inflammation and the novel object recognition model used to screen cognitive enhancers, both of which show the advantage of a Bayesian approach over the standard frequentist analysis. We conclude that using Bayesian methods in stable repeated in vivo studies can result in a more effective use of animals, either by reducing the total number of animals used or by increasing the precision of key treatment differences. This will lead to clearer results and supports the "3Rs initiative" to Refine, Reduce and Replace animals in research. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Bayesian estimates of the incidence of rare cancers in Europe.
Botta, Laura; Capocaccia, Riccardo; Trama, Annalisa; Herrmann, Christian; Salmerón, Diego; De Angelis, Roberta; Mallone, Sandra; Bidoli, Ettore; Marcos-Gragera, Rafael; Dudek-Godeau, Dorota; Gatta, Gemma; Cleries, Ramon
2018-04-21
The RARECAREnet project has updated the estimates of the burden of the 198 rare cancers in each European country. Suspecting that scant data could affect the reliability of statistical analysis, we employed a Bayesian approach to estimate the incidence of these cancers. We analyzed about 2,000,000 rare cancers diagnosed in 2000-2007 provided by 83 population-based cancer registries from 27 European countries. We considered European incidence rates (IRs), calculated over all the data available in RARECAREnet, as a valid a priori to merge with country-specific observed data. Therefore we provided (1) Bayesian estimates of IRs and the yearly numbers of cases of rare cancers in each country; (2) the expected time (T) in years needed to observe one new case; and (3) practical criteria to decide when to use the Bayesian approach. Bayesian and classical estimates did not differ much; substantial differences (>10%) ranged from 77 rare cancers in Iceland to 14 in England. The smaller the population the larger the number of rare cancers needing a Bayesian approach. Bayesian estimates were useful for cancers with fewer than 150 observed cases in a country during the study period; this occurred mostly when the population of the country is small. For the first time the Bayesian estimates of IRs and the yearly expected numbers of cases for each rare cancer in each individual European country were calculated. Moreover, the indicator T is useful to convey incidence estimates for exceptionally rare cancers and in small countries; it far exceeds the professional lifespan of a medical doctor. Copyright © 2018 Elsevier Ltd. All rights reserved.
Embedding the results of focussed Bayesian fusion into a global context
NASA Astrophysics Data System (ADS)
Sander, Jennifer; Heizmann, Michael
2014-05-01
Bayesian statistics offers a well-founded and powerful fusion methodology also for the fusion of heterogeneous information sources. However, except in special cases, the needed posterior distribution is not analytically derivable. As consequence, Bayesian fusion may cause unacceptably high computational and storage costs in practice. Local Bayesian fusion approaches aim at reducing the complexity of the Bayesian fusion methodology significantly. This is done by concentrating the actual Bayesian fusion on the potentially most task relevant parts of the domain of the Properties of Interest. Our research on these approaches is motivated by an analogy to criminal investigations where criminalists pursue clues also only locally. This publication follows previous publications on a special local Bayesian fusion technique called focussed Bayesian fusion. Here, the actual calculation of the posterior distribution gets completely restricted to a suitably chosen local context. By this, the global posterior distribution is not completely determined. Strategies for using the results of a focussed Bayesian analysis appropriately are needed. In this publication, we primarily contrast different ways of embedding the results of focussed Bayesian fusion explicitly into a global context. To obtain a unique global posterior distribution, we analyze the application of the Maximum Entropy Principle that has been shown to be successfully applicable in metrology and in different other areas. To address the special need for making further decisions subsequently to the actual fusion task, we further analyze criteria for decision making under partial information.
Jiang, Zhehan; Skorupski, William
2017-12-12
In many behavioral research areas, multivariate generalizability theory (mG theory) has been typically used to investigate the reliability of certain multidimensional assessments. However, traditional mG-theory estimation-namely, using frequentist approaches-has limits, leading researchers to fail to take full advantage of the information that mG theory can offer regarding the reliability of measurements. Alternatively, Bayesian methods provide more information than frequentist approaches can offer. This article presents instructional guidelines on how to implement mG-theory analyses in a Bayesian framework; in particular, BUGS code is presented to fit commonly seen designs from mG theory, including single-facet designs, two-facet crossed designs, and two-facet nested designs. In addition to concrete examples that are closely related to the selected designs and the corresponding BUGS code, a simulated dataset is provided to demonstrate the utility and advantages of the Bayesian approach. This article is intended to serve as a tutorial reference for applied researchers and methodologists conducting mG-theory studies.
Bayesian-information-gap decision theory with an application to CO 2 sequestration
O'Malley, D.; Vesselinov, V. V.
2015-09-04
Decisions related to subsurface engineering problems such as groundwater management, fossil fuel production, and geologic carbon sequestration are frequently challenging because of an overabundance of uncertainties (related to conceptualizations, parameters, observations, etc.). Because of the importance of these problems to agriculture, energy, and the climate (respectively), good decisions that are scientifically defensible must be made despite the uncertainties. We describe a general approach to making decisions for challenging problems such as these in the presence of severe uncertainties that combines probabilistic and non-probabilistic methods. The approach uses Bayesian sampling to assess parametric uncertainty and Information-Gap Decision Theory (IGDT) to addressmore » model inadequacy. The combined approach also resolves an issue that frequently arises when applying Bayesian methods to real-world engineering problems related to the enumeration of possible outcomes. In the case of zero non-probabilistic uncertainty, the method reduces to a Bayesian method. Lastly, to illustrate the approach, we apply it to a site-selection decision for geologic CO 2 sequestration.« less
NASA Astrophysics Data System (ADS)
Reis, D. S.; Stedinger, J. R.; Martins, E. S.
2005-10-01
This paper develops a Bayesian approach to analysis of a generalized least squares (GLS) regression model for regional analyses of hydrologic data. The new approach allows computation of the posterior distributions of the parameters and the model error variance using a quasi-analytic approach. Two regional skew estimation studies illustrate the value of the Bayesian GLS approach for regional statistical analysis of a shape parameter and demonstrate that regional skew models can be relatively precise with effective record lengths in excess of 60 years. With Bayesian GLS the marginal posterior distribution of the model error variance and the corresponding mean and variance of the parameters can be computed directly, thereby providing a simple but important extension of the regional GLS regression procedures popularized by Tasker and Stedinger (1989), which is sensitive to the likely values of the model error variance when it is small relative to the sampling error in the at-site estimator.
The Bayesian Revolution Approaches Psychological Development
ERIC Educational Resources Information Center
Shultz, Thomas R.
2007-01-01
This commentary reviews five articles that apply Bayesian ideas to psychological development, some with psychology experiments, some with computational modeling, and some with both experiments and modeling. The reviewed work extends the current Bayesian revolution into tasks often studied in children, such as causal learning and word learning, and…
An improved approximate-Bayesian model-choice method for estimating shared evolutionary history
2014-01-01
Background To understand biological diversification, it is important to account for large-scale processes that affect the evolutionary history of groups of co-distributed populations of organisms. Such events predict temporally clustered divergences times, a pattern that can be estimated using genetic data from co-distributed species. I introduce a new approximate-Bayesian method for comparative phylogeographical model-choice that estimates the temporal distribution of divergences across taxa from multi-locus DNA sequence data. The model is an extension of that implemented in msBayes. Results By reparameterizing the model, introducing more flexible priors on demographic and divergence-time parameters, and implementing a non-parametric Dirichlet-process prior over divergence models, I improved the robustness, accuracy, and power of the method for estimating shared evolutionary history across taxa. Conclusions The results demonstrate the improved performance of the new method is due to (1) more appropriate priors on divergence-time and demographic parameters that avoid prohibitively small marginal likelihoods for models with more divergence events, and (2) the Dirichlet-process providing a flexible prior on divergence histories that does not strongly disfavor models with intermediate numbers of divergence events. The new method yields more robust estimates of posterior uncertainty, and thus greatly reduces the tendency to incorrectly estimate models of shared evolutionary history with strong support. PMID:24992937
Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations
NASA Technical Reports Server (NTRS)
Scargle, Jeffrey D.; Norris, Jay P.; Jackson, Brad; Chiang, James
2013-01-01
This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it-an improved and generalized version of Bayesian Blocks [Scargle 1998]-that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piece- wise linear and piecewise exponential representations, multivariate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by [Arias-Castro, Donoho and Huo 2003]. In the spirit of Reproducible Research [Donoho et al. (2008)] all of the code and data necessary to reproduce all of the figures in this paper are included as auxiliary material.
Does History Repeat Itself? Wavelets and the Phylodynamics of Influenza A
Tom, Jennifer A.; Sinsheimer, Janet S.; Suchard, Marc A.
2012-01-01
Unprecedented global surveillance of viruses will result in massive sequence data sets that require new statistical methods. These data sets press the limits of Bayesian phylogenetics as the high-dimensional parameters that comprise a phylogenetic tree increase the already sizable computational burden of these techniques. This burden often results in partitioning the data set, for example, by gene, and inferring the evolutionary dynamics of each partition independently, a compromise that results in stratified analyses that depend only on data within a given partition. However, parameter estimates inferred from these stratified models are likely strongly correlated, considering they rely on data from a single data set. To overcome this shortfall, we exploit the existing Monte Carlo realizations from stratified Bayesian analyses to efficiently estimate a nonparametric hierarchical wavelet-based model and learn about the time-varying parameters of effective population size that reflect levels of genetic diversity across all partitions simultaneously. Our methods are applied to complete genome influenza A sequences that span 13 years. We find that broad peaks and trends, as opposed to seasonal spikes, in the effective population size history distinguish individual segments from the complete genome. We also address hypotheses regarding intersegment dynamics within a formal statistical framework that accounts for correlation between segment-specific parameters. PMID:22160768
NASA Astrophysics Data System (ADS)
Yasmirullah, Septia Devi Prihastuti; Iriawan, Nur; Sipayung, Feronika Rosalinda
2017-11-01
The success of regional economic establishment could be measured by economic growth. Since the Act No. 32 of 2004 has been implemented, unbalance economic among the regency in Indonesia is increasing. This condition is contrary different with the government goal to build society welfare through the economic activity development in each region. This research aims to examine economic growth through the distribution of bank credits to each Indonesia's regency. The data analyzed in this research is hierarchically structured data which follow normal distribution in first level. Two modeling approaches are employed in this research, a global-one level Bayesian approach and two-level hierarchical Bayesian approach. The result shows that hierarchical Bayesian has succeeded to demonstrate a better estimation than a global-one level Bayesian. It proves that the different economic growth in each province is significantly influenced by the variations of micro level characteristics in each province. These variations are significantly affected by cities and province characteristics in second level.
Bayesian ensemble refinement by replica simulations and reweighting.
Hummer, Gerhard; Köfinger, Jürgen
2015-12-28
We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.
Bayesian ensemble refinement by replica simulations and reweighting
NASA Astrophysics Data System (ADS)
Hummer, Gerhard; Köfinger, Jürgen
2015-12-01
We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss their practical application. With ensemble refinement, the properties of dynamic and partially disordered (bio)molecular structures can be characterized by integrating a wide range of experimental data, including measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the posterior is a functional that ranks different configuration space distributions. By maximizing this posterior, we derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is identical to that obtained by the maximum entropy "ensemble refinement of SAXS" (EROS) formulation. Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of the restraints should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian result in the limit of infinitely many replicas. In the "Bayesian inference of ensembles" method, we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single-molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting point for further investigations.
Diagnosis and Reconfiguration using Bayesian Networks: An Electrical Power System Case Study
NASA Technical Reports Server (NTRS)
Knox, W. Bradley; Mengshoel, Ole
2009-01-01
Automated diagnosis and reconfiguration are important computational techniques that aim to minimize human intervention in autonomous systems. In this paper, we develop novel techniques and models in the context of diagnosis and reconfiguration reasoning using causal Bayesian networks (BNs). We take as starting point a successful diagnostic approach, using a static BN developed for a real-world electrical power system. We discuss in this paper the extension of this diagnostic approach along two dimensions, namely: (i) from a static BN to a dynamic BN; and (ii) from a diagnostic task to a reconfiguration task. More specifically, we discuss the auto-generation of a dynamic Bayesian network from a static Bayesian network. In addition, we discuss subtle, but important, differences between Bayesian networks when used for diagnosis versus reconfiguration. We discuss a novel reconfiguration agent, which models a system causally, including effects of actions through time, using a dynamic Bayesian network. Though the techniques we discuss are general, we demonstrate them in the context of electrical power systems (EPSs) for aircraft and spacecraft. EPSs are vital subsystems on-board aircraft and spacecraft, and many incidents and accidents of these vehicles have been attributed to EPS failures. We discuss a case study that provides initial but promising results for our approach in the setting of electrical power systems.
Hip fracture in the elderly: a re-analysis of the EPIDOS study with causal Bayesian networks.
Caillet, Pascal; Klemm, Sarah; Ducher, Michel; Aussem, Alexandre; Schott, Anne-Marie
2015-01-01
Hip fractures commonly result in permanent disability, institutionalization or death in elderly. Existing hip-fracture predicting tools are underused in clinical practice, partly due to their lack of intuitive interpretation. By use of a graphical layer, Bayesian network models could increase the attractiveness of fracture prediction tools. Our aim was to study the potential contribution of a causal Bayesian network in this clinical setting. A logistic regression was performed as a standard control approach to check the robustness of the causal Bayesian network approach. EPIDOS is a multicenter study, conducted in an ambulatory care setting in five French cities between 1992 and 1996 and updated in 2010. The study included 7598 women aged 75 years or older, in which fractures were assessed quarterly during 4 years. A causal Bayesian network and a logistic regression were performed on EPIDOS data to describe major variables involved in hip fractures occurrences. Both models had similar association estimations and predictive performances. They detected gait speed and mineral bone density as variables the most involved in the fracture process. The causal Bayesian network showed that gait speed and bone mineral density were directly connected to fracture and seem to mediate the influence of all the other variables included in our model. The logistic regression approach detected multiple interactions involving psychotropic drug use, age and bone mineral density. Both approaches retrieved similar variables as predictors of hip fractures. However, Bayesian network highlighted the whole web of relation between the variables involved in the analysis, suggesting a possible mechanism leading to hip fracture. According to the latter results, intervention focusing concomitantly on gait speed and bone mineral density may be necessary for an optimal prevention of hip fracture occurrence in elderly people.
Bayesian Meta-Analysis of Coefficient Alpha
ERIC Educational Resources Information Center
Brannick, Michael T.; Zhang, Nanhua
2013-01-01
The current paper describes and illustrates a Bayesian approach to the meta-analysis of coefficient alpha. Alpha is the most commonly used estimate of the reliability or consistency (freedom from measurement error) for educational and psychological measures. The conventional approach to meta-analysis uses inverse variance weights to combine…
Toribo, S.G.; Gray, B.R.; Liang, S.
2011-01-01
The N-mixture model proposed by Royle in 2004 may be used to approximate the abundance and detection probability of animal species in a given region. In 2006, Royle and Dorazio discussed the advantages of using a Bayesian approach in modelling animal abundance and occurrence using a hierarchical N-mixture model. N-mixture models assume replication on sampling sites, an assumption that may be violated when the site is not closed to changes in abundance during the survey period or when nominal replicates are defined spatially. In this paper, we studied the robustness of a Bayesian approach to fitting the N-mixture model for pseudo-replicated count data. Our simulation results showed that the Bayesian estimates for abundance and detection probability are slightly biased when the actual detection probability is small and are sensitive to the presence of extra variability within local sites.
Spatiotemporal Bayesian analysis of Lyme disease in New York state, 1990-2000.
Chen, Haiyan; Stratton, Howard H; Caraco, Thomas B; White, Dennis J
2006-07-01
Mapping ordinarily increases our understanding of nontrivial spatial and temporal heterogeneities in disease rates. However, the large number of parameters required by the corresponding statistical models often complicates detailed analysis. This study investigates the feasibility of a fully Bayesian hierarchical regression approach to the problem and identifies how it outperforms two more popular methods: crude rate estimates (CRE) and empirical Bayes standardization (EBS). In particular, we apply a fully Bayesian approach to the spatiotemporal analysis of Lyme disease incidence in New York state for the period 1990-2000. These results are compared with those obtained by CRE and EBS in Chen et al. (2005). We show that the fully Bayesian regression model not only gives more reliable estimates of disease rates than the other two approaches but also allows for tractable models that can accommodate more numerous sources of variation and unknown parameters.
Shen, Yanna; Cooper, Gregory F
2012-09-01
This paper investigates Bayesian modeling of known and unknown causes of events in the context of disease-outbreak detection. We introduce a multivariate Bayesian approach that models multiple evidential features of every person in the population. This approach models and detects (1) known diseases (e.g., influenza and anthrax) by using informative prior probabilities and (2) unknown diseases (e.g., a new, highly contagious respiratory virus that has never been seen before) by using relatively non-informative prior probabilities. We report the results of simulation experiments which support that this modeling method can improve the detection of new disease outbreaks in a population. A contribution of this paper is that it introduces a multivariate Bayesian approach for jointly modeling both known and unknown causes of events. Such modeling has general applicability in domains where the space of known causes is incomplete. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Bayesian Analysis of Nonlinear Structural Equation Models with Nonignorable Missing Data
ERIC Educational Resources Information Center
Lee, Sik-Yum
2006-01-01
A Bayesian approach is developed for analyzing nonlinear structural equation models with nonignorable missing data. The nonignorable missingness mechanism is specified by a logistic regression model. A hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm is used to produce the joint Bayesian estimates of…
Bayesian Data-Model Fit Assessment for Structural Equation Modeling
ERIC Educational Resources Information Center
Levy, Roy
2011-01-01
Bayesian approaches to modeling are receiving an increasing amount of attention in the areas of model construction and estimation in factor analysis, structural equation modeling (SEM), and related latent variable models. However, model diagnostics and model criticism remain relatively understudied aspects of Bayesian SEM. This article describes…
Bayesian estimation of seasonal course of canopy leaf area index from hyperspectral satellite data
NASA Astrophysics Data System (ADS)
Varvia, Petri; Rautiainen, Miina; Seppänen, Aku
2018-03-01
In this paper, Bayesian inversion of a physically-based forest reflectance model is investigated to estimate of boreal forest canopy leaf area index (LAI) from EO-1 Hyperion hyperspectral data. The data consist of multiple forest stands with different species compositions and structures, imaged in three phases of the growing season. The Bayesian estimates of canopy LAI are compared to reference estimates based on a spectral vegetation index. The forest reflectance model contains also other unknown variables in addition to LAI, for example leaf single scattering albedo and understory reflectance. In the Bayesian approach, these variables are estimated simultaneously with LAI. The feasibility and seasonal variation of these estimates is also examined. Credible intervals for the estimates are also calculated and evaluated. The results show that the Bayesian inversion approach is significantly better than using a comparable spectral vegetation index regression.
Liu, Fang; Eugenio, Evercita C
2018-04-01
Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review on beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing them with values close to zero/one via simulation studies; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.
Predicting Football Matches Results using Bayesian Networks for English Premier League (EPL)
NASA Astrophysics Data System (ADS)
Razali, Nazim; Mustapha, Aida; Yatim, Faiz Ahmad; Aziz, Ruhaya Ab
2017-08-01
The issues of modeling asscoiation football prediction model has become increasingly popular in the last few years and many different approaches of prediction models have been proposed with the point of evaluating the attributes that lead a football team to lose, draw or win the match. There are three types of approaches has been considered for predicting football matches results which include statistical approaches, machine learning approaches and Bayesian approaches. Lately, many studies regarding football prediction models has been produced using Bayesian approaches. This paper proposes a Bayesian Networks (BNs) to predict the results of football matches in term of home win (H), away win (A) and draw (D). The English Premier League (EPL) for three seasons of 2010-2011, 2011-2012 and 2012-2013 has been selected and reviewed. K-fold cross validation has been used for testing the accuracy of prediction model. The required information about the football data is sourced from a legitimate site at http://www.football-data.co.uk. BNs achieved predictive accuracy of 75.09% in average across three seasons. It is hoped that the results could be used as the benchmark output for future research in predicting football matches results.
A Bayesian approach to meta-analysis of plant pathology studies.
Mila, A L; Ngugi, H K
2011-01-01
Bayesian statistical methods are used for meta-analysis in many disciplines, including medicine, molecular biology, and engineering, but have not yet been applied for quantitative synthesis of plant pathology studies. In this paper, we illustrate the key concepts of Bayesian statistics and outline the differences between Bayesian and classical (frequentist) methods in the way parameters describing population attributes are considered. We then describe a Bayesian approach to meta-analysis and present a plant pathological example based on studies evaluating the efficacy of plant protection products that induce systemic acquired resistance for the management of fire blight of apple. In a simple random-effects model assuming a normal distribution of effect sizes and no prior information (i.e., a noninformative prior), the results of the Bayesian meta-analysis are similar to those obtained with classical methods. Implementing the same model with a Student's t distribution and a noninformative prior for the effect sizes, instead of a normal distribution, yields similar results for all but acibenzolar-S-methyl (Actigard) which was evaluated only in seven studies in this example. Whereas both the classical (P = 0.28) and the Bayesian analysis with a noninformative prior (95% credibility interval [CRI] for the log response ratio: -0.63 to 0.08) indicate a nonsignificant effect for Actigard, specifying a t distribution resulted in a significant, albeit variable, effect for this product (CRI: -0.73 to -0.10). These results confirm the sensitivity of the analytical outcome (i.e., the posterior distribution) to the choice of prior in Bayesian meta-analyses involving a limited number of studies. We review some pertinent literature on more advanced topics, including modeling of among-study heterogeneity, publication bias, analyses involving a limited number of studies, and methods for dealing with missing data, and show how these issues can be approached in a Bayesian framework. Bayesian meta-analysis can readily include information not easily incorporated in classical methods, and allow for a full evaluation of competing models. Given the power and flexibility of Bayesian methods, we expect them to become widely adopted for meta-analysis of plant pathology studies.
A Nonparametric, Multiple Imputation-Based Method for the Retrospective Integration of Data Sets.
Carrig, Madeline M; Manrique-Vallier, Daniel; Ranby, Krista W; Reiter, Jerome P; Hoyle, Rick H
2015-01-01
Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related to the constructs of interest. The goal of the present research was to develop a flexible, broadly applicable approach to the integration of disparate data sets that is based on nonparametric multiple imputation and the collection of data from a convenient, de novo calibration sample. We demonstrate proof of concept for the approach by integrating three existing data sets containing items related to the extent of problematic alcohol use and associations with deviant peers. We discuss both necessary conditions for the approach to work well and potential strengths and weaknesses of the method compared to other data set integration approaches.
A Nonparametric, Multiple Imputation-Based Method for the Retrospective Integration of Data Sets
Carrig, Madeline M.; Manrique-Vallier, Daniel; Ranby, Krista W.; Reiter, Jerome P.; Hoyle, Rick H.
2015-01-01
Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related to the constructs of interest. The goal of the present research was to develop a flexible, broadly applicable approach to the integration of disparate data sets that is based on nonparametric multiple imputation and the collection of data from a convenient, de novo calibration sample. We demonstrate proof of concept for the approach by integrating three existing data sets containing items related to the extent of problematic alcohol use and associations with deviant peers. We discuss both necessary conditions for the approach to work well and potential strengths and weaknesses of the method compared to other data set integration approaches. PMID:26257437
Locally-Based Kernal PLS Smoothing to Non-Parametric Regression Curve Fitting
NASA Technical Reports Server (NTRS)
Rosipal, Roman; Trejo, Leonard J.; Wheeler, Kevin; Korsmeyer, David (Technical Monitor)
2002-01-01
We present a novel smoothing approach to non-parametric regression curve fitting. This is based on kernel partial least squares (PLS) regression in reproducing kernel Hilbert space. It is our concern to apply the methodology for smoothing experimental data where some level of knowledge about the approximate shape, local inhomogeneities or points where the desired function changes its curvature is known a priori or can be derived based on the observed noisy data. We propose locally-based kernel PLS regression that extends the previous kernel PLS methodology by incorporating this knowledge. We compare our approach with existing smoothing splines, hybrid adaptive splines and wavelet shrinkage techniques on two generated data sets.
Fields, Andrew T; Fischer, Gunter A; Shea, Stanley K H; Zhang, Huarong; Abercrombie, Debra L; Feldheim, Kevin A; Babcock, Elizabeth A; Chapman, Demian D
2018-04-01
The shark fin trade is a major driver of shark exploitation in fisheries all over the world, most of which are not managed on a species-specific basis. Species-specific trade information highlights taxa of particular concern and can be used to assess the efficacy of management measures and anticipate emerging threats. The species composition of the Hong Kong Special Administrative Region of China, one of the world's largest fin trading hubs, was partially assessed in 1999-2001. We randomly selected and genetically identified fin trimmings (n = 4800), produced during fin processing, from the retail market of Hong Kong in 2014-2015 to assess contemporary species composition of the fin trade. We used nonparametric species estimators to determine that at least 76 species of sharks, batoids, and chimaeras supplied the fin trade and a Bayesian model to determine their relative proportion in the market. The diversity of traded species suggests species substitution could mask depletion of vulnerable species; one-third of identified species are threatened with extinction. The Bayesian model suggested that 8 species each comprised >1% of the fin trimmings (34.1-64.2% for blue [Prionace glauca], 0.2-1.2% for bull [Carcharhinus leucas] and shortfin mako [Isurus oxyrinchus]); thus, trade was skewed to a few globally distributed species. Several other coastal sharks, batoids, and chimaeras are in the trade but poorly managed. Fewer than 10 of the species we modeled have sustainably managed fisheries anywhere in their range, and the most common species in trade, the blue shark, was not among them. Our study and approach serve as a baseline to track changes in composition of species in the fin trade over time to better understand patterns of exploitation and assess the effects of emerging management actions for these animals. © 2017 The Authors. Conservation Biology published by Wiley Periodicals, Inc. on behalf of Society for Conservation Biology.
sourceR: Classification and source attribution of infectious agents among heterogeneous populations
French, Nigel
2017-01-01
Zoonotic diseases are a major cause of morbidity, and productivity losses in both human and animal populations. Identifying the source of food-borne zoonoses (e.g. an animal reservoir or food product) is crucial for the identification and prioritisation of food safety interventions. For many zoonotic diseases it is difficult to attribute human cases to sources of infection because there is little epidemiological information on the cases. However, microbial strain typing allows zoonotic pathogens to be categorised, and the relative frequencies of the strain types among the sources and in human cases allows inference on the likely source of each infection. We introduce sourceR, an R package for quantitative source attribution, aimed at food-borne diseases. It implements a Bayesian model using strain-typed surveillance data from both human cases and source samples, capable of identifying important sources of infection. The model measures the force of infection from each source, allowing for varying survivability, pathogenicity and virulence of pathogen strains, and varying abilities of the sources to act as vehicles of infection. A Bayesian non-parametric (Dirichlet process) approach is used to cluster pathogen strain types by epidemiological behaviour, avoiding model overfitting and allowing detection of strain types associated with potentially high “virulence”. sourceR is demonstrated using Campylobacter jejuni isolate data collected in New Zealand between 2005 and 2008. Chicken from a particular poultry supplier was identified as the major source of campylobacteriosis, which is qualitatively similar to results of previous studies using the same dataset. Additionally, the software identifies a cluster of 9 multilocus sequence types with abnormally high ‘virulence’ in humans. sourceR enables straightforward attribution of cases of zoonotic infection to putative sources of infection. As sourceR develops, we intend it to become an important and flexible resource for food-borne disease attribution studies. PMID:28558033
Sengupta Chattopadhyay, Amrita; Hsiao, Ching-Lin; Chang, Chien Ching; Lian, Ie-Bin; Fann, Cathy S J
2014-01-01
Identifying susceptibility genes that influence complex diseases is extremely difficult because loci often influence the disease state through genetic interactions. Numerous approaches to detect disease-associated SNP-SNP interactions have been developed, but none consistently generates high-quality results under different disease scenarios. Using summarizing techniques to combine a number of existing methods may provide a solution to this problem. Here we used three popular non-parametric methods-Gini, absolute probability difference (APD), and entropy-to develop two novel summary scores, namely principle component score (PCS) and Z-sum score (ZSS), with which to predict disease-associated genetic interactions. We used a simulation study to compare performance of the non-parametric scores, the summary scores, the scaled-sum score (SSS; used in polymorphism interaction analysis (PIA)), and the multifactor dimensionality reduction (MDR). The non-parametric methods achieved high power, but no non-parametric method outperformed all others under a variety of epistatic scenarios. PCS and ZSS, however, outperformed MDR. PCS, ZSS and SSS displayed controlled type-I-errors (<0.05) compared to GS, APDS, ES (>0.05). A real data study using the genetic-analysis-workshop 16 (GAW 16) rheumatoid arthritis dataset identified a number of interesting SNP-SNP interactions. © 2013 Elsevier B.V. All rights reserved.
A Bayesian Approach for Analyzing Longitudinal Structural Equation Models
ERIC Educational Resources Information Center
Song, Xin-Yuan; Lu, Zhao-Hua; Hser, Yih-Ing; Lee, Sik-Yum
2011-01-01
This article considers a Bayesian approach for analyzing a longitudinal 2-level nonlinear structural equation model with covariates, and mixed continuous and ordered categorical variables. The first-level model is formulated for measures taken at each time point nested within individuals for investigating their characteristics that are dynamically…
A Robust Bayesian Approach for Structural Equation Models with Missing Data
ERIC Educational Resources Information Center
Lee, Sik-Yum; Xia, Ye-Mao
2008-01-01
In this paper, normal/independent distributions, including but not limited to the multivariate t distribution, the multivariate contaminated distribution, and the multivariate slash distribution, are used to develop a robust Bayesian approach for analyzing structural equation models with complete or missing data. In the context of a nonlinear…
USDA-ARS?s Scientific Manuscript database
Data assimilation and regression are two commonly used methods for predicting agricultural yield from remote sensing observations. Data assimilation is a generative approach because it requires explicit approximations of the Bayesian prior and likelihood to compute the probability density function...
Varughese, Eunice A.; Brinkman, Nichole E; Anneken, Emily M; Cashdollar, Jennifer S; Fout, G. Shay; Furlong, Edward T.; Kolpin, Dana W.; Glassmeyer, Susan T.; Keely, Scott P
2017-01-01
incorporated into a Bayesian model to more accurately determine viral load in both source and treated water. Results of the Bayesian model indicated that viruses are present in source water and treated water. By using a Bayesian framework that incorporates inhibition, as well as many other parameters that affect viral detection, this study offers an approach for more accurately estimating the occurrence of viral pathogens in environmental waters.
Bootstrap Prediction Intervals in Non-Parametric Regression with Applications to Anomaly Detection
NASA Technical Reports Server (NTRS)
Kumar, Sricharan; Srivistava, Ashok N.
2012-01-01
Prediction intervals provide a measure of the probable interval in which the outputs of a regression model can be expected to occur. Subsequently, these prediction intervals can be used to determine if the observed output is anomalous or not, conditioned on the input. In this paper, a procedure for determining prediction intervals for outputs of nonparametric regression models using bootstrap methods is proposed. Bootstrap methods allow for a non-parametric approach to computing prediction intervals with no specific assumptions about the sampling distribution of the noise or the data. The asymptotic fidelity of the proposed prediction intervals is theoretically proved. Subsequently, the validity of the bootstrap based prediction intervals is illustrated via simulations. Finally, the bootstrap prediction intervals are applied to the problem of anomaly detection on aviation data.
Bayesian survival analysis in clinical trials: What methods are used in practice?
Brard, Caroline; Le Teuff, Gwénaël; Le Deley, Marie-Cécile; Hampson, Lisa V
2017-02-01
Background Bayesian statistics are an appealing alternative to the traditional frequentist approach to designing, analysing, and reporting of clinical trials, especially in rare diseases. Time-to-event endpoints are widely used in many medical fields. There are additional complexities to designing Bayesian survival trials which arise from the need to specify a model for the survival distribution. The objective of this article was to critically review the use and reporting of Bayesian methods in survival trials. Methods A systematic review of clinical trials using Bayesian survival analyses was performed through PubMed and Web of Science databases. This was complemented by a full text search of the online repositories of pre-selected journals. Cost-effectiveness, dose-finding studies, meta-analyses, and methodological papers using clinical trials were excluded. Results In total, 28 articles met the inclusion criteria, 25 were original reports of clinical trials and 3 were re-analyses of a clinical trial. Most trials were in oncology (n = 25), were randomised controlled (n = 21) phase III trials (n = 13), and half considered a rare disease (n = 13). Bayesian approaches were used for monitoring in 14 trials and for the final analysis only in 14 trials. In the latter case, Bayesian survival analyses were used for the primary analysis in four cases, for the secondary analysis in seven cases, and for the trial re-analysis in three cases. Overall, 12 articles reported fitting Bayesian regression models (semi-parametric, n = 3; parametric, n = 9). Prior distributions were often incompletely reported: 20 articles did not define the prior distribution used for the parameter of interest. Over half of the trials used only non-informative priors for monitoring and the final analysis (n = 12) when it was specified. Indeed, no articles fitting Bayesian regression models placed informative priors on the parameter of interest. The prior for the treatment effect was based on historical data in only four trials. Decision rules were pre-defined in eight cases when trials used Bayesian monitoring, and in only one case when trials adopted a Bayesian approach to the final analysis. Conclusion Few trials implemented a Bayesian survival analysis and few incorporated external data into priors. There is scope to improve the quality of reporting of Bayesian methods in survival trials. Extension of the Consolidated Standards of Reporting Trials statement for reporting Bayesian clinical trials is recommended.
Bayesian just-so stories in psychology and neuroscience.
Bowers, Jeffrey S; Davis, Colin J
2012-05-01
According to Bayesian theories in psychology and neuroscience, minds and brains are (near) optimal in solving a wide range of tasks. We challenge this view and argue that more traditional, non-Bayesian approaches are more promising. We make 3 main arguments. First, we show that the empirical evidence for Bayesian theories in psychology is weak. This weakness relates to the many arbitrary ways that priors, likelihoods, and utility functions can be altered in order to account for the data that are obtained, making the models unfalsifiable. It further relates to the fact that Bayesian theories are rarely better at predicting data compared with alternative (and simpler) non-Bayesian theories. Second, we show that the empirical evidence for Bayesian theories in neuroscience is weaker still. There are impressive mathematical analyses showing how populations of neurons could compute in a Bayesian manner but little or no evidence that they do. Third, we challenge the general scientific approach that characterizes Bayesian theorizing in cognitive science. A common premise is that theories in psychology should largely be constrained by a rational analysis of what the mind ought to do. We question this claim and argue that many of the important constraints come from biological, evolutionary, and processing (algorithmic) considerations that have no adaptive relevance to the problem per se. In our view, these factors have contributed to the development of many Bayesian "just so" stories in psychology and neuroscience; that is, mathematical analyses of cognition that can be used to explain almost any behavior as optimal. 2012 APA, all rights reserved.
NASA Technical Reports Server (NTRS)
Buntine, Wray
1991-01-01
Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. How a tree learning algorithm can be derived from Bayesian decision theory is outlined. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule turns out to be similar to Quinlan's information gain splitting rule, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4 and Breiman et al. Cart show the full Bayesian algorithm is consistently as good, or more accurate than these other approaches though at a computational price.
The choice of sample size: a mixed Bayesian / frequentist approach.
Pezeshk, Hamid; Nematollahi, Nader; Maroufy, Vahed; Gittins, John
2009-04-01
Sample size computations are largely based on frequentist or classical methods. In the Bayesian approach the prior information on the unknown parameters is taken into account. In this work we consider a fully Bayesian approach to the sample size determination problem which was introduced by Grundy et al. and developed by Lindley. This approach treats the problem as a decision problem and employs a utility function to find the optimal sample size of a trial. Furthermore, we assume that a regulatory authority, which is deciding on whether or not to grant a licence to a new treatment, uses a frequentist approach. We then find the optimal sample size for the trial by maximising the expected net benefit, which is the expected benefit of subsequent use of the new treatment minus the cost of the trial.
Bayesian Approaches to Imputation, Hypothesis Testing, and Parameter Estimation
ERIC Educational Resources Information Center
Ross, Steven J.; Mackey, Beth
2015-01-01
This chapter introduces three applications of Bayesian inference to common and novel issues in second language research. After a review of the critiques of conventional hypothesis testing, our focus centers on ways Bayesian inference can be used for dealing with missing data, for testing theory-driven substantive hypotheses without a default null…
Teaching Bayesian Statistics to Undergraduate Students through Debates
ERIC Educational Resources Information Center
Stewart, Sepideh; Stewart, Wayne
2014-01-01
This paper describes a lecturer's approach to teaching Bayesian statistics to students who were only exposed to the classical paradigm. The study shows how the lecturer extended himself by making use of ventriloquist dolls to grab hold of students' attention and embed important ideas in revealing the differences between the Bayesian and classical…
Testing students' e-learning via Facebook through Bayesian structural equation modeling.
Salarzadeh Jenatabadi, Hashem; Moghavvemi, Sedigheh; Wan Mohamed Radzi, Che Wan Jasimah Bt; Babashamsi, Parastoo; Arashi, Mohammad
2017-01-01
Learning is an intentional activity, with several factors affecting students' intention to use new learning technology. Researchers have investigated technology acceptance in different contexts by developing various theories/models and testing them by a number of means. Although most theories/models developed have been examined through regression or structural equation modeling, Bayesian analysis offers more accurate data analysis results. To address this gap, the unified theory of acceptance and technology use in the context of e-learning via Facebook are re-examined in this study using Bayesian analysis. The data (S1 Data) were collected from 170 students enrolled in a business statistics course at University of Malaya, Malaysia, and tested with the maximum likelihood and Bayesian approaches. The difference between the two methods' results indicates that performance expectancy and hedonic motivation are the strongest factors influencing the intention to use e-learning via Facebook. The Bayesian estimation model exhibited better data fit than the maximum likelihood estimator model. The results of the Bayesian and maximum likelihood estimator approaches are compared and the reasons for the result discrepancy are deliberated.
Testing students’ e-learning via Facebook through Bayesian structural equation modeling
Moghavvemi, Sedigheh; Wan Mohamed Radzi, Che Wan Jasimah Bt; Babashamsi, Parastoo; Arashi, Mohammad
2017-01-01
Learning is an intentional activity, with several factors affecting students’ intention to use new learning technology. Researchers have investigated technology acceptance in different contexts by developing various theories/models and testing them by a number of means. Although most theories/models developed have been examined through regression or structural equation modeling, Bayesian analysis offers more accurate data analysis results. To address this gap, the unified theory of acceptance and technology use in the context of e-learning via Facebook are re-examined in this study using Bayesian analysis. The data (S1 Data) were collected from 170 students enrolled in a business statistics course at University of Malaya, Malaysia, and tested with the maximum likelihood and Bayesian approaches. The difference between the two methods’ results indicates that performance expectancy and hedonic motivation are the strongest factors influencing the intention to use e-learning via Facebook. The Bayesian estimation model exhibited better data fit than the maximum likelihood estimator model. The results of the Bayesian and maximum likelihood estimator approaches are compared and the reasons for the result discrepancy are deliberated. PMID:28886019
Bayesian hierarchical model for large-scale covariance matrix estimation.
Zhu, Dongxiao; Hero, Alfred O
2007-12-01
Many bioinformatics problems implicitly depend on estimating large-scale covariance matrix. The traditional approaches tend to give rise to high variance and low accuracy due to "overfitting." We cast the large-scale covariance matrix estimation problem into the Bayesian hierarchical model framework, and introduce dependency between covariance parameters. We demonstrate the advantages of our approaches over the traditional approaches using simulations and OMICS data analysis.
Agarwal, Rahul; Chen, Zhe; Kloosterman, Fabian; Wilson, Matthew A; Sarma, Sridevi V
2016-07-01
Pyramidal neurons recorded from the rat hippocampus and entorhinal cortex, such as place and grid cells, have diverse receptive fields, which are either unimodal or multimodal. Spiking activity from these cells encodes information about the spatial position of a freely foraging rat. At fine timescales, a neuron's spike activity also depends significantly on its own spike history. However, due to limitations of current parametric modeling approaches, it remains a challenge to estimate complex, multimodal neuronal receptive fields while incorporating spike history dependence. Furthermore, efforts to decode the rat's trajectory in one- or two-dimensional space from hippocampal ensemble spiking activity have mainly focused on spike history-independent neuronal encoding models. In this letter, we address these two important issues by extending a recently introduced nonparametric neural encoding framework that allows modeling both complex spatial receptive fields and spike history dependencies. Using this extended nonparametric approach, we develop novel algorithms for decoding a rat's trajectory based on recordings of hippocampal place cells and entorhinal grid cells. Results show that both encoding and decoding models derived from our new method performed significantly better than state-of-the-art encoding and decoding models on 6 minutes of test data. In addition, our model's performance remains invariant to the apparent modality of the neuron's receptive field.
Carlsson, Kristin Cecilie; Hoem, Nils Ove; Glauser, Tracy; Vinks, Alexander A
2005-05-01
Population models can be important extensions of therapeutic drug monitoring (TDM), as they allow estimation of individual pharmacokinetic parameters based on a small number of measured drug concentrations. This study used a Bayesian approach to explore the utility of routinely collected and sparse TDM data (1 sample per patient) for carbamazepine (CBZ) monotherapy in developing a population pharmacokinetic (PPK) model for CBZ in pediatric patients that would allow prediction of CBZ concentrations for both immediate- and controlled-release formulations. Patient and TDM data were obtained from a pediatric neurology outpatient database. Data were analyzed using an iterative 2-stage Bayesian algorithm and a nonparametric adaptive grid algorithm. Models were compared by final log likelihood, mean error (ME) as a measure of bias, and root mean squared error (RMSE) as a measure of precision. Fifty-seven entries with data on CBZ monotherapy were identified from the database and used in the analysis (36 from males, 21 from females; mean [SD] age, 9.1 [4.4] years [range, 2-21 years]). Preliminary models estimating clearance (Cl) or the elimination rate constant (K(el)) gave good prediction of serum concentrations compared with measured serum concentrations, but estimates of Cl and K(el) were highly correlated with estimates of volume of distribution (V(d)). Different covariate models were then tested. The selected model had zero-order input and had age and body weight as covariates. Cl (L/h) was calculated as K(el) . V(d), where K(el) = [K(i) - (K(s) . age)] and V(d) = [V(i) + (V(s) . body weight)]. Median parameter estimates were V(i) (intercept) = 11.5 L (fixed); V(s) (slope) = 0.3957 L/kg (range, 0.01200-1.5730); K(i) (intercept) = 0.173 h(-1) (fixed); and K(s) (slope) = 0.004487 h(-1) . y(-1) (range, 0.0001800-0.02969). The fit was good for estimates of steady-state serum concentrations based on prior values (population median estimates) (R = 0.468; R(2) = 0.219) but was even better for predictions based on individual Bayesian posterior values (R(2) = 0.991), with little bias (ME = -0.079) and good precision (RMSE = 0.055). Based on the findings of this study, sparse TDM data can be used for PPK modeling of CBZ clearance in children with epilepsy, and these models can be used to predict Cl at steady state in pediatric patients. However, to estimate additional pharmacokinetic model parameters (eg, the absorption rate constant and V(d)), it would be necessary to combine sparse TDM data with additional well-timed samples. This would allow development of more informative PPK models that could be used as part of Bayesian dose-individualization strategies.
Efficient, adaptive estimation of two-dimensional firing rate surfaces via Gaussian process methods.
Rad, Kamiar Rahnama; Paninski, Liam
2010-01-01
Estimating two-dimensional firing rate maps is a common problem, arising in a number of contexts: the estimation of place fields in hippocampus, the analysis of temporally nonstationary tuning curves in sensory and motor areas, the estimation of firing rates following spike-triggered covariance analyses, etc. Here we introduce methods based on Gaussian process nonparametric Bayesian techniques for estimating these two-dimensional rate maps. These techniques offer a number of advantages: the estimates may be computed efficiently, come equipped with natural errorbars, adapt their smoothness automatically to the local density and informativeness of the observed data, and permit direct fitting of the model hyperparameters (e.g., the prior smoothness of the rate map) via maximum marginal likelihood. We illustrate the method's flexibility and performance on a variety of simulated and real data.
Confidence Intervals for Laboratory Sonic Boom Annoyance Tests
NASA Technical Reports Server (NTRS)
Rathsam, Jonathan; Christian, Andrew
2016-01-01
Commercial supersonic flight is currently forbidden over land because sonic booms have historically caused unacceptable annoyance levels in overflown communities. NASA is providing data and expertise to noise regulators as they consider relaxing the ban for future quiet supersonic aircraft. One deliverable NASA will provide is a predictive model for indoor annoyance to aid in setting an acceptable quiet sonic boom threshold. A laboratory study was conducted to determine how indoor vibrations caused by sonic booms affect annoyance judgments. The test method required finding the point of subjective equality (PSE) between sonic boom signals that cause vibrations and signals not causing vibrations played at various amplitudes. This presentation focuses on a few statistical techniques for estimating the interval around the PSE. The techniques examined are the Delta Method, Parametric and Nonparametric Bootstrapping, and Bayesian Posterior Estimation.
Bayesian functional integral method for inferring continuous data from discrete measurements.
Heuett, William J; Miller, Bernard V; Racette, Susan B; Holloszy, John O; Chow, Carson C; Periwal, Vipul
2012-02-08
Inference of the insulin secretion rate (ISR) from C-peptide measurements as a quantification of pancreatic β-cell function is clinically important in diseases related to reduced insulin sensitivity and insulin action. ISR derived from C-peptide concentration is an example of nonparametric Bayesian model selection where a proposed ISR time-course is considered to be a "model". An inferred value of inaccessible continuous variables from discrete observable data is often problematic in biology and medicine, because it is a priori unclear how robust the inference is to the deletion of data points, and a closely related question, how much smoothness or continuity the data actually support. Predictions weighted by the posterior distribution can be cast as functional integrals as used in statistical field theory. Functional integrals are generally difficult to evaluate, especially for nonanalytic constraints such as positivity of the estimated parameters. We propose a computationally tractable method that uses the exact solution of an associated likelihood function as a prior probability distribution for a Markov-chain Monte Carlo evaluation of the posterior for the full model. As a concrete application of our method, we calculate the ISR from actual clinical C-peptide measurements in human subjects with varying degrees of insulin sensitivity. Our method demonstrates the feasibility of functional integral Bayesian model selection as a practical method for such data-driven inference, allowing the data to determine the smoothing timescale and the width of the prior probability distribution on the space of models. In particular, our model comparison method determines the discrete time-step for interpolation of the unobservable continuous variable that is supported by the data. Attempts to go to finer discrete time-steps lead to less likely models. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Anees, Asim; Aryal, Jagannath; O'Reilly, Małgorzata M.; Gale, Timothy J.; Wardlaw, Tim
2016-12-01
A robust non-parametric framework, based on multiple Radial Basic Function (RBF) kernels, is proposed in this study, for detecting land/forest cover changes using Landsat 7 ETM+ images. One of the widely used frameworks is to find change vectors (difference image) and use a supervised classifier to differentiate between change and no-change. The Bayesian Classifiers e.g. Maximum Likelihood Classifier (MLC), Naive Bayes (NB), are widely used probabilistic classifiers which assume parametric models, e.g. Gaussian function, for the class conditional distributions. However, their performance can be limited if the data set deviates from the assumed model. The proposed framework exploits the useful properties of Least Squares Probabilistic Classifier (LSPC) formulation i.e. non-parametric and probabilistic nature, to model class posterior probabilities of the difference image using a linear combination of a large number of Gaussian kernels. To this end, a simple technique, based on 10-fold cross-validation is also proposed for tuning model parameters automatically instead of selecting a (possibly) suboptimal combination from pre-specified lists of values. The proposed framework has been tested and compared with Support Vector Machine (SVM) and NB for detection of defoliation, caused by leaf beetles (Paropsisterna spp.) in Eucalyptus nitens and Eucalyptus globulus plantations of two test areas, in Tasmania, Australia, using raw bands and band combination indices of Landsat 7 ETM+. It was observed that due to multi-kernel non-parametric formulation and probabilistic nature, the LSPC outperforms parametric NB with Gaussian assumption in change detection framework, with Overall Accuracy (OA) ranging from 93.6% (κ = 0.87) to 97.4% (κ = 0.94) against 85.3% (κ = 0.69) to 93.4% (κ = 0.85), and is more robust to changing data distributions. Its performance was comparable to SVM, with added advantages of being probabilistic and capable of handling multi-class problems naturally with its original formulation.
NASA Astrophysics Data System (ADS)
Plant, N. G.; Thieler, E. R.; Gutierrez, B.; Lentz, E. E.; Zeigler, S. L.; Van Dongeren, A.; Fienen, M. N.
2016-12-01
We evaluate the strengths and weaknesses of Bayesian networks that have been used to address scientific and decision-support questions related to coastal geomorphology. We will provide an overview of coastal geomorphology research that has used Bayesian networks and describe what this approach can do and when it works (or fails to work). Over the past decade, Bayesian networks have been formulated to analyze the multi-variate structure and evolution of coastal morphology and associated human and ecological impacts. The approach relates observable system variables to each other by estimating discrete correlations. The resulting Bayesian-networks make predictions that propagate errors, conduct inference via Bayes rule, or both. In scientific applications, the model results are useful for hypothesis testing, using confidence estimates to gage the strength of tests while applications to coastal resource management are aimed at decision-support, where the probabilities of desired ecosystems outcomes are evaluated. The range of Bayesian-network applications to coastal morphology includes emulation of high-resolution wave transformation models to make oceanographic predictions, morphologic response to storms and/or sea-level rise, groundwater response to sea-level rise and morphologic variability, habitat suitability for endangered species, and assessment of monetary or human-life risk associated with storms. All of these examples are based on vast observational data sets, numerical model output, or both. We will discuss the progression of our experiments, which has included testing whether the Bayesian-network approach can be implemented and is appropriate for addressing basic and applied scientific problems and evaluating the hindcast and forecast skill of these implementations. We will present and discuss calibration/validation tests that are used to assess the robustness of Bayesian-network models and we will compare these results to tests of other models. This will demonstrate how Bayesian networks are used to extract new insights about coastal morphologic behavior, assess impacts to societal and ecological systems, and communicate probabilistic predictions to decision makers.
ERIC Educational Resources Information Center
Lee, Sik-Yum; Song, Xin-Yuan; Cai, Jing-Heng
2010-01-01
Analysis of ordered binary and unordered binary data has received considerable attention in social and psychological research. This article introduces a Bayesian approach, which has several nice features in practical applications, for analyzing nonlinear structural equation models with dichotomous data. We demonstrate how to use the software…
Bayesian Finite Mixtures for Nonlinear Modeling of Educational Data.
ERIC Educational Resources Information Center
Tirri, Henry; And Others
A Bayesian approach for finding latent classes in data is discussed. The approach uses finite mixture models to describe the underlying structure in the data and demonstrate that the possibility of using full joint probability models raises interesting new prospects for exploratory data analysis. The concepts and methods discussed are illustrated…
Bayesian Structural Equation Modeling: A More Flexible Representation of Substantive Theory
ERIC Educational Resources Information Center
Muthen, Bengt; Asparouhov, Tihomir
2012-01-01
This article proposes a new approach to factor analysis and structural equation modeling using Bayesian analysis. The new approach replaces parameter specifications of exact zeros with approximate zeros based on informative, small-variance priors. It is argued that this produces an analysis that better reflects substantive theories. The proposed…
Posterior Predictive Bayesian Phylogenetic Model Selection
Lewis, Paul O.; Xie, Wangang; Chen, Ming-Hui; Fan, Yu; Kuo, Lynn
2014-01-01
We present two distinctly different posterior predictive approaches to Bayesian phylogenetic model selection and illustrate these methods using examples from green algal protein-coding cpDNA sequences and flowering plant rDNA sequences. The Gelfand–Ghosh (GG) approach allows dissection of an overall measure of model fit into components due to posterior predictive variance (GGp) and goodness-of-fit (GGg), which distinguishes this method from the posterior predictive P-value approach. The conditional predictive ordinate (CPO) method provides a site-specific measure of model fit useful for exploratory analyses and can be combined over sites yielding the log pseudomarginal likelihood (LPML) which is useful as an overall measure of model fit. CPO provides a useful cross-validation approach that is computationally efficient, requiring only a sample from the posterior distribution (no additional simulation is required). Both GG and CPO add new perspectives to Bayesian phylogenetic model selection based on the predictive abilities of models and complement the perspective provided by the marginal likelihood (including Bayes Factor comparisons) based solely on the fit of competing models to observed data. [Bayesian; conditional predictive ordinate; CPO; L-measure; LPML; model selection; phylogenetics; posterior predictive.] PMID:24193892
Detection of multiple damages employing best achievable eigenvectors under Bayesian inference
NASA Astrophysics Data System (ADS)
Prajapat, Kanta; Ray-Chaudhuri, Samit
2018-05-01
A novel approach is presented in this work to localize simultaneously multiple damaged elements in a structure along with the estimation of damage severity for each of the damaged elements. For detection of damaged elements, a best achievable eigenvector based formulation has been derived. To deal with noisy data, Bayesian inference is employed in the formulation wherein the likelihood of the Bayesian algorithm is formed on the basis of errors between the best achievable eigenvectors and the measured modes. In this approach, the most probable damage locations are evaluated under Bayesian inference by generating combinations of various possible damaged elements. Once damage locations are identified, damage severities are estimated using a Bayesian inference Markov chain Monte Carlo simulation. The efficiency of the proposed approach has been demonstrated by carrying out a numerical study involving a 12-story shear building. It has been found from this study that damage scenarios involving as low as 10% loss of stiffness in multiple elements are accurately determined (localized and severities quantified) even when 2% noise contaminated modal data are utilized. Further, this study introduces a term parameter impact (evaluated based on sensitivity of modal parameters towards structural parameters) to decide the suitability of selecting a particular mode, if some idea about the damaged elements are available. It has been demonstrated here that the accuracy and efficiency of the Bayesian quantification algorithm increases if damage localization is carried out a-priori. An experimental study involving a laboratory scale shear building and different stiffness modification scenarios shows that the proposed approach is efficient enough to localize the stories with stiffness modification.
Yu, Wenbao; Park, Taesung
2014-01-01
It is common to get an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach, for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus, apply the penalization regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and a prudent choice of the smoothing parameter. We apply the proposed AucPR for gene selection and classification using four real microarray and synthetic data. Through numerical studies, AucPR is shown to perform better than the penalized logistic regression and the nonparametric AUC-based method, in the sense of AUC and sensitivity for a given specificity, particularly when there are many correlated genes. We propose a powerful parametric and easily-implementable linear classifier AucPR, for gene selection and disease prediction for high-dimensional data. AucPR is recommended for its good prediction performance. Beside gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.
Development of uncertainty-based work injury model using Bayesian structural equation modelling.
Chatterjee, Snehamoy
2014-01-01
This paper proposed a Bayesian method-based structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distribution for factor loading parameters and structural parameters of SEM. In the first approach, the prior distributions were considered as a fixed distribution function with specific parameter values, whereas, in the second approach, prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayesian rule. The Markov Chain Monte Carlo sampling in the form Gibbs sampling was applied for sampling from the posterior distribution. The results revealed that all coefficients of structural and measurement model parameters are statistically significant in experts' opinion-based priors, whereas, two coefficients are not statistically significant when fixed prior-based distributions are applied. The error statistics reveals that Bayesian structural model provides reasonably good fit of work injury with high coefficient of determination (0.91) and less mean squared error as compared to traditional SEM.
Bayesian logistic regression approaches to predict incorrect DRG assignment.
Suleiman, Mani; Demirhan, Haydar; Boyd, Leanne; Girosi, Federico; Aksakalli, Vural
2018-05-07
Episodes of care involving similar diagnoses and treatments and requiring similar levels of resource utilisation are grouped to the same Diagnosis-Related Group (DRG). In jurisdictions which implement DRG based payment systems, DRGs are a major determinant of funding for inpatient care. Hence, service providers often dedicate auditing staff to the task of checking that episodes have been coded to the correct DRG. The use of statistical models to estimate an episode's probability of DRG error can significantly improve the efficiency of clinical coding audits. This study implements Bayesian logistic regression models with weakly informative prior distributions to estimate the likelihood that episodes require a DRG revision, comparing these models with each other and to classical maximum likelihood estimates. All Bayesian approaches had more stable model parameters than maximum likelihood. The best performing Bayesian model improved overall classification per- formance by 6% compared to maximum likelihood, with a 34% gain compared to random classification, respectively. We found that the original DRG, coder and the day of coding all have a significant effect on the likelihood of DRG error. Use of Bayesian approaches has improved model parameter stability and classification accuracy. This method has already lead to improved audit efficiency in an operational capacity.
Fu, Zhibiao; Baker, Daniel; Cheng, Aili; Leighton, Julie; Appelbaum, Edward; Aon, Juan
2016-05-01
The principle of quality by design (QbD) has been widely applied to biopharmaceutical manufacturing processes. Process characterization is an essential step to implement the QbD concept to establish the design space and to define the proven acceptable ranges (PAR) for critical process parameters (CPPs). In this study, we present characterization of a Saccharomyces cerevisiae fermentation process using risk assessment analysis, statistical design of experiments (DoE), and the multivariate Bayesian predictive approach. The critical quality attributes (CQAs) and CPPs were identified with a risk assessment. The statistical model for each attribute was established using the results from the DoE study with consideration given to interactions between CPPs. Both the conventional overlapping contour plot and the multivariate Bayesian predictive approaches were used to establish the region of process operating conditions where all attributes met their specifications simultaneously. The quantitative Bayesian predictive approach was chosen to define the PARs for the CPPs, which apply to the manufacturing control strategy. Experience from the 10,000 L manufacturing scale process validation, including 64 continued process verification batches, indicates that the CPPs remain under a state of control and within the established PARs. The end product quality attributes were within their drug substance specifications. The probability generated with the Bayesian approach was also used as a tool to assess CPP deviations. This approach can be extended to develop other production process characterization and quantify a reliable operating region. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 32:799-812, 2016. © 2016 American Institute of Chemical Engineers.
NASA Astrophysics Data System (ADS)
Fernandes, Adji Achmad Rinaldo; Solimun, Arisoesilaningsih, Endang
2017-12-01
The aim of this research is to estimate the spline in Path Analysis-based on Nonparametric Regression using Penalized Weighted Least Square (PWLS) approach. Approach used is Reproducing Kernel Hilbert Space at sobolev space. Nonparametric path analysis model on the equation y1 i=f1.1(x1 i)+ε1 i; y2 i=f1.2(x1 i)+f2.2(y1 i)+ε2 i; i =1 ,2 ,…,n Nonparametric Path Analysis which meet the criteria of minimizing PWLS min fw .k∈W2m[aw .k,bw .k], k =1 ,2 { (2n ) -1(y˜-f ˜ ) TΣ-1(y ˜-f ˜ ) + ∑k =1 2 ∑w =1 2 λw .k ∫aw .k bw .k [fw.k (m )(xi) ] 2d xi } is f ˜^=Ay ˜ with A=T1(T1TU1-1∑-1T1)-1T1TU1-1∑-1+V1U1-1∑-1[I-T1(T1TU1-1∑-1T1)-1T1TU1-1∑-1] columnalign="left">+T2(T2TU2-1∑-1T2)-1T2TU2-1∑-1+V2U2-1∑-1[I1-T2(T2TU2-1∑-1T2) -1T2TU2-1∑-1
ERIC Educational Resources Information Center
West, Patti; Rutstein, Daisy Wise; Mislevy, Robert J.; Liu, Junhui; Choi, Younyoung; Levy, Roy; Crawford, Aaron; DiCerbo, Kristen E.; Chappel, Kristina; Behrens, John T.
2010-01-01
A major issue in the study of learning progressions (LPs) is linking student performance on assessment tasks to the progressions. This report describes the challenges faced in making this linkage using Bayesian networks to model LPs in the field of computer networking. The ideas are illustrated with exemplar Bayesian networks built on Cisco…
Kimberley K. Ayre; Wayne G. Landis
2012-01-01
We present a Bayesian network model based on the ecological risk assessment framework to evaluate potential impacts to habitats and resources resulting from wildfire, grazing, forest management activities, and insect outbreaks in a forested landscape in northeastern Oregon. The Bayesian network structure consisted of three tiers of nodes: landscape disturbances,...
Luce, Bryan R; Connor, Jason T; Broglio, Kristine R; Mullins, C Daniel; Ishak, K Jack; Saunders, Elijah; Davis, Barry R
2016-09-20
Bayesian and adaptive clinical trial designs offer the potential for more efficient processes that result in lower sample sizes and shorter trial durations than traditional designs. To explore the use and potential benefits of Bayesian adaptive clinical trial designs in comparative effectiveness research. Virtual execution of ALLHAT (Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial) as if it had been done according to a Bayesian adaptive trial design. Comparative effectiveness trial of antihypertensive medications. Patient data sampled from the more than 42 000 patients enrolled in ALLHAT with publicly available data. Number of patients randomly assigned between groups, trial duration, observed numbers of events, and overall trial results and conclusions. The Bayesian adaptive approach and original design yielded similar overall trial conclusions. The Bayesian adaptive trial randomly assigned more patients to the better-performing group and would probably have ended slightly earlier. This virtual trial execution required limited resampling of ALLHAT patients for inclusion in RE-ADAPT (REsearch in ADAptive methods for Pragmatic Trials). Involvement of a data monitoring committee and other trial logistics were not considered. In a comparative effectiveness research trial, Bayesian adaptive trial designs are a feasible approach and potentially generate earlier results and allocate more patients to better-performing groups. National Heart, Lung, and Blood Institute.
Zhang, Wangshu; Coba, Marcelo P; Sun, Fengzhu
2016-01-11
Protein domains can be viewed as portable units of biological function that defines the functional properties of proteins. Therefore, if a protein is associated with a disease, protein domains might also be associated and define disease endophenotypes. However, knowledge about such domain-disease relationships is rarely available. Thus, identification of domains associated with human diseases would greatly improve our understanding of the mechanism of human complex diseases and further improve the prevention, diagnosis and treatment of these diseases. Based on phenotypic similarities among diseases, we first group diseases into overlapping modules. We then develop a framework to infer associations between domains and diseases through known relationships between diseases and modules, domains and proteins, as well as proteins and disease modules. Different methods including Association, Maximum likelihood estimation (MLE), Domain-disease pair exclusion analysis (DPEA), Bayesian, and Parsimonious explanation (PE) approaches are developed to predict domain-disease associations. We demonstrate the effectiveness of all the five approaches via a series of validation experiments, and show the robustness of the MLE, Bayesian and PE approaches to the involved parameters. We also study the effects of disease modularization in inferring novel domain-disease associations. Through validation, the AUC (Area Under the operating characteristic Curve) scores for Bayesian, MLE, DPEA, PE, and Association approaches are 0.86, 0.84, 0.83, 0.83 and 0.79, respectively, indicating the usefulness of these approaches for predicting domain-disease relationships. Finally, we choose the Bayesian approach to infer domains associated with two common diseases, Crohn's disease and type 2 diabetes. The Bayesian approach has the best performance for the inference of domain-disease relationships. The predicted landscape between domains and diseases provides a more detailed view about the disease mechanisms.
Wang, Tianli; Baron, Kyle; Zhong, Wei; Brundage, Richard; Elmquist, William
2014-03-01
The current study presents a Bayesian approach to non-compartmental analysis (NCA), which provides the accurate and precise estimate of AUC 0 (∞) and any AUC 0 (∞) -based NCA parameter or derivation. In order to assess the performance of the proposed method, 1,000 simulated datasets were generated in different scenarios. A Bayesian method was used to estimate the tissue and plasma AUC 0 (∞) s and the tissue-to-plasma AUC 0 (∞) ratio. The posterior medians and the coverage of 95% credible intervals for the true parameter values were examined. The method was applied to laboratory data from a mice brain distribution study with serial sacrifice design for illustration. Bayesian NCA approach is accurate and precise in point estimation of the AUC 0 (∞) and the partition coefficient under a serial sacrifice design. It also provides a consistently good variance estimate, even considering the variability of the data and the physiological structure of the pharmacokinetic model. The application in the case study obtained a physiologically reasonable posterior distribution of AUC, with a posterior median close to the value estimated by classic Bailer-type methods. This Bayesian NCA approach for sparse data analysis provides statistical inference on the variability of AUC 0 (∞) -based parameters such as partition coefficient and drug targeting index, so that the comparison of these parameters following destructive sampling becomes statistically feasible.
Bayesian accounts of covert selective attention: A tutorial review.
Vincent, Benjamin T
2015-05-01
Decision making and optimal observer models offer an important theoretical approach to the study of covert selective attention. While their probabilistic formulation allows quantitative comparison to human performance, the models can be complex and their insights are not always immediately apparent. Part 1 establishes the theoretical appeal of the Bayesian approach, and introduces the way in which probabilistic approaches can be applied to covert search paradigms. Part 2 presents novel formulations of Bayesian models of 4 important covert attention paradigms, illustrating optimal observer predictions over a range of experimental manipulations. Graphical model notation is used to present models in an accessible way and Supplementary Code is provided to help bridge the gap between model theory and practical implementation. Part 3 reviews a large body of empirical and modelling evidence showing that many experimental phenomena in the domain of covert selective attention are a set of by-products. These effects emerge as the result of observers conducting Bayesian inference with noisy sensory observations, prior expectations, and knowledge of the generative structure of the stimulus environment.
Confidence of compliance: a Bayesian approach for percentile standards.
McBride, G B; Ellis, J C
2001-04-01
Rules for assessing compliance with percentile standards commonly limit the number of exceedances permitted in a batch of samples taken over a defined assessment period. Such rules are commonly developed using classical statistical methods. Results from alternative Bayesian methods are presented (using beta-distributed prior information and a binomial likelihood), resulting in "confidence of compliance" graphs. These allow simple reading of the consumer's risk and the supplier's risks for any proposed rule. The influence of the prior assumptions required by the Bayesian technique on the confidence results is demonstrated, using two reference priors (uniform and Jeffreys') and also using optimistic and pessimistic user-defined priors. All four give less pessimistic results than does the classical technique, because interpreting classical results as "confidence of compliance" actually invokes a Bayesian approach with an extreme prior distribution. Jeffreys' prior is shown to be the most generally appropriate choice of prior distribution. Cost savings can be expected using rules based on this approach.
NASA Astrophysics Data System (ADS)
Shen, Chien-wen
2009-01-01
During the processes of TFT-LCD manufacturing, steps like visual inspection of panel surface defects still heavily rely on manual operations. As the manual inspection time of TFT-LCD manufacturing could range from 4 hours to 1 day, the reliability of time forecasting is thus important for production planning, scheduling and customer response. This study would like to propose a practical and easy-to-implement prediction model through the approach of Bayesian networks for time estimation of manual operated procedures in TFT-LCD manufacturing. Given the lack of prior knowledge about manual operation time, algorithms of necessary path condition and expectation-maximization are used for structural learning and estimation of conditional probability distributions respectively. This study also applied Bayesian inference to evaluate the relationships between explanatory variables and manual operation time. With the empirical applications of this proposed forecasting model, approach of Bayesian networks demonstrates its practicability and prediction accountability.
Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach.
Duarte, Belmiro P M; Wong, Weng Kee
2015-08-01
This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted.
Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach
Duarte, Belmiro P. M.; Wong, Weng Kee
2014-01-01
Summary This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted. PMID:26512159
Applications of Bayesian Statistics to Problems in Gamma-Ray Bursts
NASA Technical Reports Server (NTRS)
Meegan, Charles A.
1997-01-01
This presentation will describe two applications of Bayesian statistics to Gamma Ray Bursts (GRBS). The first attempts to quantify the evidence for a cosmological versus galactic origin of GRBs using only the observations of the dipole and quadrupole moments of the angular distribution of bursts. The cosmological hypothesis predicts isotropy, while the galactic hypothesis is assumed to produce a uniform probability distribution over positive values for these moments. The observed isotropic distribution indicates that the Bayes factor for the cosmological hypothesis over the galactic hypothesis is about 300. Another application of Bayesian statistics is in the estimation of chance associations of optical counterparts with galaxies. The Bayesian approach is preferred to frequentist techniques here because the Bayesian approach easily accounts for galaxy mass distributions and because one can incorporate three disjoint hypotheses: (1) bursts come from galactic centers, (2) bursts come from galaxies in proportion to luminosity, and (3) bursts do not come from external galaxies. This technique was used in the analysis of the optical counterpart to GRB970228.
Bayesian inference of a historical bottleneck in a heavily exploited marine mammal.
Hoffman, J I; Grant, S M; Forcada, J; Phillips, C D
2011-10-01
Emerging Bayesian analytical approaches offer increasingly sophisticated means of reconstructing historical population dynamics from genetic data, but have been little applied to scenarios involving demographic bottlenecks. Consequently, we analysed a large mitochondrial and microsatellite dataset from the Antarctic fur seal Arctocephalus gazella, a species subjected to one of the most extreme examples of uncontrolled exploitation in history when it was reduced to the brink of extinction by the sealing industry during the late eighteenth and nineteenth centuries. Classical bottleneck tests, which exploit the fact that rare alleles are rapidly lost during demographic reduction, yielded ambiguous results. In contrast, a strong signal of recent demographic decline was detected using both Bayesian skyline plots and Approximate Bayesian Computation, the latter also allowing derivation of posterior parameter estimates that were remarkably consistent with historical observations. This was achieved using only contemporary samples, further emphasizing the potential of Bayesian approaches to address important problems in conservation and evolutionary biology. © 2011 Blackwell Publishing Ltd.
Quantum state estimation when qubits are lost: a no-data-left-behind approach
Williams, Brian P.; Lougovski, Pavel
2017-04-06
We present an approach to Bayesian mean estimation of quantum states using hyperspherical parametrization and an experiment-specific likelihood which allows utilization of all available data, even when qubits are lost. With this method, we report the first closed-form Bayesian mean and maximum likelihood estimates for the ideal single qubit. Due to computational constraints, we utilize numerical sampling to determine the Bayesian mean estimate for a photonic two-qubit experiment in which our novel analysis reduces burdens associated with experimental asymmetries and inefficiencies. This method can be applied to quantum states of any dimension and experimental complexity.
NASA Astrophysics Data System (ADS)
Bakoban, Rana A.
2017-08-01
The coefficient of variation [CV] has several applications in applied statistics. So in this paper, we adopt Bayesian and non-Bayesian approaches for the estimation of CV under type-II censored data from extension exponential distribution [EED]. The point and interval estimate of the CV are obtained for each of the maximum likelihood and parametric bootstrap techniques. Also the Bayesian approach with the help of MCMC method is presented. A real data set is presented and analyzed, hence the obtained results are used to assess the obtained theoretical results.
ERIC Educational Resources Information Center
Aslan, Burak Galip; Öztürk, Özlem; Inceoglu, Mustafa Murat
2014-01-01
Considering the increasing importance of adaptive approaches in CALL systems, this study implemented a machine learning based student modeling middleware with Bayesian networks. The profiling approach of the student modeling system is based on Felder and Silverman's Learning Styles Model and Felder and Soloman's Index of Learning Styles…
ERIC Educational Resources Information Center
Shuck, Brad; Zigarmi, Drea; Owen, Jesse
2015-01-01
Purpose: The purpose of this study was to empirically examine the utility of self-determination theory (SDT) within the engagement-performance linkage. Design/methodology/approach: Bayesian multi-measurement mediation modeling was used to estimate the relation between SDT, engagement and a proxy measure of performance (e.g. work intentions) (N =…
Predicting Graduation Rates at 4-Year Broad Access Institutions Using a Bayesian Modeling Approach
ERIC Educational Resources Information Center
Crisp, Gloria; Doran, Erin; Salis Reyes, Nicole A.
2018-01-01
This study models graduation rates at 4-year broad access institutions (BAIs). We examine the student body, structural-demographic, and financial characteristics that best predict 6-year graduation rates across two time periods (2008-2009 and 2014-2015). A Bayesian model averaging approach is utilized to account for uncertainty in variable…
Inverse and forward modeling under uncertainty using MRE-based Bayesian approach
NASA Astrophysics Data System (ADS)
Hou, Z.; Rubin, Y.
2004-12-01
A stochastic inverse approach for subsurface characterization is proposed and applied to shallow vadose zone at a winery field site in north California and to a gas reservoir at the Ormen Lange field site in the North Sea. The approach is formulated in a Bayesian-stochastic framework, whereby the unknown parameters are identified in terms of their statistical moments or their probabilities. Instead of the traditional single-valued estimation /prediction provided by deterministic methods, the approach gives a probability distribution for an unknown parameter. This allows calculating the mean, the mode, and the confidence interval, which is useful for a rational treatment of uncertainty and its consequences. The approach also allows incorporating data of various types and different error levels, including measurements of state variables as well as information such as bounds on or statistical moments of the unknown parameters, which may represent prior information. To obtain minimally subjective prior probabilities required for the Bayesian approach, the principle of Minimum Relative Entropy (MRE) is employed. The approach is tested in field sites for flow parameters identification and soil moisture estimation in the vadose zone and for gas saturation estimation at great depth below the ocean floor. Results indicate the potential of coupling various types of field data within a MRE-based Bayesian formalism for improving the estimation of the parameters of interest.
Hyperspectral image segmentation using a cooperative nonparametric approach
NASA Astrophysics Data System (ADS)
Taher, Akar; Chehdi, Kacem; Cariou, Claude
2013-10-01
In this paper a new unsupervised nonparametric cooperative and adaptive hyperspectral image segmentation approach is presented. The hyperspectral images are partitioned band by band in parallel and intermediate classification results are evaluated and fused, to get the final segmentation result. Two unsupervised nonparametric segmentation methods are used in parallel cooperation, namely the Fuzzy C-means (FCM) method, and the Linde-Buzo-Gray (LBG) algorithm, to segment each band of the image. The originality of the approach relies firstly on its local adaptation to the type of regions in an image (textured, non-textured), and secondly on the introduction of several levels of evaluation and validation of intermediate segmentation results before obtaining the final partitioning of the image. For the management of similar or conflicting results issued from the two classification methods, we gradually introduced various assessment steps that exploit the information of each spectral band and its adjacent bands, and finally the information of all the spectral bands. In our approach, the detected textured and non-textured regions are treated separately from feature extraction step, up to the final classification results. This approach was first evaluated on a large number of monocomponent images constructed from the Brodatz album. Then it was evaluated on two real applications using a respectively multispectral image for Cedar trees detection in the region of Baabdat (Lebanon) and a hyperspectral image for identification of invasive and non invasive vegetation in the region of Cieza (Spain). A correct classification rate (CCR) for the first application is over 97% and for the second application the average correct classification rate (ACCR) is over 99%.
Bayesian Population Forecasting: Extending the Lee-Carter Method.
Wiśniowski, Arkadiusz; Smith, Peter W F; Bijak, Jakub; Raymer, James; Forster, Jonathan J
2015-06-01
In this article, we develop a fully integrated and dynamic Bayesian approach to forecast populations by age and sex. The approach embeds the Lee-Carter type models for forecasting the age patterns, with associated measures of uncertainty, of fertility, mortality, immigration, and emigration within a cohort projection model. The methodology may be adapted to handle different data types and sources of information. To illustrate, we analyze time series data for the United Kingdom and forecast the components of population change to the year 2024. We also compare the results obtained from different forecast models for age-specific fertility, mortality, and migration. In doing so, we demonstrate the flexibility and advantages of adopting the Bayesian approach for population forecasting and highlight areas where this work could be extended.
Baker, Robert L; Leong, Wen Fung; An, Nan; Brock, Marcus T; Rubin, Matthew J; Welch, Stephen; Weinig, Cynthia
2018-02-01
We develop Bayesian function-valued trait models that mathematically isolate genetic mechanisms underlying leaf growth trajectories by factoring out genotype-specific differences in photosynthesis. Remote sensing data can be used instead of leaf-level physiological measurements. Characterizing the genetic basis of traits that vary during ontogeny and affect plant performance is a major goal in evolutionary biology and agronomy. Describing genetic programs that specifically regulate morphological traits can be complicated by genotypic differences in physiological traits. We describe the growth trajectories of leaves using novel Bayesian function-valued trait (FVT) modeling approaches in Brassica rapa recombinant inbred lines raised in heterogeneous field settings. While frequentist approaches estimate parameter values by treating each experimental replicate discretely, Bayesian models can utilize information in the global dataset, potentially leading to more robust trait estimation. We illustrate this principle by estimating growth asymptotes in the face of missing data and comparing heritabilities of growth trajectory parameters estimated by Bayesian and frequentist approaches. Using pseudo-Bayes factors, we compare the performance of an initial Bayesian logistic growth model and a model that incorporates carbon assimilation (A max ) as a cofactor, thus statistically accounting for genotypic differences in carbon resources. We further evaluate two remotely sensed spectroradiometric indices, photochemical reflectance (pri2) and MERIS Terrestrial Chlorophyll Index (mtci) as covariates in lieu of A max , because these two indices were genetically correlated with A max across years and treatments yet allow much higher throughput compared to direct leaf-level gas-exchange measurements. For leaf lengths in uncrowded settings, including A max improves model fit over the initial model. The mtci and pri2 indices also outperform direct A max measurements. Of particular importance for evolutionary biologists and plant breeders, hierarchical Bayesian models estimating FVT parameters improve heritabilities compared to frequentist approaches.
A Tutorial Introduction to Bayesian Models of Cognitive Development
ERIC Educational Resources Information Center
Perfors, Amy; Tenenbaum, Joshua B.; Griffiths, Thomas L.; Xu, Fei
2011-01-01
We present an introduction to Bayesian inference as it is used in probabilistic models of cognitive development. Our goal is to provide an intuitive and accessible guide to the "what", the "how", and the "why" of the Bayesian approach: what sorts of problems and data the framework is most relevant for, and how and why it may be useful for…
ERIC Educational Resources Information Center
Sebro, Negusse Yohannes; Goshu, Ayele Taye
2017-01-01
This study aims to explore Bayesian multilevel modeling to investigate variations of average academic achievement of grade eight school students. A sample of 636 students is randomly selected from 26 private and government schools by a two-stage stratified sampling design. Bayesian method is used to estimate the fixed and random effects. Input and…
Is That What Bayesians Believe? Reply to Griffiths, Chater, Norris, and Pouget (2012)
ERIC Educational Resources Information Center
Bowers, Jeffrey S.; Davis, Colin J.
2012-01-01
Griffiths, Chater, Norris, and Pouget (2012) argue that we have misunderstood the Bayesian approach. In their view, it is rarely the case that researchers are making claims that performance in a given task is near optimal, and few, if any, researchers adopt the theoretical Bayesian perspective according to which the mind or brain is actually…
Liang, Li-Jung; Huang, David; Brecht, Mary-Lynn; Hser, Yih-ing
2010-01-01
Studies examining differences in mortality among long-term drug users have been limited. In this paper, we introduce a Bayesian framework that jointly models survival data using a Weibull proportional hazard model with frailty, and substance and alcohol data using mixed-effects models, to examine differences in mortality among heroin, cocaine, and methamphetamine users from five long-term follow-up studies. The traditional approach to analyzing combined survival data from numerous studies assumes that the studies are homogeneous, thus the estimates may be biased due to unobserved heterogeneity among studies. Our approach allows us to structurally combine the data from different studies while accounting for correlation among subjects within each study. Markov chain Monte Carlo facilitates the implementation of Bayesian analyses. Despite the complexity of the model, our approach is relatively straightforward to implement using WinBUGS. We demonstrate our joint modeling approach to the combined data and discuss the results from both approaches. PMID:21052518
More evidence of the plateau effect: a social perspective.
Pinto-Prades, J L; Lopez-Nicolás, A
1998-01-01
The purpose of this study was to test the existence of the plateau effect at the social level. The authors tried to confirm the preliminary conclusion that people may not be willing to trade off any longevity to improve the health state of a large number of people if the health states are mild enough. They tested this assumption using the person-tradeoff technique. They also used a parametric approach and a nonparametric approach to study the relationship between individual and social values. Results show the existence of the plateau effect in the context of resource allocation. Furthermore, with the nonparametric approach, a plateau effect in the middle part of the scale was also observed, suggesting that social preference may not be directly predicted from individual utilities. The authors caution against the possible framing effects that may be present in these kinds of questions.
Using exogenous variables in testing for monotonic trends in hydrologic time series
Alley, William M.
1988-01-01
One approach that has been used in performing a nonparametric test for monotonic trend in a hydrologic time series consists of a two-stage analysis. First, a regression equation is estimated for the variable being tested as a function of an exogenous variable. A nonparametric trend test such as the Kendall test is then performed on the residuals from the equation. By analogy to stagewise regression and through Monte Carlo experiments, it is demonstrated that this approach will tend to underestimate the magnitude of the trend and to result in some loss in power as a result of ignoring the interaction between the exogenous variable and time. An alternative approach, referred to as the adjusted variable Kendall test, is demonstrated to generally have increased statistical power and to provide more reliable estimates of the trend slope. In addition, the utility of including an exogenous variable in a trend test is examined under selected conditions.
Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T
2016-12-20
Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Li, Shi; Mukherjee, Bhramar; Batterman, Stuart; Ghosh, Malay
2013-12-01
Case-crossover designs are widely used to study short-term exposure effects on the risk of acute adverse health events. While the frequentist literature on this topic is vast, there is no Bayesian work in this general area. The contribution of this paper is twofold. First, the paper establishes Bayesian equivalence results that require characterization of the set of priors under which the posterior distributions of the risk ratio parameters based on a case-crossover and time-series analysis are identical. Second, the paper studies inferential issues under case-crossover designs in a Bayesian framework. Traditionally, a conditional logistic regression is used for inference on risk-ratio parameters in case-crossover studies. We consider instead a more general full likelihood-based approach which makes less restrictive assumptions on the risk functions. Formulation of a full likelihood leads to growth in the number of parameters proportional to the sample size. We propose a semi-parametric Bayesian approach using a Dirichlet process prior to handle the random nuisance parameters that appear in a full likelihood formulation. We carry out a simulation study to compare the Bayesian methods based on full and conditional likelihood with the standard frequentist approaches for case-crossover and time-series analysis. The proposed methods are illustrated through the Detroit Asthma Morbidity, Air Quality and Traffic study, which examines the association between acute asthma risk and ambient air pollutant concentrations. © 2013, The International Biometric Society.
Lefèvre, Thomas; Lepresle, Aude; Chariot, Patrick
2015-09-01
The search for complex, nonlinear relationships and causality in data is hindered by the availability of techniques in many domains, including forensic science. Linear multivariable techniques are useful but present some shortcomings. In the past decade, Bayesian approaches have been introduced in forensic science. To date, authors have mainly focused on providing an alternative to classical techniques for quantifying effects and dealing with uncertainty. Causal networks, including Bayesian networks, can help detangle complex relationships in data. A Bayesian network estimates the joint probability distribution of data and graphically displays dependencies between variables and the circulation of information between these variables. In this study, we illustrate the interest in utilizing Bayesian networks for dealing with complex data through an application in clinical forensic science. Evaluating the functional impairment of assault survivors is a complex task for which few determinants are known. As routinely estimated in France, the duration of this impairment can be quantified by days of 'Total Incapacity to Work' ('Incapacité totale de travail,' ITT). In this study, we used a Bayesian network approach to identify the injury type, victim category and time to evaluation as the main determinants of the 'Total Incapacity to Work' (TIW). We computed the conditional probabilities associated with the TIW node and its parents. We compared this approach with a multivariable analysis, and the results of both techniques were converging. Thus, Bayesian networks should be considered a reliable means to detangle complex relationships in data.
Efficient Probabilistic Diagnostics for Electrical Power Systems
NASA Technical Reports Server (NTRS)
Mengshoel, Ole J.; Chavira, Mark; Cascio, Keith; Poll, Scott; Darwiche, Adnan; Uckun, Serdar
2008-01-01
We consider in this work the probabilistic approach to model-based diagnosis when applied to electrical power systems (EPSs). Our probabilistic approach is formally well-founded, as it based on Bayesian networks and arithmetic circuits. We investigate the diagnostic task known as fault isolation, and pay special attention to meeting two of the main challenges . model development and real-time reasoning . often associated with real-world application of model-based diagnosis technologies. To address the challenge of model development, we develop a systematic approach to representing electrical power systems as Bayesian networks, supported by an easy-to-use speci.cation language. To address the real-time reasoning challenge, we compile Bayesian networks into arithmetic circuits. Arithmetic circuit evaluation supports real-time diagnosis by being predictable and fast. In essence, we introduce a high-level EPS speci.cation language from which Bayesian networks that can diagnose multiple simultaneous failures are auto-generated, and we illustrate the feasibility of using arithmetic circuits, compiled from Bayesian networks, for real-time diagnosis on real-world EPSs of interest to NASA. The experimental system is a real-world EPS, namely the Advanced Diagnostic and Prognostic Testbed (ADAPT) located at the NASA Ames Research Center. In experiments with the ADAPT Bayesian network, which currently contains 503 discrete nodes and 579 edges, we .nd high diagnostic accuracy in scenarios where one to three faults, both in components and sensors, were inserted. The time taken to compute the most probable explanation using arithmetic circuits has a small mean of 0.2625 milliseconds and standard deviation of 0.2028 milliseconds. In experiments with data from ADAPT we also show that arithmetic circuit evaluation substantially outperforms joint tree propagation and variable elimination, two alternative algorithms for diagnosis using Bayesian network inference.
Stochastic Earthquake Rupture Modeling Using Nonparametric Co-Regionalization
NASA Astrophysics Data System (ADS)
Lee, Kyungbook; Song, Seok Goo
2017-09-01
Accurate predictions of the intensity and variability of ground motions are essential in simulation-based seismic hazard assessment. Advanced simulation-based ground motion prediction methods have been proposed to complement the empirical approach, which suffers from the lack of observed ground motion data, especially in the near-source region for large events. It is important to quantify the variability of the earthquake rupture process for future events and to produce a number of rupture scenario models to capture the variability in simulation-based ground motion predictions. In this study, we improved the previously developed stochastic earthquake rupture modeling method by applying the nonparametric co-regionalization, which was proposed in geostatistics, to the correlation models estimated from dynamically derived earthquake rupture models. The nonparametric approach adopted in this study is computationally efficient and, therefore, enables us to simulate numerous rupture scenarios, including large events ( M > 7.0). It also gives us an opportunity to check the shape of true input correlation models in stochastic modeling after being deformed for permissibility. We expect that this type of modeling will improve our ability to simulate a wide range of rupture scenario models and thereby predict ground motions and perform seismic hazard assessment more accurately.
Evaluating impacts using a BACI design, ratios, and a Bayesian approach with a focus on restoration.
Conner, Mary M; Saunders, W Carl; Bouwes, Nicolaas; Jordan, Chris
2015-10-01
Before-after-control-impact (BACI) designs are an effective method to evaluate natural and human-induced perturbations on ecological variables when treatment sites cannot be randomly chosen. While effect sizes of interest can be tested with frequentist methods, using Bayesian Markov chain Monte Carlo (MCMC) sampling methods, probabilities of effect sizes, such as a ≥20 % increase in density after restoration, can be directly estimated. Although BACI and Bayesian methods are used widely for assessing natural and human-induced impacts for field experiments, the application of hierarchal Bayesian modeling with MCMC sampling to BACI designs is less common. Here, we combine these approaches and extend the typical presentation of results with an easy to interpret ratio, which provides an answer to the main study question-"How much impact did a management action or natural perturbation have?" As an example of this approach, we evaluate the impact of a restoration project, which implemented beaver dam analogs, on survival and density of juvenile steelhead. Results indicated the probabilities of a ≥30 % increase were high for survival and density after the dams were installed, 0.88 and 0.99, respectively, while probabilities for a higher increase of ≥50 % were variable, 0.17 and 0.82, respectively. This approach demonstrates a useful extension of Bayesian methods that can easily be generalized to other study designs from simple (e.g., single factor ANOVA, paired t test) to more complicated block designs (e.g., crossover, split-plot). This approach is valuable for estimating the probabilities of restoration impacts or other management actions.
A multimembership catalogue for 1876 open clusters using UCAC4 data
NASA Astrophysics Data System (ADS)
Sampedro, L.; Dias, W. S.; Alfaro, E. J.; Monteiro, H.; Molino, A.
2017-10-01
The main objective of this work is to determine the cluster members of 1876 open clusters, using positions and proper motions of the astrometric fourth United States Naval Observatory (USNO) CCD Astrograph Catalog (UCAC4). For this purpose, we apply three different methods, all based on a Bayesian approach, but with different formulations: a purely parametric method, another completely non-parametric algorithm and a third, recently developed by Sampedro & Alfaro, using both formulations at different steps of the whole process. The first and second statistical moments of the members' phase-space subspace, obtained after applying the three methods, are compared for every cluster. Although, on average, the three methods yield similar results, there are also specific differences between them, as well as for some particular clusters. The comparison with other published catalogues shows good agreement. We have also estimated, for the first time, the mean proper motion for a sample of 18 clusters. The results are organized in a single catalogue formed by two main files, one with the most relevant information for each cluster, partially including that in UCAC4, and the other showing the individual membership probabilities for each star in the cluster area. The final catalogue, with an interface design that enables an easy interaction with the user, is available in electronic format at the Stellar Systems Group (SSG-IAA) web site (http://ssg.iaa.es/en/content/sampedro-cluster-catalog).
A label field fusion bayesian model and its penalized maximum rand estimator for image segmentation.
Mignotte, Max
2010-06-01
This paper presents a novel segmentation approach based on a Markov random field (MRF) fusion model which aims at combining several segmentation results associated with simpler clustering models in order to achieve a more reliable and accurate segmentation result. The proposed fusion model is derived from the recently introduced probabilistic Rand measure for comparing one segmentation result to one or more manual segmentations of the same image. This non-parametric measure allows us to easily derive an appealing fusion model of label fields, easily expressed as a Gibbs distribution, or as a nonstationary MRF model defined on a complete graph. Concretely, this Gibbs energy model encodes the set of binary constraints, in terms of pairs of pixel labels, provided by each segmentation results to be fused. Combined with a prior distribution, this energy-based Gibbs model also allows for definition of an interesting penalized maximum probabilistic rand estimator with which the fusion of simple, quickly estimated, segmentation results appears as an interesting alternative to complex segmentation models existing in the literature. This fusion framework has been successfully applied on the Berkeley image database. The experiments reported in this paper demonstrate that the proposed method is efficient in terms of visual evaluation and quantitative performance measures and performs well compared to the best existing state-of-the-art segmentation methods recently proposed in the literature.
Fourment, Mathieu; Holmes, Edward C
2014-07-24
Early methods for estimating divergence times from gene sequence data relied on the assumption of a molecular clock. More sophisticated methods were created to model rate variation and used auto-correlation of rates, local clocks, or the so called "uncorrelated relaxed clock" where substitution rates are assumed to be drawn from a parametric distribution. In the case of Bayesian inference methods the impact of the prior on branching times is not clearly understood, and if the amount of data is limited the posterior could be strongly influenced by the prior. We develop a maximum likelihood method--Physher--that uses local or discrete clocks to estimate evolutionary rates and divergence times from heterochronous sequence data. Using two empirical data sets we show that our discrete clock estimates are similar to those obtained by other methods, and that Physher outperformed some methods in the estimation of the root age of an influenza virus data set. A simulation analysis suggests that Physher can outperform a Bayesian method when the real topology contains two long branches below the root node, even when evolution is strongly clock-like. These results suggest it is advisable to use a variety of methods to estimate evolutionary rates and divergence times from heterochronous sequence data. Physher and the associated data sets used here are available online at http://code.google.com/p/physher/.
A Bayesian analysis of redshifted 21-cm H I signal and foregrounds: simulations for LOFAR
NASA Astrophysics Data System (ADS)
Ghosh, Abhik; Koopmans, Léon V. E.; Chapman, E.; Jelić, V.
2015-09-01
Observations of the epoch of reionization (EoR) using the 21-cm hyperfine emission of neutral hydrogen (H I) promise to open an entirely new window on the formation of the first stars, galaxies and accreting black holes. In order to characterize the weak 21-cm signal, we need to develop imaging techniques that can reconstruct the extended emission very precisely. Here, we present an inversion technique for LOw Frequency ARray (LOFAR) baselines at the North Celestial Pole (NCP), based on a Bayesian formalism with optimal spatial regularization, which is used to reconstruct the diffuse foreground map directly from the simulated visibility data. We notice that the spatial regularization de-noises the images to a large extent, allowing one to recover the 21-cm power spectrum over a considerable k⊥-k∥ space in the range 0.03 Mpc-1 < k⊥ < 0.19 Mpc-1 and 0.14 Mpc-1 < k∥ < 0.35 Mpc-1 without subtracting the noise power spectrum. We find that, in combination with using generalized morphological component analysis (GMCA), a non-parametric foreground removal technique, we can mostly recover the spherical average power spectrum within 2σ statistical fluctuations for an input Gaussian random root-mean-square noise level of 60 mK in the maps after 600 h of integration over a 10-MHz bandwidth.
Frasso, Gianluca; Lambert, Philippe
2016-10-01
SummaryThe 2014 Ebola outbreak in Sierra Leone is analyzed using a susceptible-exposed-infectious-removed (SEIR) epidemic compartmental model. The discrete time-stochastic model for the epidemic evolution is coupled to a set of ordinary differential equations describing the dynamics of the expected proportions of subjects in each epidemic state. The unknown parameters are estimated in a Bayesian framework by combining data on the number of new (laboratory confirmed) Ebola cases reported by the Ministry of Health and prior distributions for the transition rates elicited using information collected by the WHO during the follow-up of specific Ebola cases. The time-varying disease transmission rate is modeled in a flexible way using penalized B-splines. Our framework represents a valuable stochastic tool for the study of an epidemic dynamic even when only irregularly observed and possibly aggregated data are available. Simulations and the analysis of the 2014 Sierra Leone Ebola data highlight the merits of the proposed methodology. In particular, the flexible modeling of the disease transmission rate makes the estimation of the effective reproduction number robust to the misspecification of the initial epidemic states and to underreporting of the infectious cases. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Mean Field Variational Bayesian Data Assimilation
NASA Astrophysics Data System (ADS)
Vrettas, M.; Cornford, D.; Opper, M.
2012-04-01
Current data assimilation schemes propose a range of approximate solutions to the classical data assimilation problem, particularly state estimation. Broadly there are three main active research areas: ensemble Kalman filter methods which rely on statistical linearization of the model evolution equations, particle filters which provide a discrete point representation of the posterior filtering or smoothing distribution and 4DVAR methods which seek the most likely posterior smoothing solution. In this paper we present a recent extension to our variational Bayesian algorithm which seeks the most probably posterior distribution over the states, within the family of non-stationary Gaussian processes. Our original work on variational Bayesian approaches to data assimilation sought the best approximating time varying Gaussian process to the posterior smoothing distribution for stochastic dynamical systems. This approach was based on minimising the Kullback-Leibler divergence between the true posterior over paths, and our Gaussian process approximation. So long as the observation density was sufficiently high to bring the posterior smoothing density close to Gaussian the algorithm proved very effective, on lower dimensional systems. However for higher dimensional systems, the algorithm was computationally very demanding. We have been developing a mean field version of the algorithm which treats the state variables at a given time as being independent in the posterior approximation, but still accounts for their relationships between each other in the mean solution arising from the original dynamical system. In this work we present the new mean field variational Bayesian approach, illustrating its performance on a range of classical data assimilation problems. We discuss the potential and limitations of the new approach. We emphasise that the variational Bayesian approach we adopt, in contrast to other variational approaches, provides a bound on the marginal likelihood of the observations given parameters in the model which also allows inference of parameters such as observation errors, and parameters in the model and model error representation, particularly if this is written as a deterministic form with small additive noise. We stress that our approach can address very long time window and weak constraint settings. However like traditional variational approaches our Bayesian variational method has the benefit of being posed as an optimisation problem. We finish with a sketch of the future directions for our approach.
Struchen, R; Vial, F; Andersson, M G
2017-04-26
Delayed reporting of health data may hamper the early detection of infectious diseases in surveillance systems. Furthermore, combining multiple data streams, e.g. aiming at improving a system's sensitivity, can be challenging. In this study, we used a Bayesian framework where the result is presented as the value of evidence, i.e. the likelihood ratio for the evidence under outbreak versus baseline conditions. Based on a historical data set of routinely collected cattle mortality events, we evaluated outbreak detection performance (sensitivity, time to detection, in-control run length) under the Bayesian approach among three scenarios: presence of delayed data reporting, but not accounting for it; presence of delayed data reporting accounted for; and absence of delayed data reporting (i.e. an ideal system). Performance on larger and smaller outbreaks was compared with a classical approach, considering syndromes separately or combined. We found that the Bayesian approach performed better than the classical approach, especially for the smaller outbreaks. Furthermore, the Bayesian approach performed similarly well in the scenario where delayed reporting was accounted for to the scenario where it was absent. We argue that the value of evidence framework may be suitable for surveillance systems with multiple syndromes and delayed reporting of data.
Optimal speech motor control and token-to-token variability: a Bayesian modeling approach.
Patri, Jean-François; Diard, Julien; Perrier, Pascal
2015-12-01
The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.
Bayesian Regression with Network Prior: Optimal Bayesian Filtering Perspective
Qian, Xiaoning; Dougherty, Edward R.
2017-01-01
The recently introduced intrinsically Bayesian robust filter (IBRF) provides fully optimal filtering relative to a prior distribution over an uncertainty class ofjoint random process models, whereas formerly the theory was limited to model-constrained Bayesian robust filters, for which optimization was limited to the filters that are optimal for models in the uncertainty class. This paper extends the IBRF theory to the situation where there are both a prior on the uncertainty class and sample data. The result is optimal Bayesian filtering (OBF), where optimality is relative to the posterior distribution derived from the prior and the data. The IBRF theories for effective characteristics and canonical expansions extend to the OBF setting. A salient focus of the present work is to demonstrate the advantages of Bayesian regression within the OBF setting over the classical Bayesian approach in the context otlinear Gaussian models. PMID:28824268
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vrugt, Jasper A; Robinson, Bruce A; Ter Braak, Cajo J F
In recent years, a strong debate has emerged in the hydrologic literature regarding what constitutes an appropriate framework for uncertainty estimation. Particularly, there is strong disagreement whether an uncertainty framework should have its roots within a proper statistical (Bayesian) context, or whether such a framework should be based on a different philosophy and implement informal measures and weaker inference to summarize parameter and predictive distributions. In this paper, we compare a formal Bayesian approach using Markov Chain Monte Carlo (MCMC) with generalized likelihood uncertainty estimation (GLUE) for assessing uncertainty in conceptual watershed modeling. Our formal Bayesian approach is implemented usingmore » the recently developed differential evolution adaptive metropolis (DREAM) MCMC scheme with a likelihood function that explicitly considers model structural, input and parameter uncertainty. Our results demonstrate that DREAM and GLUE can generate very similar estimates of total streamflow uncertainty. This suggests that formal and informal Bayesian approaches have more common ground than the hydrologic literature and ongoing debate might suggest. The main advantage of formal approaches is, however, that they attempt to disentangle the effect of forcing, parameter and model structural error on total predictive uncertainty. This is key to improving hydrologic theory and to better understand and predict the flow of water through catchments.« less
Bayesian data analysis in population ecology: motivations, methods, and benefits
Dorazio, Robert
2016-01-01
During the 20th century ecologists largely relied on the frequentist system of inference for the analysis of their data. However, in the past few decades ecologists have become increasingly interested in the use of Bayesian methods of data analysis. In this article I provide guidance to ecologists who would like to decide whether Bayesian methods can be used to improve their conclusions and predictions. I begin by providing a concise summary of Bayesian methods of analysis, including a comparison of differences between Bayesian and frequentist approaches to inference when using hierarchical models. Next I provide a list of problems where Bayesian methods of analysis may arguably be preferred over frequentist methods. These problems are usually encountered in analyses based on hierarchical models of data. I describe the essentials required for applying modern methods of Bayesian computation, and I use real-world examples to illustrate these methods. I conclude by summarizing what I perceive to be the main strengths and weaknesses of using Bayesian methods to solve ecological inference problems.
Order priors for Bayesian network discovery with an application to malware phylogeny
Oyen, Diane; Anderson, Blake; Sentz, Kari; ...
2017-09-15
Here, Bayesian networks have been used extensively to model and discover dependency relationships among sets of random variables. We learn Bayesian network structure with a combination of human knowledge about the partial ordering of variables and statistical inference of conditional dependencies from observed data. Our approach leverages complementary information from human knowledge and inference from observed data to produce networks that reflect human beliefs about the system as well as to fit the observed data. Applying prior beliefs about partial orderings of variables is an approach distinctly different from existing methods that incorporate prior beliefs about direct dependencies (or edges)more » in a Bayesian network. We provide an efficient implementation of the partial-order prior in a Bayesian structure discovery learning algorithm, as well as an edge prior, showing that both priors meet the local modularity requirement necessary for an efficient Bayesian discovery algorithm. In benchmark studies, the partial-order prior improves the accuracy of Bayesian network structure learning as well as the edge prior, even though order priors are more general. Our primary motivation is in characterizing the evolution of families of malware to aid cyber security analysts. For the problem of malware phylogeny discovery, we find that our algorithm, compared to existing malware phylogeny algorithms, more accurately discovers true dependencies that are missed by other algorithms.« less
Order priors for Bayesian network discovery with an application to malware phylogeny
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oyen, Diane; Anderson, Blake; Sentz, Kari
Here, Bayesian networks have been used extensively to model and discover dependency relationships among sets of random variables. We learn Bayesian network structure with a combination of human knowledge about the partial ordering of variables and statistical inference of conditional dependencies from observed data. Our approach leverages complementary information from human knowledge and inference from observed data to produce networks that reflect human beliefs about the system as well as to fit the observed data. Applying prior beliefs about partial orderings of variables is an approach distinctly different from existing methods that incorporate prior beliefs about direct dependencies (or edges)more » in a Bayesian network. We provide an efficient implementation of the partial-order prior in a Bayesian structure discovery learning algorithm, as well as an edge prior, showing that both priors meet the local modularity requirement necessary for an efficient Bayesian discovery algorithm. In benchmark studies, the partial-order prior improves the accuracy of Bayesian network structure learning as well as the edge prior, even though order priors are more general. Our primary motivation is in characterizing the evolution of families of malware to aid cyber security analysts. For the problem of malware phylogeny discovery, we find that our algorithm, compared to existing malware phylogeny algorithms, more accurately discovers true dependencies that are missed by other algorithms.« less
NASA Astrophysics Data System (ADS)
Cox, M.; Shirono, K.
2017-10-01
A criticism levelled at the Guide to the Expression of Uncertainty in Measurement (GUM) is that it is based on a mixture of frequentist and Bayesian thinking. In particular, the GUM’s Type A (statistical) uncertainty evaluations are frequentist, whereas the Type B evaluations, using state-of-knowledge distributions, are Bayesian. In contrast, making the GUM fully Bayesian implies, among other things, that a conventional objective Bayesian approach to Type A uncertainty evaluation for a number n of observations leads to the impractical consequence that n must be at least equal to 4, thus presenting a difficulty for many metrologists. This paper presents a Bayesian analysis of Type A uncertainty evaluation that applies for all n ≥slant 2 , as in the frequentist analysis in the current GUM. The analysis is based on assuming that the observations are drawn from a normal distribution (as in the conventional objective Bayesian analysis), but uses an informative prior based on lower and upper bounds for the standard deviation of the sampling distribution for the quantity under consideration. The main outcome of the analysis is a closed-form mathematical expression for the factor by which the standard deviation of the mean observation should be multiplied to calculate the required standard uncertainty. Metrological examples are used to illustrate the approach, which is straightforward to apply using a formula or look-up table.
Inference in the age of big data: Future perspectives on neuroscience.
Bzdok, Danilo; Yeo, B T Thomas
2017-07-15
Neuroscience is undergoing faster changes than ever before. Over 100 years our field qualitatively described and invasively manipulated single or few organisms to gain anatomical, physiological, and pharmacological insights. In the last 10 years neuroscience spawned quantitative datasets of unprecedented breadth (e.g., microanatomy, synaptic connections, and optogenetic brain-behavior assays) and size (e.g., cognition, brain imaging, and genetics). While growing data availability and information granularity have been amply discussed, we direct attention to a less explored question: How will the unprecedented data richness shape data analysis practices? Statistical reasoning is becoming more important to distill neurobiological knowledge from healthy and pathological brain measurements. We argue that large-scale data analysis will use more statistical models that are non-parametric, generative, and mixing frequentist and Bayesian aspects, while supplementing classical hypothesis testing with out-of-sample predictions. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Dynamic classification of fetal heart rates by hierarchical Dirichlet process mixture models.
Yu, Kezi; Quirk, J Gerald; Djurić, Petar M
2017-01-01
In this paper, we propose an application of non-parametric Bayesian (NPB) models for classification of fetal heart rate (FHR) recordings. More specifically, we propose models that are used to differentiate between FHR recordings that are from fetuses with or without adverse outcomes. In our work, we rely on models based on hierarchical Dirichlet processes (HDP) and the Chinese restaurant process with finite capacity (CRFC). Two mixture models were inferred from real recordings, one that represents healthy and another, non-healthy fetuses. The models were then used to classify new recordings and provide the probability of the fetus being healthy. First, we compared the classification performance of the HDP models with that of support vector machines on real data and concluded that the HDP models achieved better performance. Then we demonstrated the use of mixture models based on CRFC for dynamic classification of the performance of (FHR) recordings in a real-time setting.
Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William
2014-01-01
Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies.
Dynamic classification of fetal heart rates by hierarchical Dirichlet process mixture models
Yu, Kezi; Quirk, J. Gerald
2017-01-01
In this paper, we propose an application of non-parametric Bayesian (NPB) models for classification of fetal heart rate (FHR) recordings. More specifically, we propose models that are used to differentiate between FHR recordings that are from fetuses with or without adverse outcomes. In our work, we rely on models based on hierarchical Dirichlet processes (HDP) and the Chinese restaurant process with finite capacity (CRFC). Two mixture models were inferred from real recordings, one that represents healthy and another, non-healthy fetuses. The models were then used to classify new recordings and provide the probability of the fetus being healthy. First, we compared the classification performance of the HDP models with that of support vector machines on real data and concluded that the HDP models achieved better performance. Then we demonstrated the use of mixture models based on CRFC for dynamic classification of the performance of (FHR) recordings in a real-time setting. PMID:28953927
Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William
2014-01-01
Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies. PMID:24992657
A new method for E-government procurement using collaborative filtering and Bayesian approach.
Zhang, Shuai; Xi, Chengyu; Wang, Yan; Zhang, Wenyu; Chen, Yanhong
2013-01-01
Nowadays, as the Internet services increase faster than ever before, government systems are reinvented as E-government services. Therefore, government procurement sectors have to face challenges brought by the explosion of service information. This paper presents a novel method for E-government procurement (eGP) to search for the optimal procurement scheme (OPS). Item-based collaborative filtering and Bayesian approach are used to evaluate and select the candidate services to get the top-M recommendations such that the involved computation load can be alleviated. A trapezoidal fuzzy number similarity algorithm is applied to support the item-based collaborative filtering and Bayesian approach, since some of the services' attributes can be hardly expressed as certain and static values but only be easily represented as fuzzy values. A prototype system is built and validated with an illustrative example from eGP to confirm the feasibility of our approach.
A New Method for E-Government Procurement Using Collaborative Filtering and Bayesian Approach
Wang, Yan
2013-01-01
Nowadays, as the Internet services increase faster than ever before, government systems are reinvented as E-government services. Therefore, government procurement sectors have to face challenges brought by the explosion of service information. This paper presents a novel method for E-government procurement (eGP) to search for the optimal procurement scheme (OPS). Item-based collaborative filtering and Bayesian approach are used to evaluate and select the candidate services to get the top-M recommendations such that the involved computation load can be alleviated. A trapezoidal fuzzy number similarity algorithm is applied to support the item-based collaborative filtering and Bayesian approach, since some of the services' attributes can be hardly expressed as certain and static values but only be easily represented as fuzzy values. A prototype system is built and validated with an illustrative example from eGP to confirm the feasibility of our approach. PMID:24385869
Bayesian evidence computation for model selection in non-linear geoacoustic inference problems.
Dettmer, Jan; Dosso, Stan E; Osler, John C
2010-12-01
This paper applies a general Bayesian inference approach, based on Bayesian evidence computation, to geoacoustic inversion of interface-wave dispersion data. Quantitative model selection is carried out by computing the evidence (normalizing constants) for several model parameterizations using annealed importance sampling. The resulting posterior probability density estimate is compared to estimates obtained from Metropolis-Hastings sampling to ensure consistent results. The approach is applied to invert interface-wave dispersion data collected on the Scotian Shelf, off the east coast of Canada for the sediment shear-wave velocity profile. Results are consistent with previous work on these data but extend the analysis to a rigorous approach including model selection and uncertainty analysis. The results are also consistent with core samples and seismic reflection measurements carried out in the area.
NASA Astrophysics Data System (ADS)
Curceac, S.; Ternynck, C.; Ouarda, T.
2015-12-01
Over the past decades, a substantial amount of research has been conducted to model and forecast climatic variables. In this study, Nonparametric Functional Data Analysis (NPFDA) methods are applied to forecast air temperature and wind speed time series in Abu Dhabi, UAE. The dataset consists of hourly measurements recorded for a period of 29 years, 1982-2010. The novelty of the Functional Data Analysis approach is in expressing the data as curves. In the present work, the focus is on daily forecasting and the functional observations (curves) express the daily measurements of the above mentioned variables. We apply a non-linear regression model with a functional non-parametric kernel estimator. The computation of the estimator is performed using an asymmetrical quadratic kernel function for local weighting based on the bandwidth obtained by a cross validation procedure. The proximities between functional objects are calculated by families of semi-metrics based on derivatives and Functional Principal Component Analysis (FPCA). Additionally, functional conditional mode and functional conditional median estimators are applied and the advantages of combining their results are analysed. A different approach employs a SARIMA model selected according to the minimum Akaike (AIC) and Bayessian (BIC) Information Criteria and based on the residuals of the model. The performance of the models is assessed by calculating error indices such as the root mean square error (RMSE), relative RMSE, BIAS and relative BIAS. The results indicate that the NPFDA models provide more accurate forecasts than the SARIMA models. Key words: Nonparametric functional data analysis, SARIMA, time series forecast, air temperature, wind speed
Computational Neuropsychology and Bayesian Inference.
Parr, Thomas; Rees, Geraint; Friston, Karl J
2018-01-01
Computational theories of brain function have become very influential in neuroscience. They have facilitated the growth of formal approaches to disease, particularly in psychiatric research. In this paper, we provide a narrative review of the body of computational research addressing neuropsychological syndromes, and focus on those that employ Bayesian frameworks. Bayesian approaches to understanding brain function formulate perception and action as inferential processes. These inferences combine 'prior' beliefs with a generative (predictive) model to explain the causes of sensations. Under this view, neuropsychological deficits can be thought of as false inferences that arise due to aberrant prior beliefs (that are poor fits to the real world). This draws upon the notion of a Bayes optimal pathology - optimal inference with suboptimal priors - and provides a means for computational phenotyping. In principle, any given neuropsychological disorder could be characterized by the set of prior beliefs that would make a patient's behavior appear Bayes optimal. We start with an overview of some key theoretical constructs and use these to motivate a form of computational neuropsychology that relates anatomical structures in the brain to the computations they perform. Throughout, we draw upon computational accounts of neuropsychological syndromes. These are selected to emphasize the key features of a Bayesian approach, and the possible types of pathological prior that may be present. They range from visual neglect through hallucinations to autism. Through these illustrative examples, we review the use of Bayesian approaches to understand the link between biology and computation that is at the heart of neuropsychology.
Computational Neuropsychology and Bayesian Inference
Parr, Thomas; Rees, Geraint; Friston, Karl J.
2018-01-01
Computational theories of brain function have become very influential in neuroscience. They have facilitated the growth of formal approaches to disease, particularly in psychiatric research. In this paper, we provide a narrative review of the body of computational research addressing neuropsychological syndromes, and focus on those that employ Bayesian frameworks. Bayesian approaches to understanding brain function formulate perception and action as inferential processes. These inferences combine ‘prior’ beliefs with a generative (predictive) model to explain the causes of sensations. Under this view, neuropsychological deficits can be thought of as false inferences that arise due to aberrant prior beliefs (that are poor fits to the real world). This draws upon the notion of a Bayes optimal pathology – optimal inference with suboptimal priors – and provides a means for computational phenotyping. In principle, any given neuropsychological disorder could be characterized by the set of prior beliefs that would make a patient’s behavior appear Bayes optimal. We start with an overview of some key theoretical constructs and use these to motivate a form of computational neuropsychology that relates anatomical structures in the brain to the computations they perform. Throughout, we draw upon computational accounts of neuropsychological syndromes. These are selected to emphasize the key features of a Bayesian approach, and the possible types of pathological prior that may be present. They range from visual neglect through hallucinations to autism. Through these illustrative examples, we review the use of Bayesian approaches to understand the link between biology and computation that is at the heart of neuropsychology. PMID:29527157
A Nonparametric Approach to Segmentation of Ladar Images
2012-12-01
82 5.3.4 Simulation ...91 6.2.1 Multi-Facet Board Simulation ...Pulse spreading effects are ignored. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 xiii Figure Page 5.4. Signal pulse energy with
Treatment of Selective Mutism: A Best-Evidence Synthesis.
ERIC Educational Resources Information Center
Stone, Beth Pionek; Kratochwill, Thomas R.; Sladezcek, Ingrid; Serlin, Ronald C.
2002-01-01
Presents systematic analysis of the major treatment approaches used for selective mutism. Based on nonparametric statistical tests of effect sizes, major findings include the following: treatment of selective mutism is more effective than no treatment; behaviorally oriented treatment approaches are more effective than no treatment; and no…
A Bayesian approach for parameter estimation and prediction using a computationally intensive model
Higdon, Dave; McDonnell, Jordan D.; Schunck, Nicolas; ...
2015-02-05
Bayesian methods have been successful in quantifying uncertainty in physics-based problems in parameter estimation and prediction. In these cases, physical measurements y are modeled as the best fit of a physics-based modelmore » $$\\eta (\\theta )$$, where θ denotes the uncertain, best input setting. Hence the statistical model is of the form $$y=\\eta (\\theta )+\\epsilon ,$$ where $$\\epsilon $$ accounts for measurement, and possibly other, error sources. When nonlinearity is present in $$\\eta (\\cdot )$$, the resulting posterior distribution for the unknown parameters in the Bayesian formulation is typically complex and nonstandard, requiring computationally demanding computational approaches such as Markov chain Monte Carlo (MCMC) to produce multivariate draws from the posterior. Although generally applicable, MCMC requires thousands (or even millions) of evaluations of the physics model $$\\eta (\\cdot )$$. This requirement is problematic if the model takes hours or days to evaluate. To overcome this computational bottleneck, we present an approach adapted from Bayesian model calibration. This approach combines output from an ensemble of computational model runs with physical measurements, within a statistical formulation, to carry out inference. A key component of this approach is a statistical response surface, or emulator, estimated from the ensemble of model runs. We demonstrate this approach with a case study in estimating parameters for a density functional theory model, using experimental mass/binding energy measurements from a collection of atomic nuclei. Lastly, we also demonstrate how this approach produces uncertainties in predictions for recent mass measurements obtained at Argonne National Laboratory.« less
Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger
2017-01-01
Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.
Development of dynamic Bayesian models for web application test management
NASA Astrophysics Data System (ADS)
Azarnova, T. V.; Polukhin, P. V.; Bondarenko, Yu V.; Kashirina, I. L.
2018-03-01
The mathematical apparatus of dynamic Bayesian networks is an effective and technically proven tool that can be used to model complex stochastic dynamic processes. According to the results of the research, mathematical models and methods of dynamic Bayesian networks provide a high coverage of stochastic tasks associated with error testing in multiuser software products operated in a dynamically changing environment. Formalized representation of the discrete test process as a dynamic Bayesian model allows us to organize the logical connection between individual test assets for multiple time slices. This approach gives an opportunity to present testing as a discrete process with set structural components responsible for the generation of test assets. Dynamic Bayesian network-based models allow us to combine in one management area individual units and testing components with different functionalities and a direct influence on each other in the process of comprehensive testing of various groups of computer bugs. The application of the proposed models provides an opportunity to use a consistent approach to formalize test principles and procedures, methods used to treat situational error signs, and methods used to produce analytical conclusions based on test results.
Vogt, Martin; Bajorath, Jürgen
2008-01-01
Bayesian classifiers are increasingly being used to distinguish active from inactive compounds and search large databases for novel active molecules. We introduce an approach to directly combine the contributions of property descriptors and molecular fingerprints in the search for active compounds that is based on a Bayesian framework. Conventionally, property descriptors and fingerprints are used as alternative features for virtual screening methods. Following the approach introduced here, probability distributions of descriptor values and fingerprint bit settings are calculated for active and database molecules and the divergence between the resulting combined distributions is determined as a measure of biological activity. In test calculations on a large number of compound activity classes, this methodology was found to consistently perform better than similarity searching using fingerprints and multiple reference compounds or Bayesian screening calculations using probability distributions calculated only from property descriptors. These findings demonstrate that there is considerable synergy between different types of property descriptors and fingerprints in recognizing diverse structure-activity relationships, at least in the context of Bayesian modeling.
A Bayesian pick-the-winner design in a randomized phase II clinical trial.
Chen, Dung-Tsa; Huang, Po-Yu; Lin, Hui-Yi; Chiappori, Alberto A; Gabrilovich, Dmitry I; Haura, Eric B; Antonia, Scott J; Gray, Jhanelle E
2017-10-24
Many phase II clinical trials evaluate unique experimental drugs/combinations through multi-arm design to expedite the screening process (early termination of ineffective drugs) and to identify the most effective drug (pick the winner) to warrant a phase III trial. Various statistical approaches have been developed for the pick-the-winner design but have been criticized for lack of objective comparison among the drug agents. We developed a Bayesian pick-the-winner design by integrating a Bayesian posterior probability with Simon two-stage design in a randomized two-arm clinical trial. The Bayesian posterior probability, as the rule to pick the winner, is defined as probability of the response rate in one arm higher than in the other arm. The posterior probability aims to determine the winner when both arms pass the second stage of the Simon two-stage design. When both arms are competitive (i.e., both passing the second stage), the Bayesian posterior probability performs better to correctly identify the winner compared with the Fisher exact test in the simulation study. In comparison to a standard two-arm randomized design, the Bayesian pick-the-winner design has a higher power to determine a clear winner. In application to two studies, the approach is able to perform statistical comparison of two treatment arms and provides a winner probability (Bayesian posterior probability) to statistically justify the winning arm. We developed an integrated design that utilizes Bayesian posterior probability, Simon two-stage design, and randomization into a unique setting. It gives objective comparisons between the arms to determine the winner.
Unification of field theory and maximum entropy methods for learning probability densities
NASA Astrophysics Data System (ADS)
Kinney, Justin B.
2015-09-01
The need to estimate smooth probability distributions (a.k.a. probability densities) from finite sampled data is ubiquitous in science. Many approaches to this problem have been described, but none is yet regarded as providing a definitive solution. Maximum entropy estimation and Bayesian field theory are two such approaches. Both have origins in statistical physics, but the relationship between them has remained unclear. Here I unify these two methods by showing that every maximum entropy density estimate can be recovered in the infinite smoothness limit of an appropriate Bayesian field theory. I also show that Bayesian field theory estimation can be performed without imposing any boundary conditions on candidate densities, and that the infinite smoothness limit of these theories recovers the most common types of maximum entropy estimates. Bayesian field theory thus provides a natural test of the maximum entropy null hypothesis and, furthermore, returns an alternative (lower entropy) density estimate when the maximum entropy hypothesis is falsified. The computations necessary for this approach can be performed rapidly for one-dimensional data, and software for doing this is provided.
Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.
2015-01-01
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways, multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The later three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances. PMID:26689369
Unification of field theory and maximum entropy methods for learning probability densities.
Kinney, Justin B
2015-09-01
The need to estimate smooth probability distributions (a.k.a. probability densities) from finite sampled data is ubiquitous in science. Many approaches to this problem have been described, but none is yet regarded as providing a definitive solution. Maximum entropy estimation and Bayesian field theory are two such approaches. Both have origins in statistical physics, but the relationship between them has remained unclear. Here I unify these two methods by showing that every maximum entropy density estimate can be recovered in the infinite smoothness limit of an appropriate Bayesian field theory. I also show that Bayesian field theory estimation can be performed without imposing any boundary conditions on candidate densities, and that the infinite smoothness limit of these theories recovers the most common types of maximum entropy estimates. Bayesian field theory thus provides a natural test of the maximum entropy null hypothesis and, furthermore, returns an alternative (lower entropy) density estimate when the maximum entropy hypothesis is falsified. The computations necessary for this approach can be performed rapidly for one-dimensional data, and software for doing this is provided.
Tian, Ting; McLachlan, Geoffrey J; Dieters, Mark J; Basford, Kaye E
2015-01-01
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways, multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The later three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances.
Non-Bayesian Optical Inference Machines
NASA Astrophysics Data System (ADS)
Kadar, Ivan; Eichmann, George
1987-01-01
In a recent paper, Eichmann and Caulfield) presented a preliminary exposition of optical learning machines suited for use in expert systems. In this paper, we extend the previous ideas by introducing learning as a means of reinforcement by information gathering and reasoning with uncertainty in a non-Bayesian framework2. More specifically, the non-Bayesian approach allows the representation of total ignorance (not knowing) as opposed to assuming equally likely prior distributions.
Comparison of methods for estimating the attributable risk in the context of survival analysis.
Gassama, Malamine; Bénichou, Jacques; Dartois, Laureen; Thiébaut, Anne C M
2017-01-23
The attributable risk (AR) measures the proportion of disease cases that can be attributed to an exposure in the population. Several definitions and estimation methods have been proposed for survival data. Using simulations, we compared four methods for estimating AR defined in terms of survival functions: two nonparametric methods based on Kaplan-Meier's estimator, one semiparametric based on Cox's model, and one parametric based on the piecewise constant hazards model, as well as one simpler method based on estimated exposure prevalence at baseline and Cox's model hazard ratio. We considered a fixed binary exposure with varying exposure probabilities and strengths of association, and generated event times from a proportional hazards model with constant or monotonic (decreasing or increasing) Weibull baseline hazard, as well as from a nonproportional hazards model. We simulated 1,000 independent samples of size 1,000 or 10,000. The methods were compared in terms of mean bias, mean estimated standard error, empirical standard deviation and 95% confidence interval coverage probability at four equally spaced time points. Under proportional hazards, all five methods yielded unbiased results regardless of sample size. Nonparametric methods displayed greater variability than other approaches. All methods showed satisfactory coverage except for nonparametric methods at the end of follow-up for a sample size of 1,000 especially. With nonproportional hazards, nonparametric methods yielded similar results to those under proportional hazards, whereas semiparametric and parametric approaches that both relied on the proportional hazards assumption performed poorly. These methods were applied to estimate the AR of breast cancer due to menopausal hormone therapy in 38,359 women of the E3N cohort. In practice, our study suggests to use the semiparametric or parametric approaches to estimate AR as a function of time in cohort studies if the proportional hazards assumption appears appropriate.
ERIC Educational Resources Information Center
Pek, Jolynn; Losardo, Diane; Bauer, Daniel J.
2011-01-01
Compared to parametric models, nonparametric and semiparametric approaches to modeling nonlinearity between latent variables have the advantage of recovering global relationships of unknown functional form. Bauer (2005) proposed an indirect application of finite mixtures of structural equation models where latent components are estimated in the…
Chen, Jinsong; Zhang, Dake; Choi, Jaehwa
2015-12-01
It is common to encounter latent variables with ordinal data in social or behavioral research. Although a mediated effect of latent variables (latent mediated effect, or LME) with ordinal data may appear to be a straightforward combination of LME with continuous data and latent variables with ordinal data, the methodological challenges to combine the two are not trivial. This research covers model structures as complex as LME and formulates both point and interval estimates of LME for ordinal data using the Bayesian full-information approach. We also combine weighted least squares (WLS) estimation with the bias-corrected bootstrapping (BCB; Efron Journal of the American Statistical Association, 82, 171-185, 1987) method or the traditional delta method as the limited-information approach. We evaluated the viability of these different approaches across various conditions through simulation studies, and provide an empirical example to illustrate the approaches. We found that the Bayesian approach with reasonably informative priors is preferred when both point and interval estimates are of interest and the sample size is 200 or above.
Bockman, Alexander; Fackler, Cameron; Xiang, Ning
2015-04-01
Acoustic performance for an interior requires an accurate description of the boundary materials' surface acoustic impedance. Analytical methods may be applied to a small class of test geometries, but inverse numerical methods provide greater flexibility. The parameter estimation problem requires minimizing prediction vice observed acoustic field pressure. The Bayesian-network sampling approach presented here mitigates other methods' susceptibility to noise inherent to the experiment, model, and numerics. A geometry agnostic method is developed here and its parameter estimation performance is demonstrated for an air-backed micro-perforated panel in an impedance tube. Good agreement is found with predictions from the ISO standard two-microphone, impedance-tube method, and a theoretical model for the material. Data by-products exclusive to a Bayesian approach are analyzed to assess sensitivity of the method to nuisance parameters.
Bayesian Retrieval of Complete Posterior PDFs of Oceanic Rain Rate From Microwave Observations
NASA Technical Reports Server (NTRS)
Chiu, J. Christine; Petty, Grant W.
2005-01-01
This paper presents a new Bayesian algorithm for retrieving surface rain rate from Tropical Rainfall Measurements Mission (TRMM) Microwave Imager (TMI) over the ocean, along with validations against estimates from the TRMM Precipitation Radar (PR). The Bayesian approach offers a rigorous basis for optimally combining multichannel observations with prior knowledge. While other rain rate algorithms have been published that are based at least partly on Bayesian reasoning, this is believed to be the first self-contained algorithm that fully exploits Bayes Theorem to yield not just a single rain rate, but rather a continuous posterior probability distribution of rain rate. To advance our understanding of theoretical benefits of the Bayesian approach, we have conducted sensitivity analyses based on two synthetic datasets for which the true conditional and prior distribution are known. Results demonstrate that even when the prior and conditional likelihoods are specified perfectly, biased retrievals may occur at high rain rates. This bias is not the result of a defect of the Bayesian formalism but rather represents the expected outcome when the physical constraint imposed by the radiometric observations is weak, due to saturation effects. It is also suggested that the choice of the estimators and the prior information are both crucial to the retrieval. In addition, the performance of our Bayesian algorithm is found to be comparable to that of other benchmark algorithms in real-world applications, while having the additional advantage of providing a complete continuous posterior probability distribution of surface rain rate.
Applying Bayesian statistics to the study of psychological trauma: A suggestion for future research.
Yalch, Matthew M
2016-03-01
Several contemporary researchers have noted the virtues of Bayesian methods of data analysis. Although debates continue about whether conventional or Bayesian statistics is the "better" approach for researchers in general, there are reasons why Bayesian methods may be well suited to the study of psychological trauma in particular. This article describes how Bayesian statistics offers practical solutions to the problems of data non-normality, small sample size, and missing data common in research on psychological trauma. After a discussion of these problems and the effects they have on trauma research, this article explains the basic philosophical and statistical foundations of Bayesian statistics and how it provides solutions to these problems using an applied example. Results of the literature review and the accompanying example indicates the utility of Bayesian statistics in addressing problems common in trauma research. Bayesian statistics provides a set of methodological tools and a broader philosophical framework that is useful for trauma researchers. Methodological resources are also provided so that interested readers can learn more. (c) 2016 APA, all rights reserved).
ERIC Educational Resources Information Center
Chung, Hwan; Anthony, James C.
2013-01-01
This article presents a multiple-group latent class-profile analysis (LCPA) by taking a Bayesian approach in which a Markov chain Monte Carlo simulation is employed to achieve more robust estimates for latent growth patterns. This article describes and addresses a label-switching problem that involves the LCPA likelihood function, which has…
A study of finite mixture model: Bayesian approach on financial time series data
NASA Astrophysics Data System (ADS)
Phoong, Seuk-Yen; Ismail, Mohd Tahir
2014-07-01
Recently, statistician have emphasized on the fitting finite mixture model by using Bayesian method. Finite mixture model is a mixture of distributions in modeling a statistical distribution meanwhile Bayesian method is a statistical method that use to fit the mixture model. Bayesian method is being used widely because it has asymptotic properties which provide remarkable result. In addition, Bayesian method also shows consistency characteristic which means the parameter estimates are close to the predictive distributions. In the present paper, the number of components for mixture model is studied by using Bayesian Information Criterion. Identify the number of component is important because it may lead to an invalid result. Later, the Bayesian method is utilized to fit the k-component mixture model in order to explore the relationship between rubber price and stock market price for Malaysia, Thailand, Philippines and Indonesia. Lastly, the results showed that there is a negative effect among rubber price and stock market price for all selected countries.
NASA Astrophysics Data System (ADS)
Wang, Q. J.; Robertson, D. E.; Chiew, F. H. S.
2009-05-01
Seasonal forecasting of streamflows can be highly valuable for water resources management. In this paper, a Bayesian joint probability (BJP) modeling approach for seasonal forecasting of streamflows at multiple sites is presented. A Box-Cox transformed multivariate normal distribution is proposed to model the joint distribution of future streamflows and their predictors such as antecedent streamflows and El Niño-Southern Oscillation indices and other climate indicators. Bayesian inference of model parameters and uncertainties is implemented using Markov chain Monte Carlo sampling, leading to joint probabilistic forecasts of streamflows at multiple sites. The model provides a parametric structure for quantifying relationships between variables, including intersite correlations. The Box-Cox transformed multivariate normal distribution has considerable flexibility for modeling a wide range of predictors and predictands. The Bayesian inference formulated allows the use of data that contain nonconcurrent and missing records. The model flexibility and data-handling ability means that the BJP modeling approach is potentially of wide practical application. The paper also presents a number of statistical measures and graphical methods for verification of probabilistic forecasts of continuous variables. Results for streamflows at three river gauges in the Murrumbidgee River catchment in southeast Australia show that the BJP modeling approach has good forecast quality and that the fitted model is consistent with observed data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jiangjiang; Li, Weixuan; Zeng, Lingzao
Surrogate models are commonly used in Bayesian approaches such as Markov Chain Monte Carlo (MCMC) to avoid repetitive CPU-demanding model evaluations. However, the approximation error of a surrogate may lead to biased estimations of the posterior distribution. This bias can be corrected by constructing a very accurate surrogate or implementing MCMC in a two-stage manner. Since the two-stage MCMC requires extra original model evaluations, the computational cost is still high. If the information of measurement is incorporated, a locally accurate approximation of the original model can be adaptively constructed with low computational cost. Based on this idea, we propose amore » Gaussian process (GP) surrogate-based Bayesian experimental design and parameter estimation approach for groundwater contaminant source identification problems. A major advantage of the GP surrogate is that it provides a convenient estimation of the approximation error, which can be incorporated in the Bayesian formula to avoid over-confident estimation of the posterior distribution. The proposed approach is tested with a numerical case study. Without sacrificing the estimation accuracy, the new approach achieves about 200 times of speed-up compared to our previous work using two-stage MCMC.« less
Bayesian Estimation of Small Effects in Exercise and Sports Science.
Mengersen, Kerrie L; Drovandi, Christopher C; Robert, Christian P; Pyne, David B; Gore, Christopher J
2016-01-01
The aim of this paper is to provide a Bayesian formulation of the so-called magnitude-based inference approach to quantifying and interpreting effects, and in a case study example provide accurate probabilistic statements that correspond to the intended magnitude-based inferences. The model is described in the context of a published small-scale athlete study which employed a magnitude-based inference approach to compare the effect of two altitude training regimens (live high-train low (LHTL), and intermittent hypoxic exposure (IHE)) on running performance and blood measurements of elite triathletes. The posterior distributions, and corresponding point and interval estimates, for the parameters and associated effects and comparisons of interest, were estimated using Markov chain Monte Carlo simulations. The Bayesian analysis was shown to provide more direct probabilistic comparisons of treatments and able to identify small effects of interest. The approach avoided asymptotic assumptions and overcame issues such as multiple testing. Bayesian analysis of unscaled effects showed a probability of 0.96 that LHTL yields a substantially greater increase in hemoglobin mass than IHE, a 0.93 probability of a substantially greater improvement in running economy and a greater than 0.96 probability that both IHE and LHTL yield a substantially greater improvement in maximum blood lactate concentration compared to a Placebo. The conclusions are consistent with those obtained using a 'magnitude-based inference' approach that has been promoted in the field. The paper demonstrates that a fully Bayesian analysis is a simple and effective way of analysing small effects, providing a rich set of results that are straightforward to interpret in terms of probabilistic statements.
Approximating prediction uncertainty for random forest regression models
John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne
2016-01-01
Machine learning approaches such as random forest have increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches there is no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...
NASA Astrophysics Data System (ADS)
Dimas, George; Iakovidis, Dimitris K.; Karargyris, Alexandros; Ciuti, Gastone; Koulaouzidis, Anastasios
2017-09-01
Wireless capsule endoscopy is a non-invasive screening procedure of the gastrointestinal (GI) tract performed with an ingestible capsule endoscope (CE) of the size of a large vitamin pill. Such endoscopes are equipped with a usually low-frame-rate color camera which enables the visualization of the GI lumen and the detection of pathologies. The localization of the commercially available CEs is performed in the 3D abdominal space using radio-frequency (RF) triangulation from external sensor arrays, in combination with transit time estimation. State-of-the-art approaches, such as magnetic localization, which have been experimentally proved more accurate than the RF approach, are still at an early stage. Recently, we have demonstrated that CE localization is feasible using solely visual cues and geometric models. However, such approaches depend on camera parameters, many of which are unknown. In this paper the authors propose a novel non-parametric visual odometry (VO) approach to CE localization based on a feed-forward neural network architecture. The effectiveness of this approach in comparison to state-of-the-art geometric VO approaches is validated using a robotic-assisted in vitro experimental setup.
NASA Astrophysics Data System (ADS)
Hasyim, M.; Prastyo, D. D.
2018-03-01
Survival analysis performs relationship between independent variables and survival time as dependent variable. In fact, not all survival data can be recorded completely by any reasons. In such situation, the data is called censored data. Moreover, several model for survival analysis requires assumptions. One of the approaches in survival analysis is nonparametric that gives more relax assumption. In this research, the nonparametric approach that is employed is Multivariate Regression Adaptive Spline (MARS). This study is aimed to measure the performance of private university’s lecturer. The survival time in this study is duration needed by lecturer to obtain their professional certificate. The results show that research activities is a significant factor along with developing courses material, good publication in international or national journal, and activities in research collaboration.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cavanaugh, J.E.; McQuarrie, A.D.; Shumway, R.H.
Conventional methods for discriminating between earthquakes and explosions at regional distances have concentrated on extracting specific features such as amplitude and spectral ratios from the waveforms of the P and S phases. We consider here an optimum nonparametric classification procedure derived from the classical approach to discriminating between two Gaussian processes with unequal spectra. Two robust variations based on the minimum discrimination information statistic and Renyi's entropy are also considered. We compare the optimum classification procedure with various amplitude and spectral ratio discriminants and show that its performance is superior when applied to a small population of 8 land-based earthquakesmore » and 8 mining explosions recorded in Scandinavia. Several parametric characterizations of the notion of complexity based on modeling earthquakes and explosions as autoregressive or modulated autoregressive processes are also proposed and their performance compared with the nonparametric and feature extraction approaches.« less
NASA Astrophysics Data System (ADS)
McCallum, James L.; Engdahl, Nicholas B.; Ginn, Timothy R.; Cook, Peter. G.
2014-03-01
Residence time distributions (RTDs) have been used extensively for quantifying flow and transport in subsurface hydrology. In geochemical approaches, environmental tracer concentrations are used in conjunction with simple lumped parameter models (LPMs). Conversely, numerical simulation techniques require large amounts of parameterization and estimated RTDs are certainly limited by associated uncertainties. In this study, we apply a nonparametric deconvolution approach to estimate RTDs using environmental tracer concentrations. The model is based only on the assumption that flow is steady enough that the observed concentrations are well approximated by linear superposition of the input concentrations with the RTD; that is, the convolution integral holds. Even with large amounts of environmental tracer concentration data, the entire shape of an RTD remains highly nonunique. However, accurate estimates of mean ages and in some cases prediction of young portions of the RTD may be possible. The most useful type of data was found to be the use of a time series of tritium. This was due to the sharp variations in atmospheric concentrations and a short half-life. Conversely, the use of CFC compounds with smoothly varying atmospheric concentrations was more prone to nonuniqueness. This work highlights the benefits and limitations of using environmental tracer data to estimate whole RTDs with either LPMs or through numerical simulation. However, the ability of the nonparametric approach developed here to correct for mixing biases in mean ages appears promising.
A default Bayesian hypothesis test for mediation.
Nuijten, Michèle B; Wetzels, Ruud; Matzke, Dora; Dolan, Conor V; Wagenmakers, Eric-Jan
2015-03-01
In order to quantify the relationship between multiple variables, researchers often carry out a mediation analysis. In such an analysis, a mediator (e.g., knowledge of a healthy diet) transmits the effect from an independent variable (e.g., classroom instruction on a healthy diet) to a dependent variable (e.g., consumption of fruits and vegetables). Almost all mediation analyses in psychology use frequentist estimation and hypothesis-testing techniques. A recent exception is Yuan and MacKinnon (Psychological Methods, 14, 301-322, 2009), who outlined a Bayesian parameter estimation procedure for mediation analysis. Here we complete the Bayesian alternative to frequentist mediation analysis by specifying a default Bayesian hypothesis test based on the Jeffreys-Zellner-Siow approach. We further extend this default Bayesian test by allowing a comparison to directional or one-sided alternatives, using Markov chain Monte Carlo techniques implemented in JAGS. All Bayesian tests are implemented in the R package BayesMed (Nuijten, Wetzels, Matzke, Dolan, & Wagenmakers, 2014).
NASA Astrophysics Data System (ADS)
Mustac, M.; Kim, S.; Tkalcic, H.; Rhie, J.; Chen, Y.; Ford, S. R.; Sebastian, N.
2015-12-01
Conventional approaches to inverse problems suffer from non-linearity and non-uniqueness in estimations of seismic structures and source properties. Estimated results and associated uncertainties are often biased by applied regularizations and additional constraints, which are commonly introduced to solve such problems. Bayesian methods, however, provide statistically meaningful estimations of models and their uncertainties constrained by data information. In addition, hierarchical and trans-dimensional (trans-D) techniques are inherently implemented in the Bayesian framework to account for involved error statistics and model parameterizations, and, in turn, allow more rigorous estimations of the same. Here, we apply Bayesian methods throughout the entire inference process to estimate seismic structures and source properties in Northeast Asia including east China, the Korean peninsula, and the Japanese islands. Ambient noise analysis is first performed to obtain a base three-dimensional (3-D) heterogeneity model using continuous broadband waveforms from more than 300 stations. As for the tomography of surface wave group and phase velocities in the 5-70 s band, we adopt a hierarchical and trans-D Bayesian inversion method using Voronoi partition. The 3-D heterogeneity model is further improved by joint inversions of teleseismic receiver functions and dispersion data using a newly developed high-efficiency Bayesian technique. The obtained model is subsequently used to prepare 3-D structural Green's functions for the source characterization. A hierarchical Bayesian method for point source inversion using regional complete waveform data is applied to selected events from the region. The seismic structure and source characteristics with rigorously estimated uncertainties from the novel Bayesian methods provide enhanced monitoring and discrimination of seismic events in northeast Asia.
Comparison of four approaches to a rock facies classification problem
Dubois, M.K.; Bohling, Geoffrey C.; Chakrabarti, S.
2007-01-01
In this study, seven classifiers based on four different approaches were tested in a rock facies classification problem: classical parametric methods using Bayes' rule, and non-parametric methods using fuzzy logic, k-nearest neighbor, and feed forward-back propagating artificial neural network. Determining the most effective classifier for geologic facies prediction in wells without cores in the Panoma gas field, in Southwest Kansas, was the objective. Study data include 3600 samples with known rock facies class (from core) with each sample having either four or five measured properties (wire-line log curves), and two derived geologic properties (geologic constraining variables). The sample set was divided into two subsets, one for training and one for testing the ability of the trained classifier to correctly assign classes. Artificial neural networks clearly outperformed all other classifiers and are effective tools for this particular classification problem. Classical parametric models were inadequate due to the nature of the predictor variables (high dimensional and not linearly correlated), and feature space of the classes (overlapping). The other non-parametric methods tested, k-nearest neighbor and fuzzy logic, would need considerable improvement to match the neural network effectiveness, but further work, possibly combining certain aspects of the three non-parametric methods, may be justified. ?? 2006 Elsevier Ltd. All rights reserved.
Muñoz–Negrete, Francisco J.; Oblanca, Noelia; Rebolleda, Gema
2018-01-01
Purpose To study the structure-function relationship in glaucoma and healthy patients assessed with Spectralis OCT and Humphrey perimetry using new statistical approaches. Materials and Methods Eighty-five eyes were prospectively selected and divided into 2 groups: glaucoma (44) and healthy patients (41). Three different statistical approaches were carried out: (1) factor analysis of the threshold sensitivities (dB) (automated perimetry) and the macular thickness (μm) (Spectralis OCT), subsequently applying Pearson's correlation to the obtained regions, (2) nonparametric regression analysis relating the values in each pair of regions that showed significant correlation, and (3) nonparametric spatial regressions using three models designed for the purpose of this study. Results In the glaucoma group, a map that relates structural and functional damage was drawn. The strongest correlation with visual fields was observed in the peripheral nasal region of both superior and inferior hemigrids (r = 0.602 and r = 0.458, resp.). The estimated functions obtained with the nonparametric regressions provided the mean sensitivity that corresponds to each given macular thickness. These functions allowed for accurate characterization of the structure-function relationship. Conclusions Both maps and point-to-point functions obtained linking structure and function damage contribute to a better understanding of this relationship and may help in the future to improve glaucoma diagnosis. PMID:29850196
Krishnamurthy, Krish
2013-12-01
The intrinsic quantitative nature of NMR is increasingly exploited in areas ranging from complex mixture analysis (as in metabolomics and reaction monitoring) to quality assurance/control. Complex NMR spectra are more common than not, and therefore, extraction of quantitative information generally involves significant prior knowledge and/or operator interaction to characterize resonances of interest. Moreover, in most NMR-based metabolomic experiments, the signals from metabolites are normally present as a mixture of overlapping resonances, making quantification difficult. Time-domain Bayesian approaches have been reported to be better than conventional frequency-domain analysis at identifying subtle changes in signal amplitude. We discuss an approach that exploits Bayesian analysis to achieve a complete reduction to amplitude frequency table (CRAFT) in an automated and time-efficient fashion - thus converting the time-domain FID to a frequency-amplitude table. CRAFT uses a two-step approach to FID analysis. First, the FID is digitally filtered and downsampled to several sub FIDs, and secondly, these sub FIDs are then modeled as sums of decaying sinusoids using the Bayesian approach. CRAFT tables can be used for further data mining of quantitative information using fingerprint chemical shifts of compounds of interest and/or statistical analysis of modulation of chemical quantity in a biological study (metabolomics) or process study (reaction monitoring) or quality assurance/control. The basic principles behind this approach as well as results to evaluate the effectiveness of this approach in mixture analysis are presented. Copyright © 2013 John Wiley & Sons, Ltd.
A Bayesian network approach to the database search problem in criminal proceedings
2012-01-01
Background The ‘database search problem’, that is, the strengthening of a case - in terms of probative value - against an individual who is found as a result of a database search, has been approached during the last two decades with substantial mathematical analyses, accompanied by lively debate and centrally opposing conclusions. This represents a challenging obstacle in teaching but also hinders a balanced and coherent discussion of the topic within the wider scientific and legal community. This paper revisits and tracks the associated mathematical analyses in terms of Bayesian networks. Their derivation and discussion for capturing probabilistic arguments that explain the database search problem are outlined in detail. The resulting Bayesian networks offer a distinct view on the main debated issues, along with further clarity. Methods As a general framework for representing and analyzing formal arguments in probabilistic reasoning about uncertain target propositions (that is, whether or not a given individual is the source of a crime stain), this paper relies on graphical probability models, in particular, Bayesian networks. This graphical probability modeling approach is used to capture, within a single model, a series of key variables, such as the number of individuals in a database, the size of the population of potential crime stain sources, and the rarity of the corresponding analytical characteristics in a relevant population. Results This paper demonstrates the feasibility of deriving Bayesian network structures for analyzing, representing, and tracking the database search problem. The output of the proposed models can be shown to agree with existing but exclusively formulaic approaches. Conclusions The proposed Bayesian networks allow one to capture and analyze the currently most well-supported but reputedly counter-intuitive and difficult solution to the database search problem in a way that goes beyond the traditional, purely formulaic expressions. The method’s graphical environment, along with its computational and probabilistic architectures, represents a rich package that offers analysts and discussants with additional modes of interaction, concise representation, and coherent communication. PMID:22849390
Power calculation for comparing diagnostic accuracies in a multi-reader, multi-test design.
Kim, Eunhee; Zhang, Zheng; Wang, Youdan; Zeng, Donglin
2014-12-01
Receiver operating characteristic (ROC) analysis is widely used to evaluate the performance of diagnostic tests with continuous or ordinal responses. A popular study design for assessing the accuracy of diagnostic tests involves multiple readers interpreting multiple diagnostic test results, called the multi-reader, multi-test design. Although several different approaches to analyzing data from this design exist, few methods have discussed the sample size and power issues. In this article, we develop a power formula to compare the correlated areas under the ROC curves (AUC) in a multi-reader, multi-test design. We present a nonparametric approach to estimate and compare the correlated AUCs by extending DeLong et al.'s (1988, Biometrics 44, 837-845) approach. A power formula is derived based on the asymptotic distribution of the nonparametric AUCs. Simulation studies are conducted to demonstrate the performance of the proposed power formula and an example is provided to illustrate the proposed procedure. © 2014, The International Biometric Society.
Nakamura, Yoshihiro; Hasegawa, Osamu
2017-01-01
With the ongoing development and expansion of communication networks and sensors, massive amounts of data are continuously generated in real time from real environments. Beforehand, prediction of a distribution underlying such data is difficult; furthermore, the data include substantial amounts of noise. These factors make it difficult to estimate probability densities. To handle these issues and massive amounts of data, we propose a nonparametric density estimator that rapidly learns data online and has high robustness. Our approach is an extension of both kernel density estimation (KDE) and a self-organizing incremental neural network (SOINN); therefore, we call our approach KDESOINN. An SOINN provides a clustering method that learns about the given data as networks of prototype of data; more specifically, an SOINN can learn the distribution underlying the given data. Using this information, KDESOINN estimates the probability density function. The results of our experiments show that KDESOINN outperforms or achieves performance comparable to the current state-of-the-art approaches in terms of robustness, learning time, and accuracy.
An approach to quantifying the efficiency of a Bayesian filter
USDA-ARS?s Scientific Manuscript database
Data assimilation is defined as the Bayesian conditioning of uncertain model simulations on observations for the purpose of reducing uncertainty about model states. Practical data assimilation applications require that simplifying assumptions be made about the prior and posterior state distributions...
Bayesian networks improve causal environmental assessments for evidence-based policy
Rule-based weight of evidence approaches to ecological risk assessment may not account for uncertainties and generally lack probabilistic integration of lines of evidence. Bayesian networks allow causal inferences to be made from evidence by including causal knowledge about the p...
A Bayesian account of quantum histories
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marlow, Thomas
2006-05-15
We investigate whether quantum history theories can be consistent with Bayesian reasoning and whether such an analysis helps clarify the interpretation of such theories. First, we summarise and extend recent work categorising two different approaches to formalising multi-time measurements in quantum theory. The standard approach consists of describing an ordered series of measurements in terms of history propositions with non-additive 'probabilities.' The non-standard approach consists of defining multi-time measurements to consist of sets of exclusive and exhaustive history propositions and recovering the single-time exclusivity of results when discussing single-time history propositions. We analyse whether such history propositions can be consistentmore » with Bayes' rule. We show that certain class of histories are given a natural Bayesian interpretation, namely, the linearly positive histories originally introduced by Goldstein and Page. Thus, we argue that this gives a certain amount of interpretational clarity to the non-standard approach. We also attempt a justification of our analysis using Cox's axioms of probability theory.« less
BELM: Bayesian extreme learning machine.
Soria-Olivas, Emilio; Gómez-Sanchis, Juan; Martín, José D; Vila-Francés, Joan; Martínez, Marcelino; Magdalena, José R; Serrano, Antonio J
2011-03-01
The theory of extreme learning machine (ELM) has become very popular on the last few years. ELM is a new approach for learning the parameters of the hidden layers of a multilayer neural network (as the multilayer perceptron or the radial basis function neural network). Its main advantage is the lower computational cost, which is especially relevant when dealing with many patterns defined in a high-dimensional space. This brief proposes a bayesian approach to ELM, which presents some advantages over other approaches: it allows the introduction of a priori knowledge; obtains the confidence intervals (CIs) without the need of applying methods that are computationally intensive, e.g., bootstrap; and presents high generalization capabilities. Bayesian ELM is benchmarked against classical ELM in several artificial and real datasets that are widely used for the evaluation of machine learning algorithms. Achieved results show that the proposed approach produces a competitive accuracy with some additional advantages, namely, automatic production of CIs, reduction of probability of model overfitting, and use of a priori knowledge.
A Monte Carlo–Based Bayesian Approach for Measuring Agreement in a Qualitative Scale
Pérez Sánchez, Carlos Javier
2014-01-01
Agreement analysis has been an active research area whose techniques have been widely applied in psychology and other fields. However, statistical agreement among raters has been mainly considered from a classical statistics point of view. Bayesian methodology is a viable alternative that allows the inclusion of subjective initial information coming from expert opinions, personal judgments, or historical data. A Bayesian approach is proposed by providing a unified Monte Carlo–based framework to estimate all types of measures of agreement in a qualitative scale of response. The approach is conceptually simple and it has a low computational cost. Both informative and non-informative scenarios are considered. In case no initial information is available, the results are in line with the classical methodology, but providing more information on the measures of agreement. For the informative case, some guidelines are presented to elicitate the prior distribution. The approach has been applied to two applications related to schizophrenia diagnosis and sensory analysis. PMID:29881002
Model selection and assessment for multi-species occupancy models
Broms, Kristin M.; Hooten, Mevin B.; Fitzpatrick, Ryan M.
2016-01-01
While multi-species occupancy models (MSOMs) are emerging as a popular method for analyzing biodiversity data, formal checking and validation approaches for this class of models have lagged behind. Concurrent with the rise in application of MSOMs among ecologists, a quiet regime shift is occurring in Bayesian statistics where predictive model comparison approaches are experiencing a resurgence. Unlike single-species occupancy models that use integrated likelihoods, MSOMs are usually couched in a Bayesian framework and contain multiple levels. Standard model checking and selection methods are often unreliable in this setting and there is only limited guidance in the ecological literature for this class of models. We examined several different contemporary Bayesian hierarchical approaches for checking and validating MSOMs and applied these methods to a freshwater aquatic study system in Colorado, USA, to better understand the diversity and distributions of plains fishes. Our findings indicated distinct differences among model selection approaches, with cross-validation techniques performing the best in terms of prediction.
Dommert, M; Reginatto, M; Zboril, M; Fiedler, F; Helmbrecht, S; Enghardt, W; Lutz, B
2017-11-28
Bonner sphere measurements are typically analyzed using unfolding codes. It is well known that it is difficult to get reliable estimates of uncertainties for standard unfolding procedures. An alternative approach is to analyze the data using Bayesian parameter estimation. This method provides reliable estimates of the uncertainties of neutron spectra leading to rigorous estimates of uncertainties of the dose. We extend previous Bayesian approaches and apply the method to stray neutrons in proton therapy environments by introducing a new parameterized model which describes the main features of the expected neutron spectra. The parameterization is based on information that is available from measurements and detailed Monte Carlo simulations. The validity of this approach has been validated with results of an experiment using Bonner spheres carried out at the experimental hall of the OncoRay proton therapy facility in Dresden. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
The need for speed: escape velocity and dynamical mass measurements of the Andromeda galaxy
NASA Astrophysics Data System (ADS)
Kafle, Prajwal R.; Sharma, Sanjib; Lewis, Geraint F.; Robotham, Aaron S. G.; Driver, Simon P.
2018-04-01
Our nearest large cosmological neighbour, the Andromeda galaxy (M31), is a dynamical system, and an accurate measurement of its total mass is central to our understanding of its assembly history, the life-cycles of its satellite galaxies, and its role in shaping the Local Group environment. Here, we apply a novel approach to determine the dynamical mass of M31 using high-velocity Planetary Nebulae, establishing a hierarchical Bayesian model united with a scheme to capture potential outliers and marginalize over tracers unknown distances. With this, we derive the escape velocity run of M31 as a function of galactocentric distance, with both parametric and non-parametric approaches. We determine the escape velocity of M31 to be 470 ± 40 km s-1 at a galactocentric distance of 15 kpc, and also, derive the total potential of M31, estimating the virial mass and radius of the galaxy to be 0.8 ± 0.1 × 1012 M⊙ and 240 ± 10 kpc, respectively. Our M31 mass is on the low side of the measured range, this supports the lower expected mass of the M31-Milky Way system from the timing and momentum arguments, satisfying the H I constraint on circular velocity between 10 ≲ R/ kpc < 35, and agreeing with the stellar mass Tully-Fisher relation. To place these results in a broader context, we compare them to the key predictions of the ΛCDM cosmological paradigm, including the stellar-mass-halo-mass and the dark matter halo concentration-virial mass correlation, and finding it to be an outlier to this relation.
ERIC Educational Resources Information Center
Cattaneo, Matias D.; Titiunik, Rocío; Vazquez-Bare, Gonzalo
2017-01-01
The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. The most common inference approaches in RD designs employ "flexible" parametric and nonparametric local polynomial methods, which rely on extrapolation and large-sample approximations of conditional expectations…
A Nonparametric Approach to Estimate Classification Accuracy and Consistency
ERIC Educational Resources Information Center
Lathrop, Quinn N.; Cheng, Ying
2014-01-01
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…
Nonparametric evaluation of birth cohort trends in disease rates.
Tarone, R E; Chu, K C
2000-01-01
Although interpretation of age-period-cohort analyses is complicated by the non-identifiability of maximum likelihood estimates, changes in the slope of the birth-cohort effect curve are identifiable and have potential aetiologic significance. A nonparametric test for a change in the slope of the birth-cohort trend has been developed. The test is a generalisation of the sign test and is based on permutational distributions. A method for identifying interactions between age and calendar-period effects is also presented. The nonparametric method is shown to be powerful in detecting changes in the slope of the birth-cohort trend, although its power can be reduced considerably by calendar-period patterns of risk. The method identifies a previously unidentified decrease in the birth-cohort risk of lung-cancer mortality from 1912 to 1919, which appears to reflect a reduction in the initiation of smoking by young men at the beginning of the Great Depression (1930s). The method also detects an interaction between age and calendar period in leukemia mortality rates, reflecting the better response of children to chemotherapy. The proposed nonparametric method provides a data analytic approach, which is a useful adjunct to log-linear Poisson analysis of age-period-cohort models, either in the initial model building stage, or in the final interpretation stage.
Testing adaptive toolbox models: a Bayesian hierarchical approach.
Scheibehenne, Benjamin; Rieskamp, Jörg; Wagenmakers, Eric-Jan
2013-01-01
Many theories of human cognition postulate that people are equipped with a repertoire of strategies to solve the tasks they face. This theoretical framework of a cognitive toolbox provides a plausible account of intra- and interindividual differences in human behavior. Unfortunately, it is often unclear how to rigorously test the toolbox framework. How can a toolbox model be quantitatively specified? How can the number of toolbox strategies be limited to prevent uncontrolled strategy sprawl? How can a toolbox model be formally tested against alternative theories? The authors show how these challenges can be met by using Bayesian inference techniques. By means of parameter recovery simulations and the analysis of empirical data across a variety of domains (i.e., judgment and decision making, children's cognitive development, function learning, and perceptual categorization), the authors illustrate how Bayesian inference techniques allow toolbox models to be quantitatively specified, strategy sprawl to be contained, and toolbox models to be rigorously tested against competing theories. The authors demonstrate that their approach applies at the individual level but can also be generalized to the group level with hierarchical Bayesian procedures. The suggested Bayesian inference techniques represent a theoretical and methodological advancement for toolbox theories of cognition and behavior.
NASA Technical Reports Server (NTRS)
Mengshoel, Ole Jakob; Poll, Scott; Kurtoglu, Tolga
2009-01-01
In this paper, we investigate the use of Bayesian networks to construct large-scale diagnostic systems. In particular, we consider the development of large-scale Bayesian networks by composition. This compositional approach reflects how (often redundant) subsystems are architected to form systems such as electrical power systems. We develop high-level specifications, Bayesian networks, clique trees, and arithmetic circuits representing 24 different electrical power systems. The largest among these 24 Bayesian networks contains over 1,000 random variables. Another BN represents the real-world electrical power system ADAPT, which is representative of electrical power systems deployed in aerospace vehicles. In addition to demonstrating the scalability of the compositional approach, we briefly report on experimental results from the diagnostic competition DXC, where the ProADAPT team, using techniques discussed here, obtained the highest scores in both Tier 1 (among 9 international competitors) and Tier 2 (among 6 international competitors) of the industrial track. While we consider diagnosis of power systems specifically, we believe this work is relevant to other system health management problems, in particular in dependable systems such as aircraft and spacecraft. (See CASI ID 20100021910 for supplemental data disk.)
BUMPER: the Bayesian User-friendly Model for Palaeo-Environmental Reconstruction
NASA Astrophysics Data System (ADS)
Holden, Phil; Birks, John; Brooks, Steve; Bush, Mark; Hwang, Grace; Matthews-Bird, Frazer; Valencia, Bryan; van Woesik, Robert
2017-04-01
We describe the Bayesian User-friendly Model for Palaeo-Environmental Reconstruction (BUMPER), a Bayesian transfer function for inferring past climate and other environmental variables from microfossil assemblages. The principal motivation for a Bayesian approach is that the palaeoenvironment is treated probabilistically, and can be updated as additional data become available. Bayesian approaches therefore provide a reconstruction-specific quantification of the uncertainty in the data and in the model parameters. BUMPER is fully self-calibrating, straightforward to apply, and computationally fast, requiring 2 seconds to build a 100-taxon model from a 100-site training-set on a standard personal computer. We apply the model's probabilistic framework to generate thousands of artificial training-sets under ideal assumptions. We then use these to demonstrate both the general applicability of the model and the sensitivity of reconstructions to the characteristics of the training-set, considering assemblage richness, taxon tolerances, and the number of training sites. We demonstrate general applicability to real data, considering three different organism types (chironomids, diatoms, pollen) and different reconstructed variables. In all of these applications an identically configured model is used, the only change being the input files that provide the training-set environment and taxon-count data.
MapReduce Based Parallel Bayesian Network for Manufacturing Quality Control
NASA Astrophysics Data System (ADS)
Zheng, Mao-Kuan; Ming, Xin-Guo; Zhang, Xian-Yu; Li, Guo-Ming
2017-09-01
Increasing complexity of industrial products and manufacturing processes have challenged conventional statistics based quality management approaches in the circumstances of dynamic production. A Bayesian network and big data analytics integrated approach for manufacturing process quality analysis and control is proposed. Based on Hadoop distributed architecture and MapReduce parallel computing model, big volume and variety quality related data generated during the manufacturing process could be dealt with. Artificial intelligent algorithms, including Bayesian network learning, classification and reasoning, are embedded into the Reduce process. Relying on the ability of the Bayesian network in dealing with dynamic and uncertain problem and the parallel computing power of MapReduce, Bayesian network of impact factors on quality are built based on prior probability distribution and modified with posterior probability distribution. A case study on hull segment manufacturing precision management for ship and offshore platform building shows that computing speed accelerates almost directly proportionally to the increase of computing nodes. It is also proved that the proposed model is feasible for locating and reasoning of root causes, forecasting of manufacturing outcome, and intelligent decision for precision problem solving. The integration of bigdata analytics and BN method offers a whole new perspective in manufacturing quality control.
A Bayesian Approach for Summarizing and Modeling Time-Series Exposure Data with Left Censoring.
Houseman, E Andres; Virji, M Abbas
2017-08-01
Direct reading instruments are valuable tools for measuring exposure as they provide real-time measurements for rapid decision making. However, their use is limited to general survey applications in part due to issues related to their performance. Moreover, statistical analysis of real-time data is complicated by autocorrelation among successive measurements, non-stationary time series, and the presence of left-censoring due to limit-of-detection (LOD). A Bayesian framework is proposed that accounts for non-stationary autocorrelation and LOD issues in exposure time-series data in order to model workplace factors that affect exposure and estimate summary statistics for tasks or other covariates of interest. A spline-based approach is used to model non-stationary autocorrelation with relatively few assumptions about autocorrelation structure. Left-censoring is addressed by integrating over the left tail of the distribution. The model is fit using Markov-Chain Monte Carlo within a Bayesian paradigm. The method can flexibly account for hierarchical relationships, random effects and fixed effects of covariates. The method is implemented using the rjags package in R, and is illustrated by applying it to real-time exposure data. Estimates for task means and covariates from the Bayesian model are compared to those from conventional frequentist models including linear regression, mixed-effects, and time-series models with different autocorrelation structures. Simulations studies are also conducted to evaluate method performance. Simulation studies with percent of measurements below the LOD ranging from 0 to 50% showed lowest root mean squared errors for task means and the least biased standard deviations from the Bayesian model compared to the frequentist models across all levels of LOD. In the application, task means from the Bayesian model were similar to means from the frequentist models, while the standard deviations were different. Parameter estimates for covariates were significant in some frequentist models, but in the Bayesian model their credible intervals contained zero; such discrepancies were observed in multiple datasets. Variance components from the Bayesian model reflected substantial autocorrelation, consistent with the frequentist models, except for the auto-regressive moving average model. Plots of means from the Bayesian model showed good fit to the observed data. The proposed Bayesian model provides an approach for modeling non-stationary autocorrelation in a hierarchical modeling framework to estimate task means, standard deviations, quantiles, and parameter estimates for covariates that are less biased and have better performance characteristics than some of the contemporary methods. Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2017.
A Bayesian sequential processor approach to spectroscopic portal system decisions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sale, K; Candy, J; Breitfeller, E
The development of faster more reliable techniques to detect radioactive contraband in a portal type scenario is an extremely important problem especially in this era of constant terrorist threats. Towards this goal the development of a model-based, Bayesian sequential data processor for the detection problem is discussed. In the sequential processor each datum (detector energy deposit and pulse arrival time) is used to update the posterior probability distribution over the space of model parameters. The nature of the sequential processor approach is that a detection is produced as soon as it is statistically justified by the data rather than waitingmore » for a fixed counting interval before any analysis is performed. In this paper the Bayesian model-based approach, physics and signal processing models and decision functions are discussed along with the first results of our research.« less
Bayesian Computation for Log-Gaussian Cox Processes: A Comparative Analysis of Methods
Teng, Ming; Nathoo, Farouk S.; Johnson, Timothy D.
2017-01-01
The Log-Gaussian Cox Process is a commonly used model for the analysis of spatial point pattern data. Fitting this model is difficult because of its doubly-stochastic property, i.e., it is an hierarchical combination of a Poisson process at the first level and a Gaussian Process at the second level. Various methods have been proposed to estimate such a process, including traditional likelihood-based approaches as well as Bayesian methods. We focus here on Bayesian methods and several approaches that have been considered for model fitting within this framework, including Hamiltonian Monte Carlo, the Integrated nested Laplace approximation, and Variational Bayes. We consider these approaches and make comparisons with respect to statistical and computational efficiency. These comparisons are made through several simulation studies as well as through two applications, the first examining ecological data and the second involving neuroimaging data. PMID:29200537
Modular analysis of the probabilistic genetic interaction network.
Hou, Lin; Wang, Lin; Qian, Minping; Li, Dong; Tang, Chao; Zhu, Yunping; Deng, Minghua; Li, Fangting
2011-03-15
Epistatic Miniarray Profiles (EMAP) has enabled the mapping of large-scale genetic interaction networks; however, the quantitative information gained from EMAP cannot be fully exploited since the data are usually interpreted as a discrete network based on an arbitrary hard threshold. To address such limitations, we adopted a mixture modeling procedure to construct a probabilistic genetic interaction network and then implemented a Bayesian approach to identify densely interacting modules in the probabilistic network. Mixture modeling has been demonstrated as an effective soft-threshold technique of EMAP measures. The Bayesian approach was applied to an EMAP dataset studying the early secretory pathway in Saccharomyces cerevisiae. Twenty-seven modules were identified, and 14 of those were enriched by gold standard functional gene sets. We also conducted a detailed comparison with state-of-the-art algorithms, hierarchical cluster and Markov clustering. The experimental results show that the Bayesian approach outperforms others in efficiently recovering biologically significant modules.
Bayesian Analysis of the Power Spectrum of the Cosmic Microwave Background
NASA Technical Reports Server (NTRS)
Jewell, Jeffrey B.; Eriksen, H. K.; O'Dwyer, I. J.; Wandelt, B. D.
2005-01-01
There is a wealth of cosmological information encoded in the spatial power spectrum of temperature anisotropies of the cosmic microwave background. The sky, when viewed in the microwave, is very uniform, with a nearly perfect blackbody spectrum at 2.7 degrees. Very small amplitude brightness fluctuations (to one part in a million!!) trace small density perturbations in the early universe (roughly 300,000 years after the Big Bang), which later grow through gravitational instability to the large-scale structure seen in redshift surveys... In this talk, I will discuss a Bayesian formulation of this problem; discuss a Gibbs sampling approach to numerically sampling from the Bayesian posterior, and the application of this approach to the first-year data from the Wilkinson Microwave Anisotropy Probe. I will also comment on recent algorithmic developments for this approach to be tractable for the even more massive data set to be returned from the Planck satellite.
Lu, Tao
2016-01-01
The gene regulation network (GRN) evaluates the interactions between genes and look for models to describe the gene expression behavior. These models have many applications; for instance, by characterizing the gene expression mechanisms that cause certain disorders, it would be possible to target those genes to block the progress of the disease. Many biological processes are driven by nonlinear dynamic GRN. In this article, we propose a nonparametric differential equation (ODE) to model the nonlinear dynamic GRN. Specially, we address following questions simultaneously: (i) extract information from noisy time course gene expression data; (ii) model the nonlinear ODE through a nonparametric smoothing function; (iii) identify the important regulatory gene(s) through a group smoothly clipped absolute deviation (SCAD) approach; (iv) test the robustness of the model against possible shortening of experimental duration. We illustrate the usefulness of the model and associated statistical methods through a simulation and a real application examples.
Orhan, U.; Erdogmus, D.; Roark, B.; Oken, B.; Purwar, S.; Hild, K. E.; Fowler, A.; Fried-Oken, M.
2013-01-01
RSVP Keyboard™ is an electroencephalography (EEG) based brain computer interface (BCI) typing system, designed as an assistive technology for the communication needs of people with locked-in syndrome (LIS). It relies on rapid serial visual presentation (RSVP) and does not require precise eye gaze control. Existing BCI typing systems which uses event related potentials (ERP) in EEG suffer from low accuracy due to low signal-to-noise ratio. Henceforth, RSVP Keyboard™ utilizes a context based decision making via incorporating a language model, to improve the accuracy of letter decisions. To further improve the contributions of the language model, we propose recursive Bayesian estimation, which relies on non-committing string decisions, and conduct an offline analysis, which compares it with the existing naïve Bayesian fusion approach. The results indicate the superiority of the recursive Bayesian fusion and in the next generation of RSVP Keyboard™ we plan to incorporate this new approach. PMID:23366432
Exponential series approaches for nonparametric graphical models
NASA Astrophysics Data System (ADS)
Janofsky, Eric
Markov Random Fields (MRFs) or undirected graphical models are parsimonious representations of joint probability distributions. This thesis studies high-dimensional, continuous-valued pairwise Markov Random Fields. We are particularly interested in approximating pairwise densities whose logarithm belongs to a Sobolev space. For this problem we propose the method of exponential series which approximates the log density by a finite-dimensional exponential family with the number of sufficient statistics increasing with the sample size. We consider two approaches to estimating these models. The first is regularized maximum likelihood. This involves optimizing the sum of the log-likelihood of the data and a sparsity-inducing regularizer. We then propose a variational approximation to the likelihood based on tree-reweighted, nonparametric message passing. This approximation allows for upper bounds on risk estimates, leverages parallelization and is scalable to densities on hundreds of nodes. We show how the regularized variational MLE may be estimated using a proximal gradient algorithm. We then consider estimation using regularized score matching. This approach uses an alternative scoring rule to the log-likelihood, which obviates the need to compute the normalizing constant of the distribution. For general continuous-valued exponential families, we provide parameter and edge consistency results. As a special case we detail a new approach to sparse precision matrix estimation which has statistical performance competitive with the graphical lasso and computational performance competitive with the state-of-the-art glasso algorithm. We then describe results for model selection in the nonparametric pairwise model using exponential series. The regularized score matching problem is shown to be a convex program; we provide scalable algorithms based on consensus alternating direction method of multipliers (ADMM) and coordinate-wise descent. We use simulations to compare our method to others in the literature as well as the aforementioned TRW estimator.
Updating estimates of low streamflow statistics to account for possible trends
NASA Astrophysics Data System (ADS)
Blum, A. G.; Archfield, S. A.; Hirsch, R. M.; Vogel, R. M.; Kiang, J. E.; Dudley, R. W.
2017-12-01
Given evidence of both increasing and decreasing trends in low flows in many streams, methods are needed to update estimators of low flow statistics used in water resources management. One such metric is the 10-year annual low-flow statistic (7Q10) calculated as the annual minimum seven-day streamflow which is exceeded in nine out of ten years on average. Historical streamflow records may not be representative of current conditions at a site if environmental conditions are changing. We present a new approach to frequency estimation under nonstationary conditions that applies a stationary nonparametric quantile estimator to a subset of the annual minimum flow record. Monte Carlo simulation experiments were used to evaluate this approach across a range of trend and no trend scenarios. Relative to the standard practice of using the entire available streamflow record, use of a nonparametric quantile estimator combined with selection of the most recent 30 or 50 years for 7Q10 estimation were found to improve accuracy and reduce bias. Benefits of data subset selection approaches were greater for higher magnitude trends annual minimum flow records with lower coefficients of variation. A nonparametric trend test approach for subset selection did not significantly improve upon always selecting the last 30 years of record. At 174 stream gages in the Chesapeake Bay region, 7Q10 estimators based on the most recent 30 years of flow record were compared to estimators based on the entire period of record. Given the availability of long records of low streamflow, using only a subset of the flow record ( 30 years) can be used to update 7Q10 estimators to better reflect current streamflow conditions.
Heidema, A Geert; Boer, Jolanda M A; Nagelkerke, Nico; Mariman, Edwin C M; van der A, Daphne L; Feskens, Edith J M
2006-04-21
Genetic epidemiologists have taken the challenge to identify genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with available methods to assess their association with complex diseases. Statistical methods have been developed for analyzing the relation between large numbers of genetic and environmental predictors to disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis, neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN) and several non-parametric methods, which include the set association approach, combinatorial partitioning method (CPM), restricted partitioning method (RPM), multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. Therefore, they are less useful than the non-parametric methods to approach association studies with large numbers of predictor variables. GPNN on the other hand may be a useful approach to select and model important predictors, but its performance to select the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and random forests approach are able to handle a large number of predictors and are useful in reducing these predictors to a subset of predictors with an important contribution to disease. The combinatorial methods give more insight in combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable. As the non-parametric methods have different strengths and weaknesses we conclude that to approach genetic association studies using the case-control design, the application of a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy to find the important genes and interaction patterns involved in complex diseases.
Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data.
Tan, Qihua; Thomassen, Mads; Burton, Mark; Mose, Kristian Fredløv; Andersen, Klaus Ejner; Hjelmborg, Jacob; Kruse, Torben
2017-06-06
Modeling complex time-course patterns is a challenging issue in microarray study due to complex gene expression patterns in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric nature in the generalized correlation analysis could be an useful and efficient tool for analyzing microarray time-course data and for exploring the complex relationships in the omics data for studying their association with disease and health.
NONPARAMETRIC MANOVA APPROACHES FOR NON-NORMAL MULTIVARIATE OUTCOMES WITH MISSING VALUES
He, Fanyin; Mazumdar, Sati; Tang, Gong; Bhatia, Triptish; Anderson, Stewart J.; Dew, Mary Amanda; Krafty, Robert; Nimgaonkar, Vishwajit; Deshpande, Smita; Hall, Martica; Reynolds, Charles F.
2017-01-01
Between-group comparisons often entail many correlated response variables. The multivariate linear model, with its assumption of multivariate normality, is the accepted standard tool for these tests. When this assumption is violated, the nonparametric multivariate Kruskal-Wallis (MKW) test is frequently used. However, this test requires complete cases with no missing values in response variables. Deletion of cases with missing values likely leads to inefficient statistical inference. Here we extend the MKW test to retain information from partially-observed cases. Results of simulated studies and analysis of real data show that the proposed method provides adequate coverage and superior power to complete-case analyses. PMID:29416225
NASA Astrophysics Data System (ADS)
Pumpe, Daniel; Gabler, Michael; Steininger, Theo; Enßlin, Torsten A.
2018-02-01
Quasi-periodic oscillations (QPOs) discovered in the decaying tails of giant flares of magnetars are believed to be torsional oscillations of neutron stars. These QPOs have a high potential to constrain properties of high-density matter. In search for quasi-periodic signals, we study the light curves of the giant flares of SGR 1806-20 and SGR 1900+14, with a non-parametric Bayesian signal inference method called D3PO. The D3PO algorithm models the raw photon counts as a continuous flux and takes the Poissonian shot noise as well as all instrument effects into account. It reconstructs the logarithmic flux and its power spectrum from the data. Using this fully noise-aware method, we do not confirm previously reported frequency lines at ν ≳ 17 Hz because they fall into the noise-dominated regime. However, we find two new potential candidates for oscillations at 9.2 Hz (SGR 1806-20) and 7.7 Hz (SGR 1900+14). If these are real and the fundamental magneto-elastic oscillations of the magnetars, current theoretical models would favour relatively weak magnetic fields B̅ 6× 1013-3 × 1014 G (SGR 1806-20) and a relatively low shear velocity inside the crust compared to previous findings. Data are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/610/A61
Bayesian analyses of time-interval data for environmental radiation monitoring.
Luo, Peng; Sharp, Julia L; DeVol, Timothy A
2013-01-01
Time-interval (time difference between two consecutive pulses) analysis based on the principles of Bayesian inference was investigated for online radiation monitoring. Using experimental and simulated data, Bayesian analysis of time-interval data [Bayesian (ti)] was compared with Bayesian and a conventional frequentist analysis of counts in a fixed count time [Bayesian (cnt) and single interval test (SIT), respectively]. The performances of the three methods were compared in terms of average run length (ARL) and detection probability for several simulated detection scenarios. Experimental data were acquired with a DGF-4C system in list mode. Simulated data were obtained using Monte Carlo techniques to obtain a random sampling of the Poisson distribution. All statistical algorithms were developed using the R Project for statistical computing. Bayesian analysis of time-interval information provided a similar detection probability as Bayesian analysis of count information, but the authors were able to make a decision with fewer pulses at relatively higher radiation levels. In addition, for the cases with very short presence of the source (< count time), time-interval information is more sensitive to detect a change than count information since the source data is averaged by the background data over the entire count time. The relationships of the source time, change points, and modifications to the Bayesian approach for increasing detection probability are presented.
Evaluating Variability and Uncertainty of Geological Strength Index at a Specific Site
NASA Astrophysics Data System (ADS)
Wang, Yu; Aladejare, Adeyemi Emman
2016-09-01
Geological Strength Index (GSI) is an important parameter for estimating rock mass properties. GSI can be estimated from quantitative GSI chart, as an alternative to the direct observational method which requires vast geological experience of rock. GSI chart was developed from past observations and engineering experience, with either empiricism or some theoretical simplifications. The GSI chart thereby contains model uncertainty which arises from its development. The presence of such model uncertainty affects the GSI estimated from GSI chart at a specific site; it is, therefore, imperative to quantify and incorporate the model uncertainty during GSI estimation from the GSI chart. A major challenge for quantifying the GSI chart model uncertainty is a lack of the original datasets that have been used to develop the GSI chart, since the GSI chart was developed from past experience without referring to specific datasets. This paper intends to tackle this problem by developing a Bayesian approach for quantifying the model uncertainty in GSI chart when using it to estimate GSI at a specific site. The model uncertainty in the GSI chart and the inherent spatial variability in GSI are modeled explicitly in the Bayesian approach. The Bayesian approach generates equivalent samples of GSI from the integrated knowledge of GSI chart, prior knowledge and observation data available from site investigation. Equations are derived for the Bayesian approach, and the proposed approach is illustrated using data from a drill and blast tunnel project. The proposed approach effectively tackles the problem of how to quantify the model uncertainty that arises from using GSI chart for characterization of site-specific GSI in a transparent manner.
Bayesian approach to analyzing holograms of colloidal particles.
Dimiduk, Thomas G; Manoharan, Vinothan N
2016-10-17
We demonstrate a Bayesian approach to tracking and characterizing colloidal particles from in-line digital holograms. We model the formation of the hologram using Lorenz-Mie theory. We then use a tempered Markov-chain Monte Carlo method to sample the posterior probability distributions of the model parameters: particle position, size, and refractive index. Compared to least-squares fitting, our approach allows us to more easily incorporate prior information about the parameters and to obtain more accurate uncertainties, which are critical for both particle tracking and characterization experiments. Our approach also eliminates the need to supply accurate initial guesses for the parameters, so it requires little tuning.
Heisey, Dennis M.; Jennelle, Christopher S.; Russell, Robin E.; Walsh, Daniel P.
2014-01-01
There are numerous situations in which it is important to determine whether a particular disease of interest is present in a free-ranging wildlife population. However adequate disease surveillance can be labor-intensive and expensive and thus there is substantial motivation to conduct it as efficiently as possible. Surveillance is often based on the assumption of a simple random sample, but this can almost always be improved upon if there is auxiliary information available about disease risk factors. We present a Bayesian approach to disease surveillance when auxiliary risk information is available which will usually allow for substantial improvements over simple random sampling. Others have employed risk weights in surveillance, but this can result in overly optimistic statements regarding freedom from disease due to not accounting for the uncertainty in the auxiliary information; our approach remedies this. We compare our Bayesian approach to a published example of risk weights applied to chronic wasting disease in deer in Colorado, and we also present calculations to examine when uncertainty in the auxiliary information has a serious impact on the risk weights approach. Our approach allows “apples-to-apples” comparisons of surveillance efficiencies between units where heterogeneous samples were collected
A Bayesian Approach to Interactive Retrieval
ERIC Educational Resources Information Center
Tague, Jean M.
1973-01-01
A probabilistic model for interactive retrieval is presented. Bayesian statistical decision theory principles are applied: use of prior and sample information about the relationship of document descriptions to query relevance; maximization of expected value of a utility function, to the problem of optimally restructuring search strategies in an…
A BAYESIAN STATISTICAL APPROACH FOR THE EVALUATION OF CMAQ
Bayesian statistical methods are used to evaluate Community Multiscale Air Quality (CMAQ) model simulations of sulfate aerosol over a section of the eastern US for 4-week periods in summer and winter 2001. The observed data come from two U.S. Environmental Protection Agency data ...
Dynamic Bayesian Networks for Student Modeling
ERIC Educational Resources Information Center
Kaser, Tanja; Klingler, Severin; Schwing, Alexander G.; Gross, Markus
2017-01-01
Intelligent tutoring systems adapt the curriculum to the needs of the individual student. Therefore, an accurate representation and prediction of student knowledge is essential. Bayesian Knowledge Tracing (BKT) is a popular approach for student modeling. The structure of BKT models, however, makes it impossible to represent the hierarchy and…
A Bayesian network approach for causal inferences in pesticide risk assessment and management
Pesticide risk assessment and management must balance societal benefits and ecosystem protection, based on quantified risks and the strength of the causal linkages between uses of the pesticide and socioeconomic and ecological endpoints of concern. A Bayesian network (BN) is a gr...
Artificial Intelligence (AI) Center of Excellence at the University of Pennsylvania
1995-07-01
that controls impact forces. Robust Location Estimation for MLR and Non-MLR Distributions (Dissertation Proposal) Gerda L. Kamberova MS-CIS-92-28...Bayesian Approach To Computer Vision Problems Gerda L. Kamberova MS-CIS-92-29 GRASP LAB 310 The object of our study is the Bayesian approach in...Estimation for MLR and Non-MLR Distributions (Dissertation) Gerda L. Kamberova MS-CIS-92-93 GRASP LAB 340 We study the problem of estimating an unknown
Garrard, Lili; Price, Larry R.; Bott, Marjorie J.; Gajewski, Byron J.
2016-01-01
Item response theory (IRT) models provide an appropriate alternative to the classical ordinal confirmatory factor analysis (CFA) during the development of patient-reported outcome measures (PROMs). Current literature has identified the assessment of IRT model fit as both challenging and underdeveloped (Sinharay & Johnson, 2003; Sinharay, Johnson, & Stern, 2006). This study evaluates the performance of Ordinal Bayesian Instrument Development (OBID), a Bayesian IRT model with a probit link function approach, through applications in two breast cancer-related instrument development studies. The primary focus is to investigate an appropriate method for comparing Bayesian IRT models in PROMs development. An exact Bayesian leave-one-out cross-validation (LOO-CV) approach (Vehtari & Lampinen, 2002) is implemented to assess prior selection for the item discrimination parameter in the IRT model and subject content experts’ bias (in a statistical sense and not to be confused with psychometric bias as in differential item functioning) toward the estimation of item-to-domain correlations. Results support the utilization of content subject experts’ information in establishing evidence for construct validity when sample size is small. However, the incorporation of subject experts’ content information in the OBID approach can be sensitive to the level of expertise of the recruited experts. More stringent efforts need to be invested in the appropriate selection of subject experts to efficiently use the OBID approach and reduce potential bias during PROMs development. PMID:27667878
Garrard, Lili; Price, Larry R; Bott, Marjorie J; Gajewski, Byron J
2016-10-01
Item response theory (IRT) models provide an appropriate alternative to the classical ordinal confirmatory factor analysis (CFA) during the development of patient-reported outcome measures (PROMs). Current literature has identified the assessment of IRT model fit as both challenging and underdeveloped (Sinharay & Johnson, 2003; Sinharay, Johnson, & Stern, 2006). This study evaluates the performance of Ordinal Bayesian Instrument Development (OBID), a Bayesian IRT model with a probit link function approach, through applications in two breast cancer-related instrument development studies. The primary focus is to investigate an appropriate method for comparing Bayesian IRT models in PROMs development. An exact Bayesian leave-one-out cross-validation (LOO-CV) approach (Vehtari & Lampinen, 2002) is implemented to assess prior selection for the item discrimination parameter in the IRT model and subject content experts' bias (in a statistical sense and not to be confused with psychometric bias as in differential item functioning) toward the estimation of item-to-domain correlations. Results support the utilization of content subject experts' information in establishing evidence for construct validity when sample size is small. However, the incorporation of subject experts' content information in the OBID approach can be sensitive to the level of expertise of the recruited experts. More stringent efforts need to be invested in the appropriate selection of subject experts to efficiently use the OBID approach and reduce potential bias during PROMs development.
Shi, Yang; Chinnaiyan, Arul M; Jiang, Hui
2015-07-01
High-throughput sequencing of transcriptomes (RNA-Seq) has become a powerful tool to study gene expression. Here we present an R package, rSeqNP, which implements a non-parametric approach to test for differential expression and splicing from RNA-Seq data. rSeqNP uses permutation tests to access statistical significance and can be applied to a variety of experimental designs. By combining information across isoforms, rSeqNP is able to detect more differentially expressed or spliced genes from RNA-Seq data. The R package with its source code and documentation are freely available at http://www-personal.umich.edu/∼jianghui/rseqnp/. jianghui@umich.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Assessing T cell clonal size distribution: a non-parametric approach.
Bolkhovskaya, Olesya V; Zorin, Daniil Yu; Ivanchenko, Mikhail V
2014-01-01
Clonal structure of the human peripheral T-cell repertoire is shaped by a number of homeostatic mechanisms, including antigen presentation, cytokine and cell regulation. Its accurate tuning leads to a remarkable ability to combat pathogens in all their variety, while systemic failures may lead to severe consequences like autoimmune diseases. Here we develop and make use of a non-parametric statistical approach to assess T cell clonal size distributions from recent next generation sequencing data. For 41 healthy individuals and a patient with ankylosing spondylitis, who undergone treatment, we invariably find power law scaling over several decades and for the first time calculate quantitatively meaningful values of decay exponent. It has proved to be much the same among healthy donors, significantly different for an autoimmune patient before the therapy, and converging towards a typical value afterwards. We discuss implications of the findings for theoretical understanding and mathematical modeling of adaptive immunity.
Debt and growth: A non-parametric approach
NASA Astrophysics Data System (ADS)
Brida, Juan Gabriel; Gómez, David Matesanz; Seijas, Maria Nela
2017-11-01
In this study, we explore the dynamic relationship between public debt and economic growth by using a non-parametric approach based on data symbolization and clustering methods. The study uses annual data of general government consolidated gross debt-to-GDP ratio and gross domestic product for sixteen countries between 1977 and 2015. Using symbolic sequences, we introduce a notion of distance between the dynamical paths of different countries. Then, a Minimal Spanning Tree and a Hierarchical Tree are constructed from time series to help detecting the existence of groups of countries sharing similar economic performance. The main finding of the study appears for the period 2008-2016 when several countries surpassed the 90% debt-to-GDP threshold. During this period, three groups (clubs) of countries are obtained: high, mid and low indebted countries, suggesting that the employed debt-to-GDP threshold drives economic dynamics for the selected countries.
Stochastic Modeling of the Environmental Impacts of the Mingtang Tunneling Project
NASA Astrophysics Data System (ADS)
Li, Xiaojun; Li, Yandong; Chang, Ching-Fu; Chen, Ziyang; Tan, Benjamin Zhi Wen; Sege, Jon; Wang, Changhong; Rubin, Yoram
2017-04-01
This paper investigates the environmental impacts of a major tunneling project in China. Of particular interest is the drawdown of the water table, due to its potential impacts on ecosystem health and on agricultural activity. Due to scarcity of data, the study pursues a Bayesian stochastic approach, which is built around a numerical model. We adopted the Bayesian approach with the goal of deriving the posterior distributions of the dependent variables conditional on local data. The choice of the Bayesian approach for this study is somewhat non-trivial because of the scarcity of in-situ measurements. The thought guiding this selection is that prior distributions for the model input variables are valuable tools even if that all inputs are available, the Bayesian approach could provide a good starting point for further updates as and if additional data becomes available. To construct effective priors, a systematic approach was developed and implemented for constructing informative priors based on other, well-documented sites which bear geological and hydrological similarity to the target site (the Mingtang tunneling project). The approach is built around two classes of similarity criteria: a physically-based set of criteria and an additional set covering epistemic criteria. The prior construction strategy was implemented for the hydraulic conductivity of various types of rocks at the site (Granite and Gneiss) and for modeling the geometry and conductivity of the fault zones. Additional elements of our strategy include (1) modeling the water table through bounding surfaces representing upper and lower limits, and (2) modeling the effective conductivity as a random variable (varying between realizations, not in space). The approach was tested successfully against its ability to predict the tunnel infiltration fluxes and against observations of drying soils.
Jordan, Pascal; Shedden-Mora, Meike C; Löwe, Bernd
2017-01-01
The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. The sample included N = 3404 primary care patients (60% female; mean age, 52,2; standard deviation 19.2) The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis.
Shedden-Mora, Meike C.; Löwe, Bernd
2017-01-01
Objective The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Methods Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. Results The sample included N = 3404 primary care patients (60% female; mean age, 52,2; standard deviation 19.2) The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. Conclusion The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis. PMID:28771530
Inferring the mesoscale structure of layered, edge-valued, and time-varying networks
NASA Astrophysics Data System (ADS)
Peixoto, Tiago P.
2015-10-01
Many network systems are composed of interdependent but distinct types of interactions, which cannot be fully understood in isolation. These different types of interactions are often represented as layers, attributes on the edges, or as a time dependence of the network structure. Although they are crucial for a more comprehensive scientific understanding, these representations offer substantial challenges. Namely, it is an open problem how to precisely characterize the large or mesoscale structure of network systems in relation to these additional aspects. Furthermore, the direct incorporation of these features invariably increases the effective dimension of the network description, and hence aggravates the problem of overfitting, i.e., the use of overly complex characterizations that mistake purely random fluctuations for actual structure. In this work, we propose a robust and principled method to tackle these problems, by constructing generative models of modular network structure, incorporating layered, attributed and time-varying properties, as well as a nonparametric Bayesian methodology to infer the parameters from data and select the most appropriate model according to statistical evidence. We show that the method is capable of revealing hidden structure in layered, edge-valued, and time-varying networks, and that the most appropriate level of granularity with respect to the additional dimensions can be reliably identified. We illustrate our approach on a variety of empirical systems, including a social network of physicians, the voting correlations of deputies in the Brazilian national congress, the global airport network, and a proximity network of high-school students.
Model uncertainty estimation and risk assessment is essential to environmental management and informed decision making on pollution mitigation strategies. In this study, we apply a probabilistic methodology, which combines Bayesian Monte Carlo simulation and Maximum Likelihood e...
A Bayesian Missing Data Framework for Generalized Multiple Outcome Mixed Treatment Comparisons
ERIC Educational Resources Information Center
Hong, Hwanhee; Chu, Haitao; Zhang, Jing; Carlin, Bradley P.
2016-01-01
Bayesian statistical approaches to mixed treatment comparisons (MTCs) are becoming more popular because of their flexibility and interpretability. Many randomized clinical trials report multiple outcomes with possible inherent correlations. Moreover, MTC data are typically sparse (although richer than standard meta-analysis, comparing only two…
Monte Carlo Algorithms for a Bayesian Analysis of the Cosmic Microwave Background
NASA Technical Reports Server (NTRS)
Jewell, Jeffrey B.; Eriksen, H. K.; ODwyer, I. J.; Wandelt, B. D.; Gorski, K.; Knox, L.; Chu, M.
2006-01-01
A viewgraph presentation on the review of Bayesian approach to Cosmic Microwave Background (CMB) analysis, numerical implementation with Gibbs sampling, a summary of application to WMAP I and work in progress with generalizations to polarization, foregrounds, asymmetric beams, and 1/f noise is given.
A Bayesian Approach to Determination of F, D, and Z Values Used in Steam Sterilization Validation.
Faya, Paul; Stamey, James D; Seaman, John W
2017-01-01
For manufacturers of sterile drug products, steam sterilization is a common method used to provide assurance of the sterility of manufacturing equipment and products. The validation of sterilization processes is a regulatory requirement and relies upon the estimation of key resistance parameters of microorganisms. Traditional methods have relied upon point estimates for the resistance parameters. In this paper, we propose a Bayesian method for estimation of the well-known D T , z , and F o values that are used in the development and validation of sterilization processes. A Bayesian approach allows the uncertainty about these values to be modeled using probability distributions, thereby providing a fully risk-based approach to measures of sterility assurance. An example is given using the survivor curve and fraction negative methods for estimation of resistance parameters, and we present a means by which a probabilistic conclusion can be made regarding the ability of a process to achieve a specified sterility criterion. LAY ABSTRACT: For manufacturers of sterile drug products, steam sterilization is a common method used to provide assurance of the sterility of manufacturing equipment and products. The validation of sterilization processes is a regulatory requirement and relies upon the estimation of key resistance parameters of microorganisms. Traditional methods have relied upon point estimates for the resistance parameters. In this paper, we propose a Bayesian method for estimation of the critical process parameters that are evaluated in the development and validation of sterilization processes. A Bayesian approach allows the uncertainty about these parameters to be modeled using probability distributions, thereby providing a fully risk-based approach to measures of sterility assurance. An example is given using the survivor curve and fraction negative methods for estimation of resistance parameters, and we present a means by which a probabilistic conclusion can be made regarding the ability of a process to achieve a specified sterility criterion. © PDA, Inc. 2017.
Adkison, Milo D.; Peterman, R.M.
1996-01-01
Bayesian methods have been proposed to estimate optimal escapement goals, using both knowledge about physical determinants of salmon productivity and stock-recruitment data. The Bayesian approach has several advantages over many traditional methods for estimating stock productivity: it allows integration of information from diverse sources and provides a framework for decision-making that takes into account uncertainty reflected in the data. However, results can be critically dependent on details of implementation of this approach. For instance, unintended and unwarranted confidence about stock-recruitment relationships can arise if the range of relationships examined is too narrow, if too few discrete alternatives are considered, or if data are contradictory. This unfounded confidence can result in a suboptimal choice of a spawning escapement goal.
Software Health Management with Bayesian Networks
NASA Technical Reports Server (NTRS)
Mengshoel, Ole; Schumann, JOhann
2011-01-01
Most modern aircraft as well as other complex machinery is equipped with diagnostics systems for its major subsystems. During operation, sensors provide important information about the subsystem (e.g., the engine) and that information is used to detect and diagnose faults. Most of these systems focus on the monitoring of a mechanical, hydraulic, or electromechanical subsystem of the vehicle or machinery. Only recently, health management systems that monitor software have been developed. In this paper, we will discuss our approach of using Bayesian networks for Software Health Management (SWHM). We will discuss SWHM requirements, which make advanced reasoning capabilities for the detection and diagnosis important. Then we will present our approach to using Bayesian networks for the construction of health models that dynamically monitor a software system and is capable of detecting and diagnosing faults.
Using Bayesian networks to support decision-focused information retrieval
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lehner, P.; Elsaesser, C.; Seligman, L.
This paper has described an approach to controlling the process of pulling data/information from distributed data bases in a way that is specific to a persons specific decision making context. Our prototype implementation of this approach uses a knowledge-based planner to generate a plan, an automatically constructed Bayesian network to evaluate the plan, specialized processing of the network to derive key information items that would substantially impact the evaluation of the plan (e.g., determine that replanning is needed), automated construction of Standing Requests for Information (SRIs) which are automated functions that monitor changes and trends in distributed data base thatmore » are relevant to the key information items. This emphasis of this paper is on how Bayesian networks are used.« less
Şenel, Talat; Cengiz, Mehmet Ali
2016-01-01
In today's world, Public expenditures on health are one of the most important issues for governments. These increased expenditures are putting pressure on public budgets. Therefore, health policy makers have focused on the performance of their health systems and many countries have introduced reforms to improve the performance of their health systems. This study investigates the most important determinants of healthcare efficiency for OECD countries using second stage approach for Bayesian Stochastic Frontier Analysis (BSFA). There are two steps in this study. First we measure 29 OECD countries' healthcare efficiency by BSFA using the data from the OECD Health Database. At second stage, we expose the multiple relationships between the healthcare efficiency and characteristics of healthcare systems across OECD countries using Bayesian beta regression.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marzouk, Youssef
Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computional expense. This project intends to make rigorous predictive modeling *feasible* in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesianmore » inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale {\\em sequential} data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.« less
Analyzing degradation data with a random effects spline regression model
Fugate, Michael Lynn; Hamada, Michael Scott; Weaver, Brian Phillip
2017-03-17
This study proposes using a random effects spline regression model to analyze degradation data. Spline regression avoids having to specify a parametric function for the true degradation of an item. A distribution for the spline regression coefficients captures the variation of the true degradation curves from item to item. We illustrate the proposed methodology with a real example using a Bayesian approach. The Bayesian approach allows prediction of degradation of a population over time and estimation of reliability is easy to perform.
Analyzing degradation data with a random effects spline regression model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fugate, Michael Lynn; Hamada, Michael Scott; Weaver, Brian Phillip
This study proposes using a random effects spline regression model to analyze degradation data. Spline regression avoids having to specify a parametric function for the true degradation of an item. A distribution for the spline regression coefficients captures the variation of the true degradation curves from item to item. We illustrate the proposed methodology with a real example using a Bayesian approach. The Bayesian approach allows prediction of degradation of a population over time and estimation of reliability is easy to perform.
How Recent History Affects Perception: The Normative Approach and Its Heuristic Approximation
Raviv, Ofri; Ahissar, Merav; Loewenstein, Yonatan
2012-01-01
There is accumulating evidence that prior knowledge about expectations plays an important role in perception. The Bayesian framework is the standard computational approach to explain how prior knowledge about the distribution of expected stimuli is incorporated with noisy observations in order to improve performance. However, it is unclear what information about the prior distribution is acquired by the perceptual system over short periods of time and how this information is utilized in the process of perceptual decision making. Here we address this question using a simple two-tone discrimination task. We find that the “contraction bias”, in which small magnitudes are overestimated and large magnitudes are underestimated, dominates the pattern of responses of human participants. This contraction bias is consistent with the Bayesian hypothesis in which the true prior information is available to the decision-maker. However, a trial-by-trial analysis of the pattern of responses reveals that the contribution of most recent trials to performance is overweighted compared with the predictions of a standard Bayesian model. Moreover, we study participants' performance in a-typical distributions of stimuli and demonstrate substantial deviations from the ideal Bayesian detector, suggesting that the brain utilizes a heuristic approximation of the Bayesian inference. We propose a biologically plausible model, in which decision in the two-tone discrimination task is based on a comparison between the second tone and an exponentially-decaying average of the first tone and past tones. We show that this model accounts for both the contraction bias and the deviations from the ideal Bayesian detector hypothesis. These findings demonstrate the power of Bayesian-like heuristics in the brain, as well as their limitations in their failure to fully adapt to novel environments. PMID:23133343
Bayesian model selection: Evidence estimation based on DREAM simulation and bridge sampling
NASA Astrophysics Data System (ADS)
Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.
2017-04-01
Bayesian inference has found widespread application in Earth and Environmental Systems Modeling, providing an effective tool for prediction, data assimilation, parameter estimation, uncertainty analysis and hypothesis testing. Under multiple competing hypotheses, the Bayesian approach also provides an attractive alternative to traditional information criteria (e.g. AIC, BIC) for model selection. The key variable for Bayesian model selection is the evidence (or marginal likelihood) that is the normalizing constant in the denominator of Bayes theorem; while it is fundamental for model selection, the evidence is not required for Bayesian inference. It is computed for each hypothesis (model) by averaging the likelihood function over the prior parameter distribution, rather than maximizing it as by information criteria; the larger a model evidence the more support it receives among a collection of hypothesis as the simulated values assign relatively high probability density to the observed data. Hence, the evidence naturally acts as an Occam's razor, preferring simpler and more constrained models against the selection of over-fitted ones by information criteria that incorporate only the likelihood maximum. Since it is not particularly easy to estimate the evidence in practice, Bayesian model selection via the marginal likelihood has not yet found mainstream use. We illustrate here the properties of a new estimator of the Bayesian model evidence, which provides robust and unbiased estimates of the marginal likelihood; the method is coined Gaussian Mixture Importance Sampling (GMIS). GMIS uses multidimensional numerical integration of the posterior parameter distribution via bridge sampling (a generalization of importance sampling) of a mixture distribution fitted to samples of the posterior distribution derived from the DREAM algorithm (Vrugt et al., 2008; 2009). Some illustrative examples are presented to show the robustness and superiority of the GMIS estimator with respect to other commonly used approaches in the literature.
A Bayesian alternative for multi-objective ecohydrological model specification
NASA Astrophysics Data System (ADS)
Tang, Yating; Marshall, Lucy; Sharma, Ashish; Ajami, Hoori
2018-01-01
Recent studies have identified the importance of vegetation processes in terrestrial hydrologic systems. Process-based ecohydrological models combine hydrological, physical, biochemical and ecological processes of the catchments, and as such are generally more complex and parametric than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying the uncertainties in hydrological modeling with the development of Markov chain Monte Carlo (MCMC) techniques. The Bayesian approach offers an appealing alternative to traditional multi-objective hydrologic model calibrations by defining proper prior distributions that can be considered analogous to the ad-hoc weighting often prescribed in multi-objective calibration. Our study aims to develop appropriate prior distributions and likelihood functions that minimize the model uncertainties and bias within a Bayesian ecohydrological modeling framework based on a traditional Pareto-based model calibration technique. In our study, a Pareto-based multi-objective optimization and a formal Bayesian framework are implemented in a conceptual ecohydrological model that combines a hydrological model (HYMOD) and a modified Bucket Grassland Model (BGM). Simulations focused on one objective (streamflow/LAI) and multiple objectives (streamflow and LAI) with different emphasis defined via the prior distribution of the model error parameters. Results show more reliable outputs for both predicted streamflow and LAI using Bayesian multi-objective calibration with specified prior distributions for error parameters based on results from the Pareto front in the ecohydrological modeling. The methodology implemented here provides insight into the usefulness of multiobjective Bayesian calibration for ecohydrologic systems and the importance of appropriate prior distributions in such approaches.
Lo, Benjamin W. Y.; Macdonald, R. Loch; Baker, Andrew; Levine, Mitchell A. H.
2013-01-01
Objective. The novel clinical prediction approach of Bayesian neural networks with fuzzy logic inferences is created and applied to derive prognostic decision rules in cerebral aneurysmal subarachnoid hemorrhage (aSAH). Methods. The approach of Bayesian neural networks with fuzzy logic inferences was applied to data from five trials of Tirilazad for aneurysmal subarachnoid hemorrhage (3551 patients). Results. Bayesian meta-analyses of observational studies on aSAH prognostic factors gave generalizable posterior distributions of population mean log odd ratios (ORs). Similar trends were noted in Bayesian and linear regression ORs. Significant outcome predictors include normal motor response, cerebral infarction, history of myocardial infarction, cerebral edema, history of diabetes mellitus, fever on day 8, prior subarachnoid hemorrhage, admission angiographic vasospasm, neurological grade, intraventricular hemorrhage, ruptured aneurysm size, history of hypertension, vasospasm day, age and mean arterial pressure. Heteroscedasticity was present in the nontransformed dataset. Artificial neural networks found nonlinear relationships with 11 hidden variables in 1 layer, using the multilayer perceptron model. Fuzzy logic decision rules (centroid defuzzification technique) denoted cut-off points for poor prognosis at greater than 2.5 clusters. Discussion. This aSAH prognostic system makes use of existing knowledge, recognizes unknown areas, incorporates one's clinical reasoning, and compensates for uncertainty in prognostication. PMID:23690884
Kang, Le; Chen, Weijie; Petrick, Nicholas A.; Gallas, Brandon D.
2014-01-01
The area under the receiver operating characteristic (ROC) curve (AUC) is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of AUC, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, of which they could either be two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study. PMID:25399736
Efficient Bayesian experimental design for contaminant source identification
NASA Astrophysics Data System (ADS)
Zhang, Jiangjiang; Zeng, Lingzao; Chen, Cheng; Chen, Dingjiang; Wu, Laosheng
2015-01-01
In this study, an efficient full Bayesian approach is developed for the optimal sampling well location design and source parameters identification of groundwater contaminants. An information measure, i.e., the relative entropy, is employed to quantify the information gain from concentration measurements in identifying unknown parameters. In this approach, the sampling locations that give the maximum expected relative entropy are selected as the optimal design. After the sampling locations are determined, a Bayesian approach based on Markov Chain Monte Carlo (MCMC) is used to estimate unknown parameters. In both the design and estimation, the contaminant transport equation is required to be solved many times to evaluate the likelihood. To reduce the computational burden, an interpolation method based on the adaptive sparse grid is utilized to construct a surrogate for the contaminant transport equation. The approximated likelihood can be evaluated directly from the surrogate, which greatly accelerates the design and estimation process. The accuracy and efficiency of our approach are demonstrated through numerical case studies. It is shown that the methods can be used to assist in both single sampling location and monitoring network design for contaminant source identifications in groundwater.
NASA Astrophysics Data System (ADS)
Krishnanathan, Kirubhakaran; Anderson, Sean R.; Billings, Stephen A.; Kadirkamanathan, Visakan
2016-11-01
In this paper, we derive a system identification framework for continuous-time nonlinear systems, for the first time using a simulation-focused computational Bayesian approach. Simulation approaches to nonlinear system identification have been shown to outperform regression methods under certain conditions, such as non-persistently exciting inputs and fast-sampling. We use the approximate Bayesian computation (ABC) algorithm to perform simulation-based inference of model parameters. The framework has the following main advantages: (1) parameter distributions are intrinsically generated, giving the user a clear description of uncertainty, (2) the simulation approach avoids the difficult problem of estimating signal derivatives as is common with other continuous-time methods, and (3) as noted above, the simulation approach improves identification under conditions of non-persistently exciting inputs and fast-sampling. Term selection is performed by judging parameter significance using parameter distributions that are intrinsically generated as part of the ABC procedure. The results from a numerical example demonstrate that the method performs well in noisy scenarios, especially in comparison to competing techniques that rely on signal derivative estimation.
Textual and visual content-based anti-phishing: a Bayesian approach.
Zhang, Haijun; Liu, Gang; Chow, Tommy W S; Liu, Wenyin
2011-10-01
A novel framework using a Bayesian approach for content-based phishing web page detection is presented. Our model takes into account textual and visual contents to measure the similarity between the protected web page and suspicious web pages. A text classifier, an image classifier, and an algorithm fusing the results from classifiers are introduced. An outstanding feature of this paper is the exploration of a Bayesian model to estimate the matching threshold. This is required in the classifier for determining the class of the web page and identifying whether the web page is phishing or not. In the text classifier, the naive Bayes rule is used to calculate the probability that a web page is phishing. In the image classifier, the earth mover's distance is employed to measure the visual similarity, and our Bayesian model is designed to determine the threshold. In the data fusion algorithm, the Bayes theory is used to synthesize the classification results from textual and visual content. The effectiveness of our proposed approach was examined in a large-scale dataset collected from real phishing cases. Experimental results demonstrated that the text classifier and the image classifier we designed deliver promising results, the fusion algorithm outperforms either of the individual classifiers, and our model can be adapted to different phishing cases. © 2011 IEEE
A fast combination method in DSmT and its application to recommender system
Liu, Yihai
2018-01-01
In many applications involving epistemic uncertainties usually modeled by belief functions, it is often necessary to approximate general (non-Bayesian) basic belief assignments (BBAs) to subjective probabilities (called Bayesian BBAs). This necessity occurs if one needs to embed the fusion result in a system based on the probabilistic framework and Bayesian inference (e.g. tracking systems), or if one needs to make a decision in the decision making problems. In this paper, we present a new fast combination method, called modified rigid coarsening (MRC), to obtain the final Bayesian BBAs based on hierarchical decomposition (coarsening) of the frame of discernment. Regarding this method, focal elements with probabilities are coarsened efficiently to reduce computational complexity in the process of combination by using disagreement vector and a simple dichotomous approach. In order to prove the practicality of our approach, this new approach is applied to combine users’ soft preferences in recommender systems (RSs). Additionally, in order to make a comprehensive performance comparison, the proportional conflict redistribution rule #6 (PCR6) is regarded as a baseline in a range of experiments. According to the results of experiments, MRC is more effective in accuracy of recommendations compared to original Rigid Coarsening (RC) method and comparable in computational time. PMID:29351297
Bayesian outcome-based strategy classification.
Lee, Michael D
2016-03-01
Hilbig and Moshagen (Psychonomic Bulletin & Review, 21, 1431-1443, 2014) recently developed a method for making inferences about the decision processes people use in multi-attribute forced choice tasks. Their paper makes a number of worthwhile theoretical and methodological contributions. Theoretically, they provide an insightful psychological motivation for a probabilistic extension of the widely-used "weighted additive" (WADD) model, and show how this model, as well as other important models like "take-the-best" (TTB), can and should be expressed in terms of meaningful priors. Methodologically, they develop an inference approach based on the Minimum Description Length (MDL) principles that balances both the goodness-of-fit and complexity of the decision models they consider. This paper aims to preserve these useful contributions, but provide a complementary Bayesian approach with some theoretical and methodological advantages. We develop a simple graphical model, implemented in JAGS, that allows for fully Bayesian inferences about which models people use to make decisions. To demonstrate the Bayesian approach, we apply it to the models and data considered by Hilbig and Moshagen (Psychonomic Bulletin & Review, 21, 1431-1443, 2014), showing how a prior predictive analysis of the models, and posterior inferences about which models people use and the parameter settings at which they use them, can contribute to our understanding of human decision making.
A Bayesian hierarchical approach to comparative audit for carotid surgery.
Kuhan, G; Marshall, E C; Abidia, A F; Chetter, I C; McCollum, P T
2002-12-01
the aim of this study was to illustrate how a Bayesian hierarchical modelling approach can aid the reliable comparison of outcome rates between surgeons. retrospective analysis of prospective and retrospective data. binary outcome data (death/stroke within 30 days), together with information on 15 possible risk factors specific for CEA were available on 836 CEAs performed by four vascular surgeons from 1992-99. The median patient age was 68 (range 38-86) years and 60% were men. the model was developed using the WinBUGS software. After adjusting for patient-level risk factors, a cross-validatory approach was adopted to identify "divergent" performance. A ranking exercise was also carried out. the overall observed 30-day stroke/death rate was 3.9% (33/836). The model found diabetes, stroke and heart disease to be significant risk factors. There was no significant difference between the predicted and observed outcome rates for any surgeon (Bayesian p -value>0.05). Each surgeon had a median rank of 3 with associated 95% CI 1.0-5.0, despite the variability of observed stroke/death rate from 2.9-4.4%. After risk adjustment, there was very little residual between-surgeon variability in outcome rate. Bayesian hierarchical models can help to accurately quantify the uncertainty associated with surgeons' performance and rank.
NASA Astrophysics Data System (ADS)
Lee, K. David; Wiesenfeld, Eric; Gelfand, Andrew
2007-04-01
One of the greatest challenges in modern combat is maintaining a high level of timely Situational Awareness (SA). In many situations, computational complexity and accuracy considerations make the development and deployment of real-time, high-level inference tools very difficult. An innovative hybrid framework that combines Bayesian inference, in the form of Bayesian Networks, and Possibility Theory, in the form of Fuzzy Logic systems, has recently been introduced to provide a rigorous framework for high-level inference. In previous research, the theoretical basis and benefits of the hybrid approach have been developed. However, lacking is a concrete experimental comparison of the hybrid framework with traditional fusion methods, to demonstrate and quantify this benefit. The goal of this research, therefore, is to provide a statistical analysis on the comparison of the accuracy and performance of hybrid network theory, with pure Bayesian and Fuzzy systems and an inexact Bayesian system approximated using Particle Filtering. To accomplish this task, domain specific models will be developed under these different theoretical approaches and then evaluated, via Monte Carlo Simulation, in comparison to situational ground truth to measure accuracy and fidelity. Following this, a rigorous statistical analysis of the performance results will be performed, to quantify the benefit of hybrid inference to other fusion tools.
Bayesian approach for assessing non-inferiority in a three-arm trial with pre-specified margin.
Ghosh, Samiran; Ghosh, Santu; Tiwari, Ram C
2016-02-28
Non-inferiority trials are becoming increasingly popular for comparative effectiveness research. However, inclusion of the placebo arm, whenever possible, gives rise to a three-arm trial which has lesser burdensome assumptions than a standard two-arm non-inferiority trial. Most of the past developments in a three-arm trial consider defining a pre-specified fraction of unknown effect size of reference drug, that is, without directly specifying a fixed non-inferiority margin. However, in some recent developments, a more direct approach is being considered with pre-specified fixed margin albeit in the frequentist setup. Bayesian paradigm provides a natural path to integrate historical and current trials' information via sequential learning. In this paper, we propose a Bayesian approach for simultaneous testing of non-inferiority and assay sensitivity in a three-arm trial with normal responses. For the experimental arm, in absence of historical information, non-informative priors are assumed under two situations, namely when (i) variance is known and (ii) variance is unknown. A Bayesian decision criteria is derived and compared with the frequentist method using simulation studies. Finally, several published clinical trial examples are reanalyzed to demonstrate the benefit of the proposed procedure. Copyright © 2015 John Wiley & Sons, Ltd.