Preferential sampling and Bayesian geostatistics: Statistical modeling and examples.
Cecconi, Lorenzo; Grisotto, Laura; Catelan, Dolores; Lagazio, Corrado; Berrocal, Veronica; Biggeri, Annibale
2016-08-01
Preferential sampling refers to any situation in which the spatial process and the sampling locations are not stochastically independent. In this paper, we present two examples of geostatistical analysis in which the usual assumption of stochastic independence between the point process and the measurement process is violated. To account for preferential sampling, we specify a flexible and general Bayesian geostatistical model that includes a shared spatial random component. We apply the proposed model to two different case studies that allow us to highlight three different modeling and inferential aspects of geostatistical modeling under preferential sampling: (1) continuous or finite spatial sampling frame; (2) underlying causal model and relevant covariates; and (3) inferential goals related to mean prediction surface or prediction uncertainty.
Geostatistical models are appropriate for spatially distributed data measured at irregularly spaced locations. We propose an efficient Markov chain Monte Carlo (MCMC) algorithm for fitting Bayesian geostatistical models with substantial numbers of unknown parameters to sizable...
Bayesian Geostatistical Modeling of Malaria Indicator Survey Data in Angola
Gosoniu, Laura; Veta, Andre Mia; Vounatsou, Penelope
2010-01-01
The 2006–2007 Angola Malaria Indicator Survey (AMIS) is the first nationally representative household survey in the country assessing coverage of the key malaria control interventions and measuring malaria-related burden among children under 5 years of age. In this paper, the Angolan MIS data were analyzed to produce the first smooth map of parasitaemia prevalence based on contemporary nationwide empirical data in the country. Bayesian geostatistical models were fitted to assess the effect of interventions after adjusting for environmental, climatic and socio-economic factors. Non-linear relationships between parasitaemia risk and environmental predictors were modeled by categorizing the covariates and by employing two non-parametric approaches, the B-splines and the P-splines. The results of the model validation showed that the categorical model was able to better capture the relationship between parasitaemia prevalence and the environmental factors. Model fit and prediction were handled within a Bayesian framework using Markov chain Monte Carlo (MCMC) simulations. Combining estimates of parasitaemia prevalence with the number of children under we obtained estimates of the number of infected children in the country. The population-adjusted prevalence ranges from in Namibe province to in Malanje province. The odds of parasitaemia in children living in a household with at least ITNs per person was by 41% lower (CI: 14%, 60%) than in those with fewer ITNs. The estimates of the number of parasitaemic children produced in this paper are important for planning and implementing malaria control interventions and for monitoring the impact of prevention and control activities. PMID:20351775
Giardina, Federica; Gosoniu, Laura; Konate, Lassana; Diouf, Mame Birame; Perry, Robert; Gaye, Oumar; Faye, Ousmane; Vounatsou, Penelope
2012-01-01
The Research Center for Human Development in Dakar (CRDH) with the technical assistance of ICF Macro and the National Malaria Control Programme (NMCP) conducted in 2008/2009 the Senegal Malaria Indicator Survey (SMIS), the first nationally representative household survey collecting parasitological data and malaria-related indicators. In this paper, we present spatially explicit parasitaemia risk estimates and number of infected children below 5 years. Geostatistical Zero-Inflated Binomial models (ZIB) were developed to take into account the large number of zero-prevalence survey locations (70%) in the data. Bayesian variable selection methods were incorporated within a geostatistical framework in order to choose the best set of environmental and climatic covariates associated with the parasitaemia risk. Model validation confirmed that the ZIB model had a better predictive ability than the standard Binomial analogue. Markov chain Monte Carlo (MCMC) methods were used for inference. Several insecticide treated nets (ITN) coverage indicators were calculated to assess the effectiveness of interventions. After adjusting for climatic and socio-economic factors, the presence of at least one ITN per every two household members and living in urban areas reduced the odds of parasitaemia by 86% and 81% respectively. Posterior estimates of the ORs related to the wealth index show a decreasing trend with the quintiles. Infection odds appear to be increasing with age. The population-adjusted prevalence ranges from 0.12% in Thillé-Boubacar to 13.1% in Dabo. Tambacounda has the highest population-adjusted predicted prevalence (8.08%) whereas the region with the highest estimated number of infected children under the age of 5 years is Kolda (13940). The contemporary map and estimates of malaria burden identify the priority areas for future control interventions and provide baseline information for monitoring and evaluation. Zero-Inflated formulations are more appropriate in
Hagos, Seifu; Hailemariam, Damen; WoldeHanna, Tasew; Lindtjørn, Bernt
2017-01-01
Background: Understanding the spatial distribution of stunting and the underlying factors operating at meso-scale is of paramount importance for designing and implementing interventions. Yet little is known about the spatial distribution of stunting, and some discrepancies are documented on the relative importance of reported risk factors. Therefore, the present study explores the spatial distribution of stunting at meso- (district) scale, and evaluates the effect of spatial dependency on the identification of risk factors and their relative contribution to the occurrence of stunting and severe stunting in a rural area of Ethiopia. Methods: A community-based cross-sectional study was conducted to measure the occurrence of stunting and severe stunting among children aged 0–59 months. Additionally, we collected relevant information on anthropometric measures, dietary habits, and parent- and child-related demographic and socio-economic status. Latitude and longitude of surveyed households were also recorded. Local Anselin Moran's I was calculated to investigate the spatial variation of stunting prevalence and identify potential local pockets (hotspots) of high prevalence. Finally, we employed a Bayesian geo-statistical model, which accounted for the spatial dependency structure in the data, to identify potential risk factors for stunting in the study area. Results: Overall, the prevalence of stunting and severe stunting in the district was 43.7% [95%CI: 40.9, 46.4] and 21.3% [95%CI: 19.5, 23.3] respectively. We identified statistically significant clusters of high prevalence of stunting (hotspots) in the eastern part of the district and clusters of low prevalence (cold spots) in the western part. Including the spatial structure of the data in the Bayesian model improved the fit of the stunting model. The Bayesian geo-statistical model indicated that the risk of stunting increased as the child’s age increased (OR 4.74; 95% Bayesian credible
Kara G. Eby
2010-08-01
At the Idaho National Laboratory (INL) Cs-137 concentrations above the U.S. Environmental Protection Agency risk-based threshold of 0.23 pCi/g may increase the risk of human mortality due to cancer. As a leader in nuclear research, the INL has been conducting nuclear activities for decades. Elevated anthropogenic radionuclide levels including Cs-137 are a result of atmospheric weapons testing, the Chernobyl accident, and nuclear activities occurring at the INL site. Therefore environmental monitoring and long-term surveillance of Cs-137 is required to evaluate risk. However, due to the large land area involved, frequent and comprehensive monitoring is limited. Developing a spatial model that predicts Cs-137 concentrations at unsampled locations will enhance the spatial characterization of Cs-137 in surface soils, provide guidance for an efficient monitoring program, and pinpoint areas requiring mitigation strategies. The predictive model presented herein is based on applied geostatistics using a Bayesian analysis of environmental characteristics across the INL site, which provides kriging spatial maps of both Cs-137 estimates and prediction errors. Comparisons are presented of two different kriging methods, showing that the use of secondary information (i.e., environmental characteristics) can provide improved prediction performance in some areas of the INL site.
A Bayesian geostatistical transfer function approach to tracer test analysis
NASA Astrophysics Data System (ADS)
Fienen, Michael N.; Luo, Jian; Kitanidis, Peter K.
2006-07-01
Reactive transport modeling is often used in support of bioremediation and chemical treatment planning and design. There remains a pressing need for practical and efficient models that do not require (or assume attainable) the high level of characterization needed by complex numerical models. We focus on a linear systems or transfer function approach to the problem of reactive tracer transport in a heterogeneous saprolite aquifer. Transfer functions are obtained through the Bayesian geostatistical inverse method applied to tracer injection histories and breakthrough curves. We employ nonparametric transfer functions, which require minimal assumptions about shape and structure. The resulting flexibility empowers the data to determine the nature of the transfer function with minimal prior assumptions. Nonnegativity is enforced through a reflected Brownian motion stochastic model. The inverse method enables us to quantify uncertainty and to generate conditional realizations of the transfer function. Complex information about a hydrogeologic system is distilled into a relatively simple but rigorously obtained function that describes the transport behavior of the system between two wells. The resulting transfer functions are valuable in reactive transport models based on traveltime and streamline methods. The information contained in the data, particularly in the case of strong heterogeneity, is not overextended but is fully used. This is the first application of Bayesian geostatistical inversion to transfer functions in hydrogeology but the methodology can be extended to any linear system.
Model Selection for Geostatistical Models
Hoeting, Jennifer A.; Davis, Richard A.; Merton, Andrew A.; Thompson, Sandra E.
2006-02-01
We consider the problem of model selection for geospatial data. Spatial correlation is typically ignored in the selection of explanatory variables and this can influence model selection results. For example, the inclusion or exclusion of particular explanatory variables may not be apparent when spatial correlation is ignored. To address this problem, we consider the Akaike Information Criterion (AIC) as applied to a geostatistical model. We offer a heuristic derivation of the AIC in this context and provide simulation results that show that using AIC for a geostatistical model is superior to the often used approach of ignoring spatial correlation in the selection of explanatory variables. These ideas are further demonstrated via a model for lizard abundance. We also employ the principle of minimum description length (MDL) to variable selection for the geostatistical model. The effect of sampling design on the selection of explanatory covariates is also explored.
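The idea of scoring a geostatistical model by AIC can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian model with an exponential covariance, evaluates the log-likelihood at parameter values supplied by the caller (in practice these would be maximum likelihood estimates), and counts regression coefficients and covariance parameters together in `n_params`. All function names are hypothetical.

```python
import math

def exp_cov(coords, sill, range_, nugget):
    """Exponential covariance matrix with a nugget on the diagonal (assumed form)."""
    n = len(coords)
    return [[sill * math.exp(-math.dist(coords[i], coords[j]) / range_)
             + (nugget if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def gaussian_loglik(resid, cov):
    """Multivariate normal log-likelihood of residuals (data minus regression mean)."""
    low = cholesky(cov)
    n = len(resid)
    z = []  # forward substitution: solve low * z = resid
    for i in range(n):
        z.append((resid[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + sum(v * v for v in z))

def aic(loglik, n_params):
    """AIC = -2 log L + 2 k, with k counting mean and covariance parameters."""
    return -2.0 * loglik + 2.0 * n_params
```

Candidate sets of explanatory variables would each be fitted with the spatial covariance in place and then ranked by `aic`; ignoring spatial correlation corresponds to forcing the covariance to be diagonal.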
Model selection for geostatistical models.
Hoeting, Jennifer A; Davis, Richard A; Merton, Andrew A; Thompson, Sandra E
2006-02-01
We consider the problem of model selection for geospatial data. Spatial correlation is often ignored in the selection of explanatory variables, and this can influence model selection results. For example, the importance of particular explanatory variables may not be apparent when spatial correlation is ignored. To address this problem, we consider the Akaike Information Criterion (AIC) as applied to a geostatistical model. We offer a heuristic derivation of the AIC in this context and provide simulation results that show that using AIC for a geostatistical model is superior to the often-used traditional approach of ignoring spatial correlation in the selection of explanatory variables. These ideas are further demonstrated via a model for lizard abundance. We also apply the principle of minimum description length (MDL) to variable selection for the geostatistical model. The effect of sampling design on the selection of explanatory covariates is also explored. R software to implement the geostatistical model selection methods described in this paper is available in the Supplement.
An interactive Bayesian geostatistical inverse protocol for hydraulic tomography
Fienen, Michael N.; Clemo, Tom; Kitanidis, Peter K.
2008-01-01
Hydraulic tomography is a powerful technique for characterizing heterogeneous hydrogeologic parameters. An explicit trade-off between characterization based on measurement misfit and subjective characterization using prior information is presented. We apply a Bayesian geostatistical inverse approach that is well suited to accommodate a flexible model with the level of complexity driven by the data and explicitly considering uncertainty. Prior information is incorporated through the selection of a parameter covariance model characterizing continuity and providing stability. Often, discontinuities in the parameter field, typically caused by geologic contacts between contrasting lithologic units, necessitate subdivision into zones across which there is no correlation among hydraulic parameters. We propose an interactive protocol in which zonation candidates are implied from the data and are evaluated using cross validation and expert knowledge. Uncertainty introduced by limited knowledge of dynamic regional conditions is mitigated by using drawdown rather than native head values. An adjoint state formulation of MODFLOW-2000 is used to calculate sensitivities which are used both for the solution to the inverse problem and to guide protocol decisions. The protocol is tested using synthetic two-dimensional steady state examples in which the wells are located at the edge of the region of interest.
Geostatistical Modeling of Pore Velocity
Devary, J.L.; Doctor, P.G.
1981-06-01
A significant part of evaluating a geologic formation as a nuclear waste repository involves the modeling of contaminant transport in the surrounding media in the event the repository is breached. The commonly used contaminant transport models are deterministic. However, the spatial variability of hydrologic field parameters introduces uncertainties into contaminant transport predictions. This paper discusses the application of geostatistical techniques to the modeling of spatially varying hydrologic field parameters required as input to contaminant transport analyses. Kriging estimation techniques were applied to Hanford Reservation field data to calculate hydraulic conductivity and the ground-water potential gradients. These quantities were statistically combined to estimate the groundwater pore velocity and to characterize the pore velocity estimation error. Combining geostatistical modeling techniques with product error propagation techniques results in an effective stochastic characterization of groundwater pore velocity, a hydrologic parameter required for contaminant transport analyses.
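The combination of kriged quantities into a pore velocity estimate with an error measure can be sketched as follows. This is an illustration under stated assumptions, not the report's procedure: it takes pore velocity as v = K * i / n, treats hydraulic conductivity K and gradient i as independent with known kriging means and variances, and treats effective porosity n as a known constant. The function name is hypothetical.

```python
def pore_velocity(k_mean, k_var, grad_mean, grad_var, porosity):
    """Mean and variance of v = K * i / n, assuming K and i are independent.

    k_mean, k_var       -- kriged estimate and kriging variance of conductivity
    grad_mean, grad_var -- kriged estimate and kriging variance of the gradient
    porosity            -- effective porosity, assumed known without error
    """
    v_mean = k_mean * grad_mean / porosity
    # Exact variance of a product of two independent random variables.
    prod_var = (k_mean ** 2 * grad_var
                + grad_mean ** 2 * k_var
                + k_var * grad_var)
    v_var = prod_var / porosity ** 2
    return v_mean, v_var
```

If K and the gradient are correlated, a covariance term would have to be added; the independence assumption is made here only to keep the sketch short.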
Bayesian geostatistics in health cartography: the perspective of malaria
Patil, Anand P.; Gething, Peter W.; Piel, Frédéric B.; Hay, Simon I.
2011-01-01
Maps of parasite prevalences and other aspects of infectious diseases that vary in space are widely used in parasitology. However, spatial parasitological datasets rarely, if ever, have sufficient coverage to allow exact determination of such maps. Bayesian geostatistics (BG) is a method for finding a large sample of maps that can explain a dataset, in which maps that do a better job of explaining the data are more likely to be represented. This sample represents the knowledge that the analyst has gained from the data about the unknown true map. BG provides a conceptually simple way to convert these samples to predictions of features of the unknown map, for example regional averages. These predictions account for each map in the sample, yielding an appropriate level of predictive precision. PMID:21420361
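The conversion from a sample of maps to a prediction of a map feature can be sketched in a few lines. The example below is an assumption-laden illustration: each map is a flat list of pixel values, a region is a list of pixel indices, and the interval uses a simple nearest-rank quantile. The function name and data layout are hypothetical.

```python
def regional_average_summary(map_samples, region_pixels, lower=0.025, upper=0.975):
    """Summarise the regional average across a posterior sample of maps.

    map_samples   -- list of maps, each a list of pixel values
    region_pixels -- indices of the pixels that make up the region
    Returns (posterior mean, lower quantile, upper quantile).
    """
    # One regional average per sampled map, so every map contributes.
    per_map = sorted(
        sum(m[i] for i in region_pixels) / len(region_pixels)
        for m in map_samples
    )
    n = len(per_map)

    def quantile(q):
        # Nearest-rank quantile on the sorted per-map averages.
        idx = min(n - 1, max(0, round(q * (n - 1))))
        return per_map[idx]

    return sum(per_map) / n, quantile(lower), quantile(upper)
```

Because the average is computed within each map before summarising, the spread of the result reflects the joint uncertainty across pixels rather than pixel-by-pixel uncertainty.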
Bayesian geostatistics in health cartography: the perspective of malaria.
Patil, Anand P; Gething, Peter W; Piel, Frédéric B; Hay, Simon I
2011-06-01
Maps of parasite prevalences and other aspects of infectious diseases that vary in space are widely used in parasitology. However, spatial parasitological datasets rarely, if ever, have sufficient coverage to allow exact determination of such maps. Bayesian geostatistics (BG) is a method for finding a large sample of maps that can explain a dataset, in which maps that do a better job of explaining the data are more likely to be represented. This sample represents the knowledge that the analyst has gained from the data about the unknown true map. BG provides a conceptually simple way to convert these samples to predictions of features of the unknown map, for example regional averages. These predictions account for each map in the sample, yielding an appropriate level of predictive precision.
Fienen, Michael N.; D'Oria, Marco; Doherty, John E.; Hunt, Randall J.
2013-01-01
The application bgaPEST is a highly parameterized inversion software package implementing the Bayesian Geostatistical Approach in a framework compatible with the parameter estimation suite PEST. Highly parameterized inversion refers to cases in which parameters are distributed in space or time and are correlated with one another. The Bayesian aspect of bgaPEST is related to Bayesian probability theory in which prior information about parameters is formally revised on the basis of the calibration dataset used for the inversion. Conceptually, this approach formalizes the conditionality of estimated parameters on the specific data and model available. The geostatistical component of the method refers to the way in which prior information about the parameters is used. A geostatistical autocorrelation function is used to enforce structure on the parameters to avoid overfitting and unrealistic results. Bayesian Geostatistical Approach is designed to provide the smoothest solution that is consistent with the data. Optionally, users can specify a level of fit or estimate a balance between fit and model complexity informed by the data. Groundwater and surface-water applications are used as examples in this text, but the possible uses of bgaPEST extend to any distributed parameter applications.
NASA Astrophysics Data System (ADS)
Troldborg, M.; Nowak, W.; Binning, P. J.; Bjerg, P. L.
2012-12-01
Estimates of mass discharge (mass/time) are increasingly being used when assessing risks of groundwater contamination and designing remedial systems at contaminated sites. Mass discharge estimates are, however, prone to rather large uncertainties as they integrate uncertain spatial distributions of both concentration and groundwater flow velocities. For risk assessments or any other decisions that are being based on mass discharge estimates, it is essential to address these uncertainties. We present a novel Bayesian geostatistical approach for quantifying the uncertainty of the mass discharge across a multilevel control plane. The method decouples the flow and transport simulation and has the advantage of avoiding the heavy computational burden of three-dimensional numerical flow and transport simulation coupled with geostatistical inversion. It may therefore be of practical relevance to practitioners compared to existing methods that are either too simple or computationally demanding. The method is based on conditional geostatistical simulation and accounts for i) heterogeneity of both the flow field and the concentration distribution through Bayesian geostatistics (including the uncertainty in covariance functions), ii) measurement uncertainty, and iii) uncertain source zone geometry and transport parameters. The method generates multiple equally likely realizations of the spatial flow and concentration distribution, which all honour the measured data at the control plane. The flow realizations are generated by analytical co-simulation of the hydraulic conductivity and the hydraulic gradient across the control plane. These realizations are made consistent with measurements of both hydraulic conductivity and head at the site. An analytical macro-dispersive transport solution is employed to simulate the mean concentration distribution across the control plane, and a geostatistical model of the Box-Cox transformed concentration data is used to simulate observed
NASA Astrophysics Data System (ADS)
Troldborg, Mads; Nowak, Wolfgang; Lange, Ida V.; Santos, Marta C.; Binning, Philip J.; Bjerg, Poul L.
2012-09-01
Mass discharge estimates are increasingly being used when assessing risks of groundwater contamination and designing remedial systems at contaminated sites. Such estimates are, however, rather uncertain as they integrate uncertain spatial distributions of both concentration and groundwater flow. Here a geostatistical simulation method for quantifying the uncertainty of the mass discharge across a multilevel control plane is presented. The method accounts for (1) heterogeneity of both the flow field and the concentration distribution through Bayesian geostatistics, (2) measurement uncertainty, and (3) uncertain source zone and transport parameters. The method generates conditional realizations of the spatial flow and concentration distribution. An analytical macrodispersive transport solution is employed to simulate the mean concentration distribution, and a geostatistical model of the Box-Cox transformed concentration data is used to simulate observed deviations from this mean solution. By combining the flow and concentration realizations, a mass discharge probability distribution is obtained. The method has the advantage of avoiding the heavy computational burden of three-dimensional numerical flow and transport simulation coupled with geostatistical inversion. It may therefore be of practical relevance to practitioners compared to existing methods that are either too simple or computationally demanding. The method is demonstrated on a field site contaminated with chlorinated ethenes. For this site, we show that including a physically meaningful concentration trend and the cosimulation of hydraulic conductivity and hydraulic gradient across the transect helps constrain the mass discharge uncertainty. The number of sampling points required for accurate mass discharge estimation and the relative influence of different data types on mass discharge uncertainty is discussed.
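The step of combining flow and concentration realizations into a mass discharge distribution can be sketched as below. This is a simplified illustration, not the published method: it assumes the realizations are already available as lists of per-cell values on a control plane with equal cell areas, and that concentrations were simulated on the Box-Cox scale and need back-transforming. Function names and the data layout are hypothetical.

```python
import math

def inv_box_cox(y, lam):
    """Back-transform a Box-Cox value; assumes lam * y + 1 > 0 when lam != 0."""
    if lam == 0.0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)

def mass_discharge_samples(flux_realizations, conc_realizations, cell_area):
    """One mass discharge value per paired realization.

    flux_realizations -- list of realizations, each a list of per-cell water fluxes
    conc_realizations -- list of realizations, each a list of per-cell concentrations
    cell_area         -- area of one control-plane cell (equal cells assumed)
    """
    samples = []
    for flux, conc in zip(flux_realizations, conc_realizations):
        # Mass discharge = sum over cells of flux * concentration * area.
        samples.append(sum(q * c * cell_area for q, c in zip(flux, conc)))
    return sorted(samples)
```

The sorted samples form an empirical distribution from which percentiles of mass discharge can be read off; units follow from whatever units the fluxes, concentrations, and areas carry.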
NASA Astrophysics Data System (ADS)
Painter, S. L.; Jiang, Y.; Woodbury, A. D.
2002-12-01
The Edwards Aquifer, a highly heterogeneous karst aquifer located in south central Texas, is the sole source of drinking water for more than one million people. Hydraulic conductivity (K) measurements in the Edwards Aquifer are sparse, highly variable (log-K variance of 6.4), and are mostly from single-well drawdown tests that are appropriate for the spatial scale of a few meters. To support ongoing efforts to develop a groundwater management (MODFLOW) model of the San Antonio segment of the Edwards Aquifer, a multistep procedure was developed to assign hydraulic parameters to the 402 m x 402 m computational cells intended for the management model. The approach used a combination of nonparametric geostatistical analysis, stochastic simulation, numerical upscaling, and automatic model calibration based on Bayesian updating [1,2]. Indicator correlograms reveal a nested spatial structure in the well-test K of the confined zone, with practical correlation ranges of 3,600 and 15,000 meters and a large nugget effect. The fitted geostatistical model was used in unconditional stochastic simulations by the sequential indicator simulation method. The resulting realizations of K, defined at the scale of the well tests, were then numerically upscaled to the block scale. A new geostatistical model was fitted to the upscaled values. The upscaled model was then used to cokrige the block-scale K based on the well-test K. The resulting K map was then converted to transmissivity (T) using deterministically mapped aquifer thickness. When tested in a forward groundwater model, the upscaled T reproduced hydraulic heads better than a simple kriging of the well-test values (mean error of -3.9 meter and mean-absolute-error of 12 meters, as compared with -13 and 17 meters for the simple kriging). As the final step in the study, the upscaled T map was used as the prior distribution in an inverse procedure based on Bayesian updating [1,2]. When input to the forward groundwater model, the
Boysen, Courtney; Davis, Elizabeth G.; Beard, Laurie A.; Lubbers, Brian V.; Raghavan, Ram K.
2015-01-01
Kansas witnessed an unprecedented outbreak in Corynebacterium pseudotuberculosis infection among horses, a disease commonly referred to as pigeon fever during fall 2012. Bayesian geostatistical models were developed to identify key environmental and climatic risk factors associated with C. pseudotuberculosis infection in horses. Positive infection status among horses (cases) was determined by positive test results for characteristic abscess formation, positive bacterial culture on purulent material obtained from a lanced abscess (n = 82), or positive serologic evidence of exposure to organism (≥1:512)(n = 11). Horses negative for these tests (n = 172)(controls) were considered free of infection. Information pertaining to horse demographics and stabled location were obtained through review of medical records and/or contact with horse owners via telephone. Covariate information for environmental and climatic determinants were obtained from USDA (soil attributes), USGS (land use/land cover), and NASA MODIS and NASA Prediction of Worldwide Renewable Resources (climate). Candidate covariates were screened using univariate regression models followed by Bayesian geostatistical models with and without covariates. The best performing model indicated a protective effect for higher soil moisture content (OR = 0.53, 95% CrI = 0.25, 0.71), and detrimental effects for higher land surface temperature (≥35°C) (OR = 2.81, 95% CrI = 2.21, 3.85) and habitat fragmentation (OR = 1.31, 95% CrI = 1.27, 2.22) for C. pseudotuberculosis infection status in horses, while age, gender and breed had no effect. Preventative and ecoclimatic significance of these findings are discussed. PMID:26473728
A PC-Windows-Based program for geostatistical modeling application
Wu, G.G.; Yang, A.P.
1994-12-31
This paper describes a technically advanced, user-friendly, PC-Windows™ based reservoir simulation tool (SIMTOOLS) that allows construction of realistic reservoir models using a geostatistical approach. This PC-Windows based product has three application tools: digitizing, mapping, and geostatistics. It has been designed primarily to enable reservoir engineers to apply the geostatistical gridding technique in mapping and reservoir simulation practices.
Geostatistics: models and tools for the earth sciences
Journel, A.G.
1986-01-01
The probability construct underlying geostatistical methodology is recalled, stressing that stationarity is a property of the model rather than of the phenomenon being represented. Geostatistics is more than interpolation, and kriging(s) is more than linear interpolation through ordinary kriging. A few common misconceptions are addressed.
[Geostatistical modeling of Ascaris lumbricoides infection].
Fortes, Bruno de Paula Menezes Drumond; Ortiz Valencia, Luis Iván; Ribeiro, Simone do Vale; Medronho, Roberto de Andrade
2004-01-01
This study models the spatial distribution of ascariasis using geoprocessing and geostatistical analysis. The database was taken from the PAISQUA project, which included a coproparasitologic and domiciliary survey conducted in 19 selected census tracts of Rio de Janeiro State, Brazil; 1,550 children aged 1 to 9 years were randomly selected and plotted at the centroids of their respective domiciles. Risk maps of Ascaris lumbricoides were generated by indicator kriging. The estimated and observed values from the cross-validation were compared using a ROC curve. An isotropic spherical semivariogram model with a range of 30 m and a nugget effect of 50% was employed in ordinary indicator kriging to create a map of the probability of A. lumbricoides infection. The area under the ROC curve indicated a significant global accuracy. The occurrence of disease could be estimated in the study area, and a risk map was produced through the use of ordinary kriging. The spatial statistical analysis proved adequate for predicting the occurrence of ascariasis, unrestricted by the region's political boundaries.
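The semivariogram model named in the abstract can be written out as a short function. This is a sketch under assumptions: the "nugget effect of 50%" is read here as half of a total sill standardised to 1.0, and the function name and default arguments are illustrative rather than taken from the study.

```python
def spherical_semivariogram(h, sill=1.0, nugget=0.5, range_m=30.0):
    """Isotropic spherical semivariogram with a nugget.

    h       -- separation distance in metres
    sill    -- total sill (nugget plus partial sill), standardised to 1.0 here
    nugget  -- nugget variance, assumed to be 50% of the total sill
    range_m -- distance at which the sill is reached
    """
    if h <= 0.0:
        return 0.0  # the semivariogram is zero at zero separation
    if h >= range_m:
        return sill  # beyond the range, pairs are treated as uncorrelated
    r = h / range_m
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)
```

With a 30 m range, households more than 30 m apart contribute no spatial information about each other under this model, and the large nugget means even close neighbours are only moderately informative.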
High Performance Geostatistical Modeling of Biospheric Resources
NASA Astrophysics Data System (ADS)
Pedelty, J. A.; Morisette, J. T.; Smith, J. A.; Schnase, J. L.; Crosier, C. S.; Stohlgren, T. J.
2004-12-01
We are using parallel geostatistical codes to study spatial relationships among biospheric resources in several study areas. For example, spatial statistical models based on large- and small-scale variability have been used to predict species richness of both native and exotic plants (hot spots of diversity) and patterns of exotic plant invasion. However, broader use of geostatistics in natural resource modeling, especially at regional and national scales, has been limited due to the large computing requirements of these applications. To address this problem, we implemented parallel versions of the kriging spatial interpolation algorithm. The first uses the Message Passing Interface (MPI) in a master/slave paradigm on an open source Linux Beowulf cluster, while the second is implemented with the new proprietary Xgrid distributed processing system on an Xserve G5 cluster from Apple Computer, Inc. These techniques are proving effective and provide the basis for a national decision support capability for invasive species management that is being jointly developed by NASA and the US Geological Survey.
Quantifying natural delta variability using a multiple-point geostatistics prior uncertainty model
NASA Astrophysics Data System (ADS)
Scheidt, Céline; Fernandes, Anjali M.; Paola, Chris; Caers, Jef
2016-10-01
We address the question of quantifying uncertainty associated with autogenic pattern variability in a channelized transport system by means of a modern geostatistical method. This question has considerable relevance for practical subsurface applications as well, particularly those related to uncertainty quantification relying on Bayesian approaches. Specifically, we show how the autogenic variability in a laboratory experiment can be represented and reproduced by a multiple-point geostatistical prior uncertainty model. The latter geostatistical method requires selection of a limited set of training images from which a possibly infinite set of geostatistical model realizations, mimicking the training image patterns, can be generated. To that end, we investigate two methods to determine how many training images and what training images should be provided to reproduce natural autogenic variability. The first method relies on distance-based clustering of overhead snapshots of the experiment; the second method relies on a rate of change quantification by means of a computer vision algorithm termed the demon algorithm. We show quantitatively that with either training image selection method, we can statistically reproduce the natural variability of the delta formed in the experiment. In addition, we study the nature of the patterns represented in the set of training images as a representation of the "eigenpatterns" of the natural system. The eigenpatterns in the training image sets display patterns consistent with previous physical interpretations of the fundamental modes of this type of delta system: a highly channelized, incisional mode; a poorly channelized, depositional mode; and an intermediate mode between the two.
Fienen, M.; Hunt, R.; Krabbenhoft, D.; Clemo, T.
2009-01-01
Flow path delineation is a valuable tool for interpreting the subsurface hydrogeochemical environment. Different types of data, such as groundwater flow and transport, inform different aspects of hydrogeologic parameter values (hydraulic conductivity in this case) which, in turn, determine flow paths. This work combines flow and transport information to estimate a unified set of hydrogeologic parameters using the Bayesian geostatistical inverse approach. Parameter flexibility is allowed by using a highly parameterized approach with the level of complexity informed by the data. Despite the effort to adhere to the ideal of minimal a priori structure imposed on the problem, extreme contrasts in parameters can result in the need to censor correlation across hydrostratigraphic bounding surfaces. These partitions segregate parameters into facies associations. With an iterative approach in which partitions are based on inspection of initial estimates, flow path interpretation is progressively refined through the inclusion of more types of data. Head observations, stable oxygen isotope (18O/16O) ratios, and tritium are all used to progressively refine flow path delineation on an isthmus between two lakes in the Trout Lake watershed, northern Wisconsin, United States. Despite allowing significant parameter freedom by estimating many distributed parameter values, a smooth field is obtained. Copyright 2009 by the American Geophysical Union.
Model Diagnostics for Bayesian Networks
ERIC Educational Resources Information Center
Sinharay, Sandip
2006-01-01
Bayesian networks are frequently used in educational assessments primarily for learning about students' knowledge and skills. There is a lack of works on assessing fit of Bayesian networks. This article employs the posterior predictive model checking method, a popular Bayesian model checking tool, to assess fit of simple Bayesian networks. A…
Gstat: a program for geostatistical modelling, prediction and simulation
NASA Astrophysics Data System (ADS)
Pebesma, Edzer J.; Wesseling, Cees G.
1998-01-01
Gstat is a computer program for variogram modelling, and geostatistical prediction and simulation. It provides a generic implementation of the multivariable linear model with trends modelled as a linear function of coordinate polynomials or of user-defined base functions, and independent or dependent, geostatistically modelled, residuals. Simulation in gstat comprises conditional or unconditional (multi-) Gaussian sequential simulation of point values or block averages, or (multi-) indicator sequential simulation. Besides many of the popular options found in other geostatistical software packages, gstat offers the unique combination of (i) an interactive user interface for modelling variograms and generalized covariances (residual variograms), that uses the device-independent plotting program gnuplot for graphical display, (ii) support for several ascii and binary data and map file formats for input and output, (iii) a concise, intuitive and flexible command language, (iv) user customization of program defaults, (v) no built-in limits, and (vi) free, portable ANSI-C source code. This paper describes the class of problems gstat can solve, and addresses aspects of efficiency and implementation, managing geostatistical projects, and relevant technical details.
Kennedy, Paula L; Woodbury, Allan D
2002-01-01
In ground water flow and transport modeling, the heterogeneous nature of porous media has a considerable effect on the resulting flow and solute transport. Some method of generating the heterogeneous field from a limited dataset of uncertain measurements is required. Bayesian updating is one method that interpolates from an uncertain dataset using the statistics of the underlying probability distribution function. In this paper, Bayesian updating was used to determine the heterogeneous natural log transmissivity field for a carbonate and a sandstone aquifer in southern Manitoba. It was determined that the transmissivity in m2/sec followed a natural log normal distribution for both aquifers, with a mean of -7.2 and -8.0 for the carbonate and sandstone aquifers, respectively. The variograms were calculated using an estimator developed by Li and Lake (1994). Fractal nature was not evident in the variogram from either aquifer. The Bayesian updating heterogeneous field provided good results even in cases where little data was available. A large transmissivity zone in the sandstone aquifer was created by the Bayesian procedure, which is not a reflection of any deterministic consideration, but is a natural outcome of updating a prior probability distribution function with observations. The statistical model returns a result that is very reasonable; that is, homogeneous in regions where little or no information is available to alter an initial state. No long range correlation trends or fractal behavior of the log-transmissivity field was observed in either aquifer over a distance of about 300 km.
Geostatistical model to estimate in stream pollutant loads and concentrations.
NASA Astrophysics Data System (ADS)
Polus, E.; Flipo, N.; de Fouquet, C.; Poulin, M.
2009-04-01
Models that estimate loads and concentrations of pollutants in streams can roughly be classified into two categories: physically-based and stochastic models. While the first ones tend to reproduce physical processes that occur in streams, the stochastic models consider loads and concentrations as random variables. This work is interested in such models and particularly in geostatistical models, which provide an estimate of loads and concentrations together with the associated measure of uncertainty: the estimation variance. Along a stream network that can be modelled as a graph, most of the usual geostatistical covariance or variogram models are no longer valid. Based on recent models applied on tree graphs, we present a covariance or variogram construction combining one-dimensional Random Functions (RF) defined on each path between sources and the outlet. The model properties are examined, namely the consistency conditions at the confluences for different variables. In practice, the scarcity of spatial data makes a precise inference of covariances difficult. Can a phenomenological model then be used to guide the geostatistical modelling? To answer this question, the example of a portion of the Seine River (France) is examined, where both measurement data and the outputs of the physically-based model ProSe are used. The comparison between both data sets shows an excellent agreement for discharges and a consistent one for nitrate concentrations. Nevertheless, a detailed exploratory analysis brings to light the importance of the boundary conditions, which are not consistent with the downstream measurements. The agreement between data and modelled values can be improved thanks to a reconstruction of consistent boundary conditions by cokriging. This is an example of the usefulness of jointly using physically-based models and geostatistics. The next step is a joint modelling of discharges, loads and concentrations along the stream network. This modelling should improve the
Hanks, Ephraim M.; Schliep, Erin M.; Hooten, Mevin B.; Hoeting, Jennifer A.
2015-01-01
In spatial generalized linear mixed models (SGLMMs), covariates that are spatially smooth are often collinear with spatially smooth random effects. This phenomenon is known as spatial confounding and has been studied primarily in the case where the spatial support of the process being studied is discrete (e.g., areal spatial data). In this case, the most common approach suggested is restricted spatial regression (RSR) in which the spatial random effects are constrained to be orthogonal to the fixed effects. We consider spatial confounding and RSR in the geostatistical (continuous spatial support) setting. We show that RSR provides computational benefits relative to the confounded SGLMM, but that Bayesian credible intervals under RSR can be inappropriately narrow under model misspecification. We propose a posterior predictive approach to alleviating this potential problem and discuss the appropriateness of RSR in a variety of situations. We illustrate RSR and SGLMM approaches through simulation studies and an analysis of malaria frequencies in The Gambia, Africa.
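The orthogonality constraint that defines restricted spatial regression in the abstract above can be written out as follows. This is a sketch in conventional notation, not taken from the paper itself: the design matrix $X$, coefficients $\beta$, spatial random effect $w$, and link function $g$ are assumed symbols.

```latex
% Confounded SGLMM: the spatial random effect w enters the linear predictor directly
g\bigl(\mathbb{E}[y]\bigr) = X\beta + w

% RSR: w is projected onto the orthogonal complement of the column space of X,
% so that X^{\top} P_X^{\perp} w = 0 and the random effect cannot compete with the fixed effects
P_X^{\perp} = I - X (X^{\top} X)^{-1} X^{\top}, \qquad
g\bigl(\mathbb{E}[y]\bigr) = X\beta + P_X^{\perp} w
```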
NASA Astrophysics Data System (ADS)
Hu, L.; Montzka, S. A.; Miller, B.; Andrews, A. E.; Miller, J. B.; Lehman, S.; Sweeney, C.; Miller, S. M.; Thoning, K. W.; Siso, C.; Atlas, E. L.; Blake, D. R.; De Gouw, J. A.; Gilman, J.; Dutton, G. S.; Elkins, J. W.; Hall, B. D.; Chen, H.; Fischer, M. L.; Mountain, M. E.; Nehrkorn, T.; Biraud, S.; Tans, P. P.
2015-12-01
Global atmospheric observations suggest substantial ongoing emissions of carbon tetrachloride (CCl4) despite a 100% phase-out of production for dispersive uses since 1996 in developed countries and 2010 in other countries. Little progress has been made in understanding the causes of these ongoing emissions or identifying their contributing sources. In this study, we employed multiple inverse modeling techniques (i.e. Bayesian and geostatistical inversions) to assimilate CCl4 mole fractions observed from the National Oceanic and Atmospheric Administration (NOAA) flask-air sampling network over the US, and quantify its national and regional emissions during 2008 - 2012. Average national total emissions of CCl4 between 2008 and 2012 determined from these observations and an ensemble of inversions range between 2.1 and 6.1 Gg yr-1. This emission is substantially larger than the mean of 0.06 Gg/yr reported to the US EPA Toxics Release Inventory over these years, suggesting that under-reported emissions or non-reporting sources make up the bulk of CCl4 emissions from the US. But while the inventory does not account for the magnitude of observationally-derived CCl4 emissions, the regional distribution of derived and inventory emissions is similar. Furthermore, when considered relative to the distribution of uncapped landfills or population, the variability in measured mole fractions was most consistent with the distribution of industrial sources (i.e., those from the Toxics Release Inventory). Our results suggest that emissions from the US only account for a small fraction of the global on-going emissions of CCl4 (30 - 80 Gg yr-1 over this period). Finally, to ascertain the importance of the US emissions relative to the unaccounted global emission rate we considered multiple approaches to extrapolate our results to other countries and the globe.
Fractal and geostatistical methods for modeling of a fracture network
Chiles, J.P.
1988-08-01
The modeling of fracture networks is useful for fluid flow and rock mechanics studies. About 6600 fracture traces were recorded on drifts of a uranium mine in a granite massif. The traces have an extension of 0.20-20 m. The network was studied by fractal and by geostatistical methods but can be considered neither as a fractal with a constant dimension nor a set of purely randomly located fractures. Two kinds of generalization of conventional models can still provide more flexibility for the characterization of the network: (a) a nonscaling fractal model with variable similarity dimension (for a 2-D network of traces, the dimension varying from 2 for the 10-m scale to 1 for the centimeter scale); (b) a parent-daughter model with a regionalized density; the geostatistical study allows a 3-D model to be established where: fractures are assumed to be discs; fractures are grouped in clusters or swarms; and fracturation density is regionalized (with two ranges at about 30 and 300 m). The fractal model is easy to fit and to simulate along a line, but 2-D and 3-D simulations are more difficult. The geostatistical model is more complex, but easy to simulate, even in 3-D.
Stochastic Local Interaction (SLI) model: Bridging machine learning and geostatistics
NASA Astrophysics Data System (ADS)
Hristopulos, Dionissios T.
2015-12-01
Machine learning and geostatistics are powerful mathematical frameworks for modeling spatial data. Both approaches, however, suffer from poor scaling of the required computational resources for large data applications. We present the Stochastic Local Interaction (SLI) model, which employs a local representation to improve computational efficiency. SLI combines geostatistics and machine learning with ideas from statistical physics and computational geometry. It is based on a joint probability density function defined by an energy functional which involves local interactions implemented by means of kernel functions with adaptive local kernel bandwidths. SLI is expressed in terms of an explicit, typically sparse, precision (inverse covariance) matrix. This representation leads to a semi-analytical expression for interpolation (prediction), which is valid in any number of dimensions and avoids the computationally costly covariance matrix inversion.
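To illustrate why an explicit precision matrix sidesteps covariance inversion, the sketch below uses the standard conditional-mean identity for a Gaussian field with precision matrix Q: for one prediction point p and sampled points s, E[x_p | x_s] = m - (1/Q_pp) * sum_j Q_pj (x_j - m). This is a generic Gaussian identity, not the SLI predictor itself; the function name, the constant mean m, and the example values are assumptions.

```python
def conditional_mean(q_pp, q_ps, sampled_values, mean=0.0):
    """Conditional mean at one prediction point from precision-matrix entries.

    q_pp           diagonal precision entry for the prediction point (must be > 0)
    q_ps           precision entries coupling the prediction point to each sample
    sampled_values observed values at the sampled points, in the same order as q_ps
    mean           constant field mean (assumed known here)
    """
    if q_pp <= 0:
        raise ValueError("diagonal precision entry must be positive")
    if len(q_ps) != len(sampled_values):
        raise ValueError("q_ps and sampled_values must have the same length")
    # Only a scalar division is needed; no covariance matrix is ever inverted.
    coupling = sum(q * (x - mean) for q, x in zip(q_ps, sampled_values))
    return mean - coupling / q_pp


# Hypothetical example: two neighbours coupled with equal negative precision.
print(conditional_mean(2.0, [-1.0, -1.0], [1.0, 3.0]))
```

When the precision matrix is sparse, most entries of `q_ps` are zero, so the sum touches only a few neighbouring samples regardless of the total data size.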
Chen, Xingyuan; Murakami, Haruko; Hahn, Melanie S.; Hammond, Glenn E.; Rockhold, Mark L.; Zachara, John M.; Rubin, Yoram
2012-06-01
Tracer testing under natural or forced gradient flow holds the potential to provide useful information for characterizing subsurface properties, through monitoring, modeling and interpretation of the tracer plume migration in an aquifer. Non-reactive tracer experiments were conducted at the Hanford 300 Area, along with constant-rate injection tests and electromagnetic borehole flowmeter (EBF) profiling. A Bayesian data assimilation technique, the method of anchored distributions (MAD) [Rubin et al., 2010], was applied to assimilate the experimental tracer test data with the other types of data and to infer the three-dimensional heterogeneous structure of the hydraulic conductivity in the saturated zone of the Hanford formation. In this study, the Bayesian prior information on the underlying random hydraulic conductivity field was obtained from previous field characterization efforts using the constant-rate injection tests and the EBF data. The posterior distribution of the conductivity field was obtained by further conditioning the field on the temporal moments of tracer breakthrough curves at various observation wells. MAD was implemented with the massively-parallel three-dimensional flow and transport code PFLOTRAN to cope with the highly transient flow boundary conditions at the site and to meet the computational demands of MAD. A synthetic study proved that the proposed method could effectively invert tracer test data to capture the essential spatial heterogeneity of the three-dimensional hydraulic conductivity field. Application of MAD to actual field data shows that the hydrogeological model, when conditioned on the tracer test data, can reproduce the tracer transport behavior better than the field characterized without the tracer test data. This study successfully demonstrates that MAD can sequentially assimilate multi-scale multi-type field data through a consistent Bayesian framework.
NASA Astrophysics Data System (ADS)
Chen, Xingyuan; Murakami, Haruko; Hahn, Melanie S.; Hammond, Glenn E.; Rockhold, Mark L.; Zachara, John M.; Rubin, Yoram
2012-06-01
Tracer tests performed under natural or forced gradient flow conditions can provide useful information for characterizing subsurface properties, through monitoring, modeling, and interpretation of the tracer plume migration in an aquifer. Nonreactive tracer experiments were conducted at the Hanford 300 Area, along with constant-rate injection tests and electromagnetic borehole flowmeter tests. A Bayesian data assimilation technique, the method of anchored distributions (MAD) (Rubin et al., 2010), was applied to assimilate the experimental tracer test data with the other types of data and to infer the three-dimensional heterogeneous structure of the hydraulic conductivity in the saturated zone of the Hanford formation. In this study, the Bayesian prior information on the underlying random hydraulic conductivity field was obtained from previous field characterization efforts using constant-rate injection and borehole flowmeter test data. The posterior distribution of the conductivity field was obtained by further conditioning the field on the temporal moments of tracer breakthrough curves at various observation wells. MAD was implemented with the massively parallel three-dimensional flow and transport code PFLOTRAN to cope with the highly transient flow boundary conditions at the site and to meet the computational demands of MAD. A synthetic study proved that the proposed method could effectively invert tracer test data to capture the essential spatial heterogeneity of the three-dimensional hydraulic conductivity field. Application of MAD to actual field tracer data at the Hanford 300 Area demonstrates that inverting for spatial heterogeneity of hydraulic conductivity under transient flow conditions is challenging and more work is needed.
Slater, Hannah; Michael, Edwin
2013-01-01
There is increasing interest to control or eradicate the major neglected tropical diseases. Accurate modelling of the geographic distributions of parasitic infections will be crucial to this endeavour. We used 664 community level infection prevalence data collated from the published literature in conjunction with eight environmental variables, altitude and population density, and a multivariate Bayesian generalized linear spatial model that allows explicit accounting for spatial autocorrelation and incorporation of uncertainty in input data and model parameters, to construct the first spatially-explicit map describing LF prevalence distribution in Africa. We also ran the best-fit model against predictions made by the HADCM3 and CCCMA climate models for 2050 to predict the likely distributions of LF under future climate and population changes. We show that LF prevalence is strongly influenced by spatial autocorrelation between locations but is only weakly associated with environmental covariates. Infection prevalence, however, is found to be related to variations in population density. All associations with key environmental/demographic variables appear to be complex and non-linear. LF prevalence is predicted to be highly heterogenous across Africa, with high prevalences (>20%) estimated to occur primarily along coastal West and East Africa, and lowest prevalences predicted for the central part of the continent. Error maps, however, indicate a need for further surveys to overcome problems with data scarcity in the latter and other regions. Analysis of future changes in prevalence indicates that population growth rather than climate change per se will represent the dominant factor in the predicted increase/decrease and spread of LF on the continent. We indicate that these results could play an important role in aiding the development of strategies that are best able to achieve the goals of parasite elimination locally and globally in a manner that may also account
Model-Based Geostatistical Mapping of the Prevalence of Onchocerca volvulus in West Africa
O’Hanlon, Simon J.; Slater, Hannah C.; Cheke, Robert A.; Boatin, Boakye A.; Coffeng, Luc E.; Pion, Sébastien D. S.; Boussinesq, Michel; Zouré, Honorat G. M.; Stolk, Wilma A.; Basáñez, María-Gloria
2016-01-01
Background: The initial endemicity (pre-control prevalence) of onchocerciasis has been shown to be an important determinant of the feasibility of elimination by mass ivermectin distribution. We present the first geostatistical map of microfilarial prevalence in the former Onchocerciasis Control Programme in West Africa (OCP) before commencement of antivectorial and antiparasitic interventions. Methods and Findings: Pre-control microfilarial prevalence data from 737 villages across the 11 constituent countries in the OCP epidemiological database were used as ground-truth data. These 737 data points, plus a set of statistically selected environmental covariates, were used in a Bayesian model-based geostatistical (B-MBG) approach to generate a continuous surface (at pixel resolution of 5 km x 5 km) of microfilarial prevalence in West Africa prior to the commencement of the OCP. Uncertainty in model predictions was measured using a suite of validation statistics, performed on bootstrap samples of held-out validation data. The mean Pearson’s correlation between observed and estimated prevalence at validation locations was 0.693; the mean prediction error (average difference between observed and estimated values) was 0.77%, and the mean absolute prediction error (average magnitude of difference between observed and estimated values) was 12.2%. Within OCP boundaries, 17.8 million people were deemed to have been at risk, 7.55 million to have been infected, and mean microfilarial prevalence to have been 45% (range: 2–90%) in 1975. Conclusions and Significance: This is the first map of initial onchocerciasis prevalence in West Africa using B-MBG. Important environmental predictors of infection prevalence were identified and used in a model out-performing those without spatial random effects or environmental covariates. Results may be compared with recent epidemiological mapping efforts to find areas of persisting transmission. These methods may be extended to areas where
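The three validation statistics defined in the abstract above (Pearson's correlation, mean prediction error, mean absolute prediction error) can be computed as in the following sketch. It is not the authors' code: the function name, the sign convention (observed minus estimated), and the sample prevalences are assumptions, and the bootstrap resampling step is omitted.

```python
import math


def validation_stats(observed, predicted):
    """Return (pearson_r, mean_error, mean_absolute_error) for paired values.

    Errors are taken as observed minus predicted; that sign convention is an
    assumption, since the abstract only says "average difference".
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_error = sum(errors) / n
    mean_abs_error = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for constant input")
    pearson_r = cov / math.sqrt(var_obs * var_pred)
    return pearson_r, mean_error, mean_abs_error


# Hypothetical prevalences (%) at three held-out validation locations.
r, me, mae = validation_stats([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
print(f"r = {r:.3f}, mean error = {me:.2f}, mean absolute error = {mae:.2f}")
```

A mean error near zero together with a much larger mean absolute error, as reported in the abstract, indicates predictions that are nearly unbiased on average but still scattered around the observed values.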
Bayesian Model Averaging for Propensity Score Analysis
ERIC Educational Resources Information Center
Kaplan, David; Chen, Jianshen
2013-01-01
The purpose of this study is to explore Bayesian model averaging in the propensity score context. Previous research on Bayesian propensity score analysis does not take into account model uncertainty. In this regard, an internally consistent Bayesian framework for model building and estimation must also account for model uncertainty. The…
Bayesian stable isotope mixing models
In this paper we review recent advances in Stable Isotope Mixing Models (SIMMs) and place them into an over-arching Bayesian statistical framework which allows for several useful extensions. SIMMs are used to quantify the proportional contributions of various sources to a mixtur...
NASA Astrophysics Data System (ADS)
Reyes, J.; Vizuete, W.; Serre, M. L.; Xu, Y.
2015-12-01
The EPA employs a vast monitoring network to measure ambient PM2.5 concentrations across the United States with one of its goals being to quantify exposure within the population. However, there are several areas of the country with sparse monitoring spatially and temporally. One means to fill in these monitoring gaps is to use PM2.5 modeled estimates from Chemical Transport Models (CTMs), specifically the Community Multi-scale Air Quality (CMAQ) model. CMAQ is able to provide complete spatial coverage but is subject to systematic and random error due to model uncertainty. Due to the deterministic nature of CMAQ, often these uncertainties are not quantified. Much effort is employed to quantify the efficacy of these models through different metrics of model performance. Currently evaluation is specific to only locations with observed data. Multiyear studies across the United States are challenging because the error and model performance of CMAQ are not uniform over such large space/time domains. Error changes regionally and temporally. Because of the complex mix of species that constitute PM2.5, CMAQ error is also a function of increasing PM2.5 concentration. To address this issue we introduce a model performance evaluation for PM2.5 CMAQ that is regionalized and non-linear. This model performance evaluation leads to error quantification for each CMAQ grid, so that areas and time periods of error are better qualified. The regionalized error correction approach is non-linear and is therefore more flexible at characterizing model performance than approaches that rely on linearity assumptions and assume homoscedasticity of CMAQ prediction errors. Corrected CMAQ data are then incorporated into the modern geostatistical framework of Bayesian Maximum Entropy (BME). Through cross validation it is shown that incorporating error-corrected CMAQ data leads to more accurate estimates than just using observed data by themselves.
Geostatistical Modeling of Evolving Landscapes by Means of Image Quilting
NASA Astrophysics Data System (ADS)
Mendes, J. H.; Caers, J.; Scheidt, C.
2015-12-01
Realistic geological representation of subsurface heterogeneity remains an important outstanding challenge. While many geostatistical methods exist for representing sedimentary systems, such as multiple-point geostatistics, rule-based methods or Boolean methods, the question of what the prior uncertainty on the parameters (or training images) of such algorithms is remains outstanding. In this initial work, we investigate the use of flume experiments to better constrain such prior uncertainty and to start understanding what information should be provided to geostatistical algorithms. In particular, we study the use of image quilting as a novel multiple-point method for generating fast geostatistical realizations once a training image is provided. Image quilting is a method emanating from computer graphics where patterns are extracted from training images and then stochastically quilted along a raster path to create stochastic variation of the stated training image. In this initial study, we use a flume experiment and extract 10 training images as representative of the variability of the evolving landscape over a period of 136 minutes. The training images consist of wet/dry regions obtained from overhead shots taken over the flume experiment. To investigate whether such image quilting reproduces the same variability of the evolving landscape in terms of wet/dry regions, we generate multiple realizations with all 10 training images and compare that variability with the variability seen in the entire flume experiment. By proper tuning of the quilting parameters we find generally reasonable agreement with the flume experiment.
Bayesian Model Averaging for Propensity Score Analysis.
Kaplan, David; Chen, Jianshen
2014-01-01
This article considers Bayesian model averaging as a means of addressing uncertainty in the selection of variables in the propensity score equation. We investigate an approximate Bayesian model averaging approach based on the model-averaged propensity score estimates produced by the R package BMA but that ignores uncertainty in the propensity score. We also provide a fully Bayesian model averaging approach via Markov chain Monte Carlo sampling (MCMC) to account for uncertainty in both parameters and models. A detailed study of our approach examines the differences in the causal estimate when incorporating noninformative versus informative priors in the model averaging stage. We examine these approaches under common methods of propensity score implementation. In addition, we evaluate the impact of changing the size of Occam's window used to narrow down the range of possible models. We also assess the predictive performance of both Bayesian model averaging propensity score approaches and compare it with the case without Bayesian model averaging. Overall, results show that both Bayesian model averaging propensity score approaches recover the treatment effect estimates well and generally provide larger uncertainty estimates, as expected. Both Bayesian model averaging approaches offer slightly better prediction of the propensity score compared with the Bayesian approach with a single propensity score equation. Covariate balance checks for the case study show that both Bayesian model averaging approaches offer good balance. The fully Bayesian model averaging approach also provides posterior probability intervals of the balance indices.
Bayesian Networks for Social Modeling
Whitney, Paul D.; White, Amanda M.; Walsh, Stephen J.; Dalton, Angela C.; Brothers, Alan J.
2011-03-28
This paper describes a body of work developed over the past five years. The work addresses the use of Bayesian network (BN) models for representing and predicting social/organizational behaviors. The topics covered include model construction, validation, and use. These topics cover the bulk of the lifetime of such a model, beginning with construction, moving to validation and other aspects of model ‘critiquing’, and finally demonstrating how the modeling approach might be used to inform policy analysis. To conclude, we discuss limitations of using BN for this activity and suggest remedies to address those limitations. The primary benefits of using a well-developed computational, mathematical, and statistical modeling structure, such as BN, are that 1) there are significant computational, theoretical and capability bases on which to build, and 2) it provides the ability to empirically critique the model, and potentially to evaluate competing models for a social/behavioral phenomenon.
Brain lesion detection in MRI with fuzzy and geostatistical models.
Pham, Tuan D
2010-01-01
Automated image detection of white matter changes of the brain is essentially helpful in providing a quantitative measure for studying the association of white matter lesions with other types of biomedical data. Such study allows the possibility of several medical hypothesis validations which lead to therapeutic treatment and prevention. This paper presents a new clustering-based segmentation approach for detecting white matter changes in magnetic resonance imaging with particular reference to cognitive decline in the elderly. The proposed method is formulated using the principles of fuzzy c-means algorithm and geostatistics.
Modeling Diagnostic Assessments with Bayesian Networks
ERIC Educational Resources Information Center
Almond, Russell G.; DiBello, Louis V.; Moulder, Brad; Zapata-Rivera, Juan-Diego
2007-01-01
This paper defines Bayesian network models and examines their applications to IRT-based cognitive diagnostic modeling. These models are especially suited to building inference engines designed to be synchronous with the finer grained student models that arise in skills diagnostic assessment. Aspects of the theory and use of Bayesian network models…
NASA Astrophysics Data System (ADS)
Yan, Hongxiang; Moradkhani, Hamid
2016-08-01
Assimilation of satellite soil moisture and streamflow data into a distributed hydrologic model has received increasing attention over the past few years. This study provides a detailed analysis of the joint and separate assimilation of streamflow and Advanced Scatterometer (ASCAT) surface soil moisture into a distributed Sacramento Soil Moisture Accounting (SAC-SMA) model, with the use of recently developed particle filter-Markov chain Monte Carlo (PF-MCMC) method. Performance is assessed over the Salt River Watershed in Arizona, which is one of the watersheds without anthropogenic effects in Model Parameter Estimation Experiment (MOPEX). A total of five data assimilation (DA) scenarios are designed and the effects of the locations of streamflow gauges and the ASCAT soil moisture on the predictions of soil moisture and streamflow are assessed. In addition, a geostatistical model is introduced to overcome the significantly biased satellite soil moisture and also discontinuity issue. The results indicate that: (1) solely assimilating outlet streamflow can lead to biased soil moisture estimation; (2) when the study area can only be partially covered by the satellite data, the geostatistical approach can estimate the soil moisture for those uncovered grid cells; (3) joint assimilation of streamflow and soil moisture from geostatistical modeling can further improve the surface soil moisture prediction. This study recommends that the geostatistical model is a helpful tool to aid the remote sensing technique and the hydrologic DA study.
NASA Astrophysics Data System (ADS)
Kosack, Christian; Vogt, Christian; Rath, Volker; Marquart, Gabriele
2010-05-01
The knowledge of the permeability distribution at depth is of primary concern for any geothermal reservoir engineering. However, permeability might change over orders of magnitude even for a single rock type and is additionally controlled by tectonic or engineered fracturing of the rocks. During reservoir exploration pumping tests are regularly performed where tracer marked water is pumped in one borehole and retrieved at one or a few others. At the European Enhanced Geothermal System (EGS) test site at Soultz-sous-Forêts three wells had been drilled in the granitic bedrock down to 4 to 5 km and were hydraulically stimulated to enhance the hydraulic connectivity between the wells. In July 2005, a tracer circulation test was carried out in order to estimate the changes of the hydraulic properties. Therefore a tracer was injected into the well GPK3 for 19 hours at a rate of 0.015 m3 s-1 and a concentration of 0.389 mol m-3. Tracer concentration was measured in the production wells over the following 5 months, while the produced water was re-injected into GPK3. This experiment demonstrated a good hydraulic connection between GPK3 and one of the production wells, GPK2, while a very low connectivity was observed in the other one, GPK4. We tested three different approaches simulating the pumping experiment with the numerical simulator shemat_suite in a simplified 3D model of the site in order to study their respective potential to estimate a reliable permeability distribution for the Soultz reservoir: A full-physics gradient-based Bayesian inversion, a massive Monte Carlo approach with geostatistic analysis, and an Ensemble-Kalman-Filter (EnKF) assimilation. A common feature in all models is a high permeability zone which acts as main flow area and transports most of the tracer. It is assumed to be associated with the fault zone cutting through the boreholes GPK2 and GPK3. With the Bayesian Inversion we were able to estimate a parameter set consisting of porosity
Bayesian Models of Individual Differences
Powell, Georgie; Meredith, Zoe; McMillin, Rebecca; Freeman, Tom C. A.
2016-01-01
According to Bayesian models, perception and cognition depend on the optimal combination of noisy incoming evidence with prior knowledge of the world. Individual differences in perception should therefore be jointly determined by a person’s sensitivity to incoming evidence and his or her prior expectations. It has been proposed that individuals with autism have flatter prior distributions than do nonautistic individuals, which suggests that prior variance is linked to the degree of autistic traits in the general population. We tested this idea by studying how perceived speed changes during pursuit eye movement and at low contrast. We found that individual differences in these two motion phenomena were predicted by differences in thresholds and autistic traits when combined in a quantitative Bayesian model. Our findings therefore support the flatter-prior hypothesis and suggest that individual differences in prior expectations are more systematic than previously thought. In order to be revealed, however, individual differences in sensitivity must also be taken into account. PMID:27770059
Integrated geostatistics for modeling fluid contacts and shales in Prudhoe Bay
Perez, G.; Chopra, A.K.; Severson, C.D.
1997-12-01
Geostatistics techniques are being used increasingly to model reservoir heterogeneity at a wide range of scales. A variety of techniques is now available with differing underlying assumptions, complexity, and applications. This paper introduces a novel method of geostatistics to model dynamic gas-oil contacts and shales in the Prudhoe Bay reservoir. The method integrates reservoir description and surveillance data within the same geostatistical framework. Surveillance logs and shale data are transformed to indicator variables. These variables are used to evaluate vertical and horizontal spatial correlation and cross-correlation of gas and shale at different times and to develop variogram models. Conditional simulation techniques are used to generate multiple three-dimensional (3D) descriptions of gas and shales that provide a measure of uncertainty. These techniques capture the complex 3D distribution of gas-oil contacts through time. The authors compare results of the geostatistical method with conventional techniques as well as with infill wells drilled after the study. Predicted gas-oil contacts and shale distributions are in close agreement with gas-oil contacts observed at infill wells.
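The indicator transform and variogram step described in this abstract can be sketched as follows. This is a minimal illustration only, assuming equally spaced samples along a single log and a hypothetical cutoff value; the function names and the threshold are not from the paper, and the authors' actual workflow (3D, cross-correlation between gas and shale, multiple times) is considerably richer.

```python
def indicator(values, threshold):
    """Code each sample as 1 if it meets the (hypothetical) cutoff, else 0."""
    return [1 if v >= threshold else 0 for v in values]


def experimental_semivariogram(series, max_lag):
    """Experimental semivariogram of an equally spaced 1D series.

    gamma(h) = sum((z_i - z_{i+h})^2) / (2 * N(h)), with N(h) the pair count.
    """
    gamma = {}
    for lag in range(1, max_lag + 1):
        pairs = [(series[i], series[i + lag]) for i in range(len(series) - lag)]
        if pairs:
            gamma[lag] = sum((a - b) ** 2 for a, b in pairs) / (2.0 * len(pairs))
    return gamma


# Illustrative shale-fraction log (made-up numbers) and cutoff.
log_values = [0.9, 0.8, 0.1, 0.2, 0.7, 0.9, 0.3, 0.1]
shale = indicator(log_values, threshold=0.5)
gamma = experimental_semivariogram(shale, max_lag=4)
```

A variogram model (for example spherical or exponential) would then be fitted to the values in `gamma` before any conditional simulation.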
Properties of the Bayesian Knowledge Tracing Model
ERIC Educational Resources Information Center
van de Sande, Brett
2013-01-01
Bayesian Knowledge Tracing is used very widely to model student learning. It comes in two different forms: The first form is the Bayesian Knowledge Tracing "hidden Markov model" which predicts the probability of correct application of a skill as a function of the number of previous opportunities to apply that skill and the model…
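The dependence of the predicted probability of a correct response on the number of previous opportunities can be sketched with the commonly used four-parameter form of Bayesian Knowledge Tracing. The sketch assumes no forgetting and no conditioning on observed responses; the parameter names and values are illustrative and are not taken from the article.

```python
def bkt_predicted_correct(p_init, p_learn, p_guess, p_slip, n_opportunities):
    """Return P(correct) at opportunities 0 .. n_opportunities - 1.

    p_init  : probability the skill is known before the first opportunity
    p_learn : probability of learning the skill at each opportunity
    p_guess : probability of a correct answer when the skill is unknown
    p_slip  : probability of an incorrect answer when the skill is known
    """
    probs = []
    p_known = p_init
    for _ in range(n_opportunities):
        # Correct either by knowing and not slipping, or by guessing.
        probs.append(p_known * (1.0 - p_slip) + (1.0 - p_known) * p_guess)
        # Hidden-state transition: an unknown skill may become known.
        p_known = p_known + (1.0 - p_known) * p_learn
    return probs


curve = bkt_predicted_correct(
    p_init=0.2, p_learn=0.3, p_guess=0.25, p_slip=0.1, n_opportunities=5
)
```

Under these assumptions the probability of knowing the skill after n opportunities is 1 - (1 - p_init)(1 - p_learn)^n, so the predicted curve rises toward 1 - p_slip whenever 1 - p_slip exceeds p_guess.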
Bayesian inference for OPC modeling
NASA Astrophysics Data System (ADS)
Burbine, Andrew; Sturtevant, John; Fryer, David; Smith, Bruce W.
2016-03-01
The use of optical proximity correction (OPC) demands increasingly accurate models of the photolithographic process. Model building and inference techniques in the data science community have seen great strides in the past two decades which make better use of available information. This paper aims to demonstrate the predictive power of Bayesian inference as a method for parameter selection in lithographic models by quantifying the uncertainty associated with model inputs and wafer data. Specifically, the method combines the model builder's prior information about each modelling assumption with the maximization of each observation's likelihood as a Student's t-distributed random variable. Through the use of a Markov chain Monte Carlo (MCMC) algorithm, a model's parameter space is explored to find the most credible parameter values. During parameter exploration, the parameters' posterior distributions are generated by applying Bayes' rule, using a likelihood function and the a priori knowledge supplied. The MCMC algorithm used, an affine invariant ensemble sampler (AIES), is implemented by initializing many walkers which semi-independently explore the space. The convergence of these walkers to global maxima of the likelihood volume determines the parameter values' highest density intervals (HDI) to reveal champion models. We show that this method of parameter selection provides insights into the data that traditional methods do not and outline continued experiments to vet the method.
Geostatistical applications in ground-water modeling in south-central Kansas
Ma, T.-S.; Sophocleous, M.; Yu, Y.-S.
1999-01-01
This paper emphasizes the supportive role of geostatistics in applying ground-water models. Field data of 1994 ground-water level, bedrock, and saltwater-freshwater interface elevations in south-central Kansas were collected and analyzed using the geostatistical approach. Ordinary kriging was adopted to estimate initial conditions for ground-water levels and topography of the Permian bedrock at the nodes of a finite difference grid used in a three-dimensional numerical model. Cokriging was used to estimate initial conditions for the saltwater-freshwater interface. An assessment of uncertainties in the estimated data is presented. The kriged and cokriged estimation variances were analyzed to evaluate the adequacy of data employed in the modeling. Although water levels and bedrock elevations are well described by spherical semivariogram models, additional data are required for better cokriging estimation of the interface data. The geostatistically analyzed data were employed in a numerical model of the Siefkes site in the project area. Results indicate that the computed chloride concentrations and ground-water drawdowns reproduced the observed data satisfactorily.
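Ordinary kriging with a spherical semivariogram, as used here to estimate values at grid nodes, can be sketched as below. This is a dependency-free illustration, not the authors' implementation: the coordinates, values, sill and range are made up, the nugget is assumed to be zero, and the small Gaussian-elimination solver is included only to keep the example self-contained.

```python
import math


def spherical(h, sill, rng):
    """Spherical semivariogram with zero nugget (an assumption of this sketch)."""
    if h >= rng:
        return sill
    r = h / rng
    return sill * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve the dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_kriging(points, values, target, sill, rng):
    """Return (estimate, kriging variance, weights) at the target location."""
    n = len(points)
    # Semivariogram matrix bordered by the unbiasedness constraint sum(w) = 1.
    a = [[spherical(math.dist(points[i], points[j]), sill, rng) for j in range(n)]
         + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [spherical(math.dist(p, target), sill, rng) for p in points] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights


# Four hypothetical wells on a unit square with made-up water levels.
wells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
levels = [1.0, 2.0, 3.0, 4.0]
est, var, w = ordinary_kriging(wells, levels, (0.5, 0.5), sill=1.0, rng=3.0)
```

The returned variance is the kind of estimation variance the abstract uses to judge data adequacy: it is zero at a sampled location and grows with distance from the data.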
A geostatistical methodology to assess the accuracy of unsaturated flow models
Smoot, J.L.; Williams, R.E.
1996-04-01
The Pacific Northwest National Laboratory (PNNL) has developed a Hydrologic Evaluation Methodology (HEM) to assist the U.S. Nuclear Regulatory Commission in evaluating the potential that infiltrating meteoric water will produce leachate at commercial low-level radioactive waste disposal sites. Two key issues are raised in the HEM: (1) evaluation of mathematical models that predict facility performance, and (2) estimation of the uncertainty associated with these mathematical model predictions. The technical objective of this research is to adapt geostatistical tools commonly used for model parameter estimation to the problem of estimating the spatial distribution of the dependent variable to be calculated by the model. To fulfill this objective, a database describing the spatiotemporal movement of water injected into unsaturated sediments at the Hanford Site in Washington State was used to develop a new method for evaluating mathematical model predictions. Measured water content data were interpolated geostatistically to a 16 x 16 x 36 grid at several time intervals. Then a mathematical model was used to predict water content at the same grid locations at the selected times. Node-by-node comparison of the mathematical model predictions with the geostatistically interpolated values was conducted. The method facilitates a complete accounting and categorization of model error at every node. The comparison suggests that model results generally are within measurement error. The worst model error occurs in silt lenses and is in excess of measurement error.
Validation and comparison of geostatistical and spline models for spatial stream networks.
Rushworth, A M; Peterson, E E; Ver Hoef, J M; Bowman, A W
2015-08-01
Scientists need appropriate spatial-statistical models to account for the unique features of stream network data. Recent advances provide a growing methodological toolbox for modelling these data, but general-purpose statistical software has only recently emerged, with little information about when to use different approaches. We implemented a simulation study to evaluate and validate geostatistical models that use continuous distances, and penalised spline models that use a finite discrete approximation for stream networks. Data were simulated from the geostatistical model, with performance measured by empirical prediction and fixed effects estimation. We found that both models were comparable in terms of squared error, with a slight advantage for the geostatistical models. Generally, both methods were unbiased and had valid confidence intervals. The most marked differences were found for confidence intervals on fixed-effect parameter estimates, where, for small sample sizes, the spline models underestimated variance. However, the penalised spline models were always more computationally efficient, which may be important for real-time prediction and estimation. Thus, decisions about which method to use must be influenced by the size and format of the data set, in addition to the characteristics of the environmental process and the modelling goals. ©2015 The Authors. Environmetrics published by John Wiley & Sons, Ltd.
Bayesian Calibration of Microsimulation Models.
Rutter, Carolyn M; Miglioretti, Diana L; Savarino, James E
2009-12-01
Microsimulation models that describe disease processes synthesize information from multiple sources and can be used to estimate the effects of screening and treatment on cancer incidence and mortality at a population level. These models are characterized by simulation of individual event histories for an idealized population of interest. Microsimulation models are complex and invariably include parameters that are not well informed by existing data. Therefore, a key component of model development is the choice of parameter values. Microsimulation model parameter values are selected to reproduce expected or known results through the process of model calibration. Calibration may be done by perturbing model parameters one at a time or by using a search algorithm. As an alternative, we propose a Bayesian method to calibrate microsimulation models that uses Markov chain Monte Carlo. We show that this approach converges to the target distribution and use a simulation study to demonstrate its finite-sample performance. Although computationally intensive, this approach has several advantages over previously proposed methods, including the use of statistical criteria to select parameter values, simultaneous calibration of multiple parameters to multiple data sources, incorporation of information via prior distributions, description of parameter identifiability, and the ability to obtain interval estimates of model parameters. We develop a microsimulation model for colorectal cancer and use our proposed method to calibrate model parameters. The microsimulation model provides a good fit to the calibration data. We find evidence that some parameters are identified primarily through prior distributions. Our results underscore the need to incorporate multiple sources of variability (i.e., due to calibration data, unknown parameters, and estimated parameters and predicted values) when calibrating and applying microsimulation models.
Bayesian model selection and isocurvature perturbations
NASA Astrophysics Data System (ADS)
Beltrán, María; García-Bellido, Juan; Lesgourgues, Julien; Liddle, Andrew R.; Slosar, Anže
2005-03-01
Present cosmological data are well explained assuming purely adiabatic perturbations, but an admixture of isocurvature perturbations is also permitted. We use a Bayesian framework to compare the performance of cosmological models including isocurvature modes with the purely adiabatic case; this framework automatically and consistently penalizes models which use more parameters to fit the data. We compute the Bayesian evidence for fits to a data set comprised of WMAP and other microwave anisotropy data, the galaxy power spectrum from 2dFGRS and SDSS, and Type Ia supernovae luminosity distances. We find that Bayesian model selection favors the purely adiabatic models, but so far only at low significance.
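How Bayesian evidences translate into a statement that one model is favored can be sketched as follows. The function converts log-evidences into posterior model probabilities; equal prior model probabilities are assumed by default, and the numbers are purely illustrative and unrelated to the cosmological results reported in the abstract.

```python
import math


def posterior_model_probs(log_evidences, prior_probs=None):
    """Posterior model probabilities from log-evidences (log marginal likelihoods)."""
    k = len(log_evidences)
    if prior_probs is None:
        prior_probs = [1.0 / k] * k  # assumption: models equally likely a priori
    log_post = [le + math.log(p) for le, p in zip(log_evidences, prior_probs)]
    shift = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - shift) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# A log Bayes factor of ln(3) in favor of the second model (illustrative).
probs = posterior_model_probs([0.0, math.log(3.0)])
```

Because the evidence integrates the likelihood over the whole prior volume, a model with extra parameters gains only if the improvement in fit outweighs the prior volume it wastes, which is the automatic penalty the abstract refers to.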
NASA Astrophysics Data System (ADS)
Linde, Niklas; Lochbühler, Tobias; Dogan, Mine; Van Dam, Remke L.
2015-12-01
We propose a new framework to compare alternative geostatistical descriptions of a given site. Multiple realizations of each of the considered geostatistical models and their corresponding tomograms (based on inversion of noise-contaminated simulated data) are used as a multivariate training image. The training image is scanned with a direct sampling algorithm to obtain conditional realizations of hydraulic conductivity that are not only in agreement with the geostatistical model, but also honor the spatially varying resolution of the site-specific tomogram. Model comparison is based on the quality of the simulated geophysical data from the ensemble of conditional realizations. The tomogram in this study is obtained by inversion of cross-hole ground-penetrating radar (GPR) first-arrival travel time data acquired at the MAcro-Dispersion Experiment (MADE) site in Mississippi (USA). Various heterogeneity descriptions ranging from multi-Gaussian fields to fields with complex multiple-point statistics inferred from outcrops are considered. Under the assumption that the relationship between porosity and hydraulic conductivity inferred from local measurements is valid, we find that conditioned multi-Gaussian realizations and derivatives thereof can explain the crosshole geophysical data. A training image based on an aquifer analog from Germany was found to be in better agreement with the geophysical data than the one based on the local outcrop, which appears to under-represent high hydraulic conductivity zones. These findings are only based on the information content in a single resolution-limited tomogram and extending the analysis to tracer or higher resolution surface GPR data might lead to different conclusions (e.g., that discrete facies boundaries are necessary). Our framework makes it possible to identify inadequate geostatistical models and petrophysical relationships, effectively narrowing the space of possible heterogeneity representations.
Karagiannis-Voules, Dimitrios-Alexios; Odermatt, Peter; Biedermann, Patricia; Khieu, Virak; Schär, Fabian; Muth, Sinuon; Utzinger, Jürg; Vounatsou, Penelope
2015-01-01
Soil-transmitted helminth infections are intimately connected with poverty. Yet, there is a paucity of using socioeconomic proxies in spatially explicit risk profiling. We compiled household-level socioeconomic data pertaining to sanitation, drinking-water, education and nutrition from readily available Demographic and Health Surveys, Multiple Indicator Cluster Surveys and World Health Surveys for Cambodia and aggregated the data at village level. We conducted a systematic review to identify parasitological surveys and made every effort possible to extract, georeference and upload the data in the open source Global Neglected Tropical Diseases database. Bayesian geostatistical models were employed to spatially align the village-aggregated socioeconomic predictors with the soil-transmitted helminth infection data. The risk of soil-transmitted helminth infection was predicted at a grid of 1×1km covering Cambodia. Additionally, two separate individual-level spatial analyses were carried out, for Takeo and Preah Vihear provinces, to assess and quantify the association between soil-transmitted helminth infection and socioeconomic indicators at an individual level. Overall, we obtained socioeconomic proxies from 1624 locations across the country. Surveys focussing on soil-transmitted helminth infections were extracted from 16 sources reporting data from 238 unique locations. We found that the risk of soil-transmitted helminth infection from 2000 onwards was considerably lower than in surveys conducted earlier. Population-adjusted prevalences for school-aged children from 2000 onwards were 28.7% for hookworm, 1.5% for Ascaris lumbricoides and 0.9% for Trichuris trichiura. Surprisingly, at the country-wide analyses, we did not find any significant association between soil-transmitted helminth infection and village-aggregated socioeconomic proxies. Based also on the individual-level analyses we conclude that socioeconomic proxies might not be good predictors at an
A conceptual sedimentological-geostatistical model of aquifer heterogeneity based on outcrop studies
Davis, J.M.
1994-01-01
Three outcrop studies were conducted in deposits of different depositional environments. At each site, permeability measurements were obtained with an air-minipermeameter developed as part of this study. In addition, the geological units were mapped with either surveying, photographs, or both. Geostatistical analysis of the permeability data was performed to estimate the characteristics of the probability distribution function and the spatial correlation structure. The information obtained from the geological mapping was then compared with the results of the geostatistical analysis for any relationships that may exist. The main field site was located in the Albuquerque Basin of central New Mexico at an outcrop of the Pliocene-Pleistocene Sierra Ladrones Formation. The second study was conducted on the walls of waste pits in alluvial fan deposits at the Nevada Test Site. The third study was conducted on an outcrop of an eolian deposit (Miocene) south of Socorro, New Mexico. The results of the three studies were then used to construct a conceptual model relating depositional environment to geostatistical models of heterogeneity. The model presented is largely qualitative but provides a basis for further hypothesis formulation and testing.
NASA Astrophysics Data System (ADS)
Schöniger, Anneli; Illman, Walter A.; Wöhling, Thomas; Nowak, Wolfgang
2015-12-01
Groundwater modelers face the challenge of how to assign representative parameter values to the studied aquifer. Several approaches are available to parameterize spatial heterogeneity in aquifer parameters. They differ in their conceptualization and complexity, ranging from homogeneous models to heterogeneous random fields. While it is common practice to invest more effort into data collection for models with a finer resolution of heterogeneities, there is a lack of guidance on how much data is required to justify a certain level of model complexity. In this study, we propose to use concepts related to Bayesian model selection to identify this balance. We demonstrate our approach on the characterization of a heterogeneous aquifer via hydraulic tomography in a sandbox experiment (Illman et al., 2010). We consider four increasingly complex parameterizations of hydraulic conductivity: (1) Effective homogeneous medium, (2) geology-based zonation, (3) interpolation by pilot points, and (4) geostatistical random fields. First, we investigate the shift in justified complexity with increasing amount of available data by constructing a model confusion matrix. This matrix indicates the maximum level of complexity that can be justified given a specific experimental setup. Second, we determine which parameterization is most adequate given the observed drawdown data. Third, we test how the different parameterizations perform in a validation setup. The results of our test case indicate that aquifer characterization via hydraulic tomography does not necessarily require (or justify) a geostatistical description. Instead, a zonation-based model might be a more robust choice, but only if the zonation is geologically adequate.
Bayesian Data-Model Fit Assessment for Structural Equation Modeling
ERIC Educational Resources Information Center
Levy, Roy
2011-01-01
Bayesian approaches to modeling are receiving an increasing amount of attention in the areas of model construction and estimation in factor analysis, structural equation modeling (SEM), and related latent variable models. However, model diagnostics and model criticism remain relatively understudied aspects of Bayesian SEM. This article describes…
Bayesian modeling of unknown diseases for biosurveillance.
Shen, Yanna; Cooper, Gregory F
2009-11-14
This paper investigates Bayesian modeling of unknown causes of events in the context of disease-outbreak detection. We introduce a Bayesian approach that models and detects both (1) known diseases (e.g., influenza and anthrax) by using informative prior probabilities and (2) unknown diseases (e.g., a new, highly contagious respiratory virus that has never been seen before) by using relatively non-informative prior probabilities. We report the results of simulation experiments which support that this modeling method can improve the detection of new disease outbreaks in a population. A key contribution of this paper is that it introduces a Bayesian approach for jointly modeling both known and unknown causes of events. Such modeling has broad applicability in medical informatics, where the space of known causes of outcomes of interest is seldom complete.
Current Challenges in Bayesian Model Choice
NASA Astrophysics Data System (ADS)
Clyde, M. A.; Berger, J. O.; Bullard, F.; Ford, E. B.; Jefferys, W. H.; Luo, R.; Paulo, R.; Loredo, T.
2007-11-01
Model selection (and the related issue of model uncertainty) arises in many astronomical problems, and, in particular, has been one of the focal areas of the Exoplanet working group under the SAMSI (Statistics and Applied Mathematical Sciences Institute) Astrostatistics Exoplanet program. We provide an overview of the Bayesian approach to model selection and highlight the challenges involved in implementing Bayesian model choice in four stylized problems. We review some of the current methods used by statisticians and astronomers and present recent developments in the area. We discuss the applicability, computational challenges, and performance of suggested methods and conclude with recommendations and open questions.
Hydraulic Conductivity Estimation using Bayesian Model Averaging and Generalized Parameterization
NASA Astrophysics Data System (ADS)
Tsai, F. T.; Li, X.
2006-12-01
Non-uniqueness in the parameterization scheme is an inherent problem in groundwater inverse modeling due to limited data. To cope with the non-uniqueness problem of parameterization, we introduce a Bayesian Model Averaging (BMA) method to integrate a set of selected parameterization methods. The estimation uncertainty in BMA includes the uncertainty in individual parameterization methods as the within-parameterization variance and the uncertainty from using different parameterization methods as the between-parameterization variance. Moreover, the generalized parameterization (GP) method is considered in the geostatistical framework in this study. The GP method aims at increasing the flexibility of parameterization through the combination of a zonation structure and an interpolation method. The use of BMA with GP avoids over-confidence in a single parameterization method. A normalized least-squares estimation (NLSE) is adopted to calculate the posterior probability for each GP. We employ the adjoint state method for the sensitivity analysis on the weighting coefficients in the GP method. The adjoint state method is also applied to the NLSE problem. The proposed methodology is applied to the Alamitos Barrier Project (ABP) in California, where the spatially distributed hydraulic conductivity is estimated. The optimal weighting coefficients embedded in GP are identified through the maximum likelihood estimation (MLE) where the misfits between the observed and calculated groundwater heads are minimized. The conditional mean and conditional variance of the estimated hydraulic conductivity distribution using BMA are obtained to assess the estimation uncertainty.
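The split of BMA uncertainty into within-parameterization and between-parameterization variance follows from the law of total variance and can be sketched at a single location as below. The posterior probabilities, means and variances are made-up numbers; in the study the probabilities would come from the NLSE step, which is not reproduced here.

```python
def bma_moments(post_probs, means, variances):
    """Combine per-model estimates at one location under model averaging.

    Returns (bma_mean, within_variance, between_variance); the total BMA
    variance is the sum of the last two terms.
    """
    total = sum(post_probs)
    w = [p / total for p in post_probs]  # normalize the model probabilities
    mean = sum(wi * m for wi, m in zip(w, means))
    within = sum(wi * v for wi, v in zip(w, variances))
    between = sum(wi * (m - mean) ** 2 for wi, m in zip(w, means))
    return mean, within, between


# Two hypothetical parameterizations of log-conductivity at one grid cell.
mean, within, between = bma_moments(
    post_probs=[0.5, 0.5], means=[1.0, 3.0], variances=[0.2, 0.4]
)
```

In this example the between-parameterization term dominates, which is the situation in which relying on a single parameterization would most understate the uncertainty.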
Can Geostatistical Models Represent Nature's Variability? An Analysis Using Flume Experiments
NASA Astrophysics Data System (ADS)
Scheidt, C.; Fernandes, A. M.; Paola, C.; Caers, J.
2015-12-01
The lack of understanding of the Earth's geological and physical processes governing sediment deposition renders subsurface modeling subject to large uncertainty. Geostatistics is often used to model uncertainty because of its capability to stochastically generate spatially varying realizations of the subsurface. These methods can generate a range of realizations of a given pattern - but how representative are these of the full natural variability? And how can we identify the minimum set of images that represent this natural variability? Here we use this minimum set to define the geostatistical prior model: a set of training images that represent the range of patterns generated by autogenic variability in the sedimentary environment under study. The proper definition of the prior model is essential in capturing the variability of the depositional patterns. This work starts with a set of overhead images from an experimental basin that showed ongoing autogenic variability. We use the images to analyze the essential characteristics of this suite of patterns. In particular, our goal is to define a prior model (a minimal set of selected training images) such that geostatistical algorithms, when applied to this set, can reproduce the full measured variability. A necessary prerequisite is to define a measure of variability. In this study, we measure variability using a dissimilarity distance between the images. The distance indicates whether two snapshots contain similar depositional patterns. To reproduce the variability in the images, we apply an MPS algorithm to the set of selected snapshots of the sedimentary basin that serve as training images. The training images are chosen from among the initial set by using the distance measure to ensure that only dissimilar images are chosen. Preliminary investigations show that MPS can reproduce fairly accurately the natural variability of the experimental depositional system. Furthermore, the selected training images provide
Posterior Predictive Model Checking in Bayesian Networks
ERIC Educational Resources Information Center
Crawford, Aaron
2014-01-01
This simulation study compared the utility of various discrepancy measures within a posterior predictive model checking (PPMC) framework for detecting different types of data-model misfit in multidimensional Bayesian network (BN) models. The investigated conditions were motivated by an applied research program utilizing an operational complex…
An Integrated Bayesian Model for DIF Analysis
ERIC Educational Resources Information Center
Soares, Tufi M.; Goncalves, Flavio B.; Gamerman, Dani
2009-01-01
In this article, an integrated Bayesian model for differential item functioning (DIF) analysis is proposed. The model is integrated in the sense of modeling the responses along with the DIF analysis. This approach allows DIF detection and explanation in a simultaneous setup. Previous empirical studies and/or subjective beliefs about the item…
Bayesian modeling of flexible cognitive control
Jiang, Jiefeng; Heller, Katherine; Egner, Tobias
2014-01-01
“Cognitive control” describes endogenous guidance of behavior in situations where routine stimulus-response associations are suboptimal for achieving a desired goal. The computational and neural mechanisms underlying this capacity remain poorly understood. We examine recent advances stemming from the application of a Bayesian learner perspective that provides optimal prediction for control processes. In reviewing the application of Bayesian models to cognitive control, we note that an important limitation in current models is a lack of a plausible mechanism for the flexible adjustment of control over conflict levels changing at varying temporal scales. We then show that flexible cognitive control can be achieved by a Bayesian model with a volatility-driven learning mechanism that modulates dynamically the relative dependence on recent and remote experiences in its prediction of future control demand. We conclude that the emergent Bayesian perspective on computational mechanisms of cognitive control holds considerable promise, especially if future studies can identify neural substrates of the variables encoded by these models, and determine the nature (Bayesian or otherwise) of their neural implementation. PMID:24929218
NASA Astrophysics Data System (ADS)
Golay, Jean; Kanevski, Mikhaïl
2013-04-01
The present research deals with the exploration and modeling of a complex dataset of 200 measurement points of sediment pollution by heavy metals in Lake Geneva. The fundamental idea was to use multivariate Artificial Neural Networks (ANN) along with geostatistical models and tools in order to improve the accuracy and the interpretability of data modeling. The results obtained with ANN were compared to those of traditional geostatistical algorithms like ordinary (co)kriging and (co)kriging with an external drift. Exploratory data analysis highlighted a great variety of relationships (i.e. linear, non-linear, independence) between the 11 variables of the dataset (i.e. Cadmium, Mercury, Zinc, Copper, Titanium, Chromium, Vanadium and Nickel as well as the spatial coordinates of the measurement points and their depth). Then, exploratory spatial data analysis (i.e. anisotropic variography, local spatial correlations and moving window statistics) was carried out. It was shown that the different phenomena to be modeled were characterized by high spatial anisotropies, complex spatial correlation structures and heteroscedasticity. A feature selection procedure based on General Regression Neural Networks (GRNN) was also applied to create subsets of variables enabling to improve the predictions during the modeling phase. The basic modeling was conducted using a Multilayer Perceptron (MLP) which is a workhorse of ANN. MLP models are robust and highly flexible tools which can incorporate in a nonlinear manner different kind of high-dimensional information. In the present research, the input layer was made of either two (spatial coordinates) or three neurons (when depth as auxiliary information could possibly capture an underlying trend) and the output layer was composed of one (univariate MLP) to eight neurons corresponding to the heavy metals of the dataset (multivariate MLP). MLP models with three input neurons can be referred to as Artificial Neural Networks with EXternal
Survey of Bayesian Models for Modelling of Stochastic Temporal Processes
Ng, B
2006-10-12
This survey gives an overview of popular generative models used in the modeling of stochastic temporal systems. In particular, this survey is organized into two parts. The first part discusses the discrete-time representations of dynamic Bayesian networks and dynamic relational probabilistic models, while the second part discusses the continuous-time representation of continuous-time Bayesian networks.
Modelling the presence of disease under spatial misalignment using Bayesian latent Gaussian models.
Barber, Xavier; Conesa, David; Lladosa, Silvia; López-Quílez, Antonio
2016-04-18
Modelling patterns of the spatial incidence of diseases using local environmental factors has been a growing problem in the last few years. Geostatistical models have become popular lately because they allow estimating and predicting the underlying disease risk and relating it to possible risk factors. Our approach to these models is based on the fact that the presence/absence of a disease can be expressed with a hierarchical Bayesian spatial model that incorporates the information provided by the geographical and environmental characteristics of the region of interest. Nevertheless, our main interest here is to tackle the misalignment problem that arises when the locations at which covariate information is available differ partially (or totally) from the observed locations and from those at which we want to predict. As a result, we present two different models, depending on whether or not there is uncertainty in the covariates. In both cases, Bayesian inference on the parameters and prediction of presence/absence in new locations are made by considering the model as a latent Gaussian model, which allows the use of the integrated nested Laplace approximation. In particular, the spatial effect is implemented with the stochastic partial differential equation approach. The methodology is evaluated on the presence of Fasciola hepatica in Galicia, a north-western region of Spain.
Building on crossvalidation for increasing the quality of geostatistical modeling
Olea, R.A.
2012-01-01
The random function is a mathematical model commonly used in the assessment of uncertainty associated with a spatially correlated attribute that has been partially sampled. There are multiple algorithms for modeling such random functions, all sharing the requirement of specifying various parameters that have critical influence on the results. The importance of finding ways to compare the methods and setting parameters to obtain results that better model uncertainty has increased as these algorithms have grown in number and complexity. Crossvalidation has been used in spatial statistics, mostly in kriging, for the analysis of mean square errors. An appeal of this approach is its ability to work with the same empirical sample available for running the algorithms. This paper goes beyond checking estimates by formulating a function sensitive to conditional bias. Under ideal conditions, such a function turns into a straight line, which can be used as a reference for preparing measures of performance. Applied to kriging, deviations from the ideal line provide sensitivity to the semivariogram lacking in crossvalidation of kriging errors and are more sensitive to conditional bias than analyses of errors. In terms of stochastic simulation, in addition to finding better parameters, the deviations allow comparison of the realizations resulting from the applications of different methods. Examples show improvements of about 30% in the deviations and approximately 10% in the square root of mean square errors between reasonable starting modelling and the solutions according to the new criteria. © 2011 US Government.
Joint space-time geostatistical model for air quality surveillance
NASA Astrophysics Data System (ADS)
Russo, A.; Soares, A.; Pereira, M. J.
2009-04-01
Air pollution and people's generalized concern about air quality are, nowadays, considered to be a global problem. Although the introduction of rigid air pollution regulations has reduced pollution from industry and power stations, the growing number of cars on the road poses a new pollution problem. Considering the characteristics of the atmospheric circulation and also the residence times of certain pollutants in the atmosphere, a generalized and growing interest in air quality issues has led to intensified research and the publication of several articles with quite different levels of scientific depth. Like most natural phenomena, air quality can be seen as a space-time process, where space-time relationships usually have quite different characteristics and levels of uncertainty. As a result, the simultaneous integration of space and time is not an easy task to perform. This problem is overcome by a variety of methodologies. The use of stochastic models and neural networks to characterize space-time dispersion of air quality is becoming a common practice. The main objective of this work is to produce an air quality model which allows forecasting critical concentration episodes of a certain pollutant by means of a hybrid approach, based on the combined use of neural network models and stochastic simulations. A stochastic simulation of the spatial component with a space-time trend model is proposed to characterize critical situations, taking into account data from the past and a space-time trend from the recent past. To identify near future critical episodes, predicted values from neural networks are used at each monitoring station. In this paper, we describe the design of a hybrid forecasting tool for ambient NO2 concentrations in Lisbon, Portugal.
Hierarchical Bayesian Models of Subtask Learning
ERIC Educational Resources Information Center
Anglim, Jeromy; Wynton, Sarah K. A.
2015-01-01
The current study used Bayesian hierarchical methods to challenge and extend previous work on subtask learning consistency. A general model of individual-level subtask learning was proposed focusing on power and exponential functions with constraints to test for inconsistency. To study subtask learning, we developed a novel computer-based booking…
NASA Astrophysics Data System (ADS)
Atzberger, C.; Richter, K.
2009-09-01
The robust and accurate retrieval of vegetation biophysical variables using radiative transfer models (RTM) is seriously hampered by the ill-posedness of the inverse problem. With this research we further develop our previously published (object-based) inversion approach [Atzberger (2004)]. The object-based RTM inversion takes advantage of the geostatistical fact that the biophysical characteristics of nearby pixels are generally more similar than those at a larger distance. A two-step inversion based on PROSPECT+SAIL generated look-up-tables is presented that can be easily implemented and adapted to other radiative transfer models. The approach takes into account the spectral signatures of neighboring pixels and optimizes a common value of the average leaf angle (ALA) for all pixels of a given image object, such as an agricultural field. Using a large set of leaf area index (LAI) measurements (n = 58) acquired over six different crops of the Barrax test site (Spain), we demonstrate that the proposed geostatistical regularization yields in most cases more accurate and spatially consistent results compared to the traditional (pixel-based) inversion. Pros and cons of the approach are discussed and possible future extensions presented.
Objective Bayesian model selection for Cox regression.
Held, Leonhard; Gravestock, Isaac; Sabanés Bové, Daniel
2016-12-20
There is now a large literature on objective Bayesian model selection in the linear model based on the g-prior. The methodology has been recently extended to generalized linear models using test-based Bayes factors. In this paper, we show that test-based Bayes factors can also be applied to the Cox proportional hazards model. If the goal is to select a single model, then both the maximum a posteriori and the median probability model can be calculated. For clinical prediction of survival, we shrink the model-specific log hazard ratio estimates with subsequent calculation of the Breslow estimate of the cumulative baseline hazard function. A Bayesian model average can also be employed. We illustrate the proposed methodology with the analysis of survival data on primary biliary cirrhosis patients and the development of a clinical prediction model for future cardiovascular events based on data from the Second Manifestations of ARTerial disease (SMART) cohort study. Cross-validation is applied to compare the predictive performance with alternative model selection approaches based on Harrell's c-Index, the calibration slope and the integrated Brier score. Finally, a novel application of Bayesian variable selection to optimal conditional prediction via landmarking is described. Copyright © 2016 John Wiley & Sons, Ltd.
Local Geostatistical Models and Big Data in Hydrological and Ecological Applications
NASA Astrophysics Data System (ADS)
Hristopulos, Dionissios
2015-04-01
The advent of the big data era creates new opportunities for environmental and ecological modelling but also presents significant challenges. The availability of remote sensing images and low-cost wireless sensor networks implies that spatiotemporal environmental data can cover larger spatial domains at higher spatial and temporal resolution for longer time windows. Handling such voluminous data presents several technical and scientific challenges. In particular, the geostatistical methods used to process spatiotemporal data need to overcome the dimensionality curse associated with the need to store and invert large covariance matrices. There are various mathematical approaches for addressing the dimensionality problem, including change of basis, dimensionality reduction, hierarchical schemes, and local approximations. We present a Stochastic Local Interaction (SLI) model that can be used to model local correlations in spatial data. SLI is a random field model suitable for data on discrete supports (i.e., regular lattices or irregular sampling grids). The degree of localization is determined by means of kernel functions and appropriate bandwidths. The strength of the correlations is determined by means of coefficients. In the "plain vanilla" version the parameter set involves scale and rigidity coefficients as well as a characteristic length. The latter, in connection with the rigidity coefficient, determines the correlation length of the random field. The SLI model is based on statistical field theory and extends previous research on Spartan spatial random fields [2,3] from continuum spaces to explicitly discrete supports. The SLI kernel functions employ adaptive bandwidths learned from the sampling spatial distribution [1]. The SLI precision matrix is expressed explicitly in terms of the model parameter and the kernel function. Hence, covariance matrix inversion is not necessary for parameter inference that is based on leave-one-out cross validation. This property
NASA Astrophysics Data System (ADS)
Huysmans, Marijke; Dassargues, Alain
2014-05-01
In heterogeneous environments with complex geological structures, analysis of pumping and tracer tests is often problematic. Standard interpretation methods either do not account for heterogeneity or simulate it by introducing empirical zonation of the calibrated parameters or by using variogram-based geostatistical techniques, which are often unable to describe realistic heterogeneity in complex geological environments where, e.g., sedimentary structures, multi-facies deposits, structures with large connectivity or curvi-linear structures can be present. Multiple-point geostatistics aims to overcome the limitations of the variogram and can be applied in different research domains to simulate heterogeneity in complex environments. In this project, multiple-point geostatistics is applied to the interpretation of pumping tests and a tracer test in an actual case of a sandy heterogeneous aquifer. This study allows us to deduce the main advantages and disadvantages of this technique compared to variogram-based techniques for interpretation of pumping tests and tracer tests. A pumping test and a tracer test were performed in the same sandbar deposit consisting of cross-bedded units composed of materials with different grain sizes and hydraulic conductivities. The pumping test and the tracer test are analyzed with a local 3D groundwater model in which fine-scale sedimentary heterogeneity is modelled using multiple-point geostatistics. To reduce CPU and RAM requirements of the multiple-point geostatistical simulation steps, edge properties indicating the presence of irregularly-shaped surfaces are directly simulated. Results show that for the pumping test as well as for the tracer test, incorporating heterogeneity results in a better fit between observed and calculated drawdowns/concentrations. The improvement of the fit is however not as large as expected. In this paper, the reasons for these somewhat unsatisfactory results are explored and recommendations for future
Hierarchical Bayesian model updating for structural identification
NASA Astrophysics Data System (ADS)
Behmanesh, Iman; Moaveni, Babak; Lombaert, Geert; Papadimitriou, Costas
2015-12-01
A new probabilistic finite element (FE) model updating technique based on Hierarchical Bayesian modeling is proposed for identification of civil structural systems under changing ambient/environmental conditions. The performance of the proposed technique is investigated for (1) uncertainty quantification of model updating parameters, and (2) probabilistic damage identification of the structural systems. Accurate estimation of the uncertainty in modeling parameters such as mass or stiffness is a challenging task. Several Bayesian model updating frameworks have been proposed in the literature that can successfully provide the "parameter estimation uncertainty" of model parameters with the assumption that there is no underlying inherent variability in the updating parameters. However, this assumption may not be valid for civil structures where structural mass and stiffness have inherent variability due to different sources of uncertainty such as changing ambient temperature, temperature gradient, wind speed, and traffic loads. Hierarchical Bayesian model updating is capable of predicting the overall uncertainty/variability of updating parameters by assuming time-variability of the underlying linear system. A general solution based on Gibbs Sampler is proposed to estimate the joint probability distributions of the updating parameters. The performance of the proposed Hierarchical approach is evaluated numerically for uncertainty quantification and damage identification of a 3-story shear building model. Effects of modeling errors and incomplete modal data are considered in the numerical study.
Normativity, interpretation, and Bayesian models
Oaksford, Mike
2014-01-01
It has been suggested that evaluative normativity should be expunged from the psychology of reasoning. A broadly Davidsonian response to these arguments is presented. It is suggested that two distinctions, between different types of rationality, are more permeable than this argument requires and that the fundamental objection is to selecting theories that make the most rational sense of the data. It is argued that this is an inevitable consequence of radical interpretation where understanding others requires assuming they share our own norms of reasoning. This requires evaluative normativity and it is shown that when asked to evaluate others’ arguments participants conform to rational Bayesian norms. It is suggested that logic and probability are not in competition and that the variety of norms is more limited than the arguments against evaluative normativity suppose. Moreover, the universality of belief ascription suggests that many of our norms are universal and hence evaluative. It is concluded that the union of evaluative normativity and descriptive psychology implicit in Davidson and apparent in the psychology of reasoning is a good thing. PMID:24860519
NASA Astrophysics Data System (ADS)
Muthusamy, Manoranjan; Schellart, Alma; Tait, Simon; Heuvelink, Gerard B. M.
2017-02-01
In this study we develop a method to estimate the spatially averaged rainfall intensity together with associated level of uncertainty using geostatistical upscaling. Rainfall data collected from a cluster of eight paired rain gauges in a 400 m × 200 m urban catchment are used in combination with spatial stochastic simulation to obtain optimal predictions of the spatially averaged rainfall intensity at any point in time within the urban catchment. The uncertainty in the prediction of catchment average rainfall intensity is obtained for multiple combinations of intensity ranges and temporal averaging intervals. The two main challenges addressed in this study are scarcity of rainfall measurement locations and non-normality of rainfall data, both of which need to be considered when adopting a geostatistical approach. Scarcity of measurement points is dealt with by pooling sample variograms of repeated rainfall measurements with similar characteristics. Normality of rainfall data is achieved through the use of normal score transformation. Geostatistical models in the form of variograms are derived for transformed rainfall intensity. Next spatial stochastic simulation which is robust to nonlinear data transformation is applied to produce realisations of rainfall fields. These realisations in transformed space are first back-transformed and next spatially aggregated to derive a random sample of the spatially averaged rainfall intensity. Results show that the prediction uncertainty comes mainly from two sources: spatial variability of rainfall and measurement error. At smaller temporal averaging intervals both these effects are high, resulting in a relatively high uncertainty in prediction. With longer temporal averaging intervals the uncertainty becomes lower due to stronger spatial correlation of rainfall data and relatively smaller measurement error. Results also show that the measurement error increases with decreasing rainfall intensity resulting in a higher
Sparse Event Modeling with Hierarchical Bayesian Kernel Methods
2016-01-05
...events (and subsequently, their likelihood of occurrence) based on historical evidence of the counts of previous event occurrences. The novel Bayesian... Final Report: Sparse Event Modeling with Hierarchical Bayesian Kernel Methods (...Aug-2014 22-May-2015). Approved for Public Release; Distribution Unlimited. The research objective of this proposal was to develop a predictive Bayesian
Bayesian network modelling of upper gastrointestinal bleeding
NASA Astrophysics Data System (ADS)
Aisha, Nazziwa; Shohaimi, Shamarina; Adam, Mohd Bakri
2013-09-01
Bayesian networks are graphical probabilistic models that represent causal and other relationships between domain variables. In the context of medical decision making, these models have been explored to help in medical diagnosis and prognosis. In this paper, we discuss the Bayesian network formalism in building medical support systems and we learn a tree augmented naive Bayes Network (TAN) from gastrointestinal bleeding data. The accuracy of the TAN in classifying the source of gastrointestinal bleeding into upper or lower source is obtained. The TAN achieves a high classification accuracy of 86% and an area under curve of 92%. A sensitivity analysis of the model shows relatively high levels of entropy reduction for color of the stool, history of gastrointestinal bleeding, consistency and the ratio of blood urea nitrogen to creatinine. The TAN facilitates the identification of the source of GIB and requires further validation.
Geostatistical three-dimensional modeling of oolite shoals, St. Louis Limestone, southwest Kansas
Qi, L.; Carr, T.R.; Goldstein, R.H.
2007-01-01
In the Hugoton embayment of southwestern Kansas, reservoirs composed of relatively thin (<4 m; <13.1 ft) oolitic deposits within the St. Louis Limestone have produced more than 300 million bbl of oil. The geometry and distribution of oolitic deposits control the heterogeneity of the reservoirs, resulting in exploration challenges and relatively low recovery. Geostatistical three-dimensional (3-D) models were constructed to quantify the geometry and spatial distribution of oolitic reservoirs, and the continuity of flow units within Big Bow and Sand Arroyo Creek fields. Lithofacies in uncored wells were predicted from digital logs using a neural network. The tilting effect from the Laramide orogeny was removed to construct restored structural surfaces at the time of deposition. Well data and structural maps were integrated to build 3-D models of oolitic reservoirs using stochastic simulations with geometry data. Three-dimensional models provide insights into the distribution, the external and internal geometry of oolitic deposits, and the sedimentologic processes that generated reservoir intervals. The structural highs and general structural trend had a significant impact on the distribution and orientation of the oolitic complexes. The depositional pattern and connectivity analysis suggest an overall aggradation of shallow-marine deposits during pulses of relative sea level rise followed by deepening near the top of the St. Louis Limestone. Cemented oolitic deposits were modeled as barriers and baffles and tend to concentrate at the edge of oolitic complexes. Spatial distribution of porous oolitic deposits controls the internal geometry of rock properties. Integrated geostatistical modeling methods can be applicable to other complex carbonate or siliciclastic reservoirs in shallow-marine settings. Copyright © 2007. The American Association of Petroleum Geologists. All rights reserved.
Enhancing multiple-point geostatistical modeling: 1. Graph theory and pattern adjustment
NASA Astrophysics Data System (ADS)
Tahmasebi, Pejman; Sahimi, Muhammad
2016-03-01
In recent years, higher-order geostatistical methods have been used for modeling of a wide variety of large-scale porous media, such as groundwater aquifers and oil reservoirs. Their popularity stems from their ability to account for qualitative data and the great flexibility that they offer for conditioning the models to hard (quantitative) data, which endow them with the capability for generating realistic realizations of porous formations with very complex channels, as well as features that are mainly a barrier to fluid flow. One group of such models consists of pattern-based methods that use a set of data points for generating stochastic realizations by which the large-scale structure and highly-connected features are reproduced accurately. The cross correlation-based simulation (CCSIM) algorithm, proposed previously by the authors, is a member of this group that has been shown to be capable of simulating multimillion cell models in a matter of a few CPU seconds. The method is, however, sensitive to pattern's specifications, such as boundaries and the number of replicates. In this paper the original CCSIM algorithm is reconsidered and two significant improvements are proposed for accurately reproducing large-scale patterns of heterogeneities in porous media. First, an effective boundary-correction method based on the graph theory is presented by which one identifies the optimal cutting path/surface for removing the patchiness and discontinuities in the realization of a porous medium. Next, a new pattern adjustment method is proposed that automatically transfers the features in a pattern to one that seamlessly matches the surrounding patterns. The original CCSIM algorithm is then combined with the two methods and is tested using various complex two- and three-dimensional examples. It should, however, be emphasized that the methods that we propose in this paper are applicable to other pattern-based geostatistical simulation methods.
Geostatistical modeling of a portion of the alluvial aquifer of Mexico City
NASA Astrophysics Data System (ADS)
Morales-Casique, E.; Medina-Ortega, P.; Escolero-Fuentes, O.; Hernandez Espriu, A.
2012-12-01
Mexico City is one of the largest cities in the world and the pressure exerted on water resources generates problems such as intensive groundwater exploitation, subsidence and groundwater pollution. Most of the main aquifer under exploitation underlies lacustrine sediments and it is composed of a highly heterogeneous mixture of alluvial deposits and volcanic rocks. Lithological records from 113 production water wells are analyzed using indicator geostatistics. The different lithological categories are grouped into four hydrofacies, where a hydrofacies is a set of lithological categories which have similar hydraulic properties. An exponential variogram model was fitted to each hydrofacies by minimizing cross validation errors. The data is then kriged to obtain the three-dimensional distribution of each hydrofacies within the alluvial aquifer of Mexico City.
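The indicator approach described in this abstract can be illustrated with a minimal Python sketch. It assumes a nugget/sill/practical-range parameterization of the exponential model and a simple binned estimator; the function names and the toy data are hypothetical, and the cross-validation fitting step and the kriging itself are not shown.

```python
import numpy as np

def exponential_variogram(h, nugget, sill, practical_range):
    """Exponential model that reaches about 95% of the sill at practical_range."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / practical_range))
    # By convention the variogram is zero at zero lag.
    return np.where(h > 0, gamma, 0.0)

def indicator(categories, target):
    """Code a categorical record as 1 where it equals the target hydrofacies, else 0."""
    return (np.asarray(categories) == target).astype(float)

def empirical_variogram(coords, values, bin_edges):
    """Average half squared differences of all sample pairs within each lag bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)           # all unordered pairs
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq = 0.5 * (values[i] - values[j]) ** 2
    which = np.digitize(dist, bin_edges) - 1            # bin index of each pair
    gamma = np.full(len(bin_edges) - 1, np.nan)         # NaN marks empty bins
    for b in range(len(bin_edges) - 1):
        mask = which == b
        if mask.any():
            gamma[b] = half_sq[mask].mean()
    return gamma

# Toy example (hypothetical data): three samples along a line, two facies.
facies = ["sand", "clay", "sand"]
coords = [[0.0], [1.0], [2.0]]
gamma_sand = empirical_variogram(coords, indicator(facies, "sand"), [0.5, 1.5, 2.5])
```

In practice one indicator variable would be built per hydrofacies, and the model parameters would be tuned against the empirical values or, as the abstract states, against cross-validation errors.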
Bayesian Model Selection for Group Studies
Stephan, Klaas Enno; Penny, Will D.; Daunizeau, Jean; Moran, Rosalyn J.; Friston, Karl J.
2009-01-01
Bayesian model selection (BMS) is a powerful method for determining the most likely among a set of competing hypotheses about the mechanisms that generated observed data. BMS has recently found widespread application in neuroimaging, particularly in the context of dynamic causal modelling (DCM). However, so far, combining BMS results from several subjects has relied on simple (fixed effects) metrics, e.g. the group Bayes factor (GBF), that do not account for group heterogeneity or outliers. In this paper, we compare the GBF with two random effects methods for BMS at the between-subject or group level. These methods provide inference on model-space using a classical and Bayesian perspective respectively. First, a classical (frequentist) approach uses the log model evidence as a subject-specific summary statistic. This enables one to use analysis of variance to test for differences in log-evidences over models, relative to inter-subject differences. We then consider the same problem in Bayesian terms and describe a novel hierarchical model, which is optimised to furnish a probability density on the models themselves. This new variational Bayes method rests on treating the model as a random variable and estimating the parameters of a Dirichlet distribution which describes the probabilities for all models considered. These probabilities then define a multinomial distribution over model space, allowing one to compute how likely it is that a specific model generated the data of a randomly chosen subject as well as the exceedance probability of one model being more likely than any other model. Using empirical and synthetic data, we show that optimising a conditional density of the model probabilities, given the log-evidences for each model over subjects, is more informative and appropriate than both the GBF and frequentist tests of the log-evidences. In particular, we found that the hierarchical Bayesian approach is considerably more robust than either of the other
Bayesian model selection analysis of WMAP3
Parkinson, David; Mukherjee, Pia; Liddle, Andrew R.
2006-06-15
We present a Bayesian model selection analysis of WMAP3 data using our code CosmoNest. We focus on the density perturbation spectral index n_S and the tensor-to-scalar ratio r, which define the plane of slow-roll inflationary models. We find that while the Bayesian evidence supports the conclusion that n_S ≠ 1, the data are not yet powerful enough to do so at a strong or decisive level. If tensors are assumed absent, the current odds are approximately 8 to 1 in favor of n_S ≠ 1 under our assumptions, when WMAP3 data is used together with external data sets. WMAP3 data on its own is unable to distinguish between the two models. Further, inclusion of r as a parameter weakens the conclusion against the Harrison-Zel'dovich case (n_S = 1, r = 0), albeit in a prior-dependent way. In appendices we describe the CosmoNest code in detail, noting its ability to supply posterior samples as well as to accurately compute the Bayesian evidence. We make a first public release of CosmoNest, now available at www.cosmonest.org.
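To make the quoted odds concrete, the following Python sketch converts a difference in log-evidence between two models into a posterior model probability. It is only an illustration of the arithmetic: the function name is hypothetical, equal prior odds are an assumption, and nothing here reproduces the CosmoNest code.

```python
import math

def posterior_probability(ln_evidence_1, ln_evidence_0, prior_odds=1.0):
    """Posterior probability of model 1 versus model 0 from their log-evidences."""
    # Posterior odds = Bayes factor * prior odds, handled in log space.
    ln_odds = (ln_evidence_1 - ln_evidence_0) + math.log(prior_odds)
    return 1.0 / (1.0 + math.exp(-ln_odds))

# Odds of 8 to 1 correspond to a log-evidence difference of ln(8).
ln_bayes_factor = math.log(8.0)
p_varying_index = posterior_probability(ln_bayes_factor, 0.0)
```

With equal prior odds, 8 to 1 corresponds to a log-evidence difference of about 2.08 and a posterior probability of 8/9 for the favored model.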
Forward modeling of gravity data using geostatistically generated subsurface density variations
Phelps, Geoffrey
2016-01-01
Using geostatistical models of density variations in the subsurface, constrained by geologic data, forward models of gravity anomalies can be generated by discretizing the subsurface and calculating the cumulative effect of each cell (pixel). The results of such stochastically generated forward gravity anomalies can be compared with the observed gravity anomalies to find density models that match the observed data. These models have an advantage over forward gravity anomalies generated using polygonal bodies of homogeneous density because generating numerous realizations explores a larger region of the solution space. The stochastic modeling can be thought of as dividing the forward model into two components: that due to the shape of each geologic unit and that due to the heterogeneous distribution of density within each geologic unit. The modeling demonstrates that the internally heterogeneous distribution of density within each geologic unit can contribute significantly to the resulting calculated forward gravity anomaly. Furthermore, the stochastic models match observed statistical properties of geologic units, the solution space is more broadly explored by producing a suite of successful models, and the likelihood of a particular conceptual geologic model can be compared. The Vaca Fault near Travis Air Force Base, California, can be successfully modeled as a normal or strike-slip fault, with the normal fault model being slightly more probable. It can also be modeled as a reverse fault, although this structural geologic configuration is highly unlikely given the realizations we explored.
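The discretize-and-sum idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: each cell is approximated as a point mass on a 2D profile, the function name `forward_gravity` and the grid dimensions are hypothetical, and the density field is drawn as independent Gaussian noise where a real application would use a spatially correlated geostatistical realization.

```python
import numpy as np

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def forward_gravity(stations_x, cell_x, cell_z, density, cell_volume):
    """Vertical gravity (mGal) at surface stations from cells treated as point masses.

    stations_x : station positions along the profile (m), at depth z = 0
    cell_x, cell_z : cell-centre positions and depths (m, depth positive down)
    density : density contrast of each cell (kg/m^3)
    cell_volume : volume of one cell (m^3)
    """
    dx = cell_x[None, :] - stations_x[:, None]      # horizontal offsets
    dz = cell_z[None, :]                            # vertical offsets
    r3 = (dx**2 + dz**2) ** 1.5
    gz = G * (density * cell_volume)[None, :] * dz / r3
    return gz.sum(axis=1) * 1e5                     # m/s^2 -> mGal

# Hypothetical grid of 50 m cells between 50 m and 300 m depth.
gx, gz_grid = np.meshgrid(np.arange(-200.0, 201.0, 50.0), np.arange(50.0, 301.0, 50.0))
cell_x, cell_z = gx.ravel(), gz_grid.ravel()

# One stochastic realization of the density contrast (uncorrelated stand-in).
rng = np.random.default_rng(1)
density = rng.normal(300.0, 50.0, size=cell_x.size)

stations = np.linspace(-500.0, 500.0, 11)
anomaly = forward_gravity(stations, cell_x, cell_z, density, cell_volume=50.0**3)
print(anomaly)
```

Repeating the last three steps with many density realizations yields a suite of forward anomalies that can be compared against the observed one.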
NASA Astrophysics Data System (ADS)
Goovaerts, P.; Avruskin, G.; Meliker, J.; Slotnick, M.; Jacquez, G.; Nriagu, J.
2003-12-01
Assessment of the health risks associated with exposure to elevated levels of arsenic in drinking water has become the subject of considerable interest and some controversy in both regulatory and public health communities. The objective of this research is to explore the factors that have contributed to the observed geographic co-clustering in bladder cancer mortality and arsenic concentrations in drinking water in Michigan. A cornerstone is the building of a probabilistic space-time model of arsenic concentrations, accounting for information collected at private residential wells and the hydrogeochemistry of the area. Because of the small changes in concentration observed in time, the study has focused on the spatial variability of arsenic, which can be considerable over very short distances. Various geostatistical techniques, based either on lognormal or indicator transforms of the data to accommodate the highly skewed distribution, have been compared using a cross validation procedure. The most promising approach involves a soft indicator coding of arsenic measurements, which allows one to account for data below the detection limit and the magnitude of measurement errors. Prior probabilities of exceeding various arsenic thresholds are also derived from secondary information, such as type of bedrock and surficial material, and well casing depth, using logistic regression. Both well and secondary data are combined using kriging, leading to a non-parametric assessment of the uncertainty attached to arsenic concentration at each node of a 500 m grid. This geostatistical model can be used to map either the expected arsenic concentration, the probability that it exceeds any given threshold, or the variance of the prediction indicating where supplementary information should be collected. The accuracy and precision of these local probability distributions are assessed using cross validation.
Bayesian structural equation modeling in sport and exercise psychology.
Stenling, Andreas; Ivarsson, Andreas; Johnson, Urban; Lindwall, Magnus
2015-08-01
Bayesian statistics is on the rise in mainstream psychology, but applications in sport and exercise psychology research are scarce. In this article, the foundations of Bayesian analysis are introduced, and we will illustrate how to apply Bayesian structural equation modeling in a sport and exercise psychology setting. More specifically, we contrasted a confirmatory factor analysis on the Sport Motivation Scale II estimated with the most commonly used estimator, maximum likelihood, and a Bayesian approach with weakly informative priors for cross-loadings and correlated residuals. The results indicated that the model with Bayesian estimation and weakly informative priors provided a good fit to the data, whereas the model estimated with a maximum likelihood estimator did not produce a well-fitting model. The reasons for this discrepancy between maximum likelihood and Bayesian estimation are discussed as well as potential advantages and caveats with the Bayesian approach.
Bayesian Nonparametric Models for Multiway Data Analysis.
Xu, Zenglin; Yan, Feng; Qi, Yuan
2015-02-01
Tensor decomposition is a powerful computational tool for multiway data analysis. Many popular tensor decomposition approaches-such as the Tucker decomposition and CANDECOMP/PARAFAC (CP)-amount to multi-linear factorization. They are insufficient to model (i) complex interactions between data entities, (ii) various data types (e.g., missing data and binary data), and (iii) noisy observations and outliers. To address these issues, we propose tensor-variate latent nonparametric Bayesian models for multiway data analysis. We name these models InfTucker. These new models essentially conduct Tucker decomposition in an infinite feature space. Unlike classical tensor decomposition models, our new approaches handle both continuous and binary data in a probabilistic framework. Unlike previous Bayesian models on matrices and tensors, our models are based on latent Gaussian or t processes with nonlinear covariance functions. Moreover, on network data, our models reduce to nonparametric stochastic blockmodels and can be used to discover latent groups and predict missing interactions. To learn the models efficiently from data, we develop a variational inference technique and explore properties of the Kronecker product for computational efficiency. Compared with a classical variational implementation, this technique reduces both time and space complexities by several orders of magnitude. On real multiway and network data, our new models achieved significantly higher prediction accuracy than state-of-art tensor decomposition methods and blockmodels.
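The computational benefit of Kronecker-product structure mentioned in this abstract can be illustrated with a standard identity. The sketch below is a generic demonstration, not the InfTucker inference algorithm; the matrix sizes are arbitrary. With row-major flattening, multiplying by a Kronecker product is equivalent to two small matrix products, so the large matrix never needs to be formed.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(5, 2))
X = rng.normal(size=(3, 2))

# Naive route: build the (4*5) x (3*2) Kronecker product explicitly.
slow = np.kron(A, B) @ X.ravel()

# Structured route: (A kron B) vec(X) = vec(A X B^T) for row-major vec.
fast = (A @ X @ B.T).ravel()

print(np.allclose(slow, fast))
```

For factor matrices of size n x n, the explicit product costs on the order of n^4 memory and time per multiplication, while the structured route costs on the order of n^3 time and n^2 memory.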
Bayesian variable selection for latent class models.
Ghosh, Joyee; Herring, Amy H; Siega-Riz, Anna Maria
2011-09-01
In this article, we develop a latent class model with class probabilities that depend on subject-specific covariates. One of our major goals is to identify important predictors of latent classes. We consider methodology that allows estimation of latent classes while allowing for variable selection uncertainty. We propose a Bayesian variable selection approach and implement a stochastic search Gibbs sampler for posterior computation to obtain model-averaged estimates of quantities of interest such as marginal inclusion probabilities of predictors. Our methods are illustrated through simulation studies and application to data on weight gain during pregnancy, where it is of interest to identify important predictors of latent weight gain classes.
A Bayesian Shrinkage Approach for AMMI Models.
da Silva, Carlos Pereira; de Oliveira, Luciano Antonio; Nuvunga, Joel Jorge; Pamplona, Andrezza Kéllen Alves; Balestre, Marcio
2015-01-01
Linear-bilinear models, especially the additive main effects and multiplicative interaction (AMMI) model, are widely applicable to genotype-by-environment interaction (GEI) studies in plant breeding programs. These models allow a parsimonious modeling of GE interactions, retaining a small number of principal components in the analysis. However, one aspect of the AMMI model that is still debated is the selection criteria for determining the number of multiplicative terms required to describe the GE interaction pattern. Shrinkage estimators have been proposed as selection criteria for the GE interaction components. In this study, a Bayesian approach was combined with the AMMI model with shrinkage estimators for the principal components. A total of 55 maize genotypes were evaluated in nine different environments using a complete blocks design with three replicates. The results show that the traditional Bayesian AMMI model produces low shrinkage of singular values but avoids the usual pitfalls in determining the credible intervals in the biplot. On the other hand, Bayesian shrinkage AMMI models have difficulty with the credible interval for model parameters, but produce stronger shrinkage of the principal components, converging to GE matrices that have more shrinkage than those obtained using mixed models. This characteristic allowed more parsimonious models to be chosen, and resulted in models being selected that were similar to those obtained by the Cornelius F-test (α = 0.05) in traditional AMMI models and cross validation based on leave-one-out. This characteristic allowed more parsimonious models to be chosen and more GEI pattern retained on the first two components. The resulting model chosen by posterior distribution of singular value was also similar to those produced by the cross-validation approach in traditional AMMI models. Our method enables the estimation of credible interval for AMMI biplot plus the choice of AMMI model based on direct posterior
Model Comparison of Bayesian Semiparametric and Parametric Structural Equation Models
ERIC Educational Resources Information Center
Song, Xin-Yuan; Xia, Ye-Mao; Pan, Jun-Hao; Lee, Sik-Yum
2011-01-01
Structural equation models have wide applications. One of the most important issues in analyzing structural equation models is model comparison. This article proposes a Bayesian model comparison statistic, namely the "L[subscript nu]"-measure for both semiparametric and parametric structural equation models. For illustration purposes, we consider…
Schur, Nadine; Hürlimann, Eveline; Garba, Amadou; Traoré, Mamadou S.; Ndir, Omar; Ratard, Raoult C.; Tchuem Tchuenté, Louis-Albert; Kristensen, Thomas K.; Utzinger, Jürg; Vounatsou, Penelope
2011-01-01
Background Schistosomiasis is a water-based disease that is believed to affect over 200 million people with an estimated 97% of the infections concentrated in Africa. However, these statistics are largely based on population re-adjusted data originally published by Utroska and colleagues more than 20 years ago. Hence, these estimates are outdated due to large-scale preventive chemotherapy programs, improved sanitation, water resources development and management, among other reasons. For planning, coordination, and evaluation of control activities, it is essential to possess reliable schistosomiasis prevalence maps. Methodology We analyzed survey data compiled on a newly established open-access global neglected tropical diseases database (i) to create smooth empirical prevalence maps for Schistosoma mansoni and S. haematobium for individuals aged ≤20 years in West Africa, including Cameroon, and (ii) to derive country-specific prevalence estimates. We used Bayesian geostatistical models based on environmental predictors to take into account potential clustering due to common spatially structured exposures. Prediction at unobserved locations was facilitated by joint kriging. Principal Findings Our models revealed that 50.8 million individuals aged ≤20 years in West Africa are infected with either S. mansoni, or S. haematobium, or both species concurrently. The country prevalence estimates ranged between 0.5% (The Gambia) and 37.1% (Liberia) for S. mansoni, and between 17.6% (The Gambia) and 51.6% (Sierra Leone) for S. haematobium. We observed that the combined prevalence for both schistosome species is two-fold lower in Gambia than previously reported, while we found an almost two-fold higher estimate for Liberia (58.3%) than reported before (30.0%). Our predictions are likely to overestimate overall country prevalence, since modeling was based on children and adolescents up to the age of 20 years who are at highest risk of infection. Conclusion/Significance We
NASA Astrophysics Data System (ADS)
Blessent, Daniela; Therrien, René; Lemieux, Jean-Michel
2011-12-01
This paper presents numerical simulations of a series of hydraulic interference tests conducted in crystalline bedrock at Olkiluoto (Finland), a potential site for the disposal of the Finnish high-level nuclear waste. The tests are in a block of crystalline bedrock of about 0.03 km³ that contains low-transmissivity fractures. Fracture density, orientation, and fracture transmissivity are estimated from Posiva Flow Log (PFL) measurements in boreholes drilled in the rock block. On the basis of those data, a geostatistical approach relying on a transitional probability and Markov chain models is used to define a conceptual model based on stochastic fractured rock facies. Four facies are defined, from sparsely fractured bedrock to highly fractured bedrock. Using this conceptual model, three-dimensional groundwater flow is then simulated to reproduce interference pumping tests in either open or packed-off boreholes. Hydraulic conductivities of the fracture facies are estimated through automatic calibration using either hydraulic heads or both hydraulic heads and PFL flow rates as targets for calibration. The latter option produces a narrower confidence interval for the calibrated hydraulic conductivities, therefore reducing the associated uncertainty and demonstrating the usefulness of the measured PFL flow rates. Furthermore, the stochastic facies conceptual model is a suitable alternative to discrete fracture network models to simulate fluid flow in fractured geological media.
Model feedback in Bayesian propensity score estimation.
Zigler, Corwin M; Watts, Krista; Yeh, Robert W; Wang, Yun; Coull, Brent A; Dominici, Francesca
2013-03-01
Methods based on the propensity score comprise one set of valuable tools for comparative effectiveness research and for estimating causal effects more generally. These methods typically consist of two distinct stages: (1) a propensity score stage where a model is fit to predict the propensity to receive treatment (the propensity score), and (2) an outcome stage where responses are compared in treated and untreated units having similar values of the estimated propensity score. Traditional techniques conduct estimation in these two stages separately; estimates from the first stage are treated as fixed and known for use in the second stage. Bayesian methods have natural appeal in these settings because separate likelihoods for the two stages can be combined into a single joint likelihood, with estimation of the two stages carried out simultaneously. One key feature of joint estimation in this context is "feedback" between the outcome stage and the propensity score stage, meaning that quantities in a model for the outcome contribute information to posterior distributions of quantities in the model for the propensity score. We provide a rigorous assessment of Bayesian propensity score estimation to show that model feedback can produce poor estimates of causal effects absent strategies that augment propensity score adjustment with adjustment for individual covariates. We illustrate this phenomenon with a simulation study and with a comparative effectiveness investigation of carotid artery stenting versus carotid endarterectomy among 123,286 Medicare beneficiaries hospitalized for stroke in 2006 and 2007.
BAYESIAN MODEL DETERMINATION FOR GEOSTATISTICAL REGRESSION MODELS. (R829095C001)
The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...
Howes, Rosalind E.; Piel, Frédéric B.; Patil, Anand P.; Nyangiri, Oscar A.; Gething, Peter W.; Dewi, Mewahyu; Hogg, Mariana M.; Battle, Katherine E.; Padilla, Carmencita D.; Baird, J. Kevin; Hay, Simon I.
2012-01-01
Background Primaquine is a key drug for malaria elimination. In addition to being the only drug active against the dormant relapsing forms of Plasmodium vivax, primaquine is the sole effective treatment of infectious P. falciparum gametocytes, and may interrupt transmission and help contain the spread of artemisinin resistance. However, primaquine can trigger haemolysis in patients with a deficiency in glucose-6-phosphate dehydrogenase (G6PDd). Poor information is available about the distribution of individuals at risk of primaquine-induced haemolysis. We present a continuous evidence-based prevalence map of G6PDd and estimates of affected populations, together with a national index of relative haemolytic risk. Methods and Findings Representative community surveys of phenotypic G6PDd prevalence were identified for 1,734 spatially unique sites. These surveys formed the evidence-base for a Bayesian geostatistical model adapted to the gene's X-linked inheritance, which predicted a G6PDd allele frequency map across malaria endemic countries (MECs) and generated population-weighted estimates of affected populations. Highest median prevalence (peaking at 32.5%) was predicted across sub-Saharan Africa and the Arabian Peninsula. Although G6PDd prevalence was generally lower across central and southeast Asia, rarely exceeding 20%, the majority of G6PDd individuals (67.5% median estimate) were from Asian countries. We estimated a G6PDd allele frequency of 8.0% (interquartile range: 7.4–8.8) across MECs, and 5.3% (4.4–6.7) within malaria-eliminating countries. The reliability of the map is contingent on the underlying data informing the model; population heterogeneity can only be represented by the available surveys, and important weaknesses exist in the map across data-sparse regions. Uncertainty metrics are used to quantify some aspects of these limitations in the map. Finally, we assembled a database of G6PDd variant occurrences to inform a national-level index of
Experience With Bayesian Image Based Surface Modeling
NASA Technical Reports Server (NTRS)
Stutz, John C.
2005-01-01
Bayesian surface modeling from images requires modeling both the surface and the image generation process, in order to optimize the models by comparing actual and generated images. Thus it differs greatly, both conceptually and in computational difficulty, from conventional stereo surface recovery techniques. But it offers the possibility of using any number of images, taken under quite different conditions, and by different instruments that provide independent and often complementary information, to generate a single surface model that fuses all available information. I describe an implemented system, with a brief introduction to the underlying mathematical models and the compromises made for computational efficiency. I describe successes and failures achieved on actual imagery, where we went wrong and what we did right, and how our approach could be improved. Lastly I discuss how the same approach can be extended to distinct types of instruments, to achieve true sensor fusion.
A Hierarchical Bayesian Model for Crowd Emotions
Urizar, Oscar J.; Baig, Mirza S.; Barakova, Emilia I.; Regazzoni, Carlo S.; Marcenaro, Lucio; Rauterberg, Matthias
2016-01-01
Estimation of emotions is an essential aspect in developing intelligent systems intended for crowded environments. However, emotion estimation in crowds remains a challenging problem due to the complexity in which human emotions are manifested and the capability of a system to perceive them in such conditions. This paper proposes a hierarchical Bayesian model to learn, in an unsupervised manner, the behavior of individuals and of the crowd as a single entity, and to explore the relation between behavior and emotions to infer emotional states. Information about the motion patterns of individuals is described using a self-organizing map, and a hierarchical Bayesian network builds probabilistic models to identify behaviors and infer the emotional state of individuals and the crowd. This model is trained and tested using data produced from simulated scenarios that resemble real-life environments. The conducted experiments tested the efficiency of our method to learn, detect and associate behaviors with emotional states, yielding accuracy levels of 74% for individuals and 81% for the crowd, similar in performance to existing methods for pedestrian behavior detection but with novel concepts regarding the analysis of crowds. PMID:27458366
Hopes and Cautions in Implementing Bayesian Structural Equation Modeling
ERIC Educational Resources Information Center
MacCallum, Robert C.; Edwards, Michael C.; Cai, Li
2012-01-01
Muthen and Asparouhov (2012) have proposed and demonstrated an approach to model specification and estimation in structural equation modeling (SEM) using Bayesian methods. Their contribution builds on previous work in this area by (a) focusing on the translation of conventional SEM models into a Bayesian framework wherein parameters fixed at zero…
Jones, Matt; Love, Bradley C
2011-08-01
The prominence of Bayesian modeling of cognition has increased recently largely because of mathematical advances in specifying and deriving predictions from complex probabilistic models. Much of this research aims to demonstrate that cognitive behavior can be explained from rational principles alone, without recourse to psychological or neurological processes and representations. We note commonalities between this rational approach and other movements in psychology - namely, Behaviorism and evolutionary psychology - that set aside mechanistic explanations or make use of optimality assumptions. Through these comparisons, we identify a number of challenges that limit the rational program's potential contribution to psychological theory. Specifically, rational Bayesian models are significantly unconstrained, both because they are uninformed by a wide range of process-level data and because their assumptions about the environment are generally not grounded in empirical measurement. The psychological implications of most Bayesian models are also unclear. Bayesian inference itself is conceptually trivial, but strong assumptions are often embedded in the hypothesis sets and the approximation algorithms used to derive model predictions, without a clear delineation between psychological commitments and implementational details. Comparing multiple Bayesian models of the same task is rare, as is the realization that many Bayesian models recapitulate existing (mechanistic level) theories. Despite the expressive power of current Bayesian models, we argue they must be developed in conjunction with mechanistic considerations to offer substantive explanations of cognition. We lay out several means for such an integration, which take into account the representations on which Bayesian inference operates, as well as the algorithms and heuristics that carry it out. We argue this unification will better facilitate lasting contributions to psychological theory, avoiding the pitfalls
Warnery, E; Ielsch, G; Lajaunie, C; Cale, E; Wackernagel, H; Debayle, C; Guillevic, J
2015-01-01
information, which is exhaustive throughout France, could help in estimating the telluric gamma dose rates. Such an approach is possible using multivariate geostatistics and cokriging. Multi-collocated cokriging has been performed on 1 × 1 km² cells over the domain. This model used gamma dose rate measurement results and GUP classes. Our results provide useful information on the variability of the natural terrestrial gamma radiation in France ('natural background') and exposure data for epidemiological studies and risk assessment from low dose chronic exposures.
Bayesian Inference for Nonnegative Matrix Factorisation Models
Cemgil, Ali Taylan
2009-01-01
We describe nonnegative matrix factorisation (NMF) with a Kullback-Leibler (KL) error measure in a statistical framework, with a hierarchical generative model consisting of an observation and a prior component. Omitting the prior leads to the standard KL-NMF algorithms as special cases, where maximum likelihood parameter estimation is carried out via the Expectation-Maximisation (EM) algorithm. Starting from this view, we develop full Bayesian inference via variational Bayes or Monte Carlo. Our construction retains conjugacy and enables us to develop more powerful models while retaining attractive features of standard NMF such as monotonic convergence and easy implementation. We illustrate our approach on model order selection and image reconstruction. PMID:19536273
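The standard KL-NMF algorithm that this abstract takes as its starting point is commonly written as a pair of multiplicative updates. The sketch below is a minimal version of those widely used updates, assuming a generalized KL objective; the function names, rank, iteration count and synthetic data are illustrative choices, and the code does not implement the paper's variational Bayes or Monte Carlo extensions.

```python
import numpy as np

def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH), summed over all entries."""
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)

def kl_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorise a nonnegative matrix V ~ W @ H with multiplicative KL updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1   # strictly positive initial templates
    H = rng.random((rank, m)) + 0.1   # strictly positive initial excitations
    history = []
    for _ in range(n_iter):
        # Update H using the current ratio of data to reconstruction.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update W using the reconstruction from the refreshed H.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(kl_divergence(V, W @ H))
    return W, H, history

# Synthetic nonnegative count data, used here purely for illustration.
rng = np.random.default_rng(42)
V = rng.poisson(5.0, size=(20, 15)).astype(float)
W, H, history = kl_nmf(V, rank=3)
print(history[0], history[-1])
```

The recorded divergence should not increase from one iteration to the next, which is the monotonic convergence property the abstract refers to; the factors also stay nonnegative because every update multiplies by a nonnegative ratio.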
Effect on Prediction when Modeling Covariates in Bayesian Nonparametric Models.
Cruz-Marcelo, Alejandro; Rosner, Gary L; Müller, Peter; Stewart, Clinton F
2013-04-01
In biomedical research, it is often of interest to characterize biologic processes giving rise to observations and to make predictions of future observations. Bayesian nonparametric methods provide a means for carrying out Bayesian inference making as few assumptions about restrictive parametric models as possible. There are several proposals in the literature for extending Bayesian nonparametric models to include dependence on covariates. Limited attention, however, has been directed to the following two aspects. In this article, we examine the effect on fitting and predictive performance of incorporating covariates in a class of Bayesian nonparametric models by one of two primary ways: either in the weights or in the locations of a discrete random probability measure. We show that different strategies for incorporating continuous covariates in Bayesian nonparametric models can result in big differences when used for prediction, even though they lead to otherwise similar posterior inferences. When one needs the predictive density, as in optimal design, and this density is a mixture, it is better to make the weights depend on the covariates. We demonstrate these points via a simulated data example and in an application in which one wants to determine the optimal dose of an anticancer drug used in pediatric oncology.
Merging Digital Surface Models Implementing Bayesian Approaches
NASA Astrophysics Data System (ADS)
Sadeq, H.; Drummond, J.; Li, Z.
2016-06-01
In this research, digital surface models (DSMs) from different sources have been merged. The merging is based on a probabilistic model using a Bayesian approach. The implemented data have been sourced from very high resolution satellite imagery sensors (e.g. WorldView-1 and Pleiades). A Bayesian approach is deemed preferable when the data obtained from the sensors are limited and obtaining many measurements would be difficult or very costly; the lack of data can then be addressed by introducing a priori estimates. To infer the prior data, it is assumed that the roofs of the buildings are smooth, and for that purpose local entropy has been implemented. In addition to the a priori estimates, GNSS RTK measurements have been collected in the field and used as check points to assess the quality of the DSMs and to validate the merging result. The model has been applied in the West End of Glasgow, which contains different kinds of buildings, such as flat-roofed and hipped-roofed buildings. Both quantitative and qualitative methods have been employed to validate the merged DSM. The validation results have shown that the model was able to improve the quality of the DSMs and to improve some characteristics such as the roof surfaces, which consequently led to better representations. In addition, the developed model has been compared with the well-established Maximum Likelihood model and showed similar quantitative statistical results and better qualitative results. Although the proposed model has been applied to DSMs derived from satellite imagery, it can be applied to DSMs from any other source.
The applications of model-based geostatistics in helminth epidemiology and control.
Magalhães, Ricardo J Soares; Clements, Archie C A; Patil, Anand P; Gething, Peter W; Brooker, Simon
2011-01-01
Funding agencies are dedicating substantial resources to tackle helminth infections. Reliable maps of the distribution of helminth infection can assist these efforts by targeting control resources to areas of greatest need. The ability to define the distribution of infection at regional, national and subnational levels has been enhanced greatly by the increased availability of good quality survey data and the use of model-based geostatistics (MBG), enabling spatial prediction in unsampled locations. A major advantage of MBG risk mapping approaches is that they provide a flexible statistical platform for handling and representing different sources of uncertainty, providing plausible and robust information on the spatial distribution of infections to inform the design and implementation of control programmes. Focussing on schistosomiasis and soil-transmitted helminthiasis, with additional examples for lymphatic filariasis and onchocerciasis, we review the progress made to date with the application of MBG tools in large-scale, real-world control programmes and propose a general framework for their application to inform integrative spatial planning of helminth disease control programmes.
A Bayesian Analysis of Finite Mixtures in the LISREL Model.
ERIC Educational Resources Information Center
Zhu, Hong-Tu; Lee, Sik-Yum
2001-01-01
Proposes a Bayesian framework for estimating finite mixtures of the LISREL model. The model augments the observed data of the manifest variables with the latent variables and allocation variables and uses the Gibbs sampler to obtain the Bayesian solution. Discusses other associated statistical inferences. (SLD)
Bayesian model of Snellen visual acuity.
Nestares, Oscar; Navarro, Rafael; Antona, Beatriz
2003-07-01
A Bayesian model of Snellen visual acuity (VA) has been developed that, as far as we know, is the first one that includes the three main stages of VA: (1) optical degradations, (2) neural image representation and contrast thresholding, and (3) character recognition. The retinal image of a Snellen test chart is obtained from experimental wave-aberration data. Then a subband image decomposition with a set of visual channels tuned to different spatial frequencies and orientations is applied to the retinal image, as in standard computational models of early cortical image representation. A neural threshold is applied to the contrast responses to include the effect of the neural contrast sensitivity. The resulting image representation is the base of a Bayesian pattern-recognition method robust to the presence of optical aberrations. The model is applied to images containing sets of letter optotypes at different scales, and the number of correct answers is obtained at each scale; the final output is the decimal Snellen VA. The model has no free parameters to adjust. The main input data are the eye's optical aberrations, and standard values are used for all other parameters, including the Stiles-Crawford effect, visual channels, and neural contrast threshold, when no subject specific values are available. When aberrations are large, Snellen VA involving pattern recognition differs from grating acuity, which is based on a simpler detection (or orientation-discrimination) task and hence is basically unaffected by phase distortions introduced by the optical transfer function. A preliminary test of the model in one subject produced close agreement between actual measurements and predicted VA values. Two examples are also included: (1) application of the method to the prediction of the VA in refractive-surgery patients and (2) simulation of the VA attainable by correcting ocular aberrations.
NASA Astrophysics Data System (ADS)
You, Jiong; Pei, Zhiyuan
2015-01-01
With the development of remote sensing technology, its applications in agriculture monitoring systems, crop mapping accuracy, and spatial distribution are increasingly being explored by administrators and users. Uncertainty in crop mapping is profoundly affected by the spatial pattern of spectral reflectance values obtained from the applied remote sensing data. Errors in remotely sensed crop cover information and their propagation in derivative products need to be quantified and handled correctly. Therefore, this study discusses the methods of error modeling for uncertainty characterization in crop mapping using GF-1 multispectral imagery. An error modeling framework based on geostatistics is proposed, which introduces the sequential Gaussian simulation algorithm to explore the relationship between classification errors and the spectral signature from the remote sensing data source. On this basis, a misclassification probability model is developed to produce a spatially explicit classification error probability surface for the map of a crop, which realizes the uncertainty characterization for crop mapping. In this process, trend surface analysis was carried out to generate a spatially varying mean response and the corresponding residual response with spatial variation for the spectral bands of GF-1 multispectral imagery. Variogram models were employed to measure the spatial dependence in the spectral bands and the derived misclassification probability surfaces. Simulated spectral data and classification results were quantitatively analyzed. Through experiments using data sets from a region in the low rolling country located at the Yangtze River valley, it was found that GF-1 multispectral imagery can be used for crop mapping with a good overall performance, the proposed error modeling framework can be used to quantify the uncertainty in crop mapping, and the misclassification probability model can summarize the spatial variation in map accuracy and is helpful for
NASA Astrophysics Data System (ADS)
Lei, Qinghua; Latham, John-Paul; Tsang, Chin-Fu; Xiang, Jiansheng; Lang, Philipp
2015-07-01
A new approach to upscaling two-dimensional fracture network models is proposed for preserving geostatistical and geomechanical characteristics of a smaller-scale "source" fracture pattern. First, the scaling properties of an outcrop system are examined in terms of spatial organization, lengths, connectivity, and normal/shear displacements using fractal geometry and power law relations. The fracture pattern is observed to be nonfractal with the fractal dimension D ≈ 2, while its length distribution tends to follow a power law with the exponent 2 < a < 3. To introduce a realistic distribution of fracture aperture and shear displacement, a geomechanical model using the combined finite-discrete element method captures the response of a fractured rock sample with a domain size L = 2 m under in situ stresses. Next, a novel scheme accommodating discrete-time random walks in recursive self-referencing lattices is developed to nucleate and propagate fractures together with their stress- and scale-dependent attributes into larger domains of up to 54 m × 54 m. The advantages of this approach include preserving the nonplanarity of natural cracks, capturing the existence of long fractures, retaining the realism of variable apertures, and respecting the stress dependency of displacement-length correlations. Hydraulic behavior of multiscale growth realizations is modeled by single-phase flow simulation, where distinct permeability scaling trends are observed for different geomechanical scenarios. A transition zone is identified where flow structure shifts from extremely channeled to distributed as the network scale increases. The results of this paper have implications for upscaling network characteristics for reservoir simulation.
NASA Astrophysics Data System (ADS)
Wingle, William L.; Poeter, Eileen P.; McKenna, Sean A.
1999-05-01
UNCERT is a 2D and 3D geostatistics, uncertainty analysis and visualization software package applied to ground water flow and contaminant transport modeling. It is a collection of modules that provides tools for linear regression, univariate statistics, semivariogram analysis, inverse-distance gridding, trend-surface analysis, simple and ordinary kriging and discrete conditional indicator simulation. Graphical user interfaces for MODFLOW and MT3D, ground water flow and contaminant transport models, are provided for streamlined data input and result analysis. Visualization tools are included for displaying data input and output. These include, but are not limited to, 2D and 3D scatter plots, histograms, box and whisker plots, 2D contour maps, surface renderings of 2D gridded data and 3D views of gridded data. By design, UNCERT's graphical user interface and visualization tools facilitate model design and analysis. There are few built in restrictions on data set sizes and each module (with two exceptions) can be run in either graphical or batch mode. UNCERT is in the public domain and is available from the World Wide Web with complete on-line and printable (PDF) documentation. UNCERT is written in ANSI-C with a small amount of FORTRAN77, for UNIX workstations running X-Windows and Motif (or Lesstif). This article discusses the features of each module and demonstrates how they can be used individually and in combination. The tools are applicable to a wide range of fields and are currently used by researchers in the ground water, mining, mathematics, chemistry and geophysics, to name a few disciplines.
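Of the modules listed above, inverse-distance gridding is the simplest to illustrate. The following is a minimal Python sketch of the general technique, not UNCERT's own implementation (which is written in ANSI-C); the function name `idw` and the default exponent of 2 are assumptions made for illustration.

```python
import math

def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from scattered 2D data.

    `points` is a list of (x, y) tuples and `values` the measurements there.
    The name and the default power are illustrative assumptions.
    """
    num = 0.0
    den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            # The target coincides with a data point: return its value exactly.
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

# Two wells with measured heads; estimate halfway between them.
wells = [(0.0, 0.0), (2.0, 0.0)]
heads = [1.0, 3.0]
midpoint_estimate = idw(wells, heads, (1.0, 0.0))
print(midpoint_estimate)
```

Gridding then amounts to calling `idw` at every node of a regular grid. Unlike kriging, this estimator takes no account of the semivariogram and provides no estimation variance.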
Bayesian model selection for LISA pathfinder
NASA Astrophysics Data System (ADS)
Karnesis, Nikolaos; Nofrarias, Miquel; Sopuerta, Carlos F.; Gibert, Ferran; Armano, Michele; Audley, Heather; Congedo, Giuseppe; Diepholz, Ingo; Ferraioli, Luigi; Hewitson, Martin; Hueller, Mauro; Korsakova, Natalia; McNamara, Paul W.; Plagnol, Eric; Vitale, Stefano
2014-03-01
The main goal of the LISA Pathfinder (LPF) mission is to fully characterize the acceleration noise models and to test key technologies for future space-based gravitational-wave observatories similar to the eLISA concept. The data analysis team has developed complex three-dimensional models of the LISA Technology Package (LTP) experiment onboard the LPF. These models are used for simulations, but, more importantly, they will be used for parameter estimation purposes during flight operations. One of the tasks of the data analysis team is to identify the physical effects that contribute significantly to the properties of the instrument noise. A way of approaching this problem is to recover the essential parameters of a LTP model fitting the data. Thus, we want to define the simplest model that efficiently explains the observations. To do so, adopting a Bayesian framework, one has to estimate the so-called Bayes factor between two competing models. In our analysis, we use three main different methods to estimate it: the reversible jump Markov chain Monte Carlo method, the Schwarz criterion, and the Laplace approximation. They are applied to simulated LPF experiments in which the most probable LTP model that explains the observations is recovered. The same type of analysis presented in this paper is expected to be followed during flight operations. Moreover, the correlation of the output of the aforementioned methods with the design of the experiment is explored.
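Of the three estimators named above, the Schwarz criterion is the easiest to reproduce. The sketch below is a toy illustration on hypothetical data, not the LPF analysis: it compares a constant model with a straight-line model under Gaussian errors, and all function names and the synthetic data are assumptions.

```python
import math

def fit_constant(y):
    """Least-squares fit of y = a; returns the residual sum of squares."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns the residual sum of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))

def bic(rss, n, k):
    """Schwarz criterion for Gaussian errors with unknown variance.

    Additive constants common to all models are dropped; k counts the
    free parameters including the noise variance.
    """
    return n * math.log(rss / n) + k * math.log(n)

def log_bayes_factor_schwarz(bic_a, bic_b):
    """Approximate ln B_ab, the log evidence ratio of model a over model b."""
    return -0.5 * (bic_a - bic_b)

# Hypothetical data: a linear trend plus a small deterministic perturbation.
x = [float(i) for i in range(20)]
y = [2.0 + 0.5 * xi + 0.3 * math.sin(3.0 * xi) for xi in x]
n = len(x)
rss_const = fit_constant(y)
rss_line = fit_line(x, y)
ln_bf = log_bayes_factor_schwarz(bic(rss_line, n, 3), bic(rss_const, n, 2))
print(f"approximate ln Bayes factor (line vs constant): {ln_bf:.1f}")
```

A positive value favours the line despite its extra parameter. The approximation is asymptotic in the number of observations and ignores the prior, which is one reason to cross-check it against other estimators such as the Laplace approximation or reversible jump MCMC.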
A multivariate Bayesian model for embryonic growth.
Willemsen, Sten P; Eilers, Paul H C; Steegers-Theunissen, Régine P M; Lesaffre, Emmanuel
2015-04-15
Most longitudinal growth curve models evaluate the evolution of each of the anthropometric measurements separately. When applied to a 'reference population', this exercise leads to univariate reference curves against which new individuals can be evaluated. However, growth should be evaluated in totality, that is, by evaluating all body characteristics jointly. Recently, Cole et al. suggested the Superimposition by Translation and Rotation (SITAR) model, which expresses individual growth curves by three subject-specific parameters indicating their deviation from a flexible overall growth curve. This model allows the characterization of normal growth in a flexible though compact manner. In this paper, we generalize the SITAR model in a Bayesian way to multiple dimensions. The multivariate SITAR model allows us to create multivariate reference regions, which is advantageous for prediction. The usefulness of the model is illustrated on longitudinal measurements of embryonic growth obtained in the first semester of pregnancy, collected in the ongoing Rotterdam Predict study. Further, we demonstrate how the model can be used to find determinants of embryonic growth.
Bayesian methods for spatial upscaling of process-based forest ecosystem models
NASA Astrophysics Data System (ADS)
van Oijen, M.; Cameron, D.; Reinds, G.; Thomson, A.
2010-12-01
not proportional to carbon accumulation itself. In neither study was uncertainty quantification comprehensive. We therefore conclude with an overview of different upscaling methods to discuss the way forward towards a complete Bayesian framework. Six different methods of spatial upscaling are identified. The methods fall in three classes: (i) direct applications of the point-support model, (ii) extension of the point-support model with a geostatistical model, (iii) replacement of the original model with an emulator. Gaussian Process modelling can be used both for upscaling and emulation. The Bayesian perspective shows how output uncertainty can be quantified for each upscaling method. Reinds, G.J., Van Oijen, M. et al. (2008). Bayesian calibration of the VSD soil acidification model using European forest monitoring data. Geoderma 146: 475-488. Van Oijen, M. et al. (2005). Bayesian calibration of process-based forest models: bridging the gap between models and data. Tree Phys. 25: 915-927. Van Oijen, M. & Thomson, A. (2010). Towards Bayesian uncertainty quantification for forestry models used in the United Kingdom Greenhouse Gas Inventory for land use, land use change, and forestry. Clim. Change DOI:10.1007/s10584-010-9917-3.
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
A Tutorial Introduction to Bayesian Models of Cognitive Development
ERIC Educational Resources Information Center
Perfors, Amy; Tenenbaum, Joshua B.; Griffiths, Thomas L.; Xu, Fei
2011-01-01
We present an introduction to Bayesian inference as it is used in probabilistic models of cognitive development. Our goal is to provide an intuitive and accessible guide to the "what", the "how", and the "why" of the Bayesian approach: what sorts of problems and data the framework is most relevant for, and how and why it may be useful for…
Implementing Relevance Feedback in the Bayesian Network Retrieval Model.
ERIC Educational Resources Information Center
de Campos, Luis M.; Fernandez-Luna, Juan M.; Huete, Juan F.
2003-01-01
Discussion of relevance feedback in information retrieval focuses on a proposal for the Bayesian Network Retrieval Model. Bases the proposal on the propagation of partial evidences in the Bayesian network, representing new information obtained from the user's relevance judgments to compute the posterior relevance probabilities of the documents…
Bayesian Student Modeling and the Problem of Parameter Specification.
ERIC Educational Resources Information Center
Millan, Eva; Agosta, John Mark; Perez de la Cruz, Jose Luis
2001-01-01
Discusses intelligent tutoring systems and the application of Bayesian networks to student modeling. Considers reasons for not using Bayesian networks, including the computational complexity of the algorithms and the difficulty of knowledge acquisition, and proposes an approach to simplify knowledge acquisition that applies causal independence to…
NASA Astrophysics Data System (ADS)
Matiatos, Ioannis; Varouhakis, Emmanouil A.; Papadopoulou, Maria P.
2015-04-01
level and nitrate concentrations were produced and compared with those obtained from groundwater and mass transport numerical models. Preliminary results showed similar efficiency of the spatiotemporal geostatistical method with the numerical models. However data requirements of the former model were significantly less. Advantages and disadvantages of the methods performance were analysed and discussed indicating the characteristics of the different approaches.
Bayesian analysis. II. Signal detection and model selection
NASA Astrophysics Data System (ADS)
Bretthorst, G. Larry
In the preceding paper, Bayesian analysis was applied to the parameter estimation problem, given quadrature NMR data. Here Bayesian analysis is extended to the problem of selecting the model which is most probable in view of the data and all the prior information. In addition to the analytic calculation, two examples are given. The first example demonstrates how to use Bayesian probability theory to detect small signals in noise. The second example uses Bayesian probability theory to compute the probability of the number of decaying exponentials in simulated T1 data. The Bayesian answer to this question is essentially a microcosm of the scientific method and a quantitative statement of Ockham's razor: theorize about possible models, compare these to experiment, and select the simplest model that "best" fits the data.
Advances in Bayesian Modeling in Educational Research
ERIC Educational Resources Information Center
Levy, Roy
2016-01-01
In this article, I provide a conceptually oriented overview of Bayesian approaches to statistical inference and contrast them with frequentist approaches that currently dominate conventional practice in educational research. The features and advantages of Bayesian approaches are illustrated with examples spanning several statistical modeling…
Bayesian analysis of the backreaction models
Kurek, Aleksandra; Bolejko, Krzysztof; Szydlowski, Marek
2010-03-15
We present a Bayesian analysis of four different types of backreaction models, which are based on the Buchert equations. In this approach, one considers a solution to the Einstein equations for a general matter distribution and then an average of various observable quantities is taken. Such an approach became of considerable interest when it was shown that it could lead to agreement with observations without resorting to dark energy. In this paper we compare the ΛCDM model and the backreaction models with type Ia supernovae, baryon acoustic oscillations, and cosmic microwave background data, and find that the former is favored. However, the tested models were based on some particular assumptions about the relation between the average spatial curvature and the backreaction, as well as the relation between the curvature and curvature index. In this paper we modified the latter assumption, leaving the former unchanged. We find that, by varying the relation between the curvature and curvature index, we can obtain a better fit. Therefore, some further work is still needed--in particular, the relation between the backreaction and the curvature should be revisited in order to fully determine the feasibility of the backreaction models to mimic dark energy.
Hierarchical Bayesian models of subtask learning.
Anglim, Jeromy; Wynton, Sarah K A
2015-07-01
The current study used Bayesian hierarchical methods to challenge and extend previous work on subtask learning consistency. A general model of individual-level subtask learning was proposed focusing on power and exponential functions with constraints to test for inconsistency. To study subtask learning, we developed a novel computer-based booking task, which logged participant actions, enabling measurement of strategy use and subtask performance. Model comparison was performed using deviance information criterion (DIC), posterior predictive checks, plots of model fits, and model recovery simulations. Results showed that although learning tended to be monotonically decreasing and decelerating, and approaching an asymptote for all subtasks, there was substantial inconsistency in learning curves both at the group- and individual-levels. This inconsistency was most apparent when constraining both the rate and the ratio of learning to asymptote to be equal across subtasks, thereby giving learning curves only 1 parameter for scaling. The inclusion of 6 strategy covariates provided improved prediction of subtask performance capturing different subtask learning processes and subtask trade-offs. In addition, strategy use partially explained the inconsistency in subtask learning. Overall, the model provided a more nuanced representation of how complex tasks can be decomposed in terms of simpler learning mechanisms.
Scale Mixture Models with Applications to Bayesian Inference
NASA Astrophysics Data System (ADS)
Qin, Zhaohui S.; Damien, Paul; Walker, Stephen
2003-11-01
Scale mixtures of uniform distributions are used to model non-normal data in time series and econometrics in a Bayesian framework. Heteroscedastic and skewed data models are also tackled using scale mixture of uniform distributions.
Stochastic model updating utilizing Bayesian approach and Gaussian process model
NASA Astrophysics Data System (ADS)
Wan, Hua-Ping; Ren, Wei-Xin
2016-03-01
Stochastic model updating (SMU) has been increasingly applied in quantifying structural parameter uncertainty from response variability. SMU for parameter uncertainty quantification is a problem of inverse uncertainty quantification (IUQ), which is a nontrivial task. Solving an inverse problem by optimization usually raises issues of gradient computation, ill-conditioning, and non-uniqueness. Moreover, the uncertainty present in the response makes the inverse problem more complicated. In this study, a Bayesian approach is adopted in SMU for parameter uncertainty quantification. The prominent strength of the Bayesian approach for the IUQ problem is that it solves the problem in a straightforward manner, which enables it to avoid the aforementioned issues. However, when applied to engineering structures that are modeled with a high-resolution finite element model (FEM), the Bayesian approach is still computationally expensive, since the commonly used Markov chain Monte Carlo (MCMC) method for Bayesian inference requires a large number of model runs to guarantee convergence. Herein we reduce the computational cost in two respects. On the one hand, a fast-running Gaussian process model (GPM) is utilized to approximate the time-consuming high-resolution FEM. On the other hand, an advanced MCMC method using the delayed rejection adaptive Metropolis (DRAM) algorithm, which combines a local adaptive strategy with a global adaptive strategy, is employed for Bayesian inference. In addition, we propose the use of the powerful variance-based global sensitivity analysis (GSA) in parameter selection to exclude non-influential parameters from the calibration parameters, which yields a reduced-order model and thus further alleviates the computational burden. A simulated aluminum plate and a real-world complex cable-stayed pedestrian bridge are presented to illustrate the proposed framework and verify its feasibility.
A guide to Bayesian model selection for ecologists
Hooten, Mevin B.; Hobbs, N.T.
2015-01-01
The steady upward trend in the use of model selection and Bayesian methods in ecological research has made it clear that both approaches to inference are important for modern analysis of models and data. However, in teaching Bayesian methods and in working with our research colleagues, we have noticed a general dissatisfaction with the available literature on Bayesian model selection and multimodel inference. Students and researchers new to Bayesian methods quickly find that the published advice on model selection is often preferential in its treatment of options for analysis, frequently advocating one particular method above others. The recent appearance of many articles and textbooks on Bayesian modeling has provided welcome background on relevant approaches to model selection in the Bayesian framework, but most of these are either very narrowly focused in scope or inaccessible to ecologists. Moreover, the methodological details of Bayesian model selection approaches are spread thinly throughout the literature, appearing in journals from many different fields. Our aim with this guide is to condense the large body of literature on Bayesian approaches to model selection and multimodel inference and present it specifically for quantitative ecologists as neutrally as possible. We also bring to light a few important and fundamental concepts relating directly to model selection that seem to have gone unnoticed in the ecological literature. Throughout, we provide only a minimal discussion of philosophy, preferring instead to examine the breadth of approaches as well as their practical advantages and disadvantages. This guide serves as a reference for ecologists using Bayesian methods, so that they can better understand their options and can make an informed choice that is best aligned with their goals for inference.
Bayesian Case-deletion Model Complexity and Information Criterion
Zhu, Hongtu; Ibrahim, Joseph G.; Chen, Qingxia
2015-01-01
We establish a connection between Bayesian case influence measures for assessing the influence of individual observations and Bayesian predictive methods for evaluating the predictive performance of a model and comparing different models fitted to the same dataset. Based on such a connection, we formally propose a new set of Bayesian case-deletion model complexity (BCMC) measures for quantifying the effective number of parameters in a given statistical model. Its properties in linear models are explored. Adding some functions of BCMC to a conditional deviance function leads to a Bayesian case-deletion information criterion (BCIC) for comparing models. We systematically investigate some properties of BCIC and its connection with other information criteria, such as the Deviance Information Criterion (DIC). We illustrate the proposed methodology on linear mixed models with simulations and a real data example. PMID:26180578
Constructive Epistemic Modeling: A Hierarchical Bayesian Model Averaging Method
NASA Astrophysics Data System (ADS)
Tsai, F. T. C.; Elshall, A. S.
2014-12-01
Constructive epistemic modeling is the idea that our understanding of a natural system through a scientific model is a mental construct that continually develops through learning about and from the model. Using the hierarchical Bayesian model averaging (HBMA) method [1], this study shows that segregating different uncertain model components through a BMA tree of posterior model probabilities, model prediction, within-model variance, between-model variance and total model variance serves as a learning tool [2]. First, the BMA tree of posterior model probabilities permits the comparative evaluation of the candidate propositions of each uncertain model component. Second, systemic model dissection is imperative for understanding the individual contribution of each uncertain model component to the model prediction and variance. Third, the hierarchical representation of the between-model variance facilitates the prioritization of the contribution of each uncertain model component to the overall model uncertainty. We illustrate these concepts using the groundwater modeling of a siliciclastic aquifer-fault system. The sources of uncertainty considered are from geological architecture, formation dip, boundary conditions and model parameters. The study shows that the HBMA analysis helps in advancing knowledge about the model rather than forcing the model to fit a particular understanding or merely averaging several candidate models. [1] Tsai, F. T.-C., and A. S. Elshall (2013), Hierarchical Bayesian model averaging for hydrostratigraphic modeling: Uncertainty segregation and comparative evaluation. Water Resources Research, 49, 5520-5536, doi:10.1002/wrcr.20428. [2] Elshall, A.S., and F. T.-C. Tsai (2014). Constructive epistemic modeling of groundwater flow with geological architecture and boundary condition uncertainty under Bayesian paradigm, Journal of Hydrology, 517, 105-119, doi: 10.1016/j.jhydrol.2014.05.027.
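The split into within-model, between-model and total variance follows the law of total variance. The sketch below shows a single level of Bayesian model averaging in Python; as we understand it, a hierarchical BMA tree would apply the same step recursively at each node, which is not shown here. The function name and the numbers are hypothetical and are not taken from the cited studies.

```python
def bma_summary(weights, means, variances):
    """One level of Bayesian model averaging.

    weights   : posterior model probabilities (normalized here if needed)
    means     : each model's predictive mean
    variances : each model's predictive (within-model) variance
    Returns (bma_mean, within_variance, between_variance, total_variance).
    """
    total_w = sum(weights)
    w = [wi / total_w for wi in weights]
    mean = sum(wi * m for wi, m in zip(w, means))
    # Expected within-model variance under the model probabilities.
    within = sum(wi * v for wi, v in zip(w, variances))
    # Spread of the model means around the averaged prediction.
    between = sum(wi * (m - mean) ** 2 for wi, m in zip(w, means))
    return mean, within, between, within + between

# Hypothetical head predictions (m) from three candidate models.
mean, within, between, total = bma_summary(
    weights=[0.5, 0.3, 0.2],
    means=[10.0, 12.0, 15.0],
    variances=[1.0, 2.0, 1.5],
)
print(mean, within, between, total)
```

In this example the between-model term (about 3.64) dominates the within-model term (1.4), which is the kind of comparison used to decide which uncertain model component deserves attention first.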
Bayesian analysis of a disability model for lung cancer survival.
Armero, C; Cabras, S; Castellanos, M E; Perra, S; Quirós, A; Oruezábal, M J; Sánchez-Rubio, J
2016-02-01
Bayesian reasoning, survival analysis and multi-state models are used to assess survival times for Stage IV non-small-cell lung cancer patients and the evolution of the disease over time. Bayesian estimation is done using minimum informative priors for the Weibull regression survival model, leading to an automatic inferential procedure. Markov chain Monte Carlo methods have been used for approximating posterior distributions and the Bayesian information criterion has been considered for covariate selection. In particular, the posterior distribution of the transition probabilities, resulting from the multi-state model, constitutes a very interesting tool which could be useful to help oncologists and patients make efficient and effective decisions.
Nonparametric Bayesian Modeling for Automated Database Schema Matching
Ferragut, Erik M; Laska, Jason A
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
Lee, K.H.
1997-09-01
Numerical and geostatistical analyses show that the artificial smoothing effect of kriging removes high permeability flow paths from hydrogeologic data sets, reducing simulated contaminant transport rates in heterogeneous vadose zone systems. Therefore, kriging alone is not recommended for estimating the spatial distribution of soil hydraulic properties for contaminant transport analysis at vadose zone sites. Vadose zone transport is modeled more effectively by combining kriging with stochastic simulation to better represent the high degree of spatial variability usually found in the hydraulic properties of field soils. However, kriging is a viable technique for estimating the initial mass distribution of contaminants in the subsurface.
Calibrating Bayesian Network Representations of Social-Behavioral Models
Whitney, Paul D.; Walsh, Stephen J.
2010-04-08
While human behavior has long been studied, recent and ongoing advances in computational modeling present opportunities for recasting research outcomes in human behavior. In this paper we describe how Bayesian networks can represent outcomes of human behavior research. We demonstrate a Bayesian network that represents political radicalization research – and show a corresponding visual representation of aspects of this research outcome. Since Bayesian networks can be quantitatively compared with external observations, the representation can also be used for empirical assessments of the research which the network summarizes. For a political radicalization model based on published research, we show this empirical comparison with data taken from the Minorities at Risk Organizational Behaviors database.
Geostatistical simulations for radon indoor with a nested model including the housing factor.
Cafaro, C; Giovani, C; Garavaglia, M
2016-01-01
The definition of radon-prone areas is the subject of much research in radioecology, since radon is considered a leading cause of lung tumours; the authorities therefore ask for support in developing an appropriate sanitary prevention strategy. In this paper, we use geostatistical tools to elaborate a definition that accounts for some of the available information about the dwellings. Co-kriging is the interpolator used in geostatistics to refine predictions by means of external covariates. A priori, co-kriging is not guaranteed to improve significantly on the results obtained with the common lognormal kriging. Here, instead, the multivariate approach reduces the cross-validation residual variance to an extent deemed satisfactory. Furthermore, with the application of Monte Carlo simulations, the approach provides a more conservative definition of radon-prone areas than the one previously obtained by lognormal kriging.
Bayesian approach to decompression sickness model parameter estimation.
Howle, L E; Weber, P W; Nichols, J M
2017-03-01
We examine both maximum likelihood and Bayesian approaches for estimating probabilistic decompression sickness model parameters. Maximum likelihood estimation treats parameters as fixed values and determines the best estimate through repeated trials, whereas the Bayesian approach treats parameters as random variables and determines the parameter probability distributions. We would ultimately like to know the probability that a parameter lies in a certain range rather than simply make statements about the repeatability of our estimator. Although both represent powerful methods of inference, for models with complex or multi-peaked likelihoods, maximum likelihood parameter estimates can prove more difficult to interpret than the estimates of the parameter distributions provided by the Bayesian approach. For models of decompression sickness, we show that while these two estimation methods are complementary, the credible intervals generated by the Bayesian approach are more naturally suited to quantifying uncertainty in the model parameters.
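The contrast between a point estimate and a credible interval can be seen in a much simpler setting than the probabilistic decompression models studied here. The Python sketch below assumes a plain binomial model (k incidents in n exposures) with a uniform prior evaluated on a grid; the counts, the prior and the function names are all illustrative assumptions.

```python
def grid_posterior(k, n, grid_size=2001):
    """Posterior over an incidence probability p on a uniform grid.

    Assumes a binomial likelihood and a flat prior on [0, 1].
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    unnorm = [p ** k * (1.0 - p) ** (n - k) for p in grid]
    total = sum(unnorm)
    return grid, [u / total for u in unnorm]

def credible_interval(grid, post, level=0.95):
    """Equal-tailed credible interval read off the cumulative posterior."""
    tail = (1.0 - level) / 2.0
    cum = 0.0
    lower = upper = None
    for p, w in zip(grid, post):
        cum += w
        if lower is None and cum >= tail:
            lower = p
        if upper is None and cum >= 1.0 - tail:
            upper = p
            break
    return lower, upper

# Hypothetical trial: 3 incidents in 100 exposures.
k, n = 3, 100
mle = k / n  # maximum likelihood estimate of p
grid, post = grid_posterior(k, n)
lower, upper = credible_interval(grid, post)
print(f"MLE = {mle:.3f}, 95% credible interval = ({lower:.3f}, {upper:.3f})")
```

The interval is a direct probability statement about the parameter given the data and the prior, whereas the maximum likelihood estimate alone says nothing about the range in which the parameter plausibly lies.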
BAYESIAN METHODS FOR REGIONAL-SCALE EUTROPHICATION MODELS. (R830887)
We demonstrate a Bayesian classification and regression tree (CART) approach to link multiple environmental stressors to biological responses and quantify uncertainty in model predictions. Such an approach can: (1) report prediction uncertainty, (2) be consistent with the amou...
NASA Astrophysics Data System (ADS)
Moradkhani, Hamid; Yan, Hongxiang
2016-04-01
Soil moisture simulation and prediction are increasingly used to characterize agricultural droughts but the process suffers from data scarcity and quality. The satellite soil moisture observations could be used to improve model predictions with data assimilation. Remote sensing products, however, are typically discontinuous in spatial-temporal coverages; while simulated soil moisture products are potentially biased due to the errors in forcing data, parameters, and deficiencies of model physics. This study attempts to provide a detailed analysis of the joint and separate assimilation of streamflow and Advanced Scatterometer (ASCAT) surface soil moisture into a fully distributed hydrologic model, with the use of recently developed particle filter-Markov chain Monte Carlo (PF-MCMC) method. A geostatistical model is introduced to overcome the satellite soil moisture discontinuity issue where satellite data does not cover the whole study region or is significantly biased, and the dominant land cover is dense vegetation. The results indicate that joint assimilation of soil moisture and streamflow has minimal effect in improving the streamflow prediction, however, the surface soil moisture field is significantly improved. The combination of DA and geostatistical approach can further improve the surface soil moisture prediction.
A Generalizable Hierarchical Bayesian Model for Persistent SAR Change Detection
2012-04-01
[6] K. Ranney and M. Soumekh, “Signal subspace change detection in averaged multilook SAR imagery,” Geoscience and Remote Sensing, IEEE Transactions on...A Generalizable Hierarchical Bayesian Model for Persistent SAR Change Detection. Gregory E. Newstadt, Edmund G. Zelnio, and Alfred O. Hero III...Base, OH, 45433, USA. ABSTRACT: This paper proposes a hierarchical Bayesian model for multiple-pass, multiple antenna synthetic aperture radar (SAR
Bayesian approach for three-dimensional aquifer characterization at the Hanford 300 Area
Murakami, Haruko; Chen, X.; Hahn, Melanie S.; Liu, Yi; Rockhold, Mark L.; Vermeul, Vincent R.; Zachara, John M.; Rubin, Yoram
2010-10-21
This study presents a stochastic, three-dimensional characterization of a heterogeneous hydraulic conductivity field within DOE's Hanford 300 Area site, Washington, by assimilating large-scale, constant-rate injection test data with small-scale, three-dimensional electromagnetic borehole flowmeter (EBF) measurement data. We first inverted the injection test data to estimate the transmissivity field, using zeroth-order temporal moments of pressure buildup curves. We applied a newly developed Bayesian geostatistical inversion framework, the method of anchored distributions (MAD), to obtain a joint posterior distribution of geostatistical parameters and local log-transmissivities at multiple locations. The unique aspects of MAD that make it suitable for this purpose are its ability to integrate multi-scale, multi-type data within a Bayesian framework and to compute a nonparametric posterior distribution. After we combined the distribution of transmissivities with depth-discrete relative-conductivity profile from EBF data, we inferred the three-dimensional geostatistical parameters of the log-conductivity field, using the Bayesian model-based geostatistics. Such consistent use of the Bayesian approach throughout the procedure enabled us to systematically incorporate data uncertainty into the final posterior distribution. The method was tested in a synthetic study and validated using the actual data that was not part of the estimation. Results showed broader and skewed posterior distributions of geostatistical parameters except for the mean, which suggests the importance of inferring the entire distribution to quantify the parameter uncertainty.
Geostatistics and petroleum geology
Hohn, M.E.
1988-01-01
This book examines the purpose and use of geostatistics in the exploration and development of oil and gas, with an emphasis on appropriate and pertinent case studies. It presents an overview of geostatistics. Topics covered include: the semivariogram; linear estimation; multivariate geostatistics; nonlinear estimation; from indicator variables to nonparametric estimation; and more detail, less certainty: conditional simulation.
A Practical Primer on Geostatistics
Olea, Ricardo A.
2009-01-01
THE CHALLENGE Most geological phenomena are extraordinarily complex in their interrelationships and vast in their geographical extension. Ordinarily, engineers and geoscientists are faced with corporate or scientific requirements to properly prepare geological models with measurements involving a small fraction of the entire area or volume of interest. Exact description of a system such as an oil reservoir is neither feasible nor economically possible. The results are necessarily uncertain. Note that the uncertainty is not an intrinsic property of the systems; it is the result of incomplete knowledge by the observer. THE AIM OF GEOSTATISTICS The main objective of geostatistics is the characterization of spatial systems that are incompletely known, systems that are common in geology. A key difference from classical statistics is that geostatistics uses the sampling location of every measurement. Unless the measurements show spatial correlation, the application of geostatistics is pointless. Ordinarily the need for additional knowledge goes beyond a few points, which explains the display of results graphically as fishnet plots, block diagrams, and maps. GEOSTATISTICAL METHODS Geostatistics is a collection of numerical techniques for the characterization of spatial attributes using primarily two tools: probabilistic models, which are used for spatial data in a manner similar to the way in which time-series analysis characterizes temporal data, or pattern recognition techniques. The probabilistic models are used as a way to handle uncertainty in results away from sampling locations, making a radical departure from alternative approaches like inverse distance estimation methods. DIFFERENCES WITH TIME SERIES On dealing with time-series analysis, users frequently concentrate their attention on extrapolations for making forecasts. Although users of geostatistics may be interested in extrapolation, the methods work at their best interpolating. This simple difference has
Modeling Non-Gaussian Time Series with Nonparametric Bayesian Model.
Xu, Zhiguang; MacEachern, Steven; Xu, Xinyi
2015-02-01
We present a class of Bayesian copula models whose major components are the marginal (limiting) distribution of a stationary time series and the internal dynamics of the series. We argue that these are the two features with which an analyst is typically most familiar, and hence that these are natural components with which to work. For the marginal distribution, we use a nonparametric Bayesian prior distribution along with a cdf-inverse cdf transformation to obtain large support. For the internal dynamics, we rely on the traditionally successful techniques of normal-theory time series. Coupling the two components gives us a family of (Gaussian) copula transformed autoregressive models. The models provide coherent adjustments of time scales and are compatible with many extensions, including changes in volatility of the series. We describe basic properties of the models, show their ability to recover non-Gaussian marginal distributions, and use a GARCH modification of the basic model to analyze stock index return series. The models are found to provide better fit and improved short-range and long-range predictions than Gaussian competitors. The models are extensible to a large variety of fields, including continuous time models, spatial models, models for multiple series, models driven by external covariate streams, and non-stationary models.
Ashton, Ruth A.; Kefyalew, Takele; Rand, Alison; Sime, Heven; Assefa, Ashenafi; Mekasha, Addis; Edosa, Wasihun; Tesfaye, Gezahegn; Cano, Jorge; Teka, Hiwot; Reithinger, Richard; Pullan, Rachel L.; Drakeley, Chris J.; Brooker, Simon J.
2015-01-01
Ethiopia has a diverse ecology and geography resulting in spatial and temporal variation in malaria transmission. Evidence-based strategies are thus needed to monitor transmission intensity and target interventions. A purposive selection of dried blood spots collected during cross-sectional school-based surveys in Oromia Regional State, Ethiopia, were tested for presence of antibodies against Plasmodium falciparum and P. vivax antigens. Spatially explicit binomial models of seroprevalence were created for each species using a Bayesian framework, and used to predict seroprevalence at 5 km resolution across Oromia. School seroprevalence showed a wider prevalence range than microscopy for both P. falciparum (0–50% versus 0–12.7%) and P. vivax (0–53.7% versus 0–4.5%), respectively. The P. falciparum model incorporated environmental predictors and spatial random effects, while P. vivax seroprevalence first-order trends were not adequately explained by environmental variables, and a spatial smoothing model was developed. This is the first demonstration of serological indicators being used to detect large-scale heterogeneity in malaria transmission using samples from cross-sectional school-based surveys. The findings support the incorporation of serological indicators into periodic large-scale surveillance such as Malaria Indicator Surveys, and with particular utility for low transmission and elimination settings. PMID:25962770
Bayesian graphical models for genomewide association studies.
Verzilli, Claudio J; Stallard, Nigel; Whittaker, John C
2006-07-01
As the extent of human genetic variation becomes more fully characterized, the research community is faced with the challenging task of using this information to dissect the heritable components of complex traits. Genomewide association studies offer great promise in this respect, but their analysis poses formidable difficulties. In this article, we describe a computationally efficient approach to mining genotype-phenotype associations that scales to the size of the data sets currently being collected in such studies. We use discrete graphical models as a data-mining tool, searching for single- or multilocus patterns of association around a causative site. The approach is fully Bayesian, allowing us to incorporate prior knowledge on the spatial dependencies around each marker due to linkage disequilibrium, which reduces considerably the number of possible graphical structures. A Markov chain-Monte Carlo scheme is developed that yields samples from the posterior distribution of graphs conditional on the data from which probabilistic statements about the strength of any genotype-phenotype association can be made. Using data simulated under scenarios that vary in marker density, genotype relative risk of a causative allele, and mode of inheritance, we show that the proposed approach has better localization properties and leads to lower false-positive rates than do single-locus analyses. Finally, we present an application of our method to a quasi-synthetic data set in which data from the CYP2D6 region are embedded within simulated data on 100K single-nucleotide polymorphisms. Analysis is quick (<5 min), and we are able to localize the causative site to a very short interval.
Assessing Fit of Unidimensional Graded Response Models Using Bayesian Methods
ERIC Educational Resources Information Center
Zhu, Xiaowen; Stone, Clement A.
2011-01-01
The posterior predictive model checking method is a flexible Bayesian model-checking tool and has recently been used to assess fit of dichotomous IRT models. This paper extended previous research to polytomous IRT models. A simulation study was conducted to explore the performance of posterior predictive model checking in evaluating different…
Ou, Chunping; St-Hilaire, André; Ouarda, Taha B M J; Conly, F Malcolm; Armstrong, Nicole; Khalil, Bahaa; Proulx-McInnis, Sandra
2012-12-01
The assessment of the adequacy of sampling locations is an important aspect in the validation of an effective and efficient water quality monitoring network. Two geostatistical approaches (e.g., kriging and Moran's I) are presented to assess multiple sampling locations. A flexible and comprehensive framework was developed for the selection of multiple sampling locations of multiple variables which was accomplished by coupling geostatistical approaches with principal component analysis (PCA) and fuzzy optimal model (FOM). The FOM was used in the integrated assessment of both multiple principal components and multiple geostatistical approaches. These integrated methods were successfully applied to the assessment of two independent water quality monitoring networks (WQMNs) of Lake Winnipeg, Canada, which respectively included 14 and 30 stations from 2006 to 2010.
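As an illustration of one of the two geostatistical measures named above, Moran's I for a set of stations can be sketched as follows. The station values and the binary neighbour weights are hypothetical and are not taken from the Lake Winnipeg networks; the formula is the standard global Moran's I.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n locations.

    weights[i][j] is the spatial weight between locations i and j
    (here assumed binary: 1 for neighbours, 0 otherwise, 0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    spread = sum(d * d for d in dev)
    return (n / w_sum) * cross / spread


# Four hypothetical stations on a transect; each is a neighbour of the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_trend = morans_i([1.0, 2.0, 3.0, 4.0], chain)   # similar neighbours
alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain)    # dissimilar neighbours
print(smooth_trend, alternating)
```

Positive values indicate that neighbouring stations carry similar information, which is one signal that a station might be redundant; negative values indicate dissimilar neighbours.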
Hwang, Kyu-Baek; Zhang, Byoung-Tak
2005-12-01
Bayesian model averaging (BMA) can resolve the overfitting problem by explicitly incorporating the model uncertainty into the analysis procedure. Hence, it can be used to improve the generalization performance of Bayesian network classifiers. Until now, BMA of Bayesian network classifiers has only been performed in some restricted forms, e.g., the model is averaged given a single node-order, because of its heavy computational burden. However, it can be hard to obtain a good node-order when the available training dataset is sparse. To alleviate this problem, we propose BMA of Bayesian network classifiers over several distinct node-orders obtained using the Markov chain Monte Carlo sampling technique. The proposed method was examined using two synthetic problems and four real-life datasets. First, we show that the proposed method is especially effective when the given dataset is very sparse. The classification accuracy of averaging over multiple node-orders was higher in most cases than that achieved using a single node-order in our experiments. We also present experimental results for test datasets with unobserved variables, where the quality of the averaged node-order is more important. Through these experiments, we show that the difference in classification performance between the cases of multiple node-orders and single node-order is related to the level of noise, confirming the relative benefit of averaging over multiple node-orders for incomplete data. We conclude that BMA of Bayesian network classifiers over multiple node-orders has an apparent advantage when the given dataset is sparse and noisy, despite the method's heavy computational cost.
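The averaging step itself can be sketched generically. The example below is not the authors' node-order sampler: it assumes that a handful of candidate models have already been scored by log marginal likelihood, that the prior over models is uniform, and that each model returns class probabilities for one test case. All numbers are hypothetical.

```python
import math


def bma_predict(log_evidences, predictions):
    """Average class probabilities over models, weighted by posterior model probability.

    log_evidences: log marginal likelihood of each model (uniform model prior assumed).
    predictions:   one list of class probabilities per model, for a single test case.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_evidences)
    raw = [math.exp(le - top) for le in log_evidences]
    total = sum(raw)
    model_weights = [r / total for r in raw]
    n_classes = len(predictions[0])
    averaged = [
        sum(w * pred[c] for w, pred in zip(model_weights, predictions))
        for c in range(n_classes)
    ]
    return model_weights, averaged


# Two hypothetical models; the second has three times the evidence of the first.
log_evidences = [-10.0, -10.0 + math.log(3.0)]
predictions = [[0.8, 0.2], [0.4, 0.6]]
model_weights, averaged = bma_predict(log_evidences, predictions)
print(model_weights, averaged)
```

In the setting of the abstract, the models would correspond to classifiers learned under different node-orders drawn by MCMC, in which case the sampled orders are typically averaged with equal weight rather than re-weighted by evidence.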
Technical note: Bayesian calibration of dynamic ruminant nutrition models.
Reed, K F; Arhonditsis, G B; France, J; Kebreab, E
2016-08-01
Mechanistic models of ruminant digestion and metabolism have advanced our understanding of the processes underlying ruminant animal physiology. Deterministic modeling practices ignore the inherent variation within and among individual animals and thus have no way to assess how sources of error influence model outputs. We introduce Bayesian calibration of mathematical models to address the need for robust mechanistic modeling tools that can accommodate error analysis by remaining within the bounds of data-based parameter estimation. For the purpose of prediction, the Bayesian approach generates a posterior predictive distribution that represents the current estimate of the value of the response variable, taking into account both the uncertainty about the parameters and model residual variability. Predictions are expressed as probability distributions, thereby conveying significantly more information than point estimates in regard to uncertainty. Our study illustrates some of the technical advantages of Bayesian calibration and discusses the future perspectives in the context of animal nutrition modeling.
Using consensus bayesian network to model the reactive oxygen species regulatory pathway.
Hu, Liangdong; Wang, Limin
2013-01-01
Bayesian networks are among the most successful graphical models for representing the reactive oxygen species (ROS) regulatory pathway. With the increasing number of microarray measurements, it is possible to construct a Bayesian network directly from microarray data. Although many Bayesian network learning algorithms have been developed, their accuracy is low when they are applied to microarray data, because the databases used for learning contain too few microarray samples. In this paper, we propose a consensus Bayesian network, constructed by combining Bayesian networks from the relevant literature with Bayesian networks learned from microarray data. It is expected to have higher accuracy than a Bayesian network learned from a single database. In the experiments, we validated the Bayesian network combination algorithm on several classic machine learning databases and used the consensus Bayesian network to model the Escherichia coli ROS pathway.
Bayesian Estimation of the Logistic Positive Exponent IRT Model
ERIC Educational Resources Information Center
Bolfarine, Heleno; Bazan, Jorge Luis
2010-01-01
A Bayesian inference approach using Markov Chain Monte Carlo (MCMC) is developed for the logistic positive exponent (LPE) model proposed by Samejima and for a new skewed Logistic Item Response Theory (IRT) model, named Reflection LPE model. Both models lead to asymmetric item characteristic curves (ICC) and can be appropriate because a symmetric…
Schröder, Winfried
2006-05-01
By the example of environmental monitoring, some applications of geographic information systems (GIS), geostatistics, metadata banking, and Classification and Regression Trees (CART) are presented. These tools are recommended for mapping statistically estimated hot spots of vectors and pathogens. GIS were introduced as tools for spatially modelling the real world. The modelling can be done by mapping objects according to the spatial information content of data. Additionally, this can be supported by geostatistical and multivariate statistical modelling. This is demonstrated by the example of modelling marine habitats of benthic communities and of terrestrial ecoregions. Such ecoregionalisations may be used to predict phenomena based on the statistical relation between measurements of an interesting phenomenon such as, e.g., the incidence of medically relevant species and correlated characteristics of the ecoregions. The combination of meteorological data and data on plant phenology can enhance the spatial resolution of the information on climate change. To this end, meteorological and phenological data have to be correlated. To enable this, both data sets which are from disparate monitoring networks have to be spatially connected by means of geostatistical estimation. This is demonstrated by the example of transformation of site-specific data on plant phenology into surface data. The analysis allows for spatial comparison of the phenology during the two periods 1961-1990 and 1991-2002 covering whole Germany. The changes in both plant phenology and air temperature were proved to be statistically significant. Thus, they can be combined by GIS overlay technique to enhance the spatial resolution of the information on the climate change and use them for the prediction of vector incidences at the regional scale. The localisation of such risk hot spots can be done by geometrically merging surface data on promoting factors. This is demonstrated by the example of the
Bayesian Networks for Modeling Dredging Decisions
2011-10-01
position unless so designated by other authorized documents. DESTROY THIS REPORT WHEN NO LONGER NEEDED. DO NOT RETURN IT TO THE ORIGINATOR. ERDC/EL TR...links within a network often do indicate causality and it is usually best to work from information about... work in this area. ERDC/EL TR-11-14 16 Table 1. Bayesian network applications reviewed in the literature. Author(s) Year Substantive issue
On the Adequacy of Bayesian Evaluations of Categorization Models: Reply to Vanpaemel and Lee (2012)
ERIC Educational Resources Information Center
Wills, Andy J.; Pothos, Emmanuel M.
2012-01-01
Vanpaemel and Lee (2012) argued, and we agree, that the comparison of formal models can be facilitated by Bayesian methods. However, Bayesian methods neither precede nor supplant our proposals (Wills & Pothos, 2012), as Bayesian methods can be applied both to our proposals and to their polar opposites. Furthermore, the use of Bayesian methods to…
Bayesian failure probability model sensitivity study. Final report
Not Available
1986-05-30
The Office of the Manager, National Communications System (OMNCS) has developed a system-level approach for estimating the effects of High-Altitude Electromagnetic Pulse (HEMP) on the connectivity of telecommunications networks. This approach incorporates a Bayesian statistical model which estimates the HEMP-induced failure probabilities of telecommunications switches and transmission facilities. The purpose of this analysis is to address the sensitivity of the Bayesian model. This is done by systematically varying two model input parameters--the number of observations, and the equipment failure rates. Throughout the study, a non-informative prior distribution is used. The sensitivity of the Bayesian model to the noninformative prior distribution is investigated from a theoretical mathematical perspective.
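The kind of sensitivity check described here can be sketched under simplifying assumptions. The report's actual model is not given in the abstract, so the example assumes a binomial failure model with a uniform Beta(1, 1) prior standing in for the non-informative prior, and varies the number of observations at a fixed, hypothetical failure rate.

```python
import math


def beta_posterior(failures, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of a failure probability.

    Assumes a binomial likelihood and a Beta(prior_a, prior_b) prior;
    Beta(1, 1) is the uniform prior used here as the non-informative choice.
    """
    a = prior_a + failures
    b = prior_b + trials - failures
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


failure_rate = 0.2          # hypothetical observed failure rate
summary = {}
for trials in (10, 100, 1000):
    failures = round(failure_rate * trials)
    summary[trials] = beta_posterior(failures, trials)
    mean, sd = summary[trials]
    print(f"n = {trials:4d}: posterior mean = {mean:.4f}, sd = {sd:.4f}")
```

With few observations the prior pulls the posterior mean toward 0.5 and the spread is wide; both effects shrink as the number of observations grows.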
Back to basics for Bayesian model building in genomic selection.
Kärkkäinen, Hanni P; Sillanpää, Mikko J
2012-07-01
Numerous Bayesian methods of phenotype prediction and genomic breeding value estimation based on multilocus association models have been proposed. Computationally the methods have been based either on Markov chain Monte Carlo or on faster maximum a posteriori estimation. The demand for more accurate and more efficient estimation has led to the rapid emergence of workable methods, unfortunately at the expense of well-defined principles for Bayesian model building. In this article we go back to the basics and build a Bayesian multilocus association model for quantitative and binary traits with carefully defined hierarchical parameterization of Student's t and Laplace priors. In this treatment we consider alternative model structures, using indicator variables and polygenic terms. We make the most of the conjugate analysis, enabled by the hierarchical formulation of the prior densities, by deriving the fully conditional posterior densities of the parameters and using the acquired known distributions in building fast generalized expectation-maximization estimation algorithms.
Bayesian Case Influence Measures for Statistical Models with Missing Data
Zhu, Hongtu; Ibrahim, Joseph G.; Cho, Hyunsoon; Tang, Niansheng
2011-01-01
We examine three Bayesian case influence measures including the φ-divergence, Cook's posterior mode distance and Cook's posterior mean distance for identifying a set of influential observations for a variety of statistical models with missing data including models for longitudinal data and latent variable models in the absence/presence of missing data. Since it can be computationally prohibitive to compute these Bayesian case influence measures in models with missing data, we derive simple first-order approximations to the three Bayesian case influence measures by using the Laplace approximation formula and examine the applications of these approximations to the identification of influential sets. All of the computations for the first-order approximations can be easily done using Markov chain Monte Carlo samples from the posterior distribution based on the full data. Simulated data and an AIDS dataset are analyzed to illustrate the methodology. PMID:23399928
On the Bayesian Nonparametric Generalization of IRT-Type Models
ERIC Educational Resources Information Center
San Martin, Ernesto; Jara, Alejandro; Rolin, Jean-Marie; Mouchart, Michel
2011-01-01
We study the identification and consistency of Bayesian semiparametric IRT-type models, where the uncertainty on the abilities' distribution is modeled using a prior distribution on the space of probability measures. We show that for the semiparametric Rasch Poisson counts model, simple restrictions ensure the identification of a general…
Bayesian non-parametrics and the probabilistic approach to modelling
Ghahramani, Zoubin
2013-01-01
Modelling is fundamental to many fields of science and engineering. A model can be thought of as a representation of possible data one could predict from a system. The probabilistic approach to modelling uses probability theory to express all aspects of uncertainty in the model. The probabilistic approach is synonymous with Bayesian modelling, which simply uses the rules of probability theory in order to make predictions, compare alternative models, and learn model parameters and structure from data. This simple and elegant framework is most powerful when coupled with flexible probabilistic models. Flexibility is achieved through the use of Bayesian non-parametrics. This article provides an overview of probabilistic modelling and an accessible survey of some of the main tools in Bayesian non-parametrics. The survey covers the use of Bayesian non-parametrics for modelling unknown functions, density estimation, clustering, time-series modelling, and representing sparsity, hierarchies, and covariance structure. More specifically, it gives brief non-technical overviews of Gaussian processes, Dirichlet processes, infinite hidden Markov models, Indian buffet processes, Kingman’s coalescent, Dirichlet diffusion trees and Wishart processes. PMID:23277609
A General Bayesian Model for Testlets: Theory and Applications.
ERIC Educational Resources Information Center
Wang, Xiaohui; Bradlow, Eric T.; Wainer, Howard
2002-01-01
Proposes a modified version of commonly employed item response models in a fully Bayesian framework and obtains inferences under the model using Markov chain Monte Carlo techniques. Demonstrates use of the model in a series of simulations and with operational data from the North Carolina Test of Computer Skills and the Test of Spoken English…
Bayesian Network Models for Local Dependence among Observable Outcome Variables
ERIC Educational Resources Information Center
Almond, Russell G.; Mulder, Joris; Hemat, Lisa A.; Yan, Duanli
2009-01-01
Bayesian network models offer a large degree of flexibility for modeling dependence among observables (item outcome variables) from the same task, which may be dependent. This article explores four design patterns for modeling locally dependent observations: (a) no context--ignores dependence among observables; (b) compensatory context--introduces…
Semiparametric Thurstonian Models for Recurrent Choices: A Bayesian Analysis
ERIC Educational Resources Information Center
Ansari, Asim; Iyengar, Raghuram
2006-01-01
We develop semiparametric Bayesian Thurstonian models for analyzing repeated choice decisions involving multinomial, multivariate binary or multivariate ordinal data. Our modeling framework has multiple components that together yield considerable flexibility in modeling preference utilities, cross-sectional heterogeneity and parameter-driven…
Bayesian generalized linear mixed modeling of Tuberculosis using informative priors.
Ojo, Oluwatobi Blessing; Lougue, Siaka; Woldegerima, Woldegebriel Assefa
2017-01-01
TB is rated as one of the world's deadliest diseases, and South Africa ranks 9th among the 22 countries hardest hit by TB. Although much research has been carried out on this subject, this paper goes a step further by incorporating past knowledge into the model, using a Bayesian approach with informative priors. The Bayesian approach is becoming popular in data analysis, but most applications of Bayesian inference are limited to non-informative priors, where there is no solid external information about the distribution of the parameter of interest. The main aim of this study is to profile people living with TB in South Africa. Identical regression models are fitted under the classical approach and under the Bayesian approach with both non-informative and informative priors, using South Africa General Household Survey (GHS) data for the year 2014. For the Bayesian model with informative priors, the GHS datasets for the years 2011 to 2013 are used to set up the priors for the 2014 model. PMID:28257437
Andrade, A I A S S; Stigter, T Y
2013-04-01
In this study multivariate and geostatistical methods are jointly applied to model the spatial and temporal distribution of arsenic (As) concentrations in shallow groundwater as a function of physicochemical, hydrogeological and land use parameters, as well as to assess the related uncertainty. The study site is located in the Mondego River alluvial body in Central Portugal, where maize, rice and some vegetable crops dominate. In a first analysis scatter plots are used, followed by the application of principal component analysis to two different data matrices, of 112 and 200 samples, with the aim of detecting associations between As levels and other quantitative parameters. In the following phase explanatory models of As are created through factorial regression based on correspondence analysis, integrating both quantitative and qualitative parameters. Finally, these are combined with indicator-geostatistical techniques to create maps indicating the predicted probability of As concentrations in groundwater exceeding the current global drinking water guideline of 10 μg/l. These maps further allow assessing the uncertainty and representativeness of the monitoring network. A clear effect of the redox state on the presence of As is observed, and together with significant correlations with dissolved oxygen, nitrate, sulfate, iron, manganese and alkalinity, points towards the reductive dissolution of Fe (hydr)oxides as the essential mechanism of As release. The association of high As values with rice crop, known to promote reduced environments due to ponding, further corroborates this hypothesis. An additional source of As from fertilizers cannot be excluded, as the correlation with As is higher where rice is associated with vegetables, normally associated with higher fertilization rates. The best explanatory model of As occurrence integrates the parameters season, crop type, well and water depth, nitrate and Eh, though a model without the last two parameters also gives
Implementation of the Iterative Proportion Fitting Algorithm for Geostatistical Facies Modeling
Li, Yupeng; Deutsch, Clayton V.
2012-06-15
In geostatistics, most stochastic algorithms for the simulation of categorical variables such as facies or rock types require a conditional probability distribution. The multivariate probability distribution of all the grouped locations, including the unsampled location, permits calculation of the conditional probability directly from its definition. In this article, the iterative proportion fitting (IPF) algorithm is implemented to infer this multivariate probability. Using the IPF algorithm, the multivariate probability is obtained by iteratively modifying an initial estimate of the multivariate probability, using lower-order bivariate probabilities as constraints. The imposed bivariate marginal probabilities are inferred from profiles along drill holes or wells. In the IPF process, a sparse matrix is used to calculate the marginal probabilities from the multivariate probability, which makes the iterative fitting more tractable and practical. This algorithm can be extended to higher-order marginal probability constraints as used in multiple-point statistics. The theoretical framework is developed and illustrated with estimation and simulation examples.
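The iterative fitting described in this abstract can be illustrated with a minimal sketch. It assumes three locations, small category counts, and a dense NumPy table rather than the sparse matrix the authors mention; the function name `ipf_bivariate`, the synthetic `truth` table, and the tolerance values are hypothetical and not taken from the article.

```python
import numpy as np

def ipf_bivariate(initial, targets, n_iter=500, tol=1e-12):
    """Rescale a joint probability table until its bivariate marginals match.

    initial: positive array with one axis per location.
    targets: dict mapping an axis pair (a, b), with a < b, to the target
             bivariate probability table of shape (n_a, n_b).
    """
    p = np.asarray(initial, dtype=float)
    p = p / p.sum()
    for _ in range(n_iter):
        largest_update = 0.0
        for (a, b), target in targets.items():
            # Sum out every axis except a and b to get the current marginal.
            summed_axes = tuple(ax for ax in range(p.ndim) if ax not in (a, b))
            current = p.sum(axis=summed_axes, keepdims=True)
            # Multiply each cell by target / current for its (a, b) pair.
            ratio = np.ones_like(current)
            np.divide(target.reshape(current.shape), current,
                      out=ratio, where=current > 0)
            updated = p * ratio
            largest_update = max(largest_update, np.abs(updated - p).max())
            p = updated
        if largest_update < tol:
            break
    return p

# Synthetic example: build consistent bivariate constraints from a known table.
rng = np.random.default_rng(0)
truth = rng.random((2, 3, 2)) + 0.5
truth /= truth.sum()

targets = {}
for a, b in [(0, 1), (0, 2), (1, 2)]:
    other = tuple(ax for ax in range(3) if ax not in (a, b))
    targets[(a, b)] = truth.sum(axis=other)

# Start from a uniform initial estimate and fit.
fitted = ipf_bivariate(np.ones((2, 3, 2)), targets)

# Conditional probability at the third location given categories 0 and 1
# observed at the first two locations, directly from the definition.
conditional = fitted[0, 1, :] / fitted[0, 1, :].sum()
```

The fitted table reproduces the imposed bivariate marginals but need not equal `truth`, since bivariate constraints alone do not determine a trivariate distribution.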
Bayesian Estimation in the One-Parameter Latent Trait Model.
1980-03-01
Massachusetts Univ., Amherst, Lab of Psychometric and ... Bayesian Estimation in the One-Parameter Latent Trait Model. Hariharan Swaminathan and Janice A. Gifford. Mar 1980. Research... Key words: latent trait theory; Bayesian estimation. Abstract: When several
Bayesian Analysis of Order-Statistics Models for Ranking Data.
ERIC Educational Resources Information Center
Yu, Philip L. H.
2000-01-01
Studied the order-statistics models, extending the usual normal order-statistics model into one in which the underlying random variables followed a multivariate normal distribution. Used a Bayesian approach and the Gibbs sampling technique. Applied the proposed method to analyze presidential election data from the American Psychological…
Bayesian Finite Mixtures for Nonlinear Modeling of Educational Data.
ERIC Educational Resources Information Center
Tirri, Henry; And Others
A Bayesian approach for finding latent classes in data is discussed. The approach uses finite mixture models to describe the underlying structure in the data and demonstrates that the possibility of using full joint probability models raises interesting new prospects for exploratory data analysis. The concepts and methods discussed are illustrated…
A Bayesian Approach for Analyzing Longitudinal Structural Equation Models
ERIC Educational Resources Information Center
Song, Xin-Yuan; Lu, Zhao-Hua; Hser, Yih-Ing; Lee, Sik-Yum
2011-01-01
This article considers a Bayesian approach for analyzing a longitudinal 2-level nonlinear structural equation model with covariates, and mixed continuous and ordered categorical variables. The first-level model is formulated for measures taken at each time point nested within individuals for investigating their characteristics that are dynamically…
Bayesian Semiparametric Structural Equation Models with Latent Variables
ERIC Educational Resources Information Center
Yang, Mingan; Dunson, David B.
2010-01-01
Structural equation models (SEMs) with latent variables are widely useful for sparse covariance structure modeling and for inferring relationships among latent variables. Bayesian SEMs are appealing in allowing for the incorporation of prior information and in providing exact posterior distributions of unknowns, including the latent variables. In…
Small Sample Properties of Bayesian Multivariate Autoregressive Time Series Models
ERIC Educational Resources Information Center
Price, Larry R.
2012-01-01
The aim of this study was to compare the small sample (N = 1, 3, 5, 10, 15) performance of a Bayesian multivariate vector autoregressive (BVAR-SEM) time series model relative to frequentist power and parameter estimation bias. A multivariate autoregressive model was developed based on correlated autoregressive time series vectors of varying…
Bayesian Estimation of the DINA Model with Gibbs Sampling
ERIC Educational Resources Information Center
Culpepper, Steven Andrew
2015-01-01
A Bayesian model formulation of the deterministic inputs, noisy "and" gate (DINA) model is presented. Gibbs sampling is employed to simulate from the joint posterior distribution of item guessing and slipping parameters, subject attribute parameters, and latent class probabilities. The procedure extends concepts in Béguin and Glas,…
Application of geostatistics to risk assessment.
Thayer, William C; Griffith, Daniel A; Goodrum, Philip E; Diamond, Gary L; Hassett, James M
2003-10-01
Geostatistics offers two fundamental contributions to environmental contaminant exposure assessment: (1) a group of methods to quantitatively describe the spatial distribution of a pollutant and (2) the ability to improve estimates of the exposure point concentration by exploiting the geospatial information present in the data. The second contribution is particularly valuable when exposure estimates must be derived from small data sets, which is often the case in environmental risk assessment. This article addresses two topics related to the use of geostatistics in human and ecological risk assessments performed at hazardous waste sites: (1) the importance of assessing model assumptions when using geostatistics and (2) the use of geostatistics to improve estimates of the exposure point concentration (EPC) in the limited data scenario. The latter topic is approached here by comparing design-based estimators that are familiar to environmental risk assessors (e.g., Land's method) with geostatistics, a model-based estimator. In this report, we summarize the basics of spatial weighting of sample data, kriging, and geostatistical simulation. We then explore the two topics identified above in a case study, using soil lead concentration data from a Superfund site (a skeet and trap range). We also describe several areas where research is needed to advance the use of geostatistics in environmental risk assessment.
Bayesian methods for characterizing unknown parameters of material models
Emery, J. M.; Grigoriu, M. D.; Field Jr., R. V.
2016-02-04
A Bayesian framework is developed for characterizing the unknown parameters of probabilistic models for material properties. In this framework, the unknown parameters are viewed as random and described by their posterior distributions obtained from prior information and measurements of quantities of interest that are observable and depend on the unknown parameters. The proposed Bayesian method is applied to characterize an unknown spatial correlation of the conductivity field in the definition of a stochastic transport equation and to solve this equation by Monte Carlo simulation and stochastic reduced order models (SROMs). As a result, the Bayesian method is also employed to characterize unknown parameters of material properties for laser welds from measurements of peak forces sustained by these welds.
Bayesian log-periodic model for financial crashes
NASA Astrophysics Data System (ADS)
Rodríguez-Caballero, Carlos Vladimir; Knapik, Oskar
2014-10-01
This paper introduces a Bayesian approach into the econophysics literature on financial bubbles in order to estimate the most probable time for a financial crash to occur. To this end, we propose using noninformative prior distributions to obtain posterior distributions. Since these distributions cannot be obtained analytically, we develop a Markov Chain Monte Carlo algorithm to draw from the posterior distributions. We consider three Bayesian models that involve normal and Student's t-distributions in the disturbances and an AR(1)-GARCH(1,1) structure only within the first case. In the empirical part of the study, we analyze a well-known example of a financial bubble - the S&P 500 1987 crash - to show the usefulness of the three methods under consideration, and the crashes of Merval-94, Bovespa-97, IPCMX-94, and Hang Seng-97 using the simplest method. The novelty of this research is that the Bayesian models provide 95% credible intervals for the estimated crash time.
Reservoir studies with geostatistics to forecast performance
Tang, R.W.; Behrens, R.A.; Emanuel, A.S.
1991-05-01
In this paper, example geostatistics and streamtube applications are presented for waterflood and CO2 flood in two low-permeability sandstone reservoirs. The hybrid approach of combining fine vertical resolution in cross-sectional models with streamtubes resulted in models that showed water channeling and provided realistic performance estimates. Results indicate that the combination of detailed geostatistical cross sections and fine-grid streamtube models offers a systematic approach for realistic performance forecasts.
Hierarchical Bayesian spatial models for multispecies conservation planning and monitoring.
Carroll, Carlos; Johnson, Devin S; Dunk, Jeffrey R; Zielinski, William J
2010-12-01
Biologists who develop and apply habitat models are often familiar with the statistical challenges posed by their data's spatial structure but are unsure of whether the use of complex spatial models will increase the utility of model results in planning. We compared the relative performance of nonspatial and hierarchical Bayesian spatial models for three vertebrate and invertebrate taxa of conservation concern (Church's sideband snails [Monadenia churchi], red tree voles [Arborimus longicaudus], and Pacific fishers [Martes pennanti pacifica]) that provide examples of a range of distributional extents and dispersal abilities. We used presence-absence data derived from regional monitoring programs to develop models with both landscape and site-level environmental covariates. We used Markov chain Monte Carlo algorithms and a conditional autoregressive or intrinsic conditional autoregressive model framework to fit spatial models. The fit of Bayesian spatial models was between 35 and 55% better than the fit of nonspatial analogue models. Bayesian spatial models outperformed analogous models developed with maximum entropy (Maxent) methods. Although the best spatial and nonspatial models included similar environmental variables, spatial models provided estimates of residual spatial effects that suggested how ecological processes might structure distribution patterns. Spatial models built from presence-absence data improved fit most for localized endemic species with ranges constrained by poorly known biogeographic factors and for widely distributed species suspected to be strongly affected by unmeasured environmental variables or population processes. By treating spatial effects as a variable of interest rather than a nuisance, hierarchical Bayesian spatial models, especially when they are based on a common broad-scale spatial lattice (here the national Forest Inventory and Analysis grid of 24 km² hexagons), can increase the relevance of habitat models to multispecies
Measuring Learning Progressions Using Bayesian Modeling in Complex Assessments
ERIC Educational Resources Information Center
Rutstein, Daisy Wise
2012-01-01
This research examines issues regarding model estimation and robustness in the use of Bayesian Inference Networks (BINs) for measuring Learning Progressions (LPs). It provides background information on LPs and how they might be used in practice. Two simulation studies are performed, along with real data examples. The first study examines the case…
Shortlist B: A Bayesian Model of Continuous Speech Recognition
ERIC Educational Resources Information Center
Norris, Dennis; McQueen, James M.
2008-01-01
A Bayesian model of continuous speech recognition is presented. It is based on Shortlist (D. Norris, 1994; D. Norris, J. M. McQueen, A. Cutler, & S. Butterfield, 1997) and shares many of its key assumptions: parallel competitive evaluation of multiple lexical hypotheses, phonologically abstract prelexical and lexical representations, a feedforward…
Geostatistics and petroleum geology
Hohn, M.E.
1988-01-01
The book reviewed is designed as a practical guide to geostatistics or kriging for petroleum geologists. The author's aim in the book is to explain geostatistics as a working tool for petroleum geologists through extensive use of case-study material, mostly drawn from his own research in gas potential evaluation in West Virginia. Theory and mathematics are pared down to immediate needs.
Modeling Unreliable Observations in Bayesian Networks by Credal Networks
NASA Astrophysics Data System (ADS)
Antonucci, Alessandro; Piatti, Alberto
Bayesian networks are probabilistic graphical models widely employed in AI for the implementation of knowledge-based systems. Standard inference algorithms can update the beliefs about a variable of interest in the network after the observation of some other variables. This is usually achieved under the assumption that the observations could reveal the actual states of the variables in a fully reliable way. We propose a procedure for a more general modeling of the observations, which allows for updating beliefs in different situations, including various cases of unreliable, incomplete, uncertain and also missing observations. This is achieved by augmenting the original Bayesian network with a number of auxiliary variables corresponding to the observations. For a flexible modeling of the observational process, the quantification of the relations between these auxiliary variables and those of the original Bayesian network is done by credal sets, i.e., convex sets of probability mass functions. Without any lack of generality, we show how this can be done by simply estimating the bounds of likelihoods of the observations for the different values of the observed variables. Overall, the Bayesian network is transformed into a credal network, for which a standard updating problem has to be solved. Finally, a number of transformations that might simplify the updating of the resulting credal network is provided.
Empirical evaluation of scoring functions for Bayesian network model selection.
Liu, Zhifa; Malone, Brandon; Yuan, Changhe
2012-01-01
In this work, we empirically evaluate the capability of various scoring functions of Bayesian networks for recovering true underlying structures. Similar investigations have been carried out before, but they typically relied on approximate learning algorithms to learn the network structures. The suboptimal structures found by the approximation methods have unknown quality and may affect the reliability of their conclusions. Our study uses an optimal algorithm to learn Bayesian network structures from datasets generated from a set of gold standard Bayesian networks. Because all optimal algorithms always learn equivalent networks, this ensures that only the choice of scoring function affects the learned networks. Another shortcoming of the previous studies stems from their use of random synthetic networks as test cases. There is no guarantee that these networks reflect real-world data. We use real-world data to generate our gold-standard structures, so our experimental design more closely approximates real-world situations. A major finding of our study suggests that, in contrast to results reported by several prior works, the Minimum Description Length (MDL) (or equivalently, Bayesian information criterion (BIC)) consistently outperforms other scoring functions such as Akaike's information criterion (AIC), Bayesian Dirichlet equivalence score (BDeu), and factorized normalized maximum likelihood (fNML) in recovering the underlying Bayesian network structures. We believe this finding is a result of using both datasets generated from real-world applications rather than from random processes used in previous studies and learning algorithms to select high-scoring structures rather than selecting random models. Other findings of our study support existing work, e.g., large sample sizes result in learning structures closer to the true underlying structure; the BDeu score is sensitive to the parameter settings; and the fNML performs pretty well on small datasets. We also
NASA Astrophysics Data System (ADS)
Chahal, M. K.; Brown, D. J.; Brooks, E. S.; Campbell, C.; Cobos, D. R.; Vierling, L. A.
2012-12-01
Estimating soil moisture content continuously over space and time using geo-statistical techniques supports the refinement of process-based watershed hydrology models and the application of soil process models (e.g. biogeochemical models predicting greenhouse gas fluxes) to complex landscapes. In this study, we model soil profile volumetric moisture content for five agricultural fields with loess soils in the Palouse region of Eastern Washington and Northern Idaho. Using a combination of stratification and space-filling techniques, we selected 42 representative and distributed measurement locations in the Cook Agronomy Farm (Pullman, WA) and 12 locations each in four additional grower fields that span the precipitation gradient across the Palouse. At each measurement location, soil moisture was measured on an hourly basis at five different depths (30, 60, 90, 120, and 150 cm) using Decagon 5-TE/5-TM soil moisture sensors (Decagon Devices, Pullman, WA, USA). This data was collected over three years for the Cook Agronomy Farm and one year for each of the grower fields. In addition to ordinary kriging, we explored the correlation of volumetric water content with external, spatially exhaustive indices derived from terrain models, optical remote sensing imagery, and proximal soil sensing data (electromagnetic induction and VisNIR penetrometer)
Uncertainties in ozone concentrations predicted with a Lagrangian photochemical air quality model have been estimated using Bayesian Monte Carlo (BMC) analysis. Bayesian Monte Carlo analysis provides a means of combining subjective "prior" uncertainty estimates developed ...
Flipo, Nicolas; Jeannée, Nicolas; Poulin, Michel; Even, Stéphanie; Ledoux, Emmanuel
2007-03-01
The objective of this work is to combine several approaches to better understand nitrate fate in the Grand Morin aquifers (2700 km²), part of the Seine basin. cawaqs results from the coupling of the hydrogeological model newsam with the hydrodynamic and biogeochemical river model ProSe. cawaqs is coupled with the agronomic model Stics in order to simulate nitrate migration in basins. First, kriging provides a satisfactory representation of aquifer nitrate contamination from local observations, which is used to set initial conditions for the physically based model. Then the associated confidence intervals, derived from the data using geostatistics, are used to validate cawaqs results. Results and evaluation obtained from the combination of these approaches are given (period 1977-1988). Then cawaqs is used to simulate nitrate fate for a 20-year period (1977-1996). The mean increase in nitrate concentration in the aquifers is 0.09 mgN L⁻¹ yr⁻¹, resulting from an average infiltration flux of 3500 kgN km⁻² yr⁻¹.
Bayesian neural network for rainfall-runoff modeling
NASA Astrophysics Data System (ADS)
Khan, Mohammad Sajjad; Coulibaly, Paulin
2006-07-01
In this paper, a Bayesian learning approach is introduced to train a multilayer feed-forward network for daily river flow and reservoir inflow simulation in a cold region river basin in Canada. In the Bayesian approach, uncertainty about the relationship between inputs and outputs is initially taken care of by an assumed prior distribution of parameters (weights and biases). This prior distribution is updated to a posterior distribution using a likelihood function following Bayes' theorem as data are observed. This posterior distribution is called the objective function of a network in the Bayesian learning approach. The objective function is maximized using a suitable optimization technique. Once the network is trained, the predictive distribution of the network outputs is obtained by integrating over the posterior distribution of weights. In this study, a Gaussian prior distribution and a Gaussian noise model are used in defining the posterior distribution. The network has been optimized using a scaled conjugate gradient technique. The posterior distribution of weights is approximated as Gaussian during prediction. Prediction performance of the Bayesian neural network (BNN) is compared with the results obtained from a standard artificial neural network (ANN) model and a widely used conceptual rainfall-runoff model, namely, HBV-96. The BNN model outperformed the conceptual model and slightly outperformed the standard ANN model in simulating mean, peak, and low river flows and reservoir inflows. The significant contribution of the Bayesian method over the conventional ANN approach, among others, is the uncertainty estimation of the outputs in the form of confidence intervals, which are particularly needed in practical water resources applications. Prediction confidence limits (or intervals) indicate the extent to which one can rely on predictions for decision making. It is shown that the BNN can provide reliable streamflow and reservoir inflow forecasts without a loss in model
Bayesian analysis of structural equation models with dichotomous variables.
Lee, Sik-Yum; Song, Xin-Yuan
2003-10-15
Structural equation modelling has been used extensively in the behavioural and social sciences for studying interrelationships among manifest and latent variables. Recently, its uses have been well recognized in medical research. This paper introduces a Bayesian approach to analysing general structural equation models with dichotomous variables. In the posterior analysis, the observed dichotomous data are augmented with the hypothetical missing values, which involve the latent variables in the model and the unobserved continuous measurements underlying the dichotomous data. An algorithm based on the Gibbs sampler is developed for drawing the parameters values and the hypothetical missing values from the joint posterior distributions. Useful statistics, such as the Bayesian estimates and their standard error estimates, and the highest posterior density intervals, can be obtained from the simulated observations. A posterior predictive p-value is used to test the goodness-of-fit of the posited model. The methodology is applied to a study of hypertensive patient non-adherence to medication.
Spatial Bayesian hierarchical modelling of extreme sea states
NASA Astrophysics Data System (ADS)
Clancy, Colm; O'Sullivan, John; Sweeney, Conor; Dias, Frédéric; Parnell, Andrew C.
2016-11-01
A Bayesian hierarchical framework is used to model extreme sea states, incorporating a latent spatial process to more effectively capture the spatial variation of the extremes. The model is applied to a 34-year hindcast of significant wave height off the west coast of Ireland. The generalised Pareto distribution is fitted to declustered peaks over a threshold given by the 99.8th percentile of the data. Return levels of significant wave height are computed and compared against those from a model based on the commonly-used maximum likelihood inference method. The Bayesian spatial model produces smoother maps of return levels. Furthermore, this approach greatly reduces the uncertainty in the estimates, thus providing information on extremes which is more useful for practical applications.
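The peaks-over-threshold step mentioned in this abstract can be sketched as follows. This covers only the maximum-likelihood baseline the authors compare against, not their Bayesian spatial model; it skips declustering, and the synthetic Weibull wave heights, the 3-hourly sampling rate, and all variable names are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import genpareto

# Synthetic stand-in for a 34-year hindcast of significant wave height (m).
rng = np.random.default_rng(42)
years = 34
obs_per_year = 2920  # assumed 3-hourly records; not stated in the abstract
hs = 3.0 * rng.weibull(1.5, size=years * obs_per_year)

# Threshold at the 99.8th percentile, as in the abstract.
threshold = np.quantile(hs, 0.998)
excess = hs[hs > threshold] - threshold
exceed_prob = excess.size / hs.size

# Maximum-likelihood fit of the generalised Pareto distribution to the
# excesses, with the location fixed at zero. No declustering is applied here.
shape, _, scale = genpareto.fit(excess, floc=0)

def return_level(period_years):
    """Level exceeded on average once every `period_years` years."""
    expected_exceedances = period_years * obs_per_year * exceed_prob
    quantile = 1.0 - 1.0 / expected_exceedances
    return threshold + genpareto.ppf(quantile, shape, loc=0, scale=scale)

levels = {period: return_level(period) for period in (10, 50, 100)}
```

Using the quantile function rather than the closed-form return-level expression avoids special-casing a shape parameter close to zero.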
Bayesian non parametric modelling of Higgs pair production
NASA Astrophysics Data System (ADS)
Scarpa, Bruno; Dorigo, Tommaso
2017-03-01
Statistical classification models are commonly used to separate a signal from a background. In this talk we face the problem of isolating the signal of Higgs pair production using the decay channel in which each boson decays into a pair of b-quarks. Typically in this context non parametric methods are used, such as Random Forests or different types of boosting tools. We remain in the same non-parametric framework, but we propose to face the problem following a Bayesian approach. A Dirichlet process is used as prior for the random effects in a logit model which is fitted by leveraging the Polya-Gamma data augmentation. Refinements of the model include the insertion in the simple model of P-splines to relate explanatory variables with the response and the use of Bayesian trees (BART) to describe the atoms in the Dirichlet process.
NASA Astrophysics Data System (ADS)
Parasyris, Antonios E.; Spanoudaki, Katerina; Kampanis, Nikolaos A.
2016-04-01
Groundwater level monitoring networks provide essential information for water resources management, especially in areas with significant groundwater exploitation for agricultural and domestic use. Given the high maintenance costs of these networks, the development of tools that regulators can use for efficient network design is essential. In this work, a monitoring network optimisation tool is presented. The network optimisation tool couples geostatistical modelling based on the Spartan family variogram with a genetic algorithm method and is applied to the Mires basin in Crete, Greece, an area of high socioeconomic and agricultural interest, which suffers from groundwater overexploitation leading to a dramatic decrease of groundwater levels. The purpose of the optimisation tool is to determine which wells to exclude from the monitoring network because they add little or no beneficial information to groundwater level mapping of the area. Unlike previous relevant investigations, the network optimisation tool presented here uses Ordinary Kriging with the recently established non-differentiable Spartan variogram for groundwater level mapping, which, based on a previous geostatistical study in the area, leads to optimal groundwater level mapping. Seventy boreholes operate in the area for groundwater abstraction and water level monitoring. The Spartan variogram gives overall the most accurate groundwater level estimates, followed closely by the power-law model. The geostatistical model is coupled to an integer genetic algorithm method programmed in MATLAB 2015a. The algorithm is used to find the set of wells whose removal leads to the minimum error between the original water level mapping using all the available wells in the network and the groundwater level mapping using the reduced well network (error is defined as the 2-norm of the difference between the original mapping matrix with 70 wells and the mapping matrix of the reduced well network). The solution to the
Wang, F.P.; Dai, J.; Kerans, C.
1998-11-01
In part 1 of this paper, the authors discussed the rock-fabric/petrophysical classes for dolomitized carbonate-ramp rocks, the effects of rock fabric and pore type on petrophysical properties, petrophysical models for analyzing wireline logs, the critical scales for defining geologic framework, and 3-D geologic modeling. Part 2 focuses on geophysical and engineering characterizations, including seismic modeling, reservoir geostatistics, stochastic modeling, and reservoir simulation. Synthetic seismograms of 30 to 200 Hz were generated to study the level of seismic resolution required to capture the high-frequency geologic features in dolomitized carbonate-ramp reservoirs. Outcrop data were collected to investigate effects of sampling interval and scale-up of block size on geostatistical parameters. Semivariogram analysis of outcrop data showed that the sill of log permeability decreases and the correlation length increases with an increase of horizontal block size. Permeability models were generated using conventional linear interpolation, stochastic realizations without stratigraphic constraints, and stochastic realizations with stratigraphic constraints. Simulations of a fine-scale Lawyer Canyon outcrop model were used to study the factors affecting waterflooding performance. Simulation results show that waterflooding performance depends strongly on the geometry and stacking pattern of the rock-fabric units and on the location of production and injection wells.
Bayesian and maximum likelihood estimation of hierarchical response time models
Farrell, Simon; Ludwig, Casimir
2008-01-01
Hierarchical (or multilevel) statistical models have become increasingly popular in psychology in the last few years. We consider the application of multilevel modeling to the ex-Gaussian, a popular model of response times. Single-level estimation is compared with hierarchical estimation of parameters of the ex-Gaussian distribution. Additionally, for each approach maximum likelihood (ML) estimation is compared with Bayesian estimation. A set of simulations and analyses of parameter recovery show that although all methods perform adequately well, hierarchical methods are better able to recover the parameters of the ex-Gaussian by reducing the variability in recovered parameters. At each level, little overall difference was observed between the ML and Bayesian methods. PMID:19001592
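The single-level maximum likelihood case in this abstract can be illustrated with a short parameter-recovery sketch. It assumes SciPy's `exponnorm` parameterisation (shape K = tau / sigma, loc = mu, scale = sigma) and uses made-up generating values; the hierarchical and Bayesian variants the authors compare are not shown.

```python
import numpy as np
from scipy.stats import exponnorm, skew

# Simulate response times (s) as Gaussian plus exponential: the ex-Gaussian.
rng = np.random.default_rng(1)
mu, sigma, tau = 0.40, 0.05, 0.15  # illustrative values, not from the paper
rt = rng.normal(mu, sigma, size=5000) + rng.exponential(tau, size=5000)

# Method-of-moments starting values to help the optimiser.
m, s, g = rt.mean(), rt.std(ddof=1), skew(rt)
tau0 = s * (max(g, 1e-3) / 2.0) ** (1.0 / 3.0)
sigma0 = np.sqrt(max(s**2 - tau0**2, 1e-6))
mu0 = m - tau0

# Single-level maximum likelihood fit; fit() returns (K, loc, scale).
K_hat, mu_hat, sigma_hat = exponnorm.fit(rt, tau0 / sigma0,
                                         loc=mu0, scale=sigma0)
tau_hat = K_hat * sigma_hat
```

Repeating this over many simulated participants with few trials each would show the variability in recovered parameters that hierarchical estimation is reported to reduce.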
Bayesian model updating using incomplete modal data without mode matching
NASA Astrophysics Data System (ADS)
Sun, Hao; Büyüköztürk, Oral
2016-04-01
This study investigates a new probabilistic strategy for model updating using incomplete modal data. A hierarchical Bayesian inference is employed to model the updating problem. A Markov chain Monte Carlo technique with adaptive random-walk steps is used to draw parameter samples for uncertainty quantification. Mode matching between measured and predicted modal quantities is not required through model reduction. We employ an iterated improved reduced system technique for model reduction. The reduced model retains the dynamic features as close as possible to those of the model before reduction. The proposed algorithm is finally validated by an experimental example.
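The idea of adaptive random-walk steps can be sketched in one dimension as follows. This is a generic random-walk Metropolis sampler whose step size is tuned during a warm-up phase toward a target acceptance rate; the tuning rule, the 0.44 target, and all names are assumptions for illustration and are not the authors' algorithm.

```python
import math
import random


def adaptive_rw_metropolis(log_post, x0, n_samples, n_adapt=2000,
                           step=1.0, target=0.44, seed=0):
    """1-D random-walk Metropolis with step-size adaptation during warm-up."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples, window_accepts = [], 0
    for i in range(n_adapt + n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            window_accepts += 1
        if i < n_adapt:
            if (i + 1) % 100 == 0:
                # Widen the step if accepting too often, narrow it otherwise.
                rate = window_accepts / 100
                step *= 1.1 if rate > target else 1.0 / 1.1
                window_accepts = 0
        else:
            samples.append(x)  # keep draws only after the step is frozen
    return samples, step
```

Freezing the step size after warm-up keeps the retained draws a valid Markov chain for the target density.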
Bayesian Estimation of Categorical Dynamic Factor Models
ERIC Educational Resources Information Center
Zhang, Zhiyong; Nesselroade, John R.
2007-01-01
Dynamic factor models have been used to analyze continuous time series behavioral data. We extend 2 main dynamic factor model variations--the direct autoregressive factor score (DAFS) model and the white noise factor score (WNFS) model--to categorical DAFS and WNFS models in the framework of the underlying variable method and illustrate them with…
Application of a predictive Bayesian model to environmental accounting.
Anex, R P; Englehardt, J D
2001-03-30
Environmental accounting techniques are intended to capture important environmental costs and benefits that are often overlooked in standard accounting practices. Environmental accounting methods themselves often ignore or inadequately represent large but highly uncertain environmental costs and costs conditioned by specific prior events. Use of a predictive Bayesian model is demonstrated for the assessment of such highly uncertain environmental and contingent costs. The predictive Bayesian approach presented generates probability distributions for the quantity of interest (rather than parameters thereof). A spreadsheet implementation of a previously proposed predictive Bayesian model, extended to represent contingent costs, is described and used to evaluate whether a firm should undertake an accelerated phase-out of its PCB-containing transformers. Variability and uncertainty (due to lack of information) in transformer accident frequency and severity are assessed simultaneously using a combination of historical accident data, engineering model-based cost estimates, and subjective judgement. Model results are compared using several different risk measures. Use of the model for incorporation of environmental risk management into a company's overall risk management strategy is discussed.
Application of the Bayesian dynamic survival model in medicine.
He, Jianghua; McGee, Daniel L; Niu, Xufeng
2010-02-10
The Bayesian dynamic survival model (BDSM), a time-varying coefficient survival model from the Bayesian perspective, was proposed in the early 1990s but has not been widely used or discussed. In this paper, we describe the model structure of the BDSM and introduce two estimation approaches for BDSMs: the Markov Chain Monte Carlo (MCMC) approach and the linear Bayesian (LB) method. The MCMC approach estimates model parameters through sampling and is computationally intensive. With the newly developed geoadditive survival models and software BayesX, the BDSM is available for general applications. The LB approach is computationally easier, but it requires the prespecification of some unknown smoothing parameters. In a simulation study, we use the LB approach to show the effects of smoothing parameters on the performance of the BDSM and propose an ad hoc method for identifying appropriate values for those parameters. We also demonstrate the performance of the MCMC approach compared with the LB approach and a penalized partial likelihood method available in R packages. A gastric cancer trial is utilized to illustrate the application of the BDSM.
Niazi, Nabeel K; Bishop, Thomas F A; Singh, Balwant
2011-12-15
This study investigated the spatial variability of total and phosphate-extractable arsenic (As) concentrations in soil adjacent to a cattle-dip site, employing a linear mixed model-based geostatistical approach. The soil samples in the study area (n = 102 in 8.1 m²) were taken at the nodes of a 0.30 × 0.35 m grid. The results showed that total As concentration (0-0.2 m depth) and phosphate-extractable As concentration (at depths of 0-0.2, 0.2-0.4, and 0.4-0.6 m) in soil adjacent to the dip varied greatly. Both total and phosphate-extractable soil As concentrations significantly (p = 0.004-0.048) increased toward the cattle-dip. Using the linear mixed model, we suggest that 5 samples are sufficient to assess a dip site for soil As contamination (95% confidence interval of ±475.9 mg kg⁻¹), but 15 samples (95% confidence interval of ±212.3 mg kg⁻¹) are a desirable baseline when the ultimate goal is to evaluate the effects of phytoremediation. Such guidelines on sampling requirements are crucial for the assessment of As contamination levels at other cattle-dip sites, and to determine the effect of phytoremediation on soil As.
NASA Astrophysics Data System (ADS)
Huysmans, Marijke; Dassargues, Alain
2012-07-01
This study investigates whether fine-scale clay drapes can cause an anisotropic pumping test response at a much larger scale. A pumping test was performed in a sandbar deposit consisting of cross-bedded units composed of materials with different grain sizes and hydraulic conductivities. The measured drawdown values in the different observation wells reveal an anisotropic or elliptically-shaped pumping cone. The major axis of the pumping ellipse is parallel with the strike of cm to m-scale clay drapes that are observed in several outcrops. To determine (1) whether this large-scale anisotropy can be the result of fine-scale clay drapes and (2) whether application of multiple-point geostatistics can improve interpretation of pumping tests, this pumping test is analyzed with a local 3D groundwater model in which fine-scale sedimentary heterogeneity is modelled using multiple-point geostatistics. To reduce CPU and RAM demand of the multiple-point geostatistical simulation step, edge properties indicating the presence of irregularly-shaped surfaces are directly simulated. Results show that the anisotropic pumping cone can be attributed to the presence of the clay drapes. Incorporating fine-scale clay drapes results in a better fit between observed and calculated drawdowns. These results thus show that fine-scale clay drapes can cause an anisotropic pumping test response at a much larger scale and that the combined approach of multiple-point geostatistics and cell edge properties is an efficient method for integrating fine-scale features in larger scale models.
Meyer, Swen; Blaschek, Michael; Duttmann, Rainer; Ludwig, Ralf
2016-02-01
According to current climate projections, Mediterranean countries are at high risk for an even more pronounced susceptibility to changes in the hydrological budget and extremes. These changes are expected to have severe direct impacts on the management of water resources, agricultural productivity and drinking water supply. Current projections of future hydrological change, based on regional climate model results and subsequent hydrological modeling schemes, are very uncertain and poorly validated. The Rio Mannu di San Sperate Basin, located in Sardinia, Italy, is one test site of the CLIMB project. The Water Simulation Model (WaSiM) was set up to model current and future hydrological conditions. The availability of measured meteorological and hydrological data is poor, as is common for many Mediterranean catchments. In this study we conducted a soil sampling campaign in the Rio Mannu catchment. We tested different deterministic and hybrid geostatistical interpolation methods on soil textures and tested the performance of the applied models. We calculated a new soil texture map based on the best prediction method. The soil model in WaSiM was set up with the improved new soil information. The simulation results were compared to standard soil parametrization. WaSiM was validated with spatial evapotranspiration rates using the triangle method (Jiang and Islam, 1999). WaSiM was driven with the meteorological forcing taken from 4 different ENSEMBLES climate projections for a reference (1971-2000) and a future (2041-2070) time series. The climate change impact was assessed based on differences between reference and future time series. The simulated results show a reduction of all hydrological quantities in the future in the spring season. Furthermore, simulation results reveal an earlier onset of dry conditions in the catchment. We show that a solid soil model setup based on short-term field measurements can improve long-term modeling results, which is especially important
Bayesian Inference of High-Dimensional Dynamical Ocean Models
NASA Astrophysics Data System (ADS)
Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.
2015-12-01
This presentation addresses a holistic set of challenges in high-dimension ocean Bayesian nonlinear estimation: (i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); (ii) assimilate data using Bayes' law with these pdfs; (iii) predict the future data that optimally reduce uncertainties; and (iv) rank the known and learn the new model formulations themselves. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.
Bayesian restoration of ion channel records using hidden Markov models.
Rosales, R; Stark, J A; Fitzgerald, W J; Hladky, S B
2001-03-01
Hidden Markov models have been used to restore recorded signals of single ion channels buried in background noise. Parameter estimation and signal restoration are usually carried out through likelihood maximization by using variants of the Baum-Welch forward-backward procedures. This paper presents an alternative approach for dealing with this inferential task. The inferences are made by using a combination of the framework provided by Bayesian statistics and numerical methods based on Markov chain Monte Carlo stochastic simulation. The reliability of this approach is tested by using synthetic signals of known characteristics. The expectations of the model parameters estimated here are close to those calculated using the Baum-Welch algorithm, but the present methods also yield estimates of their errors. Comparisons of the results of the Bayesian Markov Chain Monte Carlo approach with those obtained by filtering and thresholding demonstrate clearly the superiority of the new methods.
A geostatistical approach to contaminant source identification
NASA Astrophysics Data System (ADS)
Snodgrass, Mark F.; Kitanidis, Peter K.
1997-04-01
A geostatistical approach to contaminant source estimation is presented. The problem is to estimate the release history of a conservative solute given point concentration measurements at some time after the release. A Bayesian framework is followed to derive the best estimate and to quantify the estimation error. The relation between this approach and common regularization and interpolation schemes is discussed. The performance of the method is demonstrated for transport in a simple one-dimensional homogeneous medium, although the approach is directly applicable to transport in two- or three-dimensional domains. The methodology produces a best estimate of the release history and a confidence interval. Conditional realizations of the release history are generated that are useful in visualization and risk assessment. The performance of the method with sparse data and large measurement error is examined. Emphasis is placed on formulating the estimation method in a computationally efficient manner. The method does not require the inversion of matrices whose size depends on the grid size used to resolve the solute release history. The issue of model validation is addressed.
Theory-Based Bayesian Models of Inductive Inference
2010-06-30
Oxford University Press. 28. Griffiths, T. L. and Tenenbaum, J. B. (2007). Two proposals for causal grammar. In A. Gopnik and L. Schulz (eds.). Causal Learning. Oxford University Press. 29. Tenenbaum, J. B., Kemp, C., Shafto, P. (2007). Theory-based Bayesian models for inductive reasoning. In A. Feeney and E. Heit (eds.). Induction. Cambridge University Press. 30. Goodman, N. D., Tenenbaum, J. B., Griffiths, T. L., & Feldman, J. (2008). Compositionality in rational analysis: Grammar-based induction for concept
Bayesian hierarchical model for large-scale covariance matrix estimation.
Zhu, Dongxiao; Hero, Alfred O
2007-12-01
Many bioinformatics problems implicitly depend on estimating a large-scale covariance matrix. The traditional approaches tend to give rise to high variance and low accuracy due to "overfitting." We cast the large-scale covariance matrix estimation problem into the Bayesian hierarchical model framework, and introduce dependency between covariance parameters. We demonstrate the advantages of our approaches over the traditional approaches using simulations and OMICS data analysis.
Slice sampling technique in Bayesian extreme of gold price modelling
NASA Astrophysics Data System (ADS)
Rostami, Mohammad; Adam, Mohd Bakri; Ibrahim, Noor Akma; Yahya, Mohamed Hisham
2013-09-01
In this paper, a simulation study of Bayesian extreme value modelling using Markov chain Monte Carlo via the slice sampling algorithm is implemented. We compared the accuracy of slice sampling with other methods for a Gumbel model. This study revealed that the slice sampling algorithm offers more accurate and closer estimates, with lower RMSE, than other methods. Finally, we successfully employed this procedure to estimate the parameters of Malaysian extreme gold prices from 2000 to 2011.
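A minimal sketch of univariate slice sampling with stepping-out and shrinkage is given below. For simplicity it draws directly from a Gumbel density rather than from the posterior of Gumbel parameters studied in the paper; the same routine accepts any log density, including a log posterior. The function names, initial width, and seed are assumptions for illustration.

```python
import math
import random


def gumbel_log_pdf(x, mu=0.0, beta=1.0):
    """Log density of the Gumbel (maximum) distribution."""
    z = (x - mu) / beta
    return -math.log(beta) - z - math.exp(-z)


def slice_sample(log_pdf, x0, n_samples, width=2.0, seed=0):
    """Univariate slice sampler: step out to bracket the slice, then shrink."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        # Auxiliary height under the density, drawn on the log scale.
        log_y = log_pdf(x) + math.log(1.0 - rng.random())
        # Randomly placed initial interval of the given width around x.
        left = x - width * rng.random()
        right = left + width
        while log_pdf(left) > log_y:
            left -= width
        while log_pdf(right) > log_y:
            right += width
        # Draw uniformly from the interval, shrinking it after each rejection.
        while True:
            candidate = rng.uniform(left, right)
            if log_pdf(candidate) > log_y:
                x = candidate
                break
            if candidate < x:
                left = candidate
            else:
                right = candidate
        samples.append(x)
    return samples
```

With location 0 and scale 1, the sample mean should settle near the Euler-Mascheroni constant (about 0.577), which is the mean of the standard Gumbel distribution.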
How to Address Measurement Noise in Bayesian Model Averaging
NASA Astrophysics Data System (ADS)
Schöniger, A.; Wöhling, T.; Nowak, W.
2014-12-01
When confronted with the challenge of selecting one out of several competing conceptual models for a specific modeling task, Bayesian model averaging is a rigorous choice. It ranks the plausibility of models based on Bayes' theorem, which yields an optimal trade-off between performance and complexity. With the resulting posterior model probabilities, their individual predictions are combined into a robust weighted average and the overall predictive uncertainty (including conceptual uncertainty) can be quantified. This rigorous framework does, however, not yet explicitly consider statistical significance of measurement noise in the calibration data set. This is a major drawback, because model weights might be unstable due to the uncertainty in noisy data, which may compromise the reliability of model ranking. We present a new extension to the Bayesian model averaging framework that explicitly accounts for measurement noise as a source of uncertainty for the weights. This enables modelers to assess the reliability of model ranking for a specific application and a given calibration data set. Also, the impact of measurement noise on the overall prediction uncertainty can be determined. Technically, our extension is built within a Monte Carlo framework. We repeatedly perturb the observed data with random realizations of measurement error. Then, we determine the robustness of the resulting model weights against measurement noise. We quantify the variability of posterior model weights as weighting variance. We add this new variance term to the overall prediction uncertainty analysis within the Bayesian model averaging framework to make uncertainty quantification more realistic and "complete". We illustrate the importance of our suggested extension with an application to soil-plant model selection, based on studies by Wöhling et al. (2013, 2014). Results confirm that noise in leaf area index or evaporation rate observations produces a significant amount of weighting
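The perturbation procedure described above can be sketched as follows. As a simplifying assumption, each model is reduced to a fixed vector of predictions with a Gaussian likelihood, so the weights come from a plain likelihood rather than from the marginal likelihood over parameters that full Bayesian model averaging would use. All names and defaults are illustrative.

```python
import math
import random


def model_weights(observations, predictions, noise_sd, priors=None):
    """Posterior model weights from Gaussian likelihoods of fixed predictions."""
    n_models = len(predictions)
    priors = priors or [1.0 / n_models] * n_models
    log_w = []
    for prior, pred in zip(priors, predictions):
        sse = sum((o - p) ** 2 for o, p in zip(observations, pred))
        log_w.append(math.log(prior) - 0.5 * sse / noise_sd ** 2)
    top = max(log_w)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lw - top) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def weighting_variance(observations, predictions, noise_sd,
                       n_draws=500, seed=0):
    """Mean and variance of model weights under repeated data perturbation."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        perturbed = [o + rng.gauss(0.0, noise_sd) for o in observations]
        draws.append(model_weights(perturbed, predictions, noise_sd))
    n_models = len(predictions)
    means = [sum(d[k] for d in draws) / n_draws for k in range(n_models)]
    variances = [sum((d[k] - means[k]) ** 2 for d in draws) / (n_draws - 1)
                 for k in range(n_models)]
    return means, variances
```

A large variance for a model's weight signals that the ranking is sensitive to measurement noise in the calibration data.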
Bayesian Isotonic Regression Dose-response (BIRD) Model.
Li, Wen; Fu, Haoda
2016-12-21
Understanding the dose-response relationship is a crucial step in drug development. There are a few parametric methods to estimate dose-response curves, such as the Emax model and the logistic model. These parametric models are easy to interpret and, hence, widely used. However, these models often require the inclusion of patients on high-dose levels; otherwise, the model parameters cannot be reliably estimated. To have robust estimation, nonparametric models are used. However, these models are not able to estimate certain important clinical parameters, such as ED50 and Emax. Furthermore, in many therapeutic areas, dose-response curves can be assumed as non-decreasing functions. This creates an additional challenge for nonparametric methods. In this paper, we propose a new Bayesian isotonic regression dose-response model which features advantages from both parametric and nonparametric models. The ED50 and Emax can be derived from this model. Simulations are provided to evaluate the Bayesian isotonic regression dose-response model performance against two parametric models. We apply this model to a data set from a diabetes dose-finding study.
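To make the non-decreasing constraint concrete, the sketch below implements the classical pool-adjacent-violators algorithm, the standard least-squares (non-Bayesian) isotonic fit. It is not the Bayesian model proposed in the paper; it only shows what a monotone dose-response fit looks like. The function name and the choice of equal default weights are assumptions.

```python
def pava(values, weights=None):
    """Pool-adjacent-violators: weighted least-squares non-decreasing fit."""
    weights = weights or [1.0] * len(values)
    blocks = []  # each block holds [mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted
```

Given mean responses ordered by increasing dose, `pava` returns fitted values in which any local decrease is replaced by the pooled average of the offending neighbours.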
Bayesian prediction of placebo analgesia in an instrumental learning model
Jung, Won-Mo; Lee, Ye-Seul; Wallraven, Christian; Chae, Younbyoung
2017-01-01
Placebo analgesia can be primarily explained by the Pavlovian conditioning paradigm in which a passively applied cue becomes associated with less pain. In contrast, instrumental conditioning employs an active paradigm that might be more similar to clinical settings. In the present study, an instrumental conditioning paradigm involving a modified trust game in a simulated clinical situation was used to induce placebo analgesia. Additionally, Bayesian modeling was applied to predict the placebo responses of individuals based on their choices. Twenty-four participants engaged in a medical trust game in which decisions to receive treatment from either a doctor (more effective with high cost) or a pharmacy (less effective with low cost) were made after receiving a reference pain stimulus. In the conditioning session, the participants received lower levels of pain following both choices, while high pain stimuli were administered in the test session even after making the decision. The choice-dependent pain in the conditioning session was modulated in terms of both intensity and uncertainty. Participants reported significantly less pain when they chose the doctor or the pharmacy for treatment compared to the control trials. The predicted pain ratings based on Bayesian modeling showed significant correlations with the actual reports from participants for both of the choice categories. The instrumental conditioning paradigm allowed for the active choice of optional cues and was able to induce the placebo analgesia effect. Additionally, Bayesian modeling successfully predicted pain ratings in a simulated clinical situation that fits well with placebo analgesia induced by instrumental conditioning. PMID:28225816
NASA Astrophysics Data System (ADS)
Koike, Katsuaki; Kubo, Taiki; Liu, Chunxue; Masoud, Alaa; Amano, Kenji; Kurihara, Arata; Matsuoka, Toshiyuki; Lanyon, Bill
2015-10-01
This study integrates 3D models of rock fractures from different sources and hydraulic properties aimed at identifying relationships between fractures and permeability. The Tono area in central Japan, chiefly overlain by Cretaceous granite, was examined because of the availability of a unique dataset from deep borehole data at 26 sites. A geostatistical method (GEOFRAC) that can incorporate orientations of sampled data was applied to 50,900 borehole fractures for spatial modeling of fractures over a 12 km by 8 km area, to a depth of 1.5 km. GEOFRAC produced a plausible 3D fracture model, in that the orientations of simulated fractures correspond to those of the sample data and the continuous fractures appeared near a known fault. Small-scale fracture distributions with dominant orientations were also characterized around the two shafts using fracture data from the shaft walls. By integrating the 3D model of hydraulic conductivity using sequential Gaussian simulation with the GEOFRAC fractures from the borehole data, the fracture sizes and directions that strongly affect permeable features were identified. Four fracture-related elements: lineaments from a shaded 10-m DEM, GEOFRAC fractures using the borehole and shaft data, and microcracks from SEM images, were used for correlating fracture attributes at different scales. The consistency of the semivariogram models of distribution densities was identified. Using an experimental relationship between hydraulic conductivity and fracture length, the fractures that typically affect the hydraulic properties at the drift scale were surmised to be in the range 100-200 m. These results are useful for a comprehensive understanding of rock fracture systems and their hydraulic characteristics at multiple scales in a target area.
DISSECTING MAGNETAR VARIABILITY WITH BAYESIAN HIERARCHICAL MODELS
Huppenkothen, Daniela; Elenbaas, Chris; Watts, Anna L.; Horst, Alexander J. van der; Brewer, Brendon J.; Hogg, David W.; Murray, Iain; Frean, Marcus; Levin, Yuri; Kouveliotou, Chryssa
2015-09-01
Neutron stars are a prime laboratory for testing physical processes under conditions of strong gravity, high density, and extreme magnetic fields. Among the zoo of neutron star phenomena, magnetars stand out for their bursting behavior, ranging from extremely bright, rare giant flares to numerous, less energetic recurrent bursts. The exact trigger and emission mechanisms for these bursts are not known; favored models involve either a crust fracture and subsequent energy release into the magnetosphere, or explosive reconnection of magnetic field lines. In the absence of a predictive model, understanding the physical processes responsible for magnetar burst variability is difficult. Here, we develop an empirical model that decomposes magnetar bursts into a superposition of small spike-like features with a simple functional form, where the number of model components is itself part of the inference problem. The cascades of spikes that we model might be formed by avalanches of reconnection, or crust rupture aftershocks. Using Markov Chain Monte Carlo sampling augmented with reversible jumps between models with different numbers of parameters, we characterize the posterior distributions of the model parameters and the number of components per burst. We relate these model parameters to physical quantities in the system, and show for the first time that the variability within a burst does not conform to predictions from ideas of self-organized criticality. We also examine how well the properties of the spikes fit the predictions of simplified cascade models for the different trigger mechanisms.
A Bayesian Nonparametric Meta-Analysis Model
ERIC Educational Resources Information Center
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G.
2015-01-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall…
3-D model-based Bayesian classification
Soenneland, L.; Tenneboe, P.; Gehrmann, T.; Yrke, O.
1994-12-31
The challenging task of the interpreter is to integrate different pieces of information and combine them into an earth model. The sophistication level of this earth model might vary from the simplest geometrical description to the most complex set of reservoir parameters related to the geometrical description. Obviously the sophistication level also depends on the completeness of the available information. The authors describe the interpreter's task as a mapping between the observation space and the model space. The information available to the interpreter exists in observation space and the task is to infer a model in model space. It is well-known that this inversion problem is non-unique. Therefore any attempt to find a solution depends on constraints being added in some manner. The solution will obviously depend on which constraints are introduced and it would be desirable to allow the interpreter to modify the constraints in a problem-dependent manner. They will present a probabilistic framework that gives the interpreter the tools to integrate the different types of information and produce constrained solutions. The constraints can be adapted to the problem at hand.
Bayesian Thurstonian models for ranking data using JAGS.
Johnson, Timothy R; Kuhn, Kristine M
2013-09-01
A Thurstonian model for ranking data assumes that observed rankings are consistent with those of a set of underlying continuous variables. This model is appealing since it renders ranking data amenable to familiar models for continuous response variables, namely linear regression models. To date, however, the use of Thurstonian models for ranking data has been very rare in practice. One reason for this may be that inferences based on these models require specialized technical methods. These methods have been developed to address computational challenges involved in these models but are not easy to implement without considerable technical expertise and are not widely available in software packages. To address this limitation, we show that Bayesian Thurstonian models for ranking data can be very easily implemented with the JAGS software package. We provide JAGS model files for Thurstonian ranking models for general use, discuss their implementation, and illustrate their use in analyses.
Dummer, T J B; Yu, Z M; Nauta, L; Murimboh, J D; Parker, L
2015-02-01
Arsenic is a naturally occurring class 1 human carcinogen that is widespread in private drinking water wells throughout the province of Nova Scotia in Canada. In this paper we explore the spatial variation in toenail arsenic concentrations (arsenic body burden) in Nova Scotia. We describe the regional distribution of arsenic concentrations in private well water supplies in the province, and evaluate the geological and environmental features associated with higher levels of arsenic in well water. We develop geostatistical process models to predict high toenail arsenic concentrations and high well water arsenic concentrations, which have utility for studies where no direct measurements of arsenic body burden or arsenic exposure are available. 892 men and women who participated in the Atlantic Partnership for Tomorrow's Health Project provided both drinking water and toenail clipping samples. Information on socio-demographic, lifestyle and health factors was obtained with a set of standardized questionnaires. Anthropometric indices and arsenic concentrations in drinking water and toenails were measured. In addition, data on arsenic concentrations in 10,498 private wells were provided by the Nova Scotia Department of Environment. We utilised stepwise multivariable logistic regression modelling to develop separate statistical models to: a) predict high toenail arsenic concentrations (defined as toenail arsenic levels ≥0.12 μg g⁻¹) and b) predict high well water arsenic concentrations (defined as well water arsenic levels ≥5.0 μg L⁻¹). We found that the geological and environmental information that predicted well water arsenic concentrations can also be used to accurately predict toenail arsenic concentrations. We conclude that geological and environmental factors contributing to arsenic contamination in well water are the major contributing influences on arsenic body burden among Nova Scotia residents. Further studies are warranted to assess appropriate
NASA Astrophysics Data System (ADS)
Tadeu Pereira, Gener; Ribeiro de Oliveira, Ismênia; De Bortoli Teixeira, Daniel; Arantes Camargo, Livia; Rodrigo Panosso, Alan; Marques, José, Jr.
2015-04-01
Phosphorus is one of the limiting nutrients for sugarcane development in Brazilian soils. The spatial variability of this nutrient is high and is governed by the properties that control its adsorption and desorption reactions. Spatial estimates to characterize this variability are based on geostatistical interpolation. Thus, assessing the uncertainty of estimates associated with the spatial distribution of available P (Plabile) is decisive for optimizing the use of phosphate fertilizers. The purpose of this study was to evaluate the performance of sequential Gaussian simulation (sGs) and ordinary kriging (OK) in modeling the uncertainty of available P estimates. A sampling grid with 626 points was established in a 200-ha experimental sugarcane field in Tabapuã, São Paulo State, Brazil. The soil was sampled at the nodes of a regular grid with intervals of 50 m. Before the geostatistical modeling, 63 points (approximately 10% of the sampled points) were randomly selected to form a validation data set, while the remaining 563 points were used to predict the variable at unsampled locations. The sGs generated 200 realizations, from which different measures of estimation and uncertainty were obtained. The standard deviation, calculated point by point across all simulated maps, provided the deviation map used to assess local uncertainty. Visual analysis of the E-type and OK maps showed that the spatial patterns produced by both methods were similar; however, the characteristic smoothing effect of OK was observable, especially in regions with extreme values. The standardized variograms of selected sGs realizations showed both range and model similar to the variogram of the observed Plabile data. The OK variogram showed a structure distinct from that of the observed data, underestimating the variability over short distances and presenting parabolic behavior near
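The post-processing of simulated realizations described above can be sketched as follows: the E-type estimate is the point-by-point mean over realizations, the local uncertainty is the point-by-point standard deviation, and a hold-out set gives a validation error. The simulation itself is not shown, and the function names and the list-of-lists layout (one inner list per realization) are assumptions for illustration.

```python
import math


def etype_and_sd(realizations):
    """Point-by-point mean (E-type) and standard deviation over realizations."""
    n = len(realizations)
    n_points = len(realizations[0])
    means, sds = [], []
    for j in range(n_points):
        column = [r[j] for r in realizations]
        m = sum(column) / n
        var = sum((v - m) ** 2 for v in column) / (n - 1)
        means.append(m)
        sds.append(math.sqrt(var))
    return means, sds


def rmse(predicted, observed):
    """Root-mean-square error at the validation locations."""
    sq = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(sq) / len(sq))
```

Computing `rmse` for the E-type values and for the kriged values at the same validation points gives a like-for-like comparison of the two estimators.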
Predicting coastal cliff erosion using a Bayesian probabilistic model
Hapke, C.; Plant, N.
2010-01-01
Regional coastal cliff retreat is difficult to model due to the episodic nature of failures and the along-shore variability of retreat events. There is a growing demand, however, for predictive models that can be used to forecast areas vulnerable to coastal erosion hazards. Increasingly, probabilistic models are being employed that require data sets of high temporal density to define the joint probability density function that relates forcing variables (e.g. wave conditions) and initial conditions (e.g. cliff geometry) to erosion events. In this study we use a multi-parameter Bayesian network to investigate correlations between key variables that control and influence variations in cliff retreat processes. The network uses Bayesian statistical methods to estimate event probabilities using existing observations. Within this framework, we forecast the spatial distribution of cliff retreat along two stretches of cliffed coast in Southern California. The input parameters are the height and slope of the cliff, a descriptor of material strength based on the dominant cliff-forming lithology, and the long-term cliff erosion rate that represents prior behavior. The model is forced using predicted wave impact hours. Results demonstrate that the Bayesian approach is well-suited to the forward modeling of coastal cliff retreat, with the correct outcomes forecast in 70-90% of the modeled transects. The model also performs well in identifying specific locations of high cliff erosion, thus providing a foundation for hazard mapping. This approach can be employed to predict cliff erosion at time-scales ranging from storm events to the impacts of sea-level rise at the century-scale. © 2010.
DPpackage: Bayesian Semi- and Nonparametric Modeling in R
Jara, Alejandro; Hanson, Timothy E.; Quintana, Fernando A.; Müller, Peter; Rosner, Gary L.
2011-01-01
Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian non- and semi-parametric models in R, DPpackage. Currently DPpackage includes models for marginal and conditional density estimation, ROC curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison, and for eliciting the precision parameter of the Dirichlet process prior. To maximize computational efficiency, the actual sampling for each model is carried out using compiled FORTRAN. PMID:21796263
Bayesian calibration of hyperelastic constitutive models of soft tissue.
Madireddy, Sandeep; Sista, Bhargava; Vemaganti, Kumar
2016-06-01
There is inherent variability in the experimental response used to characterize the hyperelastic mechanical response of soft tissues. This has to be accounted for while estimating the parameters in the constitutive models to obtain reliable estimates of the quantities of interest. The traditional least squares method of parameter estimation does not give due importance to this variability. We use a Bayesian calibration framework based on nested Monte Carlo sampling to account for the variability in the experimental data and its effect on the estimated parameters through a systematic probability-based treatment. We consider three different constitutive models to represent the hyperelastic nature of soft tissue: Mooney-Rivlin model, exponential model, and Ogden model. Three stress-strain data sets corresponding to the deformation of agarose gel, bovine liver tissue, and porcine brain tissue are considered. Bayesian fits and parameter estimates are compared with the corresponding least squares values. Finally, we propagate the uncertainty in the parameters to a quantity of interest (QoI), namely the force-indentation response, to study the effect of model form on the values of the QoI. Our results show that the quality of the fit alone is insufficient to determine the adequacy of the model, and due importance has to be given to the maximum likelihood value, the landscape of the likelihood distribution, and model complexity.
Bayesian analysis of physiologically based toxicokinetic and toxicodynamic models.
Hack, C Eric
2006-04-17
Physiologically based toxicokinetic (PBTK) and toxicodynamic (TD) models of bromate in animals and humans would improve our ability to accurately estimate the toxic doses in humans based on available animal studies. These mathematical models are often highly parameterized and must be calibrated in order for the model predictions of internal dose to adequately fit the experimentally measured doses. Highly parameterized models are difficult to calibrate and it is difficult to obtain accurate estimates of uncertainty or variability in model parameters with commonly used frequentist calibration methods, such as maximum likelihood estimation (MLE) or least squared error approaches. The Bayesian approach called Markov chain Monte Carlo (MCMC) analysis can be used to successfully calibrate these complex models. Prior knowledge about the biological system and associated model parameters is easily incorporated in this approach in the form of prior parameter distributions, and the distributions are refined or updated using experimental data to generate posterior distributions of parameter estimates. The goal of this paper is to give the non-mathematician a brief description of the Bayesian approach and Markov chain Monte Carlo analysis, how this technique is used in risk assessment, and the issues associated with this approach.
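As a minimal illustration of the prior-to-posterior updating described in this abstract, the sketch below runs a random-walk Metropolis sampler (one simple MCMC variant) on a toy one-parameter problem. The data values, the normal prior, the known noise level, and all function names are hypothetical and are not taken from the bromate PBTK/TD models.

```python
import math
import random


def log_posterior(theta, data, prior_mean=0.0, prior_sd=10.0, noise_sd=1.0):
    """Unnormalised log posterior: normal prior on theta, normal likelihood."""
    log_prior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
    log_lik = sum(-0.5 * ((y - theta) / noise_sd) ** 2 for y in data)
    return log_prior + log_lik


def metropolis(data, n_iter=20000, step=0.5, start=0.0, seed=1):
    """Random-walk Metropolis: propose theta + noise, accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    theta = start
    current = log_posterior(theta, data)
    samples = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


# Hypothetical measurements; the first 5000 draws are discarded as burn-in.
data = [2.1, 1.8, 2.5, 2.0, 1.6]
samples = metropolis(data)
kept = samples[5000:]
posterior_mean = sum(kept) / len(kept)
```

For this conjugate toy case the exact posterior mean is 5 × 2.0 / 5.01 ≈ 1.996, so the sampler's output can be checked against a known answer, which is rarely possible for highly parameterized models.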
Estimating anatomical trajectories with Bayesian mixed-effects modeling
Ziegler, G.; Penny, W.D.; Ridgway, G.R.; Ourselin, S.; Friston, K.J.
2015-01-01
We introduce a mass-univariate framework for the analysis of whole-brain structural trajectories using longitudinal Voxel-Based Morphometry data and Bayesian inference. Our approach to developmental and aging longitudinal studies characterizes heterogeneous structural growth/decline between and within groups. In particular, we propose a probabilistic generative model that parameterizes individual and ensemble average changes in brain structure using linear mixed-effects models of age and subject-specific covariates. Model inversion uses Expectation Maximization (EM), while voxelwise (empirical) priors on the size of individual differences are estimated from the data. Bayesian inference on individual and group trajectories is realized using Posterior Probability Maps (PPM). In addition to parameter inference, the framework affords comparisons of models with varying combinations of model order for fixed and random effects using model evidence. We validate the model in simulations and real MRI data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) project. We further demonstrate how subject specific characteristics contribute to individual differences in longitudinal volume changes in healthy subjects, Mild Cognitive Impairment (MCI), and Alzheimer's Disease (AD). PMID:26190405
Modeling Women's Menstrual Cycles using PICI Gates in Bayesian Network.
Zagorecki, Adam; Łupińska-Dubicka, Anna; Voortman, Mark; Druzdzel, Marek J
2016-03-01
A major difficulty in building Bayesian network (BN) models is the size of conditional probability tables, which grow exponentially in the number of parents. One way of dealing with this problem is through parametric conditional probability distributions that usually require only a number of parameters that is linear in the number of parents. In this paper, we introduce a new class of parametric models, the Probabilistic Independence of Causal Influences (PICI) models, that aim at lowering the number of parameters required to specify local probability distributions, but are still capable of efficiently modeling a variety of interactions. A subset of PICI models is decomposable and this leads to significantly faster inference as compared to models that cannot be decomposed. We present an application of the proposed method to learning dynamic BNs for modeling a woman's menstrual cycle. We show that PICI models are especially useful for parameter learning from small data sets and lead to higher parameter accuracy than when learning CPTs.
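The parameter-count argument can be made concrete with the noisy-OR gate, a classic independence-of-causal-influences model. It is used here only as a familiar stand-in and is not the PICI family introduced in the paper. The link probabilities, the leak value, and the function names are hypothetical.

```python
from itertools import product


def full_cpt_size(n_parents):
    """Rows in a full CPT for a binary child with n binary parents."""
    return 2 ** n_parents


def noisy_or(parent_states, link_probs, leak=0.0):
    """P(child = 1 | parents) under noisy-OR.

    link_probs[i] is the probability that parent i alone produces the effect;
    leak is the probability of the effect when no parent is active.
    """
    p_no_effect = 1.0 - leak
    for active, p in zip(parent_states, link_probs):
        if active:
            p_no_effect *= 1.0 - p
    return 1.0 - p_no_effect


# Three parents: 3 link parameters (plus a leak) generate all 8 CPT rows.
link_probs = [0.8, 0.6, 0.3]
cpt = {
    states: noisy_or(states, link_probs, leak=0.05)
    for states in product([0, 1], repeat=len(link_probs))
}
```

With ten binary parents a full table needs 1024 rows, whereas the noisy-OR parameterization needs ten link probabilities and one leak term.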
Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty.
Baele, Guy; Lemey, Philippe; Suchard, Marc A
2016-03-01
Marginal likelihood estimates to compare models using Bayes factors frequently accompany Bayesian phylogenetic inference. Approaches to estimate marginal likelihoods have garnered increased attention over the past decade. In particular, the introduction of path sampling (PS) and stepping-stone sampling (SS) into Bayesian phylogenetics has tremendously improved the accuracy of model selection. These sampling techniques are now used to evaluate complex evolutionary and population genetic models on empirical data sets, but considerable computational demands hamper their widespread adoption. Further, when very diffuse, but proper priors are specified for model parameters, numerical issues complicate the exploration of the priors, a necessary step in marginal likelihood estimation using PS or SS. To avoid such instabilities, generalized SS (GSS) has recently been proposed, introducing the concept of "working distributions" to facilitate--or shorten--the integration process that underlies marginal likelihood estimation. However, the need to fix the tree topology currently limits GSS in a coalescent-based framework. Here, we extend GSS by relaxing the fixed underlying tree topology assumption. To this purpose, we introduce a "working" distribution on the space of genealogies, which enables estimating marginal likelihoods while accommodating phylogenetic uncertainty. We propose two different "working" distributions that help GSS to outperform PS and SS in terms of accuracy when comparing demographic and evolutionary models applied to synthetic data and real-world examples. Further, we show that the use of very diffuse priors can lead to a considerable overestimation in marginal likelihood when using PS and SS, while still retrieving the correct marginal likelihood using both GSS approaches. The methods used in this article are available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.
Study of TEC fluctuation via stochastic models and Bayesian inversion
NASA Astrophysics Data System (ADS)
Bires, A.; Roininen, L.; Damtie, B.; Nigussie, M.; Vanhamäki, H.
2016-11-01
We propose stochastic processes to be used to model the total electron content (TEC) observation. Based on this, we model the rate of change of TEC (ROT) variation during ionospheric quiet conditions with stationary processes. During ionospheric disturbed conditions, for example, when irregularity in ionospheric electron density distribution occurs, the stationarity assumption over long time periods is no longer valid. In these cases, we make the parameter estimation for short time scales, during which we can assume stationarity. We show the relationship between the new method and the commonly used TEC characterization parameters ROT and the ROT Index (ROTI). We construct our parametric model within the framework of Bayesian statistical inverse problems and hence give the solution as an a posteriori probability distribution. The Bayesian framework allows us to model measurement errors systematically. Similarly, we mitigate variation of TEC due to factors which are not of ionospheric origin, such as the motion of satellites relative to the receiver, by incorporating a priori knowledge in the Bayesian model. In practical computations, we draw the so-called maximum a posteriori estimates, which are our ROT and ROTI estimates, from the posterior distribution. Because the algorithm allows ROTI to be estimated at each observation time, the estimator does not depend on the period of time for ROTI computation. We verify the method by analyzing TEC data recorded by a GPS receiver located in Ethiopia (11.6°N, 37.4°E). The results indicate that the TEC fluctuations caused by the ionospheric irregularity can be effectively detected and quantified from the estimated ROT and ROTI values.
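For orientation, the conventional (non-Bayesian) definitions of the two characterization parameters can be sketched as follows: ROT is the finite-difference rate of change of TEC, and ROTI is the standard deviation of ROT over a window. This is not the maximum a posteriori estimator proposed in the paper; the sample values, the units (TECU and minutes), and the function names are assumptions made for illustration.

```python
import math


def rate_of_tec(times_min, tec):
    """ROT between consecutive samples, in TECU per minute."""
    return [
        (tec[k + 1] - tec[k]) / (times_min[k + 1] - times_min[k])
        for k in range(len(tec) - 1)
    ]


def roti(rot_window):
    """ROTI = sqrt(<ROT^2> - <ROT>^2) over the ROT values in one window."""
    n = len(rot_window)
    mean = sum(rot_window) / n
    mean_sq = sum(r * r for r in rot_window) / n
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


# Hypothetical 30-second samples: a smooth linear TEC trend has zero ROTI.
times = [0.0, 0.5, 1.0, 1.5]
quiet_tec = [10.0, 10.5, 11.0, 11.5]
quiet_rot = rate_of_tec(times, quiet_tec)
quiet_roti = roti(quiet_rot)
```

Because the conventional ROTI depends on the chosen window length, it illustrates the dependence that the abstract says the proposed estimator avoids.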
A study of finite mixture model: Bayesian approach on financial time series data
NASA Astrophysics Data System (ADS)
Phoong, Seuk-Yen; Ismail, Mohd Tahir
2014-07-01
Recently, statisticians have emphasized fitting finite mixture models using Bayesian methods. A finite mixture model is a mixture of distributions used to model a statistical distribution, while the Bayesian method is a statistical method used to fit the mixture model. The Bayesian method is widely used because it has asymptotic properties that provide remarkable results. In addition, the Bayesian method also exhibits consistency, which means that the parameter estimates are close to the predictive distributions. In the present paper, the number of components for the mixture model is studied using the Bayesian Information Criterion. Identifying the number of components is important because a wrong choice may lead to invalid results. The Bayesian method is then used to fit the k-component mixture model in order to explore the relationship between rubber price and stock market price for Malaysia, Thailand, the Philippines and Indonesia. The results show a negative effect between rubber price and stock market price for all selected countries.
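The component-selection step can be sketched with the usual definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of observations, and L the maximised likelihood. The log-likelihood values below are hypothetical placeholders, not results from the paper, and the univariate Gaussian parameter count is an assumption about the mixture family.

```python
import math


def gaussian_mixture_param_count(k):
    """Free parameters of a univariate k-component Gaussian mixture:
    k means + k variances + (k - 1) mixing weights."""
    return 3 * k - 1


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical maximised log-likelihoods for k = 1, 2, 3 on 200 observations.
max_loglik = {1: -412.7, 2: -389.3, 3: -387.9}
n_obs = 200
scores = {
    k: bic(ll, gaussian_mixture_param_count(k), n_obs)
    for k, ll in max_loglik.items()
}
best_k = min(scores, key=scores.get)
```

In this made-up example the third component improves the fit only slightly, so the penalty term makes the two-component model the one with the lowest BIC.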
A Bayesian subgroup analysis using collections of ANOVA models.
Liu, Jinzhong; Sivaganesan, Siva; Laud, Purushottam W; Müller, Peter
2017-03-20
We develop a Bayesian approach to subgroup analysis using ANOVA models with multiple covariates, extending an earlier work. We assume a two-arm clinical trial with normally distributed response variable. We also assume that the covariates for subgroup finding are categorical and are a priori specified, and parsimonious easy-to-interpret subgroups are preferable. We represent the subgroups of interest by a collection of models and use a model selection approach to finding subgroups with heterogeneous effects. We develop suitable priors for the model space and use an objective Bayesian approach that yields multiplicity adjusted posterior probabilities for the models. We use a structured algorithm based on the posterior probabilities of the models to determine which subgroup effects to report. Frequentist operating characteristics of the approach are evaluated using simulation. While our approach is applicable in more general cases, we mainly focus on the 2 × 2 case of two covariates each at two levels for ease of presentation. The approach is illustrated using a real data example.
Bayesian joint modeling of longitudinal and spatial survival AIDS data.
Martins, Rui; Silva, Giovani L; Andreozzi, Valeska
2016-08-30
Joint analysis of longitudinal and survival data has received increasing attention in the recent years, especially for analyzing cancer and AIDS data. As both repeated measurements (longitudinal) and time-to-event (survival) outcomes are observed in an individual, a joint modeling is more appropriate because it takes into account the dependence between the two types of responses, which are often analyzed separately. We propose a Bayesian hierarchical model for jointly modeling longitudinal and survival data considering functional time and spatial frailty effects, respectively. That is, the proposed model deals with non-linear longitudinal effects and spatial survival effects accounting for the unobserved heterogeneity among individuals living in the same region. This joint approach is applied to a cohort study of patients with HIV/AIDS in Brazil during the years 2002-2006. Our Bayesian joint model presents considerable improvements in the estimation of survival times of the Brazilian HIV/AIDS patients when compared with those obtained through a separate survival model and shows that the spatial risk of death is the same across the different Brazilian states. Copyright © 2016 John Wiley & Sons, Ltd.
Two Bayesian tests of the GLOMOsys Model.
Field, Sarahanne M; Wagenmakers, Eric-Jan; Newell, Ben R; Zeelenberg, René; van Ravenzwaaij, Don
2016-12-01
Priming is arguably one of the key phenomena in contemporary social psychology. Recent retractions and failed replication attempts have led to a division in the field between proponents and skeptics and have reinforced the importance of confirming certain priming effects through replication. In this study, we describe the results of 2 preregistered replication attempts of 1 experiment by Förster and Denzler (2012). In both experiments, participants first processed letters either globally or locally, then were tested using a typicality rating task. Bayes factor hypothesis tests were conducted for both experiments: Experiment 1 (N = 100) yielded an indecisive Bayes factor of 1.38, indicating that the in-lab data are 1.38 times more likely to have occurred under the null hypothesis than under the alternative. Experiment 2 (N = 908) yielded a Bayes factor of 10.84, indicating strong support for the null hypothesis that global priming does not affect participants' mean typicality ratings. The failure to replicate this priming effect challenges existing support for the GLOMO(sys) model.
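To help read the two reported Bayes factors, the sketch below converts a Bayes factor in favour of the null into a posterior probability of the null. Equal prior odds are an assumption made here for illustration; the abstract does not state prior model probabilities, and the function name is hypothetical.

```python
def posterior_prob_null(bf01, prior_prob_null=0.5):
    """Posterior probability of H0 given a Bayes factor BF01 (H0 over H1)."""
    prior_odds = prior_prob_null / (1.0 - prior_prob_null)
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Bayes factors reported in the abstract for Experiments 1 and 2.
p_exp1 = posterior_prob_null(1.38)
p_exp2 = posterior_prob_null(10.84)
```

Under that assumption the first experiment moves the probability of the null only from 0.50 to about 0.58, while the second moves it to about 0.92, which matches the abstract's description of the first result as indecisive and the second as strong.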
Objective Bayesian Comparison of Constrained Analysis of Variance Models.
Consonni, Guido; Paroli, Roberta
2016-10-04
In the social sciences we are often interested in comparing models specified by parametric equality or inequality constraints. For instance, when examining three group means [Formula: see text] through an analysis of variance (ANOVA), a model may specify that [Formula: see text], while another one may state that [Formula: see text], and finally a third model may instead suggest that all means are unrestricted. This is a challenging problem, because it involves a combination of nonnested models, as well as nested models having the same dimension. We adopt an objective Bayesian approach, requiring no prior specification from the user, and derive the posterior probability of each model under consideration. Our method is based on the intrinsic prior methodology, suitably modified to accommodate equality and inequality constraints. Focussing on normal ANOVA models, a comparative assessment is carried out through simulation studies. We also present an application to real data collected in a psychological experiment.
Bayesian model comparison in cosmology with Population Monte Carlo
NASA Astrophysics Data System (ADS)
Kilbinger, Martin; Wraith, Darren; Robert, Christian P.; Benabed, Karim; Cappé, Olivier; Cardoso, Jean-François; Fort, Gersende; Prunet, Simon; Bouchet, François R.
2010-07-01
We use Bayesian model selection techniques to test extensions of the standard flat Λ cold dark matter (ΛCDM) paradigm. Dark-energy and curvature scenarios, and primordial perturbation models are considered. To that end, we calculate the Bayesian evidence in favour of each model using Population Monte Carlo (PMC), a new adaptive sampling technique which was recently applied in a cosmological context. In contrast to the case of other sampling-based inference techniques such as Markov chain Monte Carlo (MCMC), the Bayesian evidence is immediately available from the PMC sample used for parameter estimation without further computational effort, and it comes with an associated error evaluation. Also, it provides an unbiased estimator of the evidence after any fixed number of iterations and it is naturally parallelizable, in contrast with MCMC and nested sampling methods. By comparison with analytical predictions for simulated data, we show that our results obtained with PMC are reliable and robust. The variability in the evidence evaluation and the stability for various cases are estimated both from simulations and from data. For the cases we consider, the log-evidence is calculated with a precision of better than 0.08. Using a combined set of recent cosmic microwave background, type Ia supernovae and baryonic acoustic oscillation data, we find inconclusive evidence between flat ΛCDM and simple dark-energy models. A curved universe is moderately to strongly disfavoured with respect to a flat cosmology. Using physically well-motivated priors within the slow-roll approximation of inflation, we find a weak preference for a running spectral index. A Harrison-Zel'dovich spectrum is weakly disfavoured. With the current data, tensor modes are not detected; the large prior volume on the tensor-to-scalar ratio r results in moderate evidence in favour of r = 0.
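The statement that the evidence comes for free from the sample can be illustrated with plain importance sampling, the building block that PMC adapts over iterations: the mean of the unnormalised importance weights estimates the evidence. The sketch uses a single fixed Gaussian proposal with no adaptation and a conjugate toy model whose evidence is known analytically. The model, the proposal settings, and all names are assumptions for illustration, not the cosmological setup.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def evidence_by_importance_sampling(y, n_samples=50000, prop_mean=0.0,
                                    prop_sd=1.5, seed=0):
    """Estimate p(y) for theta ~ N(0, 1), y | theta ~ N(theta, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(prop_mean, prop_sd)
        prior = normal_pdf(theta, 0.0, 1.0)
        likelihood = normal_pdf(y, theta, 1.0)
        proposal = normal_pdf(theta, prop_mean, prop_sd)
        total += prior * likelihood / proposal  # unnormalised importance weight
    return total / n_samples


y_obs = 1.0
estimate = evidence_by_importance_sampling(y_obs)
exact = normal_pdf(y_obs, 0.0, math.sqrt(2.0))  # marginal of y is N(0, 2)
```

Each weight is computed independently of the others, which is why this kind of estimator parallelizes naturally.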
Predictive RANS simulations via Bayesian Model-Scenario Averaging
NASA Astrophysics Data System (ADS)
Edeling, W. N.; Cinnella, P.; Dwight, R. P.
2014-10-01
The turbulence closure model is the dominant source of error in most Reynolds-Averaged Navier-Stokes simulations, yet no reliable estimators for this error component currently exist. Here we develop a stochastic, a posteriori error estimate, calibrated to specific classes of flow. It is based on variability in model closure coefficients across multiple flow scenarios, for multiple closure models. The variability is estimated using Bayesian calibration against experimental data for each scenario, and Bayesian Model-Scenario Averaging (BMSA) is used to collate the resulting posteriors, to obtain a stochastic estimate of a Quantity of Interest (QoI) in an unmeasured (prediction) scenario. The scenario probabilities in BMSA are chosen using a sensor which automatically weights those scenarios in the calibration set which are similar to the prediction scenario. The methodology is applied to the class of turbulent boundary-layers subject to various pressure gradients. For all considered prediction scenarios the standard-deviation of the stochastic estimate is consistent with the measurement ground truth. Furthermore, the mean of the estimate is more consistently accurate than the individual model predictions.
Bayesian Modeling of Biomolecular Assemblies with Cryo-EM Maps
Habeck, Michael
2017-01-01
A growing array of experimental techniques allows us to characterize the three-dimensional structure of large biological assemblies at increasingly higher resolution. In addition to X-ray crystallography and nuclear magnetic resonance in solution, new structure determination methods such as cryo-electron microscopy (cryo-EM), crosslinking/mass spectrometry and solid-state NMR have emerged. Often it is not sufficient to use a single experimental method, but complementary data need to be collected by using multiple techniques. The integration of all datasets can only be achieved by computational means. This article describes Inferential structure determination, a Bayesian approach to integrative modeling of biomolecular complexes with hybrid structural data. I will introduce probabilistic models for cryo-EM maps and outline Markov chain Monte Carlo algorithms for sampling model structures from the posterior distribution. I will focus on rigid and flexible modeling with cryo-EM data and discuss some of the computational challenges of Bayesian inference in the context of biomolecular modeling. PMID:28382301
Quantum-Like Bayesian Networks for Modeling Decision Making
Moreira, Catarina; Wichert, Andreas
2016-01-01
In this work, we explore an alternative quantum structure to perform quantum probabilistic inferences to accommodate the paradoxical findings of the Sure Thing Principle. We propose a Quantum-Like Bayesian Network, which consists in replacing classical probabilities by quantum probability amplitudes. However, since this approach suffers from the problem of exponential growth of quantum parameters, we also propose a similarity heuristic that automatically fits quantum parameters through vector similarities. This makes the proposed model general and predictive in contrast to the current state of the art models, which cannot be generalized for more complex decision scenarios and that only provide an explanatory nature for the observed paradoxes. In the end, the model that we propose consists in a nonparametric method for estimating inference effects from a statistical point of view. It is a statistical model that is simpler than the previous quantum dynamic and quantum-like models proposed in the literature. We tested the proposed network with several empirical data from the literature, mainly from the Prisoner's Dilemma game and the Two Stage Gambling game. The results obtained show that the proposed quantum Bayesian Network is a general method that can accommodate violations of the laws of classical probability theory and make accurate predictions regarding human decision-making in these scenarios. PMID:26858669
Bayesian Variable Selection on Model Spaces Constrained by Heredity Conditions.
Taylor-Rodriguez, Daniel; Womack, Andrew; Bliznyuk, Nikolay
2016-01-01
This paper investigates Bayesian variable selection when there is a hierarchical dependence structure on the inclusion of predictors in the model. In particular, we study the type of dependence found in polynomial response surfaces of orders two and higher, whose model spaces are required to satisfy weak or strong heredity conditions. These conditions restrict the inclusion of higher-order terms depending upon the inclusion of lower-order parent terms. We develop classes of priors on the model space, investigate their theoretical and finite sample properties, and provide a Metropolis-Hastings algorithm for searching the space of models. The tools proposed allow fast and thorough exploration of model spaces that account for hierarchical polynomial structure in the predictors and provide control of the inclusion of false positives in high posterior probability models.
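The heredity conditions can be made concrete with a small checker. Each polynomial term is encoded as a tuple of exponents, so with two predictors (1, 1) stands for x1·x2 and (2, 0) for x1²; a term's parents are obtained by lowering one exponent by one. Strong heredity requires all parents to be in the model and weak heredity requires at least one. The encoding and the function names are hypothetical choices for this sketch, and the intercept is treated as always included.

```python
def parents(term):
    """Lower-order parent terms: reduce one positive exponent by one."""
    result = []
    for i, power in enumerate(term):
        if power > 0:
            parent = list(term)
            parent[i] -= 1
            result.append(tuple(parent))
    return result


def satisfies_heredity(model, strong=True):
    """Check strong (all parents present) or weak (any parent present) heredity."""
    terms = set(model)
    for term in terms:
        # Ignore the intercept, which is assumed to be in every model.
        needed = [p for p in parents(term) if any(p)]
        if not needed:
            continue
        present = [p in terms for p in needed]
        if strong and not all(present):
            return False
        if not strong and not any(present):
            return False
    return True


# x1 and x1*x2 without x2: allowed under weak heredity only.
model_a = [(1, 0), (1, 1)]
# x1, x2 and x1*x2: allowed under both conditions.
model_b = [(1, 0), (0, 1), (1, 1)]
```

A prior on the model space that respects heredity would place zero mass on any model for which this check fails.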
Assessing uncertainty in a stand growth model by Bayesian synthesis
Green, E.J.; MacFarlane, D.W.; Valentine, H.T.; Strawderman, W.E.
1999-11-01
The Bayesian synthesis method (BSYN) was used to bound the uncertainty in projections calculated with PIPESTEM, a mechanistic model of forest growth. The application furnished posterior distributions of (a) the values of the model's parameters, and (b) the values of three of the model's output variables--basal area per unit land area, average tree height, and tree density--at different points in time. Confidence or credible intervals for the output variables were obtained directly from the posterior distributions. The application also provides estimates of correlation among the parameters and output variables. BSYN, which originally was applied to a population dynamics model for bowhead whales, is generally applicable to deterministic models. Extension to two or more linked models is discussed. A simple worked example is included in an appendix.
ERIC Educational Resources Information Center
Wu, Haiyan
2013-01-01
General diagnostic models (GDMs) and Bayesian networks are mathematical frameworks that cover a wide variety of psychometric models. Both extend latent class models, and while GDMs also extend item response theory (IRT) models, Bayesian networks can be parameterized using discretized IRT. The purpose of this study is to examine similarities and…
Assessing global vegetation activity using spatio-temporal Bayesian modelling
NASA Astrophysics Data System (ADS)
Mulder, Vera L.; van Eck, Christel M.; Friedlingstein, Pierre; Regnier, Pierre A. G.
2016-04-01
This work demonstrates the potential of modelling vegetation activity using a hierarchical Bayesian spatio-temporal model. This approach allows modelling changes in vegetation and climate simultaneously in space and time. Changes of vegetation activity such as phenology are modelled as a dynamic process depending on climate variability in both space and time. Additionally, differences in observed vegetation status can be attributed to other abiotic ecosystem properties, e.g. soil and terrain properties. Although these properties do not change in time, they do change in space and may provide valuable information in addition to the climate dynamics. The spatio-temporal Bayesian models were calibrated at a regional scale because the local trends in space and time can be better captured by the model. The regional subsets were defined according to the SREX segmentation, as defined by the IPCC. Each region is considered to be relatively homogeneous in terms of large-scale climate and biomes, while still capturing small-scale (grid-cell level) variability. Modelling within these regions is hence expected to be less uncertain due to the absence of these large-scale patterns, compared to a global approach. This overall modelling approach allows the comparison of model behavior for the different regions and may provide insights on the main dynamic processes driving the interaction between vegetation and climate within different regions. The data employed in this study encompass the global datasets for soil properties (SoilGrids), terrain properties (Global Relief Model based on SRTM DEM and ETOPO), monthly time series of satellite-derived vegetation indices (GIMMS NDVI3g) and climate variables (Princeton Meteorological Forcing Dataset). The findings proved the potential of a spatio-temporal Bayesian modelling approach for assessing vegetation dynamics at a regional scale. The observed interrelationships of the employed data and the different spatial and temporal trends support
Theory-based Bayesian models of inductive learning and reasoning.
Tenenbaum, Joshua B; Griffiths, Thomas L; Kemp, Charles
2006-07-01
Inductive inference allows humans to make powerful generalizations from sparse data when learning about word meanings, unobserved properties, causal relationships, and many other aspects of the world. Traditional accounts of induction emphasize either the power of statistical learning, or the importance of strong constraints from structured domain knowledge, intuitive theories or schemas. We argue that both components are necessary to explain the nature, use and acquisition of human knowledge, and we introduce a theory-based Bayesian framework for modeling inductive learning and reasoning as statistical inferences over structured knowledge representations.
Predicting brain activity using a Bayesian spatial model.
Derado, Gordana; Bowman, F Dubois; Zhang, Lijun
2013-08-01
Increasing the clinical applicability of functional neuroimaging technology is an emerging objective, e.g. for diagnostic and treatment purposes. We propose a novel Bayesian spatial hierarchical framework for predicting follow-up neural activity based on an individual's baseline functional neuroimaging data. Our approach attempts to overcome some shortcomings of the modeling methods used in other neuroimaging settings, by borrowing strength from the spatial correlations present in the data. Our proposed methodology is applicable to data from various imaging modalities including functional magnetic resonance imaging and positron emission tomography, and we provide an illustration here using positron emission tomography data from a study of Alzheimer's disease to predict disease progression.
Approximate Bayesian computation for forward modeling in cosmology
Akeret, Joël; Refregier, Alexandre; Amara, Adam; Seehars, Sebastian; Hasner, Caspar
2015-08-01
Bayesian inference is often used in cosmology and astrophysics to derive constraints on model parameters from observations. This approach relies on the ability to compute the likelihood of the data given a choice of model parameters. In many practical situations, however, the likelihood function may be unavailable or intractable due to non-Gaussian errors, non-linear measurement processes, or complex data formats such as catalogs and maps. In these cases, the simulation of mock data sets can often be made through forward modeling. We discuss how Approximate Bayesian Computation (ABC) can be used in these cases to derive an approximation to the posterior constraints using simulated data sets. This technique relies on the sampling of the parameter set, a distance metric to quantify the difference between the observation and the simulations, and summary statistics to compress the information in the data. We first review the principles of ABC and discuss its implementation using a Population Monte-Carlo (PMC) algorithm and the Mahalanobis distance metric. We test the performance of the implementation using a Gaussian toy model. We then apply the ABC technique to the practical case of the calibration of image simulations for wide field cosmological surveys. We find that the ABC analysis is able to provide reliable parameter constraints for this problem and is therefore a promising technique for other applications in cosmology and astrophysics. Our implementation of the ABC PMC method is made available via a public code release.
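As a rough illustration of the ABC idea summarized above (sample parameters, forward-simulate mock data, compare summary statistics through a distance), the following minimal Python sketch uses plain rejection sampling on a Gaussian toy problem. It is not the Population Monte Carlo scheme or the Mahalanobis distance described in the abstract; the uniform prior, the tolerance, the sample-mean summary statistic, and all function names are assumptions made for illustration.

```python
import random

def sample_mean(values):
    return sum(values) / len(values)

def abc_rejection(observed, n_draws=20000, tolerance=0.05, seed=0):
    """Rejection ABC for the mean of a unit-variance Gaussian.

    Assumed toy setup (not from the paper): prior mu ~ Uniform(-5, 5),
    summary statistic = sample mean, distance = absolute difference.
    """
    rng = random.Random(seed)
    obs_summary = sample_mean(observed)
    n = len(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)                    # draw a parameter from the prior
        mock = [rng.gauss(mu, 1.0) for _ in range(n)]  # forward-simulate a mock data set
        if abs(sample_mean(mock) - obs_summary) < tolerance:
            accepted.append(mu)                        # keep draws whose mock data resemble the observation
    return accepted

# Synthetic "observation" generated with a known mean of 2.0.
data_rng = random.Random(1)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

posterior_sample = abc_rejection(observed)
print(len(posterior_sample), sample_mean(posterior_sample))
```

The accepted draws approximate the posterior of the mean; shrinking the tolerance tightens the approximation at the cost of fewer accepted samples, which is the inefficiency that sequential schemes such as PMC are designed to reduce.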
Matheron, G.; Armstrong, M.
1987-01-01
The objective of this volume of contributed chapters is to present a series of applications of geostatistics. These range from a careful variographic analysis on uranium data, through detailed studies on geologically complex deposits, right up to the latest nonlinear methods applied to deposits with highly skewed data contributions. Applications of new techniques such as the external drift method for combining well data with seismic information have also been included. The volume emphasizes geostatistics in practice. Notation has been kept to a minimum and mathematical details have been relegated to annexes.
Model Selection in Historical Research Using Approximate Bayesian Computation
Rubio-Campillo, Xavier
2016-01-01
Formal Models and History: Computational models are increasingly being used to study historical dynamics. This new trend, which could be named Model-Based History, makes use of recently published datasets and innovative quantitative methods to improve our understanding of past societies based on their written sources. The extensive use of formal models allows historians to re-evaluate hypotheses formulated decades ago and still subject to debate due to the lack of an adequate quantitative framework. The initiative has the potential to transform the discipline if it solves the challenges posed by the study of historical dynamics. These difficulties are based on the complexities of modelling social interaction, and the methodological issues raised by the evaluation of formal models against data with low sample size, high variance and strong fragmentation. Case Study: This work examines an alternate approach to this evaluation based on a Bayesian-inspired model selection method. The validity of the classical Lanchester’s laws of combat is examined against a dataset comprising over a thousand battles spanning 300 years. Four variations of the basic equations are discussed, including the three most common formulations (linear, squared, and logarithmic) and a new variant introducing fatigue. Approximate Bayesian Computation is then used to infer both parameter values and model selection via Bayes Factors. Impact: Results indicate decisive evidence favouring the new fatigue model. The interpretation of both parameter estimations and model selection provides new insights into the factors guiding the evolution of warfare. At a methodological level, the case study shows how model selection methods can be used to guide historical research through the comparison between existing hypotheses and empirical evidence. PMID:26730953
Efficient multilevel brain tumor segmentation with integrated bayesian model classification.
Corso, J J; Sharon, E; Dube, S; El-Saden, S; Sinha, U; Yuille, A
2008-05-01
We present a new method for automatic segmentation of heterogeneous image data that takes a step toward bridging the gap between bottom-up affinity-based segmentation methods and top-down generative model based approaches. The main contribution of the paper is a Bayesian formulation for incorporating soft model assignments into the calculation of affinities, which are conventionally model free. We integrate the resulting model-aware affinities into the multilevel segmentation by weighted aggregation algorithm, and apply the technique to the task of detecting and segmenting brain tumor and edema in multichannel magnetic resonance (MR) volumes. The computationally efficient method runs orders of magnitude faster than current state-of-the-art techniques giving comparable or improved results. Our quantitative results indicate the benefit of incorporating model-aware affinities into the segmentation process for the difficult case of glioblastoma multiforme brain tumor.
GY SAMPLING THEORY AND GEOSTATISTICS: ALTERNATE MODELS OF VARIABILITY IN CONTINUOUS MEDIA
In the sampling theory developed by Pierre Gy, sample variability is modeled as the sum of a set of seven discrete error components. The variogram used in geostatistics provides an alternate model in which several of Gy's error components are combined in a continuous mode...
Bayesian Learning of a Language Model from Continuous Speech
NASA Astrophysics Data System (ADS)
Neubig, Graham; Mimura, Masato; Mori, Shinsuke; Kawahara, Tatsuya
We propose a novel scheme to learn a language model (LM) for automatic speech recognition (ASR) directly from continuous speech. In the proposed method, we first generate phoneme lattices using an acoustic model with no linguistic constraints, then perform training over these phoneme lattices, simultaneously learning both lexical units and an LM. As a statistical framework for this learning problem, we use non-parametric Bayesian statistics, which make it possible to balance the learned model's complexity (such as the size of the learned vocabulary) and expressive power, and provide a principled learning algorithm through the use of Gibbs sampling. Implementation is performed using weighted finite state transducers (WFSTs), which allow for the simple handling of lattice input. Experimental results on natural, adult-directed speech demonstrate that LMs built using only continuous speech are able to significantly reduce ASR phoneme error rates. The proposed technique of joint Bayesian learning of lexical units and an LM over lattices is shown to significantly contribute to this improvement.
Enhancing debris flow modeling parameters integrating Bayesian networks
NASA Astrophysics Data System (ADS)
Graf, C.; Stoffel, M.; Grêt-Regamey, A.
2009-04-01
Applied debris-flow modeling requires suitably constrained input parameter sets. Depending on the model used, a series of parameters must be defined before running the model. Normally, the database describing the event, the initiation conditions, the flow behavior, the deposition process and, above all, the potential range of possible debris-flow events in a certain torrent is limited. There are only a few places in the world where we can fortunately find valuable data sets describing the event history of debris-flow channels and delivering information on the spatial and temporal distribution of former flow paths and deposition zones. Tree-ring records in combination with detailed geomorphic mapping, for instance, provide such data sets over a long time span. Considering the significant loss potential associated with debris-flow disasters, it is crucial that decisions made in regard to hazard mitigation are based on a consistent assessment of the risks. This in turn necessitates a proper assessment of the uncertainties involved in the modeling of the debris-flow frequencies and intensities, the possible run-out extent, as well as the estimations of the damage potential. In this study, we link a Bayesian network to a Geographic Information System in order to assess debris-flow risk. We identify the major sources of uncertainty and show the potential of Bayesian inference techniques to improve the debris-flow model. We model the flow paths and deposition zones of a highly active debris-flow channel in the Swiss Alps using the numerical 2-D model RAMMS. Because uncertainties in run-out areas cause large changes in risk estimations, we use the flow path and deposition zone information of reconstructed debris-flow events derived from dendrogeomorphological analysis covering more than 400 years to update the input parameters of the RAMMS model. The probabilistic model, which consistently incorporates this available information, can serve as a basis for spatial risk
Markov chain Monte Carlo simulation for Bayesian Hidden Markov Models
NASA Astrophysics Data System (ADS)
Chan, Lay Guat; Ibrahim, Adriana Irawati Nur Binti
2016-10-01
A hidden Markov model (HMM) is a mixture model which has a Markov chain with finite states as its mixing distribution. HMMs have been applied to a variety of fields, such as speech and face recognition. The main purpose of this study is to investigate the Bayesian approach to HMMs. Using this approach, we can simulate from the parameters' posterior distribution using some Markov chain Monte Carlo (MCMC) sampling methods. HMMs seem to be useful, but there are some limitations. Therefore, by using the Mixture of Dirichlet processes Hidden Markov Model (MDPHMM) based on Yau et al. (2011), we hope to overcome these limitations. We shall conduct a simulation study using MCMC methods to investigate the performance of this model.
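Any MCMC scheme for the parameters of a finite-state HMM needs the likelihood of the observations with the hidden states summed out. A minimal Python sketch of the standard forward algorithm, which computes that likelihood, is given below. It does not implement the MDPHMM or any sampler from the study; the two-state model and its probabilities are hypothetical values chosen for illustration.

```python
import itertools

def hmm_likelihood(obs, init, trans, emit):
    """Forward algorithm: return p(obs) for a discrete HMM with finitely many states."""
    n_states = len(init)
    # alpha[s] = p(observations so far, current hidden state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)

# Hypothetical two-state, two-symbol model (illustrative values only).
init = [0.6, 0.4]                   # initial state probabilities
trans = [[0.7, 0.3], [0.2, 0.8]]    # trans[r][s] = p(next state s | current state r)
emit = [[0.9, 0.1], [0.3, 0.7]]     # emit[s][o] = p(symbol o | state s)

likelihood = hmm_likelihood([0, 1, 1], init, trans, emit)

# Sanity check: the likelihoods of all length-3 sequences should add up to 1.
total = sum(
    hmm_likelihood(seq, init, trans, emit)
    for seq in itertools.product(range(2), repeat=3)
)
print(likelihood, total)
```

In a Bayesian treatment, a function like this would be evaluated at each proposed parameter value inside a Metropolis-Hastings step, or replaced by forward-backward sampling of the hidden states in a Gibbs sampler.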
Mapping soil organic carbon stocks by robust geostatistical and boosted regression models
NASA Astrophysics Data System (ADS)
Nussbaum, Madlene; Papritz, Andreas; Baltensweiler, Andri; Walthert, Lorenz
2013-04-01
Carbon (C) sequestration in forests offsets greenhouse gas emissions. Therefore, quantifying C stocks and fluxes in forest ecosystems is of interest for greenhouse gas reporting according to the Kyoto protocol. In Switzerland, the National Forest Inventory offers comprehensive data to quantify the aboveground forest biomass and its change in time. Estimating stocks of soil organic C (SOC) in forests is more difficult because the variables needed to quantify stocks vary strongly in space and precise quantification of some of them is very costly. Based on data from 1,033 plots we modeled SOC stocks of the organic layer and the mineral soil to depths of 30 cm and 100 cm for the Swiss forested area. For the statistical modeling a broad range of covariates was available: climate data (e.g. precipitation, temperature), two elevation models (resolutions 25 and 2 m) with respective terrain attributes, and spectral reflectance data representing vegetation. Furthermore, the main mapping units of an overview soil map and a coarse scale geological map were used to coarsely represent the parent material of the soils. The selection of important covariates for SOC stocks modeling out of a large set was a major challenge for the statistical modeling. We used two approaches to deal with this problem: 1) A robust restricted maximum likelihood method to fit a linear regression model with spatially correlated errors. The large number of covariates was first reduced by LASSO (Least Absolute Shrinkage and Selection Operator) and then further narrowed down to a parsimonious set of important covariates by cross-validation of the robustly fitted model. To account for nonlinear dependencies of the response on the covariates, interaction terms of the latter were included in the model if this improved the fit. 2) A boosted structured regression model with componentwise linear least squares or componentwise smoothing splines as base procedures. The selection of important covariates was done by the
NASA Astrophysics Data System (ADS)
Mendes, B. S.; Draper, D.
2008-12-01
The issue of model uncertainty and model choice is central in any groundwater modeling effort [Neuman and Wierenga, 2003]; among the several approaches to the problem we favour using Bayesian statistics because it is a method that integrates in a natural way uncertainties (arising from any source) and experimental data. In this work, we experiment with several Bayesian approaches to model choice, focusing primarily on demonstrating the usefulness of the Reversible Jump Markov Chain Monte Carlo (RJMCMC) simulation method [Green, 1995]; this is an extension of the now-common MCMC methods. Standard MCMC techniques approximate posterior distributions for quantities of interest, often by creating a random walk in parameter space; RJMCMC allows the random walk to take place between parameter spaces with different dimensionalities. This fact allows us to explore state spaces that are associated with different deterministic models for experimental data. Our work is exploratory in nature; we restrict our study to comparing two simple transport models applied to a data set gathered to estimate the breakthrough curve for a tracer compound in groundwater. One model has a mean surface based on a simple advection dispersion differential equation; the second model's mean surface is also governed by a differential equation but in two dimensions. We focus on artificial data sets (in which truth is known) to see if model identification is done correctly, but we also address the issues of over- and under-parameterization, and we compare RJMCMC's performance with other traditional methods for model selection and propagation of model uncertainty, including Bayesian model averaging, BIC and DIC. References: Neuman and Wierenga (2003). A Comprehensive Strategy of Hydrogeologic Modeling and Uncertainty Analysis for Nuclear Facilities and Sites. NUREG/CR-6805, Division of Systems Analysis and Regulatory Effectiveness, Office of Nuclear Regulatory Research, U. S. Nuclear Regulatory Commission
Bayesian Dose-Response Modeling in Sparse Data
NASA Astrophysics Data System (ADS)
Kim, Steven B.
This book discusses Bayesian dose-response modeling in small samples applied to two different settings. The first setting is early phase clinical trials, and the second setting is toxicology studies in cancer risk assessment. In early phase clinical trials, experimental units are humans who are actual patients. Prior to a clinical trial, opinions from multiple subject area experts are generally more informative than the opinion of a single expert, but we may face a dilemma when they have disagreeing prior opinions. In this regard, we consider compromising the disagreement and compare two different approaches for making a decision. In addition to combining multiple opinions, we also address balancing two levels of ethics in early phase clinical trials. The first level is individual-level ethics which reflects the perspective of trial participants. The second level is population-level ethics which reflects the perspective of future patients. We extensively compare two existing statistical methods which focus on each perspective and propose a new method which balances the two conflicting perspectives. In toxicology studies, experimental units are living animals. Here we focus on a potential non-monotonic dose-response relationship which is known as hormesis. Briefly, hormesis is a phenomenon which can be characterized by a beneficial effect at low doses and a harmful effect at high doses. In cancer risk assessments, the estimation of a parameter, which is known as a benchmark dose, can be highly sensitive to a class of assumptions, monotonicity or hormesis. In this regard, we propose a robust approach which considers both monotonicity and hormesis as a possibility. In addition, we discuss statistical hypothesis testing for hormesis and consider various experimental designs for detecting hormesis based on Bayesian decision theory. Past experiments have not been optimally designed for testing for hormesis, and some Bayesian optimal designs may not be optimal under a
Perceptual decision making: drift-diffusion model is equivalent to a Bayesian model
Bitzer, Sebastian; Park, Hame; Blankenburg, Felix; Kiebel, Stefan J.
2014-01-01
Behavioral data obtained with perceptual decision making experiments are typically analyzed with the drift-diffusion model. This parsimonious model accumulates noisy pieces of evidence toward a decision bound to explain the accuracy and reaction times of subjects. Recently, Bayesian models have been proposed to explain how the brain extracts information from noisy input as typically presented in perceptual decision making tasks. It has long been known that the drift-diffusion model is tightly linked with such functional Bayesian models but the precise relationship of the two mechanisms was never made explicit. Using a Bayesian model, we derived the equations which relate parameter values between these models. In practice we show that this equivalence is useful when fitting multi-subject data. We further show that the Bayesian model suggests different decision variables which all predict equal responses and discuss how these may be discriminated based on neural correlates of accumulated evidence. In addition, we discuss extensions to the Bayesian model which would be difficult to derive for the drift-diffusion model. We suggest that these and other extensions may be highly useful for deriving new experiments which test novel hypotheses. PMID:24616689
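The evidence-accumulation mechanism of the drift-diffusion model described above can be sketched with a short simulation. The Python code below is a minimal Euler-Maruyama discretization of a drift-diffusion process with symmetric bounds; it does not reproduce the Bayesian model or the parameter mapping derived in the paper, and the drift, bound, noise level and time step are assumed values chosen for illustration.

```python
import random

def simulate_ddm(drift, bound, noise_sd=1.0, dt=0.001, max_time=10.0, rng=None):
    """Simulate one drift-diffusion trial with decision bounds at +bound and -bound.

    Returns (choice, decision_time), where choice is +1 (upper bound),
    -1 (lower bound) or 0 if no bound was reached before max_time.
    """
    rng = rng or random.Random()
    evidence, t = 0.0, 0.0
    step_sd = noise_sd * dt ** 0.5          # noise scales with the square root of the time step
    while abs(evidence) < bound and t < max_time:
        evidence += drift * dt + rng.gauss(0.0, step_sd)  # accumulate one noisy piece of evidence
        t += dt
    if evidence >= bound:
        return 1, t
    if evidence <= -bound:
        return -1, t
    return 0, t

rng = random.Random(0)
trials = [simulate_ddm(drift=1.0, bound=1.0, rng=rng) for _ in range(500)]

# Treat the upper bound as the correct response (positive drift).
accuracy = sum(choice == 1 for choice, _ in trials) / len(trials)
mean_rt = sum(t for _, t in trials) / len(trials)
print(accuracy, mean_rt)
```

Raising the bound trades speed for accuracy, while raising the drift improves both, which is how the model jointly accounts for the accuracy and reaction times of subjects.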
Semiparametric Bayesian local functional models for diffusion tensor tract statistics
Hua, Zhaowei; Dunson, David B.; Gilmore, John H.; Styner, Martin A.; Zhu, Hongtu
2012-01-01
We propose a semiparametric Bayesian local functional model (BFM) for the analysis of multiple diffusion properties (e.g., fractional anisotropy) along white matter fiber bundles with a set of covariates of interest, such as age and gender. BFM accounts for heterogeneity in the shape of the fiber bundle diffusion properties among subjects, while allowing the impact of the covariates to vary across subjects. A nonparametric Bayesian LPP2 prior facilitates global and local borrowings of information among subjects, while an infinite factor model flexibly represents low-dimensional structure. Local hypothesis testing and credible bands are developed to identify fiber segments, along which multiple diffusion properties are significantly associated with covariates of interest, while controlling for multiple comparisons. Moreover, BFM naturally groups subjects into more homogeneous clusters. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFM. We apply BFM to investigate the development of white matter diffusivities along the splenium of the corpus callosum tract and the right internal capsule tract in a clinical study of neurodevelopment in newborn infants. PMID:22732565
Optimal inference with suboptimal models: Addiction and active Bayesian inference
Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl
2015-01-01
When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321
Advanced REACH Tool: A Bayesian Model for Occupational Exposure Assessment
McNally, Kevin; Warren, Nicholas; Fransman, Wouter; Entink, Rinke Klein; Schinkel, Jody; van Tongeren, Martie; Cherrie, John W.; Kromhout, Hans; Schneider, Thomas; Tielemans, Erik
2014-01-01
This paper describes a Bayesian model for the assessment of inhalation exposures in an occupational setting; the methodology underpins a freely available web-based application for exposure assessment, the Advanced REACH Tool (ART). The ART is a higher tier exposure tool that combines disparate sources of information within a Bayesian statistical framework. The information is obtained from expert knowledge expressed in a calibrated mechanistic model of exposure assessment, data on inter- and intra-individual variability in exposures from the literature, and context-specific exposure measurements. The ART provides central estimates and credible intervals for different percentiles of the exposure distribution, for full-shift and long-term average exposures. The ART can produce exposure estimates in the absence of measurements, but the precision of the estimates improves as more data become available. The methodology presented in this paper is able to utilize partially analogous data, a novel approach designed to make efficient use of a sparsely populated measurement database although some additional research is still required before practical implementation. The methodology is demonstrated using two worked examples: an exposure to copper pyrithione in the spraying of antifouling paints and an exposure to ethyl acetate in shoe repair. PMID:24665110
Bayesian Energy Landscape Tilting: Towards Concordant Models of Molecular Ensembles
Beauchamp, Kyle A.; Pande, Vijay S.; Das, Rhiju
2014-01-01
Predicting biological structure has remained challenging for systems such as disordered proteins that take on myriad conformations. Hybrid simulation/experiment strategies have been undermined by difficulties in evaluating errors from computational model inaccuracies and data uncertainties. Building on recent proposals from maximum entropy theory and nonequilibrium thermodynamics, we address these issues through a Bayesian energy landscape tilting (BELT) scheme for computing Bayesian hyperensembles over conformational ensembles. BELT uses Markov chain Monte Carlo to directly sample maximum-entropy conformational ensembles consistent with a set of input experimental observables. To test this framework, we apply BELT to model trialanine, starting from disagreeing simulations with the force fields ff96, ff99, ff99sbnmr-ildn, CHARMM27, and OPLS-AA. BELT incorporation of limited chemical shift and 3J measurements gives convergent values of the peptide’s α, β, and PPII conformational populations in all cases. As a test of predictive power, all five BELT hyperensembles recover set-aside measurements not used in the fitting and report accurate errors, even when starting from highly inaccurate simulations. BELT’s principled framework thus enables practical predictions for complex biomolecular systems from discordant simulations and sparse data. PMID:24655513
Lemouzy, P.
1997-08-01
In the field delineation phase, uncertainty in hydrocarbon reservoir descriptions is large. To quickly examine the impact of this uncertainty on production performance, it is necessary to evaluate a large number of descriptions in relation to possible production methods (well spacing, injection rate, etc.). The method of using coarse upscaled models was first proposed by Ballin. Unlike other methods (connectivity analysis, tracer simulations), it considers parameters such as PVT, well management, etc. After a detailed review of upscaling issues, applications to water-injection cases (either with balance or imbalance of production, with or without aquifer) and to depletion of an oil reservoir with aquifer coning are presented. Much more important than the method of permeability upscaling far from wells, the need for correct upscaling of the numerical well representation is pointed out. Methods are proposed to accurately represent fluid volumes in coarse models. Simple methods to upscale relative permeabilities, and methods to efficiently correct numerical dispersion, are proposed. Good results are obtained for water injection. The coarse upscaling method allows the performance of sensitivity analyses on model parameters at a much lower CPU cost than comprehensive simulations. Models representing extreme behaviors can be easily distinguished. For depletion of an oil reservoir showing aquifer coning, however, the method did not work properly. It is our opinion that further research is required for upscaling close to wells. We therefore recommend this method for practical use in the case of water injection.
Modeling the Climatology of Tornado Occurrence with Bayesian Inference
NASA Astrophysics Data System (ADS)
Cheng, Vincent Y. S.
Our mechanistic understanding of tornadic environments has significantly improved by the recent technological enhancements in the detection of tornadoes as well as the advances of numerical weather predictive modeling. Nonetheless, despite the decades of active research, prediction of tornado occurrence remains one of the most difficult problems in meteorological and climate science. In our efforts to develop predictive tools for tornado occurrence, there are a number of issues to overcome, such as the treatment of inconsistent tornado records, the consideration of suitable combination of atmospheric predictors, and the selection of appropriate resolution to accommodate the variability in time and space. In this dissertation, I address each of these topics by undertaking three empirical (statistical) modeling studies, where I examine the signature of different atmospheric factors influencing the tornado occurrence, the sampling biases in tornado observations, and the optimal spatiotemporal resolution for studying tornado occurrence. In the first study, I develop a novel Bayesian statistical framework to assess the probability of tornado occurrence in Canada, in which the sampling bias of tornado observations and the linkage between lightning climatology and tornadogenesis are considered. The results produced reasonable probability estimates of tornado occurrence for the under-sampled areas in the model domain. The same study also delineated the geographical variability in the lightning-tornado relationship across Canada. In the second study, I present a novel modeling framework to examine the relative importance of several key atmospheric variables (e.g., convective available potential energy, 0-3 km storm-relative helicity, 0-6 km bulk wind difference, 0-tropopause vertical wind shear) on tornado activity in North America. I found that the variable quantifying the updraft strength is more important during the warm season, whereas the effects of wind
Tests of Bayesian model selection techniques for gravitational wave astronomy
Cornish, Neil J.; Littenberg, Tyson B.
2007-10-15
The analysis of gravitational wave data involves many model selection problems. The most important example is the detection problem of selecting between the data being consistent with instrument noise alone, or instrument noise and a gravitational wave signal. The analysis of data from ground based gravitational wave detectors is mostly conducted using classical statistics, and methods such as the Neyman-Pearson criterion are used for model selection. Future space based detectors, such as the Laser Interferometer Space Antenna (LISA), are expected to produce rich data streams containing the signals from many millions of sources. Determining the number of sources that are resolvable, and the most appropriate description of each source, poses a challenging model selection problem that may best be addressed in a Bayesian framework. An important class of LISA sources are the millions of low-mass binary systems within our own galaxy, tens of thousands of which will be detectable. Not only is the number of sources unknown, but so is the number of parameters required to model the waveforms. For example, a significant subset of the resolvable galactic binaries will exhibit orbital frequency evolution, while a smaller number will have measurable eccentricity. In the Bayesian approach to model selection one needs to compute the Bayes factor between competing models. Here we explore various methods for computing Bayes factors in the context of determining which galactic binaries have measurable frequency evolution. The methods explored include a reversible jump Markov chain Monte Carlo algorithm, Savage-Dickey density ratios, the Schwarz-Bayes information criterion, and the Laplace approximation to the model evidence. We find good agreement between all of the approaches.
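For nested models, the Savage-Dickey density ratio mentioned above gives the Bayes factor as the posterior density divided by the prior density of the extra parameter, both evaluated at the value that recovers the simpler model. The Python sketch below checks this on a conjugate Gaussian toy problem where both routes have closed forms. It is unrelated to the gravitational wave waveforms of the paper; the unit noise variance, the zero-mean Gaussian prior, the data values and the function names are assumptions made for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def savage_dickey_bf01(y, prior_var):
    """Bayes factor of M0 (mu = 0) against M1 (mu ~ N(0, prior_var)).

    Data model (assumed): y_i ~ N(mu, 1). The conjugate posterior of mu under M1
    is normal, so the density ratio at mu = 0 is available in closed form.
    """
    n = len(y)
    ybar = sum(y) / n
    post_var = 1.0 / (n + 1.0 / prior_var)
    post_mean = n * ybar * post_var
    return normal_pdf(0.0, post_mean, post_var) / normal_pdf(0.0, 0.0, prior_var)

def direct_bf01(y, prior_var):
    """Same Bayes factor from the ratio of marginal likelihoods of the sample mean."""
    n = len(y)
    ybar = sum(y) / n
    evidence_m0 = normal_pdf(ybar, 0.0, 1.0 / n)              # ybar | M0
    evidence_m1 = normal_pdf(ybar, 0.0, prior_var + 1.0 / n)  # ybar | M1, mu integrated out
    return evidence_m0 / evidence_m1

y = [0.3, -0.1, 0.4, 0.2, -0.2, 0.1]   # illustrative data
bf_sd = savage_dickey_bf01(y, prior_var=4.0)
bf_direct = direct_bf01(y, prior_var=4.0)
print(bf_sd, bf_direct)
```

In realistic problems the posterior density at the nested value is not available analytically and would have to be estimated from MCMC samples, for example with a histogram or kernel density estimate.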
NASA Astrophysics Data System (ADS)
Panagos, Panos; Ballabio, Cristiano; Borrelli, Pasquale; Meusburger, Katrin; Alewell, Christine
2015-04-01
Rainfall erosivity (R-factor) is among the 6 input factors in estimating soil erosion risk by using the empirical Revised Universal Soil Loss Equation (RUSLE). R-factor is a driving force for soil erosion modelling and potentially can be used in flood risk assessments, landslide susceptibility, post-fire damage assessment, application of agricultural management practices and climate change modelling. The rainfall erosivity is extremely difficult to model at large scale (national, European) due to the lack of high temporal resolution precipitation data which cover long time series. In most cases, R-factor is estimated based on empirical equations which take into account precipitation volume. The Rainfall Erosivity Database on the European Scale (REDES) is the output of an extensive collection of high resolution precipitation data in the 28 Member States of the European Union plus Switzerland, which took place during 2013-2014 in collaboration with national meteorological/environmental services. Due to the different temporal resolutions of the data (5, 10, 15, 30, 60 minutes), conversion equations have been applied in order to homogenise the database to a 30-minute interval. The 1,541 stations included in REDES have been interpolated with the Gaussian Process Regression (GPR) model, using as covariates the climatic data (monthly precipitation, monthly temperature, wettest/driest month) from the WorldClim Database, a Digital Elevation Model and latitude/longitude. GPR has been selected among other candidate models (GAM, Regression Kriging) due to its best performance both in cross validation (R2=0.63) and in fitting the dataset (R2=0.72). The highest uncertainty has been noticed in North-western Scotland, North Sweden and Finland due to the limited number of stations in REDES. Also, in highlands such as the Alpine arch and the Pyrenees the diversity of environmental features led to relatively high uncertainty. The rainfall erosivity map of Europe available at 500m resolution plus the standard error
Geostatistical modeling of riparian forest microclimate and its implications for sampling
Eskelson, B.N.I.; Anderson, P.D.; Hagar, J.C.; Temesgen, H.
2011-01-01
Predictive models of microclimate under various site conditions in forested headwater stream - riparian areas are poorly developed, and sampling designs for characterizing underlying riparian microclimate gradients are sparse. We used riparian microclimate data collected at eight headwater streams in the Oregon Coast Range to compare ordinary kriging (OK), universal kriging (UK), and kriging with external drift (KED) for point prediction of mean maximum air temperature (Tair). Several topographic and forest structure characteristics were considered as site-specific parameters. Height above stream and distance to stream were the most important covariates in the KED models, which outperformed OK and UK in terms of root mean square error. Sample patterns were optimized based on the kriging variance and the weighted means of shortest distance criterion using the simulated annealing algorithm. The optimized sample patterns outperformed systematic sample patterns in terms of mean kriging variance mainly for small sample sizes. These findings suggest methods for increasing efficiency of microclimate monitoring in riparian areas.
Development of a Bayesian Belief Network Runway Incursion Model
NASA Technical Reports Server (NTRS)
Green, Lawrence L.
2014-01-01
In a previous paper, a statistical analysis of runway incursion (RI) events was conducted to ascertain their relevance to the top ten Technical Challenges (TC) of the National Aeronautics and Space Administration (NASA) Aviation Safety Program (AvSP). The study revealed connections to perhaps several of the AvSP top ten TC. That data also identified several primary causes and contributing factors for RI events that served as the basis for developing a system-level Bayesian Belief Network (BBN) model for RI events. The system-level BBN model will allow NASA to generically model the causes of RI events and to assess the effectiveness of technology products being developed under NASA funding. These products are intended to reduce the frequency of RI events in particular, and to improve runway safety in general. The development, structure and assessment of that BBN for RI events by a Subject Matter Expert panel are documented in this paper.
Fast Bayesian Inference in Dirichlet Process Mixture Models.
Wang, Lianming; Dunson, David B
2011-01-01
There has been increasing interest in applying Bayesian nonparametric methods in large samples and high dimensions. As Markov chain Monte Carlo (MCMC) algorithms are often infeasible, there is a pressing need for much faster algorithms. This article proposes a fast approach for inference in Dirichlet process mixture (DPM) models. Viewing the partitioning of subjects into clusters as a model selection problem, we propose a sequential greedy search algorithm for selecting the partition. Then, when conjugate priors are chosen, the resulting posterior conditionally on the selected partition is available in closed form. This approach allows testing of parametric models versus nonparametric alternatives based on Bayes factors. We evaluate the approach using simulation studies and compare it with four other fast nonparametric methods in the literature. We apply the proposed approach to three datasets including one from a large epidemiologic study. Matlab codes for the simulation and data analyses using the proposed approach are available online in the supplemental materials.
Aggregated Residential Load Modeling Using Dynamic Bayesian Networks
Vlachopoulou, Maria; Chin, George; Fuller, Jason C.; Lu, Shuai
2014-09-28
It is already obvious that the future power grid will have to address higher demand for power and energy, and to incorporate renewable resources of different energy generation patterns. Demand response (DR) schemes could successfully be used to manage and balance power supply and demand under operating conditions of the future power grid. To achieve that, more advanced tools for DR management of operations and planning are necessary that can estimate the available capacity from DR resources. In this research, a Dynamic Bayesian Network (DBN) is derived, trained, and tested that can model aggregated load of Heating, Ventilation, and Air Conditioning (HVAC) systems. DBNs can provide flexible and powerful tools for both operations and planning, due to their unique analytical capabilities. The accuracy and flexibility of the DBN model are demonstrated by testing the model under different operational scenarios.
Bayesian Inference for Duplication–Mutation with Complementarity Network Models
Persing, Adam; Beskos, Alexandros; Heine, Kari; De Iorio, Maria
2015-01-01
We observe an undirected graph G without multiple edges and self-loops, which is to represent a protein–protein interaction (PPI) network. We assume that G evolved under the duplication–mutation with complementarity (DMC) model from a seed graph, G0, and we also observe the binary forest Γ that represents the duplication history of G. A posterior density for the DMC model parameters is established, and we outline a sampling strategy by which one can perform Bayesian inference; that sampling strategy employs a particle marginal Metropolis–Hastings (PMMH) algorithm. We test our methodology on numerical examples to demonstrate a high accuracy and precision in the inference of the DMC model's mutation and homodimerization parameters. PMID:26355682
Akita, Yasuyuki; Baldasano, Jose M; Beelen, Rob; Cirach, Marta; de Hoogh, Kees; Hoek, Gerard; Nieuwenhuijsen, Mark; Serre, Marc L; de Nazelle, Audrey
2014-04-15
In recognition that intraurban exposure gradients may be as large as between-city variations, recent air pollution epidemiologic studies have become increasingly interested in capturing within-city exposure gradients. In addition, because of the rapidly accumulating health data, recent studies also need to handle large study populations distributed over large geographic domains. Even though several modeling approaches have been introduced, a consistent modeling framework capturing within-city exposure variability and applicable to large geographic domains is still missing. To address these needs, we proposed a modeling framework based on the Bayesian Maximum Entropy method that integrates monitoring data and outputs from existing air quality models based on Land Use Regression (LUR) and Chemical Transport Models (CTM). The framework was applied to estimate the yearly average NO2 concentrations over the region of Catalunya in Spain. By jointly accounting for the global scale variability in the concentration from the output of CTM and the intraurban scale variability through LUR model output, the proposed framework outperformed more conventional approaches.
Advances in Bayesian Model Based Clustering Using Particle Learning
Merl, D M
2009-11-19
Recent work by Carvalho, Johannes, Lopes and Polson and Carvalho, Lopes, Polson and Taddy introduced a sequential Monte Carlo (SMC) alternative to traditional iterative Monte Carlo strategies (e.g. MCMC and EM) for Bayesian inference for a large class of dynamic models. The basis of SMC techniques involves representing the underlying inference problem as one of state space estimation, thus giving way to inference via particle filtering. The key insight of Carvalho et al. was to construct the sequence of filtering distributions so as to make use of the posterior predictive distribution of the observable, a distribution usually only accessible in certain Bayesian settings. Access to this distribution allows a reversal of the usual propagate and resample steps characteristic of many SMC methods, thereby alleviating to a large extent many problems associated with particle degeneration. Furthermore, Carvalho et al. point out that for many conjugate models the posterior distribution of the static variables can be parametrized in terms of [recursively defined] sufficient statistics of the previously observed data. For models where such sufficient statistics exist, particle learning, as it is being called, is especially well suited for the analysis of streaming data due to the relative invariance of its algorithmic complexity with the number of data observations. Through a particle learning approach, a statistical model can be fit to data as the data arrive, allowing at any instant during the observation process direct quantification of uncertainty surrounding underlying model parameters. Here we describe the use of a particle learning approach for fitting a standard Bayesian semiparametric mixture model as described in Carvalho, Lopes, Polson and Taddy. In Section 2 we briefly review the previously presented particle learning algorithm for the case of a Dirichlet process mixture of multivariate normals. In Section 3 we describe several novel extensions to the original
Beck-Wörner, Christian; Raso, Giovanna; Vounatsou, Penelope; N'Goran, Eliézer K; Rigo, Gergely; Parlow, Eberhard; Utzinger, Jürg
2007-05-01
An important epidemiologic feature of schistosomiasis is the focal distribution of the disease. Thus, the identification of high-risk communities is an essential first step for targeting interventions in an efficient and cost-effective manner. We used a remotely-sensed digital elevation model (DEM), derived hydrologic features (i.e., stream order, and catchment area), and fitted Bayesian geostatistical models to assess associations between environmental factors and infection with Schistosoma mansoni among more than 4,000 school children from the region of Man in western Côte d'Ivoire. At the unit of the school, we found significant correlations between the infection prevalence of S. mansoni and stream order of the nearest river, water catchment area, and altitude. In conclusion, the use of a freely available 90 m high-resolution DEM, geographic information system applications, and Bayesian spatial modeling facilitates risk prediction for S. mansoni, and is a powerful approach for risk profiling of other neglected tropical diseases that are pervasive in the developing world.
Road network safety evaluation using Bayesian hierarchical joint model.
Wang, Jie; Huang, Helai
2016-05-01
Safety and efficiency are commonly regarded as two significant performance indicators of transportation systems. In practice, road network planning has focused on road capacity and transport efficiency whereas the safety level of a road network has received little attention in the planning stage. This study develops a Bayesian hierarchical joint model for road network safety evaluation to help planners take traffic safety into account when planning a road network. The proposed model establishes relationships between road network risk and micro-level variables related to road entities and traffic volume, as well as socioeconomic, trip generation and network density variables at macro level which are generally used for long term transportation plans. In addition, network spatial correlation between intersections and their connected road segments is also considered in the model. A road network is elaborately selected in order to compare the proposed hierarchical joint model with a previous joint model and a negative binomial model. According to the results of the model comparison, the hierarchical joint model outperforms the joint model and negative binomial model in terms of the goodness-of-fit and predictive performance, which indicates the reasonableness of considering the hierarchical data structure in crash prediction and analysis. Moreover, both random effects at the TAZ level and the spatial correlation between intersections and their adjacent segments are found to be significant, supporting the employment of the hierarchical joint model as an alternative in road-network-level safety modeling as well.
NASA Astrophysics Data System (ADS)
Liao, Chujiang
2015-08-01
Different vegetation communities exist on land at different degrees of desertification, and the spatial structure differs clearly among these communities. This study calculated variograms from typical samples selected from the image and fitted them to the spherical model using a common global optimization method. The results showed that the sill and range differ clearly among vegetation communities: for example, the sill and range of the sample variogram are smaller for the Artemisia halodendron and Salix flavida community than for the Artemisia halodendron and Caragana microphylla community, and the range of the sample variogram for the Agriophyllum arenarium community is larger than that of the Artemisia halodendron and Salix flavida community but smaller than that of the Artemisia halodendron and Caragana microphylla community. Incorporating these differences in spatial structure into the vegetation classification can improve sample separation, thereby increasing the overall classification accuracy.
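The spherical variogram model named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function name `spherical_variogram` and its arguments are hypothetical, and the nugget term is an added assumption that the abstract does not mention.

```python
def spherical_variogram(h, sill, rng, nugget=0.0):
    """Spherical semivariogram value at lag distance h.

    sill   -- total sill (the plateau reached at the range)
    rng    -- range: lag beyond which the semivariance stays at the sill
    nugget -- assumed jump at the origin (hypothetical, defaults to 0)
    """
    if h < 0 or rng <= 0:
        raise ValueError("lag must be non-negative and range positive")
    if h == 0:
        return 0.0          # semivariance is zero at zero lag by definition
    if h >= rng:
        return sill         # flat at the sill beyond the range
    ratio = h / rng
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)
```

A smaller sill or range, as reported for some communities, simply lowers the plateau or moves the point at which the curve flattens closer to the origin.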
Dynamic Bayesian Network Modeling of Game Based Diagnostic Assessments. CRESST Report 837
ERIC Educational Resources Information Center
Levy, Roy
2014-01-01
Digital games offer an appealing environment for assessing student proficiencies, including skills and misconceptions in a diagnostic setting. This paper proposes a dynamic Bayesian network modeling approach for observations of student performance from an educational video game. A Bayesian approach to model construction, calibration, and use in…
Bayesian Framework for Water Quality Model Uncertainty Estimation and Risk Management
A formal Bayesian methodology is presented for integrated model calibration and risk-based water quality management using Bayesian Monte Carlo simulation and maximum likelihood estimation (BMCML). The primary focus is on lucid integration of model calibration with risk-based wat...
Bayesian Analysis of Nonlinear Structural Equation Models with Nonignorable Missing Data
ERIC Educational Resources Information Center
Lee, Sik-Yum
2006-01-01
A Bayesian approach is developed for analyzing nonlinear structural equation models with nonignorable missing data. The nonignorable missingness mechanism is specified by a logistic regression model. A hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm is used to produce the joint Bayesian estimates of…
Nonparametric Bayesian inference of the microcanonical stochastic block model
NASA Astrophysics Data System (ADS)
Peixoto, Tiago P.
2017-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.
A Semiparametric Bayesian Model for Detecting Synchrony Among Multiple Neurons
Shahbaba, Babak; Zhou, Bo; Lan, Shiwei; Ombao, Hernando; Moorman, David; Behseta, Sam
2015-01-01
We propose a scalable semiparametric Bayesian model to capture dependencies among multiple neurons by detecting their co-firing (possibly with some lag time) patterns over time. After discretizing time so there is at most one spike at each interval, the resulting sequence of 1’s (spike) and 0’s (silence) for each neuron is modeled using the logistic function of a continuous latent variable with a Gaussian process prior. For multiple neurons, the corresponding marginal distributions are coupled to their joint probability distribution using a parametric copula model. The advantages of our approach are as follows: the nonparametric component (i.e., the Gaussian process model) provides a flexible framework for modeling the underlying firing rates; the parametric component (i.e., the copula model) allows us to make inference regarding both contemporaneous and lagged relationships among neurons; using the copula model, we construct multivariate probabilistic models by separating the modeling of univariate marginal distributions from the modeling of dependence structure among variables; our method is easy to implement using a computationally efficient sampling algorithm that can be easily extended to high dimensional problems. Using simulated data, we show that our approach could correctly capture temporal dependencies in firing rates and identify synchronous neurons. We also apply our model to spike train data obtained from prefrontal cortical areas. PMID:24922500
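The first two modeling steps in this abstract, discretizing time so that each interval holds at most one spike and passing a latent value through the logistic function, can be sketched as below. This is only an illustration: the function names and the bin width are hypothetical, and the Gaussian process prior and the copula model from the abstract are not implemented here.

```python
import math

def bin_spikes(spike_times, t_end, width):
    """Return a 0/1 list with a 1 for every interval that contains a spike.

    Raises ValueError if two spikes fall into the same interval, since the
    abstract requires at most one spike per interval.
    """
    n_bins = math.ceil(t_end / width)
    bins = [0] * n_bins
    for t in spike_times:
        if 0 <= t < t_end:
            # Clamp guards against floating-point rounding at the last edge.
            idx = min(int(t / width), n_bins - 1)
            if bins[idx]:
                raise ValueError("interval too wide: two spikes share a bin")
            bins[idx] = 1
    return bins

def firing_probability(latent):
    """Logistic function mapping a latent value to a spike probability."""
    if latent >= 0:
        return 1.0 / (1.0 + math.exp(-latent))
    e = math.exp(latent)    # stable branch for negative inputs
    return e / (1.0 + e)
```

In the model described above, the latent value at each interval would come from a Gaussian process; here it is just a number supplied by the caller.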
NASA Astrophysics Data System (ADS)
Popova, Olga H.
Dental hygiene students must embody effective critical thinking skills in order to provide evidence-based comprehensive patient care. The problem addressed in this study was that it was not known if and to what extent concept mapping and reflective journaling activities, embedded in a curriculum over a 4-week period, impacted the critical thinking skills of 22 first and second-year dental hygiene students attending a community college in the Midwest. The overarching research questions were: what is the effect of concept mapping, and what is the effect of reflective journaling on the level of critical thinking skills of first and second year dental hygiene students? This quantitative study employed a quasi-experimental, pretest-posttest design. Analysis of Covariance (ANCOVA) assessed students' mean scores of critical thinking on the California Critical Thinking Skills Test (CCTST) pretest and posttest for the concept mapping and reflective journaling treatment groups. The study found an increase in CCTST posttest scores with the use of both concept mapping and reflective journaling. However, the increase in scores was not found to be statistically significant. Hence, this study identified concept mapping using Ausubel's assimilation theory and reflective journaling incorporating Johns's revision of Carper's patterns of knowing as potential instructional strategies and theoretical models to enhance undergraduate students' critical thinking skills. More research is required in this area to draw further conclusions. Keywords: Critical thinking, critical thinking development, critical thinking skills, instructional strategies, concept mapping, reflective journaling, dental hygiene, college students.
A Bayesian Measurement Error Model for Misaligned Radiographic Data
Lennox, Kristin P.; Glascoe, Lee G.
2013-09-06
An understanding of the inherent variability in micro-computed tomography (micro-CT) data is essential to tasks such as statistical process control and the validation of radiographic simulation tools. The data present unique challenges to variability analysis due to the relatively low resolution of radiographs, and also due to minor variations from run to run which can result in misalignment or magnification changes between repeated measurements of a sample. Positioning changes artificially inflate the variability of the data in ways that mask true physical phenomena. We present a novel Bayesian nonparametric regression model that incorporates both additive and multiplicative measurement error in addition to heteroscedasticity to address this problem. We also use this model to assess the effects of sample thickness and sample position on measurement variability for an aluminum specimen. Supplementary materials for this article are available online.
Modeling the user preference on broadcasting contents using Bayesian networks
NASA Astrophysics Data System (ADS)
Kang, Sanggil; Lim, Jeongyeon; Kim, Munchurl
2004-01-01
In this paper, we introduce a new supervised learning method of a Bayesian network for user preference models. Unlike other preference models, our method traces the trend of a user preference as time passes. It allows online learning, so exhaustive data collection is not needed. The trend is traced by modifying the frequency of attributes in order to force the old preference to be correlated with the current preference, under the assumption that the current preference is correlated with the near future preference. The objective of our learning method is to reinforce the mutual information by modifying the frequency of the attributes in the old preference through weights applied to the attributes. Along with the mathematical derivation of our learning method, we present experimental results on the learning and reasoning performance for TV genre preference using a real set of TV program watching history data.
A Bayesian model of context-sensitive value attribution.
Rigoli, Francesco; Friston, Karl J; Martinelli, Cristina; Selaković, Mirjana; Shergill, Sukhwinder S; Dolan, Raymond J
2016-06-22
Substantial evidence indicates that incentive value depends on an anticipation of rewards within a given context. However, the computations underlying this context sensitivity remain unknown. To address this question, we introduce a normative (Bayesian) account of how rewards map to incentive values. This assumes that the brain inverts a model of how rewards are generated. Key features of our account include (i) an influence of prior beliefs about the context in which rewards are delivered (weighted by their reliability in a Bayes-optimal fashion), (ii) the notion that incentive values correspond to precision-weighted prediction errors, and (iii) contextual information unfolding at different hierarchical levels. This formulation implies that incentive value is intrinsically context-dependent. We provide empirical support for this model by showing that incentive value is influenced by context variability and by hierarchically nested contexts. The perspective we introduce generates new empirical predictions that might help explain psychopathologies, such as addiction.
Mixed-point geostatistical simulation: A combination of two- and multiple-point geostatistics
NASA Astrophysics Data System (ADS)
Cordua, Knud Skou; Hansen, Thomas Mejer; Gulbrandsen, Mats Lundh; Barnes, Christophe; Mosegaard, Klaus
2016-09-01
Multiple-point-based geostatistical methods are used to model complex geological structures. However, a training image containing the characteristic patterns of the Earth model has to be provided. If no training image is available, two-point (i.e., covariance-based) geostatistical methods are typically applied instead because these methods provide fewer constraints on the Earth model. This study is motivated by the case where 1-D vertical training images are available through borehole logs, whereas little or no information about horizontal dependencies exists. This problem is solved by developing theory that makes it possible to combine information from multiple- and two-point geostatistics for different directions, leading to a mixed-point geostatistical model. An example of combining information from the multiple-point-based single normal equation simulation algorithm and two-point-based sequential indicator simulation algorithm is provided. The mixed-point geostatistical model is used for conditional sequential simulation based on vertical training images from five borehole logs and a range parameter describing the horizontal dependencies.
Wang, Meng; Sampson, Paul D; Hu, Jianlin; Kleeman, Michael; Keller, Joshua P; Olives, Casey; Szpiro, Adam A; Vedal, Sverre; Kaufman, Joel D
2016-05-17
Assessments of long-term air pollution exposure in population studies have commonly employed land-use regression (LUR) or chemical transport modeling (CTM) techniques. Attempts to incorporate both approaches in one modeling framework are challenging. We present a novel geostatistical modeling framework, incorporating CTM predictions into a spatiotemporal LUR model with spatial smoothing to estimate spatiotemporal variability of ozone (O3) and particulate matter with diameter less than 2.5 μm (PM2.5) from 2000 to 2008 in the Los Angeles Basin. The observations include over 9 years' data from more than 20 routine monitoring sites and specific monitoring data at over 100 locations to provide more comprehensive spatial coverage of air pollutants. Our composite modeling approach outperforms separate CTM and LUR models in terms of root-mean-square error (RMSE) assessed by 10-fold cross-validation in both temporal and spatial dimensions, with larger improvement in the accuracy of predictions for O3 (RMSE [ppb]: CTM, 6.6; LUR, 4.6; composite, 3.6) than for PM2.5 (RMSE [μg/m³]: CTM, 13.7; LUR, 3.2; composite, 3.1). Our study highlights the opportunity for future exposure assessment to make use of readily available spatiotemporal modeling methods and auxiliary gridded data that takes chemical reaction processes into account to improve the accuracy of predictions in a single spatiotemporal modeling framework.
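The evaluation metric in this abstract, RMSE assessed by k-fold cross-validation, can be sketched as follows. This is a generic illustration under stated assumptions: the helper names, the index-modulo fold assignment, and the `mean_predictor` placeholder are all hypothetical and stand in for the CTM, LUR, and composite models, which are not reproduced here. The separate temporal and spatial cross-validation schemes of the study are also not modeled.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def kfold_rmse(values, k, fit_predict):
    """Pooled RMSE over k folds.

    fit_predict(train, n) is a hypothetical callback that fits on the
    training values and returns n predictions for the held-out fold.
    Folds are assigned by index modulo k (no shuffling) for determinism.
    """
    observed, predicted = [], []
    for fold in range(k):
        test = [v for i, v in enumerate(values) if i % k == fold]
        train = [v for i, v in enumerate(values) if i % k != fold]
        observed.extend(test)
        predicted.extend(fit_predict(train, len(test)))
    return rmse(observed, predicted)

def mean_predictor(train, n):
    """Placeholder model: predicts the training mean for each held-out point."""
    m = sum(train) / len(train)
    return [m] * n
```

Calling `kfold_rmse(measurements, 10, mean_predictor)` gives a baseline against which any real predictor passed as `fit_predict` could be compared.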
Diagnosing Hybrid Systems: a Bayesian Model Selection Approach
NASA Technical Reports Server (NTRS)
McIlraith, Sheila A.
2005-01-01
In this paper we examine the problem of monitoring and diagnosing noisy complex dynamical systems that are modeled as hybrid systems: models of continuous behavior, interleaved by discrete transitions. In particular, we examine continuous systems with embedded supervisory controllers that experience abrupt, partial or full failure of component devices. Building on our previous work in this area (MBCG99;MBCG00), our specific focus in this paper is on the mathematical formulation of the hybrid monitoring and diagnosis task as a Bayesian model tracking algorithm. The nonlinear dynamics of many hybrid systems present challenges to probabilistic tracking. Further, probabilistic tracking of a system for the purposes of diagnosis is problematic because the models of the system corresponding to failure modes are numerous and generally very unlikely. To focus tracking on these unlikely models and to reduce the number of potential models under consideration, we exploit logic-based techniques for qualitative model-based diagnosis to conjecture a limited initial set of consistent candidate models. In this paper we discuss alternative tracking techniques that are relevant to different classes of hybrid systems, focusing specifically on a method for tracking multiple models of nonlinear behavior simultaneously using factored sampling and conditional density propagation. To illustrate and motivate the approach described in this paper we examine the problem of monitoring and diagnosing NASA's Sprint AERCam, a small spherical robotic camera unit with 12 thrusters that enable both linear and rotational motion.
A Bayesian hierarchical model for categorical data with nonignorable nonresponse.
Green, Paul E; Park, Taesung
2003-12-01
Log-linear models have been shown to be useful for smoothing contingency tables when categorical outcomes are subject to nonignorable nonresponse. A log-linear model can be fit to an augmented data table that includes an indicator variable designating whether subjects are respondents or nonrespondents. Maximum likelihood estimates calculated from the augmented data table are known to suffer from instability due to boundary solutions. Park and Brown (1994, Journal of the American Statistical Association 89, 44-52) and Park (1998, Biometrics 54, 1579-1590) developed empirical Bayes models that tend to smooth estimates away from the boundary. In those approaches, estimates for nonrespondents were calculated using an EM algorithm by maximizing a posterior distribution. As an extension of their earlier work, we develop a Bayesian hierarchical model that incorporates a log-linear model in the prior specification. In addition, due to uncertainty in the variable selection process associated with just one log-linear model, we simultaneously consider a finite number of models using a stochastic search variable selection (SSVS) procedure due to George and McCulloch (1997, Statistica Sinica 7, 339-373). The integration of the SSVS procedure into a Markov chain Monte Carlo (MCMC) sampler is straightforward, and leads to estimates of cell frequencies for the nonrespondents that are averages resulting from several log-linear models. The methods are demonstrated with a data example involving serum creatinine levels of patients who survived renal transplants. A simulation study is conducted to investigate properties of the model.
Bayesian network models for error detection in radiotherapy plans.
Kalet, Alan M; Gennari, John H; Ford, Eric C; Phillips, Mark H
2015-04-07
The purpose of this study is to design and develop a probabilistic network for detecting errors in radiotherapy plans for use at the time of initial plan verification. Our group has initiated a multi-pronged approach to reduce these errors. We report on our development of Bayesian models of radiotherapy plans. Bayesian networks consist of joint probability distributions that define the probability of one event, given some set of other known information. Using the networks, we find the probability of obtaining certain radiotherapy parameters, given a set of initial clinical information. A low probability in a propagated network then corresponds to potential errors to be flagged for investigation. To build our networks we first interviewed medical physicists and other domain experts to identify the relevant radiotherapy concepts and their associated interdependencies and to construct a network topology. Next, to populate the network's conditional probability tables, we used the Hugin Expert software to learn parameter distributions from a subset of de-identified data derived from a radiation oncology based clinical information database system. These data represent 4990 unique prescription cases over a 5-year period. Under test case scenarios with approximately 1.5% introduced error rates, network performance produced areas under the ROC curve of 0.88, 0.98, and 0.89 for the lung, brain and female breast cancer error detection networks, respectively. Comparison of the brain network to human experts' performance (AUC of 0.90 ± 0.01) shows the Bayes network model performs better than domain experts under the same test conditions. Our results demonstrate the feasibility and effectiveness of comprehensive probabilistic models as part of decision support systems for improved detection of errors in initial radiotherapy plan verification procedures.
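The flagging rule described in this abstract, treating a low conditional probability as a potential error, can be sketched with a single conditional probability table. This is a toy illustration only: the sites, dose labels, probabilities, and the 0.05 threshold are invented, the function name is hypothetical, and neither the Hugin Expert software nor propagation through a full network is reproduced.

```python
# Hypothetical conditional probability table P(prescribed dose | treatment site).
# All entries are invented for illustration and carry no clinical meaning.
CPT = {
    "lung":  {"60Gy": 0.70, "45Gy": 0.28, "20Gy": 0.02},
    "brain": {"60Gy": 0.55, "30Gy": 0.44, "20Gy": 0.01},
}

def flag_plan(site, dose, threshold=0.05):
    """Return (probability, flagged) for a prescribed dose given the site.

    A combination absent from the table gets probability 0 and is flagged.
    """
    prob = CPT.get(site, {}).get(dose, 0.0)
    return prob, prob < threshold
```

With this table, `flag_plan("lung", "20Gy")` is flagged for review while `flag_plan("lung", "60Gy")` is not; a real network would condition on many more clinical variables at once.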
Bayesian network models for error detection in radiotherapy plans
NASA Astrophysics Data System (ADS)
Kalet, Alan M.; Gennari, John H.; Ford, Eric C.; Phillips, Mark H.
2015-04-01
The purpose of this study is to design and develop a probabilistic network for detecting errors in radiotherapy plans for use at the time of initial plan verification. Our group has initiated a multi-pronged approach to reduce these errors. We report on our development of Bayesian models of radiotherapy plans. Bayesian networks consist of joint probability distributions that define the probability of one event, given some set of other known information. Using the networks, we find the probability of obtaining certain radiotherapy parameters, given a set of initial clinical information. A low probability in a propagated network then corresponds to potential errors to be flagged for investigation. To build our networks we first interviewed medical physicists and other domain experts to identify the relevant radiotherapy concepts and their associated interdependencies and to construct a network topology. Next, to populate the network’s conditional probability tables, we used the Hugin Expert software to learn parameter distributions from a subset of de-identified data derived from a radiation-oncology-based clinical information database system. These data represent 4990 unique prescription cases over a 5-year period. Under test case scenarios with approximately 1.5% introduced error rates, network performance produced areas under the ROC curve of 0.88, 0.98, and 0.89 for the lung, brain and female breast cancer error detection networks, respectively. Comparison of the brain network to human experts’ performance (AUC of 0.90 ± 0.01) shows the Bayes network model performs better than domain experts under the same test conditions. Our results demonstrate the feasibility and effectiveness of comprehensive probabilistic models as part of decision support systems for improved detection of errors in initial radiotherapy plan verification procedures.
A Bayesian Attractor Model for Perceptual Decision Making
Bitzer, Sebastian; Bruineberg, Jelle; Kiebel, Stefan J.
2015-01-01
Even for simple perceptual decisions, the mechanisms that the brain employs are still under debate. Although current consensus states that the brain accumulates evidence extracted from noisy sensory information, open questions remain about how this simple model relates to other perceptual phenomena such as flexibility in decisions, decision-dependent modulation of sensory gain, or confidence about a decision. We propose a novel approach of how perceptual decisions are made by combining two influential formalisms into a new model. Specifically, we embed an attractor model of decision making into a probabilistic framework that models decision making as Bayesian inference. We show that the new model can explain decision making behaviour by fitting it to experimental data. In addition, the new model combines for the first time three important features: First, the model can update decisions in response to switches in the underlying stimulus. Second, the probabilistic formulation accounts for top-down effects that may explain recent experimental findings of decision-related gain modulation of sensory neurons. Finally, the model computes an explicit measure of confidence which we relate to recent experimental evidence for confidence computations in perceptual decision tasks. PMID:26267143
Bayesian analysis of a reduced-form air quality model.
Foley, Kristen M; Reich, Brian J; Napelenok, Sergey L
2012-07-17
Numerical air quality models are being used for assessing emission control strategies for improving ambient pollution levels across the globe. This paper applies probabilistic modeling to evaluate the effectiveness of emission reduction scenarios aimed at lowering ground-level ozone concentrations. A Bayesian hierarchical model is used to combine air quality model output and monitoring data in order to characterize the impact of emissions reductions while accounting for different degrees of uncertainty in the modeled emissions inputs. The probabilistic model predictions are weighted based on population density in order to better quantify the societal benefits/disbenefits of four hypothetical emission reduction scenarios in which domain-wide NOx emissions from various sectors are reduced individually and then simultaneously. Cross validation analysis shows the statistical model performs well compared to observed ozone levels. Accounting for the variability and uncertainty in the emissions and atmospheric systems being modeled is shown to impact how emission reduction scenarios would be ranked, compared to standard methodology.
A Bayesian modelling framework for tornado occurrences in North America
NASA Astrophysics Data System (ADS)
Cheng, Vincent Y. S.; Arhonditsis, George B.; Sills, David M. L.; Gough, William A.; Auld, Heather
2015-03-01
Tornadoes represent one of nature’s most hazardous phenomena that have been responsible for significant destruction and devastating fatalities. Here we present a Bayesian modelling approach for elucidating the spatiotemporal patterns of tornado activity in North America. Our analysis shows a significant increase in the Canadian Prairies and the Northern Great Plains during the summer, indicating a clear transition of tornado activity from the United States to Canada. The linkage between monthly-averaged atmospheric variables and likelihood of tornado events is characterized by distinct seasonality; the convective available potential energy is the predominant factor in the summer; vertical wind shear appears to have a strong signature primarily in the winter and secondarily in the summer; and storm relative environmental helicity is most influential in the spring. The present probabilistic mapping can be used to draw inference on the likelihood of tornado occurrence in any location in North America within a selected time period of the year.
Designing and testing inflationary models with Bayesian networks
Price, Layne C.; Peiris, Hiranya V.; Frazer, Jonathan; Easther, Richard
2016-02-01
Even simple inflationary scenarios have many free parameters. Beyond the variables appearing in the inflationary action, these include dynamical initial conditions, the number of fields, and couplings to other sectors. These quantities are often ignored but cosmological observables can depend on the unknown parameters. We use Bayesian networks to account for a large set of inflationary parameters, deriving generative models for the primordial spectra that are conditioned on a hierarchical set of prior probabilities describing the initial conditions, reheating physics, and other free parameters. We use N_f-quadratic inflation as an illustrative example, finding that the number of e-folds N_* between horizon exit for the pivot scale and the end of inflation is typically the most important parameter, even when the number of fields, their masses and initial conditions are unknown, along with possible conditional dependencies between these parameters.
A Bayesian modelling framework for tornado occurrences in North America.
Cheng, Vincent Y S; Arhonditsis, George B; Sills, David M L; Gough, William A; Auld, Heather
2015-03-25
Tornadoes represent one of nature's most hazardous phenomena that have been responsible for significant destruction and devastating fatalities. Here we present a Bayesian modelling approach for elucidating the spatiotemporal patterns of tornado activity in North America. Our analysis shows a significant increase in the Canadian Prairies and the Northern Great Plains during the summer, indicating a clear transition of tornado activity from the United States to Canada. The linkage between monthly-averaged atmospheric variables and likelihood of tornado events is characterized by distinct seasonality; the convective available potential energy is the predominant factor in the summer; vertical wind shear appears to have a strong signature primarily in the winter and secondarily in the summer; and storm relative environmental helicity is most influential in the spring. The present probabilistic mapping can be used to draw inference on the likelihood of tornado occurrence in any location in North America within a selected time period of the year.
Regional Frequency Analysis Based on Scaling Properties and Bayesian Models
NASA Astrophysics Data System (ADS)
Kwon, Hyun-Han; Lee, Jeong-Ju; Moon, Young-Il
2010-05-01
A regional frequency analysis based on a Hierarchical Bayesian Network (HBN) and scaling theory was developed. Many recording rain gauges over South Korea were used for the analysis. First, a scaling approach combined with an extreme distribution was employed to derive a regional formula for frequency analysis. Second, the HBN model was used to represent additional information about the regional structure of the scaling parameters, especially the location parameter and shape parameter. The location and shape parameters of the extreme distribution were estimated by utilizing scaling properties in a regression framework, and the scaling parameters linking the parameters (location and shape) to various duration times were simultaneously estimated. It was found that the regional frequency analysis combined with HBN and scaling properties shows promising results in terms of establishing regional intensity-duration-frequency (IDF) curves.
Toward diagnostic model calibration and evaluation: Approximate Bayesian computation
NASA Astrophysics Data System (ADS)
Vrugt, Jasper A.; Sadegh, Mojtaba
2013-07-01
The ever increasing pace of computational power, along with continued advances in measurement technologies and improvements in process understanding has stimulated the development of increasingly complex hydrologic models that simulate soil moisture flow, groundwater recharge, surface runoff, root water uptake, and river discharge at different spatial and temporal scales. Reconciling these high-order system models with perpetually larger volumes of field data is becoming more and more difficult, particularly because classical likelihood-based fitting methods lack the power to detect and pinpoint deficiencies in the model structure. Gupta et al. (2008) have recently proposed steps (amongst others) toward the development of a more robust and powerful method of model evaluation. Their diagnostic approach uses signature behaviors and patterns observed in the input-output data to illuminate to what degree a representation of the real world has been adequately achieved and how the model should be improved for the purpose of learning and scientific discovery. In this paper, we introduce approximate Bayesian computation (ABC) as a vehicle for diagnostic model evaluation. This statistical methodology relaxes the need for an explicit likelihood function in favor of one or multiple different summary statistics rooted in hydrologic theory that together have a clearer and more compelling diagnostic power than some average measure of the size of the error residuals. Two illustrative case studies are used to demonstrate that ABC is relatively easy to implement, and readily employs signature-based indices to analyze and pinpoint which part of the model is malfunctioning and in need of further improvement.
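To make the likelihood-free idea concrete, here is a minimal sketch of ABC in its simplest rejection form. It is not the authors' hydrologic implementation: the "model" is a unit-variance Gaussian with unknown mean, the summary statistic is the sample mean, and the prior range, tolerance, and function names are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_summary(mu, n_obs, rng):
    """Simulate n_obs values from the toy model N(mu, 1) and return their mean."""
    return statistics.fmean(rng.gauss(mu, 1.0) for _ in range(n_obs))


def abc_rejection(observed_summary, n_obs, n_draws, tolerance, seed=0):
    """Keep prior draws whose simulated summary lies within tolerance of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # assumed flat prior on the unknown mean
        simulated = simulate_summary(mu, n_obs, rng)
        if abs(simulated - observed_summary) < tolerance:
            accepted.append(mu)  # no likelihood is ever evaluated
    return accepted


# Hypothetical observed summary statistic (sample mean of 50 observations).
posterior_sample = abc_rejection(
    observed_summary=2.0, n_obs=50, n_draws=5000, tolerance=0.2
)
posterior_mean = statistics.fmean(posterior_sample)
```

The accepted draws approximate the posterior of the mean given the summary statistic. In a diagnostic setting, one would use several hydrologically motivated signatures in place of the single sample mean.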
Bridging groundwater models and decision support with a Bayesian network
Fienen, Michael N.; Masterson, John P.; Plant, Nathaniel G.; Gutierrez, Benjamin T.; Thieler, E. Robert
2013-01-01
Resource managers need to make decisions to plan for future environmental conditions, particularly sea level rise, in the face of substantial uncertainty. Many interacting processes factor into the decisions they face. Advances in process models and the quantification of uncertainty have made models a valuable tool for this purpose. Long simulation runtimes and, often, numerical instability make linking process models impractical in many cases. A method for emulating the important connections between model input and forecasts, while propagating uncertainty, has the potential to provide a bridge between complicated numerical process models and the efficiency and stability needed for decision making. We explore this using a Bayesian network (BN) to emulate a groundwater flow model. We expand on previous approaches to validating a BN by calculating forecasting skill using cross validation of a groundwater model of Assateague Island in Virginia and Maryland, USA. This BN emulation was shown to capture the important groundwater-flow characteristics and uncertainty of the groundwater system because of its connection to island morphology and sea level. Forecast power metrics associated with the validation of multiple alternative BN designs guided the selection of an optimal level of BN complexity. Assateague Island is an ideal test case for exploring a forecasting tool based on current conditions because the unique hydrogeomorphological variability of the island includes a range of settings indicative of past, current, and future conditions. The resulting BN is a valuable tool for exploring the response of groundwater conditions to sea level rise in decision support.
A Bayesian Approach to Person Fit Analysis in Item Response Theory Models. Research Report.
ERIC Educational Resources Information Center
Glas, Cees A. W.; Meijer, Rob R.
A Bayesian approach to the evaluation of person fit in item response theory (IRT) models is presented. In a posterior predictive check, the observed value on a discrepancy variable is positioned in its posterior distribution. In a Bayesian framework, a Markov Chain Monte Carlo procedure can be used to generate samples of the posterior distribution…
ERIC Educational Resources Information Center
West, Patti; Rutstein, Daisy Wise; Mislevy, Robert J.; Liu, Junhui; Choi, Younyoung; Levy, Roy; Crawford, Aaron; DiCerbo, Kristen E.; Chappel, Kristina; Behrens, John T.
2010-01-01
A major issue in the study of learning progressions (LPs) is linking student performance on assessment tasks to the progressions. This report describes the challenges faced in making this linkage using Bayesian networks to model LPs in the field of computer networking. The ideas are illustrated with exemplar Bayesian networks built on Cisco…
ERIC Educational Resources Information Center
Lee, Sik-Yum; Song, Xin-Yuan; Tang, Nian-Sheng
2007-01-01
The analysis of interaction among latent variables has received much attention. This article introduces a Bayesian approach to analyze a general structural equation model that accommodates the general nonlinear terms of latent variables and covariates. This approach produces a Bayesian estimate that has the same statistical optimal properties as a…
Bayesian Multiscale Modeling of Closed Curves in Point Clouds.
Gu, Kelvin; Pati, Debdeep; Dunson, David B
2014-10-01
Modeling object boundaries based on image or point cloud data is frequently necessary in medical and scientific applications ranging from detecting tumor contours for targeted radiation therapy, to the classification of organisms based on their structural information. In low-contrast images or sparse and noisy point clouds, there is often insufficient data to recover local segments of the boundary in isolation. Thus, it becomes critical to model the entire boundary in the form of a closed curve. To achieve this, we develop a Bayesian hierarchical model that expresses highly diverse 2D objects in the form of closed curves. The model is based on a novel multiscale deformation process. By relating multiple objects through a hierarchical formulation, we can successfully recover missing boundaries by borrowing structural information from similar objects at the appropriate scale. Furthermore, the model's latent parameters help interpret the population, indicating dimensions of significant structural variability and also specifying a 'central curve' that summarizes the collection. Theoretical properties of our prior are studied in specific cases and efficient Markov chain Monte Carlo methods are developed, evaluated through simulation examples and applied to panorex teeth images for modeling teeth contours and also to a brain tumor contour detection problem.
A Bayesian hierarchical model for wind gust prediction
NASA Astrophysics Data System (ADS)
Friederichs, Petra; Oesting, Marco; Schlather, Martin
2014-05-01
A postprocessing method for ensemble wind gust forecasts given by a mesoscale limited area numerical weather prediction (NWP) model is presented, which is based on extreme value theory. A process layer for the parameters of a generalized extreme value distribution (GEV) is introduced using a Bayesian hierarchical model (BHM). Incorporating the information of the COSMO-DE forecasts, the process parameters model the spatial response surfaces of the GEV parameters as Gaussian random fields. The spatial BHM provides area-wide forecasts of wind gusts in terms of a conditional GEV. It models the marginal distribution of the spatial gust process and provides not only forecasts of the conditional GEV at locations without observations, but also uncertainty information about the estimates. A disadvantage of the BHM is that it assumes conditionally independent observations. In order to incorporate the dependence between gusts at neighboring locations as well as the spatial random fields of observed and forecasted maximal wind gusts, we propose to model them jointly by a bivariate Brown-Resnick process.
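For readers unfamiliar with the GEV, the following sketch evaluates its distribution function and quantile function for given location, scale, and shape parameters. It illustrates only the marginal distribution; the hierarchical spatial layer is not reproduced, and the parameter values in the example are hypothetical rather than fitted gust parameters.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function with location mu, scale sigma > 0, shape xi."""
    if xi == 0.0:
        # Gumbel limit of the GEV family.
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # x lies outside the support: below it for xi > 0, above it for xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


def gev_quantile(p, mu, sigma, xi):
    """Inverse of gev_cdf for a probability p strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if xi == 0.0:
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)


# Hypothetical parameters (m/s) for illustration only.
mu, sigma, xi = 20.0, 5.0, 0.1
gust_99 = gev_quantile(0.99, mu, sigma, xi)  # level exceeded with probability 0.01
```

In a hierarchical model of the kind described above, the three parameters would vary over space rather than being fixed constants.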
Nonlinear regression modeling of nutrient loads in streams: A Bayesian approach
Qian, S.S.; Reckhow, K.H.; Zhai, J.; McMahon, G.
2005-01-01
A Bayesian nonlinear regression modeling method is introduced and compared with the least squares method for modeling nutrient loads in stream networks. The objective of the study is to better model spatial correlation in river basin hydrology and land use for improving the model as a forecasting tool. The Bayesian modeling approach is introduced in three steps, each with a more complicated model and data error structure. The approach is illustrated using a data set from three large river basins in eastern North Carolina. Results indicate that the Bayesian model better accounts for model and data uncertainties than does the conventional least squares approach. Applications of the Bayesian models for ambient water quality standards compliance and TMDL assessment are discussed. Copyright 2005 by the American Geophysical Union.
Bayesian calibration of the Community Land Model using surrogates
Ray, Jaideep; Hou, Zhangshuan; Huang, Maoyi; Swiler, Laura Painton
2014-02-01
We present results from the Bayesian calibration of hydrological parameters of the Community Land Model (CLM), which is often used in climate simulations and Earth system models. A statistical inverse problem is formulated for three hydrological parameters, conditional on observations of latent heat surface fluxes over 48 months. Our calibration method uses polynomial and Gaussian process surrogates of the CLM, and solves the parameter estimation problem using a Markov chain Monte Carlo sampler. Posterior probability densities for the parameters are developed for two sites with different soil and vegetation covers. Our method also allows us to examine the structural error in CLM under two error models. We find that surrogate models can be created for CLM in most cases. The posterior distributions are more predictive than the default parameter values in CLM. Climatologically averaging the observations does not modify the parameters' distributions significantly. The structural error model reveals a correlation time-scale which can be used to identify the physical process that could be contributing to it. While the calibrated CLM has a higher predictive skill, the calibration is under-dispersive.
Bayesian Calibration of the Community Land Model using Surrogates
Ray, Jaideep; Hou, Zhangshuan; Huang, Maoyi; Sargsyan, K.; Swiler, Laura P.
2015-01-01
We present results from the Bayesian calibration of hydrological parameters of the Community Land Model (CLM), which is often used in climate simulations and Earth system models. A statistical inverse problem is formulated for three hydrological parameters, conditioned on observations of latent heat surface fluxes over 48 months. Our calibration method uses polynomial and Gaussian process surrogates of the CLM, and solves the parameter estimation problem using a Markov chain Monte Carlo sampler. Posterior probability densities for the parameters are developed for two sites with different soil and vegetation covers. Our method also allows us to examine the structural error in CLM under two error models. We find that accurate surrogate models can be created for CLM in most cases. The posterior distributions lead to better prediction than the default parameter values in CLM. Climatologically averaging the observations does not modify the parameters’ distributions significantly. The structural error model reveals a correlation time-scale which can potentially be used to identify physical processes that could be contributing to it. While the calibrated CLM has a higher predictive skill, the calibration is under-dispersive.
Bayesian Gaussian Mixture Models for High-Density Genotyping Arrays
Sabatti, Chiara; Lange, Kenneth
2011-01-01
Affymetrix's SNP (single-nucleotide polymorphism) genotyping chips have increased the scope and decreased the cost of gene-mapping studies. Because each SNP is queried by multiple DNA probes, the chips present interesting challenges in genotype calling. Traditional clustering methods distinguish the three genotypes of an SNP fairly well given a large enough sample of unrelated individuals or a training sample of known genotypes. This article describes our attempt to improve genotype calling by constructing Gaussian mixture models with empirically derived priors. The priors stabilize parameter estimation and borrow information collectively gathered on tens of thousands of SNPs. When data from related family members are available, our models capture the correlations in signals between relatives. With these advantages in mind, we apply the models to Affymetrix probe intensity data on 10,000 SNPs gathered on 63 genotyped individuals spread over eight pedigrees. We integrate the genotype-calling model with pedigree analysis and examine a sequence of symmetry hypotheses involving the correlated probe signals. The symmetry hypotheses raise novel mathematical issues of parameterization. Using the Bayesian information criterion, we select the best combination of symmetry assumptions. Compared to Affymetrix's software, our model leads to a reduction in no-calls with little sacrifice in overall calling accuracy. PMID:21572926
Bayesian Multiscale Modeling of Closed Curves in Point Clouds
Gu, Kelvin; Pati, Debdeep; Dunson, David B.
2014-01-01
Modeling object boundaries based on image or point cloud data is frequently necessary in medical and scientific applications ranging from detecting tumor contours for targeted radiation therapy, to the classification of organisms based on their structural information. In low-contrast images or sparse and noisy point clouds, there is often insufficient data to recover local segments of the boundary in isolation. Thus, it becomes critical to model the entire boundary in the form of a closed curve. To achieve this, we develop a Bayesian hierarchical model that expresses highly diverse 2D objects in the form of closed curves. The model is based on a novel multiscale deformation process. By relating multiple objects through a hierarchical formulation, we can successfully recover missing boundaries by borrowing structural information from similar objects at the appropriate scale. Furthermore, the model’s latent parameters help interpret the population, indicating dimensions of significant structural variability and also specifying a ‘central curve’ that summarizes the collection. Theoretical properties of our prior are studied in specific cases and efficient Markov chain Monte Carlo methods are developed, evaluated through simulation examples and applied to panorex teeth images for modeling teeth contours and also to a brain tumor contour detection problem. PMID:25544786
Ensemble Bayesian model averaging using Markov chain Monte Carlo sampling
Vrugt, Jasper A; Diks, Cees G H; Clark, Martyn P
2008-01-01
Bayesian model averaging (BMA) has recently been proposed as a statistical method to calibrate forecast ensembles from numerical weather models. Successful implementation of BMA, however, requires accurate estimates of the weights and variances of the individual competing models in the ensemble. In their seminal paper, Raftery et al. (Mon Weather Rev 133: 1155-1174, 2005) recommended the Expectation-Maximization (EM) algorithm for BMA model training, even though global convergence of this algorithm cannot be guaranteed. In this paper, we compare the performance of the EM algorithm and the recently developed Differential Evolution Adaptive Metropolis (DREAM) Markov Chain Monte Carlo (MCMC) algorithm for estimating the BMA weights and variances. Simulation experiments using 48-hour ensemble data of surface temperature and multi-model stream-flow forecasts show that both methods produce similar results, and that their performance is unaffected by the length of the training data set. However, MCMC simulation with DREAM is capable of efficiently handling a wide variety of BMA predictive distributions, and provides useful information about the uncertainty associated with the estimated BMA weights and variances.
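The role of the BMA weights and variances can be seen in a short sketch. It assumes the common Gaussian form of the BMA predictive distribution, a weighted mixture of normal densities centred on the member forecasts, and omits bias correction. The weights, forecasts, and standard deviations are invented, and estimating them (by EM or MCMC) is not shown.

```python
def bma_mean_and_variance(weights, forecasts, sigmas):
    """Mean and variance of a Gaussian-mixture BMA predictive distribution.

    weights   -- BMA weights, assumed to sum to one
    forecasts -- member forecasts used as the component means
    sigmas    -- component standard deviations
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("BMA weights must sum to one")
    mean = sum(w * f for w, f in zip(weights, forecasts))
    # Within-member variance plus between-member spread around the BMA mean.
    variance = sum(
        w * (s ** 2 + (f - mean) ** 2)
        for w, f, s in zip(weights, forecasts, sigmas)
    )
    return mean, variance


# Hypothetical three-member temperature ensemble (degrees Celsius).
mean, variance = bma_mean_and_variance(
    weights=[0.5, 0.3, 0.2],
    forecasts=[10.0, 12.0, 11.0],
    sigmas=[1.0, 1.0, 2.0],
)
```

The predictive variance has two parts, which is why both the weights and the member variances need accurate estimates.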
A Bayesian network model for predicting aquatic toxicity mode ...
The mode of toxic action (MoA) has been recognized as a key determinant of chemical toxicity, but development of predictive MoA classification models in aquatic toxicology has been limited. We developed a Bayesian network model to classify aquatic toxicity MoA using a recently published dataset containing over one thousand chemicals with MoA assignments for aquatic animal toxicity. Two dimensional theoretical chemical descriptors were generated for each chemical using the Toxicity Estimation Software Tool. The model was developed through augmented Markov blanket discovery from the dataset of 1098 chemicals with the MoA broad classifications as a target node. From cross validation, the overall precision for the model was 80.2%. The best precision was for the AChEI MoA (93.5%) where 257 chemicals out of 275 were correctly classified. Model precision was poorest for the reactivity MoA (48.5%) where 48 out of 99 reactive chemicals were correctly classified. Narcosis represented the largest class within the MoA dataset and had a precision and reliability of 80.0%, reflecting the global precision across all of the MoAs. False negatives for narcosis most often fell into electron transport inhibition, neurotoxicity or reactivity MoAs. False negatives for all other MoAs were most often narcosis. A probabilistic sensitivity analysis was undertaken for each MoA to examine the sensitivity to individual and multiple descriptor findings. The results show that the Markov blank
A Bayesian Model of Category-Specific Emotional Brain Responses
Wager, Tor D.; Kang, Jian; Johnson, Timothy D.; Nichols, Thomas E.; Satpute, Ajay B.; Barrett, Lisa Feldman
2015-01-01
Understanding emotion is critical for a science of healthy and disordered brain function, but the neurophysiological basis of emotional experience is still poorly understood. We analyzed human brain activity patterns from 148 studies of emotion categories (2159 total participants) using a novel hierarchical Bayesian model. The model allowed us to classify which of five categories—fear, anger, disgust, sadness, or happiness—is engaged by a study with 66% accuracy (43-86% across categories). Analyses of the activity patterns encoded in the model revealed that each emotion category is associated with unique, prototypical patterns of activity across multiple brain systems including the cortex, thalamus, amygdala, and other structures. The results indicate that emotion categories are not contained within any one region or system, but are represented as configurations across multiple brain networks. The model provides a precise summary of the prototypical patterns for each emotion category, and demonstrates that a sufficient characterization of emotion categories relies on (a) differential patterns of involvement in neocortical systems that differ between humans and other species, and (b) distinctive patterns of cortical-subcortical interactions. Thus, these findings are incompatible with several contemporary theories of emotion, including those that emphasize emotion-dedicated brain systems and those that propose emotion is localized primarily in subcortical activity. They are consistent with componential and constructionist views, which propose that emotions are differentiated by a combination of perceptual, mnemonic, prospective, and motivational elements. Such brain-based models of emotion provide a foundation for new translational and clinical approaches. PMID:25853490
Bayesian Belief Networks Approach for Modeling Irrigation Behavior
NASA Astrophysics Data System (ADS)
Andriyas, S.; McKee, M.
2012-12-01
Canal operators need information to manage water deliveries to irrigators. Short-term irrigation demand forecasts can potentially be valuable information for a canal operator who must manage an on-demand system. Such forecasts could be generated by using information about the decision-making processes of irrigators. Bayesian models of irrigation behavior can provide insight into the likely criteria which farmers use to make irrigation decisions. This paper develops a Bayesian belief network (BBN) to learn irrigation decision-making behavior of farmers and utilizes the resulting model to make forecasts of future irrigation decisions based on factor interaction and posterior probabilities. Models for studying irrigation behavior have been rarely explored in the past. The model discussed here was built from a combination of data about biotic, climatic, and edaphic conditions under which observed irrigation decisions were made. The paper includes a case study using data collected from the Canal B region of the Sevier River, near Delta, Utah. Alfalfa, barley and corn are the main crops of the location. The model has been tested with a portion of the data to affirm the model predictive capabilities. Irrigation rules were deduced in the process of learning and verified in the testing phase. It was found that most of the farmers used consistent rules throughout all years and across different types of crops. Soil moisture stress, which indicates the level of water available to the plant in the soil profile, was found to be one of the most significant likely driving forces for irrigation. Irrigations appeared to be triggered by a farmer's perception of soil stress, or by a perception of combined factors such as information about a neighbor irrigating or an apparent preference to irrigate on a weekend. Soil stress resulted in irrigation probabilities of 94.4% for alfalfa. With additional factors like weekend and irrigating when a neighbor irrigates, alfalfa irrigation
Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models
Cuevas, Jaime; Crossa, José; Montesinos-López, Osval A.; Burgueño, Juan; Pérez-Rodríguez, Paulino; de los Campos, Gustavo
2016-01-01
The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have been recently developed and used in genomic selection of plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects (u) that can be assessed by the Kronecker product of variance–covariance matrices of genetic correlations between environments and genomic kernels through markers under two kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model (u) plus an extra component, f, that captures random effects between environments that were not captured by the random effects u. We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always have better prediction ability than single-environment models, and the higher prediction ability of multi-environment models with u and f over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect u. PMID:27793970
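The Kronecker-product covariance described for the genetic effects u can be sketched with NumPy. This illustrates the construction only, not the authors' fitted model: the marker matrix is random, the environment covariance is invented, and the bandwidth scaling of the Gaussian kernel (median squared distance) is an assumed convention.

```python
import numpy as np

rng = np.random.default_rng(0)
n_lines, n_markers = 4, 20

# Hypothetical marker codes for 4 lines; real data would be genotype scores.
X = rng.choice([-1.0, 0.0, 1.0], size=(n_lines, n_markers))

# Linear (GBLUP-style) genomic kernel.
G_linear = X @ X.T / n_markers

# Gaussian kernel on squared Euclidean marker distances.
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
h = 1.0  # assumed bandwidth parameter
G_gauss = np.exp(-h * sq_dist / np.median(sq_dist[sq_dist > 0]))

# Invented genetic covariance between two environments.
E = np.array([[1.0, 0.5],
              [0.5, 1.0]])

# Covariance of u when records are ordered by environment, then by line.
K_u = np.kron(E, G_linear)
```

Each off-diagonal block of `K_u` is the genomic kernel scaled by the covariance between the two environments. This is how information is borrowed across environments.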
Using Bayesian Stable Isotope Mixing Models to Enhance Marine Ecosystem Models
The use of stable isotopes in food web studies has proven to be a valuable tool for ecologists. We investigated the use of Bayesian stable isotope mixing models as constraints for an ecosystem model of a temperate seagrass system on the Atlantic coast of France. δ13C and δ15N i...
Technology Transfer Automated Retrieval System (TEKTRAN)
In this paper, the Genetic Algorithms (GA) and Bayesian model averaging (BMA) were combined to simultaneously conduct calibration and uncertainty analysis for the Soil and Water Assessment Tool (SWAT). In this hybrid method, several SWAT models with different structures are first selected; next GA i...
Forecasting unconventional resource productivity - A spatial Bayesian model
NASA Astrophysics Data System (ADS)
Montgomery, J.; O'sullivan, F.
2015-12-01
Today's low prices mean that unconventional oil and gas development requires ever greater efficiency and better development decision-making. Inter- and intra-field variability in well productivity, which is a major contemporary driver of uncertainty regarding resource size and its economics, is driven by factors including geological conditions, well and completion design (which companies vary as they seek to optimize their performance), and uncertainty about the nature of fracture propagation. Geological conditions are often not well understood early on in development campaigns, but nevertheless critical assessments and decisions must be made regarding the value of drilling an area and the placement of wells. In these situations, location provides a reasonable proxy for geology and the "rock quality." We propose a spatial Bayesian model for forecasting acreage quality, which improves decision-making by leveraging available production data and provides a framework for statistically studying the influence of different parameters on well productivity. Our approach consists of subdividing a field into sections and forming prior distributions for productivity in each section based on knowledge about the overall field. Production data from wells is used to update these estimates in a Bayesian fashion, improving model accuracy far more rapidly and with less sensitivity to outliers than a model that simply establishes an "average" productivity in each section. Additionally, forecasts using this model capture the importance of uncertainty, either due to a lack of information or for areas that demonstrate greater geological risk. We demonstrate the forecasting utility of this method using public data and also provide examples of how information from this model can be combined with knowledge about a field's geology or changes in technology to better quantify development risk. This approach represents an important shift in the way that production data is used to guide
Bayesian Model Selection with Network Based Diffusion Analysis
Whalen, Andrew; Hoppitt, William J. E.
2016-01-01
A number of recent studies have used Network Based Diffusion Analysis (NBDA) to detect the role of social transmission in the spread of a novel behavior through a population. In this paper we present a unified framework for performing NBDA in a Bayesian setting, and demonstrate how the Watanabe Akaike Information Criteria (WAIC) can be used for model selection. We present a specific example of applying this method to Time to Acquisition Diffusion Analysis (TADA). To examine the robustness of this technique, we performed a large scale simulation study and found that NBDA using WAIC could recover the correct model of social transmission under a wide range of cases, including under the presence of random effects, individual level variables, and alternative models of social transmission. This work suggests that NBDA is an effective and widely applicable tool for uncovering whether social transmission underpins the spread of a novel behavior, and may still provide accurate results even when key model assumptions are relaxed. PMID:27092089
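As a rough illustration of the WAIC-based model selection mentioned in the abstract above, the following minimal Python sketch computes WAIC from a matrix of pointwise log-likelihoods. It is not taken from the paper: the function name `waic`, the input layout (`log_lik[s][i]` for posterior draw s and observation i), and the use of the deviance scale (factor of -2) are assumptions made for illustration only.

```python
import math
from statistics import variance


def waic(log_lik):
    """Return WAIC on the deviance scale (lower is better).

    log_lik[s][i] is the log-likelihood of observation i evaluated at
    posterior draw s (hypothetical input layout; at least two draws).
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters (variance form)
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        m = max(column)
        # log of the posterior-mean likelihood, via a stable log-sum-exp
        lppd += m + math.log(sum(math.exp(v - m) for v in column) / n_draws)
        # sample variance of the log-likelihood across posterior draws
        p_waic += variance(column)
    return -2.0 * (lppd - p_waic)
```

Under these assumptions, candidate models of social transmission would each be fitted separately, and the model with the lowest WAIC would be preferred.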
Improving default risk prediction using Bayesian model uncertainty techniques.
Kazemi, Reza; Mosleh, Ali
2012-11-01
Credit risk is the potential exposure of a creditor to an obligor's failure or refusal to repay the debt in principal or interest. The potential of exposure is measured in terms of probability of default. Many models have been developed to estimate credit risk, with rating agencies dating back to the 19th century. They provide their assessment of probability of default and transition probabilities of various firms in their annual reports. Regulatory capital requirements for credit risk outlined by the Basel Committee on Banking Supervision have made it essential for banks and financial institutions to develop sophisticated models in an attempt to measure credit risk with higher accuracy. The Bayesian framework proposed in this article uses the techniques developed in physical sciences and engineering for dealing with model uncertainty and expert accuracy to obtain improved estimates of credit risk and associated uncertainties. The approach uses estimates from one or more rating agencies and incorporates their historical accuracy (past performance data) in estimating future default risk and transition probabilities. Several examples demonstrate that the proposed methodology can assess default probability with accuracy exceeding the estimations of all the individual models. Moreover, the methodology accounts for potentially significant departures from "nominal predictions" due to "upsetting events" such as the 2008 global banking crisis.
Bayesian spatially dependent variable selection for small area health modeling.
Choi, Jungsoon; Lawson, Andrew B
2016-06-16
Statistical methods for spatial health data to identify the significant covariates associated with the health outcomes are of critical importance. Most studies have developed variable selection approaches in which the covariates included appear within the spatial domain and their effects are fixed across space. However, the impact of covariates on health outcomes may change across space and ignoring this behavior in spatial epidemiology may cause the wrong interpretation of the relations. Thus, the development of a statistical framework for spatial variable selection is important to allow for the estimation of the space-varying patterns of covariate effects as well as the early detection of disease over space. In this paper, we develop flexible spatial variable selection approaches to find the spatially-varying subsets of covariates with significant effects. A Bayesian hierarchical latent model framework is applied to account for spatially-varying covariate effects. We present a simulation example to examine the performance of the proposed models with the competing models. We apply our models to a county-level low birth weight incidence dataset in Georgia.
A Bayesian model for the analysis of transgenerational epigenetic variation.
Varona, Luis; Munilla, Sebastián; Mouresan, Elena Flavia; González-Rodríguez, Aldemar; Moreno, Carlos; Altarriba, Juan
2015-01-23
Epigenetics has become one of the major areas of biological research. However, the degree of phenotypic variability that is explained by epigenetic processes still remains unclear. From a quantitative genetics perspective, the estimation of variance components is achieved by means of the information provided by the resemblance between relatives. In a previous study, this resemblance was described as a function of the epigenetic variance component and a reset coefficient that indicates the rate of dissipation of epigenetic marks across generations. Given these assumptions, we propose a Bayesian mixed model methodology that allows the estimation of epigenetic variance from a genealogical and phenotypic database. The methodology is based on the development of a matrix T of epigenetic relationships that depends on the reset coefficient. In addition, we present a simple procedure for the calculation of the inverse of this matrix (T^{-1}) and a Gibbs sampler algorithm that obtains posterior estimates of all the unknowns in the model. The new procedure was used with two simulated data sets and with a beef cattle database. In the simulated populations, the results of the analysis provided marginal posterior distributions that included the population parameters in the regions of highest posterior density. In the case of the beef cattle dataset, the posterior estimate of transgenerational epigenetic variability was very low and a model comparison test indicated that a model that did not include it was the most plausible.
Bayesian nonparametric centered random effects models with variable selection.
Yang, Mingan
2013-03-01
In a linear mixed effects model, it is common practice to assume that the random effects follow a parametric distribution such as a normal distribution with mean zero. However, in the case of variable selection, substantial violation of the normality assumption can potentially impact the subset selection and result in poor interpretation and even incorrect results. In nonparametric random effects models, the random effects generally have a nonzero mean, which causes an identifiability problem for the fixed effects that are paired with the random effects. In this article, we focus on a Bayesian method for variable selection. We characterize the subject-specific random effects nonparametrically with a Dirichlet process and resolve the bias simultaneously. In particular, we propose flexible modeling of the conditional distribution of the random effects with changes across the predictor space. The approach is implemented using a stochastic search Gibbs sampler to identify subsets of fixed effects and random effects to be included in the model. Simulations are provided to evaluate and compare the performance of our approach to the existing ones. We then apply the new approach to a real data example, cross-country and interlaboratory rodent uterotrophic bioassay.
NASA Astrophysics Data System (ADS)
Iskandar, Ismed; Satria Gondokaryono, Yudi
2016-02-01
In reliability theory, the most important problem is to determine the reliability of a complex system from the reliability of its components. The weakness of most reliability theories is that the systems are described and explained as simply functioning or failed. In many real situations, the failures may be from many causes depending upon the age and the environment of the system and its components. Another problem in reliability theory is one of estimating the parameters of the assumed failure models. The estimation may be based on data collected over censored or uncensored life tests. In many reliability problems, the failure data are simply quantitatively inadequate, especially in engineering design and maintenance systems. Bayesian analyses are more beneficial than classical ones in such cases. Bayesian estimation analyses allow us to combine past knowledge or experience in the form of an a priori distribution with life test data to make inferences about the parameter of interest. In this paper, we have investigated the application of Bayesian estimation analyses to competing risk systems. The cases are limited to the models with independent causes of failure, using the Weibull distribution as our model. A simulation is conducted for this distribution with the objectives of verifying the models and the estimators and investigating the performance of the estimators for varying sample size. The simulation data are analyzed by using Bayesian and maximum likelihood analyses. The simulation results show that a change in the true value of one parameter relative to another changes the value of the standard deviation in the opposite direction. Given perfect information on the prior distribution, the Bayesian estimation methods are better than those of maximum likelihood. The sensitivity analyses show some amount of sensitivity over the shifts of the prior locations. They also show the robustness of the Bayesian analysis within the range
Bayesian modelling of compositional heterogeneity in molecular phylogenetics.
Heaps, Sarah E; Nye, Tom M W; Boys, Richard J; Williams, Tom A; Embley, T Martin
2014-10-01
In molecular phylogenetics, standard models of sequence evolution generally assume that sequence composition remains constant over evolutionary time. However, this assumption is violated in many datasets which show substantial heterogeneity in sequence composition across taxa. We propose a model which allows compositional heterogeneity across branches, and formulate the model in a Bayesian framework. Specifically, the root and each branch of the tree is associated with its own composition vector whilst a global matrix of exchangeability parameters applies everywhere on the tree. We encourage borrowing of strength between branches by developing two possible priors for the composition vectors: one in which information can be exchanged equally amongst all branches of the tree and another in which more information is exchanged between neighbouring branches than between distant branches. We also propose a Markov chain Monte Carlo (MCMC) algorithm for posterior inference which uses data augmentation of substitutional histories to yield a simple complete data likelihood function that factorises over branches and allows Gibbs updates for most parameters. Standard phylogenetic models are not informative about the root position. Therefore a significant advantage of the proposed model is that it allows inference about rooted trees. The position of the root is fundamental to the biological interpretation of trees, both for polarising trait evolution and for establishing the order of divergence among lineages. Furthermore, unlike some other related models from the literature, inference in the model we propose can be carried out through a simple MCMC scheme which does not require problematic dimension-changing moves. We investigate the performance of the model and priors in analyses of two alignments for which there is strong biological opinion about the tree topology and root position.
Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A
2017-01-18
Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
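For readers unfamiliar with the nonparametric clustering prior named in the abstract above, the sketch below samples cluster labels from the standard Chinese restaurant process. It deliberately omits the phylogenetic dependency structure that the paper adds, and the function name `sample_crp`, the concentration parameter `alpha`, and the seeding scheme are hypothetical choices for illustration.

```python
import random


def sample_crp(n_items, alpha, seed=0):
    """Draw cluster labels from a standard (non-phylogenetic) CRP.

    Item i joins existing cluster k with probability proportional to the
    cluster's current size, or opens a new cluster with probability
    proportional to alpha (alpha > 0).
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = number of items currently in cluster k
    assignments = []  # assignments[i] = cluster label of item i
    for _ in range(n_items):
        weights = counts + [alpha]  # last slot stands for "new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments
```

Larger values of `alpha` tend to produce more clusters; in the setting of the abstract, the items would correspond to viral strains and the clusters to antigenic groups.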
CRAFFT: An Activity Prediction Model based on Bayesian Networks.
Nazerfard, Ehsan; Cook, Diane J
2015-04-01
Recent advances in the areas of pervasive computing, data mining, and machine learning offer unique opportunities to provide health monitoring and assistance for individuals who have difficulty living independently in their homes. Several components have to work together to provide health monitoring for smart home residents including, but not limited to, activity recognition, activity discovery, activity prediction, and a prompting system. Compared to the significant research done to discover and recognize activities, less attention has been given to predicting the future activities that the resident is likely to perform. Activity prediction components can play a major role in the design of a smart home. For instance, by taking advantage of an activity prediction module, a smart home can learn context-aware rules to prompt individuals to initiate important activities. In this paper, we propose an activity prediction model using Bayesian networks together with a novel two-step inference process to predict both the next activity features and the next activity label. We also propose an approach to predict the start time of the next activity which is based on modeling the relative start time of the predicted activity using the continuous normal distribution and outlier detection. To validate our proposed models, we used real data collected from physical smart environments.
A Bayesian Semiparametric Model for Radiation Dose-Response Estimation.
Furukawa, Kyoji; Misumi, Munechika; Cologne, John B; Cullings, Harry M
2016-06-01
In evaluating the risk of exposure to health hazards, characterizing the dose-response relationship and estimating acceptable exposure levels are the primary goals. In analyses of health risks associated with exposure to ionizing radiation, while there is a clear agreement that moderate to high radiation doses cause harmful effects in humans, little has been known about the possible biological effects at low doses, for example, below 0.1 Gy, which is the dose range relevant to most radiation exposures of concern today. A conventional approach to radiation dose-response estimation based on simple parametric forms, such as the linear nonthreshold model, can be misleading in evaluating the risk and, in particular, its uncertainty at low doses. As an alternative approach, we consider a Bayesian semiparametric model that has a connected piece-wise-linear dose-response function with prior distributions having an autoregressive structure among the random slope coefficients defined over closely spaced dose categories. With a simulation study and application to analysis of cancer incidence data among Japanese atomic bomb survivors, we show that this approach can produce smooth and flexible dose-response estimation while reasonably handling the risk uncertainty at low doses and elsewhere. With relatively few assumptions and modeling options to be made by the analyst, the method can be particularly useful in assessing risks associated with low-dose radiation exposures.
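The connected piecewise-linear dose-response function described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the names `excess_risk` and `draw_slopes`, the AR(1) form `b[j] = rho * b[j-1] + noise`, and all default parameter values are hypothetical and are not taken from the paper.

```python
import random


def draw_slopes(n_categories, rho=0.9, sigma=0.1, first_slope=0.5, seed=0):
    """Draw slope coefficients from an assumed AR(1) prior,
    b[j] = rho * b[j-1] + Normal(0, sigma), so neighbouring dose
    categories have similar slopes."""
    rng = random.Random(seed)
    slopes = [first_slope]
    for _ in range(n_categories - 1):
        slopes.append(rho * slopes[-1] + rng.gauss(0.0, sigma))
    return slopes


def excess_risk(dose, knots, slopes):
    """Evaluate a connected piecewise-linear dose-response function.

    knots is an increasing list of dose boundaries starting at 0, and
    slopes[j] applies on the interval [knots[j], knots[j + 1]]. The
    function is continuous by construction; beyond the last knot it
    stays at its final value (no extrapolation in this sketch).
    """
    total = 0.0
    for j, slope in enumerate(slopes):
        lo, hi = knots[j], knots[j + 1]
        if dose <= lo:
            break
        total += slope * (min(dose, hi) - lo)
    return total
```

With closely spaced knots at low doses (for example below 0.1 Gy), a strongly autocorrelated prior on the slopes would yield a smooth curve while still letting the data alter its shape locally.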
Hierarchical Bayesian cognitive processing models to analyze clinical trial data.
Shankle, William R; Hara, Junko; Mangrola, Tushar; Hendrix, Suzanne; Alva, Gus; Lee, Michael D
2013-07-01
Identifying disease-modifying treatment effects in earlier stages of Alzheimer's disease (AD), when changes are subtle, will require improved trial design and more sensitive analytical methods. We applied hierarchical Bayesian analysis with cognitive processing (HBCP) models to the Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog) and MCI (mild cognitive impairment) Screen word list memory task data from 14 AD patients of the Myriad Pharmaceuticals' phase III clinical trial of Flurizan (a γ-secretase modulator) versus placebo. The original analysis of 1649 patients found no treatment group differences. HBCP analysis and the original ADAS-Cog analysis were performed on the small sample. HBCP analysis detected impaired memory storage during delayed recall, whereas the original ADAS-Cog analytical method did not. The HBCP model identified a harmful treatment effect in a small sample, which has been independently confirmed by the results of other γ-secretase inhibitors. The original analytical method applied to the ADAS-Cog data did not detect this harmful treatment effect on either the full or the small sample. These findings suggest that HBCP models can detect treatment effects more sensitively than currently used analytical methods required by the Food and Drug Administration, and they do so using small patient samples.
Bayesian network model of crowd emotion and negative behavior
NASA Astrophysics Data System (ADS)
Ramli, Nurulhuda; Ghani, Noraida Abdul; Hatta, Zulkarnain Ahmad; Hashim, Intan Hashimah Mohd; Sulong, Jasni; Mahudin, Nor Diana Mohd; Rahman, Shukran Abd; Saad, Zarina Mat
2014-12-01
The effects of overcrowding have become a major concern for event organizers. One aspect of this concern has been the idea that overcrowding can increase the occurrence of serious incidents during events. As one of the largest Muslim religious gatherings, attended by pilgrims from all over the world, Hajj has become extremely overcrowded, with many incidents being reported. The purpose of this study is to analyze the nature of human emotion and negative behavior resulting from overcrowding during Hajj events, using data gathered in the Malaysian Hajj Experience Survey in 2013. The sample comprised 147 Malaysian pilgrims (70 males and 77 females). Utilizing a probabilistic model called a Bayesian network, this paper models the dependence structure between different emotions and negative behaviors of pilgrims in the crowd. The model included the following variables of emotion: negative, negative comfortable, positive, positive comfortable and positive spiritual; and the following variables of negative behavior: aggressive and hazardous acts. The study demonstrated that negative, negative comfortable, positive spiritual and positive emotions have a direct influence on aggressive behavior, whereas negative comfortable, positive spiritual and positive emotions have a direct influence on hazardous acts. The sensitivity analysis showed that a low level of negative and negative comfortable emotions leads to a lower level of aggressive and hazardous behavior. Findings of the study can be further improved to identify the exact cause and risk factors of crowd-related incidents in preventing crowd disasters during mass gathering events.
Random vectors and spatial analysis by geostatistics for geotechnical applications
Young, D.S.
1987-08-01
Geostatistics is extended to the spatial analysis of vector variables by defining the estimation variance and vector variogram in terms of the magnitude of difference vectors. Many random variables in geotechnology are vectors rather than scalars, and their structural analysis requires interpolation of the sampled variables to construct and characterize structural models. A better local estimator will result in greater quality of input models; geostatistics can provide such estimators, namely kriging estimators. The efficiency of geostatistics for vector variables is demonstrated in a case study of rock joint orientations in geological formations. The positive cross-validation encourages application of geostatistics to spatial analysis of random vectors in geoscience as well as various geotechnical fields including optimum site characterization, rock mechanics for mining and civil structures, cavability analysis of block cavings, petroleum engineering, and hydrologic and hydraulic modelings.
Greiner, Matthias; Smid, Joost; Havelaar, Arie H; Müller-Graf, Christine
2013-05-15
Quantitative microbiological risk assessment (QMRA) models are used to reflect knowledge about complex real-world scenarios for the propagation of microbiological hazards along the feed and food chain. The aim is to provide insight into interdependencies among model parameters, typically with an interest to characterise the effect of risk mitigation measures. A particular requirement is to achieve clarity about the reliability of conclusions from the model in the presence of uncertainty. To this end, Monte Carlo (MC) simulation modelling has become a standard in so-called probabilistic risk assessment. In this paper, we elaborate on the application of Bayesian computational statistics in the context of QMRA. It is useful to explore the analogy between MC modelling and Bayesian inference (BI). This pertains in particular to the procedures for deriving prior distributions for model parameters. We illustrate using a simple example that the inability to cope with feedback among model parameters is a major limitation of MC modelling. However, BI models can be easily integrated into MC modelling to overcome this limitation. We refer to a BI submodel integrated into an MC model as a "Bayes domain". We also demonstrate that an entire QMRA model can be formulated as a Bayesian graphical model (BGM) and discuss the advantages of this approach. Finally, we show example graphs of MC, BI and BGM models, highlighting the similarities among the three approaches.
Enhancing the Modeling of PFOA Pharmacokinetics with Bayesian Analysis
The detail sufficient to describe the pharmacokinetics (PK) for perfluorooctanoic acid (PFOA) and the methods necessary to combine information from multiple data sets are both subjects of ongoing investigation. Bayesian analysis provides tools to accommodate these goals. We exa...
Predicting individual brain functional connectivity using a Bayesian hierarchical model.
Dai, Tian; Guo, Ying
2017-02-15
Network-oriented analysis of functional magnetic resonance imaging (fMRI), especially resting-state fMRI, has revealed important associations between abnormal connectivity and brain disorders such as schizophrenia, major depression and Alzheimer's disease. Imaging-based brain connectivity measures have become a useful tool for investigating the pathophysiology, progression and treatment response of psychiatric disorders and neurodegenerative diseases. Recent studies have started to explore the possibility of using functional neuroimaging to help predict disease progression and guide treatment selection for individual patients. These studies provide the impetus to develop statistical methodology that would help provide predictive information on disease progression-related or treatment-related changes in neural connectivity. To this end, we propose a prediction method based on a Bayesian hierarchical model that uses an individual's baseline fMRI scans, coupled with relevant subject characteristics, to predict the individual's future functional connectivity. A key advantage of the proposed method is that it can improve the accuracy of individualized prediction of connectivity by combining information from both group-level connectivity patterns that are common to subjects with similar characteristics and individual-level connectivity features that are particular to the specific subject. Furthermore, our method also offers statistical inference tools such as predictive intervals that help quantify the uncertainty or variability of the predicted outcomes. The proposed prediction method could be a useful approach to predict the changes in an individual patient's brain connectivity with the progression of a disease. It can also be used to predict a patient's post-treatment brain connectivity after a specified treatment regimen. Another utility of the proposed method is that it can be applied to test-retest imaging data to develop a more reliable estimator for individual
A Flexible Bayesian Model for Testing for Transmission Ratio Distortion
Casellas, Joaquim; Manunza, Arianna; Mercader, Anna; Quintanilla, Raquel; Amills, Marcel
2014-01-01
Current statistical approaches to investigate the nature and magnitude of transmission ratio distortion (TRD) are scarce and restricted to the most common experimental designs such as F2 populations and backcrosses. In this article, we describe a new Bayesian approach to check TRD within a given biallelic genetic marker in a diploid species, providing a highly flexible framework that can accommodate any kind of population structure. This model relies on the genotype of each offspring and thus integrates all available information from either the parents’ genotypes or population-specific allele frequencies and yields TRD estimates that can be corroborated by the calculation of a Bayes factor (BF). This approach has been evaluated on simulated data sets with appealing statistical performance. As a proof of concept, we have also tested TRD in a porcine population with five half-sib families and 352 offspring. All boars and piglets were genotyped with the Porcine SNP60 BeadChip, whereas genotypes from the sows were not available. The SNP-by-SNP screening of the pig genome revealed 84 SNPs with decisive evidences of TRD (BF > 100) after accounting for multiple testing. Many of these regions contained genes related to biological processes (e.g., nucleosome assembly and co-organization, DNA conformation and packaging, and DNA complex assembly) that are critically associated with embryonic viability. The implementation of this method, which overcomes many of the limitations of previous approaches, should contribute to fostering research on TRD in both model and nonmodel organisms. PMID:25271302
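To make the role of the Bayes factor in the abstract above concrete, here is a deliberately simplified sketch for a single heterozygous parent. It is not the authors' model: it assumes k of n informative offspring received a given allele, compares Mendelian transmission (p = 0.5) with an alternative that puts a uniform Beta(1, 1) prior on p, and uses the hypothetical function name `bayes_factor_trd`.

```python
from math import comb


def bayes_factor_trd(k, n):
    """Bayes factor for distorted vs. Mendelian transmission.

    k: offspring that inherited the tracked allele from a heterozygous
       parent; n: number of informative offspring (illustrative setup).
    """
    # H1: p ~ Uniform(0, 1); the binomial marginal likelihood integrates
    # to 1 / (n + 1) for every value of k.
    marginal_h1 = 1.0 / (n + 1)
    # H0: Mendelian segregation with p fixed at 0.5.
    marginal_h0 = comb(n, k) * 0.5 ** n
    return marginal_h1 / marginal_h0
```

Under these assumptions, a balanced outcome such as 5 of 10 gives a Bayes factor below 1 (favouring Mendelian transmission), whereas strongly skewed counts in a large family push it towards the BF > 100 threshold that the abstract describes as decisive evidence.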
Bayesian Modeling of Haplotype Effects in Multiparent Populations
Zhang, Zhaojun; Wang, Wei; Valdar, William
2014-01-01
A general Bayesian model, Diploffect, is described for estimating the effects of founder haplotypes at quantitative trait loci (QTL) detected in multiparental genetic populations; such populations include the Collaborative Cross (CC), Heterogeneous Stocks (HS), and many others for which local genetic variation is well described by an underlying, usually probabilistically inferred, haplotype mosaic. Our aim is to provide a framework for coherent estimation of haplotype and diplotype (haplotype pair) effects that takes into account the following: uncertainty in haplotype composition for each individual; uncertainty arising from small sample sizes and infrequently observed haplotype combinations; possible effects of dominance (for noninbred subjects); genetic background; and that provides a means to incorporate data that may be incomplete or has a hierarchical structure. Using the results of a probabilistic haplotype reconstruction as prior information, we obtain posterior distributions at the QTL for both haplotype effects and haplotype composition. Two alternative computational approaches are supplied: a Markov chain Monte Carlo sampler and a procedure based on importance sampling of integrated nested Laplace approximations. Using simulations of QTL in the incipient CC (pre-CC) and Northport HS populations, we compare the accuracy of Diploffect, approximations to it, and more commonly used approaches based on Haley–Knott regression, describing trade-offs between these methods. We also estimate effects for three QTL previously identified in those populations, obtaining posterior intervals that describe how the phenotype might be affected by diplotype substitutions at the modeled locus. PMID:25236455
Bayesian Modeling and Chronological Precision for Polynesian Settlement of Tonga
Weisler, Marshall; Zhao, Jian-xin
2015-01-01
First settlement of Polynesia and population expansion throughout the ancestral Polynesian homeland are foundation events for global history. A precise chronology is paramount to informed archaeological interpretation of these events and their consequences. Recently applied chronometric hygiene protocols excluding radiocarbon dates on wood charcoal without species identification all but eliminate this chronology as it has been built for the Kingdom of Tonga, the initial islands to be settled in Polynesia. In this paper we re-examine and redevelop this chronology through application of Bayesian models to the questioned suite of radiocarbon dates, but also incorporating short-lived wood charcoal dates from archived samples and high precision U/Th dates on coral artifacts. These models provide generation level precision allowing us to track population migration from first Lapita occupation on the island of Tongatapu through Tonga’s central and northern island groups. They further illustrate an exceptionally short duration for the initial colonizing Lapita phase and a somewhat abrupt transition to ancestral Polynesian society as it is currently defined. PMID:25799460
Bayesian image reconstruction - The pixon and optimal image modeling
NASA Technical Reports Server (NTRS)
Pina, R. K.; Puetter, R. C.
1993-01-01
In this paper we describe the optimal image model, maximum residual likelihood method (OptMRL) for image reconstruction. OptMRL is a Bayesian image reconstruction technique for removing point-spread function blurring. OptMRL uses both a goodness-of-fit criterion (GOF) and an 'image prior', i.e., a function which quantifies the a priori probability of the image. Unlike standard maximum entropy methods, which typically reconstruct the image on the data pixel grid, OptMRL varies the image model in order to find the optimal functional basis with which to represent the image. We show how an optimal basis for image representation can be selected and in doing so, develop the concept of the 'pixon' which is a generalized image cell from which this basis is constructed. By allowing both the image and the image representation to be variable, the OptMRL method greatly increases the volume of solution space over which the image is optimized. Hence the likelihood of the final reconstructed image is greatly increased. For the goodness-of-fit criterion, OptMRL uses the maximum residual likelihood probability distribution introduced previously by Pina and Puetter (1992). This GOF probability distribution, which is based on the spatial autocorrelation of the residuals, has the advantage that it ensures spatially uncorrelated image reconstruction residuals.
Bayesian network models in brain functional connectivity analysis
Zhang, Sheng; Li, Chiang-shan R.
2013-01-01
Much effort has been made to better understand the complex integration of distinct parts of the human brain using functional magnetic resonance imaging (fMRI). Altered functional connectivity between brain regions is associated with many neurological and mental illnesses, such as Alzheimer and Parkinson diseases, addiction, and depression. In computational science, Bayesian networks (BN) have been used in a broad range of studies to model complex data sets in the presence of uncertainty and when expert prior knowledge is needed. However, little has been done to explore the use of BN in connectivity analysis of fMRI data. In this paper, we present an up-to-date literature review and methodological details of connectivity analyses using BN, while highlighting caveats in a real-world application. We present a BN model of an fMRI dataset obtained from sixty healthy subjects performing the stop-signal task (SST), a paradigm widely used to investigate response inhibition. Connectivity results are validated with the extant literature including our previous studies. By exploring the link strength of the learned BNs and correlating them to behavioral performance measures, this novel use of BN in connectivity analysis provides new insights into the functional neural pathways underlying response inhibition. PMID:24319317
Bayesian modeling of temporal dependence in large sparse contingency tables
Kunihama, Tsuyoshi; Dunson, David B.
2013-01-01
In many applications, it is of interest to study trends over time in relationships among categorical variables, such as age group, ethnicity, religious affiliation, political party and preference for particular policies. At each time point, a sample of individuals provide responses to a set of questions, with different individuals sampled at each time. In such settings, there tends to be abundant missing data and the variables being measured may change over time. At each time point, one obtains a large sparse contingency table, with the number of cells often much larger than the number of individuals being surveyed. To borrow information across time in modeling large sparse contingency tables, we propose a Bayesian autoregressive tensor factorization approach. The proposed model relies on a probabilistic Parafac factorization of the joint pmf characterizing the categorical data distribution at each time point, with autocorrelation included across times. Efficient computational methods are developed relying on MCMC. The methods are evaluated through simulation examples and applied to social survey data. PMID:24482548
A Bayesian model of context-sensitive value attribution
Rigoli, Francesco; Friston, Karl J; Martinelli, Cristina; Selaković, Mirjana; Shergill, Sukhwinder S; Dolan, Raymond J
2016-01-01
Substantial evidence indicates that incentive value depends on an anticipation of rewards within a given context. However, the computations underlying this context sensitivity remain unknown. To address this question, we introduce a normative (Bayesian) account of how rewards map to incentive values. This assumes that the brain inverts a model of how rewards are generated. Key features of our account include (i) an influence of prior beliefs about the context in which rewards are delivered (weighted by their reliability in a Bayes-optimal fashion), (ii) the notion that incentive values correspond to precision-weighted prediction errors, and (iii) contextual information unfolding at different hierarchical levels. This formulation implies that incentive value is intrinsically context-dependent. We provide empirical support for this model by showing that incentive value is influenced by context variability and by hierarchically nested contexts. The perspective we introduce generates new empirical predictions that might help explain psychopathologies, such as addiction. DOI: http://dx.doi.org/10.7554/eLife.16127.001 PMID:27328323
Emulation Modeling with Bayesian Networks for Efficient Decision Support
NASA Astrophysics Data System (ADS)
Fienen, M. N.; Masterson, J.; Plant, N. G.; Gutierrez, B. T.; Thieler, E. R.
2012-12-01
Bayesian decision networks (BDNs) have long been used to provide decision support in systems that require explicit consideration of uncertainty; applications range from ecology to medical diagnostics and terrorism threat assessments. Until recently, however, few studies have applied BDNs to the study of groundwater systems. BDNs are particularly useful for representing real-world system variability by synthesizing a range of hydrogeologic situations within a single simulation. Because BDN output is cast in terms of probability (an output desired by decision makers), BDNs explicitly incorporate the uncertainty of a system. BDNs can thus serve as a more efficient alternative to other uncertainty characterization methods, such as computationally demanding Monte Carlo analyses and other methods restricted to linear model analyses. We present a unique application of a BDN to a groundwater modeling analysis of the hydrologic response of Assateague Island, Maryland to sea-level rise. Using both input and output variables of the modeled groundwater response to different sea-level rise (SLR) scenarios, the BDN predicts the probability of changes in the depth to fresh water, which exerts an important influence on physical and biological island evolution. Input variables included barrier-island width, maximum island elevation, and aquifer recharge. The variability of these inputs and their corresponding outputs is sampled along cross sections in a single model run to form an ensemble of input/output pairs. The BDN outputs, which are the posterior distributions of water table conditions for the sea-level rise scenarios, are evaluated through error analysis and cross-validation to assess both fit to training data and predictive power. The key benefit of using BDNs in groundwater modeling analyses is that they provide a method for distilling complex model results into predictions with associated uncertainty, which is useful to decision makers. Future efforts incorporate
Tauber, Sean; Navarro, Daniel J; Perfors, Amy; Steyvers, Mark
2017-03-30
Recent debates in the psychological literature have raised questions about the assumptions that underpin Bayesian models of cognition and what inferences they license about human cognition. In this paper we revisit this topic, arguing that there are 2 qualitatively different ways in which a Bayesian model could be constructed. The most common approach uses a Bayesian model as a normative standard upon which to license a claim about optimality. In the alternative approach, a descriptive Bayesian model need not correspond to any claim that the underlying cognition is optimal or rational, and is used solely as a tool for instantiating a substantive psychological theory. We present 3 case studies in which these 2 perspectives lead to different computational models and license different conclusions about human cognition. We demonstrate how the descriptive Bayesian approach can be used to answer different sorts of questions than the optimal approach, especially when combined with principled tools for model evaluation and model selection. More generally we argue for the importance of making a clear distinction between the 2 perspectives. Considerable confusion results when descriptive models and optimal models are conflated, and if Bayesians are to avoid contributing to this confusion it is important to avoid making normative claims when none are intended. (PsycINFO Database Record
A Bayesian view on acoustic model-based techniques for robust speech recognition
NASA Astrophysics Data System (ADS)
Maas, Roland; Huemmer, Christian; Sehr, Armin; Kellermann, Walter
2015-12-01
This article provides a unifying Bayesian view on various approaches for acoustic model adaptation, missing feature, and uncertainty decoding that are well-known in the literature of robust automatic speech recognition. The representatives of these classes can often be deduced from a Bayesian network that extends the conventional hidden Markov models used in speech recognition. These extensions, in turn, can in many cases be motivated from an underlying observation model that relates clean and distorted feature vectors. By identifying and converting the observation models into a Bayesian network representation, we formulate the corresponding compensation rules. We thus summarize the various approaches as approximations or modifications of the same Bayesian decoding rule leading to a unified view on known derivations as well as to new formulations for certain approaches.
Lee, Sik-Yum; Song, Xin-Yuan
2004-05-01
Missing data are very common in behavioural and psychological research. In this paper, we develop a Bayesian approach in the context of a general nonlinear structural equation model with missing continuous and ordinal categorical data. In the development, the missing data are treated as latent quantities, and provision for the incompleteness of the data is made by a hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm. We show by means of a simulation study that the Bayesian estimates are accurate. A Bayesian model comparison procedure based on the Bayes factor and path sampling is proposed. The required observations from the posterior distribution for computing the Bayes factor are simulated by the hybrid algorithm in Bayesian estimation. Our simulation results indicate that the correct model is selected more frequently when the incomplete records are used in the analysis than when they are ignored. The methodology is further illustrated with a real data set from a study concerned with an AIDS preventative intervention for Filipina sex workers.
Bayesian Safety Risk Modeling of Human-Flightdeck Automation Interaction
NASA Technical Reports Server (NTRS)
Ancel, Ersin; Shih, Ann T.
2015-01-01
Usage of automatic systems in airliners has increased fuel efficiency, added extra capabilities, enhanced safety and reliability, and improved passenger comfort since their introduction in the late 1980s. However, original automation benefits, including reduced flight crew workload, human errors or training requirements, were not achieved as originally expected. Instead, automation introduced new failure modes, redistributed and sometimes increased workload, brought in new cognitive and attention demands, and increased training requirements. Modern airliners have numerous flight modes, providing more flexibility (and inherently more complexity) to the flight crew. However, the price to pay for the increased flexibility is the need for increased mode awareness, as well as the need to supervise, understand, and predict automated system behavior. Also, over-reliance on automation is linked to manual flight skill degradation and complacency in commercial pilots. As a result, recent accidents involving human errors are often caused by the interactions between humans and the automated systems (e.g., the breakdown in man-machine coordination), deteriorated manual flying skills, and/or loss of situational awareness due to heavy dependence on automated systems. This paper describes the development of the increased complexity and reliance on automation baseline model, named FLAP for FLightdeck Automation Problems. The model development process starts with a comprehensive literature review followed by the construction of a framework comprised of high-level causal factors leading to an automation-related flight anomaly. The framework was then converted into a Bayesian Belief Network (BBN) using the Hugin Software v7.8. The effects of automation on flight crew are incorporated into the model, including flight skill degradation, increased cognitive demand and training requirements along with their interactions. Besides flight crew deficiencies, automation system
A Bayesian hierarchical surrogate outcome model for multiple sclerosis.
Pozzi, Luca; Schmidli, Heinz; Ohlssen, David I
2016-07-01
The development of novel therapies in multiple sclerosis (MS) is one area where a range of surrogate outcomes are used in various stages of clinical research. While the aim of treatments in MS is to prevent disability, a clinical trial for evaluating a drug's effect on disability progression would require a large sample of patients with many years of follow-up. The early stage of MS is characterized by relapses. To reduce study size and duration, clinical relapses are accepted as primary endpoints in phase III trials. For phase II studies, the primary outcomes are typically lesion counts based on magnetic resonance imaging (MRI), as these are considerably more sensitive than clinical measures for detecting MS activity. Recently, Sormani and colleagues in 'Surrogate endpoints for EDSS worsening in multiple sclerosis' provided a systematic review and used weighted regression analyses to examine the role of either MRI lesions or relapses as trial level surrogate outcomes for disability. We build on this work by developing a Bayesian three-level model, accommodating the two surrogates and the disability endpoint, and properly taking into account that treatment effects are estimated with errors. Specifically, a combination of treatment effects based on MRI lesion count outcomes and clinical relapse was used to develop a study-level surrogate outcome model for the corresponding treatment effects based on disability progression. While the primary aim for developing this model was to support decision-making in drug development, the proposed model may also be considered for future validation. Copyright © 2016 John Wiley & Sons, Ltd.
A Bayesian network model for predicting aquatic toxicity mode ...
The mode of toxic action (MoA) has been recognized as a key determinant of chemical toxicity but MoA classification in aquatic toxicology has been limited. We developed a Bayesian network model to classify aquatic toxicity mode of action using a recently published dataset containing over one thousand chemicals with MoA assignments for aquatic animal toxicity. Two dimensional theoretical chemical descriptors were generated for each chemical using the Toxicity Estimation Software Tool. The model was developed through augmented Markov blanket discovery from the data set with the MoA broad classifications as a target node. From cross validation, the overall precision for the model was 80.2% with a R2 of 0.959. The best precision was for the AChEI MoA (93.5%) where 257 chemicals out of 275 were correctly classified. Model precision was poorest for the reactivity MoA (48.5%) where 48 out of 99 reactive chemicals were correctly classified. Narcosis represented the largest class within the MoA dataset and had a precision and reliability of 80.0%, reflecting the global precision across all of the MoAs. False negatives for narcosis most often fell into electron transport inhibition, neurotoxicity or reactivity MoAs. False negatives for all other MoAs were most often narcosis. A probabilistic sensitivity analysis was undertaken for each MoA to examine the sensitivity to individual and multiple descriptor findings. The results show that the Markov blanket of a structurally
Bayesian Hidden Markov Modeling of Array CGH Data.
Guha, Subharup; Li, Yi; Neuberg, Donna
2008-06-01
Genomic alterations have been linked to the development and progression of cancer. The technique of comparative genomic hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detect trends in the data. We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Because the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme, and breast cancer are analyzed, and comparisons are made with some widely used algorithms to illustrate the reliability and success of the technique.
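The posterior state probabilities mentioned in the abstract above can be illustrated with a much simpler calculation than the authors' Metropolis-within-Gibbs sampler. The sketch below is a plain forward-backward pass for a hidden Markov model whose parameters are assumed known; the three states (loss, neutral, gain), the Gaussian emission means, the common standard deviation, and the `stay_prob` transition parameter are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def state_posteriors(log_ratios, means, sd, stay_prob):
    """Posterior P(state | all data) per probe via scaled forward-backward.

    Assumes KNOWN parameters (illustrative only): one Gaussian emission per
    state and a transition matrix that stays put with probability stay_prob.
    """
    n = len(means)
    switch = (1.0 - stay_prob) / (n - 1)
    trans = [[stay_prob if i == j else switch for j in range(n)] for i in range(n)]
    emit = [[gaussian_pdf(x, m, sd) for m in means] for x in log_ratios]

    # Forward pass, normalized at every step to avoid underflow.
    alphas = []
    prev = [1.0 / n] * n  # uniform initial state distribution (assumption)
    for t, e in enumerate(emit):
        if t == 0:
            a = [prev[j] * e[j] for j in range(n)]
        else:
            a = [e[j] * sum(prev[i] * trans[i][j] for i in range(n)) for j in range(n)]
        total = sum(a)
        prev = [v / total for v in a]
        alphas.append(prev)

    # Backward pass, also normalized at every step.
    betas = [[1.0] * n for _ in emit]
    for t in range(len(emit) - 2, -1, -1):
        b = [
            sum(trans[i][j] * emit[t + 1][j] * betas[t + 1][j] for j in range(n))
            for i in range(n)
        ]
        total = sum(b)
        betas[t] = [v / total for v in b]

    # Combine and renormalize to get the smoothed state probabilities.
    posteriors = []
    for a, b in zip(alphas, betas):
        g = [x * y for x, y in zip(a, b)]
        total = sum(g)
        posteriors.append([v / total for v in g])
    return posteriors


# Hypothetical log2 intensity ratios with a short amplified segment.
ratios = [0.0, 0.05, 0.6, 0.55, 0.62, -0.02]
post = state_posteriors(ratios, means=[-0.5, 0.0, 0.5], sd=0.15, stay_prob=0.9)
```

Each row of `post` holds the probabilities of loss, neutral, and gain for one probe. In a fully Bayesian treatment such as the one the abstract describes, the emission and transition parameters would themselves be sampled rather than fixed.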
Estimating seabed scattering mechanisms via Bayesian model selection.
Steininger, Gavin; Dosso, Stan E; Holland, Charles W; Dettmer, Jan
2014-10-01
A quantitative inversion procedure is developed and applied to determine the dominant scattering mechanism (surface roughness and/or volume scattering) from seabed scattering-strength data. The classification system is based on trans-dimensional Bayesian inversion with the deviance information criterion used to select the dominant scattering mechanism. Scattering is modeled using first-order perturbation theory as due to one of three mechanisms: Interface scattering from a rough seafloor, volume scattering from a heterogeneous sediment layer, or mixed scattering combining both interface and volume scattering. The classification system is applied to six simulated test cases where it correctly identifies the true dominant scattering mechanism as having greater support from the data in five cases; the remaining case is indecisive. The approach is also applied to measured backscatter-strength data where volume scattering is determined as the dominant scattering mechanism. Comparison of inversion results with core data indicates the method yields both a reasonable volume heterogeneity size distribution and a good estimate of the sub-bottom depths at which scatterers occur.
Bayesian Parameter Inference and Model Selection by Population Annealing in Systems Biology
Murakami, Yohei
2014-01-01
Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. In particular, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific parameter value with high credibility as the representative value of the distribution. To overcome these problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that population annealing can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with un-identifiability of the representative values of parameters, we proposed to run the simulations with the parameter ensemble sampled from the posterior distribution, named “posterior parameter ensemble”. We showed that population annealing is an efficient and convenient algorithm to generate the posterior parameter ensemble. We also showed that the simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference, but also capture and predict data which were not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and conduct model selection depending on the Bayes factor. PMID:25089832
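To make the approximate Bayesian computation (ABC) framework in the abstract above concrete, here is a minimal sketch of plain ABC rejection sampling. This is deliberately not population annealing, which the paper proposes; it is the simplest ABC variant, shown only to illustrate how an ensemble of parameter values can be drawn without evaluating a likelihood. The toy model (a Gaussian with unknown mean), the uniform prior, the tolerance, and all function names are hypothetical.

```python
import random


def abc_rejection(observed, simulate, sample_prior, distance, tolerance, n_draws, seed=0):
    """Keep prior draws whose simulated summary lies within `tolerance` of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)          # propose a parameter from the prior
        simulated = simulate(theta, rng)   # run the model forward
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)         # close enough: keep this parameter
    return accepted


# Toy problem (hypothetical): infer the mean of a unit-variance Gaussian
# from the mean of 20 observations.
def sample_prior(rng):
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng):
    return sum(rng.gauss(theta, 1.0) for _ in range(20)) / 20


def distance(a, b):
    return abs(a - b)


posterior_ensemble = abc_rejection(
    observed=2.0,
    simulate=simulate,
    sample_prior=sample_prior,
    distance=distance,
    tolerance=0.2,
    n_draws=5000,
)
ensemble_mean = sum(posterior_ensemble) / len(posterior_ensemble)
```

The accepted values play the role of a posterior parameter ensemble in the abstract's sense: rather than reporting a single representative value, one can rerun the simulator with every member of `posterior_ensemble`. Rejection sampling wastes most draws when the tolerance is small, which is one motivation for population-based schemes.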
Bayesian Proteoform Modeling Improves Protein Quantification of Global Proteomic Measurements
Webb-Robertson, Bobbie-Jo M.; Matzke, Melissa M.; Datta, Susmita; Payne, Samuel H.; Kang, Jiyun; Bramer, Lisa M.; Nicora, Carrie D.; Shukla, Anil K.; Metz, Thomas O.; Rodland, Karin D.; Smith, Richard D.; Tardiff, Mark F.; McDermott, Jason E.; Pounds, Joel G.; Waters, Katrina M.
2014-12-01
As the capability of mass spectrometry-based proteomics has matured, tens of thousands of peptides can be measured simultaneously, which has the benefit of offering a systems view of protein expression. However, a major challenge is that with an increase in throughput, protein quantification estimation from the native measured peptides has become a computational task. A limitation of existing computationally-driven protein quantification methods is that most ignore protein variation, such as alternative splicing of the RNA transcript and post-translational modifications or other possible proteoforms, which will affect a significant fraction of the proteome. The consequence of this assumption is that statistical inference at the protein level, and consequently downstream analyses, such as network and pathway modeling, have only limited power for biomarker discovery. Here, we describe a Bayesian model (BP-Quant) that uses statistically derived peptide signatures to identify peptides that are outside the dominant pattern, or the existence of multiple over-expressed patterns, to improve relative protein abundance estimates. It is a research-driven approach that utilizes the objectives of the experiment, defined in the context of a standard statistical hypothesis, to identify a set of peptides exhibiting similar statistical behavior relating to a protein. This approach infers that changes in relative protein abundance can be used as a surrogate for changes in function, without necessarily taking into account the effect of differential post-translational modifications, processing, or splicing in altering protein function. We verify the approach using a dilution study from mouse plasma samples and demonstrate that BP-Quant achieves accuracy similar to that of current state-of-the-art methods at proteoform identification with significantly better specificity. BP-Quant is available as MATLAB and R packages at https://github.com/PNNL-Comp-Mass-Spec/BP-Quant.
Geostatistical applications in environmental remediation
Stewart, R.N.; Purucker, S.T.; Lyon, B.F.
1995-02-01
Geostatistical analysis refers to a collection of statistical methods for addressing data that vary in space. By incorporating spatial information into the analysis, geostatistics has advantages over traditional statistical analysis for problems with a spatial context. Geostatistics has a history of success in earth science applications, and its popularity is increasing in other areas, including environmental remediation. Due to recent advances in computer technology, geostatistical algorithms can be executed at a speed comparable to many standard statistical software packages. When used responsibly, geostatistics is a systematic and defensible tool that can be used in various decision frameworks, such as the Data Quality Objectives (DQO) process. At every point in the site, geostatistics can estimate both the concentration level and the probability or risk of exceeding a given value. These probability maps can assist in identifying clean-up zones. Given any decision threshold and an acceptable level of risk, the probability maps identify those areas that are estimated to be above or below the acceptable risk. Those areas that are above the threshold are of the most concern with regard to remediation. In addition to estimating clean-up zones, geostatistics can assist in designing cost-effective secondary sampling schemes. Those areas of the probability map with high levels of estimated uncertainty are areas where more secondary sampling should occur. In addition, geostatistics has the ability to incorporate soft data directly into the analysis. These data include historical records, a highly correlated secondary contaminant, or expert judgment. In environmental remediation, geostatistics is a tool that, in conjunction with other methods, can provide a common forum for building consensus.
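The exceedance probabilities described in the abstract above can be sketched for a single location as follows. This is a minimal illustration that assumes the prediction error at the location is Gaussian, with the kriging estimate as its mean and the kriging variance as its variance; the abstract does not say which method the authors used (indicator kriging is a common alternative that avoids the Gaussian assumption). The function name and the numbers are hypothetical.

```python
import math


def exceedance_probability(estimate, kriging_variance, threshold):
    """P(true value > threshold) under an assumed Gaussian prediction error."""
    if kriging_variance <= 0.0:
        # No uncertainty: the estimate either exceeds the threshold or it does not.
        return 1.0 if estimate > threshold else 0.0
    z = (threshold - estimate) / math.sqrt(kriging_variance)
    # Upper-tail normal probability written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical cell: estimated concentration 12, kriging variance 4, action level 10.
p_exceed = exceedance_probability(estimate=12.0, kriging_variance=4.0, threshold=10.0)
```

Evaluating this at every grid cell gives a probability map. Comparing each cell with an acceptable risk level then flags candidate clean-up zones, and cells whose probability sits near 0.5 are natural candidates for secondary sampling.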
Bayesian Model Comparison for the Order Restricted RC Association Model
ERIC Educational Resources Information Center
Iliopoulos, G.; Kateri, M.; Ntzoufras, I.
2009-01-01
Association models constitute an attractive alternative to the usual log-linear models for modeling the dependence between classification variables. They impose special structure on the underlying association by assigning scores on the levels of each classification variable, which can be fixed or parametric. Under the general row-column (RC)…
Ice Shelf Modeling: A Cross-Polar Bayesian Statistical Approach
NASA Astrophysics Data System (ADS)
Kirchner, N.; Furrer, R.; Jakobsson, M.; Zwally, H. J.
2010-12-01
Ice streams interlink glacial terrestrial and marine environments: embedded in grounded inland ice such as the Antarctic Ice Sheet or the paleo ice sheets covering extensive parts of the Eurasian and Amerasian Arctic respectively, ice streams are major drainage agents facilitating the discharge of substantial portions of continental ice into the ocean. At their seaward side, ice streams can either extend onto the ocean as floating ice tongues (such as the Drygalski Ice Tongue/East Antarctica), or feed large ice shelves (as is the case for e.g. the Siple Coast and the Ross Ice Shelf/West Antarctica). The flow behavior of ice streams has been recognized to be intimately linked with configurational changes in their attached ice shelves; in particular, ice shelf disintegration is associated with rapid ice stream retreat and increased mass discharge from the continental ice mass, contributing eventually to sea level rise. Investigations of ice stream retreat mechanisms are however incomplete if based on terrestrial records only: rather, the dynamics of ice shelves (and, eventually, the impact of the ocean on the latter) must be accounted for. However, since floating ice shelves leave hardly any traces behind when melting, uncertainty regarding the spatio-temporal distribution and evolution of ice shelves in times prior to instrumented and recorded observation is high, calling thus for a statistical modeling approach. Complementing ongoing large-scale numerical modeling efforts (Pollard & DeConto, 2009), we model the configuration of ice shelves by using a Bayesian Hierarchical Modeling (BHM) approach. We adopt a cross-polar perspective accounting for the fact that currently, ice shelves exist mainly along the coastline of Antarctica (and are virtually non-existent in the Arctic), while Arctic Ocean ice shelves repeatedly impacted the Arctic Ocean basin during former glacial periods. Modeled Arctic Ocean ice shelf configurations are compared with geological spatial
Inherently irrational? A computational model of escalation of commitment as Bayesian Updating.
Gilroy, Shawn P; Hantula, Donald A
2016-06-01
Monte Carlo simulations were performed to analyze the degree to which two-, three- and four-step learning histories of losses and gains correlated with escalation and persistence in extended extinction (continuous loss) conditions. Simulated learning histories were randomly generated at varying lengths and compositions and warranted probabilities were determined using Bayesian Updating methods. Bayesian Updating predicted instances where particular learning sequences were more likely to engender escalation and persistence under extinction conditions. All simulations revealed greater rates of escalation and persistence in the presence of heterogeneous (e.g., both Wins and Losses) lag sequences, with substantially increased rates of escalation when lags comprised predominantly of losses were followed by wins. These methods were then applied to human investment choices in earlier experiments. The Bayesian Updating models corresponded with data obtained from these experiments. These findings suggest that Bayesian Updating can be utilized as a model for understanding how and when individual commitment may escalate and persist despite continued failures.
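As an illustration of how a probability can be updated over a short history of wins and losses, the following is a minimal sketch that assumes a Beta-Bernoulli model with a uniform prior. The abstract does not state which updating scheme the authors used, so the function names, the prior, and the "W"/"L" encoding are all hypothetical.

```python
from fractions import Fraction


def posterior_win_probability(history, prior_wins=1, prior_losses=1):
    """Posterior mean of the win probability after a history of 'W'/'L' outcomes.

    Assumes a Beta(prior_wins, prior_losses) prior (uniform by default);
    this is an illustrative choice, not the model from the abstract.
    """
    wins = sum(1 for outcome in history if outcome == "W")
    losses = sum(1 for outcome in history if outcome == "L")
    if wins + losses != len(history):
        raise ValueError("history may only contain 'W' and 'L'")
    return Fraction(prior_wins + wins,
                    prior_wins + prior_losses + wins + losses)


def running_estimates(history):
    """Posterior mean after each prefix of the history, starting from the prior."""
    return [posterior_win_probability(history[:i])
            for i in range(len(history) + 1)]


if __name__ == "__main__":
    # A three-step history of two losses followed by a win.
    for step, estimate in enumerate(running_estimates("LLW")):
        print(step, estimate)
```

Exact fractions are used so that the effect of each outcome on the estimate can be read off directly.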
Applications of geostatistics in plant nematology.
Wallace, M K; Hawkins, D M
1994-12-01
The application of geostatistics to plant nematology was made by evaluating soil and nematode data acquired from 200 soil samples collected from the A(p) horizon of a reed canary-grass field in northern Minnesota. Geostatistical concepts relevant to nematology include semi-variogram modelling, kriging, and change of support calculations. Soil and nematode data generally followed a spherical semi-variogram model, with little random variability associated with soil data and large inherent variability for nematode data. Block kriging of soil and nematode data provided useful contour maps of the data. Change of support calculations indicated that most of the random variation in nematode data was due to short-range spatial variability in the nematode population densities.
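The spherical semi-variogram model mentioned above has a standard closed form, sketched below. The parameter names (`nugget`, `sill`, `range_`) and the example values are assumptions for illustration; the abstract reports no fitted values. Here `sill` denotes the total sill, including the nugget.

```python
def spherical_semivariogram(h, nugget, sill, range_):
    """Spherical semi-variogram value at lag distance h.

    nugget : discontinuity at the origin (random, short-range variability)
    sill   : total sill reached at and beyond the range
    range_ : lag distance at which the sill is reached
    """
    if h < 0:
        raise ValueError("lag distance must be non-negative")
    if h == 0:
        return 0.0  # by definition the semi-variogram is zero at zero lag
    if h >= range_:
        return sill
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


if __name__ == "__main__":
    # Hypothetical parameters: a large nugget relative to the sill mimics
    # data with substantial random variability.
    for lag in (0, 2.5, 5, 10, 20):
        print(lag, spherical_semivariogram(lag, nugget=0.2, sill=1.0, range_=10))
```

A nugget close to the sill corresponds to mostly random variation, while a small nugget indicates strong spatial structure within the range.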
Hierarchical Bayesian Model Averaging for Chance Constrained Remediation Designs
NASA Astrophysics Data System (ADS)
Chitsazan, N.; Tsai, F. T.
2012-12-01
Groundwater remediation designs rely heavily on simulation models that are subject to various sources of uncertainty in their predictions. To develop a robust remediation design, it is crucial to understand the effect of uncertainty sources. In this research, we introduce a hierarchical Bayesian model averaging (HBMA) framework to segregate and prioritize sources of uncertainty in a multi-layer frame, where each layer targets a source of uncertainty. The HBMA framework provides insight into uncertainty priorities and propagation. In addition, HBMA allows evaluating model weights in different hierarchy levels and assessing the relative importance of models in each level. To account for uncertainty, we employ chance constrained (CC) programming for stochastic remediation design. Chance constrained programming was implemented traditionally to account for parameter uncertainty. Recently, many studies suggested that model structure uncertainty is not negligible compared to parameter uncertainty. Using chance constrained programming along with HBMA can provide a rigorous tool for groundwater remediation designs under uncertainty. In this research, the HBMA-CC was applied to a remediation design in a synthetic aquifer. The design was to develop a scavenger well approach to mitigate saltwater intrusion toward production wells. HBMA was employed to assess uncertainties from model structure, parameter estimation and kriging interpolation. An improved harmony search optimization method was used to find the optimal location of the scavenger well. We evaluated prediction variances of chloride concentration at the production wells through the HBMA framework. The results showed that choosing the single best model may lead to a significant error in evaluating prediction variances for two reasons. First, considering the single best model, variances that stem from uncertainty in the model structure will be ignored. Second, considering the best model with non
Applications of Bayesian model averaging to the curvature and size of the Universe
NASA Astrophysics Data System (ADS)
Vardanyan, Mihran; Trotta, Roberto; Silk, Joseph
2011-05-01
Bayesian model averaging is a procedure to obtain parameter constraints that account for the uncertainty about the correct cosmological model. We use recent cosmological observations and Bayesian model averaging to derive tight limits on the curvature parameter, as well as robust lower bounds on the curvature radius of the Universe and its minimum size, while allowing for the possibility of an evolving dark energy component. Because flat models are favoured by Bayesian model selection, we find that model-averaged constraints on the curvature and size of the Universe can be considerably stronger than non-model-averaged ones. For the most conservative prior choice (based on inflationary considerations), our procedure improves on non-model-averaged constraints on the curvature by a factor of ~2. The curvature scale of the Universe is conservatively constrained to be Rc > 42 Gpc (99 per cent), corresponding to a lower limit to the number of Hubble spheres in the Universe NU > 251 (99 per cent).
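For orientation, the generic form of a model-averaged posterior is given below. The notation is an assumption made here rather than taken from the paper: \theta stands for the parameter of interest (for example the curvature parameter), d for the data, and M_i for the candidate models.

```latex
p(\theta \mid d) = \sum_i p(\theta \mid d, M_i)\, p(M_i \mid d),
\qquad
p(M_i \mid d) = \frac{p(d \mid M_i)\, p(M_i)}{\sum_j p(d \mid M_j)\, p(M_j)}
```

Each model's posterior is weighted by its posterior model probability, so a model that is favoured by the evidence p(d \mid M_i), such as a flat model in which the curvature is fixed, pulls the averaged constraint toward its own value.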
Bayesian estimation of regularization parameters for deformable surface models
Cunningham, G.S.; Lehovich, A.; Hanson, K.M.
1999-02-20
In this article the authors build on their past attempts to reconstruct a 3D, time-varying bolus of radiotracer from first-pass data obtained by the dynamic SPECT imager, FASTSPECT, built by the University of Arizona. The object imaged is a CardioWest total artificial heart. The bolus is entirely contained in one ventricle and its associated inlet and outlet tubes. The model for the radiotracer distribution at a given time is a closed surface parameterized by 482 vertices that are connected to make 960 triangles, with nonuniform intensity variations of radiotracer allowed inside the surface on a voxel-to-voxel basis. The total curvature of the surface is minimized through the use of a weighted prior in the Bayesian framework, as is the weighted norm of the gradient of the voxellated grid. MAP estimates for the vertices, interior intensity voxels and background count level are produced. The strengths of the priors, or hyperparameters, are determined by maximizing the probability of the data given the hyperparameters, called the evidence. The evidence is calculated by first assuming that the posterior is approximately normal in the values of the vertices and voxels, and then by evaluating the integral of the multi-dimensional normal distribution. This integral (which requires evaluating the determinant of a covariance matrix) is computed by applying a recent algorithm from Bai et al. that calculates the needed determinant efficiently. They demonstrate that the radiotracer is highly inhomogeneous in early time frames, as suspected in earlier reconstruction attempts that assumed a uniform intensity of radiotracer within the closed surface, and that the optimal choice of hyperparameters is substantially different for different time frames.
NASA Astrophysics Data System (ADS)
Mateus, Luis; Stollenwerk, Nico; Zambrini, Jean Claude
2012-09-01
We compare two stochastic epidemiological models in a Bayesian framework, both models performing on the same simulated data set. In some cases of data obtained under one model with specific parameter values the model comparison favours the model not underlying the simulated data.
Strelioff, Christopher C; Crutchfield, James P; Hübler, Alfred W
2007-07-01
Markov chains are a natural and well understood tool for describing one-dimensional patterns in time or space. We show how to infer kth order Markov chains, for arbitrary k , from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between Bayesian evidence and the partition function which allows for straightforward calculation of the expectation and variance of the conditional relative entropy and the source entropy rate. Finally, we introduce a method that uses finite data-size scaling with model-order comparison to infer the structure of out-of-class processes.
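Model-order selection of this kind can be sketched with the closed-form evidence of a Markov chain under Dirichlet priors. The example below assumes a symmetric Dirichlet(alpha) prior on every row of the transition matrix; the function name, the `start` argument, and the toy sequence are hypothetical and not taken from the paper.

```python
from collections import Counter
from math import lgamma


def log_evidence(sequence, order, alphabet, alpha=1.0, start=None):
    """Log marginal likelihood of sequence[start:] under an order-k Markov chain.

    Each context (the previous `order` symbols) gets an independent symmetric
    Dirichlet(alpha) prior over the next symbol. Scoring begins at `start`
    so that different orders can be compared on the same symbols.
    """
    if start is None:
        start = order
    if start < order:
        raise ValueError("start must be at least the model order")
    counts = Counter()
    for i in range(start, len(sequence)):
        context = tuple(sequence[i - order:i])
        counts[(context, sequence[i])] += 1
    contexts = {context for context, _ in counts}
    k = len(alphabet)
    total = 0.0
    for context in contexts:
        n_context = sum(counts[(context, s)] for s in alphabet)
        # Dirichlet-multinomial normalisation for this context
        total += lgamma(k * alpha) - lgamma(k * alpha + n_context)
        for s in alphabet:
            total += lgamma(alpha + counts[(context, s)]) - lgamma(alpha)
    return total


if __name__ == "__main__":
    data = "01" * 8  # strictly alternating toy sequence
    for order in (0, 1, 2):
        print(order, log_evidence(data, order, "01", start=2))
```

With equal prior probabilities on the orders, the order with the largest log evidence is the one favoured by the data. Scoring every order from the same `start` index keeps the comparison on a common set of symbols.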
Yi, Nengjun; Shriner, Daniel; Banerjee, Samprit; Mehta, Tapan; Pomp, Daniel; Yandell, Brian S.
2007-01-01
We extend our Bayesian model selection framework for mapping epistatic QTL in experimental crosses to include environmental effects and gene–environment interactions. We propose a new, fast Markov chain Monte Carlo algorithm to explore the posterior distribution of unknowns. In addition, we take advantage of any prior knowledge about genetic architecture to increase posterior probability on more probable models. These enhancements have significant computational advantages in models with many effects. We illustrate the proposed method by detecting new epistatic and gene–sex interactions for obesity-related traits in two real data sets of mice. Our method has been implemented in the freely available package R/qtlbim (http://www.qtlbim.org) to facilitate the general usage of the Bayesian methodology for genomewide interacting QTL analysis. PMID:17483424
Model Criticism of Bayesian Networks with Latent Variables.
ERIC Educational Resources Information Center
Williamson, David M.; Mislevy, Robert J.; Almond, Russell G.
This study investigated statistical methods for identifying errors in Bayesian networks (BN) with latent variables, as found in intelligent cognitive assessments. BN, commonly used in artificial intelligence systems, are promising mechanisms for scoring constructed-response examinations. The success of an intelligent assessment or tutoring system…
Bayesian Modeling in Institutional Research: An Example of Nonlinear Classification
ERIC Educational Resources Information Center
Xu, Yonghong Jade; Ishitani, Terry T.
2008-01-01
In recent years, rapid advancement has taken place in computing technology that allows institutional researchers to efficiently and effectively address data of increasing volume and structural complexity (Luan, 2002). In this chapter, the authors propose a new data analytical technique, Bayesian belief networks (BBN), to add to the toolbox for…
A Comparison of Imputation Methods for Bayesian Factor Analysis Models
ERIC Educational Resources Information Center
Merkle, Edgar C.
2011-01-01
Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amenable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…
NASA Astrophysics Data System (ADS)
Scradeanu, D.; Pagnejer, M.
2012-04-01
The purpose of the work is to evaluate the uncertainty of the hydrodynamic model for a multilayered geological structure, a potential trap for carbon dioxide storage. The hydrodynamic model is based on a conceptual model of the multilayered hydrostructure with three components: 1) spatial model; 2) parametric model and 3) energy model. The necessary data to achieve the three components of the conceptual model are obtained from: 240 boreholes explored by geophysical logging and seismic investigation, for the first two components, and an experimental water injection test for the last one. The hydrodynamic model is a finite difference numerical model based on a 3D stratigraphic model with nine stratigraphic units (Badenian and Oligocene) and a 3D multiparameter model (porosity, permeability, hydraulic conductivity, storage coefficient, leakage etc.). The uncertainty of the two 3D models was evaluated using multivariate geostatistical tools: a) cross-semivariogram for structural analysis, especially the study of anisotropy and b) cokriging to reduce estimation variances in the specific situation where there is a cross-correlation between a variable and one or more variables that are undersampled. Important differences between univariate and bivariate anisotropy were identified. The minimised uncertainty of the parametric model (by cokriging) was transferred to the hydrodynamic model. The uncertainty distribution of the pressures generated by the water injection test was additionally filtered by the sensitivity of the numerical model. The obtained relative errors of the pressure distribution in the hydrodynamic model are 15-20%. The scientific research was performed in the frame of the European FP7 project "A multiple space and time scale approach for the quantification of deep saline formation for CO2 storage (MUSTANG)".
Parameterizing Bayesian network Representations of Social-Behavioral Models by Expert Elicitation
Walsh, Stephen J.; Dalton, Angela C.; Whitney, Paul D.; White, Amanda M.
2010-05-23
Bayesian networks provide a general framework with which to model many natural phenomena. The mathematical nature of Bayesian networks enables a plethora of model validation and calibration techniques: e.g., parameter estimation, goodness of fit tests, and diagnostic checking of the model assumptions. However, they are not free of shortcomings. Parameter estimation from relevant extant data is a common approach to calibrating the model parameters. In practice it is not uncommon to find oneself lacking adequate data to reliably estimate all model parameters. In this paper we present the early development of a novel application of conjoint analysis as a method for eliciting and modeling expert opinions and using the results in a methodology for calibrating the parameters of a Bayesian network.
Shen, Yanna; Cooper, Gregory F
2012-09-01
This paper investigates Bayesian modeling of known and unknown causes of events in the context of disease-outbreak detection. We introduce a multivariate Bayesian approach that models multiple evidential features of every person in the population. This approach models and detects (1) known diseases (e.g., influenza and anthrax) by using informative prior probabilities and (2) unknown diseases (e.g., a new, highly contagious respiratory virus that has never been seen before) by using relatively non-informative prior probabilities. We report the results of simulation experiments which support that this modeling method can improve the detection of new disease outbreaks in a population. A contribution of this paper is that it introduces a multivariate Bayesian approach for jointly modeling both known and unknown causes of events. Such modeling has general applicability in domains where the space of known causes is incomplete.
NASA Astrophysics Data System (ADS)
Placek, Ben; Knuth, Kevin H.; Angerhausen, Daniel
2014-11-01
EXONEST is an algorithm dedicated to detecting and characterizing the photometric signatures of exoplanets, which include reflection and thermal emission, Doppler boosting, and ellipsoidal variations. Using Bayesian inference, we can test between competing models that describe the data as well as estimate model parameters. We demonstrate this approach by testing circular versus eccentric planetary orbital models, as well as testing for the presence or absence of four photometric effects. In addition to using Bayesian model selection, a unique aspect of EXONEST is the potential capability to distinguish between reflective and thermal contributions to the light curve. A case study is presented using Kepler data recorded from the transiting planet KOI-13b. By considering only the nontransiting portions of the light curve, we demonstrate that it is possible to estimate the photometrically relevant model parameters of KOI-13b. Furthermore, Bayesian model testing confirms that the orbit of KOI-13b has a detectable eccentricity.
Placek, Ben; Knuth, Kevin H.; Angerhausen, Daniel
2014-11-10
EXONEST is an algorithm dedicated to detecting and characterizing the photometric signatures of exoplanets, which include reflection and thermal emission, Doppler boosting, and ellipsoidal variations. Using Bayesian inference, we can test between competing models that describe the data as well as estimate model parameters. We demonstrate this approach by testing circular versus eccentric planetary orbital models, as well as testing for the presence or absence of four photometric effects. In addition to using Bayesian model selection, a unique aspect of EXONEST is the potential capability to distinguish between reflective and thermal contributions to the light curve. A case study is presented using Kepler data recorded from the transiting planet KOI-13b. By considering only the nontransiting portions of the light curve, we demonstrate that it is possible to estimate the photometrically relevant model parameters of KOI-13b. Furthermore, Bayesian model testing confirms that the orbit of KOI-13b has a detectable eccentricity.
A Bayesian hierarchical diffusion model decomposition of performance in Approach–Avoidance Tasks
Krypotos, Angelos-Miltiadis; Beckers, Tom; Kindt, Merel; Wagenmakers, Eric-Jan
2015-01-01
Common methods for analysing response time (RT) tasks, frequently used across different disciplines of psychology, suffer from a number of limitations such as the failure to directly measure the underlying latent processes of interest and the inability to take into account the uncertainty associated with each individual's point estimate of performance. Here, we discuss a Bayesian hierarchical diffusion model and apply it to RT data. This model allows researchers to decompose performance into meaningful psychological processes and to account optimally for individual differences and commonalities, even with relatively sparse data. We highlight the advantages of the Bayesian hierarchical diffusion model decomposition by applying it to performance on Approach–Avoidance Tasks, widely used in the emotion and psychopathology literature. Model fits for two experimental data-sets demonstrate that the model performs well. The Bayesian hierarchical diffusion model overcomes important limitations of current analysis procedures and provides deeper insight in latent psychological processes of interest. PMID:25491372
Number-Knower Levels in Young Children: Insights from Bayesian Modeling
ERIC Educational Resources Information Center
Lee, Michael D.; Sarnecka, Barbara W.
2011-01-01
Lee and Sarnecka (2010) developed a Bayesian model of young children's behavior on the Give-N test of number knowledge. This paper presents two new extensions of the model, and applies the model to new data. In the first extension, the model is used to evaluate competing theories about the conceptual knowledge underlying children's behavior. One,…
A Bayesian approach to model structural error and input variability in groundwater modeling
NASA Astrophysics Data System (ADS)
Xu, T.; Valocchi, A. J.; Lin, Y. F. F.; Liang, F.
2015-12-01
Effective water resource management typically relies on numerical models to analyze groundwater flow and solute transport processes. Model structural error (due to simplification and/or misrepresentation of the "true" environmental system) and input forcing variability (which commonly arises since some inputs are uncontrolled or estimated with high uncertainty) are ubiquitous in groundwater models. Calibration that overlooks errors in model structure and input data can lead to biased parameter estimates and compromised predictions. We present a fully Bayesian approach for a complete assessment of uncertainty for spatially distributed groundwater models. The approach explicitly recognizes stochastic input and uses data-driven error models based on nonparametric kernel methods to account for model structural error. We employ exploratory data analysis to assist in specifying informative prior for error models to improve identifiability. The inference is facilitated by an efficient sampling algorithm based on DREAM-ZS and a parameter subspace multiple-try strategy to reduce the required number of forward simulations of the groundwater model. We demonstrate the Bayesian approach through a synthetic case study of surface-ground water interaction under changing pumping conditions. It is found that explicit treatment of errors in model structure and input data (groundwater pumping rate) has substantial impact on the posterior distribution of groundwater model parameters. Using error models reduces predictive bias caused by parameter compensation. In addition, input variability increases parametric and predictive uncertainty. The Bayesian approach allows for a comparison among the contributions from various error sources, which could inform future model improvement and data collection efforts on how to best direct resources towards reducing predictive uncertainty.
Karacan, C.O.; Olea, R.A.; Goodman, G.
2012-01-01
Determination of the size of the gas emission zone, the locations of gas sources within, and especially the amount of gas retained in those zones is one of the most important steps for designing a successful methane control strategy and an efficient ventilation system in longwall coal mining. The formation of the gas emission zone and the potential amount of gas-in-place (GIP) that might be available for migration into a mine are factors of local geology and rock properties that usually show spatial variability in continuity and may also show geometric anisotropy. Geostatistical methods are used here for modeling and prediction of gas amounts and for assessing their associated uncertainty in gas emission zones of longwall mines for methane control. This study used core data obtained from 276 vertical exploration boreholes drilled from the surface to the bottom of the Pittsburgh coal seam in a mining district in the Northern Appalachian basin. After identifying important coal and non-coal layers for the gas emission zone, univariate statistical and semivariogram analyses were conducted for data from different formations to define the distribution and continuity of various attributes. Sequential simulations performed stochastic assessment of these attributes, such as gas content, strata thickness, and strata displacement. These analyses were followed by calculations of gas-in-place and their uncertainties in the Pittsburgh seam caved zone and fractured zone of longwall mines in this mining district. Grid blanking was used to isolate the volume over the actual panels from the entire modeled district and to calculate gas amounts that were directly related to the emissions in longwall mines. Results indicated that gas-in-place in the Pittsburgh seam, in the caved zone and in the fractured zone, as well as displacements in major rock units, showed spatial correlations that could be modeled and estimated using geostatistical methods. This study showed that GIP volumes may
Karacan, C. Özgen; Olea, Ricardo A.; Goodman, Gerrit
2015-01-01
Determination of the size of the gas emission zone, the locations of gas sources within, and especially the amount of gas retained in those zones is one of the most important steps for designing a successful methane control strategy and an efficient ventilation system in longwall coal mining. The formation of the gas emission zone and the potential amount of gas-in-place (GIP) that might be available for migration into a mine are factors of local geology and rock properties that usually show spatial variability in continuity and may also show geometric anisotropy. Geostatistical methods are used here for modeling and prediction of gas amounts and for assessing their associated uncertainty in gas emission zones of longwall mines for methane control. This study used core data obtained from 276 vertical exploration boreholes drilled from the surface to the bottom of the Pittsburgh coal seam in a mining district in the Northern Appalachian basin. After identifying important coal and non-coal layers for the gas emission zone, univariate statistical and semivariogram analyses were conducted for data from different formations to define the distribution and continuity of various attributes. Sequential simulations performed stochastic assessment of these attributes, such as gas content, strata thickness, and strata displacement. These analyses were followed by calculations of gas-in-place and their uncertainties in the Pittsburgh seam caved zone and fractured zone of longwall mines in this mining district. Grid blanking was used to isolate the volume over the actual panels from the entire modeled district and to calculate gas amounts that were directly related to the emissions in longwall mines. Results indicated that gas-in-place in the Pittsburgh seam, in the caved zone and in the fractured zone, as well as displacements in major rock units, showed spatial correlations that could be modeled and estimated using geostatistical methods. This study showed that GIP volumes may
Boos, Moritz; Seer, Caroline; Lange, Florian; Kopp, Bruno
2016-01-01
Cognitive determinants of probabilistic inference were examined using hierarchical Bayesian modeling techniques. A classic urn-ball paradigm served as experimental strategy, involving a factorial two (prior probabilities) by two (likelihoods) design. Five computational models of cognitive processes were compared with the observed behavior. Parameter-free Bayesian posterior probabilities and parameter-free base rate neglect provided inadequate models of probabilistic inference. The introduction of distorted subjective probabilities yielded more robust and generalizable results. A general class of (inverted) S-shaped probability weighting functions had been proposed; however, the possibility of large differences in probability distortions not only across experimental conditions, but also across individuals, seems critical for the model's success. It also seems advantageous to consider individual differences in parameters of probability weighting as being sampled from weakly informative prior distributions of individual parameter values. Thus, the results from hierarchical Bayesian modeling converge with previous results in revealing that probability weighting parameters show considerable task dependency and individual differences. Methodologically, this work exemplifies the usefulness of hierarchical Bayesian modeling techniques for cognitive psychology. Theoretically, human probabilistic inference might be best described as the application of individualized strategic policies for Bayesian belief revision. PMID:27303323
Abanto-Valle, C. A.; Bandyopadhyay, D.; Lachos, V. H.; Enriquez, I.
2009-01-01
A Bayesian analysis of stochastic volatility (SV) models using the class of symmetric scale mixtures of normal (SMN) distributions is considered. In the face of non-normality, this provides an appealing robust alternative to the routine use of the normal distribution. Specific distributions examined include the normal, Student-t, slash and the variance gamma distributions. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced for parameter estimation. Moreover, the mixing parameters obtained as a by-product of the scale mixture representation can be used to identify outliers. The methods developed are applied to analyze daily stock returns data on the S&P500 index. Bayesian model selection criteria as well as out-of-sample forecasting results reveal that the SV models based on heavy-tailed SMN distributions provide significant improvement in model fit as well as prediction to the S&P500 index data over the usual normal model. PMID:20730043
NASA Astrophysics Data System (ADS)
Schöniger, Anneli; Wöhling, Thomas; Samaniego, Luis; Nowak, Wolfgang
2014-12-01
Bayesian model selection or averaging objectively ranks a number of plausible, competing conceptual models based on Bayes' theorem. It implicitly performs an optimal trade-off between performance in fitting available data and minimum model complexity. The procedure requires determining Bayesian model evidence (BME), which is the likelihood of the observed data integrated over each model's parameter space. The computation of this integral is highly challenging because it is as high-dimensional as the number of model parameters. Three classes of techniques to compute BME are available, each with its own challenges and limitations: (1) Exact and fast analytical solutions are limited by strong assumptions. (2) Numerical evaluation quickly becomes unfeasible for expensive models. (3) Approximations known as information criteria (ICs) such as the AIC, BIC, or KIC (Akaike, Bayesian, or Kashyap information criterion, respectively) yield contradicting results with regard to model ranking. Our study features a theory-based intercomparison of these techniques. We further assess their accuracy in a simplistic synthetic example where for some scenarios an exact analytical solution exists. In more challenging scenarios, we use a brute-force Monte Carlo integration method as reference. We continue this analysis with a real-world application of hydrological model selection. This is a first-time benchmarking of the various methods for BME evaluation against true solutions. Results show that BME values from ICs are often heavily biased and that the choice of approximation method substantially influences the accuracy of model ranking. For reliable model selection, bias-free numerical methods should be preferred over ICs whenever computationally feasible.
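The brute-force Monte Carlo route to BME can be sketched on a toy problem where an exact answer exists. The example assumes a normal likelihood with known `sigma` and a zero-mean normal prior with standard deviation `tau` on a single mean parameter; this model, the function names, and the data values are illustrative assumptions and not the study's test case.

```python
import math
import random


def log_likelihood(theta, data, sigma):
    """Log-likelihood of i.i.d. normal observations with mean theta."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - 0.5 * ((y - theta) / sigma) ** 2 for y in data)


def bme_monte_carlo(data, sigma, tau, n_samples=100_000, seed=0):
    """Average the likelihood over draws from the prior N(0, tau^2)."""
    rng = random.Random(seed)
    log_terms = [log_likelihood(rng.gauss(0.0, tau), data, sigma)
                 for _ in range(n_samples)]
    m = max(log_terms)  # log-sum-exp shift for numerical stability
    return math.exp(m) * sum(math.exp(t - m) for t in log_terms) / n_samples


def bme_exact(data, sigma, tau):
    """Closed-form evidence: the data are jointly N(0, sigma^2 I + tau^2 J)."""
    n = len(data)
    s = sum(data)
    ss = sum(y * y for y in data)
    log_det = (n - 1) * math.log(sigma ** 2) + math.log(sigma ** 2 + n * tau ** 2)
    quad = (ss - tau ** 2 * s ** 2 / (sigma ** 2 + n * tau ** 2)) / sigma ** 2
    return math.exp(-0.5 * (n * math.log(2 * math.pi) + log_det + quad))


if __name__ == "__main__":
    observations = [0.5, -0.2, 1.1]  # hypothetical data
    print("Monte Carlo:", bme_monte_carlo(observations, sigma=1.0, tau=2.0))
    print("Exact:      ", bme_exact(observations, sigma=1.0, tau=2.0))
```

Sampling from the prior is simple but becomes inefficient when the likelihood is sharply peaked relative to the prior, which is one reason the cost grows quickly for expensive or high-dimensional models.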
Schöniger, Anneli; Wöhling, Thomas; Samaniego, Luis; Nowak, Wolfgang
2014-01-01
Bayesian model selection or averaging objectively ranks a number of plausible, competing conceptual models based on Bayes' theorem. It implicitly performs an optimal trade-off between performance in fitting available data and minimum model complexity. The procedure requires determining Bayesian model evidence (BME), which is the likelihood of the observed data integrated over each model's parameter space. The computation of this integral is highly challenging because it is as high-dimensional as the number of model parameters. Three classes of techniques to compute BME are available, each with its own challenges and limitations: (1) Exact and fast analytical solutions are limited by strong assumptions. (2) Numerical evaluation quickly becomes unfeasible for expensive models. (3) Approximations known as information criteria (ICs) such as the AIC, BIC, or KIC (Akaike, Bayesian, or Kashyap information criterion, respectively) yield contradicting results with regard to model ranking. Our study features a theory-based intercomparison of these techniques. We further assess their accuracy in a simplistic synthetic example where for some scenarios an exact analytical solution exists. In more challenging scenarios, we use a brute-force Monte Carlo integration method as reference. We continue this analysis with a real-world application of hydrological model selection. This is a first-time benchmarking of the various methods for BME evaluation against true solutions. Results show that BME values from ICs are often heavily biased and that the choice of approximation method substantially influences the accuracy of model ranking. For reliable model selection, bias-free numerical methods should be preferred over ICs whenever computationally feasible. PMID:25745272
Bayesian shared frailty models for regional inference about wildlife survival
Heisey, D.M.
2012-01-01
One can joke that 'exciting statistics' is an oxymoron, but it is neither a joke nor an exaggeration to say that these are exciting times to be involved in statistical ecology. As Halstead et al.'s (2012) paper nicely exemplifies, recently developed Bayesian analyses can now be used to extract insights from data using techniques that would have been unavailable to the ecological researcher just a decade ago. Some object to this, implying that the subjective priors of the Bayesian approach are the pathway to perdition (e.g. Lele & Dennis, 2009). It is reasonable to ask whether these new approaches are really giving us anything that we could not obtain with traditional tried-and-true frequentist approaches. I believe the answer is a clear yes.
Truth, models, model sets, AIC, and multimodel inference: a Bayesian perspective
Barker, Richard J.; Link, William A.
2015-01-01
Statistical inference begins with viewing data as realizations of stochastic processes. Mathematical models provide partial descriptions of these processes; inference is the process of using the data to obtain a more complete description of the stochastic processes. Wildlife and ecological scientists have become increasingly concerned with the conditional nature of model-based inference: what if the model is wrong? Over the last 2 decades, Akaike's Information Criterion (AIC) has been widely and increasingly used in wildlife statistics for 2 related purposes, first for model choice and second to quantify model uncertainty. We argue that for the second of these purposes, the Bayesian paradigm provides the natural framework for describing uncertainty associated with model choice and provides the most easily communicated basis for model weighting. Moreover, Bayesian arguments provide the sole justification for interpreting model weights (including AIC weights) as coherent (mathematically self consistent) model probabilities. This interpretation requires treating the model as an exact description of the data-generating mechanism. We discuss the implications of this assumption, and conclude that more emphasis is needed on model checking to provide confidence in the quality of inference.
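For readers unfamiliar with the AIC weights mentioned in the abstract above, the following sketch shows the standard calculation: each model's AIC difference from the best model is turned into a relative likelihood and normalized. The function name and the AIC values are hypothetical and serve only as an illustration.

```python
import numpy as np

def akaike_weights(aic_values):
    """Convert AIC values into normalized Akaike weights."""
    aic = np.asarray(aic_values, dtype=float)
    delta = aic - aic.min()              # AIC differences relative to the best model
    rel_likelihood = np.exp(-0.5 * delta)
    return rel_likelihood / rel_likelihood.sum()

# Hypothetical AIC values for three candidate models.
aic_values = [102.3, 100.0, 107.9]
weights = akaike_weights(aic_values)
print(weights)
```

As the abstract argues, reading these weights as model probabilities rests on a Bayesian interpretation and on the assumption that the model set contains an exact description of the data-generating mechanism.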
Bayesian Comparison of Alternative Graded Response Models for Performance Assessment Applications
ERIC Educational Resources Information Center
Zhu, Xiaowen; Stone, Clement A.
2012-01-01
This study examined the relative effectiveness of Bayesian model comparison methods in selecting an appropriate graded response (GR) model for performance assessment applications. Three popular methods were considered: deviance information criterion (DIC), conditional predictive ordinate (CPO), and posterior predictive model checking (PPMC). Using…
ERIC Educational Resources Information Center
Natesan, Prathiba; Limbers, Christine; Varni, James W.
2010-01-01
The present study presents the formulation of graded response models in the multilevel framework (as nonlinear mixed models) and demonstrates their use in estimating item parameters and investigating the group-level effects for specific covariates using Bayesian estimation. The graded response multilevel model (GRMM) combines the formulation of…
Fraczek, W; Bytnerowicz, A; Arbaugh, M J
2001-12-07
Models of O3 distribution in two mountain ranges, the Carpathians in Central Europe and the Sierra Nevada in California, were constructed with the ArcGIS Geostatistical Analyst extension (ESRI, Redlands, CA) using kriging and cokriging methods. The adequacy of the spatially interpolated ozone (O3) concentrations and sample size requirements for ozone passive samplers were also examined. In the case of the Carpathian Mountains, only a general surface of O3 distribution could be obtained, partially due to a weak correlation between O3 concentration and elevation, and partially due to small numbers of unevenly distributed sample sites. In the Sierra Nevada Mountains, the O3 monitoring network was much denser and more evenly distributed, and additional climatologic information was available. As a result, the estimated surfaces were more precise and reliable than those created for the Carpathians. The final maps of O3 concentrations for the Sierra Nevada were derived from a cokriging algorithm based on two secondary variables, elevation and maximum temperature, as well as the determined geographic trend. Evenly distributed and sufficient numbers of sample points are a key factor for model accuracy and reliability.
NASA Astrophysics Data System (ADS)
Werner, Johannes; Tingley, Martin
2015-04-01
Reconstructions of late-Holocene climate rely heavily upon proxies that are assumed to be accurately dated by layer counting, such as measurement on tree rings, ice cores, and varved lake sediments. Considerable advances may be achievable if time uncertain proxies could be included within these multiproxy reconstructions, and if time uncertainties were recognized and correctly modeled for proxies commonly treated as free of age model errors. Current approaches to accounting for time uncertainty are generally limited to repeating the reconstruction using each of an ensemble of age models, thereby inflating the final estimated uncertainty - in effect, each possible age model is given equal weighting. Uncertainties can be reduced by exploiting the inferred space-time covariance structure of the climate to re-weight the possible age models. Here we demonstrate how Bayesian Hierarchical climate reconstruction models can be augmented to account for time uncertain proxies. Critically, while a priori all age models are given equal probability of being correct, the probabilities associated with the age models are formally updated within the Bayesian framework, thereby reducing uncertainties. Numerical experiments show that updating the age model probabilities decreases uncertainty in the climate reconstruction, as compared with the current de-facto standard of sampling over all age models, provided there is sufficient information from other data sources in the region of the time-uncertain proxy. This approach can readily be generalized to non-layer counted proxies, such as those derived from marine sediments. Werner and Tingley, Climate of the Past Discussions (2014)
NASA Astrophysics Data System (ADS)
Werner, J. P.; Tingley, M. P.
2015-03-01
Reconstructions of the late-Holocene climate rely heavily upon proxies that are assumed to be accurately dated by layer counting, such as measurements of tree rings, ice cores, and varved lake sediments. Considerable advances could be achieved if time-uncertain proxies were able to be included within these multiproxy reconstructions, and if time uncertainties were recognized and correctly modeled for proxies commonly treated as free of age model errors. Current approaches for accounting for time uncertainty are generally limited to repeating the reconstruction using each one of an ensemble of age models, thereby inflating the final estimated uncertainty - in effect, each possible age model is given equal weighting. Uncertainties can be reduced by exploiting the inferred space-time covariance structure of the climate to re-weight the possible age models. Here, we demonstrate how Bayesian hierarchical climate reconstruction models can be augmented to account for time-uncertain proxies. Critically, although a priori all age models are given equal probability of being correct, the probabilities associated with the age models are formally updated within the Bayesian framework, thereby reducing uncertainties. Numerical experiments show that updating the age model probabilities decreases uncertainty in the resulting reconstructions, as compared with the current de facto standard of sampling over all age models, provided there is sufficient information from other data sources in the spatial region of the time-uncertain proxy. This approach can readily be generalized to non-layer-counted proxies, such as those derived from marine sediments.
NASA Astrophysics Data System (ADS)
Werner, J. P.; Tingley, M. P.
2014-12-01
Reconstructions of late-Holocene climate rely heavily upon proxies that are assumed to be accurately dated by layer counting, such as measurement on tree rings, ice cores, and varved lake sediments. Considerable advances may be achievable if time uncertain proxies could be included within these multiproxy reconstructions, and if time uncertainties were recognized and correctly modeled for proxies commonly treated as free of age model errors. Current approaches to accounting for time uncertainty are generally limited to repeating the reconstruction using each of an ensemble of age models, thereby inflating the final estimated uncertainty - in effect, each possible age model is given equal weighting. Uncertainties can be reduced by exploiting the inferred space-time covariance structure of the climate to re-weight the possible age models. Here we demonstrate how Bayesian Hierarchical climate reconstruction models can be augmented to account for time uncertain proxies. Critically, while a priori all age models are given equal probability of being correct, the probabilities associated with the age models are formally updated within the Bayesian framework, thereby reducing uncertainties. Numerical experiments show that updating the age-model probabilities decreases uncertainty in the climate reconstruction, as compared with the current de-facto standard of sampling over all age models, provided there is sufficient information from other data sources in the region of the time-uncertain proxy. This approach can readily be generalized to non-layer counted proxies, such as those derived from marine sediments.
D. L. Kelly
2007-06-01
Markov chain Monte Carlo (MCMC) techniques represent an extremely flexible and powerful approach to Bayesian modeling. This work illustrates the application of such techniques to time-dependent reliability of components with repair. The WinBUGS package is used to illustrate, via examples, how Bayesian techniques can be used for parametric statistical modeling of time-dependent component reliability. Additionally, the crucial, but often overlooked subject of model validation is discussed, and summary statistics for judging the model’s ability to replicate the observed data are developed, based on the posterior predictive distribution for the parameters of interest.
Bayesian model selection for a finite element model of a large civil aircraft
Hemez, F. M.; Rutherford, A. C.
2004-01-01
Nine aircraft stiffness parameters have been varied and used as inputs to a finite element model of an aircraft to generate natural frequency and deflection features (Goge, 2003). This data set (147 input parameter configurations and associated outputs) is now used to generate a metamodel, or a fast running surrogate model, using Bayesian model selection methods. Once a forward relationship is defined, the metamodel may be used in an inverse sense. That is, knowing the measured output frequencies and deflections, what were the input stiffness parameters that caused them?
Tang, An-Min; Tang, Nian-Sheng
2015-02-28
We propose a semiparametric multivariate skew-normal joint model for multivariate longitudinal and multivariate survival data. One main feature of the posited model is that we relax the commonly used normality assumption for random effects and within-subject error by using a centered Dirichlet process prior to specify the random effects distribution and using a multivariate skew-normal distribution to specify the within-subject error distribution and model trajectory functions of longitudinal responses semiparametrically. A Bayesian approach is proposed to simultaneously obtain Bayesian estimates of unknown parameters, random effects and nonparametric functions by combining the Gibbs sampler and the Metropolis-Hastings algorithm. Particularly, a Bayesian local influence approach is developed to assess the effect of minor perturbations to within-subject measurement error and random effects. Several simulation studies and an example are presented to illustrate the proposed methodologies.
Simplifying Probability Elicitation and Uncertainty Modeling in Bayesian Networks
Paulson, Patrick R; Carroll, Thomas E; Sivaraman, Chitra; Neorr, Peter A; Unwin, Stephen D; Hossain, Shamina S
2011-04-16
In this paper we contribute two methods that simplify the demands of knowledge elicitation for particular types of Bayesian networks. The first method simplifies the task of providing probabilities when the states that a random variable takes can be described by a new, fully ordered state set in which a state implies all the preceding states. The second method leverages Dempster-Shafer theory of evidence to provide a way for the expert to express the degree of ignorance that they feel about the estimates being provided.
Bayesian conditional-independence modeling of the AIDS epidemic in England and Wales
NASA Astrophysics Data System (ADS)
Gilks, Walter R.; De Angelis, Daniela; Day, Nicholas E.
We describe the use of conditional-independence modeling, Bayesian inference and Markov chain Monte Carlo, to model and project the HIV-AIDS epidemic in homosexual/bisexual males in England and Wales. Complexity in this analysis arises through selectively missing data, indirectly observed underlying processes, and measurement error. Our emphasis is on presentation and discussion of the concepts, not on the technicalities of this analysis, which can be found elsewhere [D. De Angelis, W.R. Gilks, N.E. Day, Bayesian projection of the acquired immune deficiency syndrome epidemic (with discussion), Applied Statistics, in press].
Parameter Expanded Algorithms for Bayesian Latent Variable Modeling of Genetic Pleiotropy Data.
Xu, Lizhen; Craiu, Radu V; Sun, Lei; Paterson, Andrew D
2016-01-01
Motivated by genetic association studies of pleiotropy, we propose a Bayesian latent variable approach to jointly study multiple outcomes. The models studied here can incorporate both continuous and binary responses, and can account for serial and cluster correlations. We consider Bayesian estimation for the model parameters, and we develop a novel MCMC algorithm that builds upon hierarchical centering and parameter expansion techniques to efficiently sample from the posterior distribution. We evaluate the proposed method via extensive simulations and demonstrate its utility with an application to an association study of various complication outcomes related to type 1 diabetes. This article has supplementary material online.
Bayesian non-parametric inference for stochastic epidemic models using Gaussian Processes
Xu, Xiaoguang; Kypraios, Theodore; O'Neill, Philip D.
2016-01-01
This paper considers novel Bayesian non-parametric methods for stochastic epidemic models. Many standard modeling and data analysis methods use underlying assumptions (e.g. concerning the rate at which new cases of disease will occur) which are rarely challenged or tested in practice. To relax these assumptions, we develop a Bayesian non-parametric approach using Gaussian Processes, specifically to estimate the infection process. The methods are illustrated with both simulated and real data sets, the former illustrating that the methods can recover the true infection process quite well in practice, and the latter illustrating that the methods can be successfully applied in different settings. PMID:26993062
Model Diagnostics for Bayesian Networks. Research Report. ETS RR-04-17
ERIC Educational Resources Information Center
Sinharay, Sandip
2004-01-01
Assessing fit of psychometric models has always been an issue of enormous interest, but there exists no unanimously agreed upon item fit diagnostic for the models. Bayesian networks, frequently used in educational assessments (see, for example, Mislevy, Almond, Yan, & Steinberg, 2001) primarily for learning about students' knowledge and…
A Bayesian Multi-Level Factor Analytic Model of Consumer Price Sensitivities across Categories
ERIC Educational Resources Information Center
Duvvuri, Sri Devi; Gruca, Thomas S.
2010-01-01
Identifying price sensitive consumers is an important problem in marketing. We develop a Bayesian multi-level factor analytic model of the covariation among household-level price sensitivities across product categories that are substitutes. Based on a multivariate probit model of category incidence, this framework also allows the researcher to…
ERIC Educational Resources Information Center
Kessler, Lawrence M.
2013-01-01
In this paper I propose Bayesian estimation of a nonlinear panel data model with a fractional dependent variable (bounded between 0 and 1). Specifically, I estimate a panel data fractional probit model which takes into account the bounded nature of the fractional response variable. I outline estimation under the assumption of strict exogeneity as…
A Test of Bayesian Observer Models of Processing in the Eriksen Flanker Task
ERIC Educational Resources Information Center
White, Corey N.; Brown, Scott; Ratcliff, Roger
2012-01-01
Two Bayesian observer models were recently proposed to account for data from the Eriksen flanker task, in which flanking items interfere with processing of a central target. One model assumes that interference stems from a perceptual bias to process nearby items as if they are compatible, and the other assumes that the interference is due to…
ERIC Educational Resources Information Center
Story, Roger E.
1996-01-01
Discussion of the use of Latent Semantic Indexing to determine relevancy in information retrieval focuses on statistical regression and Bayesian methods. Topics include keyword searching; a multiple regression model; how the regression model can aid search methods; and limitations of this approach, including complexity, linearity, and…
Baird, Stuart J E; Santos, Filipe
2010-09-01
Approximate Bayesian computation (ABC) substitutes simulation for analytic models in Bayesian inference. Simulating evolutionary scenarios under Kimura's stepping stone model (KSS) might therefore allow inference over spatial genetic process where analytical results are difficult to obtain. ABC first creates a reference set of simulations and would proceed by comparing summary statistics over KSS simulations to summary statistics from localities sampled in the field, but: comparison of which localities and stepping stones? Identical stepping stones can be arranged so two localities fall in the same stepping stone, nearest or diagonal neighbours, or without contact. None is intrinsically correct, yet some choice must be made and this affects inference. We explore a Bayesian strategy for mapping field observations onto discrete stepping stones. We make Sundial, for projecting field data onto the plane, available. We generalize KSS over regular tilings of the plane. We show Bayesian averaging over the mapping between a continuous field area and discrete stepping stones improves the fit between KSS and isolation by distance expectations. We make Tiler Durden available for carrying out this Bayesian averaging. We describe a novel parameterization of KSS based on Wright's neighbourhood size, placing an upper bound on the geographic area represented by a stepping stone and make it available as m Vector. We generalize spatial coalescence recursions to continuous and discrete space cases and use these to numerically solve for KSS coalescence previously examined only using simulation. We thus provide applied and analytical resources for comparison of stepping stone simulations with field observations.
Iglesias, Juan Eugenio; Sabuncu, Mert Rory; Van Leemput, Koen
2012-01-01
Many successful segmentation algorithms are based on Bayesian models in which prior anatomical knowledge is combined with the available image information. However, these methods typically have many free parameters that are estimated to obtain point estimates only, whereas a faithful Bayesian analysis would also consider all possible alternate values these parameters may take. In this paper, we propose to incorporate the uncertainty of the free parameters in Bayesian segmentation models more accurately by using Monte Carlo sampling. We demonstrate our technique by sampling atlas warps in a recent method for hippocampal subfield segmentation, and show a significant improvement in an Alzheimer's disease classification task. As an additional benefit, the method also yields informative "error bars" on the segmentation results for each of the individual sub-structures.
NASA Astrophysics Data System (ADS)
Xu, Tianfang; Valocchi, Albert J.
2015-11-01
Numerical groundwater flow and solute transport models are usually subject to model structural error due to simplification and/or misrepresentation of the real system, which raises questions regarding the suitability of conventional least squares regression-based (LSR) calibration. We present a new framework that explicitly describes the model structural error statistically in an inductive, data-driven way. We adopt a fully Bayesian approach that integrates Gaussian process error models into the calibration, prediction, and uncertainty analysis of groundwater flow models. We test the usefulness of the fully Bayesian approach with a synthetic case study of the impact of pumping on surface-ground water interaction. We illustrate through this example that the Bayesian parameter posterior distributions differ significantly from parameters estimated by conventional LSR, which does not account for model structural error. For the latter method, parameter compensation for model structural error leads to biased, overconfident prediction under changing pumping conditions. In contrast, integrating Gaussian process error models significantly reduces predictive bias and leads to prediction intervals that are more consistent with validation data. Finally, we carry out a generalized LSR recalibration step to assimilate the Bayesian prediction while preserving mass conservation and other physical constraints, using a full error covariance matrix obtained from Bayesian results. It is found that the recalibrated model achieved lower predictive bias compared to the model calibrated using conventional LSR. The results highlight the importance of explicit treatment of model structural error, especially in circumstances where subsequent decision-making and risk analysis require accurate prediction and uncertainty quantification.
Assessment of uncertainty in chemical models by Bayesian probabilities: Why, when, how?
NASA Astrophysics Data System (ADS)
Sahlin, Ullrika
2015-07-01
A prediction of a chemical property or activity is subject to uncertainty. Which type of uncertainties to consider, whether to account for them in a differentiated manner and with which methods, depends on the practical context. In chemical modelling, general guidance of the assessment of uncertainty is hindered by the high variety in underlying modelling algorithms, high-dimensionality problems, the acknowledgement of both qualitative and quantitative dimensions of uncertainty, and the fact that statistics offers alternative principles for uncertainty quantification. Here, a view of the assessment of uncertainty in predictions is presented with the aim to overcome these issues. The assessment sets out to quantify uncertainty representing error in predictions and is based on probability modelling of errors where uncertainty is measured by Bayesian probabilities. Even though well motivated, the choice to use Bayesian probabilities is a challenge to statistics and chemical modelling. Fully Bayesian modelling, Bayesian meta-modelling and bootstrapping are discussed as possible approaches. Deciding how to assess uncertainty is an active choice, and should not be constrained by traditions or lack of validated and reliable ways of doing it.
Automated parameter estimation for biological models using Bayesian statistical model checking
2015-01-01
Background: Probabilistic models have gained widespread acceptance in the systems biology community as a useful way to represent complex biological systems. Such models are developed using existing knowledge of the structure and dynamics of the system, experimental observations, and inferences drawn from statistical analysis of empirical data. A key bottleneck in building such models is that some system variables cannot be measured experimentally. These variables are incorporated into the model as numerical parameters. Determining values of these parameters that justify existing experiments and provide reliable predictions when model simulations are performed is a key research problem. Domain experts usually estimate the values of these parameters by fitting the model to experimental data. Model fitting is usually expressed as an optimization problem that requires minimizing a cost-function which measures some notion of distance between the model and the data. This optimization problem is often solved by combining local and global search methods that tend to perform well for the specific application domain. When some prior information about parameters is available, methods such as Bayesian inference are commonly used for parameter learning. Choosing the appropriate parameter search technique requires detailed domain knowledge and insight into the underlying system. Results: Using an agent-based model of the dynamics of acute inflammation, we demonstrate a novel parameter estimation algorithm by discovering the amount and schedule of doses of bacterial lipopolysaccharide that guarantee a set of observed clinical outcomes with high probability. We synthesized values of twenty-eight unknown parameters such that the parameterized model instantiated with these parameter values satisfies four specifications describing the dynamic behavior of the model. Conclusions: We have developed a new algorithmic technique for discovering parameters in complex stochastic models of
NASA Astrophysics Data System (ADS)
Pham, Hai V.; Tsai, Frank T.-C.
2015-09-01
The lack of hydrogeological data and knowledge often results in different propositions (or alternatives) to represent uncertain model components and creates many candidate groundwater models using the same data. Uncertainty of groundwater head prediction may become unnecessarily high. This study introduces an experimental design to identify propositions in each uncertain model component and decrease the prediction uncertainty by reducing conceptual model uncertainty. A discrimination criterion is developed based on posterior model probability that directly uses data to evaluate model importance. Bayesian model averaging (BMA) is used to predict future observation data. The experimental design aims to find the optimal number and location of future observations and the number of sampling rounds such that the desired discrimination criterion is met. Hierarchical Bayesian model averaging (HBMA) is adopted to assess whether highly probable propositions can be identified and the conceptual model uncertainty can be reduced by the experimental design. The experimental design is applied to a groundwater study in the Baton Rouge area, Louisiana. We design a new groundwater head observation network based on existing USGS observation wells. The sources of uncertainty that create multiple groundwater models are geological architecture, boundary condition, and fault permeability architecture. All possible design solutions are enumerated using a multi-core supercomputer. Several design solutions are found to achieve an 80%-identifiable groundwater model in 5 years by using six or more existing USGS wells. The HBMA result shows that each highly probable proposition can be identified for each uncertain model component once the discrimination criterion is achieved. The variances of groundwater head predictions are significantly decreased by reducing posterior model probabilities of unimportant propositions.
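The role of posterior model probabilities in the BMA prediction variance can be illustrated with the generic BMA formulas: the averaged mean is the probability-weighted mean of the model predictions, and the total variance splits into a within-model and a between-model part. The sketch below is a plain BMA calculation, not the hierarchical (HBMA) procedure of the study, and every number in it is hypothetical.

```python
import numpy as np

def posterior_model_probs(log_evidences, prior_probs=None):
    """Posterior model probabilities from log evidences and prior model probabilities."""
    log_ev = np.asarray(log_evidences, dtype=float)
    if prior_probs is None:
        prior = np.full(log_ev.size, 1.0 / log_ev.size)  # equal prior weights
    else:
        prior = np.asarray(prior_probs, dtype=float)
    log_post = log_ev + np.log(prior)
    log_post -= log_post.max()  # shift for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def bma_mean_and_variance(probs, means, variances):
    """Return the BMA mean, the within-model variance and the between-model variance."""
    probs = np.asarray(probs, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mean = np.sum(probs * means)
    within = np.sum(probs * variances)
    between = np.sum(probs * (means - mean) ** 2)
    return mean, within, between

# Hypothetical log evidences and head predictions (m) from three candidate models.
probs = posterior_model_probs([-52.0, -50.0, -55.0])
mean, within, between = bma_mean_and_variance(
    probs, means=[10.2, 11.0, 13.5], variances=[0.4, 0.5, 0.6]
)
print(probs, mean, within + between)
```

When the probability mass concentrates on one model, the between-model term shrinks toward zero, which is the mechanism behind the reduced prediction variance reported in the abstract.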
Bayesian-MCMC-based parameter estimation of stealth aircraft RCS models
NASA Astrophysics Data System (ADS)
Xia, Wei; Dai, Xiao-Xia; Feng, Yuan
2015-12-01
When modeling a stealth aircraft with low RCS (Radar Cross Section), conventional parameter estimation methods may cause a deviation from the actual distribution, owing to the fact that the characteristic parameters are estimated via directly calculating the statistics of RCS. The Bayesian-Markov Chain Monte Carlo (Bayesian-MCMC) method is introduced herein to estimate the parameters so as to improve the fitting accuracies of fluctuation models. The parameter estimations of the lognormal and the Legendre polynomial models are reformulated in the Bayesian framework. The MCMC algorithm is then adopted to calculate the parameter estimates. Numerical results show that the distribution curves obtained by the proposed method exhibit improved consistency with the actual ones, compared with those fitted by the conventional method. The fitting accuracy could be improved by no less than 25% for both fluctuation models, which implies that the Bayesian-MCMC method might be a good candidate among the optimal parameter estimation methods for stealth aircraft RCS models. Project supported by the National Natural Science Foundation of China (Grant No. 61101173), the National Basic Research Program of China (Grant No. 613206), the National High Technology Research and Development Program of China (Grant No. 2012AA01A308), the State Scholarship Fund by the China Scholarship Council (CSC), and the Oversea Academic Training Funds, and University of Electronic Science and Technology of China (UESTC).
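To make the idea of MCMC-based parameter estimation for a lognormal model concrete, here is a minimal random-walk Metropolis sketch in Python. It is not the authors' algorithm: the flat priors on the log-mean and on log(sigma), the step size, the chain length, and the synthetic data standing in for RCS samples are all assumptions chosen for illustration.

```python
import numpy as np

def log_posterior(params, log_x):
    """Unnormalized log posterior for a lognormal model.

    params = (mu, log_sigma); flat priors on both are an assumption of this sketch.
    Terms that do not depend on the parameters are dropped.
    """
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    z = (log_x - mu) / sigma
    return -log_x.size * log_sigma - 0.5 * np.sum(z**2)

def metropolis(log_post, start, step, n_iter, rng):
    """Random-walk Metropolis sampler with an isotropic Gaussian proposal."""
    current = np.asarray(start, dtype=float)
    current_lp = log_post(current)
    chain = np.empty((n_iter, current.size))
    for i in range(n_iter):
        proposal = current + step * rng.normal(size=current.size)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

# Hypothetical synthetic samples drawn from a lognormal distribution.
rng = np.random.default_rng(2)
x = rng.lognormal(mean=-2.0, sigma=0.5, size=200)
log_x = np.log(x)

chain = metropolis(lambda p: log_posterior(p, log_x), start=[0.0, 0.0],
                   step=0.05, n_iter=20_000, rng=rng)
kept = chain[5_000:]                   # discard burn-in
mu_est = kept[:, 0].mean()
sigma_est = np.exp(kept[:, 1]).mean()
print(mu_est, sigma_est)
```

The posterior samples give interval estimates as well as point estimates, which direct moment-based estimation from the sample statistics does not provide.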
A Bayesian approach to the semi-analytic model of galaxy formation
NASA Astrophysics Data System (ADS)
Lu, Yu
It is believed that a wide range of physical processes conspire to shape the observed galaxy population, but their detailed interactions remain uncertain. The semi-analytic model (SAM) of galaxy formation uses multi-dimensional parameterizations of the physical processes of galaxy formation and provides a tool to constrain these underlying physical interactions. Because of the high dimensionality and large uncertainties in the model, the parametric problem of galaxy formation can be profitably tackled with a Bayesian-inference based approach, which allows one to constrain theory with data in a statistically rigorous way. In this thesis, I present a newly developed method to build SAM upon the framework of Bayesian inference. I show that, aided by advanced Markov-Chain Monte-Carlo algorithms, the method has the power to efficiently combine information from diverse data sources, rigorously establish confidence bounds on model parameters, and provide powerful probability-based methods for hypothesis testing. Using various data sets (stellar mass function, conditional stellar mass function, K-band luminosity function, and cold gas mass functions) of galaxies in the local Universe, I carry out a series of Bayesian model inferences. First, the results show that SAM contains huge degeneracies among its parameters, indicating that some of the conclusions drawn previously with the conventional approach may not be truly valid and need to be revisited with the Bayesian approach. Second, some of the degeneracy of the model can be broken by adopting multiple data sets that constrain different aspects of the galaxy population. Third, the inferences reveal that the model has difficulty simultaneously explaining some important observational results, suggesting that some key physics governing the evolution of star formation and feedback may still be missing from the model. These analyses show clearly that the Bayesian inference based SAM can be used to perform systematic and statistically
Likelihood-free Bayesian computation for structural model calibration: a feasibility study
NASA Astrophysics Data System (ADS)
Jin, Seung-Seop; Jung, Hyung-Jo
2016-04-01
Finite element (FE) model updating is often used to associate FE models with corresponding existing structures for condition assessment. FE model updating is an inverse problem and is prone to being ill-posed and ill-conditioned when there are many errors and uncertainties in both an FE model and its corresponding measurements. In this case, it is important to quantify these uncertainties properly. Bayesian FE model updating is one of the well-known methods to quantify parameter uncertainty by updating our prior belief on the parameters with the available measurements. In Bayesian inference, the likelihood plays a central role in summarizing the overall residuals between model predictions and corresponding measurements. Therefore, the likelihood should be carefully chosen to reflect the characteristics of the residuals. It is generally known that very little or no information is available regarding the statistical characteristics of the residuals. In most cases, the residuals are assumed to be independent and identically distributed Gaussian with zero mean and constant variance. However, this assumption may cause biased and over- or underestimated parameter estimates, so that the uncertainty quantification and prediction are questionable. To alleviate the potential misuse of an inadequate likelihood, this study introduced approximate Bayesian computation (i.e., likelihood-free Bayesian inference), which relaxes the need for an explicit likelihood by analyzing the behavior similarities between model predictions and measurements. We performed FE model updating based on likelihood-free Markov chain Monte Carlo (MCMC) without using the likelihood. Based on the result of the numerical study, we observed that the likelihood-free Bayesian computation can quantify the updating parameters correctly, and that its predictive capability for measurements not used in calibration is also secured.
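The likelihood-free idea can be sketched with the simplest variant of approximate Bayesian computation, rejection sampling, which is swapped in here for the likelihood-free MCMC used in the study. A parameter drawn from the prior is kept only when data simulated with it resemble the measurements closely enough. The toy simulator (a Gaussian with unknown mean in place of an FE model), the summary statistic, the tolerance, and all names are assumptions made for illustration.

```python
import numpy as np

def abc_rejection(observed, simulate, summary, prior_sampler, n_draws, tolerance, rng):
    """Rejection ABC: keep prior draws whose simulated summary is close to the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        s_sim = summary(simulate(theta, rng))
        # No likelihood is evaluated; only a distance between summaries is used.
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(theta)
    return np.array(accepted)

# Hypothetical measurements generated from a toy simulator.
rng = np.random.default_rng(1)
observed = rng.normal(3.0, 1.0, size=50)

posterior = abc_rejection(
    observed,
    simulate=lambda theta, rng: rng.normal(theta, 1.0, size=50),
    summary=np.mean,
    prior_sampler=lambda rng: rng.uniform(0.0, 6.0),
    n_draws=20_000,
    tolerance=0.1,
    rng=rng,
)
print(posterior.size, posterior.mean(), posterior.std())
```

A smaller tolerance gives a closer approximation to the true posterior at the cost of fewer accepted draws, which is one reason MCMC-based variants are preferred when each simulation is expensive.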
[Evaluation of estimation of prevalence ratio using Bayesian log-binomial regression model].
Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L
2017-03-10
To evaluate the estimation of prevalence ratio (PR) by using a Bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea in their infants by using a Bayesian log-binomial regression model in OpenBUGS software. The results showed that caregivers' recognition of infants' risk signs of diarrhea was associated significantly with a 13% increase of medical care-seeking. Meanwhile, we compared the differences in the PR's point estimation and its interval estimation of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea and the convergence of three models (model 1: not adjusting for the covariates; model 2: adjusting for duration of caregivers' education; model 3: adjusting for distance between village and township and child month-age based on model 2) between the Bayesian log-binomial regression model and the conventional log-binomial regression model. The results showed that all three Bayesian log-binomial regression models converged and the estimated PRs were 1.130 (95%CI: 1.005-1.265), 1.128 (95%CI: 1.001-1.264) and 1.132 (95%CI: 1.004-1.267), respectively. Conventional log-binomial regression model 1 and model 2 converged and their PRs were 1.130 (95%CI: 1.055-1.206) and 1.126 (95%CI: 1.051-1.203), respectively, but model 3 failed to converge, so the COPY method was used to estimate the PR, which was 1.125 (95%CI: 1.051-1.200). In addition, the point estimation and interval estimation of PRs from the three Bayesian log-binomial regression models differed slightly from those of PRs from the conventional log-binomial regression model, but they had a good consistency in estimating PR. Therefore, the Bayesian log-binomial regression model can effectively estimate PR with fewer convergence failures and has more advantages in application compared with the conventional log-binomial regression model.
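As a small illustration of why a log-binomial model yields a prevalence ratio directly, the sketch below uses the log link, log P(Y=1|x) = b0 + b1*x, for a single binary covariate, where exp(b1) equals the PR. The counts are hypothetical and chosen only so that the PR comes out at 1.13; they are not the study's data, and no MCMC or OpenBUGS fitting is reproduced here.

```python
import math

def prevalence_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    # Ratio of the two observed prevalences.
    p1 = cases_exposed / n_exposed
    p0 = cases_unexposed / n_unexposed
    return p1 / p0

def log_binomial_fit(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    # With one binary covariate the model is saturated, so the maximum
    # likelihood estimates are available in closed form:
    # b0 = log(p0), b1 = log(p1) - log(p0).
    p1 = cases_exposed / n_exposed
    p0 = cases_unexposed / n_unexposed
    return math.log(p0), math.log(p1) - math.log(p0)

# Hypothetical counts: 113/200 care-seeking among caregivers who recognise
# risk signs, 100/200 among those who do not.
b0, b1 = log_binomial_fit(113, 200, 100, 200)
pr = math.exp(b1)
print(round(pr, 3))
```

With additional covariates no closed form exists and the constraint that fitted probabilities stay below 1 can make conventional fitting fail, which is the convergence problem the abstract refers to.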
Bayesian Modeling of Population Variability -- Practical Guidance and Pitfalls
Dana L. Kelly; Corwin L. Atwood
2008-05-01
With the advent of easy-to-use open-source software for Markov chain Monte Carlo (MCMC) simulation, hierarchical Bayesian analysis is gaining in popularity. This paper presents practical guidance for hierarchical Bayes analysis of typical problems in probabilistic safety assessment (PSA). The guidance is related to choosing parameterizations that accelerate convergence of the MCMC sampling and to illustrating the potential sensitivity of the results to the functional form chosen for the first-stage prior. This latter issue has significant ramifications because the mean of the average population variability curve (PVC) from hierarchical Bayes (or the mean of the point estimate distribution from empirical Bayes) can be very sensitive to this choice in cases where variability is large. Numerical examples are provided to illustrate the issues discussed.
Using Bayesian Model Selection to Characterize Neonatal EEG Recordings
NASA Astrophysics Data System (ADS)
Mitchell, Timothy J.
2009-12-01
The brains of premature infants must undergo significant maturation outside of the womb and are thus particularly susceptible to injury. Electroencephalographic (EEG) recordings are an important diagnostic tool in determining if a newborn's brain is functioning normally or if injury has occurred. However, interpreting the recordings is difficult and requires the skills of a trained electroencephalographer. Because these EEG specialists are rare, an automated interpretation of newborn EEG recordings would increase access to an important diagnostic tool for physicians. To automate this procedure, we employ Bayesian probability theory to compute the posterior probability for the EEG features of interest and use the results in a program designed to mimic EEG specialists. Specifically, we will be identifying waveforms of varying frequency and amplitude, as well as periods of flat recordings where brain activity is minimal.
Höhna, Sebastian; Landis, Michael J; Heath, Tracy A; Boussau, Bastien; Lartillot, Nicolas; Moore, Brian R; Huelsenbeck, John P; Ronquist, Fredrik
2016-07-01
Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.].
Dynamical modelling of NGC 6809: selecting the best model using Bayesian inference
NASA Astrophysics Data System (ADS)
Diakogiannis, Foivos I.; Lewis, Geraint F.; Ibata, Rodrigo A.
2014-02-01
The precise cosmological origin of globular clusters remains uncertain, a situation hampered by the struggle of observational approaches in conclusively identifying the presence, or not, of dark matter in these systems. In this paper, we address this question through an analysis of the particular case of NGC 6809. While previous studies have performed dynamical modelling of this globular cluster using a small number of available kinematic data, they did not perform appropriate statistical inference tests for the choice of best model description; such statistical inference for model selection is important since, in general, different models can result in significantly different inferred quantities. With the latest kinematic data, we use Bayesian inference tests for model selection and thus obtain the best-fitting models, as well as mass and dynamic mass-to-light ratio estimates. For this, we introduce a new likelihood function that provides more constrained distributions for the defining parameters of dynamical models. Initially, we consider models with a known distribution function, and then model the cluster using solutions of the spherically symmetric Jeans equation; this latter approach depends upon the mass density profile and anisotropy β parameter. In order to find the best description for the cluster we compare these models by calculating their Bayesian evidence. We find smaller mass and dynamic mass-to-light ratio values than previous studies, with the best-fitting Michie model for a constant mass-to-light ratio of Υ = 0.90^{+0.14}_{-0.14} and M_{dyn} = 6.10^{+0.51}_{-0.88} × 10^4 M_{⊙}. We exclude the significant presence of dark matter throughout the cluster, showing that no physically motivated distribution of dark matter can be present away from the cluster core.
Höhna, Sebastian; Landis, Michael J.
2016-01-01
Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com. [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.] PMID:27235697
Bayesian state space models for dynamic genetic network construction across multiple tissues.
Liang, Yulan; Kelemen, Arpad
2016-08-01
Construction of gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research for complex diseases, and estimating the dynamic changes of the temporal correlations and non-stationarity is key to this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge for inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix as time-variant and include temporal correlation structures in the covariance matrix estimations in the multivariate Bayesian state space models. The unevenly spaced short time courses with unseen time points are treated as hidden state variables. Hierarchical Bayesian approaches with various prior and hyper-prior models, together with Markov chain Monte Carlo and Gibbs sampling algorithms, are used to estimate the model parameters and the hidden state variables. We apply the proposed hierarchical Bayesian state space models to multiple tissues (liver, skeletal muscle, and kidney) Affymetrix time course data sets following corticosteroid (CS) drug administration. Both simulation and real data analysis results show that the genomic changes over time and gene-gene interaction in response to CS treatment can be well captured by the proposed models. The proposed dynamic hierarchical Bayesian state space modeling approaches could be expanded and applied to other large scale genomic data, such as next generation sequence (NGS) combined with real time and time varying electronic health record (EHR) for more comprehensive and robust systematic and network based analysis in order to transform big biomedical data into predictions and diagnostics for precision medicine and personalized healthcare with better decision making and patient outcomes.
Medical Inpatient Journey Modeling and Clustering: A Bayesian Hidden Markov Model Based Approach
Huang, Zhengxing; Dong, Wei; Wang, Fei; Duan, Huilong
2015-01-01
Modeling and clustering medical inpatient journeys is useful to healthcare organizations for a number of reasons, including reorganizing inpatient journeys in a way that is more convenient for understanding and browsing. In this study, we present a probabilistic model-based approach to model and cluster medical inpatient journeys. Specifically, we exploit a Bayesian Hidden Markov Model based approach to transform medical inpatient journeys into a probabilistic space, which can be seen as a richer representation of inpatient journeys to be clustered. Then, using hierarchical clustering on the matrix of similarities, inpatient journeys can be clustered into different categories with respect to their clinical and temporal characteristics. We evaluated the proposed approach on a real clinical data set pertaining to the unstable angina treatment process. The experimental results reveal that our method can identify and model latent treatment topics underlying personalized inpatient journeys, and yield impressive clustering quality. PMID:26958200
ERIC Educational Resources Information Center
Tchumtchoua, Sylvie; Dey, Dipak K.
2012-01-01
This paper proposes a semiparametric Bayesian framework for the analysis of associations among multivariate longitudinal categorical variables in high-dimensional data settings. This type of data is frequent, especially in the social and behavioral sciences. A semiparametric hierarchical factor analysis model is developed in which the…
Hierarchical Bayesian Model (HBM) - Derived Estimates of Air Quality for 2007: Annual Report
This report describes EPA's Hierarchical Bayesian model generated (HBM) estimates of ozone (O_{3}) and fine particulate matter (PM_{2.5}, particles with aerodynamic diameter < 2.5 microns) concentrations throughout the continental United States during the 2007 calen...
The Bayesian Evaluation of Categorization Models: Comment on Wills and Pothos (2012)
ERIC Educational Resources Information Center
Vanpaemel, Wolf; Lee, Michael D.
2012-01-01
Wills and Pothos (2012) reviewed approaches to evaluating formal models of categorization, raising a series of worthwhile issues, challenges, and goals. Unfortunately, in discussing these issues and proposing solutions, Wills and Pothos (2012) did not consider Bayesian methods in any detail. This means not only that their review excludes a major…
ERIC Educational Resources Information Center
Lee, Sik-Yum; Song, Xin-Yuan; Cai, Jing-Heng
2010-01-01
Analysis of ordered binary and unordered binary data has received considerable attention in social and psychological research. This article introduces a Bayesian approach, which has several nice features in practical applications, for analyzing nonlinear structural equation models with dichotomous data. We demonstrate how to use the software…
Hierarchical Bayesian Model (HBM)-Derived Estimates of Air Quality for 2004 - Annual Report
This report describes EPA's Hierarchical Bayesian model-generated (HBM) estimates of O_{3} and PM_{2.5} concentrations throughout the continental United States during the 2004 calendar year. HBM estimates provide the spatial and temporal variance of O_{3} ...
Bayesian Structural Equation Modeling: A More Flexible Representation of Substantive Theory
ERIC Educational Resources Information Center
Muthen, Bengt; Asparouhov, Tihomir
2012-01-01
This article proposes a new approach to factor analysis and structural equation modeling using Bayesian analysis. The new approach replaces parameter specifications of exact zeros with approximate zeros based on informative, small-variance priors. It is argued that this produces an analysis that better reflects substantive theories. The proposed…
Bayesian Analysis of Structural Equation Models with Nonlinear Covariates and Latent Variables
ERIC Educational Resources Information Center
Song, Xin-Yuan; Lee, Sik-Yum
2006-01-01
In this article, we formulate a nonlinear structural equation model (SEM) that can accommodate covariates in the measurement equation and nonlinear terms of covariates and exogenous latent variables in the structural equation. The covariates can come from continuous or discrete distributions. A Bayesian approach is developed to analyze the…
A General and Flexible Approach to Estimating the Social Relations Model Using Bayesian Methods
ERIC Educational Resources Information Center
Ludtke, Oliver; Robitzsch, Alexander; Kenny, David A.; Trautwein, Ulrich
2013-01-01
The social relations model (SRM) is a conceptual, methodological, and analytical approach that is widely used to examine dyadic behaviors and interpersonal perception within groups. This article introduces a general and flexible approach to estimating the parameters of the SRM that is based on Bayesian methods using Markov chain Monte Carlo…
ERIC Educational Resources Information Center
Wang, Qiu; Diemer, Matthew A.; Maier, Kimberly S.
2013-01-01
This study integrated Bayesian hierarchical modeling and receiver operating characteristic analysis (BROCA) to evaluate how interest strength (IS) and interest differentiation (ID) predicted low–socioeconomic status (SES) youth's interest-major congruence (IMC). Using large-scale Kuder Career Search online-assessment data, this study fit three…
Bayesian Analysis for Linearized Multi-Stage Models in Quantal Bioassay.
ERIC Educational Resources Information Center
Kuo, Lynn; Cohen, Michael P.
Bayesian methods for estimating dose response curves in quantal bioassay are studied. A linearized multi-stage model is assumed for the shape of the curves. A Gibbs sampling approach with data augmentation is employed to compute the Bayes estimates. In addition, estimation of the "relative additional risk" and the "risk specific…
Pretense, Counterfactuals, and Bayesian Causal Models: Why What Is Not Real Really Matters
ERIC Educational Resources Information Center
Weisberg, Deena S.; Gopnik, Alison
2013-01-01
Young children spend a large portion of their time pretending about non-real situations. Why? We answer this question by using the framework of Bayesian causal models to argue that pretending and counterfactual reasoning engage the same component cognitive abilities: disengaging with current reality, making inferences about an alternative…
Hierarchical Bayesian Model (HBM)-Derived Estimates of Air Quality for 2002 - Annual Report
This report describes EPA's Hierarchical Bayesian model-generated (HBM) estimates of O_{3} and PM_{2.5} concentrations throughout the continental United States during the 2002 calendar year. HBM estimates provide the spatial and temporal variance of O_{3} ...
Hierarchical Bayesian Model (HBM)-Derived Estimates of Air Quality for 2001 - Annual Report
This report describes EPA's Hierarchical Bayesian model-generated (HBM) estimates of O_{3} and PM_{2.5} concentrations throughout the continental United States during the 2001 calendar year. HBM estimates provide the spatial and temporal variance of O_{3} ...
Hierarchical Bayesian Model (HBM)-Derived Estimates of Air Quality for 2003 – Annual Report
This report describes EPA's Hierarchical Bayesian model-generated (HBM) estimates of O_{3} and PM_{2.5} concentrations throughout the continental United States during the 2003 calendar year. HBM estimates provide the spatial and temporal variance of O_{3} ...
Hierarchical Bayesian Model (HBM)-Derived Estimates of Air Quality for 2005 - Annual Report
This report describes EPA's Hierarchical Bayesian model-generated (HBM) estimates of O_{3} and PM_{2.5} concentrations throughout the continental United States during the 2005 calendar year. HBM estimates provide the spatial and temporal variance of O_{3} ...
Hierarchical Bayesian Model (HBM) - Derived Estimates of Air Quality for 2008: Annual Report
This report describes EPA’s Hierarchical Bayesian model generated (HBM) estimates of ozone (O_{3}) and fine particulate matter (PM_{2.5}, particles with aerodynamic diameter < 2.5 microns) concentrations throughout the continental United States during the 2007 ca...
ERIC Educational Resources Information Center
Bekele, Rahel; McPherson, Maggie
2011-01-01
This research work presents a Bayesian Performance Prediction Model that was created in order to determine the strength of personality traits in predicting the level of mathematics performance of high school students in Addis Ababa. It is an automated tool that can be used to collect information from students for the purpose of effective group…
Hierarchical Bayesian Model (HBM)-Derived Estimates of Air Quality for 2006 - Annual Report
This report describes EPA's Hierarchical Bayesian model-generated (HBM) estimates of O_{3} and PM_{2.5} concentrations throughout the continental United States during the 2006 calendar year. HBM estimates provide the spatial and temporal variance of O_{3} ...
A Robust Bayesian Approach for Structural Equation Models with Missing Data
ERIC Educational Resources Information Center
Lee, Sik-Yum; Xia, Ye-Mao
2008-01-01
In this paper, normal/independent distributions, including but not limited to the multivariate t distribution, the multivariate contaminated distribution, and the multivariate slash distribution, are used to develop a robust Bayesian approach for analyzing structural equation models with complete or missing data. In the context of a nonlinear…
Bayesian Inference for Growth Mixture Models with Latent Class Dependent Missing Data
ERIC Educational Resources Information Center
Lu, Zhenqiu Laura; Zhang, Zhiyong; Lubke, Gitta
2011-01-01
"Growth mixture models" (GMMs) with nonignorable missing data have drawn increasing attention in research communities but have not been fully studied. The goal of this article is to propose and to evaluate a Bayesian method to estimate the GMMs with latent class dependent missing data. An extended GMM is first presented in which class…
Xu, Chengcheng; Wang, Wei; Liu, Pan; Li, Zhibin
2015-12-01
This study aimed to develop a real-time crash risk model with limited data in China by using Bayesian meta-analysis and Bayesian inference approach. A systematic review was first conducted by using three different Bayesian meta-analyses, including the fixed effect meta-analysis, the random effect meta-analysis, and the meta-regression. The meta-analyses provided a numerical summary of the effects of traffic variables on crash risks by quantitatively synthesizing results from previous studies. The random effect meta-analysis and the meta-regression produced a more conservative estimate for the effects of traffic variables compared with the fixed effect meta-analysis. Then, the meta-analyses results were used as informative priors for developing crash risk models with limited data. Three different meta-analyses significantly affect model fit and prediction accuracy. The model based on meta-regression can increase the prediction accuracy by about 15% as compared to the model that was directly developed with limited data. Finally, the Bayesian predictive densities analysis was used to identify the outliers in the limited data. It can further improve the prediction accuracy by 5.0%.
A Bayesian modification to the Jelinski-Moranda software reliability growth model
NASA Technical Reports Server (NTRS)
Littlewood, B.; Sofer, A.
1983-01-01
The Jelinski-Moranda (JM) model for software reliability was examined. It is suggested that a major reason for the poor results given by this model is the poor performance of the maximum likelihood method (ML) of parameter estimation. A reparameterization and Bayesian analysis, involving a slight modelling change, are proposed. It is shown that this new Bayesian-Jelinski-Moranda model (BJM) is mathematically quite tractable, and several metrics of interest to practitioners are obtained. The BJM and JM models are compared by using several sets of real software failure data collected and in all cases the BJM model gives superior reliability predictions. A change in the assumption which underlay both models to present the debugging process more accurately is discussed.
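For readers unfamiliar with the JM model, the sketch below writes out its standard log-likelihood: with `n_faults` initial faults and a per-fault hazard `phi`, the i-th inter-failure time is exponential with rate `phi * (n_faults - i + 1)`. This is the likelihood that both maximum likelihood and a Bayesian analysis start from; the reparameterization and priors of the BJM model are not given in the abstract and are not reproduced here. The function name and the example times are illustrative assumptions.

```python
import math

def jm_log_likelihood(times, n_faults, phi):
    """Log-likelihood of inter-failure times under the Jelinski-Moranda model.

    times    : observed inter-failure times t_1, ..., t_n
    n_faults : assumed number of faults present at the start (N)
    phi      : assumed hazard contributed by each remaining fault
    """
    if phi <= 0 or n_faults < len(times):
        # More failures than faults, or a non-positive hazard, is impossible.
        return -math.inf
    total = 0.0
    for i, t in enumerate(times, start=1):
        rate = phi * (n_faults - i + 1)   # hazard before the i-th failure
        total += math.log(rate) - rate * t
    return total

# Illustrative values only: two failures, two initial faults.
example = jm_log_likelihood([1.0, 2.0], n_faults=2, phi=0.5)
print(example)
```

Evaluating this function over a grid of `n_faults` values for a given data set is one way to see how flat or unbounded the likelihood can be in N, which is consistent with the abstract's remark that maximum likelihood estimation performs poorly.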
NASA Astrophysics Data System (ADS)
Chen, X.; Hao, Z.; Devineni, N.; Lall, U.
2013-09-01
A Hierarchical Bayesian model for forecasting regional summer rainfall and streamflow season-ahead using exogenous climate variables for East Central China is presented. The model provides estimates of the posterior forecasted probability distribution for 12 rainfall and 2 streamflow stations considering parameter uncertainty and cross-site correlation. The model has a multilevel structure with regression coefficients modeled from a common multivariate normal distribution, resulting in partial pooling of information across multiple stations and better representation of parameter and posterior distribution uncertainty. Covariance structure of the residuals across stations is explicitly modeled. Model performance is tested under leave-10-out cross-validation. Frequentist and Bayesian performance metrics used include Receiver Operating Characteristic, Reduction of Error, Coefficient of Efficiency, Rank Probability Skill Scores, and coverage by posterior credible intervals. The ability of the model to reliably forecast regional summer rainfall and streamflow season-ahead offers potential for developing adaptive water risk management strategies.
NASA Astrophysics Data System (ADS)
Chen, X.; Hao, Z.; Devineni, N.; Lall, U.
2014-04-01
A Hierarchical Bayesian model is presented for one season-ahead forecasts of summer rainfall and streamflow using exogenous climate variables for east central China. The model provides estimates of the posterior forecasted probability distribution for 12 rainfall and 2 streamflow stations considering parameter uncertainty and cross-site correlation. The model has a multi-level structure with regression coefficients modeled from a common multi-variate normal distribution, resulting in partial pooling of information across multiple stations and better representation of parameter and posterior distribution uncertainty. Covariance structure of the residuals across stations is explicitly modeled. Model performance is tested under leave-10-out cross-validation. Frequentist and Bayesian performance metrics used include receiver operating characteristic, reduction of error, coefficient of efficiency, rank probability skill scores, and coverage by posterior credible intervals. The ability of the model to reliably forecast season-ahead regional summer rainfall and streamflow offers potential for developing adaptive water risk management strategies.
An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit
Wong, Rowena Syn Yin; Ismail, Noor Azina
2016-01-01
Background and Objectives There are not many studies that attempt to model intensive care unit (ICU) risk of death in developing countries, especially in South East Asia. The aim of this study was to propose and describe application of a Bayesian approach in modeling in-ICU deaths in a Malaysian ICU. Methods This was a prospective study in a mixed medical-surgery ICU in a multidisciplinary tertiary referral hospital in Malaysia. Data collection included variables that were defined in the Acute Physiology and Chronic Health Evaluation IV (APACHE IV) model. A Bayesian Markov Chain Monte Carlo (MCMC) simulation approach was applied in the development of four multivariate logistic regression predictive models for the ICU, where the main outcome measure was in-ICU mortality risk. The performance of the models was assessed through overall model fit, discrimination and calibration measures. Results from the Bayesian models were also compared against results obtained using the frequentist maximum likelihood method. Results The study involved 1,286 consecutive ICU admissions between January 1, 2009 and June 30, 2010, of which 1,111 met the inclusion criteria. Patients who were admitted to the ICU were generally younger, predominantly male, with low co-morbidity load and mostly under mechanical ventilation. The overall in-ICU mortality rate was 18.5% and the overall mean Acute Physiology Score (APS) was 68.5. All four models exhibited good discrimination, with area under receiver operating characteristic curve (AUC) values of approximately 0.8. Calibration was acceptable (Hosmer-Lemeshow p-values > 0.05) for all models, except for model M3. Model M1 was identified as the model with the best overall performance in this study. Conclusion Four prediction models were proposed, where the best model was chosen based on its overall performance in this study. This study has also demonstrated the promising potential of the Bayesian MCMC approach as an alternative in the analysis and modeling of
ERIC Educational Resources Information Center
Hinkle, Dennis; Houston, Charles A.
The purpose of this study was to present and evaluate Bayesian-type models for estimating probabilities of program completion and for predicting first quarter grade point averages of community college students entering certain allied health fields. Two Bayesian models were tested. Bayesian Model 1--Estimating Probabilities of Program…
Bayesian model selection validates a biokinetic model for zirconium processing in humans
2012-01-01
Background In radiation protection, biokinetic models for zirconium processing are of crucial importance in dose estimation and further risk analysis for humans exposed to this radioactive substance. They provide limiting values of detrimental effects and build the basis for applications in internal dosimetry, the prediction for radioactive zirconium retention in various organs as well as retrospective dosimetry. Multi-compartmental models are the tool of choice for simulating the processing of zirconium. Although easily interpretable, determining the exact compartment structure and interaction mechanisms is generally daunting. In the context of observing the dynamics of multiple compartments, Bayesian methods provide efficient tools for model inference and selection. Results We are the first to apply a Markov chain Monte Carlo approach to compute Bayes factors for the evaluation of two competing models for zirconium processing in the human body after ingestion. Based on in vivo measurements of human plasma and urine levels we were able to show that a recently published model is superior to the standard model of the International Commission on Radiological Protection. The Bayes factors were estimated by means of the numerically stable thermodynamic integration in combination with a recently developed copula-based Metropolis-Hastings sampler. Conclusions In contrast to the standard model the novel model predicts lower accretion of zirconium in bones. This results in lower levels of noxious doses for exposed individuals. Moreover, the Bayesian approach allows for retrospective dose assessment, including credible intervals for the initially ingested zirconium, in a significantly more reliable fashion than previously possible. All methods presented here are readily applicable to many modeling tasks in systems biology. PMID:22863152
ERIC Educational Resources Information Center
Rindskopf, David
2012-01-01
Muthen and Asparouhov (2012) made a strong case for the advantages of Bayesian methodology in factor analysis and structural equation models. I show additional extensions and adaptations of their methods and show how non-Bayesians can take advantage of many (though not all) of these advantages by using interval restrictions on parameters. By…
Imprecise (fuzzy) information in geostatistics
Bardossy, A.; Bogardi, I.; Kelly, W.E.
1988-05-01
A methodology based on fuzzy set theory for the utilization of imprecise data in geostatistics is presented. A common problem preventing a broader use of geostatistics has been the insufficient amount of accurate measurement data. In certain cases, additional but uncertain (soft) information is available and can be encoded as subjective probabilities, and then the soft kriging method can be applied (Journal, 1986). In other cases, a fuzzy encoding of soft information may be more realistic and simplify the numerical calculations. Imprecise (fuzzy) spatial information on the possible variogram is integrated into a single variogram which is used in a fuzzy kriging procedure. The overall uncertainty of prediction is represented by the estimation variance and the calculated membership function for each kriged point. The methodology is applied to the permeability prediction of a soil liner for hazardous waste containment. The available number of hard measurement data (20) was not enough for a classical geostatistical analysis. An additional 20 soft data made it possible to prepare kriged contour maps using the fuzzy geostatistical procedure.
Bayesian Network Model with Application to Smart Power Semiconductor Lifetime Data.
Plankensteiner, Kathrin; Bluder, Olivia; Pilz, Jürgen
2015-09-01
In this article, Bayesian networks are used to model semiconductor lifetime data obtained from a cyclic stress test system. The data of interest are a mixture of log-normal distributions, representing two dominant physical failure mechanisms. Moreover, the data can be censored due to limited test resources. For a better understanding of the complex lifetime behavior, interactions between test settings, geometric designs, material properties, and physical parameters of the semiconductor device are modeled by a Bayesian network. Statistical toolboxes in MATLAB® have been extended and applied to find the best structure of the Bayesian network and to perform parameter learning. Due to censored observations Markov chain Monte Carlo (MCMC) simulations are employed to determine the posterior distributions. For model selection the automatic relevance determination (ARD) algorithm and goodness-of-fit criteria such as marginal likelihoods, Bayes factors, posterior predictive density distributions, and sum of squared errors of prediction (SSEP) are applied and evaluated. The results indicate that the application of Bayesian networks to semiconductor reliability provides useful information about the interactions between the significant covariates and serves as a reliable alternative to currently applied methods.
Comparison of a Bayesian Network with a Logistic Regression Model to Forecast IgA Nephropathy
Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre
2013-01-01
Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation. PMID:24328031
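The Youden index referred to above is sensitivity plus specificity minus one. The short sketch below simply applies that definition to the percentages quoted in the abstract; the function name is an illustrative assumption, and the resulting values are arithmetic on the reported figures, not results taken from the paper.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0

# Sensitivity and specificity as quoted in the abstract (as fractions).
j_bayesian_network = youden_index(1.00, 0.73)
j_logistic_regression = youden_index(0.67, 0.95)

print(f"Bayesian network:    J = {j_bayesian_network:.2f}")
print(f"Logistic regression: J = {j_logistic_regression:.2f}")
```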
Equifinality of formal (DREAM) and informal (GLUE) bayesian approaches in hydrologic modeling?
Vrugt, Jasper A; Robinson, Bruce A; Ter Braak, Cajo J F; Gupta, Hoshin V
2008-01-01
In recent years, a strong debate has emerged in the hydrologic literature regarding what constitutes an appropriate framework for uncertainty estimation. Particularly, there is strong disagreement whether an uncertainty framework should have its roots within a proper statistical (Bayesian) context, or whether such a framework should be based on a different philosophy and implement informal measures and weaker inference to summarize parameter and predictive distributions. In this paper, we compare a formal Bayesian approach using Markov Chain Monte Carlo (MCMC) with generalized likelihood uncertainty estimation (GLUE) for assessing uncertainty in conceptual watershed modeling. Our formal Bayesian approach is implemented using the recently developed differential evolution adaptive metropolis (DREAM) MCMC scheme with a likelihood function that explicitly considers model structural, input and parameter uncertainty. Our results demonstrate that DREAM and GLUE can generate very similar estimates of total streamflow uncertainty. This suggests that formal and informal Bayesian approaches have more common ground than the hydrologic literature and ongoing debate might suggest. The main advantage of formal approaches is, however, that they attempt to disentangle the effect of forcing, parameter and model structural error on total predictive uncertainty. This is key to improving hydrologic theory and to better understand and predict the flow of water through catchments.
Number-knower levels in young children: insights from Bayesian modeling.
Lee, Michael D; Sarnecka, Barbara W
2011-09-01
Lee and Sarnecka (2010) developed a Bayesian model of young children's behavior on the Give-N test of number knowledge. This paper presents two new extensions of the model, and applies the model to new data. In the first extension, the model is used to evaluate competing theories about the conceptual knowledge underlying children's behavior. One, the knower-levels theory, is basically a "stage" theory involving real conceptual change. The other, the approximate-meanings theory, assumes that the child's conceptual knowledge is relatively constant, although performance improves over time. In the second extension, the model is used to ask whether the same latent psychological variable (a child's number-knower level) can simultaneously account for behavior on two tasks (the Give-N task and the Fast-Cards task) with different performance demands. Together, these two demonstrations show the potential of the Bayesian modeling approach to improve our understanding of the development of human cognition.
Bayesian spatio-temporal modeling of particulate matter concentrations in Peninsular Malaysia
NASA Astrophysics Data System (ADS)
Manga, Edna; Awang, Norhashidah
2016-06-01
This article presents an application of a Bayesian spatio-temporal Gaussian process (GP) model to particulate matter concentrations from Peninsular Malaysia. We analyze daily PM10 concentration levels from 35 monitoring sites in June and July 2011. The spatio-temporal model, set in a Bayesian hierarchical framework, allows for the inclusion of informative covariates, meteorological variables and spatio-temporal interactions. Posterior density estimates of the model parameters are obtained by Markov chain Monte Carlo methods. Preliminary data analysis indicates that information on PM10 levels at sites classified as industrial locations could explain part of the space-time variations. We therefore include the site-type indicator in our modeling efforts. Parameter estimates for the fitted GP model show significant spatio-temporal structure and a positive effect of the location-type explanatory variable. We also compute validation criteria for the out-of-sample sites that show the adequacy of the model for predicting PM10 at unmonitored sites.
NASA Astrophysics Data System (ADS)
Tsai, Frank T.-C.; Elshall, Ahmed S.
2013-09-01
Analysts are often faced with competing propositions for each uncertain model component. How can we judge that we select a correct proposition(s) for an uncertain model component out of numerous possible propositions? We introduce the hierarchical Bayesian model averaging (HBMA) method as a multimodel framework for uncertainty analysis. The HBMA allows for segregating, prioritizing, and evaluating different sources of uncertainty and their corresponding competing propositions through a hierarchy of BMA models that forms a BMA tree. We apply the HBMA to conduct uncertainty analysis on the reconstructed hydrostratigraphic architectures of the Baton Rouge aquifer-fault system, Louisiana. Due to uncertainty in model data, structure, and parameters, multiple possible hydrostratigraphic models are produced and calibrated as base models. The study considers four sources of uncertainty. With respect to data uncertainty, the study considers two calibration data sets. With respect to model structure, the study considers three different variogram models, two geological stationarity assumptions and two fault conceptualizations. The base models are produced following a combinatorial design to allow for uncertainty segregation. Thus, these four uncertain model components with their corresponding competing model propositions result in 24 base models. The results show that the systematic dissection of the uncertain model components along with their corresponding competing propositions allows for detecting the robust model propositions and the major sources of uncertainty.
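The count of 24 base models follows from the combinatorial design described above: 2 calibration data sets, 3 variogram models, 2 stationarity assumptions and 2 fault conceptualizations. A minimal sketch of that enumeration is given below; the proposition labels are placeholders assumed for illustration, since the abstract does not name them.

```python
from itertools import product

# Number of competing propositions per uncertain component, as counted in the
# abstract. The labels are hypothetical placeholders, not the study's names.
components = {
    "calibration_data": ["data_set_1", "data_set_2"],
    "variogram_model": ["variogram_1", "variogram_2", "variogram_3"],
    "stationarity": ["assumption_1", "assumption_2"],
    "fault_concept": ["concept_1", "concept_2"],
}

# Full factorial (combinatorial) design: one base model per combination.
base_models = [
    dict(zip(components, combo)) for combo in product(*components.values())
]

print(len(base_models))  # 2 * 3 * 2 * 2 = 24
```

Because every combination is present, models can be grouped by any one component while the others vary, which is what makes the segregation of uncertainty sources possible.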
Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets
2015-01-01
On the order of hundreds of absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) models have been described in the literature in the past decade which are more often than not inaccessible to anyone but their authors. Public accessibility is also an issue with computational models for bioactivity, and the ability to share such models still remains a major challenge limiting drug discovery. We describe the creation of a reference implementation of a Bayesian model-building software module, which we have released as an open source component that is now included in the Chemistry Development Kit (CDK) project, as well as implemented in the CDD Vault and in several mobile apps. We use this implementation to build an array of Bayesian models for ADME/Tox, in vitro and in vivo bioactivity, and other physicochemical properties. We show that these models possess cross-validation receiver operator curve values comparable to those generated previously in prior publications using alternative tools. We have now described how the implementation of Bayesian models with FCFP6 descriptors generated in the CDD Vault enables the rapid production of robust machine learning models from public data or the user’s own datasets. The current study sets the stage for generating models in proprietary software (such as CDD) and exporting these models in a format that could be run in open source software using CDK components. This work also demonstrates that we can enable biocomputation across distributed private or public datasets to enhance drug discovery. PMID:25994950
A Bayesian estimation of a stochastic predator-prey model of economic fluctuations
NASA Astrophysics Data System (ADS)
Dibeh, Ghassan; Luchinsky, Dmitry G.; Luchinskaya, Daria D.; Smelyanskiy, Vadim N.
2007-06-01
In this paper, we develop a Bayesian framework for the empirical estimation of the parameters of one of the best known nonlinear models of the business cycle: The Marx-inspired model of a growth cycle introduced by R. M. Goodwin. The model predicts a series of closed cycles representing the dynamics of labor's share and the employment rate in the capitalist economy. The Bayesian framework is used to empirically estimate a modified Goodwin model. The original model is extended in two ways. First, we allow for exogenous periodic variations of the otherwise steady growth rates of the labor force and productivity per worker. Second, we allow for stochastic variations of those parameters. The resultant modified Goodwin model is a stochastic predator-prey model with periodic forcing. The model is then estimated using a newly developed Bayesian estimation method on data sets representing growth cycles in France and Italy during the years 1960-2005. Results show that inference of the parameters of the stochastic Goodwin model can be achieved. The comparison of the dynamics of the Goodwin model with the inferred values of parameters demonstrates quantitative agreement with the growth cycle empirical data.
Lessons Learned from a Past Series of Bayesian Model Averaging studies for Soil/Plant Models
NASA Astrophysics Data System (ADS)
Nowak, Wolfgang; Wöhling, Thomas; Schöniger, Anneli
2015-04-01
In this study we evaluate the lessons learned about modelling soil/plant systems from analyzing evapotranspiration data, soil moisture and leaf area index. The data were analyzed with advanced tools from the area of Bayesian Model Averaging, model ranking and Bayesian Model Selection. We have generated a large variety of model conceptualizations by sampling random parameter sets from the vegetation components of the CERES, SUCROS, GECROS, and SPASS models and a common model for soil water movement via Monte-Carlo simulations. We used data from one vegetation period of winter wheat at a field site in Nellingen, Germany. The data set includes soil moisture, actual evapotranspiration (ETa) from an eddy covariance tower, and leaf-area index (LAI). The focus of data analysis was on how one can do model ranking and model selection. Further analysis steps included the predictive reliability of different soil/plant models calibrated on different subsets of the available data. Our main conclusion is that model selection between different competing soil-plant models remains a large challenge, because 1. different data types and their combinations favor different models, because competing models differ in how well they simulate the coupling processes between the various compartments and their states, 2. singular events (such as the evolution of LAI during plant senescence) can dominate an entire time series, and long time series can be represented well by the few data values where the models disagree most, 3. the different data types differ in their discriminating power for model selection, 4. the level of noise present in ETa and LAI data, and the level of systematic model bias through simplifications of the complex system (e.g., assuming a few internally homogeneous soil layers) substantially reduce the confidence in model ranking and model selection, 5. none of the models withstands a hypothesis test against the available data, 6. even the assumed level of measurement
NASA Astrophysics Data System (ADS)
Gilet, Estelle; Diard, Julien; Palluel-Germain, Richard; Bessière, Pierre
2011-03-01
This paper is about modeling perception-action loops and, more precisely, the study of the influence of motor knowledge during perception tasks. We use the Bayesian Action-Perception (BAP) model, which deals with the sensorimotor loop involved in reading and writing cursive isolated letters and includes an internal simulation of movement loop. By using this probabilistic model we simulate letter recognition, both with and without internal motor simulation. Comparison of their performance yields an experimental prediction, which we set forth.
NASA Astrophysics Data System (ADS)
Messier, K. P.; Serre, M. L.
2015-12-01
Radon (222Rn) is a naturally occurring, chemically inert, colorless, and odorless radioactive gas produced from the decay of uranium (238U), which is ubiquitous in rocks and soils worldwide. Exposure to 222Rn is likely the second leading cause of lung cancer after cigarette smoking via inhalation; however, exposure through untreated groundwater is also a contributing factor to both inhalation and ingestion routes. A land use regression (LUR) model for groundwater 222Rn with anisotropic geological and 238U based explanatory variables is developed, which helps elucidate the factors contributing to elevated 222Rn across North Carolina. Geological and uranium based variables are constructed in elliptical buffers surrounding each observation such that they capture the lateral geometric anisotropy present in groundwater 222Rn. Moreover, geological features are defined at three different geological spatial scales to allow the model to distinguish between large area and small area effects of geology on groundwater 222Rn. The LUR is also integrated into the Bayesian Maximum Entropy (BME) geostatistical framework to increase accuracy and produce a point-level LUR-BME model of groundwater 222Rn across North Carolina including prediction uncertainty. The LUR-BME model of groundwater 222Rn results in a leave-one-out cross-validation r² of 0.46 (Pearson correlation coefficient = 0.68), effectively predicting within the spatial covariance range. Modeled results of 222Rn concentrations show variability among Intrusive Felsic geological formations, likely due to average bedrock 238U defined on the basis of overlying stream-sediment 238U concentrations, which are widely distributed, consistently analyzed point-source data.
Hydrogeologic unit flow characterization using transition probability geostatistics.
Jones, Norman L; Walker, Justin R; Carle, Steven F
2005-01-01
This paper describes a technique for applying the transition probability geostatistics method for stochastic simulation to a MODFLOW model. Transition probability geostatistics has some advantages over traditional indicator kriging methods including a simpler and more intuitive framework for interpreting geologic relationships and the ability to simulate juxtapositional tendencies such as fining upward sequences. The indicator arrays generated by the transition probability simulation are converted to layer elevation and thickness arrays for use with the new Hydrogeologic Unit Flow package in MODFLOW 2000. This makes it possible to preserve complex heterogeneity while using reasonably sized grids and/or grids with nonuniform cell thicknesses.
Robust video object tracking via Bayesian model averaging-based feature fusion
NASA Astrophysics Data System (ADS)
Dai, Yi; Liu, Bin
2016-08-01
We are concerned with tracking an object of interest in a video stream. We propose an algorithm that is robust against occlusion, the presence of confusing colors, abrupt changes in the object features and changes in scale. We develop the algorithm within a Bayesian modeling framework. The state-space model is used for capturing the temporal correlation in the sequence of frame images by modeling the underlying dynamics of the tracking system. The Bayesian model averaging (BMA) strategy is proposed for fusing multiclue information in the observations. Any number of object features is allowed to be involved in the proposed framework. Every feature represents one source of information to be fused and is associated with an observation model. The state inference is performed by employing the particle filter methods. In comparison with the related approaches, the BMA-based tracker is shown to have robustness, expressivity, and comprehensibility.
Bayesian inference for kinetic models of biotransformation using a generalized rate equation.
Ying, Shanshan; Zhang, Jiangjiang; Zeng, Lingzao; Shi, Jiachun; Wu, Laosheng
2017-03-06
Selecting proper rate equations for the kinetic models is essential to quantify biotransformation processes in the environment. Bayesian model selection methods can be used to evaluate the candidate models. However, comparisons of all plausible models can result in high computational cost, while limiting the number of candidate models may lead to biased results. In this work, we developed an integrated Bayesian method to simultaneously perform model selection and parameter estimation by using a generalized rate equation. In the approach, the model hypotheses were represented by discrete parameters and the rate constants were represented by continuous parameters. Bayesian inference of the kinetic models was then carried out by Markov Chain Monte Carlo simulation for parameter estimation with the mixed (i.e., discrete and continuous) priors. The validity of this approach was illustrated through a synthetic case and a nitrogen transformation experimental study. The results showed that our method can successfully identify the plausible models and parameters, as well as the uncertainties therein. This method can thus provide a powerful tool to reveal more insightful information about complex biotransformation processes.
Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models.
Daunizeau, J; Friston, K J; Kiebel, S J
2009-11-01
In this paper, we describe a general variational Bayesian approach for approximate inference on nonlinear stochastic dynamic models. This scheme extends established approximate inference on hidden-states to cover: (i) nonlinear evolution and observation functions, (ii) unknown parameters and (precision) hyperparameters and (iii) model comparison and prediction under uncertainty. Model identification or inversion entails the estimation of the marginal likelihood or evidence of a model. This difficult integration problem can be finessed by optimising a free-energy bound on the evidence using results from variational calculus. This yields a deterministic update scheme that optimises an approximation to the posterior density on the unknown model variables. We derive such a variational Bayesian scheme in the context of nonlinear stochastic dynamic hierarchical models, for both model identification and time-series prediction. The computational complexity of the scheme is comparable to that of an extended Kalman filter, which is critical when inverting high dimensional models or long time-series. Using Monte-Carlo simulations, we assess the estimation efficiency of this variational Bayesian approach using three stochastic variants of chaotic dynamic systems. We also demonstrate the model comparison capabilities of the method, its self-consistency and its predictive power.
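For readers unfamiliar with the free-energy bound mentioned above, the standard variational identity is sketched below. The symbols ($y$ for data, $\vartheta$ for all unknown model variables, $m$ for the model, $q$ for the approximate posterior) are assumed notation for illustration and may differ from the paper's.

```latex
% Decomposition of the log evidence (assumed notation):
\ln p(y \mid m)
  = F(q) + \mathrm{KL}\bigl[\, q(\vartheta) \,\|\, p(\vartheta \mid y, m) \,\bigr]
  \;\ge\; F(q),
\qquad
F(q) = \mathbb{E}_{q}\bigl[ \ln p(y, \vartheta \mid m) \bigr]
     - \mathbb{E}_{q}\bigl[ \ln q(\vartheta) \bigr].
```

Since the Kullback-Leibler term is non-negative, maximising $F(q)$ over $q$ both tightens the bound on the log evidence and moves $q$ towards the true posterior.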
Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models
Daunizeau, J.; Friston, K.J.; Kiebel, S.J.
2009-01-01
In this paper, we describe a general variational Bayesian approach for approximate inference on nonlinear stochastic dynamic models. This scheme extends established approximate inference on hidden-states to cover: (i) nonlinear evolution and observation functions, (ii) unknown parameters and (precision) hyperparameters and (iii) model comparison and prediction under uncertainty. Model identification or inversion entails the estimation of the marginal likelihood or evidence of a model. This difficult integration problem can be finessed by optimising a free-energy bound on the evidence using results from variational calculus. This yields a deterministic update scheme that optimises an approximation to the posterior density on the unknown model variables. We derive such a variational Bayesian scheme in the context of nonlinear stochastic dynamic hierarchical models, for both model identification and time-series prediction. The computational complexity of the scheme is comparable to that of an extended Kalman filter, which is critical when inverting high dimensional models or long time-series. Using Monte-Carlo simulations, we assess the estimation efficiency of this variational Bayesian approach using three stochastic variants of chaotic dynamic systems. We also demonstrate the model comparison capabilities of the method, its self-consistency and its predictive power. PMID:19862351
Lu, Dan; Ye, Ming; Curtis, Gary P.
2015-08-01
While Bayesian model averaging (BMA) has been widely used in groundwater modeling, it is infrequently applied to groundwater reactive transport modeling because of multiple sources of uncertainty in the coupled hydrogeochemical processes and because of the long execution time of each model run. To resolve these problems, this study analyzed different levels of uncertainty in a hierarchical way, and used the maximum likelihood version of BMA, i.e., MLBMA, to improve the computational efficiency. Our study demonstrates the applicability of MLBMA to groundwater reactive transport modeling in a synthetic case in which twenty-seven reactive transport models were designed to predict the reactive transport of hexavalent uranium (U(VI)) based on observations at a former uranium mill site near Naturita, CO. Moreover, these reactive transport models contain three uncertain model components, i.e., parameterization of hydraulic conductivity, configuration of model boundary, and surface complexation reactions that simulate U(VI) adsorption. These uncertain model components were aggregated into the alternative models by integrating a hierarchical structure into MLBMA. The modeling results of the individual models and MLBMA were analyzed to investigate their predictive performance. The predictive logscore results show that MLBMA generally outperforms the best model, suggesting that using MLBMA is a sound strategy to achieve more robust model predictions relative to a single model. MLBMA works best when the alternative models are structurally distinct and have diverse model predictions. When correlation in model structure exists, two strategies were used to improve predictive performance by retaining structurally distinct models or assigning smaller prior model probabilities to correlated models. Since the synthetic models were designed using data from the Naturita site, the results of this study are expected to provide guidance for real-world modeling. Finally, limitations of
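A minimal sketch of how information-criterion-based model averaging of this kind is often computed: posterior model probabilities are approximated from each model's information criterion and prior probability, then used to weight the individual predictions. The abstract does not state which criterion or formula the study used, so the function below, its name and all numbers are illustrative assumptions.

```python
import math

def bma_weights(ic_values, prior_probs=None):
    """Approximate posterior model probabilities from information criteria.

    Weight of model k is proportional to prior_k * exp(-0.5 * (IC_k - IC_min)).
    Subtracting IC_min only improves numerical stability.
    """
    n = len(ic_values)
    if prior_probs is None:
        prior_probs = [1.0 / n] * n  # equal prior model probabilities
    ic_min = min(ic_values)
    unnormalised = [
        p * math.exp(-0.5 * (ic - ic_min))
        for ic, p in zip(ic_values, prior_probs)
    ]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Hypothetical criterion values and predictions for three alternative models.
ic = [100.0, 102.0, 110.0]
predictions = [1.0, 1.4, 2.0]

weights = bma_weights(ic)
bma_mean = sum(w * p for w, p in zip(weights, predictions))

# Assigning a smaller prior probability to a model (e.g. a correlated one).
weights_down = bma_weights(ic, prior_probs=[0.45, 0.10, 0.45])
```

Passing unequal `prior_probs` corresponds to the second strategy in the abstract, in which structurally correlated models receive smaller prior model probabilities.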
Curtis, Gary P.; Lu, Dan; Ye, Ming
2015-01-01
While Bayesian model averaging (BMA) has been widely used in groundwater modeling, it is infrequently applied to groundwater reactive transport modeling because of multiple sources of uncertainty in the coupled hydrogeochemical processes and because of the long execution time of each model run. To resolve these problems, this study analyzed different levels of uncertainty in a hierarchical way, and used the maximum likelihood version of BMA, i.e., MLBMA, to improve the computational efficiency. This study demonstrates the applicability of MLBMA to groundwater reactive transport modeling in a synthetic case in which twenty-seven reactive transport models were designed to predict the reactive transport of hexavalent uranium (U(VI)) based on observations at a former uranium mill site near Naturita, CO. These reactive transport models contain three uncertain model components, i.e., parameterization of hydraulic conductivity, configuration of model boundary, and surface complexation reactions that simulate U(VI) adsorption. These uncertain model components were aggregated into the alternative models by integrating a hierarchical structure into MLBMA. The modeling results of the individual models and MLBMA were analyzed to investigate their predictive performance. The predictive logscore results show that MLBMA generally outperforms the best model, suggesting that using MLBMA is a sound strategy to achieve more robust model predictions relative to a single model. MLBMA works best when the alternative models are structurally distinct and have diverse model predictions. When correlation in model structure exists, two strategies were used to improve predictive performance by retaining structurally distinct models or assigning smaller prior model probabilities to correlated models. Since the synthetic models were designed using data from the Naturita site, the results of this study are expected to provide guidance for real-world modeling. Limitations of applying MLBMA to the
A Bayesian approach for inducing sparsity in generalized linear models with multi-category response
2015-01-01
Background: The dimension and complexity of high-throughput gene expression data create many challenges for downstream analysis. Several approaches exist to reduce the number of variables with respect to small sample sizes. In this study, we utilized the Generalized Double Pareto (GDP) prior to induce sparsity in a Bayesian Generalized Linear Model (GLM) setting. The approach was evaluated using a publicly available microarray dataset containing 99 samples corresponding to four different prostate cancer subtypes. Results: A hierarchical Sparse Bayesian GLM using GDP prior (SBGG) was developed to take into account the progressive nature of the response variable. We obtained an average overall classification accuracy between 82.5% and 94%, which was higher than Support Vector Machine, Random Forest or a Sparse Bayesian GLM using double exponential priors. Additionally, SBGG outperforms the other 3 methods in correctly identifying pre-metastatic stages of cancer progression, which can prove extremely valuable for therapeutic and diagnostic purposes. Importantly, using Geneset Cohesion Analysis Tool, we found that the top 100 genes produced by SBGG had an average functional cohesion p-value of 2.0E-4 compared to 0.007 to 0.131 produced by the other methods. Conclusions: Using GDP in a Bayesian GLM model applied to cancer progression data results in better subclass prediction. In particular, the method identifies pre-metastatic stages of prostate cancer with substantially better accuracy and produces more functionally relevant gene sets. PMID:26423345
Dynamic causal modelling of electrographic seizure activity using Bayesian belief updating
Cooray, Gerald K.; Sengupta, Biswa; Douglas, Pamela K.; Friston, Karl
2016-01-01
Seizure activity in EEG recordings can persist for hours with seizure dynamics changing rapidly over time and space. To characterise the spatiotemporal evolution of seizure activity, large data sets often need to be analysed. Dynamic causal modelling (DCM) can be used to estimate the synaptic drivers of cortical dynamics during a seizure; however, the requisite (Bayesian) inversion procedure is computationally expensive. In this note, we describe a straightforward procedure, within the DCM framework, that provides efficient inversion of seizure activity measured with non-invasive and invasive physiological recordings; namely, EEG/ECoG. We describe the theoretical background behind a Bayesian belief updating scheme for DCM. The scheme is tested on simulated and empirical seizure activity (recorded both invasively and non-invasively) and compared with standard Bayesian inversion. We show that the Bayesian belief updating scheme provides similar estimates of time-varying synaptic parameters, compared to standard schemes, indicating no significant qualitative change in accuracy. The difference in variance explained was small (less than 5%). The updating method was substantially more efficient, taking approximately 5–10 min compared to approximately 1–2 h. Moreover, the setup of the model under the updating scheme allows for a clear specification of how neuronal variables fluctuate over separable timescales. This method now allows us to investigate the effect of fast (neuronal) activity on slow fluctuations in (synaptic) parameters, paving a way forward to understand how seizure activity is generated. PMID:26220742
Modelling household finances: A Bayesian approach to a multivariate two-part model.
Brown, Sarah; Ghosh, Pulak; Su, Li; Taylor, Karl
2015-09-01
We contribute to the empirical literature on household finances by introducing a Bayesian multivariate two-part model, which has been developed to further our understanding of household finances. Our flexible approach allows for the potential interdependence between the holding of assets and liabilities at the household level and also encompasses a two-part process to allow for differences in the influences on asset or liability holding and on the respective amounts held. Furthermore, the framework is dynamic in order to allow for persistence in household finances over time. Our findings endorse the joint modelling approach and provide evidence supporting the importance of dynamics. In addition, we find that certain independent variables exert different influences on the binary and continuous parts of the model thereby highlighting the flexibility of our framework and revealing a detailed picture of the nature of household finances.
Modelling household finances: A Bayesian approach to a multivariate two-part model
Brown, Sarah; Ghosh, Pulak; Su, Li; Taylor, Karl
2016-01-01
We contribute to the empirical literature on household finances by introducing a Bayesian multivariate two-part model, which has been developed to further our understanding of household finances. Our flexible approach allows for the potential interdependence between the holding of assets and liabilities at the household level and also encompasses a two-part process to allow for differences in the influences on asset or liability holding and on the respective amounts held. Furthermore, the framework is dynamic in order to allow for persistence in household finances over time. Our findings endorse the joint modelling approach and provide evidence supporting the importance of dynamics. In addition, we find that certain independent variables exert different influences on the binary and continuous parts of the model thereby highlighting the flexibility of our framework and revealing a detailed picture of the nature of household finances. PMID:27212801
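To make the "two-part" idea concrete, the following is a minimal sketch of a two-part log-likelihood for a single outcome: a logistic part for whether an asset or liability is held at all, and a continuous part for the amount held when it is positive. It is not the authors' model, which is multivariate, dynamic and fully Bayesian. The lognormal form for the amounts, the function name `two_part_loglik`, and the parameter names `beta_bin`, `beta_amt` and `sigma` are all assumptions made for illustration.

```python
import numpy as np

def two_part_loglik(y, X, beta_bin, beta_amt, sigma):
    """Log-likelihood of a simple (assumed) two-part model.

    y        : non-negative amounts; y == 0 means the item is not held
    X        : covariate matrix, one row per household
    beta_bin : coefficients of the binary (holding) part, logit link
    beta_amt : coefficients of the continuous (amount) part, on the log scale
    sigma    : standard deviation of log-amounts among holders
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    beta_bin = np.asarray(beta_bin, dtype=float)
    beta_amt = np.asarray(beta_amt, dtype=float)

    # Binary part: log P(held) and log P(not held) under a logit link.
    eta = X @ beta_bin
    log_p_held = -np.logaddexp(0.0, -eta)
    log_p_not_held = -np.logaddexp(0.0, eta)

    held = y > 0
    loglik = log_p_not_held[~held].sum() + log_p_held[held].sum()

    # Continuous part: lognormal density for the positive amounts only.
    log_y = np.log(y[held])
    mu = X[held] @ beta_amt
    loglik += np.sum(
        -np.log(sigma)
        - 0.5 * np.log(2.0 * np.pi)
        - log_y
        - 0.5 * ((log_y - mu) / sigma) ** 2
    )
    return loglik
```

Because the two parts have separate coefficient vectors, the same covariate can affect the probability of holding and the amount held differently, which is the kind of distinction the abstract reports.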
NASA Astrophysics Data System (ADS)
Zeng, X.
2015-12-01
A large number of model executions are required to obtain alternative conceptual models' predictions and their posterior probabilities in Bayesian model averaging (BMA). The posterior model probability is estimated from each model's marginal likelihood and prior probability. The heavy computational burden hinders the implementation of BMA prediction, especially for elaborate marginal likelihood estimators. To overcome the computational burden of BMA, an adaptive sparse grid (SG) stochastic collocation method is used to build surrogates for alternative conceptual models in a numerical experiment with a synthetic groundwater model. BMA predictions depend on model posterior weights (or marginal likelihoods), and this study also evaluated four marginal likelihood estimators: the arithmetic mean estimator (AME), harmonic mean estimator (HME), stabilized harmonic mean estimator (SHME), and thermodynamic integration estimator (TIE). The results demonstrate that TIE is accurate in estimating conceptual models' marginal likelihoods, and BMA-TIE has better predictive performance than the other BMA predictions. TIE is also highly stable: repeated TIE estimates of a conceptual model's marginal likelihood show significantly less variability than those from the other estimators. In addition, the SG surrogates efficiently facilitate BMA predictions, especially for BMA-TIE. The number of model executions needed for building surrogates is 4.13%, 6.89%, 3.44%, and 0.43% of the model executions required by BMA-AME, BMA-HME, BMA-SHME, and BMA-TIE, respectively.
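The relationship between marginal likelihoods, prior model probabilities and BMA weights described above can be sketched in a few lines. This is an illustrative sketch only, not the study's implementation: it shows textbook forms of the arithmetic mean and harmonic mean estimators working on log-likelihood values, and all function names are hypothetical. The stabilized harmonic mean and thermodynamic integration estimators, and the sparse grid surrogates, are not shown.

```python
import numpy as np

def log_mean_exp(a):
    """Numerically stable log(mean(exp(a)))."""
    a = np.asarray(a, dtype=float)
    m = a.max()
    return m + np.log(np.mean(np.exp(a - m)))

def ame_log_marginal(loglik_at_prior_samples):
    """Arithmetic mean estimator: average the likelihood over prior samples."""
    return log_mean_exp(loglik_at_prior_samples)

def hme_log_marginal(loglik_at_posterior_samples):
    """Harmonic mean estimator: harmonic mean of the likelihood over posterior samples."""
    return -log_mean_exp(-np.asarray(loglik_at_posterior_samples, dtype=float))

def bma_weights(log_marginals, prior_probs):
    """Posterior model probabilities from log marginal likelihoods and priors."""
    log_post = np.asarray(log_marginals, dtype=float) + np.log(prior_probs)
    log_post -= log_post.max()  # guard against underflow before exponentiating
    w = np.exp(log_post)
    return w / w.sum()

def bma_mean(predictions, weights):
    """BMA mean prediction: weighted average of the individual model predictions."""
    return np.dot(weights, predictions)
```

Working in log space matters in practice because marginal likelihoods of groundwater models are often far too small to represent directly as floating-point numbers.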
Precise Network Modeling of Systems Genetics Data Using the Bayesian Network Webserver.
Ziebarth, Jesse D; Cui, Yan
2017-01-01
The Bayesian Network Webserver (BNW, http://compbio.uthsc.edu/BNW) is an integrated platform for Bayesian network modeling of biological datasets. It provides a web-based network modeling environment that seamlessly integrates advanced algorithms for probabilistic causal modeling and reasoning with Bayesian networks. BNW is designed for precise modeling of relatively small networks that contain fewer than 20 nodes. The structure learning algorithms used by BNW guarantee the discovery of the best (most probable) network structure given the data. To facilitate network modeling across multiple biological levels, BNW provides a very flexible interface that allows users to assign network nodes into different tiers and define the relationships between and within the tiers. This function is particularly useful for modeling systems genetics datasets that often consist of multiscalar heterogeneous genotype-to-phenotype data. BNW enables users to, within seconds or minutes, go from having a simply formatted input file containing a dataset to using a network model to make predictions about the interactions between variables and the potential effects of experimental interventions. In this chapter, we will introduce the functions of BNW and show how to model systems genetics datasets with BNW.
NASA Astrophysics Data System (ADS)
Elshall, A. S.; Ye, M.; Niu, G. Y.; Barron-Gafford, G.
2015-12-01
Models in biogeoscience involve uncertainties in observation data, model inputs, model structure, model processes and modeling scenarios. To accommodate different sources of uncertainty, multimodel analyses such as model combination, model selection, model elimination and model discrimination are becoming more popular. To illustrate theoretical and practical challenges of multimodel analysis, we use an example about microbial soil respiration modeling. Global soil respiration releases more than ten times more carbon dioxide to the atmosphere than all anthropogenic emissions. Thus, improving our understanding of microbial soil respiration is essential for improving climate change models. This study focuses on a poorly understood phenomenon, namely the soil microbial respiration pulses that occur in response to episodic rainfall pulses (the "Birch effect"). We hypothesize that the "Birch effect" is generated by the following three mechanisms. To test our hypothesis, we developed and assessed five evolving microbial-enzyme models against field measurements from a semiarid savannah that is characterized by pulsed precipitation. These five models evolve step-wise such that the first model includes none of these three mechanisms, while the fifth model includes all three. The basic component of Bayesian multimodel analysis is the estimation of marginal likelihood to rank the candidate models based on their overall likelihood with respect to observation data. The first part of the study focuses on using this Bayesian scheme to discriminate between these five candidate models. The second part discusses some theoretical and practical challenges, mainly the effect of the choice of likelihood function and of the marginal likelihood estimation method on both model ranking and Bayesian model averaging. The study shows that making valid inference from scientific data is not a trivial task, since we are not only uncertain about the candidate scientific models, but also about
Prediction and assimilation of surf-zone processes using a Bayesian network: Part II: Inverse models
Plant, Nathaniel G.; Holland, K. Todd
2011-01-01
A Bayesian network model has been developed to simulate a relatively simple problem of wave propagation in the surf zone (detailed in Part I). Here, we demonstrate that this Bayesian model can provide both inverse modeling and data-assimilation solutions for predicting offshore wave heights and depth estimates given limited wave-height and depth information from an onshore location. The inverse method is extended to allow data assimilation using observational inputs that are not compatible with deterministic solutions of the problem. These inputs include sand bar positions (instead of bathymetry) and estimates of the intensity of wave breaking (instead of wave-height observations). Our results indicate that wave breaking information is essential to reduce prediction errors. In many practical situations, this information could be provided from a shore-based observer or from remote-sensing systems. We show that various combinations of the assimilated inputs significantly reduce the uncertainty in the estimates of water depths and wave heights in the model domain. Application of the Bayesian network model to new field data demonstrated significant predictive skill (R² = 0.7) for the inverse estimate of a month-long time series of offshore wave heights. The Bayesian inverse results include uncertainty estimates that were shown to be most accurate when given uncertainty in the inputs (e.g., depth and tuning parameters). Furthermore, the inverse modeling was extended to directly estimate tuning parameters associated with the underlying wave-process model. The inverse estimates of the model parameters not only showed an offshore wave height dependence consistent with results of previous studies but the uncertainty estimates of the tuning parameters also explain previously reported variations in the model parameters.
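The inverse use of a Bayesian network, inferring an offshore quantity from an onshore observation, reduces in its simplest form to applying Bayes' rule across a single link. The sketch below illustrates that step for one discrete parent (offshore wave-height class) and one discrete child (observed wave-breaking intensity). The two-state discretization, the probability values and the function name `posterior_given_observation` are hypothetical and are not taken from the paper, whose network has more variables and was trained on model output.

```python
import numpy as np

def posterior_given_observation(prior, likelihood, obs_index):
    """Invert one network link with Bayes' rule.

    prior      : P(H), shape (n_states,), e.g. offshore wave-height classes
    likelihood : P(O | H), shape (n_states, n_obs), one row per state of H
    obs_index  : index of the observed class of O
    Returns P(H | O = obs_index).
    """
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    unnormalized = prior * likelihood[:, obs_index]
    return unnormalized / unnormalized.sum()

# Hypothetical numbers: offshore height is "low" or "high" with equal prior
# probability; breaking intensity is observed as "weak" (0) or "strong" (1).
prior = [0.5, 0.5]
likelihood = [[0.8, 0.2],   # P(breaking | offshore height = low)
              [0.3, 0.7]]   # P(breaking | offshore height = high)
posterior = posterior_given_observation(prior, likelihood, obs_index=1)
```

The returned vector is a full distribution rather than a point estimate, which is how the inverse results come with uncertainty estimates attached.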
Prediction and assimilation of surf-zone processes using a Bayesian network: Part I: Forward models
Plant, Nathaniel G.; Holland, K. Todd
2011-01-01
Prediction of coastal processes, including waves, currents, and sediment transport, can be obtained from a variety of detailed geophysical-process models with many simulations showing significant skill. This capability supports a wide range of research and applied efforts that can benefit from accurate numerical predictions. However, the predictions are only as accurate as the data used to drive the models and, given the large temporal and spatial variability of the surf zone, inaccuracies in data are unavoidable such that useful predictions require corresponding estimates of uncertainty. We demonstrate how a Bayesian-network model can be used to provide accurate predictions of wave-height evolution in the surf zone given very sparse and/or inaccurate boundary-condition data. The approach is based on a formal treatment of a data-assimilation problem that takes advantage of significant reduction of the dimensionality of the model system. We demonstrate that predictions of a detailed geophysical model of the wave evolution are reproduced accurately using a Bayesian approach. In this surf-zone application, forward prediction skill was 83%, and uncertainties in