NASA Astrophysics Data System (ADS)
Nowak, W.; de Barros, F. P. J.; Rubin, Y.
2010-03-01
Geostatistical optimal design optimizes subsurface exploration for maximum information toward task-specific prediction goals. Until recently, most geostatistical design studies have assumed that the geostatistical description (i.e., the mean, trends, covariance models and their parameters) is given a priori. This contradicts, as emphasized by Rubin and Dagan (1987a), the fact that only a few or even no data at all offer support for such assumptions prior to the bulk of the exploration effort. We believe that geostatistical design should (1) avoid unjustified a priori assumptions on the geostatistical description, (2) instead reduce geostatistical model uncertainty as a secondary design objective, (3) rate this secondary objective optimally with respect to the overall prediction goal, and (4) be robust even under inaccurate geostatistical assumptions. Bayesian Geostatistical Design follows these guidelines by considering uncertain covariance model parameters. We transfer this concept from kriging-like applications to geostatistical inverse problems. We also deem it inappropriate to consider parametric uncertainty only within a single covariance model. The Matérn family of covariance functions has an additional shape parameter. Controlling model shape by a parameter converts covariance model selection into parameter identification and resembles Bayesian model averaging over a continuous spectrum of covariance models. This is appealing since it generalizes Bayesian model averaging from a finite number to an infinite number of models. We illustrate how our approach fulfills the above four guidelines in a series of synthetic test cases. The underlying scenarios are to minimize the prediction variance of (1) contaminant concentration or (2) arrival time at an ecologically sensitive location by optimal placement of hydraulic head and log conductivity measurements. Results highlight how both the impact of geostatistical model uncertainty and the sampling network design vary according to the
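The role of the Matérn shape parameter described above can be made concrete. The sketch below (function and parameter names are my own, not from the paper; SciPy is assumed available) evaluates the Matérn covariance for an arbitrary shape nu, illustrating the continuous spectrum of models: nu = 0.5 recovers the exponential covariance, and large nu approaches the Gaussian model.

```python
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function of the second kind

def matern(h, sigma2=1.0, ell=1.0, nu=0.5):
    """Matern covariance C(h). The shape parameter nu indexes a continuum
    of covariance models (nu=0.5: exponential; nu -> inf: Gaussian).
    Parameter names here are illustrative, not from the paper."""
    h = np.asarray(h, dtype=float)
    c = np.full_like(h, sigma2)                      # C(0) = sigma^2
    pos = h > 0
    s = np.sqrt(2.0 * nu) * h[pos] / ell
    c[pos] = sigma2 * (2.0 ** (1.0 - nu) / gamma(nu)) * s ** nu * kv(nu, s)
    return c
```

A quick sanity check: with sigma2 = ell = 1, matern(h, nu=0.5) equals exp(-h), and matern(h, nu=1.5) equals the known closed form (1 + sqrt(3) h) exp(-sqrt(3) h).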
Preferential sampling and Bayesian geostatistics: Statistical modeling and examples.
Cecconi, Lorenzo; Grisotto, Laura; Catelan, Dolores; Lagazio, Corrado; Berrocal, Veronica; Biggeri, Annibale
2016-08-01
Preferential sampling refers to any situation in which the spatial process and the sampling locations are not stochastically independent. In this paper, we present two examples of geostatistical analysis in which the usual assumption of stochastic independence between the point process and the measurement process is violated. To account for preferential sampling, we specify a flexible and general Bayesian geostatistical model that includes a shared spatial random component. We apply the proposed model to two different case studies that allow us to highlight three different modeling and inferential aspects of geostatistical modeling under preferential sampling: (1) continuous or finite spatial sampling frame; (2) underlying causal model and relevant covariates; and (3) inferential goals related to mean prediction surface or prediction uncertainty. PMID:27566774
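The bias that motivates this modeling choice is easy to demonstrate numerically. In the illustrative sketch below (the sine field, sampling intensity, and sample sizes are arbitrary stand-ins, not taken from the paper), placing sampling locations preferentially where the spatial process is high makes the naive sample mean overestimate the regional average, while a uniform design does not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Smooth 1-D "spatial" field S(x) on a fine grid (a stand-in for a
# Gaussian random field; the sine shape is purely illustrative).
x = np.linspace(0.0, 1.0, 1000)
S = np.sin(2 * np.pi * x)                  # true regional mean is ~0

# Preferential design: sample where S is high (intensity prop. to exp(2 S)).
p = np.exp(2.0 * S)
p /= p.sum()
idx = rng.choice(x.size, size=200, p=p)
y_pref = S[idx] + rng.normal(0.0, 0.1, 200)

# Non-preferential design: uniformly random locations.
idx_u = rng.choice(x.size, size=200)
y_unif = S[idx_u] + rng.normal(0.0, 0.1, 200)

# The naive sample mean is biased upward under preferential sampling.
print(y_pref.mean(), y_unif.mean())
```

A shared spatial random component, as in the paper's model, lets the sampling intensity and the measurement process borrow the same latent surface, which is what corrects this bias.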
Bayesian Geostatistical Modeling of Leishmaniasis Incidence in Brazil
Karagiannis-Voules, Dimitrios-Alexios; Scholte, Ronaldo G. C.; Guimarães, Luiz H.; Utzinger, Jürg; Vounatsou, Penelope
2013-01-01
Background Leishmaniasis is endemic in 98 countries with an estimated 350 million people at risk and approximately 2 million cases annually. Brazil is one of the most severely affected countries. Methodology We applied Bayesian geostatistical negative binomial models to analyze reported incidence data of cutaneous and visceral leishmaniasis in Brazil covering a 10-year period (2001–2010). Particular emphasis was placed on spatial and temporal patterns. The models were fitted using integrated nested Laplace approximations to perform fast approximate Bayesian inference. Bayesian variable selection was employed to determine the most important climatic, environmental, and socioeconomic predictors of cutaneous and visceral leishmaniasis. Principal Findings For both types of leishmaniasis, precipitation and socioeconomic proxies were identified as important risk factors. The predicted number of cases in 2010 was 30,189 (standard deviation [SD]: 7,676) for cutaneous leishmaniasis and 4,889 (SD: 288) for visceral leishmaniasis. Our risk maps predicted the highest numbers of infected people in the states of Minas Gerais and Pará for visceral and cutaneous leishmaniasis, respectively. Conclusions/Significance Our spatially explicit, high-resolution incidence maps identified priority areas where leishmaniasis control efforts should be targeted with the ultimate goal of reducing disease incidence. PMID:23675545
Schur, Nadine; Hürlimann, Eveline; Stensgaard, Anna-Sofie; Chimfwembe, Kingford; Mushinge, Gabriel; Simoonga, Christopher; Kabatereine, Narcis B; Kristensen, Thomas K; Utzinger, Jürg; Vounatsou, Penelope
2013-11-01
Schistosomiasis remains one of the most prevalent parasitic diseases in the tropics and subtropics, but current statistics are outdated due to demographic and ecological transformations and ongoing control efforts. Reliable risk estimates are important to plan and evaluate interventions in a spatially explicit and cost-effective manner. We analysed a large ensemble of georeferenced survey data derived from an open-access neglected tropical diseases database to create smooth empirical prevalence maps for Schistosoma mansoni and Schistosoma haematobium for a total of 13 countries of eastern Africa. Bayesian geostatistical models based on climatic and other environmental data were used to account for potential spatial clustering in spatially structured exposures. Geostatistical variable selection was employed to reduce the set of covariates. Alignment factors were implemented to combine surveys on different age-groups and to acquire separate estimates for individuals aged ≤20 years and entire communities. Prevalence estimates were combined with population statistics to obtain country-specific numbers of Schistosoma infections. We estimate that 122 million individuals in eastern Africa are currently infected with either S. mansoni or S. haematobium, or both species concurrently. Country-specific population-adjusted prevalence estimates range between 12.9% (Uganda) and 34.5% (Mozambique) for S. mansoni and between 11.9% (Djibouti) and 40.9% (Mozambique) for S. haematobium. Our models revealed that infection risk in Burundi, Eritrea, Ethiopia, Kenya, Rwanda, Somalia and Sudan might be considerably higher than previously reported, while in Mozambique and Tanzania, the risk might be lower than current estimates suggest. Our empirical, large-scale, high-resolution infection risk estimates for S. mansoni and S. haematobium in eastern Africa can guide future control interventions and provide a benchmark for subsequent monitoring and evaluation activities. PMID:22019933
Sedda, Luigi; Mweempwa, Cornelius; Ducheyne, Els; De Pus, Claudia; Hendrickx, Guy; Rogers, David J.
2014-01-01
For the first time, a Bayesian geostatistical version of the Moran curve, a logarithmic form of the Ricker stock-recruitment curve, is proposed that can estimate the net change in population demographic rates, considering components such as fertility and density-dependent and density-independent mortalities. The method is applied to spatio-temporally referenced count data of tsetse flies obtained from fly-rounds. The model is a linear regression with three components: the population rate of change estimated from the Moran curve, an explicit spatio-temporal covariance, and the observation error, optimised within a Bayesian framework. The model was applied to the three main climate seasons of Zambia (rainy – January to April, cold-dry – May to August, and hot-dry – September to December), taking into account land surface temperature and (seasonally changing) cattle distribution. The model shows a maximum positive net change during the hot-dry season and a minimum between the rainy and cold-dry seasons. Density-independent losses are correlated positively with day-time land surface temperature and negatively with night-time land surface temperature and cattle distribution. The inclusion of density-dependent mortality considerably increases the goodness of fit of the model. Cross-validation with an independent dataset taken from the same area resulted in a very accurate estimate of tsetse catches. In general, the overall framework provides an important tool for vector control and eradication by identifying vector population concentrations and local vector demographic rates. It can also be applied to the case of sustainable harvesting of natural populations. PMID:24755848
Kara G. Eby
2010-08-01
At the Idaho National Laboratory (INL) Cs-137 concentrations above the U.S. Environmental Protection Agency risk-based threshold of 0.23 pCi/g may increase the risk of human mortality due to cancer. As a leader in nuclear research, the INL has been conducting nuclear activities for decades. Elevated anthropogenic radionuclide levels including Cs-137 are a result of atmospheric weapons testing, the Chernobyl accident, and nuclear activities occurring at the INL site. Therefore environmental monitoring and long-term surveillance of Cs-137 is required to evaluate risk. However, due to the large land area involved, frequent and comprehensive monitoring is limited. Developing a spatial model that predicts Cs-137 concentrations at unsampled locations will enhance the spatial characterization of Cs-137 in surface soils, provide guidance for an efficient monitoring program, and pinpoint areas requiring mitigation strategies. The predictive model presented herein is based on applied geostatistics using a Bayesian analysis of environmental characteristics across the INL site, which provides kriging spatial maps of both Cs-137 estimates and prediction errors. Comparisons are presented of two different kriging methods, showing that the use of secondary information (i.e., environmental characteristics) can provide improved prediction performance in some areas of the INL site.
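The kriging maps mentioned above rest on a standard linear predictor. As a hedged illustration (one-dimensional, exponential covariance with made-up parameters, not the model fitted to the INL data), ordinary kriging solves a small linear system for the weights and a Lagrange multiplier, yielding both a prediction and a prediction-error variance:

```python
import numpy as np

def ordinary_kriging(xs, ys, x0, sigma2=1.0, ell=0.3):
    """Ordinary kriging predictor and kriging variance at location x0.
    Exponential covariance chosen purely for illustration."""
    cov = lambda a, b: sigma2 * np.exp(-np.abs(a[:, None] - b[None, :]) / ell)
    n = xs.size
    # Ordinary-kriging system: [[C, 1], [1', 0]] [w; mu] = [c0; 1]
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(xs, xs)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = cov(xs, np.array([x0]))[:, 0]
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    pred = w @ ys                          # kriging estimate
    var = sigma2 - b[:n] @ w - mu          # kriging (prediction-error) variance
    return pred, var
```

At a sampled location the predictor reproduces the datum exactly and the kriging variance collapses to zero, which is a convenient correctness check; maps of `var` are the "prediction errors" the abstract refers to.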
Oluwole, Akinola S.; Ekpo, Uwem F.; Karagiannis-Voules, Dimitrios-Alexios; Abe, Eniola M.; Olamiju, Francisca O.; Isiyaku, Sunday; Okoronkwo, Chukwu; Saka, Yisa; Nebe, Obiageli J.; Braide, Eka I.; Mafiana, Chiedu F.; Utzinger, Jürg; Vounatsou, Penelope
2015-01-01
Background The acceleration of the control of soil-transmitted helminth (STH) infections in Nigeria, emphasizing preventive chemotherapy, has become imperative in light of the global fight against neglected tropical diseases. Predictive risk maps are an important tool to guide and support control activities. Methodology STH infection prevalence data were obtained from surveys carried out in 2011 using standard protocols. Data were geo-referenced and collated in a nationwide, geographic information system database. Bayesian geostatistical models with remotely sensed environmental covariates and variable selection procedures were utilized to predict the spatial distribution of STH infections in Nigeria. Principal Findings We found that hookworm, Ascaris lumbricoides, and Trichuris trichiura infections are endemic in 482 (86.8%), 305 (55.0%), and 55 (9.9%) locations, respectively. Hookworm and A. lumbricoides infection co-exist in 16 states, while the three species are co-endemic in 12 states. Overall, STHs are endemic in 20 of the 36 states of Nigeria, including the Federal Capital Territory of Abuja. The observed prevalence at endemic locations ranged from 1.7% to 51.7% for hookworm, from 1.6% to 77.8% for A. lumbricoides, and from 1.0% to 25.5% for T. trichiura. Model-based predictions ranged from 0.7% to 51.0% for hookworm, from 0.1% to 82.6% for A. lumbricoides, and from 0.0% to 18.5% for T. trichiura. Our models suggest that day land surface temperature and dense vegetation are important predictors of the spatial distribution of STH infection in Nigeria. In 2011, a total of 5.7 million (13.8%) school-aged children were predicted to be infected with STHs in Nigeria. The number of tablets required for annual or bi-annual mass treatment of the school-aged population at the local government area level in 2011, based on World Health Organization prevalence thresholds, was estimated at 10.2 million. Conclusions/Significance The predictive risk maps and estimated
Model Selection for Geostatistical Models
Hoeting, Jennifer A.; Davis, Richard A.; Merton, Andrew A.; Thompson, Sandra E.
2006-02-01
We consider the problem of model selection for geospatial data. Spatial correlation is typically ignored in the selection of explanatory variables and this can influence model selection results. For example, the inclusion or exclusion of particular explanatory variables may not be apparent when spatial correlation is ignored. To address this problem, we consider the Akaike Information Criterion (AIC) as applied to a geostatistical model. We offer a heuristic derivation of the AIC in this context and provide simulation results that show that using AIC for a geostatistical model is superior to the often used approach of ignoring spatial correlation in the selection of explanatory variables. These ideas are further demonstrated via a model for lizard abundance. We also employ the principle of minimum description length (MDL) to variable selection for the geostatistical model. The effect of sampling design on the selection of explanatory covariates is also explored.
An interactive Bayesian geostatistical inverse protocol for hydraulic tomography
NASA Astrophysics Data System (ADS)
Fienen, Michael N.; Clemo, Tom; Kitanidis, Peter K.
2008-12-01
Hydraulic tomography is a powerful technique for characterizing heterogeneous hydrogeologic parameters. An explicit trade-off between characterization based on measurement misfit and subjective characterization using prior information is presented. We apply a Bayesian geostatistical inverse approach that is well suited to accommodate a flexible model with the level of complexity driven by the data and explicitly considering uncertainty. Prior information is incorporated through the selection of a parameter covariance model characterizing continuity and providing stability. Often, discontinuities in the parameter field, typically caused by geologic contacts between contrasting lithologic units, necessitate subdivision into zones across which there is no correlation among hydraulic parameters. We propose an interactive protocol in which zonation candidates are implied from the data and are evaluated using cross validation and expert knowledge. Uncertainty introduced by limited knowledge of dynamic regional conditions is mitigated by using drawdown rather than native head values. An adjoint state formulation of MODFLOW-2000 is used to calculate sensitivities which are used both for the solution to the inverse problem and to guide protocol decisions. The protocol is tested using synthetic two-dimensional steady state examples in which the wells are located at the edge of the region of interest.
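At the core of such a Bayesian geostatistical inverse approach is a linear(ized) update of the parameter field given the data. The following is a generic sketch, not the MODFLOW-2000 adjoint implementation of the paper; the dimensions, sensitivity matrix, and covariance choices are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Linear(ized) forward model y = H s + noise; s is the discretized
# parameter field, Q its prior (geostatistical) covariance.
m, n = 15, 60                                        # few observations, many parameters
x = np.linspace(0.0, 1.0, n)
Q = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)   # prior covariance (continuity)
H = rng.normal(size=(m, n)) / np.sqrt(n)             # stand-in sensitivity matrix
R = 1e-4 * np.eye(m)                                 # observation-error covariance

s_true = np.linalg.cholesky(Q + 1e-10 * np.eye(n)) @ rng.normal(size=n)
y = H @ s_true + rng.multivariate_normal(np.zeros(m), R)

# Geostatistical (linear Bayesian) update: posterior mean and covariance.
G = H @ Q @ H.T + R
s_hat = Q @ H.T @ np.linalg.solve(G, y)              # posterior mean estimate
V = Q - Q @ H.T @ np.linalg.solve(G, H @ Q)          # posterior covariance

print(np.all(np.diag(V) <= np.diag(Q) + 1e-12))
```

The posterior covariance V quantifies the remaining uncertainty; its diagonal never exceeds the prior variance, reflecting the information gained from the drawdown data, and the prior covariance Q is where the zonation and continuity assumptions discussed in the protocol enter.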
2010-01-01
Background The Zambia Malaria Indicator Survey (ZMIS) of 2006 was the first nation-wide malaria survey, which combined parasitological data with other malaria indicators such as net use, indoor residual spraying and household related aspects. The survey was carried out by the Zambian Ministry of Health and partners with the objective of estimating the coverage of interventions and malaria related burden in children less than five years. In this study, the ZMIS data were analysed in order (i) to estimate an empirical high-resolution parasitological risk map in the country and (ii) to assess the relation between malaria interventions and parasitaemia risk after adjusting for environmental and socio-economic confounders. Methods The parasitological risk was predicted from Bayesian geostatistical and spatially independent models relating parasitaemia risk and environmental/climatic predictors of malaria. A number of models were fitted to capture the (potential) non-linearity in the malaria-environment relation and to identify the elapsing time between environmental effects and parasitaemia risk. These models included covariates (a) in categorical scales and (b) in penalized and basis splines terms. Different model validation methods were used to identify the best fitting model. Model-based risk predictions at unobserved locations were obtained via Bayesian predictive distributions for the best fitting model. Results Model validation indicated that linear environmental predictors were able to fit the data as well as or even better than more complex non-linear terms and that the data do not support spatial dependence. Overall the averaged population-adjusted parasitaemia risk was 20.0% in children less than five years with the highest risk predicted in the northern (38.3%) province. The odds of parasitaemia in children living in a household with at least one bed net decreases by 40% (CI: 12%, 61%) compared to those without bed nets. Conclusions The map of parasitaemia
Bayesian Geostatistical Analysis and Prediction of Rhodesian Human African Trypanosomiasis
Wardrop, Nicola A.; Atkinson, Peter M.; Gething, Peter W.; Fèvre, Eric M.; Picozzi, Kim; Kakembo, Abbas S. L.; Welburn, Susan C.
2010-01-01
Background The persistent spread of Rhodesian human African trypanosomiasis (HAT) in Uganda in recent years has increased concerns of a potential overlap with the Gambian form of the disease. Recent research has aimed to increase the evidence base for targeting control measures by focusing on the environmental and climatic factors that control the spatial distribution of the disease. Objectives One recent study used simple logistic regression methods to explore the relationship between prevalence of Rhodesian HAT and several social, environmental and climatic variables in two of the most recently affected districts of Uganda, and suggested the disease had spread into the study area due to the movement of infected, untreated livestock. Here we extend this study to account for spatial autocorrelation, incorporate uncertainty in input data and model parameters and undertake predictive mapping for risk of high HAT prevalence in future. Materials and Methods Using a spatial analysis in which a generalised linear geostatistical model is applied in a Bayesian framework to account explicitly for spatial autocorrelation and to incorporate uncertainty in input data and model parameters, we demonstrate a more rigorous analytical approach, potentially resulting in more accurate parameter and significance estimates and increased predictive accuracy. This allows an assessment of the validity of the livestock-movement hypothesis, given more robust parameter estimation and an appropriate assessment of covariate effects. Results Analysis strongly supports the theory that Rhodesian HAT was imported to the study area via the movement of untreated, infected livestock from endemic areas. The confounding effect of health care accessibility on the spatial distribution of Rhodesian HAT and the linkages between the disease's distribution and minimum land surface temperature have also been confirmed via the application of these methods. Conclusions Predictive mapping indicates an
Bayesian geostatistics in health cartography: the perspective of malaria
Patil, Anand P.; Gething, Peter W.; Piel, Frédéric B.; Hay, Simon I.
2011-01-01
Maps of parasite prevalences and other aspects of infectious diseases that vary in space are widely used in parasitology. However, spatial parasitological datasets rarely, if ever, have sufficient coverage to allow exact determination of such maps. Bayesian geostatistics (BG) is a method for finding a large sample of maps that can explain a dataset, in which maps that do a better job of explaining the data are more likely to be represented. This sample represents the knowledge that the analyst has gained from the data about the unknown true map. BG provides a conceptually simple way to convert these samples to predictions of features of the unknown map, for example regional averages. These predictions account for each map in the sample, yielding an appropriate level of predictive precision. PMID:21420361
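The conversion from a sample of maps to predictions of map features, as described above, amounts to computing the feature on every sampled map. A minimal sketch follows (the posterior sample maps are simulated from a Beta distribution purely for illustration; a real BG fit would supply them):

```python
import numpy as np

rng = np.random.default_rng(3)

# Suppose a Bayesian geostatistical fit returned 500 posterior sample maps
# of prevalence on a 20x20 grid (simulated here for illustration).
samples = rng.beta(2, 5, size=(500, 20, 20))

# A prediction for any feature of the unknown map is obtained by computing
# that feature on every sampled map; here, the average over a region.
region = (slice(0, 10), slice(0, 10))                # some admin region
regional_means = samples[:, region[0], region[1]].mean(axis=(1, 2))

# Point prediction and a 95% credible interval for the regional average.
pred = regional_means.mean()
lo, hi = np.percentile(regional_means, [2.5, 97.5])
print(round(pred, 3), round(lo, 3), round(hi, 3))
```

Because every sampled map contributes, the spread of `regional_means` automatically reflects how well the data pin down that region, which is the "appropriate level of predictive precision" the abstract refers to.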
Fienen, Michael N.; D'Oria, Marco; Doherty, John E.; Hunt, Randall J.
2013-01-01
The application bgaPEST is a highly parameterized inversion software package implementing the Bayesian Geostatistical Approach in a framework compatible with the parameter estimation suite PEST. Highly parameterized inversion refers to cases in which parameters are distributed in space or time and are correlated with one another. The Bayesian aspect of bgaPEST is related to Bayesian probability theory, in which prior information about parameters is formally revised on the basis of the calibration dataset used for the inversion. Conceptually, this approach formalizes the conditionality of estimated parameters on the specific data and model available. The geostatistical component of the method refers to the way in which prior information about the parameters is used. A geostatistical autocorrelation function is used to enforce structure on the parameters to avoid overfitting and unrealistic results. The Bayesian Geostatistical Approach is designed to provide the smoothest solution that is consistent with the data. Optionally, users can specify a level of fit or estimate a balance between fit and model complexity informed by the data. Groundwater and surface-water applications are used as examples in this text, but the possible uses of bgaPEST extend to any distributed-parameter application.
NASA Astrophysics Data System (ADS)
Painter, S. L.; Jiang, Y.; Woodbury, A. D.
2002-12-01
The Edwards Aquifer, a highly heterogeneous karst aquifer located in south central Texas, is the sole source of drinking water for more than one million people. Hydraulic conductivity (K) measurements in the Edwards Aquifer are sparse, highly variable (log-K variance of 6.4), and are mostly from single-well drawdown tests that are appropriate for the spatial scale of a few meters. To support ongoing efforts to develop a groundwater management (MODFLOW) model of the San Antonio segment of the Edwards Aquifer, a multistep procedure was developed to assign hydraulic parameters to the 402 m x 402 m computational cells intended for the management model. The approach used a combination of nonparametric geostatistical analysis, stochastic simulation, numerical upscaling, and automatic model calibration based on Bayesian updating [1,2]. Indicator correlograms reveal a nested spatial structure in the well-test K of the confined zone, with practical correlation ranges of 3,600 and 15,000 meters and a large nugget effect. The fitted geostatistical model was used in unconditional stochastic simulations by the sequential indicator simulation method. The resulting realizations of K, defined at the scale of the well tests, were then numerically upscaled to the block scale. A new geostatistical model was fitted to the upscaled values. The upscaled model was then used to cokrige the block-scale K based on the well-test K. The resulting K map was then converted to transmissivity (T) using deterministically mapped aquifer thickness. When tested in a forward groundwater model, the upscaled T reproduced hydraulic heads better than a simple kriging of the well-test values (mean error of -3.9 meter and mean-absolute-error of 12 meters, as compared with -13 and 17 meters for the simple kriging). As the final step in the study, the upscaled T map was used as the prior distribution in an inverse procedure based on Bayesian updating [1,2]. When input to the forward groundwater model, the
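The upscaling step from well-test scale to block scale can be illustrated with a simple proxy. The study used numerical (flow-based) upscaling; the sketch below instead uses the geometric mean, a common shortcut for 2-D flow, on an invented uncorrelated field (a real workflow would upscale sequential-indicator realizations), and shows the variance reduction that motivates refitting a geostatistical model at the block scale:

```python
import numpy as np

rng = np.random.default_rng(4)

# Fine-scale log-conductivity on a 64x64 grid (white noise here; a
# correlated field would come from sequential indicator simulation).
logK = rng.normal(0.0, np.sqrt(6.4), size=(64, 64))  # log-K variance of 6.4
K = np.exp(logK)

def upscale_blocks(K, b):
    """Upscale a square K field to b x b blocks using the geometric mean,
    a common simple proxy for flow-based upscaling in 2-D."""
    n = K.shape[0] // b
    blocks = K.reshape(n, b, n, b)
    return np.exp(np.log(blocks).mean(axis=(1, 3)))

K_block = upscale_blocks(K, 8)                       # 8x8 cells per block
# Upscaling smooths the field: block-scale log-K variance is smaller.
print(np.log(K_block).var() < np.log(K).var())
```

Refitting the geostatistical model to `K_block`, as the multistep procedure above does, is necessary precisely because the block-scale variance and correlation structure differ from those of the well tests.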
Boysen, Courtney; Davis, Elizabeth G; Beard, Laurie A; Lubbers, Brian V; Raghavan, Ram K
2015-01-01
Kansas witnessed an unprecedented outbreak in Corynebacterium pseudotuberculosis infection among horses, a disease commonly referred to as pigeon fever, during fall 2012. Bayesian geostatistical models were developed to identify key environmental and climatic risk factors associated with C. pseudotuberculosis infection in horses. Positive infection status among horses (cases) was determined by positive test results for characteristic abscess formation, positive bacterial culture on purulent material obtained from a lanced abscess (n = 82), or positive serologic evidence of exposure to the organism (≥1:512) (n = 11). Horses negative for these tests (n = 172) (controls) were considered free of infection. Information pertaining to horse demographics and stabled location were obtained through review of medical records and/or contact with horse owners via telephone. Covariate information for environmental and climatic determinants were obtained from USDA (soil attributes), USGS (land use/land cover), and NASA MODIS and NASA Prediction of Worldwide Renewable Resources (climate). Candidate covariates were screened using univariate regression models followed by Bayesian geostatistical models with and without covariates. The best performing model indicated a protective effect for higher soil moisture content (OR = 0.53, 95% CrI = 0.25, 0.71), and detrimental effects for higher land surface temperature (≥35°C) (OR = 2.81, 95% CrI = 2.21, 3.85) and habitat fragmentation (OR = 1.31, 95% CrI = 1.27, 2.22) for C. pseudotuberculosis infection status in horses, while age, gender and breed had no effect. Preventative and ecoclimatic significance of these findings are discussed. PMID:26473728
Constraining geostatistical models with hydrological data to improve prediction realism
NASA Astrophysics Data System (ADS)
Demyanov, V.; Rojas, T.; Christie, M.; Arnold, D.
2012-04-01
Geostatistical models reproduce spatial correlation based on the available on-site data and on more general concepts about the modelled patterns, e.g. training images. One of the problems of modelling natural systems with geostatistics is maintaining realistic spatial features, so that they agree with the physical processes in nature. Tuning the model parameters to the data may lead to geostatistical realisations with unrealistic spatial patterns which nevertheless honour the data. Such models result in poor predictions, even though they fit the available data well. Conditioning the model to a wider range of relevant data provides a remedy that avoids producing unrealistic features in spatial models. For instance, there are vast amounts of information about the geometries of river channels that can be used to describe fluvial environments. Relations between the geometrical channel characteristics (width, depth, wavelength, amplitude, etc.) are complex and non-parametric and exhibit a great deal of uncertainty, which it is important to propagate rigorously into the predictive model. These relations can be described within a Bayesian approach as multi-dimensional prior probability distributions. We propose a way to constrain multi-point statistics models with intelligent priors obtained by analysing a vast collection of contemporary river patterns from previously published works. We applied machine learning techniques, namely neural networks and support vector machines, to extract multivariate non-parametric relations between geometrical characteristics of fluvial channels from the available data. An example demonstrates how ensuring geological realism helps to deliver more reliable predictions of a subsurface oil reservoir in a fluvial depositional environment.
Factor-based Geostatistics for Groundwater Modeling
NASA Astrophysics Data System (ADS)
Savelyeva, E.; Pavlova, M.
2012-04-01
Analysis of groundwater levels is an important stage preceding modelling of filtration and migration processes in the hydro-geological environment. The boundary conditions are due to a pressure field, which strongly depends on groundwater levels and their spatial and temporal variability. Hydro-physical measurements are usually performed at a set of inhomogeneously spatially distributed wells according to some temporal scheme. The result is an irregular spatio-temporal data set, with all the attendant problems of organising a spatio-temporal metric system. These problems also affect modelling of the spatio-temporal correlation structure. There are different ways to overcome these problems and obtain a reasonable model of the spatio-temporal correlation structure, but all these approaches are limited in their forecasting capabilities. This work proposes an alternative approach: factor-based space-time geostatistics. This method opens up a set of modelling possibilities: the use of additional information to represent different future scenarios, characterisation of uncertainty, and probabilistic description of critical events. The basic idea is to replace a system of spatially correlated wells by a set of independent factors that compress the data, with the possibility of back-transformation at a prescribed level of accuracy. Factors can be obtained by principal component analysis, independent source separation, or an artificial neural network with a "bottleneck". The selection of a method depends on the features of the initial data and the process under study. However they are obtained, all factors are time series, and a set of factors contains the main features of the groundwater level patterns. Groundwater level modelling and forecasting is performed by modelling these time series. This work considers three different stochastic approaches for modelling and forecasting of time series with hydrological origins: stochastic process with a deterministic
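The factor-compression step described above can be illustrated with a small sketch (an assumed PCA implementation, not the authors' code; the well data are synthetic):

```python
import numpy as np

def pca_factors(levels, var_explained=0.95):
    """Compress correlated well time series into a few factors via PCA.

    levels: (n_times, n_wells) array of groundwater levels.
    Returns (factors, components, mean) so that
    levels is approximated by factors @ components + mean.
    """
    mean = levels.mean(axis=0)
    centered = levels - mean
    # SVD gives principal components sorted by explained variance
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(frac, var_explained)) + 1  # prescribed accuracy
    factors = u[:, :k] * s[:k]          # factor time series
    return factors, vt[:k], mean

def reconstruct(factors, components, mean):
    """Back-transform factors to well levels."""
    return factors @ components + mean

# toy example: three wells driven by one common seasonal signal plus noise
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
signal = np.sin(t)
levels = np.column_stack([a * signal for a in (1.0, 0.8, 1.2)])
levels += 0.01 * rng.standard_normal(levels.shape)

f, c, m = pca_factors(levels, var_explained=0.95)
approx = reconstruct(f, c, m)
```

Because the three wells share a single seasonal driver, one factor suffices here; forecasting would then operate on that single factor time series instead of on three correlated wells.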
High Performance Geostatistical Modeling of Biospheric Resources
NASA Astrophysics Data System (ADS)
Pedelty, J. A.; Morisette, J. T.; Smith, J. A.; Schnase, J. L.; Crosier, C. S.; Stohlgren, T. J.
2004-12-01
We are using parallel geostatistical codes to study spatial relationships among biospheric resources in several study areas. For example, spatial statistical models based on large- and small-scale variability have been used to predict species richness of both native and exotic plants (hot spots of diversity) and patterns of exotic plant invasion. However, broader use of geostatistics in natural resource modeling, especially at regional and national scales, has been limited due to the large computing requirements of these applications. To address this problem, we implemented parallel versions of the kriging spatial interpolation algorithm. The first uses the Message Passing Interface (MPI) in a master/slave paradigm on an open source Linux Beowulf cluster, while the second is implemented with the new proprietary Xgrid distributed processing system on an Xserve G5 cluster from Apple Computer, Inc. These techniques are proving effective and provide the basis for a national decision support capability for invasive species management that is being jointly developed by NASA and the US Geological Survey.
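A minimal sketch of the master/worker partitioning idea, using a Python thread pool as a stand-in for the MPI master/slave scheme and an assumed exponential covariance model (all data and parameter values are illustrative):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# toy observations at the unit-square corners
obs_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
obs_z = np.array([1.0, 2.0, 3.0, 4.0])

def cov(h, sill=1.0, rng_=2.0):
    # exponential covariance model (an assumed choice)
    return sill * np.exp(-h / rng_)

# ordinary kriging system matrix (shared by all workers)
n = len(obs_z)
d_oo = np.linalg.norm(obs_xy[:, None] - obs_xy[None, :], axis=-1)
A = np.ones((n + 1, n + 1))
A[:n, :n] = cov(d_oo)
A[n, n] = 0.0               # Lagrange-multiplier row/column

def krige_tile(tile):
    """Worker task: ordinary kriging of one tile of prediction points."""
    out = []
    for p in tile:
        d_op = np.linalg.norm(obs_xy - p, axis=-1)
        b = np.append(cov(d_op), 1.0)
        w = np.linalg.solve(A, b)[:n]   # OK weights (sum to 1)
        out.append(float(w @ obs_z))
    return out

# the "master" splits the prediction grid into tiles and farms them out
grid = [np.array([x, y]) for x in (0.25, 0.75) for y in (0.25, 0.75)]
tiles = [grid[:2], grid[2:]]
with ThreadPoolExecutor(max_workers=2) as ex:
    parts = list(ex.map(krige_tile, tiles))
z_hat = [z for part in parts for z in part]
```

The tiling is embarrassingly parallel because each prediction point solves an independent kriging system against the same observations, which is what makes the master/slave decomposition effective.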
Fienen, M.; Hunt, R.; Krabbenhoft, D.; Clemo, T.
2009-01-01
Flow path delineation is a valuable tool for interpreting the subsurface hydrogeochemical environment. Different types of data, such as groundwater flow and transport, inform different aspects of hydrogeologic parameter values (hydraulic conductivity in this case) which, in turn, determine flow paths. This work combines flow and transport information to estimate a unified set of hydrogeologic parameters using the Bayesian geostatistical inverse approach. Parameter flexibility is allowed by using a highly parameterized approach with the level of complexity informed by the data. Despite the effort to adhere to the ideal of minimal a priori structure imposed on the problem, extreme contrasts in parameters can result in the need to censor correlation across hydrostratigraphic bounding surfaces. These partitions segregate parameters into facies associations. With an iterative approach in which partitions are based on inspection of initial estimates, flow path interpretation is progressively refined through the inclusion of more types of data. Head observations, stable oxygen isotope (18O/16O) ratios, and tritium are all used to progressively refine flow path delineation on an isthmus between two lakes in the Trout Lake watershed, northern Wisconsin, United States. Despite allowing significant parameter freedom by estimating many distributed parameter values, a smooth field is obtained. Copyright 2009 by the American Geophysical Union.
Geostatistical modelling of household malaria in Malawi
NASA Astrophysics Data System (ADS)
Chirombo, J.; Lowe, R.; Kazembe, L.
2012-04-01
Malaria is one of the most important diseases in the world today, common in tropical and subtropical areas; sub-Saharan Africa, including Malawi, is the most burdened region. This region has the right combination of biotic and abiotic components, including socioeconomic, climatic and environmental factors, to sustain transmission of the disease. Differences in these conditions across the country consequently lead to spatial variation in disease risk. Analysis of nationwide survey data that takes this spatial variation into account is crucial in a resource-constrained country like Malawi for targeted allocation of scarce resources in the fight against malaria. Previous efforts to map malaria risk in Malawi have been based on limited data collected from small surveys. The Malaria Indicator Survey conducted in 2010 is the most comprehensive malaria survey carried out in Malawi and provides point-referenced data for the study. These data have been shown to be spatially correlated. We use Bayesian logistic regression models with spatial correlation to model the relationship between malaria presence in children and covariates such as household socioeconomic status and meteorological conditions. This spatial model is then used to assess how malaria varies spatially, and a malaria risk map for Malawi is produced. By taking intervention measures into account, the developed model is used to assess whether they have an effect on the spatial distribution of the disease, and Bayesian kriging is used to predict areas where malaria risk is most likely to increase. It is hoped that this study can help reveal areas that require more attention from the authorities in the continuing fight against malaria, particularly in children under the age of five.
Distribution of Modelling Spatial Processes Using Geostatistical Analysis
NASA Astrophysics Data System (ADS)
Grynyshyna-Poliuga, Oksana; Stanislawska, Iwona; Swiatek, Anna
The Geostatistical Analyst uses sample points taken at different locations in a landscape and creates (interpolates) a continuous surface. It provides two groups of interpolation techniques: deterministic and geostatistical. All methods rely on the similarity of nearby sample points to create the surface. Deterministic techniques use mathematical functions for interpolation. Geostatistics relies on both statistical and mathematical methods, which can be used to create surfaces and assess the uncertainty of the predictions. The first step in geostatistical analysis is variography: computing and modelling a semivariogram. A semivariogram is one of the key functions for indicating spatial correlation in observations measured at sample locations. It is commonly represented as a graph showing the semivariance of the measure against the distance between all pairs of sampled locations. Such a graph helps to build a mathematical model that describes the variability of the measure with location. Modelling the relationship among sample locations, to indicate the variability of the measure with separation distance, is called semivariogram modelling; it is applied in estimating the value of a measure at a new location. Our work presents the analysis of the data in the following steps: identification of data-set periods, construction and modelling of the empirical semivariogram for a single location, and use of the kriging mapping function to model TEC maps at mid-latitudes during disturbed and quiet days. Based on the semivariogram, weights for the kriging interpolation are estimated. Additional observations do not, in general, provide relevant extra information for the interpolation, because the spatial correlation is well described by the semivariogram. Keywords: Semivariogram, Kriging, modelling, Geostatistics, TEC
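The empirical semivariogram described above can be computed, for example, with the classical Matheron estimator (a generic sketch on synthetic data, not the software used in the study):

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Classical (Matheron) semivariogram estimator.

    gamma(h) = mean of 0.5 * (z_i - z_j)^2 over pairs whose
    separation distance falls in each bin.
    """
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    i, j = np.triu_indices(len(values), k=1)       # all distinct pairs
    h = np.linalg.norm(coords[i] - coords[j], axis=-1)
    sq = 0.5 * (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        mask = (h >= bin_edges[b]) & (h < bin_edges[b + 1])
        if mask.any():
            gamma[b] = sq[mask].mean()
    return gamma

# toy transect with a smooth trend, so semivariance grows with lag
x = np.arange(20, dtype=float)[:, None]
z = 0.1 * x.ravel()
gamma = empirical_semivariogram(x, z, bin_edges=[0, 2, 5, 10])
```

For the linear toy field, semivariance increases monotonically with lag, which is the behaviour the variogram graph makes visible before a model is fitted to it.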
Kennedy, Paula L; Woodbury, Allan D
2002-01-01
In ground water flow and transport modeling, the heterogeneous nature of porous media has a considerable effect on the resulting flow and solute transport. Some method of generating the heterogeneous field from a limited dataset of uncertain measurements is required. Bayesian updating is one method that interpolates from an uncertain dataset using the statistics of the underlying probability distribution function. In this paper, Bayesian updating was used to determine the heterogeneous natural log transmissivity field for a carbonate and a sandstone aquifer in southern Manitoba. It was determined that the transmissivity in m2/sec followed a natural log normal distribution for both aquifers, with a mean of -7.2 and -8.0 for the carbonate and sandstone aquifers, respectively. The variograms were calculated using an estimator developed by Li and Lake (1994). Fractal nature was not evident in the variogram from either aquifer. The Bayesian-updated heterogeneous field provided good results even in cases where little data were available. A large transmissivity zone in the sandstone aquifer was created by the Bayesian procedure; this is not a reflection of any deterministic consideration, but a natural outcome of updating a prior probability distribution function with observations. The statistical model returns a result that is very reasonable: that is, homogeneous in regions where little or no information is available to alter the initial state. No long-range correlation trends or fractal behavior of the log-transmissivity field were observed in either aquifer over a distance of about 300 km. PMID:12019642
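The Bayesian updating idea, reduced to its simplest conjugate-normal form, can be sketched as follows (an illustrative toy, not the authors' spatial procedure; the observation values and variances are assumed):

```python
import numpy as np

def bayes_update_normal(prior_mean, prior_var, obs, obs_var):
    """Conjugate update of a normal prior on ln(T) with noisy observations.

    Regions with no observations keep the prior (the 'homogeneous' default
    the paper describes); each observation pulls the posterior toward data
    and shrinks the variance.
    """
    obs = np.asarray(obs, float)
    n = len(obs)
    if n == 0:
        return prior_mean, prior_var
    post_prec = 1.0 / prior_var + n / obs_var
    post_var = 1.0 / post_prec
    post_mean = post_var * (prior_mean / prior_var + obs.sum() / obs_var)
    return post_mean, post_var

# prior loosely based on the carbonate aquifer: mean ln T = -7.2 (T in m2/s);
# the prior variance and the three local observations are invented
m0, v0 = -7.2, 1.0
m1, v1 = bayes_update_normal(m0, v0, obs=[-6.5, -6.8, -6.6], obs_var=0.25)
```

The posterior mean lands between the prior mean and the data mean, weighted by their precisions, and with no observations the prior is returned unchanged.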
Gstat: a program for geostatistical modelling, prediction and simulation
NASA Astrophysics Data System (ADS)
Pebesma, Edzer J.; Wesseling, Cees G.
1998-01-01
Gstat is a computer program for variogram modelling, and geostatistical prediction and simulation. It provides a generic implementation of the multivariable linear model with trends modelled as a linear function of coordinate polynomials or of user-defined base functions, and independent or dependent, geostatistically modelled, residuals. Simulation in gstat comprises conditional or unconditional (multi-) Gaussian sequential simulation of point values or block averages, or (multi-) indicator sequential simulation. Besides many of the popular options found in other geostatistical software packages, gstat offers the unique combination of (i) an interactive user interface for modelling variograms and generalized covariances (residual variograms), that uses the device-independent plotting program gnuplot for graphical display, (ii) support for several ASCII and binary data and map file formats for input and output, (iii) a concise, intuitive and flexible command language, (iv) user customization of program defaults, (v) no built-in limits, and (vi) free, portable ANSI-C source code. This paper describes the class of problems gstat can solve, and addresses aspects of efficiency and implementation, managing geostatistical projects, and relevant technical details.
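As a rough illustration of the variogram-modelling task that gstat automates, here is a generic sketch that evaluates a spherical model and picks a sill by grid-search misfit (this is not gstat's fitting algorithm or command language; all numbers are invented):

```python
import numpy as np

def spherical(h, nugget, sill, rng_):
    """Spherical variogram model, one of the standard model forms."""
    h = np.asarray(h, float)
    g = np.where(
        h < rng_,
        nugget + (sill - nugget) * (1.5 * h / rng_ - 0.5 * (h / rng_) ** 3),
        sill,
    )
    return np.where(h == 0, 0.0, g)   # gamma(0) = 0 by definition

def fit_sill(h, gamma_hat, nugget, rng_, sills):
    """Pick the sill minimising squared misfit on a coarse grid
    (a crude stand-in for interactive or weighted-least-squares fitting)."""
    errs = [np.sum((spherical(h, nugget, s, rng_) - gamma_hat) ** 2)
            for s in sills]
    return sills[int(np.argmin(errs))]

# synthetic "empirical" semivariogram values generated from a known model
h = np.array([0.5, 1.0, 2.0, 3.0, 5.0])
gamma_hat = spherical(h, nugget=0.0, sill=2.0, rng_=3.0)
best = fit_sill(h, gamma_hat, nugget=0.0, rng_=3.0,
                sills=np.linspace(0.5, 4.0, 36))
```

Because the synthetic data were generated with sill 2.0, the grid search recovers that value; real fitting would also estimate the nugget and range jointly.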
NASA Astrophysics Data System (ADS)
Li, A.; Glenn, N. F.
2012-12-01
Biomass of vegetation is critical for carbon cycle research. Estimating biomass from field survey data is laborious and/or destructive, and thus retrieving biomass from remote sensing data may be advantageous. Most remote sensing biomass studies have focused on forest ecosystems, while few have focused on low-stature vegetation, such as grasses in semi-arid environments. Biomass estimates for grass are significant for studying wildlife habitat, assessing fuel loads, and studying climate change response in semi-arid regions. Recent research has demonstrated the ability of small-footprint airborne laser scanning (ALS) data to extract sagebrush height characteristics and the ability of terrestrial laser scanning (TLS) data to estimate vegetation volume over semi-arid rangelands. ALS has somewhat lower resolution than TLS, but improved spatial coverage. Combining ALS and TLS is a powerful way to estimate biomass at regional scales. Bayesian geostatistics, also known as Bayesian Maximum Entropy (BME), can fuse multiple data sources across scales and provide estimation uncertainties for the integration of ALS and TLS data for grass biomass. Regression models are used to approximately delineate the relationship between field biomass measurements and TLS-derived height and shape metrics. We then consider TLS plot-level data at the point scale with ALS data at the area scale. The regularization method is utilized to establish the scaling relations between TLS-derived and ALS-derived metrics. The metric maps from the ALS level are reconstructed using a BME method based on regularized variograms. We obtain biomass and estimation uncertainty at the regional scale by introducing updated metrics into the model. In order to evaluate the effectiveness of the BME method, we develop simple independent regression models by taking the TLS-derived metrics as ground reference data. Therefore, the regression model is used to correct the ALS-estimated values and we retrieve
Hanks, Ephraim M.; Schliep, Erin M.; Hooten, Mevin B.; Hoeting, Jennifer A.
2015-01-01
In spatial generalized linear mixed models (SGLMMs), covariates that are spatially smooth are often collinear with spatially smooth random effects. This phenomenon is known as spatial confounding and has been studied primarily in the case where the spatial support of the process being studied is discrete (e.g., areal spatial data). In this case, the most common approach suggested is restricted spatial regression (RSR) in which the spatial random effects are constrained to be orthogonal to the fixed effects. We consider spatial confounding and RSR in the geostatistical (continuous spatial support) setting. We show that RSR provides computational benefits relative to the confounded SGLMM, but that Bayesian credible intervals under RSR can be inappropriately narrow under model misspecification. We propose a posterior predictive approach to alleviating this potential problem and discuss the appropriateness of RSR in a variety of situations. We illustrate RSR and SGLMM approaches through simulation studies and an analysis of malaria frequencies in The Gambia, Africa.
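The orthogonality constraint at the heart of RSR can be written as the projection P = I - X(X'X)^{-1}X' applied to the spatial random effects; a minimal sketch with a toy design matrix (generic illustration, not the authors' code):

```python
import numpy as np

def rsr_projection(X):
    """P = I - X (X'X)^{-1} X': projects onto the orthogonal complement
    of the fixed-effect column space, as used in restricted spatial
    regression to de-confound random effects from covariates."""
    X = np.asarray(X, float)
    return np.eye(len(X)) - X @ np.linalg.solve(X.T @ X, X.T)

# toy design: intercept plus one spatially smooth covariate
n = 50
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])

P = rsr_projection(X)
w = np.sin(2 * np.pi * x)        # a stand-in spatial random effect
w_rsr = P @ w                    # constrained effect, orthogonal to X
```

After projection the random effect carries no component in the span of the fixed effects, which removes the collinearity but, as the abstract cautions, can make credible intervals too narrow under misspecification.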
High resolution sequence stratigraphic concepts applied to geostatistical modeling
Desaubliaux, G.; De Lestang, A.P.; Eschard, R.
1995-08-01
Lithofacies simulations using a high-resolution 3D grid make it possible to capture the geometries of internal reservoir heterogeneities. In this study, the series simulated were the Ness formation, part of the Brent reservoir in the Dunbar field located in the Viking graben of the North Sea. Simulation results were used to build the reservoir layering supporting the 3D grid used for reservoir engineering, and also as a frame to study the effects of secondary diagenetic processes on petrophysical properties. The method is based on a geostatistical study and integrates the following data: a geological model using sequence stratigraphic concepts to define lithofacies sequences and associated bounding surfaces; well data (cores and logs), used as the database for geostatistical analysis and simulations; seismic data (a 3D seismic survey used to define the internal surfaces bounding the units); and outcrop data (the Mesa Verde formation, Colorado, USA, used as an outcrop analog to calibrate geostatistical parameters for the simulations, namely the horizontal range of the variograms). This study illustrates the capacity of high-resolution sequence stratigraphic concepts to improve reservoir simulations when the lack of subsurface information reduces the accuracy of geostatistical analysis.
NASA Astrophysics Data System (ADS)
Hu, L.; Montzka, S. A.; Miller, B.; Andrews, A. E.; Miller, J. B.; Lehman, S.; Sweeney, C.; Miller, S. M.; Thoning, K. W.; Siso, C.; Atlas, E. L.; Blake, D. R.; De Gouw, J. A.; Gilman, J.; Dutton, G. S.; Elkins, J. W.; Hall, B. D.; Chen, H.; Fischer, M. L.; Mountain, M. E.; Nehrkorn, T.; Biraud, S.; Tans, P. P.
2015-12-01
Global atmospheric observations suggest substantial ongoing emissions of carbon tetrachloride (CCl4) despite a 100% phase-out of production for dispersive uses since 1996 in developed countries and 2010 in other countries. Little progress has been made in understanding the causes of these ongoing emissions or identifying their contributing sources. In this study, we employed multiple inverse modeling techniques (i.e., Bayesian and geostatistical inversions) to assimilate CCl4 mole fractions observed from the National Oceanic and Atmospheric Administration (NOAA) flask-air sampling network over the US, and to quantify its national and regional emissions during 2008-2012. Average national total emissions of CCl4 between 2008 and 2012 determined from these observations and an ensemble of inversions range between 2.1 and 6.1 Gg yr-1. This is substantially larger than the mean of 0.06 Gg yr-1 reported to the US EPA Toxics Release Inventory over these years, suggesting that under-reported emissions or non-reporting sources make up the bulk of CCl4 emissions from the US. But while the inventory does not account for the magnitude of the observationally derived CCl4 emissions, the regional distributions of derived and inventory emissions are similar. Furthermore, when considered relative to the distribution of uncapped landfills or population, the variability in measured mole fractions was most consistent with the distribution of industrial sources (i.e., those in the Toxics Release Inventory). Our results suggest that emissions from the US account for only a small fraction of the ongoing global emissions of CCl4 (30-80 Gg yr-1 over this period). Finally, to ascertain the importance of US emissions relative to the unaccounted-for global emission rate, we considered multiple approaches to extrapolate our results to other countries and the globe.
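A Bayesian inversion of the kind ensembled in such studies can be sketched, in its linear-Gaussian textbook form, as follows (the footprint matrix and all numbers are invented for illustration; real inversions use transport-model footprints and far larger state vectors):

```python
import numpy as np

def bayesian_inversion(H, y, x_prior, S_prior, S_obs):
    """Analytical posterior for the linear Gaussian inverse problem
    y = H x + e: x_post = x_prior + K (y - H x_prior),
    with Kalman-style gain K."""
    K = S_prior @ H.T @ np.linalg.inv(H @ S_prior @ H.T + S_obs)
    x_post = x_prior + K @ (y - H @ x_prior)
    S_post = S_prior - K @ H @ S_prior
    return x_post, S_post

# toy setup: two source regions observed at three sites
H = np.array([[1.0, 0.2],
              [0.3, 1.0],
              [0.5, 0.5]])          # assumed "footprints"
x_true = np.array([2.0, 4.0])       # "true" emissions, Gg/yr
y = H @ x_true                      # noise-free synthetic observations
x_prior = np.array([1.0, 1.0])

x_post, S_post = bayesian_inversion(H, y, x_prior,
                                    S_prior=np.eye(2) * 4.0,
                                    S_obs=np.eye(3) * 0.01)
```

With informative observations the posterior pulls the prior to the true emissions and shrinks the prior variance, which is the mechanism by which the network constrains national totals.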
The extension of geostatistical spatial analysis model and its application to datum land appraisal
NASA Astrophysics Data System (ADS)
Fu, Feihong; Li, Xuefei; Zou, Rong
2007-06-01
Geostatistical methods can quantitatively describe the spatial distribution characteristics of a variable and, through a range of theoretical models, quantify attribute uncertainty where data are lacking. As a relatively young discipline, however, geostatistics still leaves room for extension. The extension of ordinary geostatistics proposed here covers three main aspects: the treatment of outliers in geostatistical spatial data, fitting the variogram, and selecting the kriging estimation neighborhood. The paper also introduces the basic approach of applying a geostatistical spatial analysis model to datum land price appraisal, based on an analysis of its feasibility.
Fractal and geostatistical methods for modeling of a fracture network
Chiles, J.P.
1988-08-01
The modeling of fracture networks is useful for fluid flow and rock mechanics studies. About 6600 fracture traces were recorded on drifts of a uranium mine in a granite massif. The trace lengths range from 0.20 to 20 m. The network was studied by fractal and by geostatistical methods, but it can be considered neither a fractal with a constant dimension nor a set of purely randomly located fractures. Two kinds of generalization of conventional models can provide more flexibility for the characterization of the network: (a) a nonscaling fractal model with variable similarity dimension (for a 2-D network of traces, the dimension varying from 2 at the 10-m scale to 1 at the centimeter scale); (b) a parent-daughter model with a regionalized density. The geostatistical study allows a 3-D model to be established in which fractures are assumed to be discs, fractures are grouped in clusters or swarms, and fracture density is regionalized (with two ranges, at about 30 and 300 m). The fractal model is easy to fit and to simulate along a line, but 2-D and 3-D simulations are more difficult. The geostatistical model is more complex, but easy to simulate, even in 3-D.
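The parent-daughter idea in model (b) can be sketched as a clustered point process for fracture centres (a simplified Neyman-Scott stand-in for the regionalized-density model; the density and cluster parameters are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

def parent_daughter_centers(density, mean_daughters, cluster_sd, extent):
    """Clustered fracture centres in 2-D: Poisson parents, each spawning a
    Poisson number of daughters scattered around it. A full model would
    then attach disc radii and orientations to each centre."""
    area = extent ** 2
    n_parents = rng.poisson(density * area)
    parents = rng.uniform(0, extent, size=(n_parents, 2))
    centers = []
    for p in parents:
        n_d = rng.poisson(mean_daughters)
        centers.append(p + cluster_sd * rng.standard_normal((n_d, 2)))
    return np.vstack(centers) if centers else np.empty((0, 2))

# a 300 m x 300 m window, echoing the larger range quoted in the abstract
centers = parent_daughter_centers(density=0.005, mean_daughters=10,
                                  cluster_sd=5.0, extent=300.0)
```

The resulting point pattern is clustered rather than purely random, which is exactly the deviation from a homogeneous Poisson model that the mine data exhibited.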
Stochastic Local Interaction (SLI) model: Bridging machine learning and geostatistics
NASA Astrophysics Data System (ADS)
Hristopulos, Dionissios T.
2015-12-01
Machine learning and geostatistics are powerful mathematical frameworks for modeling spatial data. Both approaches, however, suffer from poor scaling of the required computational resources for large data applications. We present the Stochastic Local Interaction (SLI) model, which employs a local representation to improve computational efficiency. SLI combines geostatistics and machine learning with ideas from statistical physics and computational geometry. It is based on a joint probability density function defined by an energy functional which involves local interactions implemented by means of kernel functions with adaptive local kernel bandwidths. SLI is expressed in terms of an explicit, typically sparse, precision (inverse covariance) matrix. This representation leads to a semi-analytical expression for interpolation (prediction), which is valid in any number of dimensions and avoids the computationally costly covariance matrix inversion.
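One simplified reading of the SLI construction, with a graph-Laplacian-style precision matrix built from locally supported kernels and prediction via the conditional mean, might look like this (every modelling choice below is an assumption for illustration, not the published SLI specification):

```python
import numpy as np

def sli_precision(coords, bandwidths):
    """Precision matrix J = D - W from local kernel interactions, with
    Gaussian weights cut off beyond each site's local bandwidth
    (the cutoff is what makes J typically sparse)."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    W = np.exp(-(d / bandwidths[None, :]) ** 2)
    W[d > bandwidths[None, :]] = 0.0       # local support -> sparsity
    np.fill_diagonal(W, 0.0)
    J = np.diag(W.sum(axis=1)) - W
    return J + 1e-6 * np.eye(len(coords))  # small ridge for invertibility

def sli_predict(J, sampled_idx, unsampled_idx, z_sampled):
    """Semi-analytical interpolation: the conditional mean solves
    J_uu x_u = -J_us z_s, with no covariance matrix inversion."""
    Juu = J[np.ix_(unsampled_idx, unsampled_idx)]
    Jus = J[np.ix_(unsampled_idx, sampled_idx)]
    return np.linalg.solve(Juu, -Jus @ z_sampled)

# four sites on a line; predict the third from its sampled neighbours
coords = np.array([[0.0], [1.0], [2.0], [3.0]])
J = sli_precision(coords, bandwidths=np.full(4, 1.5))
z_hat = sli_predict(J, sampled_idx=[0, 1, 3], unsampled_idx=[2],
                    z_sampled=np.array([0.0, 1.0, 3.0]))
```

Because only the (small, sparse) J_uu block is solved, the prediction cost scales with local neighbourhood size rather than with the full covariance matrix.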
Chen, Xingyuan; Murakami, Haruko; Hahn, Melanie S.; Hammond, Glenn E.; Rockhold, Mark L.; Zachara, John M.; Rubin, Yoram
2012-06-01
Tracer testing under natural or forced gradient flow holds the potential to provide useful information for characterizing subsurface properties, through monitoring, modeling and interpretation of the tracer plume migration in an aquifer. Non-reactive tracer experiments were conducted at the Hanford 300 Area, along with constant-rate injection tests and electromagnetic borehole flowmeter (EBF) profiling. A Bayesian data assimilation technique, the method of anchored distributions (MAD) [Rubin et al., 2010], was applied to assimilate the experimental tracer test data with the other types of data and to infer the three-dimensional heterogeneous structure of the hydraulic conductivity in the saturated zone of the Hanford formation. In this study, the Bayesian prior information on the underlying random hydraulic conductivity field was obtained from previous field characterization efforts using the constant-rate injection tests and the EBF data. The posterior distribution of the conductivity field was obtained by further conditioning the field on the temporal moments of tracer breakthrough curves at various observation wells. MAD was implemented with the massively-parallel three-dimensional flow and transport code PFLOTRAN to cope with the highly transient flow boundary conditions at the site and to meet the computational demands of MAD. A synthetic study proved that the proposed method could effectively invert tracer test data to capture the essential spatial heterogeneity of the three-dimensional hydraulic conductivity field. Application of MAD to actual field data shows that the hydrogeological model, when conditioned on the tracer test data, can reproduce the tracer transport behavior better than the field characterized without the tracer test data. This study successfully demonstrates that MAD can sequentially assimilate multi-scale multi-type field data through a consistent Bayesian framework.
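The temporal moments used to condition the conductivity field can be computed directly from a breakthrough curve; a generic sketch on a synthetic curve (uniform time sampling assumed, simple Riemann-sum quadrature):

```python
import numpy as np

def temporal_moments(t, c, n_moments=2):
    """Zeroth moment m0 = integral of c(t) dt, and normalised moments
    m_k = integral of t^k c(t) dt / m0 for k >= 1. Conditioning on a few
    such moments compresses each breakthrough curve into scalars."""
    dt = t[1] - t[0]                 # uniform sampling assumed
    m0 = np.sum(c) * dt
    moments = [m0]
    for k in range(1, n_moments + 1):
        moments.append(np.sum(t ** k * c) * dt / m0)
    return moments

# toy Gaussian-shaped breakthrough centred at t = 5 with unit spread
t = np.linspace(0.0, 20.0, 2001)
c = np.exp(-0.5 * ((t - 5.0) / 1.0) ** 2)
m0, m1, m2 = temporal_moments(t, c)
```

Here m1 recovers the mean arrival time (5) and m2 - m1**2 the spread (1), the two quantities that most directly constrain advection and dispersion.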
Examples of improved reservoir modeling through geostatistical data integration
Bashore, W.M.; Araktingi, U.G.
1994-12-31
Results from four case studies are presented to demonstrate improvements in reservoir modeling and subsequent flow predictions through various uses of geostatistical integration methods. Specifically, these cases highlight improvements gained from (1) better understanding of reservoir geometries through 3D visualization, (2) forward modeling to assess the value of new data prior to acquisition and integration, (3) assessment of reduced uncertainty in porosity prediction through integration of seismic acoustic impedance, and (4) integration of crosswell tomographic and reflection data. The intent of each of these examples is to quantify the added value of geological and geophysical data integration in engineering terms, such as fluid-flow results and reservoir property predictions.
Slater, Hannah; Michael, Edwin
2013-01-01
There is increasing interest in controlling or eradicating the major neglected tropical diseases. Accurate modelling of the geographic distributions of parasitic infections will be crucial to this endeavour. We used 664 community-level infection prevalence data points collated from the published literature, in conjunction with eight environmental variables, altitude and population density, and a multivariate Bayesian generalized linear spatial model that allows explicit accounting for spatial autocorrelation and incorporation of uncertainty in input data and model parameters, to construct the first spatially explicit map describing the prevalence distribution of lymphatic filariasis (LF) in Africa. We also ran the best-fit model against predictions made by the HADCM3 and CCCMA climate models for 2050 to predict the likely distributions of LF under future climate and population changes. We show that LF prevalence is strongly influenced by spatial autocorrelation between locations but is only weakly associated with environmental covariates. Infection prevalence, however, is found to be related to variations in population density. All associations with key environmental/demographic variables appear to be complex and non-linear. LF prevalence is predicted to be highly heterogeneous across Africa, with high prevalences (>20%) estimated to occur primarily along coastal West and East Africa, and the lowest prevalences predicted for the central part of the continent. Error maps, however, indicate a need for further surveys to overcome problems with data scarcity in the latter and other regions. Analysis of future changes in prevalence indicates that population growth, rather than climate change per se, will represent the dominant factor in the predicted increase/decrease and spread of LF on the continent. We indicate that these results could play an important role in aiding the development of strategies that are best able to achieve the goals of parasite elimination locally and globally in a manner that may also account
Modeling fine-scale soil surface structure using geostatistics
NASA Astrophysics Data System (ADS)
Croft, H.; Anderson, K.; Brazier, R. E.; Kuhn, N. J.
2013-04-01
There is widespread recognition that spatially distributed information on soil surface roughness (SSR) is required for hydrological and geomorphological applications. Such information is necessary to describe variability in soil structure, which is highly heterogeneous in time and space, to parameterize hydrology and erosion models, and to understand the temporal evolution of the soil surface in response to rainfall. This paper demonstrates how results from semivariogram analysis can quantify key elements of SSR for such applications. Three soil types (silt, silt loam, and silty clay) were used to show how different types of structural variance in SSR evolve during simulated rainfall events. All three soil types were progressively degraded using artificial rainfall to produce a series of roughness states. A calibrated laser profiling instrument was used to measure SSR over a 10 cm × 10 cm spatial extent, at a 2 mm resolution. These data were geostatistically analyzed in the context of aggregate breakdown and soil crusting. The results show that such processes are represented by a quantifiable decrease in sill variance, from 7.81 (control) to 0.94 (after 60 min of rainfall). Soil surface features such as soil cracks, tillage lines and erosional areas were quantified by local maxima in semivariance at a given length scale. This research demonstrates that semivariogram analysis can retrieve spatiotemporal variations in soil surface condition, providing information on hydrological pathways. Consequently, geostatistically derived SSR shows strong potential for inclusion as spatial information in hydrology and erosion models to represent complex surface processes at different soil structural scales.
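The sill-variance decrease described above can be illustrated with a minimal empirical semivariogram. This is a sketch on synthetic 1-D height transects, not the authors' 2 mm laser-profile workflow; the profiles, variances, and lag range are illustrative assumptions only:

```python
import numpy as np

def empirical_semivariogram(z, max_lag):
    """Empirical semivariogram of a 1-D transect of surface heights:
    gamma(h) = 0.5 * mean((z[i+h] - z[i])**2) over all pairs at lag h."""
    gammas = []
    for h in range(1, max_lag + 1):
        diffs = z[h:] - z[:-h]
        gammas.append(0.5 * np.mean(diffs ** 2))
    return np.array(gammas)

# Synthetic "rough" vs rainfall-degraded profiles: degradation should
# appear as a lower sill (the plateau of gamma at large lags).
rng = np.random.default_rng(0)
rough = rng.normal(0.0, 2.0, size=2000)    # control surface, high sill
smooth = rng.normal(0.0, 0.5, size=2000)   # degraded surface, low sill
g_rough = empirical_semivariogram(rough, 20)
g_smooth = empirical_semivariogram(smooth, 20)
```

For uncorrelated heights the semivariogram plateaus near the height variance, so the degraded profile's sill sits well below the control's, mirroring the 7.81 to 0.94 drop reported above.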
Model-Based Geostatistical Mapping of the Prevalence of Onchocerca volvulus in West Africa
O’Hanlon, Simon J.; Slater, Hannah C.; Cheke, Robert A.; Boatin, Boakye A.; Coffeng, Luc E.; Pion, Sébastien D. S.; Boussinesq, Michel; Zouré, Honorat G. M.; Stolk, Wilma A.; Basáñez, María-Gloria
2016-01-01
Background The initial endemicity (pre-control prevalence) of onchocerciasis has been shown to be an important determinant of the feasibility of elimination by mass ivermectin distribution. We present the first geostatistical map of microfilarial prevalence in the former Onchocerciasis Control Programme in West Africa (OCP) before commencement of antivectorial and antiparasitic interventions. Methods and Findings Pre-control microfilarial prevalence data from 737 villages across the 11 constituent countries in the OCP epidemiological database were used as ground-truth data. These 737 data points, plus a set of statistically selected environmental covariates, were used in a Bayesian model-based geostatistical (B-MBG) approach to generate a continuous surface (at pixel resolution of 5 km × 5 km) of microfilarial prevalence in West Africa prior to the commencement of the OCP. Uncertainty in model predictions was measured using a suite of validation statistics, performed on bootstrap samples of held-out validation data. The mean Pearson’s correlation between observed and estimated prevalence at validation locations was 0.693; the mean prediction error (average difference between observed and estimated values) was 0.77%, and the mean absolute prediction error (average magnitude of difference between observed and estimated values) was 12.2%. Within OCP boundaries, 17.8 million people were deemed to have been at risk, 7.55 million to have been infected, and mean microfilarial prevalence to have been 45% (range: 2–90%) in 1975. Conclusions and Significance This is the first map of initial onchocerciasis prevalence in West Africa using B-MBG. Important environmental predictors of infection prevalence were identified and used in a model out-performing those without spatial random effects or environmental covariates. Results may be compared with recent epidemiological mapping efforts to find areas of persisting transmission. These methods may be extended to areas where
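The validation statistics quoted above (Pearson’s correlation, mean prediction error, mean absolute prediction error, bootstrapped over held-out locations) can be sketched as follows. The data and the 200-replicate bootstrap are placeholders, not the study’s actual validation design:

```python
import numpy as np

def validation_stats(obs, pred):
    """Pearson correlation, mean error (bias), and mean absolute error
    between observed and model-estimated prevalence (in %)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    r = np.corrcoef(obs, pred)[0, 1]
    me = np.mean(pred - obs)           # mean prediction error (bias)
    mae = np.mean(np.abs(pred - obs))  # mean absolute prediction error
    return r, me, mae

def bootstrap_stats(obs, pred, n_boot=200, seed=0):
    """Bootstrap the held-out set to attach uncertainty to each statistic."""
    rng = np.random.default_rng(seed)
    n = len(obs)
    out = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)    # resample validation locations
        out.append(validation_stats(obs[idx], pred[idx]))
    return np.array(out)               # columns: r, ME, MAE

# Toy demo: predictions with a constant +1% bias.
obs = np.array([10.0, 20.0, 30.0, 40.0])
pred = obs + 1.0
r, me, mae = validation_stats(obs, pred)
boot = bootstrap_stats(obs, pred)
```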
NASA Astrophysics Data System (ADS)
Riva, Monica; Panzeri, Marco; Guadagnini, Alberto; Neuman, Shlomo P.
2011-07-01
We analyze theoretically the ability of model quality (sometimes termed information or discrimination) criteria such as the negative log likelihood NLL, Bayesian criteria BIC and KIC, and information theoretic criteria AIC, AICc, and HIC to estimate (1) the parameter vector of the variogram of hydraulic log conductivity (Y = ln K), and (2) statistical parameters proportional to the head and log conductivity measurement error variances, respectively, in the context of geostatistical groundwater flow inversion. Our analysis extends the work of Hernandez et al. (2003, 2006) and Riva et al. (2009), who developed nonlinear stochastic inverse algorithms that allow conditioning estimates of steady state and transient hydraulic heads, fluxes and their associated uncertainty on information about conductivity and head data collected in a randomly heterogeneous confined aquifer. Their algorithms are based on recursive numerical approximations of exact nonlocal conditional equations describing the mean and (co)variance of groundwater flow. Log conductivity is parameterized geostatistically based on measured values at discrete locations and unknown values at discrete "pilot points." Optionally, the maximum likelihood function on which the inverse estimation of Y at pilot points is based may include a regularization term reflecting prior information about Y. The relative weight assigned to this term, together with the statistical error-variance parameters, is evaluated separately from other model parameters to avoid bias and instability. This evaluation is done on the basis of criteria such as NLL, KIC, BIC, HIC, AIC, and AICc. We demonstrate theoretically that, whereas all six criteria make it possible to estimate the variogram parameters, KIC alone allows one to validly estimate the error-variance parameters (and thus the regularization weight). We illustrate this discriminatory power of KIC numerically by using a differential evolution genetic search algorithm to minimize it in the context of a two-dimensional steady state groundwater flow
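For the likelihood-based criteria named above, AIC, AICc, and BIC follow directly from the minimized negative log likelihood; a minimal sketch is below (KIC additionally requires a Fisher-information term and is omitted, and the demo numbers are arbitrary):

```python
import math

def information_criteria(nll, k, n):
    """AIC, AICc, and BIC from a minimized negative log likelihood.

    nll : minimized negative log likelihood
    k   : number of estimated parameters
    n   : number of observations
    """
    aic = 2.0 * nll + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = 2.0 * nll + k * math.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

# Demo with arbitrary values: 3 parameters, 100 observations.
ic = information_criteria(nll=10.0, k=3, n=100)
```

Lower values indicate a better trade-off between fit and complexity; the criteria differ only in how strongly they penalize additional parameters.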
NASA Astrophysics Data System (ADS)
Illman, Walter A.; Berg, Steven J.; Zhao, Zhanfeng
2015-05-01
The robust performance of hydraulic tomography (HT) based on geostatistics has been demonstrated through numerous synthetic, laboratory, and field studies. While geostatistical inverse methods offer many advantages, one key disadvantage is their highly parameterized nature, which renders them computationally intensive for large-scale problems. Another issue is that geostatistics-based HT may produce overly smooth images of subsurface heterogeneity when there are few monitoring interval data. Therefore, some may question the utility of the geostatistical inversion approach in certain situations and seek alternative approaches. To investigate these issues, we simultaneously calibrated different groundwater models with varying subsurface conceptualizations and parameter resolutions using a laboratory sandbox aquifer. The compared models included: (1) isotropic and anisotropic effective parameter models; (2) a heterogeneous model that faithfully represents the geological features; and (3) a heterogeneous model based on geostatistical inverse modeling. The performance of these models was assessed by quantitatively examining the results from model calibration and validation. Calibration data consisted of steady state drawdown data from eight pumping tests, and validation data consisted of data from 16 separate pumping tests not used in the calibration effort. Results revealed that the geostatistical inversion approach performed the best among the approaches compared, although the geological model that faithfully represented stratigraphy came a close second. In addition, when the number of pumping tests available for inverse modeling was small, the geological modeling approach yielded more robust validation results. This suggests that better knowledge of stratigraphy obtained via geophysics or other means may contribute to improved results for HT.
Bayesian Model Averaging for Propensity Score Analysis
ERIC Educational Resources Information Center
Kaplan, David; Chen, Jianshen
2013-01-01
The purpose of this study is to explore Bayesian model averaging in the propensity score context. Previous research on Bayesian propensity score analysis does not take model uncertainty into account. In this regard, an internally consistent Bayesian framework for model building and estimation must also account for model uncertainty. The…
Bayesian stable isotope mixing models
In this paper we review recent advances in Stable Isotope Mixing Models (SIMMs) and place them into an over-arching Bayesian statistical framework which allows for several useful extensions. SIMMs are used to quantify the proportional contributions of various sources to a mixtur...
Geostatistical Inverse Modeling for Natural Attenuation of Hydrocarbons in Groundwater
NASA Astrophysics Data System (ADS)
Hosseini, A. H.; Deutsch, C. V.; Mendoza, C. A.; Biggar, K. W.
2008-12-01
Parameter uncertainty for natural attenuation has previously been studied in the context of characterizing the uncertainty in the field-measured biodegradation rate constant. Natural attenuation response variables (e.g. solute concentrations) should be stated in terms of a number of model parameters in such a way that (1) the most important mechanisms contributing to natural attenuation of petroleum hydrocarbons are simulated, (2) the independent variables (model parameters) and their uncertainty can be estimated using the available observations and prior information, and (3) the model is not over-parameterized. Extensive sensitivity analyses show that the source term, aquifer heterogeneity and biodegradation rate of contaminants are the most important factors affecting the fate of dissolved petroleum hydrocarbon (PHC) contaminants in groundwater. A geostatistical inverse modeling approach is developed to quantify uncertainty in source geometry, source dissolution rate, aquifer heterogeneity and biodegradation rate constant. Multiple joint realizations of source geometry and aquifer transmissivity are constructed by a distance function (DF) algorithm and a sequential self-calibration (SSC) approach. A gradient-based optimization approach is then adapted to condition the joint realizations to a number of observed concentrations recorded over a specific monitoring period. The conditioned joint realizations are then ranked based on their goodness of fit and used in the subsequent prediction of uncertainty in the response variables such as downstream concentrations, plume length and contaminant mass loaded into the aquifer. The inverse modeling approach and its associated calculation of sensitivity coefficients show that an extended monitoring period is important for the well-posedness of the problem, and that uncertainty in the occurrence of the spill has only a minor impact on the modeling results as long as the observation data are collected while the contaminant
Bayesian kinematic earthquake source models
NASA Astrophysics Data System (ADS)
Minson, S. E.; Simons, M.; Beck, J. L.; Genrich, J. F.; Galetzka, J. E.; Chowdhury, F.; Owen, S. E.; Webb, F.; Comte, D.; Glass, B.; Leiva, C.; Ortega, F. H.
2009-12-01
Most coseismic, postseismic, and interseismic slip models are based on highly regularized optimizations that yield a single solution satisfying the data under a particular set of regularizing constraints. This regularization hampers our ability to answer basic questions such as whether seismic and aseismic slip overlap or instead rupture separate portions of the fault zone. We present a Bayesian methodology for generating kinematic earthquake source models with a focus on large subduction zone earthquakes. Unlike classical optimization approaches, Bayesian techniques sample the ensemble of all acceptable models presented as an a posteriori probability density function (PDF), and thus we can explore the entire solution space to determine, for example, which model parameters are well determined and which are not, or what is the likelihood that two slip distributions overlap in space. Bayesian sampling also has the advantage that all a priori knowledge of the source process can be used to mold the a posteriori ensemble of models. Although very powerful, Bayesian methods have up to now been of limited use in geophysical modeling because they are only computationally feasible for problems with a small number of free parameters due to what is called the "curse of dimensionality." However, our methodology can successfully sample solution spaces of many hundreds of parameters, which is sufficient to produce finite fault kinematic earthquake models. Our algorithm is a modification of the tempered Markov chain Monte Carlo (tempered MCMC or TMCMC) method. In our algorithm, we sample a "tempered" a posteriori PDF using many MCMC simulations running in parallel and evolutionary computation in which models which fit the data poorly are preferentially eliminated in favor of models which better predict the data. We present results for both synthetic test problems as well as for the 2007 Mw 7.8 Tocopilla, Chile earthquake, the latter of which is constrained by InSAR, local high
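The tempering idea described above, raising the posterior to a power that grows from 0 to 1 while poorly fitting samples are preferentially eliminated, can be sketched on a toy two-parameter target. The target, temperature ladder, and proposal scale are all illustrative, not the authors' finite-fault setup:

```python
import numpy as np

rng = np.random.default_rng(42)

def log_post(theta):
    """Toy 2-parameter 'slip model' posterior: standard normal."""
    return -0.5 * np.sum(theta ** 2, axis=-1)

n = 500
samples = rng.normal(0.0, 5.0, size=(n, 2))  # diffuse prior draws

# Transitional/tempered MCMC: move beta from 0 to 1 in stages, each time
# importance-resampling so poor-fitting samples die out, then applying
# one Metropolis move per sample at the new temperature.
for beta0, beta1 in [(0.0, 0.1), (0.1, 0.4), (0.4, 1.0)]:
    logw = (beta1 - beta0) * log_post(samples)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    samples = samples[rng.choice(n, size=n, p=w)]        # resample
    prop = samples + rng.normal(0.0, 0.5, size=samples.shape)
    accept = np.log(rng.random(n)) < beta1 * (log_post(prop) - log_post(samples))
    samples[accept] = prop[accept]                       # Metropolis move
```

After the final stage the ensemble approximates the untempered posterior; in the real method many more stages and MCMC moves per stage would be used.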
Geostatistical Modeling of Evolving Landscapes by Means of Image Quilting
NASA Astrophysics Data System (ADS)
Mendes, J. H.; Caers, J.; Scheidt, C.
2015-12-01
Realistic geological representation of subsurface heterogeneity remains an important outstanding challenge. While many geostatistical methods exist for representing sedimentary systems, such as multiple-point geostatistics, rule-based methods or Boolean methods, the question of what the prior uncertainty on parameters (or training images) of such algorithms is remains outstanding. In this initial work, we investigate the use of flume experiments to better constrain such prior uncertainty and to start understanding what information should be provided to geostatistical algorithms. In particular, we study the use of image quilting as a novel multiple-point method for generating fast geostatistical realizations once a training image is provided. Image quilting is a method emanating from computer graphics where patterns are extracted from training images and then stochastically quilted along a raster path to create stochastic variations of the training image. In this initial study, we use a flume experiment and extract 10 training images as representative of the variability of the evolving landscape over a period of 136 minutes. The training images consist of wet/dry regions obtained from overhead shots taken over the flume experiment. To investigate whether such image quilting reproduces the same variability of the evolving landscape in terms of wet/dry regions, we generate multiple realizations with all 10 training images and compare that variability with the variability seen in the entire flume experiment. By proper tuning of the quilting parameters we find generally reasonable agreement with the flume experiment.
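A heavily simplified version of image quilting along a raster path might look like the following. It selects candidate patches by overlap mismatch but omits the minimum-error boundary cut of the full algorithm, and the binary wet/dry training image, patch size, and candidate count are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def quilt(training, out_shape, patch=8, overlap=2, n_candidates=20):
    """Very simplified image quilting: stochastically place patches from a
    training image along a raster path, choosing each patch among random
    candidates by smallest squared mismatch on the overlap with what has
    already been laid down (no minimum-error boundary cut)."""
    H, W = out_shape
    th, tw = training.shape
    out = np.zeros(out_shape)
    step = patch - overlap
    for i in range(0, H - patch + 1, step):
        for j in range(0, W - patch + 1, step):
            best, best_err = None, np.inf
            for _ in range(n_candidates):
                r = rng.integers(0, th - patch + 1)
                c = rng.integers(0, tw - patch + 1)
                cand = training[r:r + patch, c:c + patch]
                err = 0.0  # mismatch over the already-written overlaps
                if j > 0:
                    err += np.sum((cand[:, :overlap] - out[i:i + patch, j:j + overlap]) ** 2)
                if i > 0:
                    err += np.sum((cand[:overlap, :] - out[i:i + overlap, j:j + patch]) ** 2)
                if err < best_err:
                    best, best_err = cand, err
            out[i:i + patch, j:j + patch] = best
    return out

# Binary wet/dry training image: left half wet (1), right half dry (0).
training = np.zeros((32, 32))
training[:, :16] = 1.0
real = quilt(training, (32, 32))
```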
NASA Astrophysics Data System (ADS)
Reyes, J.; Vizuete, W.; Serre, M. L.; Xu, Y.
2015-12-01
The EPA employs a vast monitoring network to measure ambient PM2.5 concentrations across the United States, with one of its goals being to quantify exposure within the population. However, there are several areas of the country with sparse monitoring spatially and temporally. One means to fill in these monitoring gaps is to use PM2.5 modeled estimates from Chemical Transport Models (CTMs), specifically the Community Multi-scale Air Quality (CMAQ) model. CMAQ is able to provide complete spatial coverage but is subject to systematic and random error due to model uncertainty. Due to the deterministic nature of CMAQ, often these uncertainties are not quantified. Much effort is employed to quantify the efficacy of these models through different metrics of model performance. Currently, evaluation is specific to only locations with observed data. Multiyear studies across the United States are challenging because the error and model performance of CMAQ are not uniform over such large space/time domains. Error changes regionally and temporally. Because of the complex mix of species that constitute PM2.5, CMAQ error is also a function of increasing PM2.5 concentration. To address this issue we introduce a model performance evaluation for PM2.5 CMAQ that is regionalized and non-linear. This model performance evaluation leads to error quantification for each CMAQ grid cell, allowing areas and time periods of error to be better characterized. The regionalized error correction approach is non-linear and is therefore more flexible at characterizing model performance than approaches that rely on linearity assumptions and assume homoscedasticity of CMAQ prediction errors. Corrected CMAQ data are then incorporated into the modern geostatistical framework of Bayesian Maximum Entropy (BME). Through cross validation it is shown that incorporating error-corrected CMAQ data leads to more accurate estimates than just using observed data by themselves.
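One hedged reading of a non-linear, concentration-dependent error correction is a bin-wise error model learned at monitor locations and applied grid-wide. The sketch below uses quantile bins of the modeled concentration as a stand-in for the paper's actual regionalized method; the data are synthetic:

```python
import numpy as np

def binned_error_correction(cmaq, obs_cmaq, obs, n_bins=5):
    """Non-linear error correction for modeled PM2.5: within quantile bins
    of the modeled concentration, learn the mean observed-minus-modeled
    error at collocated monitors, then apply the bin-wise correction to
    every grid value."""
    edges = np.quantile(obs_cmaq, np.linspace(0.0, 1.0, n_bins + 1))
    edges[-1] += 1e-9  # make the top edge inclusive
    corr = np.zeros(n_bins)
    for b in range(n_bins):
        m = (obs_cmaq >= edges[b]) & (obs_cmaq < edges[b + 1])
        corr[b] = np.mean(obs[m] - obs_cmaq[m]) if m.any() else 0.0
    idx = np.clip(np.searchsorted(edges, cmaq, side="right") - 1, 0, n_bins - 1)
    return cmaq + corr[idx]

# Demo: monitors see a constant +2 bias relative to the model, so the
# correction should shift every grid value up by 2.
obs_cmaq = np.linspace(0.0, 10.0, 50)   # modeled values at monitor sites
obs = obs_cmaq + 2.0                    # monitor observations
corrected = binned_error_correction(np.array([1.0, 5.0, 9.0]), obs_cmaq, obs)
```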
Use of geostatistical modeling to capture complex geology in finite-element analyses
Rautman, C.A.; Longenbaugh, R.S.; Ryder, E.E.
1995-12-01
This paper summarizes a number of transient thermal analyses performed for a representative two-dimensional cross section of volcanic tuffs at Yucca Mountain using the finite element, nonlinear heat-conduction code COYOTE-II. In addition to conventional design analyses, in which material properties are formulated as a single uniform material and as horizontally layered, internally uniform layers, an attempt was made to increase the resemblance of the thermal property field to the actual geology by creating two fairly complex, geologically realistic models. The first model was created by digitizing an existing two-dimensional geologic cross section of Yucca Mountain. The second model was created using conditional geostatistical simulation. Direct mapping of geostatistically generated material property fields onto finite element computational meshes was demonstrated to yield temperature fields approximately equivalent to those generated through more conventional procedures. However, the ability to use the geostatistical models offers a means of simplifying the physical-process analyses.
Giardina, Federica; Franke, Jonas; Vounatsou, Penelope
2015-11-26
The study of malaria spatial epidemiology has benefited from recent advances in geographic information systems and geostatistical modelling. Significant progress in earth observation technologies has led to the development of moderate, high and very high resolution imagery. Extensive literature exists on the relationship between malaria and environmental/climatic factors in different geographical areas, but few studies have linked human malaria parasitemia survey data with remote sensing-derived land cover/land use variables, and very few have used Earth Observation products. Comparison among the different resolution products to model parasitemia has not yet been investigated. In this study, we propose a proximity measure to incorporate different land cover classes and assess the effect of the spatial resolution of remotely sensed land cover and elevation on malaria risk estimation in Mozambique after adjusting for other environmental factors at a fixed spatial resolution. We used data from the Demographic and Health survey carried out in 2011, which collected malaria parasitemia data on children from 0 to 5 years old, analysing them with a Bayesian geostatistical model. We compared the risk predicted using land cover and elevation at moderate resolution with the risk obtained employing the same variables at high resolution. We used elevation data at moderate and high resolution and the land cover layer from the Moderate Resolution Imaging Spectroradiometer as well as the one produced by MALAREO, a project covering part of Mozambique during 2010-2012 that was funded by the European Union's 7th Framework Program. Moreover, the number of infected children was predicted at different spatial resolutions using AFRIPOP population data and the enhanced population data generated by the MALAREO project for comparison of estimates. The Bayesian geostatistical model showed that the main determinants of malaria presence are precipitation and day temperature. However, the presence
Geostatistical modeling of a heterogeneous site bordering the Venice lagoon, Italy.
Trevisani, Sebastiano; Fabbri, Paolo
2010-01-01
Geostatistical methods are well suited for analyzing the local and spatial uncertainties that accompany the modeling of highly heterogeneous three-dimensional (3D) geological architectures. The spatial modeling of 3D hydrogeological architectures is crucial for polluted site characterization, in regards to both groundwater modeling and planning remediation procedures. From this perspective, the polluted site of Porto Marghera, located on the periphery of the Venice lagoon, represents an interesting example. For this site, the available dense spatial sampling network, with 769 boreholes over an area of 6 km(2), allows us to evaluate the high geological heterogeneity by means of indicator kriging and sequential indicator simulation. We show that geostatistical methodologies and ad hoc post processing of geostatistical analysis results allow us to effectively analyze the high hydrogeological heterogeneity of the studied site.
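Indicator kriging, as used above, estimates the probability of exceeding a threshold (or of a lithology occurring) by kriging 0/1 indicator transforms of the data. A toy simple-kriging version with an assumed exponential covariance, not a variogram fitted to the Porto Marghera boreholes, could look like:

```python
import numpy as np

def indicator_kriging(coords, values, threshold, targets, corr_range=2.0):
    """Sketch of indicator kriging: transform data to 0/1 indicators of
    exceeding a threshold, then form a simple-kriging-style weighted
    estimate of the exceedance probability at target points, using an
    exponential covariance model as a stand-in for a fitted indicator
    variogram."""
    ind = (values > threshold).astype(float)

    def cov(a, b):
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return np.exp(-d / corr_range)

    C = cov(coords, coords) + 1e-8 * np.eye(len(coords))  # tiny nugget
    c0 = cov(coords, targets)
    w = np.linalg.solve(C, c0)        # kriging weights, one column per target
    mean = ind.mean()                 # global indicator mean as SK mean
    return mean + w.T @ (ind - mean)  # simple kriging of the indicator

# Demo: three boreholes; the estimate at a data location honours its datum.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
values = np.array([5.0, 1.0, 1.0])   # e.g. a facies proxy variable
prob = indicator_kriging(coords, values, threshold=3.0, targets=coords)
```

Sequential indicator simulation builds on the same kriging step, drawing from the estimated local probabilities along a random path.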
Bayesian Networks for Social Modeling
Whitney, Paul D.; White, Amanda M.; Walsh, Stephen J.; Dalton, Angela C.; Brothers, Alan J.
2011-03-28
This paper describes a body of work developed over the past five years. The work addresses the use of Bayesian network (BN) models for representing and predicting social/organizational behaviors. The topics covered include model construction, validation, and use. These topics span the bulk of the lifetime of such a model, beginning with construction, moving to validation and other aspects of model ‘critiquing’, and finally demonstrating how the modeling approach might be used to inform policy analysis. To conclude, we discuss limitations of using BNs for this activity and suggest remedies to address those limitations. The primary benefits of using a well-developed computational, mathematical, and statistical modeling structure, such as BNs, are (1) there are significant computational, theoretical and capability bases on which to build, and (2) the ability to empirically critique the model, and potentially evaluate competing models, for a social/behavioral phenomenon.
Modeling Diagnostic Assessments with Bayesian Networks
ERIC Educational Resources Information Center
Almond, Russell G.; DiBello, Louis V.; Moulder, Brad; Zapata-Rivera, Juan-Diego
2007-01-01
This paper defines Bayesian network models and examines their applications to IRT-based cognitive diagnostic modeling. These models are especially suited to building inference engines designed to be synchronous with the finer grained student models that arise in skills diagnostic assessment. Aspects of the theory and use of Bayesian network models…
NASA Astrophysics Data System (ADS)
Yan, Hongxiang; Moradkhani, Hamid
2016-08-01
Assimilation of satellite soil moisture and streamflow data into a distributed hydrologic model has received increasing attention over the past few years. This study provides a detailed analysis of the joint and separate assimilation of streamflow and Advanced Scatterometer (ASCAT) surface soil moisture into a distributed Sacramento Soil Moisture Accounting (SAC-SMA) model, with the use of the recently developed particle filter-Markov chain Monte Carlo (PF-MCMC) method. Performance is assessed over the Salt River Watershed in Arizona, which is one of the watersheds without anthropogenic effects in the Model Parameter Estimation Experiment (MOPEX). A total of five data assimilation (DA) scenarios are designed, and the effects of the locations of streamflow gauges and the ASCAT soil moisture on the predictions of soil moisture and streamflow are assessed. In addition, a geostatistical model is introduced to overcome the significant bias and spatial discontinuity of the satellite soil moisture. The results indicate that: (1) solely assimilating outlet streamflow can lead to biased soil moisture estimation; (2) when the study area can only be partially covered by the satellite data, the geostatistical approach can estimate the soil moisture for those uncovered grid cells; (3) joint assimilation of streamflow and soil moisture from geostatistical modeling can further improve the surface soil moisture prediction. This study recommends the geostatistical model as a helpful tool to aid the remote sensing technique and the hydrologic DA study.
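The particle-filter half of the PF-MCMC machinery can be sketched with a bootstrap filter on a toy scalar storage model. The dynamics, noise levels, and observations below are invented, and the MCMC rejuvenation move applied after resampling in PF-MCMC is omitted:

```python
import numpy as np

rng = np.random.default_rng(7)

def particle_filter(obs, n_particles=1000, q_std=0.3, r_std=0.5):
    """Bootstrap particle filter for a toy 1-D 'soil moisture storage'
    state x_t = 0.9*x_{t-1} + process noise, observed as y_t = x_t +
    measurement noise. Returns the filtered posterior mean at each step."""
    x = rng.normal(0.0, 1.0, n_particles)  # initial ensemble
    means = []
    for y in obs:
        x = 0.9 * x + rng.normal(0.0, q_std, n_particles)  # propagate
        logw = -0.5 * ((y - x) / r_std) ** 2               # likelihood weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        x = x[rng.choice(n_particles, n_particles, p=w)]   # resample
        means.append(np.mean(x))
    return np.array(means)

# Demo: constant observations; the filtered mean settles near them.
est = particle_filter(np.full(20, 2.0))
```

The posterior mean sits slightly below the observed value because the decaying dynamics keep pulling the state toward zero, which is exactly the prior/likelihood compromise the filter is meant to produce.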
Bayesian inference for OPC modeling
NASA Astrophysics Data System (ADS)
Burbine, Andrew; Sturtevant, John; Fryer, David; Smith, Bruce W.
2016-03-01
The use of optical proximity correction (OPC) demands increasingly accurate models of the photolithographic process. Model building and inference techniques in the data science community have seen great strides in the past two decades which make better use of available information. This paper aims to demonstrate the predictive power of Bayesian inference as a method for parameter selection in lithographic models by quantifying the uncertainty associated with model inputs and wafer data. Specifically, the method combines the model builder's prior information about each modelling assumption with the maximization of each observation's likelihood as a Student's t-distributed random variable. Through the use of a Markov chain Monte Carlo (MCMC) algorithm, a model's parameter space is explored to find the most credible parameter values. During parameter exploration, the parameters' posterior distributions are generated by applying Bayes' rule, using a likelihood function and the a priori knowledge supplied. The MCMC algorithm used, an affine invariant ensemble sampler (AIES), is implemented by initializing many walkers which semi-independently explore the space. The convergence of these walkers to global maxima of the likelihood volume determines the parameter values' highest density intervals (HDI) to reveal champion models. We show that this method of parameter selection provides insights into the data that traditional methods do not, and outline continued experiments to vet the method.
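The stretch move at the heart of an affine invariant ensemble sampler can be sketched as follows. The standard-normal "posterior" stands in for a real lithographic model likelihood, and the update scheme is simplified relative to production samplers such as emcee (which update half-ensembles against current positions):

```python
import numpy as np

rng = np.random.default_rng(3)

def log_prob(theta):
    """Toy posterior over 2 'model parameters': standard normal."""
    return -0.5 * np.sum(theta ** 2, axis=-1)

def aies_step(walkers, a=2.0):
    """One stretch-move sweep of an affine invariant ensemble sampler:
    each walker is stretched toward a randomly chosen partner by a factor
    z drawn with density g(z) ∝ 1/sqrt(z) on [1/a, a], accepted with
    probability min(1, z^(d-1) * p(new)/p(old))."""
    n, d = walkers.shape
    new = walkers.copy()
    for k in range(n):
        j = rng.integers(0, n - 1)
        if j >= k:
            j += 1  # partner drawn from the rest of the ensemble
        z = (1.0 + (a - 1.0) * rng.random()) ** 2 / a
        prop = walkers[j] + z * (walkers[k] - walkers[j])
        log_acc = (d - 1) * np.log(z) + log_prob(prop) - log_prob(walkers[k])
        if np.log(rng.random()) < log_acc:
            new[k] = prop
    return new

# Over-dispersed start; the ensemble relaxes toward the posterior.
walkers = rng.normal(0.0, 3.0, size=(50, 2))
for _ in range(500):
    walkers = aies_step(walkers)
```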
A geostatistical methodology to assess the accuracy of unsaturated flow models
Smoot, J.L.; Williams, R.E.
1996-04-01
The Pacific Northwest National Laboratory (PNNL) has developed a Hydrologic Evaluation Methodology (HEM) to assist the U.S. Nuclear Regulatory Commission in evaluating the potential that infiltrating meteoric water will produce leachate at commercial low-level radioactive waste disposal sites. Two key issues are raised in the HEM: (1) evaluation of mathematical models that predict facility performance, and (2) estimation of the uncertainty associated with these mathematical model predictions. The technical objective of this research is to adapt geostatistical tools commonly used for model parameter estimation to the problem of estimating the spatial distribution of the dependent variable to be calculated by the model. To fulfill this objective, a database describing the spatiotemporal movement of water injected into unsaturated sediments at the Hanford Site in Washington State was used to develop a new method for evaluating mathematical model predictions. Measured water content data were interpolated geostatistically to a 16 x 16 x 36 grid at several time intervals. Then a mathematical model was used to predict water content at the same grid locations at the selected times. Node-by-node comparison of the mathematical model predictions with the geostatistically interpolated values was conducted. The method facilitates a complete accounting and categorization of model error at every node. The comparison suggests that model results generally are within measurement error. The worst model error occurs in silt lenses and is in excess of measurement error.
NASA Astrophysics Data System (ADS)
martin, manuel; Lacarce, Eva; Meersmans, Jeroen; Orton, Thomas; Saby, Nicolas; Paroissien, Jean-Baptiste; Jolivet, Claudy; Boulonne, Line; Arrouays, Dominique
2013-04-01
Soil organic carbon (SOC) plays a major role in the global carbon budget. It can act as a source or a sink of atmospheric carbon, thereby possibly influencing the course of climate change. Improving the tools that model the spatial distributions of SOC stocks at national scales is a priority, both for monitoring changes in SOC and as an input for global carbon cycle studies. In this paper, we first considered several increasingly complex boosted regression trees (BRT), a convenient and efficient multiple regression model from the statistical learning field. We then considered a robust geostatistical approach coupled to the BRT models. Testing of the different approaches was performed on the dataset from the French Soil Monitoring Network, with a consistent cross-validation procedure. We showed that the BRT models, given their ease of use and their predictive performance, could be preferred to geostatistical models for SOC mapping at the national scale, and if possible be combined with geostatistical models. This conclusion is valid provided that care is exercised in model fitting and validation, that the dataset does not allow for modeling local spatial autocorrelations, as is the case for many national systematic sampling schemes, and that good-quality data about the SOC drivers included in the models are available.
Rautman, C.A.
1995-09-01
Two-dimensional, heterogeneous, spatially correlated models of thermal conductivity and bulk density have been created for a representative, east-west cross section of Yucca Mountain, Nevada, using geostatistical simulation. The thermal conductivity models are derived from spatially correlated, surrogate material-property models of porosity, through a multiple linear-regression equation, which expresses thermal conductivity as a function of porosity and initial temperature and saturation. Bulk-density values were obtained through a similar, linear-regression relationship with porosity. The use of a surrogate property allows the spatially much more abundant porosity measurements to condition the simulations. Modeling was conducted in stratigraphic coordinates to represent original depositional continuity of material properties, and the completed models were transformed to real-world coordinates to capture present-day tectonic tilting and faulting of the material-property units. Spatial correlation lengths required for geostatistical modeling were assumed, but are based on the results of previous transect-sampling and geostatistical-modeling work.
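The two-step recipe, simulate a spatially correlated surrogate (porosity) field and then map it to thermal conductivity through a regression, can be sketched in one dimension. The correlation length, porosity statistics, and regression coefficients below are illustrative only, not the Yucca Mountain values, and the simulation is unconditional (conditioning on porosity data would use kriging residuals):

```python
import numpy as np

rng = np.random.default_rng(5)

# 1) Spatially correlated porosity field on a 1-D stratigraphic transect,
#    via Cholesky factorization of an exponential correlation matrix.
x = np.linspace(0.0, 100.0, 200)          # positions along the transect (m)
corr_len = 15.0                           # assumed correlation length (m)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)
L = np.linalg.cholesky(C + 1e-10 * np.eye(len(x)))   # jitter for stability
porosity = 0.15 + 0.04 * (L @ rng.normal(size=len(x)))  # mean 15%, sd 4%

# 2) Map the surrogate property to thermal conductivity with a
#    hypothetical linear regression (coefficients are illustrative).
thermal_k = 2.0 - 4.5 * porosity          # W/(m K), decreasing with porosity
```

Because the mapping is applied point-by-point, the conductivity field inherits the spatial correlation structure of the simulated porosity.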
Bayesian Uncertainty Analyses Via Deterministic Model
NASA Astrophysics Data System (ADS)
Krzysztofowicz, R.
2001-05-01
Rational decision-making requires that the total uncertainty about a variate of interest (a predictand) be quantified in terms of a probability distribution, conditional on all available information and knowledge. Suppose the state-of-knowledge is embodied in a deterministic model, which is imperfect and outputs only an estimate of the predictand. Fundamentals are presented of three Bayesian approaches to producing a probability distribution of the predictand via any deterministic model. The Bayesian Processor of Output (BPO) quantifies the total uncertainty in terms of a posterior distribution, conditional on model output. The Bayesian Processor of Ensemble (BPE) quantifies the total uncertainty in terms of a posterior distribution, conditional on an ensemble of model output. The Bayesian Forecasting System (BFS) decomposes the total uncertainty into input uncertainty and model uncertainty, which are characterized independently and then integrated into a predictive distribution.
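In the Gaussian special case, the Bayesian Processor of Output reduces to conjugate-normal algebra. The sketch below assumes the deterministic model's output is a linear, noisy function of the predictand; in practice the coefficients would be fitted to archived pairs of model output and verifying observations, and the numbers here are illustrative:

```python
def bpo_posterior(estimate, prior_mean, prior_var, a, b, noise_var):
    """Gaussian Bayesian Processor of Output: the deterministic model's
    estimate s of predictand w is treated as s = a*w + b + eps with
    eps ~ N(0, noise_var); combined with the prior w ~ N(prior_mean,
    prior_var), the posterior of w given s is again normal (standard
    conjugate-normal algebra)."""
    prec = 1.0 / prior_var + a * a / noise_var       # posterior precision
    post_var = 1.0 / prec
    post_mean = post_var * (prior_mean / prior_var + a * (estimate - b) / noise_var)
    return post_mean, post_var

# Demo: diffuse prior N(0, 4), unbiased model output (a=1, b=0) with
# unit error variance, observed estimate s = 3.
mean, var = bpo_posterior(estimate=3.0, prior_mean=0.0, prior_var=4.0,
                          a=1.0, b=0.0, noise_var=1.0)
```

The posterior mean shrinks the raw model output toward the prior in proportion to the model's error variance, which is precisely how the BPO turns a deterministic estimate into a calibrated predictive distribution.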
Bayesian Methods for High Dimensional Linear Models
Mallick, Himel; Yi, Nengjun
2013-01-01
In this article, we present a selective overview of some recent developments in Bayesian model and variable selection methods for high dimensional linear models. While most of the reviews in the literature are based on conventional methods, we focus on recently developed methods, which have proven to be successful in dealing with high dimensional variable selection. First, we give a brief overview of the traditional model selection methods (viz. Mallows' Cp, AIC, BIC, DIC), followed by a discussion on some recently developed methods (viz. EBIC, regularization), which have occupied the minds of many statisticians. Then, we review high dimensional Bayesian methods with a particular emphasis on Bayesian regularization methods, which have been used extensively in recent years. We conclude by briefly addressing the asymptotic behaviors of Bayesian variable selection methods for high dimensional linear models under different regularity conditions. PMID:24511433
Bayesian Modeling of a Human MMORPG Player
NASA Astrophysics Data System (ADS)
Synnaeve, Gabriel; Bessière, Pierre
2011-03-01
This paper describes an application of Bayesian programming to the control of an autonomous avatar in a multiplayer role-playing game (the example is based on World of Warcraft). We model a particular task, which consists of choosing what to do and which target to select in a situation where allies and foes are present. We explain the model in Bayesian programming and show how the conditional probabilities could be learned from data gathered during human-played sessions.
NASA Astrophysics Data System (ADS)
Linde, Niklas; Lochbühler, Tobias; Dogan, Mine; Van Dam, Remke L.
2015-12-01
We propose a new framework to compare alternative geostatistical descriptions of a given site. Multiple realizations of each of the considered geostatistical models and their corresponding tomograms (based on inversion of noise-contaminated simulated data) are used as a multivariate training image. The training image is scanned with a direct sampling algorithm to obtain conditional realizations of hydraulic conductivity that are not only in agreement with the geostatistical model, but also honor the spatially varying resolution of the site-specific tomogram. Model comparison is based on the quality of the simulated geophysical data from the ensemble of conditional realizations. The tomogram in this study is obtained by inversion of cross-hole ground-penetrating radar (GPR) first-arrival travel time data acquired at the MAcro-Dispersion Experiment (MADE) site in Mississippi (USA). Various heterogeneity descriptions ranging from multi-Gaussian fields to fields with complex multiple-point statistics inferred from outcrops are considered. Under the assumption that the relationship between porosity and hydraulic conductivity inferred from local measurements is valid, we find that conditioned multi-Gaussian realizations and derivatives thereof can explain the crosshole geophysical data. A training image based on an aquifer analog from Germany was found to be in better agreement with the geophysical data than the one based on the local outcrop, which appears to under-represent high hydraulic conductivity zones. These findings are only based on the information content in a single resolution-limited tomogram and extending the analysis to tracer or higher resolution surface GPR data might lead to different conclusions (e.g., that discrete facies boundaries are necessary). Our framework makes it possible to identify inadequate geostatistical models and petrophysical relationships, effectively narrowing the space of possible heterogeneity representations.
Karagiannis-Voules, Dimitrios-Alexios; Odermatt, Peter; Biedermann, Patricia; Khieu, Virak; Schär, Fabian; Muth, Sinuon; Utzinger, Jürg; Vounatsou, Penelope
2015-01-01
Soil-transmitted helminth infections are intimately connected with poverty. Yet socioeconomic proxies are rarely used in spatially explicit risk profiling. We compiled household-level socioeconomic data pertaining to sanitation, drinking-water, education and nutrition from readily available Demographic and Health Surveys, Multiple Indicator Cluster Surveys and World Health Surveys for Cambodia and aggregated the data at village level. We conducted a systematic review to identify parasitological surveys and made every effort possible to extract, georeference and upload the data into the open-source Global Neglected Tropical Diseases database. Bayesian geostatistical models were employed to spatially align the village-aggregated socioeconomic predictors with the soil-transmitted helminth infection data. The risk of soil-transmitted helminth infection was predicted on a 1 × 1 km grid covering Cambodia. Additionally, two separate individual-level spatial analyses were carried out, for Takeo and Preah Vihear provinces, to assess and quantify the association between soil-transmitted helminth infection and socioeconomic indicators at an individual level. Overall, we obtained socioeconomic proxies from 1624 locations across the country. Surveys focussing on soil-transmitted helminth infections were extracted from 16 sources reporting data from 238 unique locations. We found that the risk of soil-transmitted helminth infection from 2000 onwards was considerably lower than in surveys conducted earlier. Population-adjusted prevalences for school-aged children from 2000 onwards were 28.7% for hookworm, 1.5% for Ascaris lumbricoides and 0.9% for Trichuris trichiura. Surprisingly, in the country-wide analyses, we did not find any significant association between soil-transmitted helminth infection and village-aggregated socioeconomic proxies. Based also on the individual-level analyses we conclude that socioeconomic proxies might not be good predictors at an
A conceptual sedimentological-geostatistical model of aquifer heterogeneity based on outcrop studies
Davis, J.M.
1994-01-01
Three outcrop studies were conducted in deposits of different depositional environments. At each site, permeability measurements were obtained with an air-minipermeameter developed as part of this study. In addition, the geological units were mapped with either surveying, photographs, or both. Geostatistical analysis of the permeability data was performed to estimate the characteristics of the probability distribution function and the spatial correlation structure. The information obtained from the geological mapping was then compared with the results of the geostatistical analysis for any relationships that may exist. The main field site was located in the Albuquerque Basin of central New Mexico at an outcrop of the Pliocene-Pleistocene Sierra Ladrones Formation. The second study was conducted on the walls of waste pits in alluvial fan deposits at the Nevada Test Site. The third study was conducted on an outcrop of an eolian deposit (Miocene) south of Socorro, New Mexico. The results of the three studies were then used to construct a conceptual model relating depositional environment to geostatistical models of heterogeneity. The model presented is largely qualitative but provides a basis for further hypothesis formulation and testing.
2D Forward Modeling of Gravity Data Using Geostatistically Generated Subsurface Density Variations
NASA Astrophysics Data System (ADS)
Phelps, G. A.
2015-12-01
Two-dimensional (2D) forward models of synthetic gravity anomalies are calculated and compared to observed gravity anomalies using geostatistical models of density variations in the subsurface, constrained by geologic data. These models have an advantage over forward gravity models generated using polygonal bodies of homogeneous density because the homogeneous density restriction is relaxed, allowing density variations internal to geologic bodies to be considered. By discretizing the subsurface and calculating the cumulative gravitational effect of each cell, multiple forward models can be generated for a given geologic body, which expands the exploration of the solution space. Furthermore, the stochastic models can be designed to match the observed statistical properties of the internal densities of the geologic units being modeled. The results of such stochastically generated forward gravity models can then be compared with the observed data. To test this modeling approach, we compared stochastic forward gravity models of 2D geologic cross-sections to gravity data collected along a profile across the Vaca Fault near Fairfield, California. Three conceptual geologic models were created, each representing a distinct fault block scenario (normal, strike-slip, reverse) with four rock units in each model. Using fixed rock unit boundaries, the units were populated with geostatistically generated density values, characterized by their respective histogram and vertical variogram. The horizontal variogram could not be estimated because of a lack of data, and was therefore left as a free parameter. Each fault block model had multiple geostatistical realizations of density associated with it. Forward models of gravity were then generated from the fault block model realizations, and rejection sampling was used to determine viable fault block density models. Given the constraints on subsurface density, the normal and strike-slip fault models were the most likely.
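The discretize-and-sum step described above can be sketched as follows: each subsurface cell contributes vertical gravity at every station. This sketch approximates each cell as an infinite horizontal line mass, a common 2D simplification that is an assumption here, as are the station positions, cell geometry, and density; the study's actual discretization may differ.

```python
import numpy as np

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def forward_gravity_2d(stations_x, cells, obs_z=0.0):
    """Vertical gravity at surface stations from discretized 2D cells,
    each approximated as an infinite horizontal line mass (an assumption
    of this sketch, not necessarily the study's discretization).
    cells: iterable of (x, z, cross_section_area, density), z positive down."""
    gz = np.zeros(len(stations_x))
    for (cx, cz, area, rho) in cells:
        dz = cz - obs_z
        for i, sx in enumerate(stations_x):
            dx = sx - cx
            # 2D line-mass formula: g_z = 2 G (rho * area) dz / (dx^2 + dz^2)
            gz[i] += 2.0 * G * rho * area * dz / (dx ** 2 + dz ** 2)
    return gz

stations = np.array([-100.0, 0.0, 100.0])   # station x-positions (m), made up
cells = [(0.0, 50.0, 100.0, 2670.0)]        # one cell at crustal density
gz = forward_gravity_2d(stations, cells)
```

Swapping in geostatistically simulated density values per cell, then rejecting realizations whose `gz` misfits the observed profile, is the essence of the rejection-sampling loop the abstract describes.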
NASA Astrophysics Data System (ADS)
Schöniger, Anneli; Illman, Walter A.; Wöhling, Thomas; Nowak, Wolfgang
2015-12-01
Groundwater modelers face the challenge of assigning representative parameter values to the studied aquifer. Several approaches are available to parameterize spatial heterogeneity in aquifer parameters. They differ in their conceptualization and complexity, ranging from homogeneous models to heterogeneous random fields. While it is common practice to invest more effort in data collection for models with a finer resolution of heterogeneities, there is little guidance on how much data is required to justify a given level of model complexity. In this study, we propose to use concepts related to Bayesian model selection to identify this balance. We demonstrate our approach on the characterization of a heterogeneous aquifer via hydraulic tomography in a sandbox experiment (Illman et al., 2010). We consider four increasingly complex parameterizations of hydraulic conductivity: (1) Effective homogeneous medium, (2) geology-based zonation, (3) interpolation by pilot points, and (4) geostatistical random fields. First, we investigate the shift in justified complexity with increasing amount of available data by constructing a model confusion matrix. This matrix indicates the maximum level of complexity that can be justified given a specific experimental setup. Second, we determine which parameterization is most adequate given the observed drawdown data. Third, we test how the different parameterizations perform in a validation setup. The results of our test case indicate that aquifer characterization via hydraulic tomography does not necessarily require (or justify) a geostatistical description. Instead, a zonation-based model might be a more robust choice, but only if the zonation is geologically adequate.
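The Bayesian model selection step this abstract relies on amounts to converting model evidences into posterior model probabilities. A generic, dependency-light sketch (the log-evidence values below are made up for illustration; the authors compute evidences from the drawdown data):

```python
import numpy as np

def model_weights(log_evidences, log_priors=None):
    """Posterior model probabilities from log Bayesian model evidences.
    Generic sketch of the model-selection step, not the authors' code."""
    le = np.asarray(log_evidences, dtype=float)
    if log_priors is not None:
        le = le + np.asarray(log_priors, dtype=float)
    le = le - le.max()          # stabilize before exponentiating
    w = np.exp(le)
    return w / w.sum()

# Hypothetical log-evidences for the four parameterizations:
# homogeneous, zonation, pilot points, geostatistical
w = model_weights([-120.3, -112.7, -113.5, -115.0])
```

With these illustrative numbers the zonation model receives the highest posterior weight, mirroring the abstract's conclusion that a geologically adequate zonation can outcompete a full geostatistical description.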
Hierarchical Bayesian models of cognitive development.
Glassen, Thomas; Nitsch, Verena
2016-06-01
This article provides an introductory overview of the state of research on Hierarchical Bayesian Modeling in cognitive development. First, a brief historical summary and a definition of hierarchies in Bayesian modeling are given. Subsequently, some model structures are described based on four examples in the literature. These are models for the development of the shape bias, for learning ontological kinds and causal schemata as well as for the categorization of objects. The Bayesian modeling approach is then compared with the connectionist and nativist modeling paradigms and considered in view of Marr's (1982) three description levels of information-processing mechanisms. In this context, psychologically plausible algorithms and ideas of their neural implementation are presented. In addition to criticism and limitations of the approach, research needs are identified. PMID:27222110
Bayesian Data-Model Fit Assessment for Structural Equation Modeling
ERIC Educational Resources Information Center
Levy, Roy
2011-01-01
Bayesian approaches to modeling are receiving an increasing amount of attention in the areas of model construction and estimation in factor analysis, structural equation modeling (SEM), and related latent variable models. However, model diagnostics and model criticism remain relatively understudied aspects of Bayesian SEM. This article describes…
Testing Bayesian models of human coincidence timing.
Miyazaki, Makoto; Nozaki, Daichi; Nakajima, Yasoichi
2005-07-01
A sensorimotor control task often requires an accurate estimation of the timing of the arrival of an external target (e.g., when hitting a pitched ball). Conventional studies of human timing processes have ignored the stochastic features of target timing: e.g., the speed of the pitched ball is not generally constant, but is variable. Interestingly, based on Bayesian theory, it has been recently shown that the human sensorimotor system achieves the optimal estimation by integrating sensory information with prior knowledge of the probabilistic structure of the target variation. In this study, we tested whether Bayesian integration is also implemented while performing a coincidence-timing type of sensorimotor task by manipulating the trial-by-trial variability (i.e., the prior distribution) of the target timing. As a result, within several hundred trials of learning, subjects were able to generate systematic timing behavior according to the width of the prior distribution, as predicted by the optimal Bayesian model. Considering the previous studies showing that the human sensorimotor system uses Bayesian integration in spacing and force-grading tasks, our result indicates that Bayesian integration is fundamental to all aspects of human sensorimotor control. Moreover, it was noteworthy that the subjects could adjust their behavior both when the prior distribution was switched from wide to narrow and vice versa, although the adjustment was slower in the former case. Based on a comparison with observations in a previous study, we discuss the flexibility and adaptability of Bayesian sensorimotor learning.
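For Gaussian prior and sensory noise, the optimal Bayesian model referred to here is the standard precision-weighted fusion of observation and prior. A minimal sketch with illustrative numbers (the timing values in seconds and the variances are assumptions of this example, not the study's parameters):

```python
def bayes_estimate(x_obs, var_obs, mu_prior, var_prior):
    """MMSE fusion of a noisy timing observation with a Gaussian prior
    over target timing (illustrative values; units assumed to be seconds)."""
    w = var_prior / (var_prior + var_obs)   # weight placed on the observation
    mu = w * x_obs + (1.0 - w) * mu_prior
    var = var_prior * var_obs / (var_prior + var_obs)
    return mu, var

# A narrower prior (smaller var_prior) pulls the estimate toward its mean,
# mirroring the subjects' adjustment to the width of the prior distribution.
mu_wide, _ = bayes_estimate(1.2, 0.04, 1.0, 0.09)
mu_narrow, _ = bayes_estimate(1.2, 0.04, 1.0, 0.01)
```

The narrow-prior estimate sits closer to the prior mean than the wide-prior estimate, which is exactly the systematic behavior the study reports subjects learned to produce.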
NASA Astrophysics Data System (ADS)
Hodgetts, David; Burnham, Brian; Head, William; Jonathan, Atunima; Rarity, Franklin; Seers, Thomas; Spence, Guy
2013-04-01
In the hydrocarbon industry, stochastic approaches are the main method by which reservoirs are modelled. These stochastic modelling approaches require geostatistical information on the geometry and distribution of the geological elements of the reservoir. As the reservoir itself cannot be viewed directly (only indirectly via seismic and/or well log data), this leads to a great deal of uncertainty in the geostatistics used; therefore, outcrop analogues are characterised to help obtain the geostatistical information required to model the reservoir. Lidar-derived Digital Outcrop Models (DOMs) provide the ability to collect large quantities of statistical information on the geological architecture of the outcrop, far more than is possible by fieldwork alone, as the DOM allows accurate measurements to be made in normally inaccessible parts of the exposure. This increases the size of the measured statistical dataset, which in turn results in an increase in statistical significance. There are, however, many problems and biases in the data which cannot be overcome by sample size alone. These biases may relate, for example, to the orientation, size and quality of exposure, as well as the resolution of the DOM itself. Stochastic modelling approaches used in the hydrocarbon industry fall mainly into four generic categories: 1) object modelling, where the geology is defined by a set of simplistic shapes (such as channels) and parameters such as width, height and orientation, among others, can be defined; 2) sequential indicator simulation, where geological shapes are less well defined and the size and distribution are defined using variograms; 3) multiple-point statistics, where training images are used to define shapes and relationships between geological elements; and 4) discrete fracture networks, for fractured reservoirs where information on fracture size and distribution is required. Examples of using DOMs to assist with each of these modelling approaches are presented, highlighting the
An Integrated Bayesian Model for DIF Analysis
ERIC Educational Resources Information Center
Soares, Tufi M.; Goncalves, Flavio B.; Gamerman, Dani
2009-01-01
In this article, an integrated Bayesian model for differential item functioning (DIF) analysis is proposed. The model is integrated in the sense of modeling the responses along with the DIF analysis. This approach allows DIF detection and explanation in a simultaneous setup. Previous empirical studies and/or subjective beliefs about the item…
Bayesian modeling of flexible cognitive control
Jiang, Jiefeng; Heller, Katherine; Egner, Tobias
2014-01-01
“Cognitive control” describes endogenous guidance of behavior in situations where routine stimulus-response associations are suboptimal for achieving a desired goal. The computational and neural mechanisms underlying this capacity remain poorly understood. We examine recent advances stemming from the application of a Bayesian learner perspective that provides optimal prediction for control processes. In reviewing the application of Bayesian models to cognitive control, we note that an important limitation in current models is a lack of a plausible mechanism for the flexible adjustment of control over conflict levels changing at varying temporal scales. We then show that flexible cognitive control can be achieved by a Bayesian model with a volatility-driven learning mechanism that modulates dynamically the relative dependence on recent and remote experiences in its prediction of future control demand. We conclude that the emergent Bayesian perspective on computational mechanisms of cognitive control holds considerable promise, especially if future studies can identify neural substrates of the variables encoded by these models, and determine the nature (Bayesian or otherwise) of their neural implementation. PMID:24929218
The Bayesian bridge between simple and universal kriging
Omre, H.; Halvorsen, K.B.
1989-10-01
Kriging techniques are well suited for the evaluation of continuous spatial phenomena. Bayesian statistics are characterized by the use of qualified prior guesses for the model parameters. By merging kriging techniques and Bayesian theory, such prior guesses may be used in a spatial setting. Partial knowledge of model parameters defines a continuum of models between what are termed simple and universal kriging in geostatistical terminology. The Bayesian approach to kriging is developed and discussed, and a case study concerning depth conversion of seismic reflection times is presented.
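The continuum between simple and universal kriging can be sketched as follows: a Gaussian prior N(m0, σ²m) on the unknown constant mean adds σ²m to every covariance entry, so σ²m = 0 recovers simple kriging while σ²m → ∞ approaches ordinary kriging with an estimated mean. The covariance model and the data below are assumptions of this sketch, not from the paper.

```python
import numpy as np

def bayesian_kriging(x0, xs, ys, cov, m0, sigma_m2):
    """Predict at x0 with a N(m0, sigma_m2) prior on the unknown constant
    mean; sigma_m2 = 0 gives simple kriging, large sigma_m2 approaches
    ordinary kriging (sketch of the simple/universal continuum)."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    n = len(xs)
    # Mean uncertainty adds sigma_m2 to every covariance entry.
    K = np.array([[cov(abs(xs[i] - xs[j])) for j in range(n)]
                  for i in range(n)]) + sigma_m2
    k0 = np.array([cov(abs(x0 - xi)) for xi in xs]) + sigma_m2
    lam = np.linalg.solve(K, k0)
    return m0 + lam @ (ys - m0)

cov = lambda h: np.exp(-h / 10.0)     # assumed exponential covariance model
xs, ys = [0.0, 5.0, 12.0], [1.0, 2.0, 1.5]
z_sk = bayesian_kriging(6.0, xs, ys, cov, m0=1.5, sigma_m2=0.0)    # simple kriging
z_bk = bayesian_kriging(6.0, xs, ys, cov, m0=1.5, sigma_m2=100.0)  # near ordinary
```

Intermediate values of `sigma_m2` give the partial-knowledge models in between the two classical endpoints.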
NASA Astrophysics Data System (ADS)
Golay, Jean; Kanevski, Mikhaïl
2013-04-01
The present research deals with the exploration and modeling of a complex dataset of 200 measurement points of sediment pollution by heavy metals in Lake Geneva. The fundamental idea was to use multivariate Artificial Neural Networks (ANN) along with geostatistical models and tools in order to improve the accuracy and the interpretability of data modeling. The results obtained with ANN were compared to those of traditional geostatistical algorithms like ordinary (co)kriging and (co)kriging with an external drift. Exploratory data analysis highlighted a great variety of relationships (i.e. linear, non-linear, independence) between the 11 variables of the dataset (i.e. Cadmium, Mercury, Zinc, Copper, Titanium, Chromium, Vanadium and Nickel as well as the spatial coordinates of the measurement points and their depth). Then, exploratory spatial data analysis (i.e. anisotropic variography, local spatial correlations and moving window statistics) was carried out. It was shown that the different phenomena to be modeled were characterized by high spatial anisotropies, complex spatial correlation structures and heteroscedasticity. A feature selection procedure based on General Regression Neural Networks (GRNN) was also applied to create subsets of variables to improve the predictions during the modeling phase. The basic modeling was conducted using a Multilayer Perceptron (MLP), a workhorse of ANN. MLP models are robust and highly flexible tools which can incorporate, in a nonlinear manner, different kinds of high-dimensional information. In the present research, the input layer was made of either two neurons (spatial coordinates) or three neurons (when depth, as auxiliary information, could possibly capture an underlying trend) and the output layer was composed of one (univariate MLP) to eight neurons corresponding to the heavy metals of the dataset (multivariate MLP). MLP models with three input neurons can be referred to as Artificial Neural Networks with EXternal
Heterogeneous Factor Analysis Models: A Bayesian Approach.
ERIC Educational Resources Information Center
Ansari, Asim; Jedidi, Kamel; Dube, Laurette
2002-01-01
Developed Markov Chain Monte Carlo procedures to perform Bayesian inference, model checking, and model comparison in heterogeneous factor analysis. Tested the approach with synthetic data and data from a consumption emotion study involving 54 consumers. Results show that traditional psychometric methods cannot fully capture the heterogeneity in…
Survey of Bayesian Models for Modelling of Stochastic Temporal Processes
Ng, B
2006-10-12
This survey gives an overview of popular generative models used in the modeling of stochastic temporal systems. In particular, this survey is organized into two parts. The first part discusses the discrete-time representations of dynamic Bayesian networks and dynamic relational probabilistic models, while the second part discusses the continuous-time representation of continuous-time Bayesian networks.
Building on crossvalidation for increasing the quality of geostatistical modeling
Olea, R.A.
2012-01-01
The random function is a mathematical model commonly used in the assessment of uncertainty associated with a spatially correlated attribute that has been partially sampled. There are multiple algorithms for modeling such random functions, all sharing the requirement of specifying various parameters that have critical influence on the results. The importance of finding ways to compare the methods and setting parameters to obtain results that better model uncertainty has increased as these algorithms have grown in number and complexity. Crossvalidation has been used in spatial statistics, mostly in kriging, for the analysis of mean square errors. An appeal of this approach is its ability to work with the same empirical sample available for running the algorithms. This paper goes beyond checking estimates by formulating a function sensitive to conditional bias. Under ideal conditions, such a function turns into a straight line, which can be used as a reference for preparing measures of performance. Applied to kriging, deviations from the ideal line provide a sensitivity to the semivariogram that is lacking in crossvalidation of kriging errors, and are more sensitive to conditional bias than analyses of errors. In terms of stochastic simulation, in addition to finding better parameters, the deviations allow comparison of the realizations resulting from the application of different methods. Examples show improvements of about 30% in the deviations and approximately 10% in the square root of mean square errors between reasonable starting models and the solutions obtained according to the new criteria. © 2011 US Government.
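The crossvalidation machinery the paper builds on can be sketched as a leave-one-out loop over simple-kriging estimates; the regression slope of true versus estimated values is a crude stand-in for the paper's conditional-bias function (1.0 under no conditional bias). The covariance model and data below are made up for illustration.

```python
import numpy as np

def loo_crossvalidate(xs, ys, cov):
    """Leave-one-out crossvalidation of simple-kriging estimates. The slope
    of true vs. estimated values is a crude proxy for conditional bias;
    the paper's criterion is more refined."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    m = ys.mean()
    est = []
    for i in range(len(xs)):
        keep = np.arange(len(xs)) != i          # hold out sample i
        K = cov(np.abs(xs[keep, None] - xs[None, keep]))
        k0 = cov(np.abs(xs[i] - xs[keep]))
        lam = np.linalg.solve(K, k0)            # simple-kriging weights
        est.append(m + lam @ (ys[keep] - m))
    est = np.array(est)
    mse = np.mean((est - ys) ** 2)
    slope = np.polyfit(est, ys, 1)[0]           # 1.0 = no conditional bias
    return est, mse, slope

cov = lambda h: np.exp(-h / 5.0)   # assumed covariance; data are made up
xs = [0.0, 2.0, 4.0, 7.0, 9.0, 12.0]
ys = [1.0, 1.2, 0.9, 1.4, 1.3, 1.1]
est, mse, slope = loo_crossvalidate(xs, ys, cov)
```

Comparing `mse` and `slope` across candidate semivariograms is one way to pick the parameterization that better models uncertainty, in the spirit of the criteria discussed above.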
Boden, Sven; Rogiers, Bart; Jacques, Diederik
2013-09-01
Decommissioning of nuclear building structures usually leads to large amounts of low level radioactive waste. Using a reliable method to determine the contamination depth is indispensable prior to the start of decontamination works and also for minimizing the radioactive waste volume and the total workload. The method described in this paper is based on geostatistical modeling of in situ gamma-ray spectroscopy measurements using the multiple photo peak method. The method has been tested on the floor of the waste gas surge tank room within the BR3 (Belgian Reactor 3) decommissioning project and has delivered adequate results. PMID:23722072
Joint space-time geostatistical model for air quality surveillance
NASA Astrophysics Data System (ADS)
Russo, A.; Soares, A.; Pereira, M. J.
2009-04-01
Air pollution and people's generalized concern about air quality are nowadays considered a global problem. Although the introduction of rigid air pollution regulations has reduced pollution from industry and power stations, the growing number of cars on the road poses a new pollution problem. Considering the characteristics of atmospheric circulation and the residence times of certain pollutants in the atmosphere, a generalized and growing interest in air quality issues has led to research intensification and the publication of several articles with quite different levels of scientific depth. Like most natural phenomena, air quality can be seen as a space-time process, where space and time relationships usually have quite different characteristics and levels of uncertainty. As a result, the simultaneous integration of space and time is not an easy task to perform. This problem is overcome by a variety of methodologies. The use of stochastic models and neural networks to characterize the space-time dispersion of air quality is becoming common practice. The main objective of this work is to produce an air quality model which allows forecasting critical concentration episodes of a certain pollutant by means of a hybrid approach, based on the combined use of neural network models and stochastic simulations. A stochastic simulation of the spatial component with a space-time trend model is proposed to characterize critical situations, taking into account data from the past and a space-time trend from the recent past. To identify near-future critical episodes, predicted values from neural networks are used at each monitoring station. In this paper, we describe the design of a hybrid forecasting tool for ambient NO2 concentrations in Lisbon, Portugal.
Estimating malaria burden in Nigeria: a geostatistical modelling approach.
Onyiri, Nnadozie
2015-01-01
This study has produced a map of malaria prevalence in Nigeria based on available data from the Mapping Malaria Risk in Africa (MARA) database, including all malaria prevalence surveys in Nigeria that could be geolocated, as well as data collected during fieldwork in Nigeria between March and June 2007. Logistic regression was fitted to malaria prevalence to identify significant demographic (age) and environmental covariates in STATA. The following environmental covariates were included in the spatial model: the normalized difference vegetation index, the enhanced vegetation index, the leaf area index, the land surface temperature for day and night, land use/landcover (LULC), distance to water bodies, and rainfall. The spatial model created suggests that the two main environmental covariates correlating with malaria presence were land surface temperature for day and rainfall. It was also found that malaria prevalence increased with distance to water bodies up to 4 km. The malaria risk map estimated from the spatial model shows that malaria prevalence in Nigeria varies from 20% in certain areas to 70% in others. The highest prevalence rates were found in the Niger Delta states of Rivers and Bayelsa, the areas surrounding the confluence of the rivers Niger and Benue, and also isolated parts of the north-eastern and north-western parts of the country. Isolated patches of low malaria prevalence were found to be scattered around the country with northern Nigeria having more such areas than the rest of the country. Nigeria's belt of middle regions generally has malaria prevalence of 40% and above. PMID:26618305
Hierarchical Bayesian Models of Subtask Learning
ERIC Educational Resources Information Center
Anglim, Jeromy; Wynton, Sarah K. A.
2015-01-01
The current study used Bayesian hierarchical methods to challenge and extend previous work on subtask learning consistency. A general model of individual-level subtask learning was proposed focusing on power and exponential functions with constraints to test for inconsistency. To study subtask learning, we developed a novel computer-based booking…
A Bayesian nonparametric meta-analysis model.
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G
2015-03-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall effect size, such models may be adequate, but for prediction, they surely are not if the effect-size distribution exhibits non-normal behavior. To address this issue, we propose a Bayesian nonparametric meta-analysis model, which can describe a wider range of effect-size distributions, including unimodal symmetric distributions, as well as skewed and more multimodal distributions. We demonstrate our model through the analysis of real meta-analytic data arising from behavioral-genetic research. We compare the predictive performance of the Bayesian nonparametric model against various conventional and more modern normal fixed-effects and random-effects models.
Del Monego, Maurici; Ribeiro, Paulo Justiniano; Ramos, Patrícia
2015-04-01
In this work, kriging with covariates is used to model and map the spatial distribution of salinity measurements gathered by an autonomous underwater vehicle in a sea outfall monitoring campaign, with the aim of distinguishing the effluent plume from the receiving waters and characterizing its spatial variability in the vicinity of the discharge. Four different geostatistical linear models for salinity were assumed, with the distance to the diffuser, the west-east positioning, and the south-north positioning used as covariates. Sample variograms were fitted by Matérn models using weighted least squares and maximum likelihood estimation methods as a way to detect eventual discrepancies. Typically, the maximum likelihood method estimated very low ranges, which limited the kriging process. So, at least for these data sets, weighted least squares proved to be the more appropriate estimation method for variogram fitting. The kriged maps clearly show the spatial variation of salinity, and it is possible to identify the effluent plume in the area studied. The results obtained offer some guidelines for sewage monitoring when a geostatistical analysis of the data is intended. It is important to treat properly the existence of anomalous values and to adopt a sampling strategy that includes transects parallel and perpendicular to the effluent dispersion. PMID:25345922
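Fitting a Matérn variogram model by weighted least squares, as the abstract describes, can be sketched as follows. This is a minimal illustration, not the authors' code: the empirical variogram values and pair counts are hypothetical, and Cressie-type weights (pair count over squared model value) are assumed.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.special import kv, gamma as gamma_fn

def matern_variogram(h, sill, rng, nu, nugget=0.0):
    """Matérn variogram: nugget + sill * (1 - Matérn correlation)."""
    h = np.asarray(h, dtype=float)
    out = np.full_like(h, nugget + sill)
    nz = h > 0
    u = np.sqrt(2.0 * nu) * h[nz] / rng
    corr = (2.0 ** (1.0 - nu) / gamma_fn(nu)) * (u ** nu) * kv(nu, u)
    out[nz] = nugget + sill * (1.0 - corr)
    return out

# Hypothetical empirical variogram: lag distances, estimates, pair counts
lags = np.array([0.5, 1.0, 2.0, 3.0, 5.0, 8.0])
gammas = np.array([0.20, 0.35, 0.55, 0.65, 0.72, 0.75])
npairs = np.array([120, 200, 260, 240, 180, 90])

def resid(p):
    sill, rng, nu = p
    model = matern_variogram(lags, sill, rng, nu)
    # Cressie-type WLS weights: sqrt(N(h)) / gamma_model(h)
    return np.sqrt(npairs) / model * (gammas - model)

fit = least_squares(resid, x0=[0.7, 2.0, 1.0],
                    bounds=([1e-6, 1e-6, 1e-6], [np.inf, np.inf, 5.0]))
sill, rng, nu = fit.x
```

The fitted range and smoothness can then be passed to a kriging routine; comparing this fit against a maximum likelihood fit is one way to detect the discrepancies the abstract mentions.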
Singdahlsen, D.S.
1991-06-01
The East Painter structure is a doubly plunging, asymmetric anticline formed on the hanging wall of a back-thrust imbricate near the leading edge of the Absaroka Thrust. The Jurassic Nugget Sandstone is the productive horizon in the East Painter structure. The approximately 900-ft-thick Nugget is a stratigraphically complex and heterogeneous unit deposited by eolian processes in a complex erg setting. The high degree of heterogeneity within the Nugget results from variations in grain size, sorting, mineralogy, and degree and distribution of lamination. The Nugget comprises dune, transitional toeset, and interdune facies, each exhibiting different porosity and permeability distributions. Facies architecture results in both vertical and horizontal stratification of the reservoir. Adequate representation of reservoir heterogeneity is the key to successful modeling of past and future production performance. In addition, a detailed geologic model, based on depositional environment, must be integrated into the simulation to ensure realistic results. Geostatistics provides a method for modeling the spatial reservoir property distribution while honoring all data values at their sample locations. Conditional simulation is a geostatistical technique that generates several equally probable realizations that observe the data and spatial constraints imposed upon them while including fractal variability. Flow simulations of multiple reservoir realizations can provide a probability distribution of reservoir performance that can be used to evaluate the risk associated with a project caused by incomplete sampling of the reservoir property distribution.
NASA Astrophysics Data System (ADS)
Michalak, Anna M.; Kitanidis, Peter K.
2004-08-01
As the incidence of groundwater contamination continues to grow, a number of inverse modeling methods have been developed to address forensic groundwater problems. In this work the geostatistical approach to inverse modeling is extended to allow for the recovery of the antecedent distribution of a contaminant at a given point back in time, which is critical to the assessment of historical exposure to contamination. Such problems are typically strongly underdetermined, with a large number of points at which the distribution is to be estimated. To address this challenge, the computational efficiency of the new method is increased through the application of the adjoint state method. In addition, the adjoint problem is presented in a format that allows for the reuse of existing groundwater flow and transport codes as modules in the inverse modeling algorithm. As demonstrated in the presented applications, the geostatistical approach combined with the adjoint state method allow for a historical multidimensional contaminant distribution to be recovered even in heterogeneous media, where a numerical solution is required for the forward problem.
A Bayesian approach for the multiplicative binomial regression model
NASA Astrophysics Data System (ADS)
Paraíba, Carolina C. M.; Diniz, Carlos A. R.; Pires, Rubiane M.
2012-10-01
In the present paper, we focus our attention on Altham's multiplicative binomial model under the Bayesian perspective, modeling both the probability of success and the dispersion parameters. We present results based on a simulated data set to assess the quality of Bayesian estimates and Bayesian diagnostics for model assessment.
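Altham's multiplicative binomial model augments the binomial probability mass function with a dispersion term theta**(x*(n-x)). A minimal sketch of the (normalized) pmf, with hypothetical parameter values:

```python
import numpy as np
from scipy.special import comb

def altham_pmf(n, p, theta):
    """Altham's multiplicative binomial pmf: binomial weights times an
    extra factor theta**(x*(n-x)); theta != 1 changes the dispersion
    relative to the ordinary binomial, and theta = 1 recovers it."""
    x = np.arange(n + 1)
    w = comb(n, x) * p ** x * (1.0 - p) ** (n - x) * theta ** (x * (n - x))
    return w / w.sum()

pmf = altham_pmf(10, 0.3, 1.0)  # theta = 1: ordinary binomial(10, 0.3)
```

In a Bayesian treatment such as the paper's, priors would be placed on both p and theta and the posterior explored by MCMC; the pmf above is the likelihood kernel.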
Normativity, interpretation, and Bayesian models
Oaksford, Mike
2014-01-01
It has been suggested that evaluative normativity should be expunged from the psychology of reasoning. A broadly Davidsonian response to these arguments is presented. It is suggested that two distinctions, between different types of rationality, are more permeable than this argument requires and that the fundamental objection is to selecting theories that make the most rational sense of the data. It is argued that this is an inevitable consequence of radical interpretation, where understanding others requires assuming they share our own norms of reasoning. This requires evaluative normativity, and it is shown that when asked to evaluate others' arguments, participants conform to rational Bayesian norms. It is suggested that logic and probability are not in competition and that the variety of norms is more limited than the arguments against evaluative normativity suppose. Moreover, the universality of belief ascription suggests that many of our norms are universal and hence evaluative. It is concluded that the union of evaluative normativity and descriptive psychology implicit in Davidson and apparent in the psychology of reasoning is a good thing. PMID:24860519
Rhea, L.; Person, M.; Marsily, G. de; Ledoux, E.; Galli, A.
1994-11-01
This paper critically evaluates the utility of two different geostatistical methods in tracing long-distance oil migration through sedimentary basins. Geostatistical models of petroleum migration based on kriging and the conditional simulation method are assessed by comparing them to "known" oil migration rates and directions through a numerical carrier bed. In this example, the numerical carrier bed, which serves as "ground truth" in the study, incorporates a synthetic permeability field generated using the method of turning bands. Different representations of lateral permeability heterogeneity of the carrier bed are incorporated into a quasi-three-dimensional model of secondary oil migration. The geometric configuration of the carrier bed is intended to represent migration conditions within the center of a saucer-shaped intracratonic sag basin. In all of the numerical experiments, oil is sourced in the lowest 10% of a saucer-shaped carrier bed and migrates 10-14 km outward in a radial fashion by buoyancy. The effects of vertical permeability variations on secondary oil migration were not considered in the study.
A comparative study of Gaussian geostatistical models and Gaussian Markov random field models
Song, Hae-Ryoung; Fuentes, Montserrat; Ghosh, Sujit
2008-01-01
Gaussian geostatistical models (GGMs) and Gaussian Markov random fields (GMRFs) are two distinct approaches commonly used in spatial models for modeling point referenced and areal data, respectively. In this paper, the relations between GGMs and GMRFs are explored based on approximations of GMRFs by GGMs, and approximations of GGMs by GMRFs. Two new metrics of approximation are proposed: (i) the Kullback-Leibler discrepancy of spectral densities and (ii) the chi-squared distance between spectral densities. The distances between the spectral density functions of GGMs and GMRFs measured by these metrics are minimized to obtain the approximations of GGMs and GMRFs. The proposed methodologies are validated through several empirical studies. We compare the performance of our approach to other methods based on covariance functions, in terms of the average mean squared prediction error and also the computational time. A spatial analysis of a dataset on PM2.5 collected in California is presented to illustrate the proposed method. PMID:19337581
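The Kullback-Leibler discrepancy between two spectral densities can be approximated numerically on a frequency grid. The sketch below uses a Whittle-type integrand, f1/f2 - log(f1/f2) - 1; the exact normalization and the rational example densities are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def spectral_kl(f1, f2, omega):
    """Approximate KL-type discrepancy between spectral densities f1, f2
    on a uniform frequency grid, using the Whittle-style integrand
    r - log(r) - 1 with r = f1/f2 (nonnegative, zero iff f1 == f2)."""
    r = f1(omega) / f2(omega)
    dw = omega[1] - omega[0]
    return float(np.sum(r - np.log(r) - 1.0) * dw)

# Example: two rational (Matérn-like) spectral densities with hypothetical scales
f_ggm = lambda w: 1.0 / (1.0 + w ** 2) ** 2
f_gmrf = lambda w: 1.0 / (1.2 + w ** 2) ** 2
grid = np.linspace(-10.0, 10.0, 2001)
d = spectral_kl(f_ggm, f_gmrf, grid)
```

Minimizing such a discrepancy over the GMRF's precision parameters yields the kind of spectral approximation the abstract describes.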
Hierarchical Bayesian model updating for structural identification
NASA Astrophysics Data System (ADS)
Behmanesh, Iman; Moaveni, Babak; Lombaert, Geert; Papadimitriou, Costas
2015-12-01
A new probabilistic finite element (FE) model updating technique based on Hierarchical Bayesian modeling is proposed for identification of civil structural systems under changing ambient/environmental conditions. The performance of the proposed technique is investigated for (1) uncertainty quantification of model updating parameters, and (2) probabilistic damage identification of the structural systems. Accurate estimation of the uncertainty in modeling parameters such as mass or stiffness is a challenging task. Several Bayesian model updating frameworks have been proposed in the literature that can successfully provide the "parameter estimation uncertainty" of model parameters with the assumption that there is no underlying inherent variability in the updating parameters. However, this assumption may not be valid for civil structures where structural mass and stiffness have inherent variability due to different sources of uncertainty such as changing ambient temperature, temperature gradient, wind speed, and traffic loads. Hierarchical Bayesian model updating is capable of predicting the overall uncertainty/variability of updating parameters by assuming time-variability of the underlying linear system. A general solution based on Gibbs Sampler is proposed to estimate the joint probability distributions of the updating parameters. The performance of the proposed Hierarchical approach is evaluated numerically for uncertainty quantification and damage identification of a 3-story shear building model. Effects of modeling errors and incomplete modal data are considered in the numerical study.
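A hierarchical Bayesian treatment of an updating parameter with inherent variability can be sketched with a toy Gibbs sampler for a normal-normal hierarchy. This is only an illustration of the hierarchical idea on hypothetical "stiffness" estimates, not the paper's FE updating framework; the priors and values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-test stiffness estimates (e.g., from repeated modal tests)
theta_hat = np.array([1.02, 0.98, 1.10, 0.95, 1.05])
s2 = 0.01 ** 2 * np.ones_like(theta_hat)  # known measurement variance per test

# Hierarchy: theta_t ~ N(mu, tau2) (inherent variability), theta_hat_t ~ N(theta_t, s2)
n_iter = 2000
mu, tau2 = theta_hat.mean(), theta_hat.var()
mus, tau2s = [], []
for it in range(n_iter):
    # latent per-test parameters | rest (conjugate normal update)
    post_var = 1.0 / (1.0 / s2 + 1.0 / tau2)
    post_mean = post_var * (theta_hat / s2 + mu / tau2)
    theta = rng.normal(post_mean, np.sqrt(post_var))
    # hyper-mean mu | theta (flat prior)
    mu = rng.normal(theta.mean(), np.sqrt(tau2 / len(theta)))
    # hyper-variance tau2 | theta, mu (inverse-gamma draw, vague prior)
    a = 0.5 * len(theta) + 1.0
    b = 0.5 * np.sum((theta - mu) ** 2) + 1e-6
    tau2 = b / rng.gamma(a)
    mus.append(mu)
    tau2s.append(tau2)
```

The posterior draws of tau2 quantify the overall variability of the updating parameter across conditions, which is the quantity a non-hierarchical updating scheme cannot separate from estimation uncertainty.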
Posterior Predictive Bayesian Phylogenetic Model Selection
Lewis, Paul O.; Xie, Wangang; Chen, Ming-Hui; Fan, Yu; Kuo, Lynn
2014-01-01
We present two distinctly different posterior predictive approaches to Bayesian phylogenetic model selection and illustrate these methods using examples from green algal protein-coding cpDNA sequences and flowering plant rDNA sequences. The Gelfand–Ghosh (GG) approach allows dissection of an overall measure of model fit into components due to posterior predictive variance (GGp) and goodness-of-fit (GGg), which distinguishes this method from the posterior predictive P-value approach. The conditional predictive ordinate (CPO) method provides a site-specific measure of model fit useful for exploratory analyses and can be combined over sites yielding the log pseudomarginal likelihood (LPML) which is useful as an overall measure of model fit. CPO provides a useful cross-validation approach that is computationally efficient, requiring only a sample from the posterior distribution (no additional simulation is required). Both GG and CPO add new perspectives to Bayesian phylogenetic model selection based on the predictive abilities of models and complement the perspective provided by the marginal likelihood (including Bayes Factor comparisons) based solely on the fit of competing models to observed data. [Bayesian; conditional predictive ordinate; CPO; L-measure; LPML; model selection; phylogenetics; posterior predictive.] PMID:24193892
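The CPO and LPML quantities described above have a simple Monte Carlo form: CPO for a site is the harmonic mean of its likelihood over posterior draws, and LPML is the sum of log CPO across sites. A minimal sketch (array shapes are assumptions):

```python
import numpy as np

def cpo_lpml(lik):
    """Conditional predictive ordinates and log pseudomarginal likelihood.
    `lik` is an (n_sites, n_draws) array of likelihoods p(y_i | theta_s)
    evaluated at posterior draws theta_s. CPO_i is the harmonic mean of
    row i; LPML = sum_i log CPO_i."""
    cpo = 1.0 / np.mean(1.0 / lik, axis=1)
    return cpo, float(np.sum(np.log(cpo)))

# Toy usage: 3 sites, 100 posterior draws, constant likelihood 0.5
lik = np.full((3, 100), 0.5)
cpo, lpml = cpo_lpml(lik)
```

As the abstract notes, only a sample from the posterior is needed; no extra simulation is required, though in practice the harmonic mean can be numerically unstable when some draws have very small likelihood.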
Piel, Frédéric B; Patil, Anand P; Howes, Rosalind E; Nyangiri, Oscar A; Gething, Peter W; Dewi, Mewahyu; Temperley, William H; Williams, Thomas N; Weatherall, David J; Hay, Simon I
2013-01-01
Background: Reliable estimates of populations affected by diseases are necessary to guide efficient allocation of public health resources. Sickle haemoglobin (HbS) is the most common and clinically significant haemoglobin structural variant, but no contemporary estimates exist of the global populations affected. Moreover, the precision of available national estimates of heterozygous (AS) and homozygous (SS) neonates is unknown. We aimed to provide evidence-based estimates at various scales, with uncertainty measures. Methods: Using a database of sickle haemoglobin surveys, we created a contemporary global map of HbS allele frequency distribution within a Bayesian geostatistical model. The pairing of this map with demographic data enabled calculation of global, regional, and national estimates of the annual number of AS and SS neonates. Subnational estimates were also calculated in data-rich areas. Findings: Our map shows subnational spatial heterogeneities and high allele frequencies across most of sub-Saharan Africa, the Middle East, and India, as well as gene flow following migrations to western Europe and the eastern coast of the Americas. Accounting for local heterogeneities and demographic factors, we estimated that the global number of neonates affected by HbS in 2010 included 5 476 000 (IQR 5 291 000-5 679 000) AS neonates and 312 000 (294 000-330 000) SS neonates. These global estimates are higher than previous conservative estimates. Important differences predicted at the national level are discussed. Interpretation: HbS will have an increasing effect on public health systems. Our estimates can help countries and the international community gauge the need for appropriate diagnoses and genetic counselling to reduce the number of neonates affected. Similar mapping and modelling methods could be used for other inherited disorders. Funding: The Wellcome Trust. PMID:23103089
Local Geostatistical Models and Big Data in Hydrological and Ecological Applications
NASA Astrophysics Data System (ADS)
Hristopulos, Dionissios
2015-04-01
The advent of the big data era creates new opportunities for environmental and ecological modelling but also presents significant challenges. The availability of remote sensing images and low-cost wireless sensor networks means that spatiotemporal environmental data now cover larger spatial domains at higher spatial and temporal resolution and for longer time windows. Handling such voluminous data presents several technical and scientific challenges. In particular, the geostatistical methods used to process spatiotemporal data need to overcome the dimensionality curse associated with the need to store and invert large covariance matrices. There are various mathematical approaches for addressing the dimensionality problem, including change of basis, dimensionality reduction, hierarchical schemes, and local approximations. We present a Stochastic Local Interaction (SLI) model that can be used to model local correlations in spatial data. SLI is a random field model suitable for data on discrete supports (i.e., regular lattices or irregular sampling grids). The degree of localization is determined by means of kernel functions and appropriate bandwidths. The strength of the correlations is determined by means of coefficients. In the "plain vanilla" version the parameter set involves scale and rigidity coefficients as well as a characteristic length. The latter, in connection with the rigidity coefficient, determines the correlation length of the random field. The SLI model is based on statistical field theory and extends previous research on Spartan spatial random fields [2,3] from continuum spaces to explicitly discrete supports. The SLI kernel functions employ adaptive bandwidths learned from the sampling spatial distribution [1]. The SLI precision matrix is expressed explicitly in terms of the model parameters and the kernel function. Hence, covariance matrix inversion is not necessary for parameter inference, which is based on leave-one-out cross validation. This property
Modelling ambient ozone in an urban area using an objective model and geostatistical algorithms
NASA Astrophysics Data System (ADS)
Moral, Francisco J.; Rebollo, Francisco J.; Valiente, Pablo; López, Fernando; Muñoz de la Peña, Arsenio
2012-12-01
Ground-level tropospheric ozone is one of the air pollutants of most concern. Ozone levels continue to exceed both target values and the long-term objectives established in EU legislation to protect human health and prevent damage to ecosystems, agricultural crops and materials. Researchers and decision-makers frequently need information about atmospheric pollution patterns in urbanized areas. The preparation of this type of information is a complex task, due to the influence of several factors and their variability over time. In this work, some results of urban ozone distribution patterns in the city of Badajoz, which is the largest (140,000 inhabitants) and most industrialized city in the Extremadura region (southwest Spain), are shown. Twelve sampling campaigns, one per month, were carried out with an automatic portable analyzer to measure ambient air ozone concentrations, during periods selected for conditions favourable to ozone production. Later, to evaluate the overall ozone level at each sampling location during the time interval considered, the measured ozone data were analysed using a new methodology based on the formulation of the Rasch model. As a result, a measure of overall ozone level that consolidates the monthly ground-level ozone measurements was obtained, along with information about the influence of each monthly ozone measure on the overall ozone level. Finally, the overall ozone level at locations where no measurements were available was estimated with geostatistical techniques, and hazard assessment maps based on the spatial distribution of ozone were also generated.
Bayesian Recurrent Neural Network for Language Modeling.
Chien, Jen-Tzung; Ku, Yuan-Chu
2016-02-01
A language model (LM) calculates the probability of a word sequence and provides the solution to word prediction for a variety of information systems. A recurrent neural network (RNN) is powerful in learning the large-span dynamics of a word sequence in continuous space. However, the training of the RNN-LM is an ill-posed problem because of too many parameters, arising from a large dictionary size and a high-dimensional hidden layer. This paper presents a Bayesian approach to regularize the RNN-LM and applies it to continuous speech recognition. We aim to penalize an overly complicated RNN-LM by compensating for the uncertainty of the estimated model parameters, which is represented by a Gaussian prior. The objective function in a Bayesian classification network is formed as the regularized cross-entropy error function. The regularized model is constructed not only by calculating the regularized parameters according to the maximum a posteriori criterion but also by estimating the Gaussian hyperparameter by maximizing the marginal likelihood. A rapid approximation to the Hessian matrix is developed to implement the Bayesian RNN-LM (BRNN-LM) by selecting a small set of salient outer products. The proposed BRNN-LM achieves a sparser model than the RNN-LM. Experiments on different corpora show the robustness of system performance when applying the rapid BRNN-LM under different conditions.
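The regularized cross-entropy criterion described above, a Gaussian prior on the weights added to the negative log-likelihood, reduces to cross-entropy plus an L2 penalty. A minimal sketch with hypothetical predictive probabilities and weight matrices (not the paper's network):

```python
import numpy as np

def map_objective(probs, targets, weights, alpha):
    """MAP training criterion: mean negative log-probability of the correct
    words plus a Gaussian-prior (L2) penalty on the weights; `alpha` plays
    the role of the prior precision hyperparameter."""
    nll = -np.mean(np.log(probs[np.arange(len(targets)), targets]))
    penalty = 0.5 * alpha * sum(np.sum(w ** 2) for w in weights)
    return nll + penalty

# Toy usage: two prediction steps over a 3-word vocabulary
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
targets = np.array([0, 1])
weights = [np.array([[0.5, -0.5]]), np.array([0.1])]
loss = map_objective(probs, targets, weights, alpha=0.01)
```

In the paper's evidence framework, alpha itself would be re-estimated by maximizing the marginal likelihood via the Hessian approximation, rather than fixed by hand as here.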
Geostatistical three-dimensional modeling of oolite shoals, St. Louis Limestone, southwest Kansas
Qi, L.; Carr, T.R.; Goldstein, R.H.
2007-01-01
In the Hugoton embayment of southwestern Kansas, reservoirs composed of relatively thin (<4 m; <13.1 ft) oolitic deposits within the St. Louis Limestone have produced more than 300 million bbl of oil. The geometry and distribution of oolitic deposits control the heterogeneity of the reservoirs, resulting in exploration challenges and relatively low recovery. Geostatistical three-dimensional (3-D) models were constructed to quantify the geometry and spatial distribution of oolitic reservoirs, and the continuity of flow units within Big Bow and Sand Arroyo Creek fields. Lithofacies in uncored wells were predicted from digital logs using a neural network. The tilting effect from the Laramide orogeny was removed to construct restored structural surfaces at the time of deposition. Well data and structural maps were integrated to build 3-D models of oolitic reservoirs using stochastic simulations with geometry data. Three-dimensional models provide insights into the distribution, the external and internal geometry of oolitic deposits, and the sedimentologic processes that generated reservoir intervals. The structural highs and general structural trend had a significant impact on the distribution and orientation of the oolitic complexes. The depositional pattern and connectivity analysis suggest an overall aggradation of shallow-marine deposits during pulses of relative sea level rise followed by deepening near the top of the St. Louis Limestone. Cemented oolitic deposits were modeled as barriers and baffles and tend to concentrate at the edge of oolitic complexes. Spatial distribution of porous oolitic deposits controls the internal geometry of rock properties. Integrated geostatistical modeling methods can be applicable to other complex carbonate or siliciclastic reservoirs in shallow-marine settings. Copyright © 2007. The American Association of Petroleum Geologists. All rights reserved.
Goovaerts, Pierre; Trinh, Hoa T; Demond, Avery H; Towey, Timothy; Chang, Shu-Chi; Gwinn, Danielle; Hong, Biling; Franzblau, Alfred; Garabrant, David; Gillespie, Brenda W; Lepkowski, James; Adriaens, Peter
2008-05-15
A key component in any investigation of cause-effect relationships between point source pollution, such as an incinerator, and human health is the availability of measurements and/or accurate models of exposure at the same scale or geography as the health data. Geostatistics allows one to simulate the spatial distribution of pollutant concentrations over various spatial supports while incorporating both field data and predictions of deterministic dispersion models. This methodology was used in a companion paper to identify the census blocks that have a high probability of exceeding a given level of dioxin TEQ (toxic equivalents) around an incinerator in Midland, MI. This geostatistical model, along with population data, provided guidance for the collection of 51 new soil samples, which permitted verification of the geostatistical predictions and calibration of the model. Each new soil measurement was compared to the set of 100 TEQ values simulated at the closest grid node. The correlation between the measured concentration and the averaged simulated value is moderate (0.44), and the actual concentrations are clearly overestimated in the vicinity of the plant property line. Nevertheless, probability intervals computed from simulated TEQ values provide an accurate model of uncertainty: the proportion of observations that fall within these intervals exceeds what is expected from the model. Simulation-based probability intervals are also narrower than the intervals derived from the global histogram of the data, which demonstrates the greater precision of the geostatistical approach. Log-normal ordinary kriging provided fairly similar estimation results for the small and well-sampled area used in this validation study; however, the model of uncertainty was not always accurate. The regression analysis and geostatistical simulation were then conducted using the combined set of 53 original and 51 new soil samples, leading to an updated model for the spatial distribution of
Bayesian population modeling of drug dosing adherence.
Fellows, Kelly; Stoneking, Colin J; Ramanathan, Murali
2015-10-01
Adherence is a frequent contributing factor to variations in drug concentrations and efficacy. The purpose of this work was to develop an integrated population model to describe variation in adherence, dose-timing deviations, overdosing and persistence to dosing regimens. The hybrid Markov chain-von Mises method for modeling adherence in individual subjects was extended to the population setting using a Bayesian approach. Four integrated population models for overall adherence, the two-state Markov chain transition parameters, dose-timing deviations, overdosing and persistence were formulated and critically compared. The Markov chain-Monte Carlo algorithm was used for identifying distribution parameters and for simulations. The model was challenged with medication event monitoring system data for 207 hypertension patients. The four Bayesian models demonstrated good mixing and convergence characteristics. The distributions of adherence, dose-timing deviations, overdosing and persistence were markedly non-normal and diverse. The models varied in complexity and the method used to incorporate inter-dependence with the preceding dose in the two-state Markov chain. The model that incorporated a cooperativity term for inter-dependence and a hyperbolic parameterization of the transition matrix probabilities was identified as the preferred model over the alternatives. The simulated probability densities from the model satisfactorily fit the observed probability distributions of adherence, dose-timing deviations, overdosing and persistence parameters in the sample patients. The model also adequately described the median and observed quartiles for these parameters. The Bayesian model for adherence provides a parsimonious, yet integrated, description of adherence in populations. It may find potential applications in clinical trial simulations and pharmacokinetic-pharmacodynamic modeling. PMID:26319548
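The two-state Markov chain with von Mises dose-timing deviations can be simulated directly. The sketch below is a toy forward simulation with hypothetical transition probabilities and concentration parameter, not the paper's fitted population model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state Markov chain: 1 = dose taken, 0 = dose missed (hypothetical values)
p11, p01 = 0.95, 0.60  # P(taken | prev taken), P(taken | prev missed)
n_doses = 365

states = np.empty(n_doses, dtype=int)
states[0] = 1
for t in range(1, n_doses):
    p = p11 if states[t - 1] == 1 else p01
    states[t] = rng.random() < p  # Bernoulli draw with state-dependent probability

# von Mises dose-timing deviation, as an angle around the nominal 24 h dosing cycle
kappa = 8.0                          # concentration: larger = more punctual dosing
dev = rng.vonmises(0.0, kappa, size=n_doses)
hours_dev = dev * 24.0 / (2.0 * np.pi)  # convert radians to hours of deviation

adherence = states.mean()  # fraction of doses taken over the year
```

In the population extension described above, the subject-level parameters (p11, p01, kappa) would themselves be given Bayesian population distributions and identified by MCMC from dosing records.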
Bayesian model selection analysis of WMAP3
Parkinson, David; Mukherjee, Pia; Liddle, Andrew R.
2006-06-15
We present a Bayesian model selection analysis of WMAP3 data using our code CosmoNest. We focus on the density perturbation spectral index n_s and the tensor-to-scalar ratio r, which define the plane of slow-roll inflationary models. We find that while the Bayesian evidence supports the conclusion that n_s ≠ 1, the data are not yet powerful enough to do so at a strong or decisive level. If tensors are assumed absent, the current odds are approximately 8 to 1 in favor of n_s ≠ 1 under our assumptions, when WMAP3 data is used together with external data sets. WMAP3 data on its own is unable to distinguish between the two models. Further, inclusion of r as a parameter weakens the conclusion against the Harrison-Zel'dovich case (n_s = 1, r = 0), albeit in a prior-dependent way. In appendices we describe the CosmoNest code in detail, noting its ability to supply posterior samples as well as to accurately compute the Bayesian evidence. We make a first public release of CosmoNest, now available at www.cosmonest.org.
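The "8 to 1" odds quoted above correspond, for equal model priors, to a difference in log Bayesian evidence of about ln 8 ≈ 2.08. A one-line conversion, shown here only to make that relationship concrete:

```python
import math

def odds_from_delta_lnZ(delta_lnZ):
    """Posterior odds between two models with equal prior probabilities:
    the Bayes factor exp(delta ln Z), where delta ln Z is the difference
    in log evidence computed by a nested-sampling code such as CosmoNest."""
    return math.exp(delta_lnZ)

odds = odds_from_delta_lnZ(math.log(8.0))  # the "8 to 1" case quoted above
```

On the Jeffreys scale commonly used for such comparisons, |delta ln Z| of about 2 is positive but, as the abstract notes, short of strong (around 2.5) or decisive (around 5) evidence.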
Enhancing multiple-point geostatistical modeling: 1. Graph theory and pattern adjustment
NASA Astrophysics Data System (ADS)
Tahmasebi, Pejman; Sahimi, Muhammad
2016-03-01
In recent years, higher-order geostatistical methods have been used for modeling of a wide variety of large-scale porous media, such as groundwater aquifers and oil reservoirs. Their popularity stems from their ability to account for qualitative data and the great flexibility that they offer for conditioning the models to hard (quantitative) data, which endow them with the capability for generating realistic realizations of porous formations with very complex channels, as well as features that are mainly a barrier to fluid flow. One group of such models consists of pattern-based methods that use a set of data points for generating stochastic realizations by which the large-scale structure and highly-connected features are reproduced accurately. The cross correlation-based simulation (CCSIM) algorithm, proposed previously by the authors, is a member of this group that has been shown to be capable of simulating multimillion cell models in a matter of a few CPU seconds. The method is, however, sensitive to the specifications of the patterns, such as their boundaries and the number of replicates. In this paper the original CCSIM algorithm is reconsidered and two significant improvements are proposed for accurately reproducing large-scale patterns of heterogeneities in porous media. First, an effective boundary-correction method based on graph theory is presented, by which one identifies the optimal cutting path/surface for removing the patchiness and discontinuities in the realization of a porous medium. Next, a new pattern adjustment method is proposed that automatically transfers the features in a pattern to one that seamlessly matches the surrounding patterns. The original CCSIM algorithm is then combined with the two methods and is tested using various complex two- and three-dimensional examples. It should, however, be emphasized that the methods that we propose in this paper are applicable to other pattern-based geostatistical simulation methods.
Bayesian structural equation modeling in sport and exercise psychology.
Stenling, Andreas; Ivarsson, Andreas; Johnson, Urban; Lindwall, Magnus
2015-08-01
Bayesian statistics is on the rise in mainstream psychology, but applications in sport and exercise psychology research are scarce. In this article, the foundations of Bayesian analysis are introduced, and we illustrate how to apply Bayesian structural equation modeling in a sport and exercise psychology setting. More specifically, we contrasted a confirmatory factor analysis on the Sport Motivation Scale II estimated with the most commonly used estimator, maximum likelihood, and a Bayesian approach with weakly informative priors for cross-loadings and correlated residuals. The results indicated that the model with Bayesian estimation and weakly informative priors provided a good fit to the data, whereas the model estimated with a maximum likelihood estimator did not produce a well-fitting model. The reasons for this discrepancy between maximum likelihood and Bayesian estimation are discussed, as well as potential advantages and caveats of the Bayesian approach. PMID:26442771
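The contrast between fixing cross-loadings at zero (as conventional ML-CFA does) and giving them weakly informative priors can be illustrated with a toy normal-normal shrinkage calculation. The function name and the conjugate one-parameter simplification are mine; real Bayesian SEM estimates all parameters jointly by MCMC.

```python
def shrunk_loading(lam_hat, se, prior_sd):
    """Posterior mean/variance of a cross-loading under a weakly
    informative N(0, prior_sd^2) prior, given a data estimate lam_hat
    with standard error se (normal-normal conjugate update)."""
    w_data, w_prior = 1.0 / se**2, 1.0 / prior_sd**2
    post_var = 1.0 / (w_data + w_prior)
    post_mean = lam_hat * w_data * post_var  # pulled toward the prior mean 0
    return post_mean, post_var
```

A tight `prior_sd` behaves almost like fixing the loading at zero, a large one like freeing it entirely; the weakly informative middle ground is what the abstract describes.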
NASA Astrophysics Data System (ADS)
Goovaerts, P.; Avruskin, G.; Meliker, J.; Slotnick, M.; Jacquez, G.; Nriagu, J.
2003-12-01
Assessment of the health risks associated with exposure to elevated levels of arsenic in drinking water has become the subject of considerable interest and some controversy in both regulatory and public health communities. The objective of this research is to explore the factors that have contributed to the observed geographic co-clustering in bladder cancer mortality and arsenic concentrations in drinking water in Michigan. A cornerstone is the building of a probabilistic space-time model of arsenic concentrations, accounting for information collected at private residential wells and the hydrogeochemistry of the area. Because of the small changes in concentration observed in time, the study has focused on the spatial variability of arsenic, which can be considerable over very short distances. Various geostatistical techniques, based either on lognormal or indicator transforms of the data to accommodate the highly skewed distribution, have been compared using a cross-validation procedure. The most promising approach involves a soft indicator coding of arsenic measurements, which allows one to account for data below the detection limit and the magnitude of measurement errors. Prior probabilities of exceeding various arsenic thresholds are also derived from secondary information, such as type of bedrock, surficial material, and well casing depth, using logistic regression. Both well and secondary data are combined using kriging, leading to a non-parametric assessment of the uncertainty attached to the arsenic concentration at each node of a 500 m grid. This geostatistical model can be used to map the expected arsenic concentration, the probability that it exceeds any given threshold, or the variance of the prediction, indicating where supplementary information should be collected. The accuracy and precision of these local probability distributions are assessed using cross-validation.
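A minimal sketch of the soft indicator coding described above, assuming Gaussian measurement error. The function names and the erf-based coding are illustrative assumptions; the study's actual coding also folds in detection limits and secondary data.

```python
import math

def soft_indicator(value, threshold, sd=None):
    """Soft indicator coding of one measurement at one threshold:
    probability that the true concentration is <= threshold, given a
    measured value with Gaussian error sd (hard 0/1 coding when sd is None)."""
    if not sd:
        return 1.0 if value <= threshold else 0.0
    z = (threshold - value) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_indicator(detection_limit, threshold):
    """A non-detect is certainly below any threshold at or above the
    detection limit; below the limit the coding is left undefined here."""
    return 1.0 if threshold >= detection_limit else None
```

These per-datum probabilities are what indicator (co)kriging then interpolates to a probability of exceedance at unsampled grid nodes.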
Bayesian residual analysis for beta-binomial regression models
NASA Astrophysics Data System (ADS)
Pires, Rubiane Maria; Diniz, Carlos Alberto Ribeiro
2012-10-01
The beta-binomial regression model is an alternative model for the sum of a sequence of equicorrelated binary variables with common probability of success p. In this work a Bayesian perspective of this model is presented, considering different link functions and different correlation structures. A general Bayesian residual analysis for this model, an issue which is often neglected in Bayesian analysis, is presented, using the residuals based on the predicted values obtained by the conditional predictive ordinate [1], the residuals based on the posterior distribution of the model parameters [2], and the Bayesian deviance residual [3], in order to check the assumptions of the model.
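For reference, the beta-binomial likelihood underlying this model can be evaluated with log-gamma functions. The helper names below are my own; the extra parameter pair (a, b) gives the trials a common success probability with mean a/(a+b) and induces a pairwise correlation of 1/(a+b+1).

```python
from math import lgamma, exp

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_pmf(k, n, a, b):
    """Beta-binomial pmf: a binomial whose success probability is
    Beta(a, b)-distributed, which induces equicorrelation among trials."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + log_beta(k + a, n - k + b) - log_beta(a, b))
```

A regression version, as in the abstract, links a/(a+b) to covariates through a chosen link function.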
Bayesian Kinematic Finite Fault Source Models (Invited)
NASA Astrophysics Data System (ADS)
Minson, S. E.; Simons, M.; Beck, J. L.
2010-12-01
Finite fault earthquake source models are inherently under-determined: there is no unique solution to the inverse problem of determining the rupture history at depth as a function of time and space when our data are only limited observations at the Earth's surface. Traditional inverse techniques rely on model constraints and regularization to generate one model from the possibly broad space of all possible solutions. However, Bayesian methods allow us to determine the ensemble of all possible source models which are consistent with the data and our a priori assumptions about the physics of the earthquake source. Until now, Bayesian techniques have been of limited utility because they are computationally intractable for problems with as many free parameters as kinematic finite fault models. We have developed a methodology called Cascading Adaptive Tempered Metropolis In Parallel (CATMIP) which allows us to sample very high-dimensional problems in a parallel computing framework. The CATMIP algorithm combines elements of simulated annealing and genetic algorithms with the Metropolis algorithm to dynamically optimize the algorithm's efficiency as it runs. We will present synthetic performance tests of finite fault models made with this methodology as well as a kinematic source model for the 2007 Mw 7.7 Tocopilla, Chile earthquake. This earthquake was well recorded by multiple ascending and descending interferograms and a network of high-rate GPS stations whose records can be used as near-field seismograms.
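The tempering idea behind CATMIP — targeting prior(x) · likelihood(x)^β and raising β from 0 to 1 — can be sketched with a one-dimensional random-walk Metropolis kernel. This is a toy illustration under my own naming; CATMIP additionally resamples and adapts across many parallel chains in very high dimension.

```python
import math
import random

def tempered_metropolis(logprior, loglike, beta, x0, n_steps, step=0.5, seed=1):
    """Random-walk Metropolis targeting prior(x) * likelihood(x)**beta.
    beta=0 samples the prior; beta=1 samples the full posterior."""
    rng = random.Random(seed)
    x = x0
    logp = logprior(x) + beta * loglike(x)
    samples = []
    for _ in range(n_steps):
        xp = x + rng.gauss(0.0, step)          # symmetric proposal
        logp_p = logprior(xp) + beta * loglike(xp)
        if math.log(rng.random()) < logp_p - logp:
            x, logp = xp, logp_p               # accept
        samples.append(x)
    return samples
```

With prior N(0, 3²) and likelihood N(2, 1), the β=1 target is the conjugate posterior with mean 1.8, which the chain recovers.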
Posterior predictive Bayesian phylogenetic model selection.
Lewis, Paul O; Xie, Wangang; Chen, Ming-Hui; Fan, Yu; Kuo, Lynn
2014-05-01
We present two distinctly different posterior predictive approaches to Bayesian phylogenetic model selection and illustrate these methods using examples from green algal protein-coding cpDNA sequences and flowering plant rDNA sequences. The Gelfand-Ghosh (GG) approach allows dissection of an overall measure of model fit into components due to posterior predictive variance (GGp) and goodness-of-fit (GGg), which distinguishes this method from the posterior predictive P-value approach. The conditional predictive ordinate (CPO) method provides a site-specific measure of model fit useful for exploratory analyses and can be combined over sites yielding the log pseudomarginal likelihood (LPML) which is useful as an overall measure of model fit. CPO provides a useful cross-validation approach that is computationally efficient, requiring only a sample from the posterior distribution (no additional simulation is required). Both GG and CPO add new perspectives to Bayesian phylogenetic model selection based on the predictive abilities of models and complement the perspective provided by the marginal likelihood (including Bayes Factor comparisons) based solely on the fit of competing models to observed data. PMID:24193892
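As the abstract notes, CPO needs only a sample from the posterior: CPO_i is the harmonic mean of the likelihood of observation i over posterior draws, and LPML is the sum of the log CPOs. A minimal sketch, assuming the per-observation log-likelihoods are stored as a (draws × observations) array:

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log-sum-exp along axis 0."""
    m = a.max(axis=0)
    return m + np.log(np.exp(a - m).sum(axis=0))

def cpo_lpml(loglik):
    """loglik[s, i] = log p(y_i | theta_s) for posterior draw s.
    CPO_i = S / sum_s exp(-loglik[s, i]) (harmonic mean of likelihoods);
    LPML = sum_i log CPO_i, an overall cross-validatory fit measure."""
    n_draws = loglik.shape[0]
    log_cpo = np.log(n_draws) - logsumexp(-loglik)
    return log_cpo, log_cpo.sum()
```

Models can then be ranked by LPML, larger being better, complementing marginal-likelihood comparisons.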
Moving beyond qualitative evaluations of Bayesian models of cognition.
Hemmer, Pernille; Tauber, Sean; Steyvers, Mark
2015-06-01
Bayesian models of cognition provide a powerful way to understand the behavior and goals of individuals from a computational point of view. Much of the focus in the Bayesian cognitive modeling approach has been on qualitative model evaluations, where predictions from the models are compared to data that is often averaged over individuals. In many cognitive tasks, however, there are pervasive individual differences. We introduce an approach to directly infer individual differences related to subjective mental representations within the framework of Bayesian models of cognition. In this approach, Bayesian data analysis methods are used to estimate cognitive parameters and motivate the inference process within a Bayesian cognitive model. We illustrate this integrative Bayesian approach on a model of memory. We apply the model to behavioral data from a memory experiment involving the recall of heights of people. A cross-validation analysis shows that the Bayesian memory model with inferred subjective priors predicts withheld data better than a Bayesian model where the priors are based on environmental statistics. In addition, the model with inferred priors at the individual subject level led to the best overall generalization performance, suggesting that individual differences are important to consider in Bayesian models of cognition.
A Bayesian Shrinkage Approach for AMMI Models.
da Silva, Carlos Pereira; de Oliveira, Luciano Antonio; Nuvunga, Joel Jorge; Pamplona, Andrezza Kéllen Alves; Balestre, Marcio
2015-01-01
Linear-bilinear models, especially the additive main effects and multiplicative interaction (AMMI) model, are widely applicable to genotype-by-environment interaction (GEI) studies in plant breeding programs. These models allow a parsimonious modeling of GE interactions, retaining a small number of principal components in the analysis. However, one aspect of the AMMI model that is still debated is the selection criteria for determining the number of multiplicative terms required to describe the GE interaction pattern. Shrinkage estimators have been proposed as selection criteria for the GE interaction components. In this study, a Bayesian approach was combined with the AMMI model with shrinkage estimators for the principal components. A total of 55 maize genotypes were evaluated in nine different environments using a complete blocks design with three replicates. The results show that the traditional Bayesian AMMI model produces low shrinkage of singular values but avoids the usual pitfalls in determining the credible intervals in the biplot. On the other hand, Bayesian shrinkage AMMI models have difficulty with the credible interval for model parameters, but produce stronger shrinkage of the principal components, converging to GE matrices that have more shrinkage than those obtained using mixed models. This characteristic allowed more parsimonious models to be chosen, with more of the GEI pattern retained in the first two components; the models selected were similar to those obtained by the Cornelius F-test (α = 0.05) in traditional AMMI models and by leave-one-out cross-validation. The resulting model chosen by the posterior distribution of the singular values was also similar to those produced by the cross-validation approach in traditional AMMI models. Our method enables the estimation of credible intervals for the AMMI biplot plus the choice of AMMI model based on direct posterior
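The classical (non-Bayesian) AMMI decomposition that this work builds on is just additive main effects plus an SVD of the doubly centred interaction matrix. A sketch for orientation — the shrinkage priors of the paper are not modeled here, and the function name is mine:

```python
import numpy as np

def ammi_decompose(y, k):
    """Classical AMMI fit of a genotype-by-environment mean table y:
    grand mean + main effects + rank-k SVD of the GEI residual."""
    mu = y.mean()
    g = y.mean(axis=1) - mu                     # genotype main effects
    e = y.mean(axis=0) - mu                     # environment main effects
    resid = y - mu - g[:, None] - e[None, :]    # doubly centred GEI matrix
    u, s, vt = np.linalg.svd(resid, full_matrices=False)
    gei_k = (u[:, :k] * s[:k]) @ vt[:k]         # keep k multiplicative terms
    fitted = mu + g[:, None] + e[None, :] + gei_k
    return fitted, s
```

The debated question in the abstract is precisely how many singular values `s` to retain; the Bayesian shrinkage approach lets the posterior pull the unneeded ones toward zero instead of truncating by a fixed rule.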
Model Comparison of Bayesian Semiparametric and Parametric Structural Equation Models
ERIC Educational Resources Information Center
Song, Xin-Yuan; Xia, Ye-Mao; Pan, Jun-Hao; Lee, Sik-Yum
2011-01-01
Structural equation models have wide applications. One of the most important issues in analyzing structural equation models is model comparison. This article proposes a Bayesian model comparison statistic, namely the "L[subscript nu]"-measure for both semiparametric and parametric structural equation models. For illustration purposes, we consider…
A Nonparametric Bayesian Model for Nested Clustering.
Lee, Juhee; Müller, Peter; Zhu, Yitan; Ji, Yuan
2016-01-01
We propose a nonparametric Bayesian model for clustering where clusters of experimental units are determined by a shared pattern of clustering another set of experimental units. The proposed model is motivated by the analysis of protein activation data, where we cluster proteins such that all proteins in one cluster give rise to the same clustering of patients. That is, we define clusters of proteins by the way that patients group with respect to the corresponding protein activations. This is in contrast to (almost) all currently available models that use shared parameters in the sampling model to define clusters. This includes in particular model based clustering, Dirichlet process mixtures, product partition models, and more. We show results for two typical biostatistical inference problems that give rise to clustering. PMID:26519174
Model feedback in Bayesian propensity score estimation.
Zigler, Corwin M; Watts, Krista; Yeh, Robert W; Wang, Yun; Coull, Brent A; Dominici, Francesca
2013-03-01
Methods based on the propensity score comprise one set of valuable tools for comparative effectiveness research and for estimating causal effects more generally. These methods typically consist of two distinct stages: (1) a propensity score stage where a model is fit to predict the propensity to receive treatment (the propensity score), and (2) an outcome stage where responses are compared in treated and untreated units having similar values of the estimated propensity score. Traditional techniques conduct estimation in these two stages separately; estimates from the first stage are treated as fixed and known for use in the second stage. Bayesian methods have natural appeal in these settings because separate likelihoods for the two stages can be combined into a single joint likelihood, with estimation of the two stages carried out simultaneously. One key feature of joint estimation in this context is "feedback" between the outcome stage and the propensity score stage, meaning that quantities in a model for the outcome contribute information to posterior distributions of quantities in the model for the propensity score. We provide a rigorous assessment of Bayesian propensity score estimation to show that model feedback can produce poor estimates of causal effects absent strategies that augment propensity score adjustment with adjustment for individual covariates. We illustrate this phenomenon with a simulation study and with a comparative effectiveness investigation of carotid artery stenting versus carotid endarterectomy among 123,286 Medicare beneficiaries hospitalized for stroke in 2006 and 2007. PMID:23379793
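The two distinct stages — and the absence of feedback when they are estimated separately — can be sketched for a discrete covariate. This is a deliberately minimal stand-in of my own: real propensity analyses fit a regression model in stage 1, and each stratum here must contain both treated and control units.

```python
import numpy as np

def two_stage_ate(x, t, y):
    """Separate two-stage propensity estimate with a discrete covariate x,
    binary treatment t, and outcome y.
    Stage 1: estimate e(x) = P(T=1 | x) by within-group frequency,
    with no information flowing back from the outcome model.
    Stage 2: stratify on the fixed estimated score and average
    treated-minus-control outcome differences across strata."""
    ps = np.array([t[x == xi].mean() for xi in x])  # stage 1, then frozen
    ate = 0.0
    for v in np.unique(ps):
        s = ps == v
        diff = y[s & (t == 1)].mean() - y[s & (t == 0)].mean()
        ate += diff * s.mean()                      # stratum-size weight
    return ate
```

Joint Bayesian estimation replaces the frozen `ps` with a posterior updated by both likelihoods, which is where the feedback the abstract warns about enters.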
BAYESIAN MODEL DETERMINATION FOR GEOSTATISTICAL REGRESSION MODELS. (R829095C001)
The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...
Bayesian model comparison of solar flare spectra
NASA Astrophysics Data System (ADS)
Ireland, J.; Holman, G.
2012-12-01
The detailed understanding of solar flares requires an understanding of the physics of accelerated electrons, since electrons carry a large fraction of the total energy released in a flare. Hard X-ray energy flux spectral observations of solar flares can be fit with different parameterized models of the interaction of the flare-accelerated electrons with the solar plasma. Each model describes different possible physical effects that may occur in solar flares. Bayesian model comparison provides a technique for assessing which model best describes the data. The advantage of this technique over others is that it can fully account for the different number and type of parameters in each model. We demonstrate this using Ramaty High Energy Solar Spectroscopic Imager (RHESSI) spectral data from the GOES (Geostationary Operational Environmental Satellite) X4.8 flare of 23-July-2002. We suggest that the observed spectrum can be reproduced using two different parameterized models of the flare electron content. The first model assumes that the flare-accelerated electron spectrum consists of a single power law, with a fixed low-energy cutoff assumed to be below the range of fitted X-ray energies, interacting with a non-uniformly ionized target. The second model assumes that the flare-accelerated electron spectrum has a broken power law and a low-energy cutoff, which interacts with a fully ionized target plasma. The low-energy cutoff in this model is a parameter used in fitting the data. We will introduce and use Bayesian model comparison techniques to decide which model best explains the observed data. This work is funded by the NASA Solar and Heliospheric Physics program.
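Bayesian model comparison of the kind described rests on the marginal likelihood (evidence) of each model, which automatically penalizes extra parameters. For a single free parameter it can be approximated by direct grid integration; this is an illustrative sketch of my own, not the RHESSI fitting machinery:

```python
import numpy as np

def log_evidence_1d(grid, log_prior, log_like):
    """Marginal likelihood Z = integral of p(y|theta) p(theta) d(theta)
    for a one-parameter model, via the trapezoidal rule, stabilised
    by factoring out the maximum log value."""
    logf = log_prior(grid) + log_like(grid)
    m = logf.max()
    f = np.exp(logf - m)
    z = ((f[1:] + f[:-1]) * 0.5 * np.diff(grid)).sum()
    return np.log(z) + m
```

The Bayes factor between two models is then `exp(logZ1 - logZ2)`; for a Gaussian prior and Gaussian likelihood the result can be checked against the closed form.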
Experience With Bayesian Image Based Surface Modeling
NASA Technical Reports Server (NTRS)
Stutz, John C.
2005-01-01
Bayesian surface modeling from images requires modeling both the surface and the image generation process, in order to optimize the models by comparing actual and generated images. Thus it differs greatly, both conceptually and in computational difficulty, from conventional stereo surface recovery techniques. But it offers the possibility of using any number of images, taken under quite different conditions, and by different instruments that provide independent and often complementary information, to generate a single surface model that fuses all available information. I describe an implemented system, with a brief introduction to the underlying mathematical models and the compromises made for computational efficiency. I describe successes and failures achieved on actual imagery, where we went wrong and what we did right, and how our approach could be improved. Lastly I discuss how the same approach can be extended to distinct types of instruments, to achieve true sensor fusion.
Bayesian Lasso for Semiparametric Structural Equation Models
Guo, Ruixin; Zhu, Hongtu; Chow, Sy-Miin; Ibrahim, Joseph G.
2011-01-01
Summary There has been great interest in developing nonlinear structural equation models and associated statistical inference procedures, including estimation and model selection methods. In this paper a general semiparametric structural equation model (SSEM) is developed in which the structural equation is composed of nonparametric functions of exogenous latent variables and fixed covariates on a set of latent endogenous variables. A basis representation is used to approximate these nonparametric functions in the structural equation and the Bayesian Lasso method coupled with a Markov Chain Monte Carlo (MCMC) algorithm is used for simultaneous estimation and model selection. The proposed method is illustrated using a simulation study and data from the Affective Dynamics and Individual Differences (ADID) study. Results demonstrate that our method can accurately estimate the unknown parameters and correctly identify the true underlying model. PMID:22376150
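The link between the Laplace (double-exponential) prior and L1-style selection can be seen in the simplest setting: for an orthonormal design, the MAP estimate under a Laplace prior reduces to soft thresholding of the least-squares coefficient. The full Bayesian Lasso in the paper instead samples the posterior by MCMC; this sketch shows only that connection.

```python
def soft_threshold(z, lam):
    """Lasso / Laplace-prior MAP rule for one coefficient in an
    orthonormal design: shrink z toward zero by lam, and set it
    exactly to zero when |z| <= lam (basis function dropped)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0
```

Applied to the basis coefficients of the nonparametric functions, this is how unneeded terms are pruned, giving simultaneous estimation and model selection.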
NASA Astrophysics Data System (ADS)
Blessent, Daniela; Therrien, René; Lemieux, Jean-Michel
2011-12-01
This paper presents numerical simulations of a series of hydraulic interference tests conducted in crystalline bedrock at Olkiluoto (Finland), a potential site for the disposal of the Finnish high-level nuclear waste. The tests are in a block of crystalline bedrock of about 0.03 km3 that contains low-transmissivity fractures. Fracture density, orientation, and fracture transmissivity are estimated from Posiva Flow Log (PFL) measurements in boreholes drilled in the rock block. On the basis of those data, a geostatistical approach relying on a transitional probability and Markov chain models is used to define a conceptual model based on stochastic fractured rock facies. Four facies are defined, from sparsely fractured bedrock to highly fractured bedrock. Using this conceptual model, three-dimensional groundwater flow is then simulated to reproduce interference pumping tests in either open or packed-off boreholes. Hydraulic conductivities of the fracture facies are estimated through automatic calibration using either hydraulic heads or both hydraulic heads and PFL flow rates as targets for calibration. The latter option produces a narrower confidence interval for the calibrated hydraulic conductivities, therefore reducing the associated uncertainty and demonstrating the usefulness of the measured PFL flow rates. Furthermore, the stochastic facies conceptual model is a suitable alternative to discrete fracture network models to simulate fluid flow in fractured geological media.
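The transition-probability idea can be illustrated in one dimension: facies along a borehole drawn cell by cell from a Markov transition matrix. This is a toy sketch with hypothetical names; the study's approach works in three dimensions with spatial transition rates fitted to the PFL data.

```python
import random

def simulate_facies(P, states, n, seed=7):
    """1-D Markov-chain facies simulation: each cell's facies is drawn
    from the transition probabilities of the previous cell
    (rows of P must sum to 1). Starts in the first facies."""
    rng = random.Random(seed)
    idx = 0
    chain = [states[idx]]
    for _ in range(n - 1):
        r, acc = rng.random(), 0.0
        for j, p in enumerate(P[idx]):   # inverse-CDF draw of next state
            acc += p
            if r < acc:
                idx = j
                break
        chain.append(states[idx])
    return chain
```

The diagonal of `P` controls how persistent each facies is, which is how mean lens thicknesses enter transition-probability geostatistics.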
A Hierarchical Bayesian Model for Crowd Emotions
Urizar, Oscar J.; Baig, Mirza S.; Barakova, Emilia I.; Regazzoni, Carlo S.; Marcenaro, Lucio; Rauterberg, Matthias
2016-01-01
Estimation of emotions is an essential aspect in developing intelligent systems intended for crowded environments. However, emotion estimation in crowds remains a challenging problem due to the complexity in which human emotions are manifested and the capability of a system to perceive them in such conditions. This paper proposes a hierarchical Bayesian model to learn, in an unsupervised manner, the behavior of individuals and of the crowd as a single entity, and to explore the relation between behavior and emotions to infer emotional states. Information about the motion patterns of individuals is described using a self-organizing map, and a hierarchical Bayesian network builds probabilistic models to identify behaviors and infer the emotional state of individuals and the crowd. This model is trained and tested using data produced from simulated scenarios that resemble real-life environments. The conducted experiments tested the efficiency of our method to learn, detect and associate behaviors with emotional states, yielding accuracy levels of 74% for individuals and 81% for the crowd, similar in performance to existing methods for pedestrian behavior detection but with novel concepts regarding the analysis of crowds. PMID:27458366

Jones, Matt; Love, Bradley C
2011-08-01
The prominence of Bayesian modeling of cognition has increased recently largely because of mathematical advances in specifying and deriving predictions from complex probabilistic models. Much of this research aims to demonstrate that cognitive behavior can be explained from rational principles alone, without recourse to psychological or neurological processes and representations. We note commonalities between this rational approach and other movements in psychology - namely, Behaviorism and evolutionary psychology - that set aside mechanistic explanations or make use of optimality assumptions. Through these comparisons, we identify a number of challenges that limit the rational program's potential contribution to psychological theory. Specifically, rational Bayesian models are significantly unconstrained, both because they are uninformed by a wide range of process-level data and because their assumptions about the environment are generally not grounded in empirical measurement. The psychological implications of most Bayesian models are also unclear. Bayesian inference itself is conceptually trivial, but strong assumptions are often embedded in the hypothesis sets and the approximation algorithms used to derive model predictions, without a clear delineation between psychological commitments and implementational details. Comparing multiple Bayesian models of the same task is rare, as is the realization that many Bayesian models recapitulate existing (mechanistic level) theories. Despite the expressive power of current Bayesian models, we argue they must be developed in conjunction with mechanistic considerations to offer substantive explanations of cognition. We lay out several means for such an integration, which take into account the representations on which Bayesian inference operates, as well as the algorithms and heuristics that carry it out. We argue this unification will better facilitate lasting contributions to psychological theory, avoiding the pitfalls
Hopes and Cautions in Implementing Bayesian Structural Equation Modeling
ERIC Educational Resources Information Center
MacCallum, Robert C.; Edwards, Michael C.; Cai, Li
2012-01-01
Muthen and Asparouhov (2012) have proposed and demonstrated an approach to model specification and estimation in structural equation modeling (SEM) using Bayesian methods. Their contribution builds on previous work in this area by (a) focusing on the translation of conventional SEM models into a Bayesian framework wherein parameters fixed at zero…
Merging Digital Surface Models Implementing Bayesian Approaches
NASA Astrophysics Data System (ADS)
Sadeq, H.; Drummond, J.; Li, Z.
2016-06-01
In this research, DSMs from different sources have been merged. The merging is based on a probabilistic model using a Bayesian Approach. The implemented data have been sourced from very high resolution satellite imagery sensors (e.g. WorldView-1 and Pleiades). A Bayesian Approach is deemed preferable when the data obtained from the sensors are limited and many measurements would be difficult or very costly to obtain; the lack of data can then be compensated for by introducing a priori estimates. To infer the prior data, it is assumed that the roofs of the buildings are smooth, and for that purpose local entropy has been implemented. In addition to the a priori estimations, GNSS RTK measurements have been collected in the field, which are used as check points to assess the quality of the DSMs and to validate the merging result. The model has been applied in the West End of Glasgow, containing different kinds of buildings, such as flat-roofed and hipped-roofed buildings. Both quantitative and qualitative methods have been employed to validate the merged DSM. The validation results have shown that the model was able to improve the quality of the DSMs, improving characteristics such as the roof surfaces, which consequently led to better representations. In addition, the developed model has been compared with the well-established Maximum Likelihood model and showed similar quantitative statistical results and better qualitative results. Although the proposed model has been applied to DSMs that were derived from satellite imagery, it can be applied to any other sourced DSMs.
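Per-cell Bayesian fusion of two height grids with independent Gaussian errors reduces to a precision-weighted average, optionally including a prior surface. A minimal sketch under those assumptions (the paper's entropy-based prior and per-cell variance maps are not modeled; names are mine):

```python
import numpy as np

def fuse_dsm(z1, v1, z2, v2, z_prior=None, v_prior=None):
    """Per-cell fusion of two DSM height grids z1, z2 with error
    variances v1, v2 (scalars or grids), plus an optional prior surface.
    Returns the posterior mean surface and its variance."""
    w1, w2 = 1.0 / v1, 1.0 / v2
    num, den = w1 * z1 + w2 * z2, w1 + w2
    if z_prior is not None:
        wp = 1.0 / v_prior           # prior acts as one more observation
        num, den = num + wp * z_prior, den + wp
    return num / den, 1.0 / den
```

The posterior variance is always smaller than either input variance, which is the formal sense in which merging improves the surface.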
Effect on Prediction when Modeling Covariates in Bayesian Nonparametric Models.
Cruz-Marcelo, Alejandro; Rosner, Gary L; Müller, Peter; Stewart, Clinton F
2013-04-01
In biomedical research, it is often of interest to characterize the biologic processes giving rise to observations and to make predictions of future observations. Bayesian nonparametric methods provide a means of carrying out Bayesian inference while making as few restrictive parametric assumptions as possible. There are several proposals in the literature for extending Bayesian nonparametric models to include dependence on covariates. Limited attention, however, has been directed to the effect of that choice on model fit and prediction. In this article, we examine the effect on fitting and predictive performance of incorporating covariates in a class of Bayesian nonparametric models in one of two primary ways: either in the weights or in the locations of a discrete random probability measure. We show that different strategies for incorporating continuous covariates in Bayesian nonparametric models can result in big differences when used for prediction, even though they lead to otherwise similar posterior inferences. When one needs the predictive density, as in optimal design, and this density is a mixture, it is better to make the weights depend on the covariates. We demonstrate these points via a simulated data example and in an application in which one wants to determine the optimal dose of an anticancer drug used in pediatric oncology. PMID:23687472
Warnery, E; Ielsch, G; Lajaunie, C; Cale, E; Wackernagel, H; Debayle, C; Guillevic, J
2015-01-01
information, which is exhaustive throughout France, could help in estimating the telluric gamma dose rates. Such an approach is possible using multivariate geostatistics and cokriging. Multi-collocated cokriging has been performed on 1 × 1 km² cells over the domain. This model used gamma dose rate measurement results and GUP classes. Our results provide useful information on the variability of the natural terrestrial gamma radiation in France ('natural background') and exposure data for epidemiological studies and risk assessment of low-dose chronic exposures. PMID:25464050
Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures.
Orbanz, Peter; Roy, Daniel M
2015-02-01
The natural habitat of most Bayesian methods is data represented by exchangeable sequences of observations, for which de Finetti's theorem provides the theoretical foundation. Dirichlet process clustering, Gaussian process regression, and many other parametric and nonparametric Bayesian models fall within the remit of this framework; many problems arising in modern data analysis do not. This article provides an introduction to Bayesian models of graphs, matrices, and other data that can be modeled by random structures. We describe results in probability theory that generalize de Finetti's theorem to such data and discuss their relevance to nonparametric Bayesian modeling. With the basic ideas in place, we survey example models available in the literature; applications of such models include collaborative filtering, link prediction, and graph and network analysis. We also highlight connections to recent developments in graph theory and probability, and sketch the more general mathematical foundation of Bayesian methods for other types of data beyond sequences and arrays. PMID:26353253
A Bayesian Analysis of Finite Mixtures in the LISREL Model.
ERIC Educational Resources Information Center
Zhu, Hong-Tu; Lee, Sik-Yum
2001-01-01
Proposes a Bayesian framework for estimating finite mixtures of the LISREL model. The model augments the observed data of the manifest variables with the latent variables and allocation variables and uses the Gibbs sampler to obtain the Bayesian solution. Discusses other associated statistical inferences. (SLD)
NASA Astrophysics Data System (ADS)
Vasquez, D. A.; Swift, J. N.; Tan, S.; Darrah, T. H.
2013-12-01
The integration of precise geochemical analyses with quantitative engineering modeling into an interactive GIS system allows for a sophisticated and efficient method of reservoir engineering and characterization. Geographic Information Systems (GIS) are utilized as an advanced technique for oil field reservoir analysis by combining field engineering and geological/geochemical spatial datasets with the available systematic modeling and mapping methods, integrating the information into a spatially correlated first-hand approach to defining surface and subsurface characteristics. Three key methods of analysis include: 1) geostatistical modeling to create a static, volumetric 3-dimensional representation of the geological body, 2) numerical modeling to develop a dynamic, interactive 2-dimensional model of fluid flow across the reservoir, and 3) noble gas geochemistry to further define the physical conditions, components and history of the geologic system. Results thus far include the use of engineering algorithms to interpolate electrical well-log properties (spontaneous potential, resistivity) across the field, yielding a highly accurate, high-resolution 3D model of rock properties, and the use of numerical finite-difference methods (Crank-Nicolson) to solve the equations describing the distribution of pressure across the field, yielding a 2D simulation model of fluid flow across the reservoir. Ongoing noble gas geochemistry work will determine the source, thermal maturity and the extent/style of fluid migration (connectivity, continuity and directionality). Future work will include developing an inverse engineering algorithm to model permeability, porosity and water saturation. This combination of new and efficient technological and analytical capabilities is geared to provide a better understanding of the field geology and hydrocarbon dynamics system, with applications to determine the presence of hydrocarbon pay zones (or
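The Crank-Nicolson scheme mentioned for the pressure equation can be sketched in one spatial dimension as follows; the grid size, time step, diffusivity, and boundary pressures are illustrative choices, not those of the study:

```python
import numpy as np

def crank_nicolson_step(p, alpha, dt, dx):
    """One Crank-Nicolson step for the 1D pressure-diffusion equation
    p_t = alpha * p_xx, with Dirichlet (fixed-pressure) boundaries."""
    n = len(p)
    r = alpha * dt / (2.0 * dx**2)
    A = np.eye(n) * (1.0 + 2.0 * r)   # implicit (new-time) operator
    B = np.eye(n) * (1.0 - 2.0 * r)   # explicit (old-time) operator
    for i in range(1, n):
        A[i, i-1] = A[i-1, i] = -r
        B[i, i-1] = B[i-1, i] = r
    for j in (0, -1):                 # hold boundary pressures fixed
        A[j, :] = 0.0; A[j, j] = 1.0
        B[j, :] = 0.0; B[j, j] = 1.0
    return np.linalg.solve(A, B @ p)

# fixed injection pressure on the left edge, far-field pressure 0 on the right
p = np.zeros(51)
p[0] = 1.0
for _ in range(200):
    p = crank_nicolson_step(p, alpha=1.0, dt=0.0005, dx=0.02)
```

The averaging of old- and new-time operators is what gives Crank-Nicolson its second-order accuracy in time.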
A sensorimotor paradigm for Bayesian model selection.
Genewein, Tim; Braun, Daniel A
2012-01-01
Sensorimotor control is thought to rely on predictive internal models in order to cope efficiently with uncertain environments. Recently, it has been shown that humans not only learn different internal models for different tasks, but that they also extract common structure between tasks. This raises the question of how the motor system selects between different structures or models, when each model can be associated with a range of different task-specific parameters. Here we design a sensorimotor task that requires subjects to compensate for visuomotor shifts in a three-dimensional virtual reality setup, where one of the dimensions can be mapped to a model variable and the other dimension to the parameter variable. By introducing probe trials that are neutral in the parameter dimension, we can directly test for model selection. We found that model selection procedures based on Bayesian statistics provided a better explanation for subjects' choice behavior than simple non-probabilistic heuristics. Our experimental design lends itself to the general study of model selection in a sensorimotor context, as it allows model and parameter variables to be queried separately from subjects. PMID:23125827
NASA Astrophysics Data System (ADS)
You, Jiong; Pei, Zhiyuan
2015-01-01
With the development of remote sensing technology, its applications in agricultural monitoring systems, crop mapping accuracy, and spatial distribution are increasingly being explored by administrators and users. Uncertainty in crop mapping is profoundly affected by the spatial pattern of spectral reflectance values obtained from the applied remote sensing data. Errors in remotely sensed crop cover information, and their propagation into derivative products, need to be quantified and handled correctly. Therefore, this study discusses methods of error modeling for uncertainty characterization in crop mapping using GF-1 multispectral imagery. An error modeling framework based on geostatistics is proposed, which introduces the sequential Gaussian simulation algorithm to explore the relationship between classification errors and the spectral signature of the remote sensing data source. On this basis, a misclassification probability model is developed to produce a spatially explicit classification error probability surface for the map of a crop, realizing the uncertainty characterization for crop mapping. In this process, trend surface analysis was carried out to generate a spatially varying mean response and the corresponding residual response with spatial variation for the spectral bands of GF-1 multispectral imagery. Variogram models were employed to measure the spatial dependence in the spectral bands and the derived misclassification probability surfaces. Simulated spectral data and classification results were quantitatively analyzed. Through experiments using data sets from a region of low rolling country in the Yangtze River valley, it was found that GF-1 multispectral imagery can be used for crop mapping with good overall performance, the proposed error modeling framework can be used to quantify the uncertainty in crop mapping, and the misclassification probability model can summarize the spatial variation in map accuracy and is helpful for
Model parameter updating using Bayesian networks
Treml, C. A.; Ross, Timothy J.
2004-01-01
This paper outlines a model parameter updating technique for a new method of model validation using a modified model reference adaptive control (MRAC) framework with Bayesian Networks (BNs). The model parameter updating within this method is generic in the sense that the model/simulation to be validated is treated as a black box. It must have updateable parameters to which its outputs are sensitive, and those outputs must have metrics that can be compared to that of the model reference, i.e., experimental data. Furthermore, no assumptions are made about the statistics of the model parameter uncertainty, only upper and lower bounds need to be specified. This method is designed for situations where a model is not intended to predict a complete point-by-point time domain description of the item/system behavior; rather, there are specific points, features, or events of interest that need to be predicted. These specific points are compared to the model reference derived from actual experimental data. The logic for updating the model parameters to match the model reference is formed via a BN. The nodes of this BN consist of updateable model input parameters and the specific output values or features of interest. Each time the model is executed, the input/output pairs are used to adapt the conditional probabilities of the BN. Each iteration further refines the inferred model parameters to produce the desired model output. After parameter updating is complete and model inputs are inferred, reliabilities for the model output are supplied. Finally, this method is applied to a simulation of a resonance control cooling system for a prototype coupled cavity linac. The results are compared to experimental data.
Bayesian model selection for LISA pathfinder
NASA Astrophysics Data System (ADS)
Karnesis, Nikolaos; Nofrarias, Miquel; Sopuerta, Carlos F.; Gibert, Ferran; Armano, Michele; Audley, Heather; Congedo, Giuseppe; Diepholz, Ingo; Ferraioli, Luigi; Hewitson, Martin; Hueller, Mauro; Korsakova, Natalia; McNamara, Paul W.; Plagnol, Eric; Vitale, Stefano
2014-03-01
The main goal of the LISA Pathfinder (LPF) mission is to fully characterize the acceleration noise models and to test key technologies for future space-based gravitational-wave observatories similar to the eLISA concept. The data analysis team has developed complex three-dimensional models of the LISA Technology Package (LTP) experiment onboard the LPF. These models are used for simulations, but, more importantly, they will be used for parameter estimation purposes during flight operations. One of the tasks of the data analysis team is to identify the physical effects that contribute significantly to the properties of the instrument noise. A way of approaching this problem is to recover the essential parameters of an LTP model fitting the data. Thus, we want to define the simplest model that efficiently explains the observations. To do so, adopting a Bayesian framework, one has to estimate the so-called Bayes factor between two competing models. In our analysis, we use three different methods to estimate it: the reversible jump Markov chain Monte Carlo method, the Schwarz criterion, and the Laplace approximation. They are applied to simulated LPF experiments in which the most probable LTP model that explains the observations is recovered. The same type of analysis presented in this paper is expected to be followed during flight operations. Moreover, the correlation of the output of the aforementioned methods with the design of the experiment is explored.
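Of the three Bayes-factor estimators mentioned, the Schwarz criterion is the simplest to sketch: it approximates the log Bayes factor from penalized maximum log-likelihoods. The toy Gaussian models and data below are invented for illustration and are unrelated to the LTP models:

```python
import numpy as np

def log_bayes_factor_bic(loglik1, k1, loglik2, k2, n):
    """Schwarz-criterion approximation to the log Bayes factor of model 1
    over model 2: ln B12 ~ (ll1 - ll2) - 0.5 * (k1 - k2) * ln(n)."""
    return (loglik1 - loglik2) - 0.5 * (k1 - k2) * np.log(n)

def gauss_loglik(y, mu):
    """Log-likelihood of data y under N(mu, 1)."""
    return -0.5 * len(y) * np.log(2 * np.pi) - 0.5 * np.sum((y - mu) ** 2)

# compare "noise has zero mean" (0 parameters) vs "free mean" (1 parameter)
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=100)
lnB = log_bayes_factor_bic(gauss_loglik(y, 0.0), 0,
                           gauss_loglik(y, y.mean()), 1, len(y))
```

The 0.5 * ln(n) penalty per extra parameter is what lets the simpler model win unless the extra parameter buys a real likelihood gain.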
Newton, Einstein, Jeffreys and Bayesian model selection
NASA Astrophysics Data System (ADS)
Chettri, Samir; Batchelor, David; Campbell, William; Balakrishnan, Karthik
2005-11-01
In [1], Jefferys and Berger apply Bayesian model selection to the problem of choosing between rival theories, in particular between Einstein's theory of general relativity (GR) and Newtonian gravity (NG). [1] presents a debate between Harold Jeffreys and Charles Poor regarding the observed 43''/century anomalous perihelion precession of Mercury. GR made a precise prediction of 42.98''/century, while proponents of NG suggested several physical mechanisms that were eventually refuted, with the exception of a modified inverse-square law. Using Bayes factors (BF) and data available in 1921, [1] shows that GR is preferable to NG by a factor of about 25 to 1. The scale for BF used by Jeffreys suggests that this is positive to strong evidence for GR over modified NG, but it is not very strong or even overwhelming. In this work we calculate the BF for the period from 1921 to 1993. By 1960 the BF, owing to better data-gathering techniques and advances in technology, had exceeded 100 to 1, making GR strongly preferable to NG, and by 1990 the BF reached 1000:1. Ironically, while the BF had reached a state of near certainty by 1960, rival theories of gravitation were on the rise, notably the Brans-Dicke (BD) scalar-tensor theory of gravity. The BD theory is postulated in such a way that for small positive values of a scalar parameter ω the BF would favor GR, while the BF would approach unity as ω grows larger, at which point either theory would be preferred, i.e., it is a theory that cannot lose. Does this mean Bayesian model selection needs to be overthrown? This points to the need for cogent prior information guided by physics and physical experiment.
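When each rival theory makes a point prediction and the measurement error is Gaussian, the Bayes factor reduces to a likelihood ratio. The sketch below uses invented illustrative numbers, not the historical data analyzed in the paper:

```python
import numpy as np

def log10_bayes_factor(obs, sigma, pred_a, pred_b):
    """log10 Bayes factor of model A over model B when each model makes a
    point prediction and the observation carries Gaussian error sigma."""
    loglik = lambda pred: -0.5 * ((obs - pred) / sigma) ** 2  # constants cancel
    return (loglik(pred_a) - loglik(pred_b)) / np.log(10.0)

# illustrative numbers only: an observed precession of 43.1''/cy with a
# 0.45'' error bar, GR predicting 42.98''/cy, and a hypothetical rival
# theory predicting 41.6''/cy
lg_bf = log10_bayes_factor(43.1, 0.45, 42.98, 41.6)
```

Shrinking sigma as measurements improve is exactly what drives the growth of the Bayes factor over the decades described above.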
Advances in Bayesian Modeling in Educational Research
ERIC Educational Resources Information Center
Levy, Roy
2016-01-01
In this article, I provide a conceptually oriented overview of Bayesian approaches to statistical inference and contrast them with frequentist approaches that currently dominate conventional practice in educational research. The features and advantages of Bayesian approaches are illustrated with examples spanning several statistical modeling…
Bayesian Student Modeling and the Problem of Parameter Specification.
ERIC Educational Resources Information Center
Millan, Eva; Agosta, John Mark; Perez de la Cruz, Jose Luis
2001-01-01
Discusses intelligent tutoring systems and the application of Bayesian networks to student modeling. Considers reasons for not using Bayesian networks, including the computational complexity of the algorithms and the difficulty of knowledge acquisition, and proposes an approach to simplify knowledge acquisition that applies causal independence to…
A Tutorial Introduction to Bayesian Models of Cognitive Development
ERIC Educational Resources Information Center
Perfors, Amy; Tenenbaum, Joshua B.; Griffiths, Thomas L.; Xu, Fei
2011-01-01
We present an introduction to Bayesian inference as it is used in probabilistic models of cognitive development. Our goal is to provide an intuitive and accessible guide to the "what", the "how", and the "why" of the Bayesian approach: what sorts of problems and data the framework is most relevant for, and how and why it may be useful for…
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
Bayesian modeling of differential gene expression.
Lewin, Alex; Richardson, Sylvia; Marshall, Clare; Glazier, Anne; Aitman, Tim
2006-03-01
We present a Bayesian hierarchical model for detecting differentially expressing genes that includes simultaneous estimation of array effects, and show how to use the output for choosing lists of genes for further investigation. We give empirical evidence that expression-level dependent array effects are needed, and explore different nonlinear functions as part of our model-based approach to normalization. The model includes gene-specific variances but imposes some necessary shrinkage through a hierarchical structure. Model criticism via posterior predictive checks is discussed. Modeling the array effects (normalization) simultaneously with differential expression gives fewer false positive results. To choose a list of genes, we propose to combine various criteria (for instance, fold change and overall expression) into a single indicator variable for each gene. The posterior distribution of these variables is used to pick the list of genes, thereby taking into account uncertainty in parameter estimates. In an application to mouse knockout data, Gene Ontology annotations over- and underrepresented among the genes on the chosen list are consistent with biological expectations.
Bayesian analysis of the backreaction models
Kurek, Aleksandra; Bolejko, Krzysztof; Szydlowski, Marek
2010-03-15
We present a Bayesian analysis of four different types of backreaction models, which are based on the Buchert equations. In this approach, one considers a solution to the Einstein equations for a general matter distribution and then an average of various observable quantities is taken. Such an approach became of considerable interest when it was shown that it could lead to agreement with observations without resorting to dark energy. In this paper we compare the ΛCDM model and the backreaction models with type Ia supernovae, baryon acoustic oscillations, and cosmic microwave background data, and find that the former is favored. However, the tested models were based on some particular assumptions about the relation between the average spatial curvature and the backreaction, as well as the relation between the curvature and curvature index. In this paper we modified the latter assumption, leaving the former unchanged. We find that, by varying the relation between the curvature and curvature index, we can obtain a better fit. Therefore, some further work is still needed; in particular, the relation between the backreaction and the curvature should be revisited in order to fully determine the feasibility of the backreaction models to mimic dark energy.
Bayesian methods for spatial upscaling of process-based forest ecosystem models
NASA Astrophysics Data System (ADS)
van Oijen, M.; Cameron, D.; Reinds, G.; Thomson, A.
2010-12-01
not proportional to carbon accumulation itself. In neither study was uncertainty quantification comprehensive. We therefore conclude with an overview of different upscaling methods to discuss the way forward towards a complete Bayesian framework. Six different methods of spatial upscaling are identified. The methods fall in three classes: (i) direct applications of the point-support model, (ii) extension of the point-support model with a geostatistical model, (iii) replacement of the original model with an emulator. Gaussian Process modelling can be used both for upscaling and emulation. The Bayesian perspective shows how output uncertainty can be quantified for each upscaling method. Reinds, G.J., Van Oijen, M. et al. (2008). Bayesian calibration of the VSD soil acidification model using European forest monitoring data. Geoderma 146: 475-488. Van Oijen, M. et al. (2005). Bayesian calibration of process-based forest models: bridging the gap between models and data. Tree Phys. 25: 915-927. Van Oijen, M. & Thomson, A. (2010). Towards Bayesian uncertainty quantification for forestry models used in the United Kingdom Greenhouse Gas Inventory for land use, land use change, and forestry. Clim. Change DOI:10.1007/s10584-010-9917-3.
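Gaussian Process modelling for emulation (class iii above) can be sketched as kernel-conditioned prediction at new sites; the squared-exponential kernel and the toy one-dimensional "expensive model" below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def gp_predict(X, y, Xs, length=1.0, noise=1e-8):
    """GP emulator with a unit-variance squared-exponential kernel:
    condition on model runs (X, y), predict mean and variance at Xs."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)
    K = k(X, X) + noise * np.eye(len(X))   # jitter for numerical stability
    Ks = k(Xs, X)
    mean = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var

# emulate an "expensive" model from six design-point runs
X = np.linspace(0.0, 1.0, 6)
y = np.sin(2 * np.pi * X)
mean, var = gp_predict(X, y, np.array([0.25]), length=0.3)
```

The emulator interpolates the design runs exactly (up to the jitter) and returns a predictive variance that quantifies emulation uncertainty, which is what makes it useful inside a Bayesian upscaling chain.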
NASA Astrophysics Data System (ADS)
Matiatos, Ioannis; Varouhakis, Emmanouil A.; Papadopoulou, Maria P.
2015-04-01
level and nitrate concentrations were produced and compared with those obtained from groundwater and mass transport numerical models. Preliminary results showed similar efficiency of the spatiotemporal geostatistical method and the numerical models; however, the data requirements of the former were significantly lower. Advantages and disadvantages of each method's performance were analysed and discussed, indicating the characteristics of the different approaches.
Scale Mixture Models with Applications to Bayesian Inference
NASA Astrophysics Data System (ADS)
Qin, Zhaohui S.; Damien, Paul; Walker, Stephen
2003-11-01
Scale mixtures of uniform distributions are used to model non-normal data in time series and econometrics in a Bayesian framework. Heteroscedastic and skewed data models are also tackled using scale mixture of uniform distributions.
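A standard illustration of the idea, assuming Walker's construction (not necessarily the authors' exact parameterization): a standard normal arises as a scale mixture of uniforms when the squared half-width is drawn from a Gamma distribution with shape 3/2 and rate 1/2:

```python
import numpy as np

# If V ~ Gamma(shape=3/2, rate=1/2) and, given V, X ~ Uniform(-sqrt(V), sqrt(V)),
# then marginally X ~ N(0, 1). Check by Monte Carlo.
rng = np.random.default_rng(42)
n = 200_000
v = rng.gamma(shape=1.5, scale=2.0, size=n)   # numpy uses scale = 1/rate
x = rng.uniform(-np.sqrt(v), np.sqrt(v))
```

Conditioning on the latent scale V is what makes Gibbs-style Bayesian computation tractable for such non-normal models.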
Stochastic model updating utilizing Bayesian approach and Gaussian process model
NASA Astrophysics Data System (ADS)
Wan, Hua-Ping; Ren, Wei-Xin
2016-03-01
Stochastic model updating (SMU) has been increasingly applied to quantify structural parameter uncertainty from response variability. SMU for parameter uncertainty quantification is a problem of inverse uncertainty quantification (IUQ), which is a nontrivial task. Solving the inverse problem with optimization usually brings about issues of gradient computation, ill-conditioning, and non-uniqueness. Moreover, the uncertainty present in the response makes the inverse problem more complicated. In this study, a Bayesian approach is adopted in SMU for parameter uncertainty quantification. The prominent strength of the Bayesian approach for the IUQ problem is that it solves the problem in a straightforward manner, which enables it to avoid the previous issues. However, when applied to engineering structures that are modeled with a high-resolution finite element model (FEM), the Bayesian approach is still computationally expensive, since the commonly used Markov chain Monte Carlo (MCMC) method for Bayesian inference requires a large number of model runs to guarantee convergence. Herein we reduce the computational cost in two ways. On the one hand, a fast-running Gaussian process model (GPM) is utilized to approximate the time-consuming high-resolution FEM. On the other hand, an advanced MCMC method using the delayed rejection adaptive Metropolis (DRAM) algorithm, which combines a local adaptive strategy with a global adaptive strategy, is employed for Bayesian inference. In addition, we propose the use of powerful variance-based global sensitivity analysis (GSA) in parameter selection to exclude non-influential parameters from the calibration parameters, which yields a reduced-order model and thus further alleviates the computational burden. A simulated aluminum plate and a real-world complex cable-stayed pedestrian bridge are presented to illustrate the proposed framework and verify its feasibility.
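The DRAM algorithm builds on the basic random-walk Metropolis loop sketched below (delayed rejection and covariance adaptation are omitted here); the two-parameter Gaussian "posterior" stands in for the expensive FEM-based likelihood and is purely illustrative:

```python
import numpy as np

def metropolis(log_post, theta0, n_steps, step):
    """Plain random-walk Metropolis. DRAM adds delayed rejection and
    proposal-covariance adaptation on top of this basic loop."""
    rng = np.random.default_rng(0)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = [theta]
    for _ in range(n_steps):
        prop = theta + step * rng.normal(size=theta.size)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            theta, lp = prop, lp_prop
        chain.append(theta)
    return np.array(chain)

# stand-in posterior: Gaussian over two hypothetical "stiffness" parameters
log_post = lambda th: -0.5 * np.sum((th - np.array([2.0, -1.0])) ** 2)
chain = metropolis(log_post, [0.0, 0.0], n_steps=20_000, step=0.8)
```

Each iteration costs one posterior evaluation, which is exactly why replacing the FEM with a cheap GPM surrogate pays off.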
A guide to Bayesian model selection for ecologists
Hooten, Mevin B.; Hobbs, N.T.
2015-01-01
The steady upward trend in the use of model selection and Bayesian methods in ecological research has made it clear that both approaches to inference are important for modern analysis of models and data. However, in teaching Bayesian methods and in working with our research colleagues, we have noticed a general dissatisfaction with the available literature on Bayesian model selection and multimodel inference. Students and researchers new to Bayesian methods quickly find that the published advice on model selection is often preferential in its treatment of options for analysis, frequently advocating one particular method above others. The recent appearance of many articles and textbooks on Bayesian modeling has provided welcome background on relevant approaches to model selection in the Bayesian framework, but most of these are either very narrowly focused in scope or inaccessible to ecologists. Moreover, the methodological details of Bayesian model selection approaches are spread thinly throughout the literature, appearing in journals from many different fields. Our aim with this guide is to condense the large body of literature on Bayesian approaches to model selection and multimodel inference and present it specifically for quantitative ecologists as neutrally as possible. We also bring to light a few important and fundamental concepts relating directly to model selection that seem to have gone unnoticed in the ecological literature. Throughout, we provide only a minimal discussion of philosophy, preferring instead to examine the breadth of approaches as well as their practical advantages and disadvantages. This guide serves as a reference for ecologists using Bayesian methods, so that they can better understand their options and can make an informed choice that is best aligned with their goals for inference.
Bayesian Case-deletion Model Complexity and Information Criterion
Zhu, Hongtu; Ibrahim, Joseph G.; Chen, Qingxia
2015-01-01
We establish a connection between Bayesian case influence measures for assessing the influence of individual observations and Bayesian predictive methods for evaluating the predictive performance of a model and comparing different models fitted to the same dataset. Based on such a connection, we formally propose a new set of Bayesian case-deletion model complexity (BCMC) measures for quantifying the effective number of parameters in a given statistical model. Its properties in linear models are explored. Adding some functions of BCMC to a conditional deviance function leads to a Bayesian case-deletion information criterion (BCIC) for comparing models. We systematically investigate some properties of BCIC and its connection with other information criteria, such as the Deviance Information Criterion (DIC). We illustrate the proposed methodology on linear mixed models with simulations and a real data example. PMID:26180578
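For comparison, the closely related DIC and its effective-parameter count p_D can be computed directly from posterior draws; the conjugate normal toy model below is an illustration, not the paper's BCMC construction:

```python
import numpy as np

def dic(deviance_draws, deviance_at_mean):
    """DIC from posterior draws: p_D = mean(D) - D(posterior mean) is the
    effective number of parameters; DIC = D(posterior mean) + 2 * p_D."""
    p_d = deviance_draws.mean() - deviance_at_mean
    return deviance_at_mean + 2.0 * p_d, p_d

# conjugate normal toy model: y_i ~ N(theta, 1), flat prior on theta,
# so the posterior for theta is N(ybar, 1/n)
rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=50)
post = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=10_000)

def deviance(theta):
    """-2 log-likelihood up to an additive constant."""
    theta = np.atleast_1d(theta)
    return ((y[None, :] - theta[:, None]) ** 2).sum(axis=1)

dic_value, p_d = dic(deviance(post), deviance(post.mean())[0])
```

With a single mean parameter, p_D comes out close to 1, matching the intuition that effective complexity counts the parameters the data actually constrain.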
Entropic Priors and Bayesian Model Selection
NASA Astrophysics Data System (ADS)
Brewer, Brendon J.; Francis, Matthew J.
2009-12-01
We demonstrate that the principle of maximum relative entropy (ME), used judiciously, can ease the specification of priors in model selection problems. The resulting effect is that models that make sharp predictions are disfavoured, weakening the usual Bayesian "Occam's Razor." This is illustrated with a simple example involving what Jaynes called a "sure thing" hypothesis. Jaynes' resolution of the situation involved introducing a large number of alternative "sure thing" hypotheses that were possible before we observed the data. However, in more complex situations, it may not be possible to explicitly enumerate large numbers of alternatives. The entropic priors formalism produces the desired result without modifying the hypothesis space or requiring explicit enumeration of alternatives; all that is required is a good model for the prior predictive distribution for the data. This idea is illustrated with a simple rigged-lottery example, and we outline how this idea may help to resolve a recent debate amongst cosmologists: is dark energy a cosmological constant, or has it evolved with time in some way? And how shall we decide, when the data are in?
Bayesian analysis of a disability model for lung cancer survival.
Armero, C; Cabras, S; Castellanos, M E; Perra, S; Quirós, A; Oruezábal, M J; Sánchez-Rubio, J
2016-02-01
Bayesian reasoning, survival analysis and multi-state models are used to assess survival times for Stage IV non-small-cell lung cancer patients and the evolution of the disease over time. Bayesian estimation is done using minimum informative priors for the Weibull regression survival model, leading to an automatic inferential procedure. Markov chain Monte Carlo methods have been used for approximating posterior distributions and the Bayesian information criterion has been considered for covariate selection. In particular, the posterior distribution of the transition probabilities, resulting from the multi-state model, constitutes a very interesting tool which could be useful to help oncologists and patients make efficient and effective decisions.
Two-Stage Bayesian Model Averaging in Endogenous Variable Models.
Lenkoski, Alex; Eicher, Theo S; Raftery, Adrian E
2014-01-01
Economic modeling in the presence of endogeneity is subject to model uncertainty at both the instrument and covariate level. We propose a Two-Stage Bayesian Model Averaging (2SBMA) methodology that extends the Two-Stage Least Squares (2SLS) estimator. By constructing a Two-Stage Unit Information Prior in the endogenous variable model, we are able to efficiently combine established methods for addressing model uncertainty in regression models with the classic technique of 2SLS. To assess the validity of instruments in the 2SBMA context, we develop Bayesian tests of the identification restriction that are based on model averaged posterior predictive p-values. A simulation study showed that 2SBMA has the ability to recover structure in both the instrument and covariate set, and substantially improves the sharpness of resulting coefficient estimates in comparison to 2SLS using the full specification in an automatic fashion. Due to the increased parsimony of the 2SBMA estimate, the Bayesian Sargan test had a power of 50 percent in detecting a violation of the exogeneity assumption, while the method based on 2SLS using the full specification had negligible power. We apply our approach to the problem of development accounting, and find support not only for institutions, but also for geography and integration as development determinants, once both model uncertainty and endogeneity have been jointly addressed. PMID:24223471
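The 2SLS estimator that 2SBMA extends can be sketched as two successive least-squares fits; the simulated data below, with an invented instrument z, illustrate how it removes the endogeneity bias that plain OLS suffers:

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """Classic 2SLS: project the regressors on the instruments (first
    stage), then regress y on the fitted values (second stage)."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

# simulated endogeneity: x is correlated with the structural error u,
# z is a valid instrument (correlated with x, independent of u)
rng = np.random.default_rng(3)
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.9 * z + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u                      # true slope is 2.0
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
beta_2sls = two_stage_least_squares(y, X, Z)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

2SBMA replaces the single fixed specification in each stage with Bayesian model averaging over instrument and covariate sets.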
Nonparametric Bayesian Modeling for Automated Database Schema Matching
Ferragut, Erik M; Laska, Jason A
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
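The core idea above, scoring the probability that a single model generated both fields, can be illustrated for the simplified case of categorical fields using a Dirichlet-multinomial marginal likelihood. This is only a toy analogue of the paper's nonparametric Bayesian models, and all names are illustrative:

```python
from math import lgamma

def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of categorical counts under a
    symmetric Dirichlet(alpha) prior (Dirichlet-multinomial)."""
    n, k = sum(counts), len(counts)
    return (lgamma(k * alpha) - lgamma(n + k * alpha)
            + sum(lgamma(c + alpha) for c in counts) - k * lgamma(alpha))

def log_bayes_factor_same_source(counts_a, counts_b, alpha=1.0):
    """Log Bayes factor for 'both fields drawn from one distribution'
    versus 'each field has its own distribution'."""
    joint = [a + b for a, b in zip(counts_a, counts_b)]
    return (log_marginal(joint, alpha)
            - log_marginal(counts_a, alpha) - log_marginal(counts_b, alpha))
```

Fields with matching value distributions yield a positive log Bayes factor (shared model preferred); disjoint distributions yield a strongly negative one.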
Geostatistical Modeling of the Spatial Variability of Arsenic in Groundwater of Southeast Michigan
NASA Astrophysics Data System (ADS)
Avruskin, G.; Goovaerts, P.; Meliker, J.; Slotnick, M.; Jacquez, G. M.; Nriagu, J. O.
2004-12-01
The last decade has witnessed an increasing interest in assessing health risks caused by exposure to contaminants present in the soil, air, and water. A key component of any exposure study is a reliable model for the space-time distribution of pollutants. This paper compares the performances of multi-Gaussian and indicator kriging for modeling probabilistically the space-time distribution of arsenic concentrations in groundwater of Southeast Michigan, accounting for information collected at private residential wells and the hydrogeochemistry of the area. This model will later be combined with a space-time information system to assess the risk associated with exposure to low levels of arsenic in drinking water (typically 5-100 μg/L), in particular for the development of bladder cancer. Because of the small changes in concentration observed in time, the study has focused on the spatial variability of arsenic. This study confirmed results in the literature that reported intense spatial non-homogeneity of As concentration, resulting in samples that greatly vary even when located a few meters apart. Indicator semivariograms further showed a better spatial connectivity of low concentrations while values exceeding 32 μg/L (10% of wells) are spatially uncorrelated. Secondary information, such as proximity to Marshall Sandstone, helped only the prediction at a regional scale (i.e. beyond 15 km), leaving the short-range variability largely unexplained. Several geostatistical tools were tailored to the features of the As dataset: (1) semivariogram values were standardized by the lag variance to correct for the preferential sampling of wells with high concentrations, (2) semivariogram modeling was conducted under the constraint of reproduction of the nugget effect inferred from colocated well measurements, (3) kriging systems were modified to account for repeated measurements at a series of wells while avoiding non-invertible kriging matrices, (4) kriging-based smoothing
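The indicator semivariograms mentioned above can be estimated empirically by thresholding the data and averaging squared indicator differences within distance classes. A minimal sketch with illustrative names, not the authors' code:

```python
import numpy as np

def indicator_semivariogram(coords, values, threshold, lag_edges):
    """Empirical semivariogram of the indicator I(value > threshold),
    binned by pairwise-distance classes given by lag_edges."""
    ind = (np.asarray(values) > threshold).astype(float)
    coords = np.asarray(coords, float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    iu = np.triu_indices(len(ind), k=1)          # count each pair once
    dist, sq = d[iu], (ind[iu[0]] - ind[iu[1]]) ** 2
    gamma = []
    for lo, hi in zip(lag_edges[:-1], lag_edges[1:]):
        m = (dist >= lo) & (dist < hi)
        gamma.append(0.5 * sq[m].mean() if m.any() else np.nan)
    return np.array(gamma)
```

Because the indicator differences are 0 or 1, each binned value lies between 0 and 0.5, the latter corresponding to spatially uncorrelated exceedances.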
Bayesian model reduction and empirical Bayes for group (DCM) studies.
Friston, Karl J; Litvak, Vladimir; Oswal, Ashwini; Razi, Adeel; Stephan, Klaas E; van Wijk, Bernadette C M; Ziegler, Gabriel; Zeidman, Peter
2016-03-01
This technical note describes some Bayesian procedures for the analysis of group studies that use nonlinear models at the first (within-subject) level - e.g., dynamic causal models - and linear models at subsequent (between-subject) levels. Its focus is on using Bayesian model reduction to finesse the inversion of multiple models of a single dataset or a single (hierarchical or empirical Bayes) model of multiple datasets. These applications of Bayesian model reduction allow one to consider parametric random effects and make inferences about group effects very efficiently (in a few seconds). We provide the relatively straightforward theoretical background to these procedures and illustrate their application using a worked example. This example uses a simulated mismatch negativity study of schizophrenia. We illustrate the robustness of Bayesian model reduction to violations of the (commonly used) Laplace assumption in dynamic causal modelling and show how its recursive application can facilitate both classical and Bayesian inference about group differences. Finally, we consider the application of these empirical Bayesian procedures to classification and prediction.
Calibrating Bayesian Network Representations of Social-Behavioral Models
Whitney, Paul D.; Walsh, Stephen J.
2010-04-08
While human behavior has long been studied, recent and ongoing advances in computational modeling present opportunities for recasting research outcomes in human behavior. In this paper we describe how Bayesian networks can represent outcomes of human behavior research. We demonstrate a Bayesian network that represents political radicalization research – and show a corresponding visual representation of aspects of this research outcome. Since Bayesian networks can be quantitatively compared with external observations, the representation can also be used for empirical assessments of the research which the network summarizes. For a political radicalization model based on published research, we show this empirical comparison with data taken from the Minorities at Risk Organizational Behaviors database.
Viswanathan, Hari S; Robinson, Bruce A; Gable, Carl W; Carey, James W
2003-01-01
Retardation of certain radionuclides due to sorption to zeolitic minerals is considered one of the major barriers to contaminant transport in the unsaturated zone of Yucca Mountain. However, zeolitically altered areas are lower in permeability than unaltered regions, which raises the possibility that contaminants might bypass the sorptive zeolites. The relationship between hydrologic and chemical properties must be understood to predict the transport of radionuclides through zeolitically altered areas. In this study, we incorporate mineralogical information into an unsaturated zone transport model using geostatistical techniques to correlate zeolitic abundance to hydrologic and chemical properties. Geostatistical methods are used to develop variograms, kriging maps, and conditional simulations of zeolitic abundance. We then investigate, using flow and transport modeling on a heterogeneous field, the relationship between percent zeolitic alteration, permeability changes due to alteration, sorption due to alteration, and their overall effect on radionuclide transport. We compare these geostatistical simulations to a simplified threshold method in which each spatial location in the model is assigned either zeolitic or vitric properties based on the zeolitic abundance at that location. A key conclusion is that retardation due to sorption predicted by using the continuous distribution is larger than the retardation predicted by the threshold method. The reason for larger retardation when using the continuous distribution is a small but significant sorption at locations with low zeolitic abundance. If, for practical reasons, models with homogeneous properties within each layer are used, we recommend setting nonzero K(d)s in the vitric tuffs to mimic the more rigorous continuous distribution simulations. Regions with high zeolitic abundance may not be as effective in retarding radionuclides such as Neptunium since these rocks are lower in permeability and contaminants can
Prospective evaluation of a Bayesian model to predict organizational change.
Molfenter, Todd; Gustafson, Dave; Kilo, Chuck; Bhattacharya, Abhik; Olsson, Jesper
2005-01-01
This research examines a subjective Bayesian model's ability to predict organizational change outcomes and sustainability of those outcomes for project teams participating in a multi-organizational improvement collaborative. PMID:16093893
BAYESIAN METHODS FOR REGIONAL-SCALE EUTROPHICATION MODELS. (R830887)
We demonstrate a Bayesian classification and regression tree (CART) approach to link multiple environmental stressors to biological responses and quantify uncertainty in model predictions. Such an approach can: (1) report prediction uncertainty, (2) be consistent with the amou...
Which level of model complexity is justified by your data? A Bayesian answer
NASA Astrophysics Data System (ADS)
Schöniger, Anneli; Illman, Walter; Wöhling, Thomas; Nowak, Wolfgang
2016-04-01
When judging the plausibility and utility of a subsurface flow or transport model, the question of justifiability arises: which level of model complexity can still be justified by the available calibration data? Although it is common sense that more data are needed to reasonably constrain the parameter space of a more complex model, there is a lack of tools that can objectively quantify model justifiability as a function of the available data. We propose an approach to determine model justifiability in the context of comparing alternative conceptual models. Our approach rests on Bayesian model averaging (BMA). BMA yields posterior model probabilities that point the modeler to an optimal trade-off between model performance in reproducing a given calibration data set and model complexity. To find out which level of complexity can be justified by the available data, we disentangle the complexity component of the trade-off from its performance counterpart. Technically, we remove the performance component from the BMA analysis by replacing the actually observed data values with potential measurement values as predicted by the models. Our proposed analysis results in a "model confusion matrix". Based on this matrix, the modeler can identify the maximum level of model complexity that could possibly be justified by the available amount and type of data. As a side product, model (dis-)similarity is revealed. We have applied the model justifiability analysis to a case of aquifer characterization via hydraulic tomography. Four models of vastly different complexity have been proposed to represent the heterogeneity in hydraulic conductivity of a sandbox aquifer, ranging from a homogeneous medium to geostatistical random fields. We have used drawdown data from two to six pumping tests to condition the models and to determine model justifiability as a function of data set size. Our test case shows that a geostatistical parameterization scheme requires a substantial amount of
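The posterior model probabilities at the heart of the BMA trade-off follow directly from the models' (log) evidences and prior probabilities. A minimal, numerically stable sketch with illustrative names:

```python
import numpy as np

def bma_model_probabilities(log_evidences, prior=None):
    """Posterior model probabilities from log Bayesian model evidences,
    using a log-sum-exp shift for numerical stability."""
    log_ev = np.asarray(log_evidences, float)
    if prior is None:
        prior = np.full(log_ev.size, 1.0 / log_ev.size)  # uniform model prior
    logw = log_ev + np.log(np.asarray(prior, float))
    logw -= logw.max()                                   # avoid underflow
    w = np.exp(logw)
    return w / w.sum()
```

Models with higher evidence receive higher posterior probability; the justifiability analysis above then asks how these probabilities behave when the data are model-generated rather than observed.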
Geostatistical simulations for radon indoor with a nested model including the housing factor.
Cafaro, C; Giovani, C; Garavaglia, M
2016-01-01
The definition of radon-prone areas is the subject of much research in radioecology, since radon is considered a leading cause of lung tumours and the authorities therefore ask for support in developing an appropriate sanitary prevention strategy. In this paper, we use geostatistical tools to elaborate a definition accounting for some of the available information about the dwellings. Co-kriging is the proper interpolator used in geostatistics to refine predictions by using external covariates. A priori, co-kriging is not guaranteed to improve significantly on the results obtained by applying the common lognormal kriging. Here, instead, such a multivariate approach reduces the cross-validation residual variance to an extent which is deemed satisfying. Furthermore, with the application of Monte Carlo simulations, the paradigm provides a more conservative definition of radon-prone areas than the one previously made by lognormal kriging. PMID:26547362
Lee, K.H.
1997-09-01
Numerical and geostatistical analyses show that the artificial smoothing effect of kriging removes high permeability flow paths from hydrogeologic data sets, reducing simulated contaminant transport rates in heterogeneous vadose zone systems. Therefore, kriging alone is not recommended for estimating the spatial distribution of soil hydraulic properties for contaminant transport analysis at vadose zone sites. Vadose zone transport is modeled more effectively by combining kriging with stochastic simulation to better represent the high degree of spatial variability usually found in the hydraulic properties of field soils. However, kriging is a viable technique for estimating the initial mass distribution of contaminants in the subsurface.
NASA Astrophysics Data System (ADS)
Frystacky, H.; Osorio-Murillo, C. A.; Over, M. W.; Kalbacher, T.; Gunnell, D.; Kolditz, O.; Ames, D.; Rubin, Y.
2013-12-01
The Method of Anchored Distributions (MAD) is a Bayesian technique for characterizing the uncertainty in geostatistical model parameters. Open-source software has been developed in a modular framework such that this technique can be applied to any forward model software via a driver. This presentation is about the driver that has been developed for OpenGeoSys (OGS), open-source software that can simulate many hydrogeological processes, including coupled processes. MAD allows the use of multiple data types for conditioning the spatially random fields and assessing model parameter likelihood. For example, if simulating flow and mass transport, the inversion target variable could be hydraulic conductivity and the inversion data types could be head, concentration, or both. The driver detects from the OGS files which processes and variables are being used in a given project and allows MAD to prompt the user to choose those that are to be modeled or to be treated deterministically. In this way, any combination of processes allowed by OGS can have MAD applied. As for the software, there are two versions, each with its own OGS driver. A Windows desktop version is available as a graphical user interface and is ideal for the learning and teaching environment. High-throughput computing can even be achieved with this version via HTCondor if large projects are to be pursued in a computer lab. In addition to this desktop application, a Linux version is available equipped with MPI such that it can be run in parallel on a computer cluster. All releases can be downloaded from the MAD Codeplex site given below.
Evaluating Individualized Reading Programs: A Bayesian Model.
ERIC Educational Resources Information Center
Maxwell, Martha
Simple Bayesian approaches can be applied to answer specific questions in evaluating an individualized reading program. A small reading and study skills program located in the counseling center of a major research university collected and compiled data on student characteristics such as class, number of sessions attended, grade point average, and…
Ashton, Ruth A; Kefyalew, Takele; Rand, Alison; Sime, Heven; Assefa, Ashenafi; Mekasha, Addis; Edosa, Wasihun; Tesfaye, Gezahegn; Cano, Jorge; Teka, Hiwot; Reithinger, Richard; Pullan, Rachel L; Drakeley, Chris J; Brooker, Simon J
2015-07-01
Ethiopia has a diverse ecology and geography resulting in spatial and temporal variation in malaria transmission. Evidence-based strategies are thus needed to monitor transmission intensity and target interventions. A purposive selection of dried blood spots collected during cross-sectional school-based surveys in Oromia Regional State, Ethiopia, were tested for presence of antibodies against Plasmodium falciparum and P. vivax antigens. Spatially explicit binomial models of seroprevalence were created for each species using a Bayesian framework, and used to predict seroprevalence at 5 km resolution across Oromia. School seroprevalence showed a wider prevalence range than microscopy for both P. falciparum (0-50% versus 0-12.7%) and P. vivax (0-53.7% versus 0-4.5%), respectively. The P. falciparum model incorporated environmental predictors and spatial random effects, while P. vivax seroprevalence first-order trends were not adequately explained by environmental variables, and a spatial smoothing model was developed. This is the first demonstration of serological indicators being used to detect large-scale heterogeneity in malaria transmission using samples from cross-sectional school-based surveys. The findings support the incorporation of serological indicators into periodic large-scale surveillance such as Malaria Indicator Surveys, and with particular utility for low transmission and elimination settings.
A Practical Primer on Geostatistics
Olea, Ricardo A.
2009-01-01
THE CHALLENGE
Most geological phenomena are extraordinarily complex in their interrelationships and vast in their geographical extension. Ordinarily, engineers and geoscientists are faced with corporate or scientific requirements to properly prepare geological models with measurements involving a small fraction of the entire area or volume of interest. Exact description of a system such as an oil reservoir is neither feasible nor economically possible. The results are necessarily uncertain. Note that the uncertainty is not an intrinsic property of the systems; it is the result of incomplete knowledge by the observer.
THE AIM OF GEOSTATISTICS
The main objective of geostatistics is the characterization of spatial systems that are incompletely known, systems that are common in geology. A key difference from classical statistics is that geostatistics uses the sampling location of every measurement. Unless the measurements show spatial correlation, the application of geostatistics is pointless. Ordinarily the need for additional knowledge goes beyond a few points, which explains the display of results graphically as fishnet plots, block diagrams, and maps.
GEOSTATISTICAL METHODS
Geostatistics is a collection of numerical techniques for the characterization of spatial attributes using primarily two tools: probabilistic models, which are used for spatial data in a manner similar to the way in which time-series analysis characterizes temporal data, or pattern recognition techniques. The probabilistic models are used as a way to handle uncertainty in results away from sampling locations, making a radical departure from alternative approaches like inverse distance estimation methods.
DIFFERENCES WITH TIME SERIES
In time-series analysis, users frequently concentrate their attention on extrapolations for making forecasts. Although users of geostatistics may be interested in extrapolation, the methods work at their best interpolating. This simple difference has
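The probabilistic interpolation described in the primer is typified by ordinary kriging. A minimal 1-D sketch under an assumed covariance function, with illustrative names rather than code from the primer:

```python
import numpy as np

def ordinary_kriging(xs, zs, x0, cov):
    """Ordinary-kriging prediction at x0 from 1-D samples (xs, zs),
    where cov(h) is a covariance function of separation distance h."""
    n = xs.size
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = cov(np.abs(xs[:, None] - xs[None, :]))
    K[:n, n] = K[n, :n] = 1.0   # Lagrange row/column: weights sum to 1
    rhs = np.append(cov(np.abs(xs - x0)), 1.0)
    w = np.linalg.solve(K, rhs)  # kriging weights plus Lagrange multiplier
    return w[:n] @ zs
```

A defining property, in contrast to inverse-distance methods, is exactness: predicting at a sampled location returns the datum itself (absent a nugget effect).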
Technical note: Bayesian calibration of dynamic ruminant nutrition models.
Reed, K F; Arhonditsis, G B; France, J; Kebreab, E
2016-08-01
Mechanistic models of ruminant digestion and metabolism have advanced our understanding of the processes underlying ruminant animal physiology. Deterministic modeling practices ignore the inherent variation within and among individual animals and thus have no way to assess how sources of error influence model outputs. We introduce Bayesian calibration of mathematical models to address the need for robust mechanistic modeling tools that can accommodate error analysis by remaining within the bounds of data-based parameter estimation. For the purpose of prediction, the Bayesian approach generates a posterior predictive distribution that represents the current estimate of the value of the response variable, taking into account both the uncertainty about the parameters and model residual variability. Predictions are expressed as probability distributions, thereby conveying significantly more information than point estimates in regard to uncertainty. Our study illustrates some of the technical advantages of Bayesian calibration and discusses the future perspectives in the context of animal nutrition modeling.
NASA Astrophysics Data System (ADS)
Goeckede, M.; Yadav, V.; Mueller, K. L.; Gourdji, S. M.; Michalak, A. M.; Law, B. E.
2011-12-01
We designed a framework to train biogeophysics-biogeochemistry process models using atmospheric inverse modeling, multiple databases characterizing biosphere-atmosphere exchange, and advanced geostatistics. Our main objective is to reduce uncertainties in carbon cycle and climate projections by exploring the full spectrum of process representation, data assimilation and statistical tools currently available. Incorporating multiple high-quality data sources like eddy-covariance flux databases or biometric inventories has the potential to produce a rigorous data-constrained process model implementation. However, representation errors may bias spatially explicit model output when upscaling to regional to global scales. Atmospheric inverse modeling can be used to validate the regional representativeness of the fluxes, but each piece of prior information from the surface databases limits the ability of the inverse model to characterize the carbon cycle from the perspective of the atmospheric observations themselves. The use of geostatistical inverse modeling (GIM) holds the potential to overcome these limitations, replacing rigid prior patterns with information on how flux fields are correlated across time and space, as well as ancillary environmental data related to the carbon fluxes. We present results from a regional scale data assimilation study that focuses on generating terrestrial CO2 fluxes at high spatial and temporal resolution in the Pacific Northwest United States. Our framework couples surface fluxes from different biogeochemistry process models to very high resolution atmospheric transport using mesoscale modeling (WRF) and Lagrangian Particle dispersion (STILT). We use GIM to interpret the spatiotemporal differences between bottom-up and top-down flux fields. GIM results make it possible to link those differences to input parameters and processes, strengthening model parameterization and process understanding. Results are compared against independent
Bayesian Estimation of the Logistic Positive Exponent IRT Model
ERIC Educational Resources Information Center
Bolfarine, Heleno; Bazan, Jorge Luis
2010-01-01
A Bayesian inference approach using Markov Chain Monte Carlo (MCMC) is developed for the logistic positive exponent (LPE) model proposed by Samejima and for a new skewed Logistic Item Response Theory (IRT) model, named Reflection LPE model. Both models lead to asymmetric item characteristic curves (ICC) and can be appropriate because a symmetric…
Estimating tree height-diameter models with the Bayesian method.
Zhang, Xiongqing; Duan, Aiguo; Zhang, Jianguo; Xiang, Congwei
2014-01-01
Six candidate height-diameter models were used to analyze the height-diameter relationships. The common methods for estimating height-diameter models have taken the classical (frequentist) approach based on the frequency interpretation of probability, for example, the nonlinear least squares method (NLS) and the maximum likelihood method (ML). The Bayesian method has a distinct advantage over the classical methods in that the parameters to be estimated are regarded as random variables. In this study, the classical and Bayesian methods were each used to estimate the six height-diameter models. Both the classical and Bayesian methods showed that the Weibull model was the "best" model using data1. In addition, based on the Weibull model, data2 was used to compare the Bayesian method with informative priors against uninformative priors and the classical method. The results showed that the improvement in prediction accuracy with the Bayesian method led to narrower confidence bands of predicted values in comparison to the classical method, and the credible bands of parameters with informative priors were also narrower than with uninformative priors and the classical method. The estimated posterior distributions for the parameters can be set as new priors when estimating the parameters using data2.
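Bayesian estimation of such nonlinear models is commonly carried out by MCMC. A generic random-walk Metropolis sampler, shown here as an illustrative sketch rather than the authors' implementation, takes only a few lines:

```python
import numpy as np

def metropolis(log_post, theta0, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for an unnormalized log posterior."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, float))
    lp = log_post(theta)
    samples = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        prop = theta + rng.normal(scale=step, size=theta.size)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept with MH probability
            theta, lp = prop, lp_prop
        samples[i] = theta                        # repeat theta on rejection
    return samples
```

In practice `log_post` would encode the height-diameter likelihood plus the chosen (informative or uninformative) priors; credible bands are then read off the retained draws after burn-in.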
Modelling blood-brain barrier partitioning using Bayesian neural nets.
Winkler, David A; Burden, Frank R
2004-07-01
We have employed three families of molecular descriptors, together with Bayesian regularized neural nets, to model the partitioning of a diverse range of drugs and other small molecules across the blood-brain barrier (BBB). The relative efficacy of each descriptor class is compared, and the advantages of flexible, parsimonious, model-free mapping methods, like Bayesian neural nets, are illustrated. The relative importance of the molecular descriptors for the most predictive BBB model was determined by use of automatic relevance determination (ARD), and compared with the important descriptors from other literature models of BBB partitioning.
Bayesian failure probability model sensitivity study. Final report
Not Available
1986-05-30
The Office of the Manager, National Communications System (OMNCS) has developed a system-level approach for estimating the effects of High-Altitude Electromagnetic Pulse (HEMP) on the connectivity of telecommunications networks. This approach incorporates a Bayesian statistical model which estimates the HEMP-induced failure probabilities of telecommunications switches and transmission facilities. The purpose of this analysis is to address the sensitivity of the Bayesian model. This is done by systematically varying two model input parameters--the number of observations, and the equipment failure rates. Throughout the study, a noninformative prior distribution is used. The sensitivity of the Bayesian model to the noninformative prior distribution is investigated from a theoretical mathematical perspective.
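For a Bernoulli failure probability, the Bayesian update under a Beta prior has a closed form, which makes the sensitivity to the prior and to the number of observations easy to explore. A minimal sketch with illustrative names, not the OMNCS model itself:

```python
def beta_posterior(failures, trials, a=0.5, b=0.5):
    """Conjugate update of a failure probability: Beta(a, b) prior
    (a = b = 0.5 is the Jeffreys noninformative choice) observed against
    `failures` out of `trials` Bernoulli outcomes."""
    a_post = a + failures
    b_post = b + trials - failures
    mean = a_post / (a_post + b_post)
    var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return a_post, b_post, mean, var
```

Note that even with zero observed failures the posterior mean stays strictly positive, a direct consequence of the prior; varying `trials` shows how quickly the data dominate it.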
On the Bayesian Nonparametric Generalization of IRT-Type Models
ERIC Educational Resources Information Center
San Martin, Ernesto; Jara, Alejandro; Rolin, Jean-Marie; Mouchart, Michel
2011-01-01
We study the identification and consistency of Bayesian semiparametric IRT-type models, where the uncertainty on the abilities' distribution is modeled using a prior distribution on the space of probability measures. We show that for the semiparametric Rasch Poisson counts model, simple restrictions ensure the identification of a general…
Bayesian non-parametrics and the probabilistic approach to modelling
Ghahramani, Zoubin
2013-01-01
Modelling is fundamental to many fields of science and engineering. A model can be thought of as a representation of possible data one could predict from a system. The probabilistic approach to modelling uses probability theory to express all aspects of uncertainty in the model. The probabilistic approach is synonymous with Bayesian modelling, which simply uses the rules of probability theory in order to make predictions, compare alternative models, and learn model parameters and structure from data. This simple and elegant framework is most powerful when coupled with flexible probabilistic models. Flexibility is achieved through the use of Bayesian non-parametrics. This article provides an overview of probabilistic modelling and an accessible survey of some of the main tools in Bayesian non-parametrics. The survey covers the use of Bayesian non-parametrics for modelling unknown functions, density estimation, clustering, time-series modelling, and representing sparsity, hierarchies, and covariance structure. More specifically, it gives brief non-technical overviews of Gaussian processes, Dirichlet processes, infinite hidden Markov models, Indian buffet processes, Kingman’s coalescent, Dirichlet diffusion trees and Wishart processes. PMID:23277609
Bayesian Network Models for Local Dependence among Observable Outcome Variables
ERIC Educational Resources Information Center
Almond, Russell G.; Mulder, Joris; Hemat, Lisa A.; Yan, Duanli
2009-01-01
Bayesian network models offer a large degree of flexibility for modeling dependence among observables (item outcome variables) from the same task, which may be dependent. This article explores four design patterns for modeling locally dependent observations: (a) no context--ignores dependence among observables; (b) compensatory context--introduces…
Semiparametric Thurstonian Models for Recurrent Choices: A Bayesian Analysis
ERIC Educational Resources Information Center
Ansari, Asim; Iyengar, Raghuram
2006-01-01
We develop semiparametric Bayesian Thurstonian models for analyzing repeated choice decisions involving multinomial, multivariate binary or multivariate ordinal data. Our modeling framework has multiple components that together yield considerable flexibility in modeling preference utilities, cross-sectional heterogeneity and parameter-driven…
Schröder, Winfried
2006-05-01
By the example of environmental monitoring, some applications of geographic information systems (GIS), geostatistics, metadata banking, and Classification and Regression Trees (CART) are presented. These tools are recommended for mapping statistically estimated hot spots of vectors and pathogens. GIS were introduced as tools for spatially modelling the real world. The modelling can be done by mapping objects according to the spatial information content of data, and can additionally be supported by geostatistical and multivariate statistical modelling. This is demonstrated by the example of modelling marine habitats of benthic communities and of terrestrial ecoregions. Such ecoregionalisations may be used to predict phenomena based on the statistical relation between measurements of a phenomenon of interest (e.g., the incidence of medically relevant species) and correlated characteristics of the ecoregions. The combination of meteorological data and data on plant phenology can enhance the spatial resolution of the information on climate change. To this end, meteorological and phenological data have to be correlated. To enable this, both data sets, which come from disparate monitoring networks, have to be spatially connected by means of geostatistical estimation. This is demonstrated by the example of transforming site-specific data on plant phenology into surface data. The analysis allows for spatial comparison of the phenology during the two periods 1961-1990 and 1991-2002, covering the whole of Germany. The changes in both plant phenology and air temperature were proved to be statistically significant. Thus, they can be combined by GIS overlay techniques to enhance the spatial resolution of the information on climate change and used for the prediction of vector incidences at the regional scale. The localisation of such risk hot spots can be done by geometrically merging surface data on promoting factors. This is demonstrated by the example of the
On the Adequacy of Bayesian Evaluations of Categorization Models: Reply to Vanpaemel and Lee (2012)
ERIC Educational Resources Information Center
Wills, Andy J.; Pothos, Emmanuel M.
2012-01-01
Vanpaemel and Lee (2012) argued, and we agree, that the comparison of formal models can be facilitated by Bayesian methods. However, Bayesian methods neither precede nor supplant our proposals (Wills & Pothos, 2012), as Bayesian methods can be applied both to our proposals and to their polar opposites. Furthermore, the use of Bayesian methods to…
Grisotto, Laura; Consonni, Dario; Cecconi, Lorenzo; Catelan, Dolores; Lagazio, Corrado; Bertazzi, Pier Alberto; Baccini, Michela; Biggeri, Annibale
2016-01-01
In this paper the focus is on environmental statistics, with the aim of estimating the concentration surface and related uncertainty of an air pollutant. We used air quality data recorded by a network of monitoring stations within a Bayesian framework to overcome difficulties in accounting for prediction uncertainty and to integrate information provided by deterministic models based on emissions, meteorology, and the chemico-physical characteristics of the atmosphere. Several authors have proposed such integration, but all the proposed approaches rely on the representativeness and completeness of existing air pollution monitoring networks. We considered the situation in which the spatial process of interest and the sampling locations are not independent. This is known in the literature as the preferential sampling problem, which, if ignored in the analysis, can bias geostatistical inferences. We developed a Bayesian geostatistical model to account for preferential sampling, with the main interest in statistical integration and uncertainty. We used PM10 data arising from the air quality network of the Environmental Protection Agency of Lombardy Region (Italy) and numerical outputs from the deterministic model. We specified an inhomogeneous Poisson process for the sampling location intensities and a shared spatial random component model for the dependence between the spatial location of monitors and the pollution surface. We found greater predicted standard deviation differences in areas not properly covered by the air quality network. In conclusion, in this context inferences on prediction uncertainty may be misleading when geostatistical modelling does not take preferential sampling into account. PMID:27087040
Riva, Monica; Guadagnini, Alberto; Fernandez-Garcia, Daniel; Sanchez-Vila, Xavier; Ptak, Thomas
2008-10-23
We analyze the relative importance of the selection of (1) the geostatistical model depicting the structural heterogeneity of an aquifer, and (2) the basic processes to be included in the conceptual model, to describe the main aspects of solute transport at an experimental site. We focus on the results of a forced-gradient tracer test performed at the "Lauswiesen" experimental site, near Tübingen, Germany. In the experiment, NaBr is injected into a well located 52 m from a pumping well. Multilevel breakthrough curves (BTCs) are measured in the latter. We conceptualize the aquifer as a three-dimensional, doubly stochastic composite medium, where distributions of geomaterials and attributes, e.g., hydraulic conductivity (K) and porosity (phi), can be uncertain. Several alternative transport processes are considered: advection, advection-dispersion and/or mass-transfer between mobile and immobile regions. Flow and transport are tackled within a stochastic Monte Carlo framework to describe key features of the experimental BTCs, such as temporal moments, peak time, and pronounced tailing. We find that, regardless of the complexity of the conceptual transport model adopted, an adequate description of heterogeneity is crucial for generating alternative equally likely realizations of the system that are consistent with (a) the statistical description of the heterogeneous system, as inferred from the data, and (b) salient features of the depth-averaged breakthrough curve, including preferential paths, slow release of mass particles, and anomalous spreading. While the available geostatistical characterization of heterogeneity can explain most of the integrated behavior of transport (depth-averaged breakthrough curve), not all multilevel BTCs are described with equal success. This suggests that transport models simply based on integrated measurements may not ensure an accurate representation of many of the important features required in three-dimensional transport models.
Involving Stakeholders in Building Integrated Fisheries Models Using Bayesian Methods
NASA Astrophysics Data System (ADS)
Haapasaari, Päivi; Mäntyniemi, Samu; Kuikka, Sakari
2013-06-01
A participatory Bayesian approach was used to investigate how the views of stakeholders could be utilized to develop models to help understand the Central Baltic herring fishery. In task one, we applied the Bayesian belief network methodology to elicit the causal assumptions of six stakeholders on factors that influence natural mortality, growth, and egg survival of the herring stock in probabilistic terms. We also integrated the expressed views into a meta-model using the Bayesian model averaging (BMA) method. In task two, we used influence diagrams to study qualitatively how the stakeholders frame the management problem of the herring fishery and elucidate what kind of causalities the different views involve. The paper combines these two tasks to assess the suitability of the methodological choices to participatory modeling in terms of both a modeling tool and participation mode. The paper also assesses the potential of the study to contribute to the development of participatory modeling practices. It is concluded that the subjective perspective on knowledge, which is fundamental in Bayesian theory, suits participatory modeling better than a positivist paradigm that seeks the objective truth. The methodology provides a flexible tool that can be adapted to different kinds of needs and challenges of participatory modeling. The ability of the approach to deal with small data sets makes it cost-effective in participatory contexts. However, the BMA methodology used in modeling the biological uncertainties is so complex that it needs further development before it can be introduced to wider use in participatory contexts.
Bayesian Analysis of Order-Statistics Models for Ranking Data.
ERIC Educational Resources Information Center
Yu, Philip L. H.
2000-01-01
Studied the order-statistics models, extending the usual normal order-statistics model into one in which the underlying random variables followed a multivariate normal distribution. Used a Bayesian approach and the Gibbs sampling technique. Applied the proposed method to analyze presidential election data from the American Psychological…
Bayesian Estimation of the DINA Model with Gibbs Sampling
ERIC Educational Resources Information Center
Culpepper, Steven Andrew
2015-01-01
A Bayesian model formulation of the deterministic inputs, noisy "and" gate (DINA) model is presented. Gibbs sampling is employed to simulate from the joint posterior distribution of item guessing and slipping parameters, subject attribute parameters, and latent class probabilities. The procedure extends concepts in Béguin and Glas,…
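The DINA model estimated above has a simple closed-form response probability: an examinee answers item j correctly with probability (1 - s_j) if they master every attribute the item requires (the conjunctive indicator eta = 1), and with the guessing probability g_j otherwise. A minimal sketch of that likelihood kernel, using a hypothetical Q-matrix and made-up slipping/guessing values (not those of the paper):

```python
import numpy as np

def dina_response_prob(alpha, Q, slip, guess):
    """P(correct) under the DINA model: (1 - s_j) if eta = 1, else g_j,
    where eta = 1 iff the respondent masters every attribute item j requires."""
    # eta[i, j] = 1 when respondent i has all attributes with q_jk = 1
    eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2).astype(int)
    return np.where(eta == 1, 1.0 - slip, guess)

# Toy setup: 3 attributes, 4 items (hypothetical Q-matrix), 2 respondents
Q = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [1, 1, 1]])
slip = np.array([0.1, 0.1, 0.2, 0.2])   # slipping parameters s_j
guess = np.array([0.2, 0.2, 0.1, 0.1])  # guessing parameters g_j
alpha = np.array([[1, 1, 0],   # masters attributes 1 and 2
                  [0, 0, 0]])  # masters none
p = dina_response_prob(alpha, Q, slip, guess)
print(p)  # row 1 ~ [0.9, 0.9, 0.8, 0.1]; row 2 falls back to guessing
```

In a Gibbs sampler these probabilities define the complete-data likelihood from which the slipping, guessing, attribute, and class-probability parameters are drawn in turn.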
Small Sample Properties of Bayesian Multivariate Autoregressive Time Series Models
ERIC Educational Resources Information Center
Price, Larry R.
2012-01-01
The aim of this study was to compare the small sample (N = 1, 3, 5, 10, 15) performance of a Bayesian multivariate vector autoregressive (BVAR-SEM) time series model relative to frequentist power and parameter estimation bias. A multivariate autoregressive model was developed based on correlated autoregressive time series vectors of varying…
A Bayesian Approach for Analyzing Longitudinal Structural Equation Models
ERIC Educational Resources Information Center
Song, Xin-Yuan; Lu, Zhao-Hua; Hser, Yih-Ing; Lee, Sik-Yum
2011-01-01
This article considers a Bayesian approach for analyzing a longitudinal 2-level nonlinear structural equation model with covariates, and mixed continuous and ordered categorical variables. The first-level model is formulated for measures taken at each time point nested within individuals for investigating their characteristics that are dynamically…
Bayesian Semiparametric Structural Equation Models with Latent Variables
ERIC Educational Resources Information Center
Yang, Mingan; Dunson, David B.
2010-01-01
Structural equation models (SEMs) with latent variables are widely useful for sparse covariance structure modeling and for inferring relationships among latent variables. Bayesian SEMs are appealing in allowing for the incorporation of prior information and in providing exact posterior distributions of unknowns, including the latent variables. In…
Bayesian Finite Mixtures for Nonlinear Modeling of Educational Data.
ERIC Educational Resources Information Center
Tirri, Henry; And Others
A Bayesian approach for finding latent classes in data is discussed. The approach uses finite mixture models to describe the underlying structure in the data and demonstrates that the possibility of using full joint probability models raises interesting new prospects for exploratory data analysis. The concepts and methods discussed are illustrated…
Bayesian methods for characterizing unknown parameters of material models
Emery, J. M.; Grigoriu, M. D.; Field Jr., R. V.
2016-02-04
A Bayesian framework is developed for characterizing the unknown parameters of probabilistic models for material properties. In this framework, the unknown parameters are viewed as random and described by their posterior distributions obtained from prior information and measurements of quantities of interest that are observable and depend on the unknown parameters. The proposed Bayesian method is applied to characterize an unknown spatial correlation of the conductivity field in the definition of a stochastic transport equation and to solve this equation by Monte Carlo simulation and stochastic reduced order models (SROMs). As a result, the Bayesian method is also employed to characterize unknown parameters of material properties for laser welds from measurements of peak forces sustained by these welds.
Bayesian Joint Modelling for Object Localisation in Weakly Labelled Images.
Shi, Zhiyuan; Hospedales, Timothy M; Xiang, Tao
2015-10-01
We address the problem of localisation of objects as bounding boxes in images and videos with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. In this paper, a novel framework based on Bayesian joint topic modelling is proposed, which differs significantly from the existing ones in that: (1) All foreground object classes are modelled jointly in a single generative model that encodes multiple object co-existence so that "explaining away" inference can resolve ambiguity and lead to better learning and localisation. (2) Image backgrounds are shared across classes to better learn varying surroundings and "push out" objects of interest. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Moreover, the Bayesian formulation enables the exploitation of various types of prior knowledge to compensate for the limited supervision offered by weakly labelled data, as well as Bayesian domain adaptation for transfer learning. Extensive experiments on the PASCAL VOC, ImageNet and YouTube-Object videos datasets demonstrate the effectiveness of our Bayesian joint model for weakly supervised object localisation. PMID:26340253
Maximum Likelihood Bayesian Averaging of Spatial Variability Models in Unsaturated Fractured Tuff
Ye, Ming; Neuman, Shlomo P.; Meyer, Philip D.
2004-05-25
Hydrologic analyses typically rely on a single conceptual-mathematical model. Yet hydrologic environments are open and complex, rendering them prone to multiple interpretations and mathematical descriptions. Adopting only one of these may lead to statistical bias and underestimation of uncertainty. Bayesian Model Averaging (BMA) provides an optimal way to combine the predictions of several competing models and to assess their joint predictive uncertainty. However, it tends to be computationally demanding and relies heavily on prior information about model parameters. We apply a maximum likelihood (ML) version of BMA (MLBMA) to seven alternative variogram models of log air permeability data from single-hole pneumatic injection tests in six boreholes at the Apache Leap Research Site (ALRS) in central Arizona. Unbiased ML estimates of variogram and drift parameters are obtained using Adjoint State Maximum Likelihood Cross Validation in conjunction with Universal Kriging and Generalized Least Squares. Standard information criteria provide an ambiguous ranking of the models, which does not justify selecting one of them and discarding all others as is commonly done in practice. Instead, we eliminate some of the models based on their negligibly small posterior probabilities and use the rest to project the measured log permeabilities by kriging onto a rock volume containing the six boreholes. We then average these four projections, and associated kriging variances, using the posterior probability of each model as weight. Finally, we cross-validate the results by eliminating from consideration all data from one borehole at a time, repeating the above process, and comparing the predictive capability of MLBMA with that of each individual model. We find that MLBMA is superior to any individual geostatistical model of log permeability among those we consider at the ALRS.
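The averaging step the abstract describes — combining per-model kriging predictions and variances with posterior model probabilities as weights — follows the standard BMA identities: the predictive mean is the weighted mean, and the predictive variance adds a between-model spread term. A sketch with hypothetical numbers (not the ALRS values):

```python
import numpy as np

def bma_combine(means, variances, posterior_probs):
    """Combine per-model kriging predictions via Bayesian Model Averaging.
    means, variances: (n_models, n_points); posterior_probs: (n_models,)."""
    w = np.asarray(posterior_probs, dtype=float)
    w = w / w.sum()                      # normalise model weights
    mean = w @ means                     # BMA predictive mean
    # total variance = within-model variance + between-model variance
    var = w @ variances + w @ (means - mean) ** 2
    return mean, var

# Hypothetical example: three variogram models predicting log permeability
# at two target points, with their kriging variances
means = np.array([[-13.1, -12.8],
                  [-13.4, -12.9],
                  [-12.9, -13.0]])
variances = np.array([[0.20, 0.25],
                      [0.30, 0.28],
                      [0.22, 0.30]])
probs = np.array([0.5, 0.3, 0.2])        # posterior model probabilities
mean, var = bma_combine(means, variances, probs)
print(mean, var)
```

The between-model term is what a single-model analysis silently drops, which is why relying on one variogram model tends to understate predictive uncertainty.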
Bayesian approach to neural-network modeling with input uncertainty.
Wright, W A
1999-01-01
It is generally assumed when using Bayesian inference methods for neural networks that the input data contains no noise or corruption. For real-world (errors in variable) problems this is clearly an unsafe assumption. This paper presents a Bayesian neural-network framework which allows for input noise provided that some model of the noise process exists. In the limit where the noise process is small and symmetric it is shown, using the Laplace approximation, that this method gives an additional term to the usual Bayesian error bar which depends on the variance of the input noise process. Further by treating the true (noiseless) input as a hidden variable and sampling this jointly with the network's weights, using a Markov chain Monte Carlo method, it is demonstrated that it is possible to infer the regression over the noiseless input.
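The extra error-bar term described above can be illustrated numerically: under a Laplace-style approximation the usual Bayesian variance is g^T C g (g the gradient of the network output with respect to the weights, C the posterior weight covariance), and small symmetric input noise adds (df/dx)^2 times the input-noise variance. The toy one-input "network" and all values below are hypothetical, not the paper's setup:

```python
import numpy as np

def predictive_var(f, x, w_mean, w_cov, sigma_x2, eps=1e-5):
    """Laplace-style error bar with an input-noise correction:
    g_w^T C g_w (weight uncertainty) + (df/dx)^2 * Var(input noise)."""
    # Central-difference gradients w.r.t. weights and input
    g_w = np.array([(f(x, w_mean + eps * e) - f(x, w_mean - eps * e)) / (2 * eps)
                    for e in np.eye(len(w_mean))])
    df_dx = (f(x + eps, w_mean) - f(x - eps, w_mean)) / (2 * eps)
    return g_w @ w_cov @ g_w + df_dx**2 * sigma_x2

# Toy "network": f(x, w) = w0 * tanh(w1 * x)
f = lambda x, w: w[0] * np.tanh(w[1] * x)
w_mean = np.array([1.0, 2.0])
w_cov = 0.01 * np.eye(2)      # hypothetical posterior weight covariance
v_clean = predictive_var(f, 0.5, w_mean, w_cov, sigma_x2=0.0)
v_noisy = predictive_var(f, 0.5, w_mean, w_cov, sigma_x2=0.1)
print(v_clean, v_noisy)       # error bar inflates where the fit is steep
```

The correction is largest where the regression surface is steep, which matches the intuition that input noise matters most where small input perturbations move the output a lot.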
Bayesian IRT Guessing Models for Partial Guessing Behaviors
ERIC Educational Resources Information Center
Cao, Jing; Stokes, S. Lynne
2008-01-01
According to the recent Nation's Report Card, 12th-graders failed to produce gains on the 2005 National Assessment of Educational Progress (NAEP) despite earning better grades on average. One possible explanation is that 12th-graders were not motivated taking the NAEP, which is a low-stakes test. We develop three Bayesian IRT mixture models to…
Shortlist B: A Bayesian Model of Continuous Speech Recognition
ERIC Educational Resources Information Center
Norris, Dennis; McQueen, James M.
2008-01-01
A Bayesian model of continuous speech recognition is presented. It is based on Shortlist (D. Norris, 1994; D. Norris, J. M. McQueen, A. Cutler, & S. Butterfield, 1997) and shares many of its key assumptions: parallel competitive evaluation of multiple lexical hypotheses, phonologically abstract prelexical and lexical representations, a feedforward…
A Bayesian A-optimal and model robust design criterion.
Zhou, Xiaojie; Joseph, Lawrence; Wolfson, David B; Bélisle, Patrick
2003-12-01
Suppose that the true model underlying a set of data is one of a finite set of candidate models, and that parameter estimation for this model is of primary interest. With this goal, optimal design must depend on a loss function across all possible models. A common method that accounts for model uncertainty is to average the loss over all models; this is the basis of what is known as Läuter's criterion. We generalize Läuter's criterion and show that it can be placed in a Bayesian decision theoretic framework, by extending the definition of Bayesian A-optimality. We use this generalized A-optimality to find optimal design points in an environmental safety setting. In estimating the smallest detectable trace limit in a water contamination problem, we obtain optimal designs that are quite different from those suggested by standard A-optimality.
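The model-averaged criterion described above can be sketched under simplifying assumptions: known-variance linear candidate models with independent normal priors on the coefficients, so each model's posterior covariance has a closed form and the design score is the probability-weighted trace. The candidate models, probabilities, and designs below are hypothetical illustrations, not those of the water-contamination application:

```python
import numpy as np

def bayes_a_criterion(design_pts, models, model_probs, sigma2=1.0, tau2=10.0):
    """Model-averaged Bayesian A-optimality: posterior-probability-weighted
    trace of the parameter posterior covariance (smaller is better)."""
    score = 0.0
    for basis, p in zip(models, model_probs):
        X = np.column_stack([f(design_pts) for f in basis])
        # Posterior covariance with N(0, tau2 I) prior and N(0, sigma2) noise
        cov = sigma2 * np.linalg.inv(X.T @ X + (sigma2 / tau2) * np.eye(X.shape[1]))
        score += p * np.trace(cov)
    return score

# Candidate models: straight line vs quadratic (hypothetical)
models = [
    [lambda x: np.ones_like(x), lambda x: x],
    [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2],
]
probs = [0.6, 0.4]

# Compare two candidate 5-point designs on [0, 1]
design_a = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
design_b = np.array([0.0, 0.0, 0.5, 1.0, 1.0])
score_a = bayes_a_criterion(design_a, models, probs)
score_b = bayes_a_criterion(design_b, models, probs)
print(score_a, score_b)
```

Averaging the trace over models, rather than optimising for one assumed model, is what makes the resulting designs robust to model uncertainty, at the cost of being suboptimal for any single model.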
A Bayesian Alternative for Multi-objective Ecohydrological Model Specification
NASA Astrophysics Data System (ADS)
Tang, Y.; Marshall, L. A.; Sharma, A.; Ajami, H.
2015-12-01
Process-based ecohydrological models combine the study of hydrological, physical, biogeochemical and ecological processes of the catchments, which are usually more complex and parametric than conceptual hydrological models. Thus, appropriate calibration objectives and model uncertainty analysis are essential for ecohydrological modeling. In recent years, Bayesian inference has become one of the most popular tools for quantifying the uncertainties in hydrological modeling with the development of Markov Chain Monte Carlo (MCMC) techniques. Our study aims to develop appropriate prior distributions and likelihood functions that minimize the model uncertainties and bias within a Bayesian ecohydrological framework. In our study, a formal Bayesian approach is implemented in an ecohydrological model which combines a hydrological model (HyMOD) and a dynamic vegetation model (DVM). Simulations focused on one objective likelihood (Streamflow/LAI) and multi-objective likelihoods (Streamflow and LAI) with different weights are compared. Uniform, weakly informative and strongly informative prior distributions are used in different simulations. The Kullback-Leibler divergence (KLD) is used to measure the (dis)similarity between different priors and corresponding posterior distributions to examine the parameter sensitivity. Results show that different prior distributions can strongly influence posterior distributions for parameters, especially when the available data are limited or parameters are insensitive to the available data. We demonstrate differences in optimized parameters and uncertainty limits in different cases based on multi-objective likelihoods vs. single objective likelihoods. We also demonstrate the importance of appropriately defining the weights of objectives in multi-objective calibration according to different data types.
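The KLD-based sensitivity check described above is easy to illustrate when prior and posterior are summarised as Gaussians, for which the divergence has a closed form: a posterior that barely moves from the prior (small KLD) flags a parameter the data say little about. The parameter summaries below are hypothetical:

```python
import numpy as np

def kl_gauss(mu0, s0, mu1, s1):
    """Closed-form KL divergence KL(N(mu0, s0^2) || N(mu1, s1^2)), in nats."""
    return np.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2 * s1**2) - 0.5

prior = (0.0, 1.0)                    # (mean, sd) of the prior
posterior_insensitive = (0.05, 0.95)  # nearly unchanged -> small KLD
posterior_sensitive = (1.5, 0.3)      # strongly updated -> large KLD

kl_small = kl_gauss(*posterior_insensitive, *prior)
kl_large = kl_gauss(*posterior_sensitive, *prior)
print(kl_small, kl_large)
```

For MCMC output the same comparison is usually made on the sampled marginals (e.g., via histogram or kernel-density estimates) rather than the Gaussian shortcut used here.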
Uncertainties in ozone concentrations predicted with a Lagrangian photochemical air quality model have been estimated using Bayesian Monte Carlo (BMC) analysis. Bayesian Monte Carlo analysis provides a means of combining subjective "prior" uncertainty estimates developed ...
Polygenic Modeling with Bayesian Sparse Linear Mixed Models
Zhou, Xiang; Carbonetto, Peter; Stephens, Matthew
2013-01-01
Both linear mixed models (LMMs) and sparse regression models are widely used in genetics applications, including, recently, polygenic modeling in genome-wide association studies. These two approaches make very different assumptions, so are expected to perform well in different situations. However, in practice, for a given dataset one typically does not know which assumptions will be more accurate. Motivated by this, we consider a hybrid of the two, which we refer to as a “Bayesian sparse linear mixed model” (BSLMM) that includes both these models as special cases. We address several key computational and statistical issues that arise when applying BSLMM, including appropriate prior specification for the hyper-parameters and a novel Markov chain Monte Carlo algorithm for posterior inference. We apply BSLMM and compare it with other methods for two polygenic modeling applications: estimating the proportion of variance in phenotypes explained (PVE) by available genotypes, and phenotype (or breeding value) prediction. For PVE estimation, we demonstrate that BSLMM combines the advantages of both standard LMMs and sparse regression modeling. For phenotype prediction it considerably outperforms either of the other two methods, as well as several other large-scale regression methods previously suggested for this problem. Software implementing our method is freely available from http://stephenslab.uchicago.edu/software.html. PMID:23408905
A Bayesian nonlinear mixed-effects disease progression model
Kim, Seongho; Jang, Hyejeong; Wu, Dongfeng; Abrams, Judith
2016-01-01
A nonlinear mixed-effects approach is developed for disease progression models that incorporate variation in age in a Bayesian framework. We further generalize the probability model for sensitivity to depend on age at diagnosis, time spent in the preclinical state and sojourn time. The developed models are then applied to the Johns Hopkins Lung Project data and the Health Insurance Plan for Greater New York data using Bayesian Markov chain Monte Carlo and are compared with the estimation method that does not consider random-effects from age. Using the developed models, we obtain not only age-specific individual-level distributions, but also population-level distributions of sensitivity, sojourn time and transition probability. PMID:26798562
[A medical image semantic modeling based on hierarchical Bayesian networks].
Lin, Chunyi; Ma, Lihong; Yin, Junxun; Chen, Jianyu
2009-04-01
A semantic modeling approach for medical image semantic retrieval based on hierarchical Bayesian networks is proposed, tailored to the characteristics of medical images. It uses GMMs (Gaussian mixture models) to map low-level image features into object semantics with probabilities, and then captures high-level semantics by fusing these object semantics with a Bayesian network, thus building a multi-layer medical image semantic model that enables automatic image annotation and semantic retrieval using keywords at different semantic levels. To validate the method, we built a multi-level semantic model from a small set of astrocytoma MRI (magnetic resonance imaging) samples, in order to extract semantics of astrocytoma malignancy degree. Experimental results show the superiority of this approach.
NASA Astrophysics Data System (ADS)
Chahal, M. K.; Brown, D. J.; Brooks, E. S.; Campbell, C.; Cobos, D. R.; Vierling, L. A.
2012-12-01
Estimating soil moisture content continuously over space and time using geo-statistical techniques supports the refinement of process-based watershed hydrology models and the application of soil process models (e.g. biogeochemical models predicting greenhouse gas fluxes) to complex landscapes. In this study, we model soil profile volumetric moisture content for five agricultural fields with loess soils in the Palouse region of Eastern Washington and Northern Idaho. Using a combination of stratification and space-filling techniques, we selected 42 representative and distributed measurement locations in the Cook Agronomy Farm (Pullman, WA) and 12 locations each in four additional grower fields that span the precipitation gradient across the Palouse. At each measurement location, soil moisture was measured on an hourly basis at five different depths (30, 60, 90, 120, and 150 cm) using Decagon 5-TE/5-TM soil moisture sensors (Decagon Devices, Pullman, WA, USA). This data was collected over three years for the Cook Agronomy Farm and one year for each of the grower fields. In addition to ordinary kriging, we explored the correlation of volumetric water content with external, spatially exhaustive indices derived from terrain models, optical remote sensing imagery, and proximal soil sensing data (electromagnetic induction and VisNIR penetrometer)
Exemplar models as a mechanism for performing Bayesian inference.
Shi, Lei; Griffiths, Thomas L; Feldman, Naomi H; Sanborn, Adam N
2010-08-01
Probabilistic models have recently received much attention as accounts of human cognition. However, most research in which probabilistic models have been used has been focused on formulating the abstract problems behind cognitive tasks and their optimal solutions, rather than on mechanisms that could implement these solutions. Exemplar models are a successful class of psychological process models in which an inventory of stored examples is used to solve problems such as identification, categorization, and function learning. We show that exemplar models can be used to perform a sophisticated form of Monte Carlo approximation known as importance sampling and thus provide a way to perform approximate Bayesian inference. Simulations of Bayesian inference in speech perception, generalization along a single dimension, making predictions about everyday events, concept learning, and reconstruction from memory show that exemplar models can often account for human performance with only a few exemplars, for both simple and relatively complex prior distributions. These results suggest that exemplar models provide a possible mechanism for implementing at least some forms of Bayesian inference. PMID:20702863
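The mechanism the abstract describes — stored exemplars act as samples from the prior, and their likelihoods act as importance weights, so a weighted average over exemplars approximates a posterior expectation — can be sketched in a conjugate Gaussian toy problem where the exact posterior mean is known (all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Exemplars play the role of samples from the prior p(x); the importance
# weight of exemplar x_i is the likelihood p(d | x_i), so the weighted
# average approximates the posterior mean E[x | d] (self-normalised IS).
exemplars = rng.normal(0.0, 1.0, size=5000)   # prior N(0, 1)
d = 0.8                                        # observed noisy datum
noise_sd = 0.5                                 # likelihood N(x, 0.5^2)

weights = np.exp(-0.5 * ((d - exemplars) / noise_sd) ** 2)
posterior_mean = np.sum(weights * exemplars) / np.sum(weights)

# Conjugate Gaussian analysis gives the exact posterior mean for comparison:
exact = d / (1.0 + noise_sd**2)                # = 0.8 / 1.25 = 0.64
print(posterior_mean, exact)
```

The point of the paper is that even a handful of exemplars often suffices; the 5000 used here just makes the Monte Carlo error negligible for the comparison.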
Yu, Guirui; Zheng, Zemei; Wang, Qiufeng; Fu, Yuling; Zhuang, Jie; Sun, Xiaomin; Wang, Yuesi
2010-08-15
Quantification of the spatiotemporal pattern of soil respiration (R_s) at the regional scale can provide a theoretical basis and fundamental data for accurate evaluation of the global carbon budget. This study summarizes the R_s data measured in China from 1995 to 2004. Based on the data, a new region-scale geostatistical model of soil respiration (GSMSR) was developed by modifying a global-scale statistical model. The GSMSR model, which is driven by monthly air temperature, monthly precipitation, and soil organic carbon (SOC) density, can capture 64% of the spatiotemporal variability of soil R_s. We evaluated the spatiotemporal pattern of R_s in China using the GSMSR model. The estimated results demonstrate that the annual R_s in China ranged from 3.77 to 4.00 Pg C yr^-1 between 1995 and 2004, with an average value of 3.84 +/- 0.07 Pg C yr^-1, contributing 3.92%-4.87% of the global soil CO2 emission. The annual R_s rate of the evergreen broadleaved forest ecosystem was 698 +/- 11 g C m^-2 yr^-1, significantly higher than that of grassland (439 +/- 7 g C m^-2 yr^-1) and cropland (555 +/- 12 g C m^-2 yr^-1). The contributions of grassland, cropland, and forestland ecosystems to the total R_s in China were 48.38 +/- 0.35%, 22.19 +/- 0.18%, and 20.84 +/- 0.13%, respectively. PMID:20704202
Bayesian methods for assessing system reliability: models and computation.
Graves, T. L.; Hamada, Michael
2004-01-01
There are many challenges with assessing the reliability of a system today. These challenges arise because a system may be aging and full system tests may be too expensive or can no longer be performed. Without full system testing, one must integrate (1) all science and engineering knowledge, models and simulations, (2) information and data at various levels of the system, e.g., subsystems and components and (3) information and data from similar systems, subsystems and components. The analyst must work with various data types and how the data are collected, account for measurement bias and uncertainty, deal with model and simulation uncertainty and incorporate expert knowledge. Bayesian hierarchical modeling provides a rigorous way to combine information from multiple sources and different types of information. However, an obstacle to applying Bayesian methods is the need to develop new software to analyze novel statistical models. We discuss a new statistical modeling environment, YADAS, that facilitates the development of Bayesian statistical analyses. It includes classes that help analysts specify new models, as well as classes that support the creation of new analysis algorithms. We illustrate these concepts using several examples.
HIBAYES: Global 21-cm Bayesian Monte-Carlo Model Fitting
NASA Astrophysics Data System (ADS)
Zwart, Jonathan T. L.; Price, Daniel; Bernardi, Gianni
2016-06-01
HIBAYES implements fully-Bayesian extraction of the sky-averaged (global) 21-cm signal from the Cosmic Dawn and Epoch of Reionization in the presence of foreground emission. User-defined likelihood and prior functions are called by the sampler PyMultiNest (ascl:1606.005) in order to jointly explore the full (signal plus foreground) posterior probability distribution and evaluate the Bayesian evidence for a given model. Implemented models, for simulation and fitting, include Gaussians (HI signal) and polynomials (foregrounds). Some simple plotting and analysis tools are supplied. The code can be extended to other models (physical or empirical), to incorporate data from other experiments, or to use alternative Monte-Carlo sampling engines as required.
Bayesian point event modeling in spatial and environmental epidemiology.
Lawson, Andrew B
2012-10-01
This paper reviews the current state of point event modeling in spatial epidemiology from a Bayesian perspective. Point event (or case event) data arise when geo-coded addresses of disease events are available. Often, this level of spatial resolution would not be accessible due to medical confidentiality constraints. However, for the examination of small spatial scales, it is important to be capable of examining point process data directly. Models for such data are usually formulated based on point process theory. In addition, special conditioning arguments can lead to simpler Bernoulli likelihoods and logistic spatial models. Goodness-of-fit diagnostics and Bayesian residuals are also considered. Applications within putative health hazard risk assessment, cluster detection, and linkage to environmental risk fields (misalignment) are considered.
NASA Astrophysics Data System (ADS)
Parasyris, Antonios E.; Spanoudaki, Katerina; Kampanis, Nikolaos A.
2016-04-01
Groundwater level monitoring networks provide essential information for water resources management, especially in areas with significant groundwater exploitation for agricultural and domestic use. Given the high maintenance costs of these networks, development of tools, which can be used by regulators for efficient network design is essential. In this work, a monitoring network optimisation tool is presented. The network optimisation tool couples geostatistical modelling based on the Spartan family variogram with a genetic algorithm method and is applied to Mires basin in Crete, Greece, an area of high socioeconomic and agricultural interest, which suffers from groundwater overexploitation leading to a dramatic decrease of groundwater levels. The purpose of the optimisation tool is to determine which wells to exclude from the monitoring network because they add little or no beneficial information to groundwater level mapping of the area. Unlike previous relevant investigations, the network optimisation tool presented here uses Ordinary Kriging with the recently-established non-differentiable Spartan variogram for groundwater level mapping, which, based on a previous geostatistical study in the area, leads to optimal groundwater level mapping. Seventy boreholes operate in the area for groundwater abstraction and water level monitoring. The Spartan variogram gives overall the most accurate groundwater level estimates followed closely by the power-law model. The geostatistical model is coupled to an integer genetic algorithm method programmed in MATLAB 2015a. The algorithm is used to find the set of wells whose removal leads to the minimum error between the original water level mapping using all the available wells in the network and the groundwater level mapping using the reduced well network (error is defined as the 2-norm of the difference between the original mapping matrix with 70 wells and the mapping matrix of the reduced well network). The solution to the
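The error criterion described above can be sketched as follows. The variogram here is a generic exponential stand-in (the paper uses the Spartan model), and the well coordinates and levels are hypothetical:

```python
import numpy as np

def variogram(h, sill=1.0, srange=2000.0):
    # exponential variogram: a generic stand-in for the Spartan model in the paper
    return sill * (1.0 - np.exp(-3.0 * h / srange))

def ordinary_kriging(wells, z, grid):
    """Ordinary Kriging estimates of levels z observed at well coordinates,
    evaluated at the grid points."""
    n = len(wells)
    d = np.linalg.norm(wells[:, None, :] - wells[None, :, :], axis=-1)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, :n] = A[:n, n] = 1.0              # unbiasedness constraint
    est = np.empty(len(grid))
    for k, g in enumerate(grid):
        b = np.append(variogram(np.linalg.norm(wells - g, axis=1)), 1.0)
        w = np.linalg.solve(A, b)[:n]      # kriging weights
        est[k] = w @ z
    return est

def reduction_error(wells, z, grid, keep):
    """2-norm between the full-network map and the reduced-network map,
    the objective the genetic algorithm minimizes over subsets 'keep'."""
    full = ordinary_kriging(wells, z, grid)
    red = ordinary_kriging(wells[keep], z[keep], grid)
    return np.linalg.norm(full - red)
```

A genetic algorithm would then search over index subsets `keep` of fixed size to find the wells whose removal changes the map least.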
Bayesian model updating using incomplete modal data without mode matching
NASA Astrophysics Data System (ADS)
Sun, Hao; Büyüköztürk, Oral
2016-04-01
This study investigates a new probabilistic strategy for model updating using incomplete modal data. A hierarchical Bayesian inference is employed to model the updating problem. A Markov chain Monte Carlo technique with adaptive random-walk steps is used to draw parameter samples for uncertainty quantification. Mode matching between measured and predicted modal quantities is not required, owing to model reduction. We employ an iterated improved reduced system technique for model reduction. The reduced model retains the dynamic features as close as possible to those of the model before reduction. The proposed algorithm is finally validated by an experimental example.
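An adaptive random-walk Metropolis sampler of the general kind described can be sketched as follows; the step-size rule here is a simple illustrative one (scale up on acceptance, down on rejection, during burn-in only), not the paper's exact scheme:

```python
import numpy as np

def adaptive_metropolis(logpost, x0, n=6000, seed=0):
    """Random-walk Metropolis with step-size adaptation during burn-in
    (a generic stand-in for the paper's adaptive scheme)."""
    rng = np.random.default_rng(seed)
    x, lp = x0, logpost(x0)
    step, samples = 1.0, []
    for i in range(n):
        prop = x + step * rng.normal()
        lp_prop = logpost(prop)
        accepted = np.log(rng.uniform()) < lp_prop - lp
        if accepted:
            x, lp = prop, lp_prop
        if i < n // 2:                      # adapt only during burn-in,
            step *= 1.02 if accepted else 0.98   # so the chain stays valid after
        else:
            samples.append(x)
    return np.array(samples)
```

Freezing the step size after burn-in keeps the post-adaptation chain a valid Markov chain with the target as its stationary distribution.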
Application of the Bayesian dynamic survival model in medicine.
He, Jianghua; McGee, Daniel L; Niu, Xufeng
2010-02-10
The Bayesian dynamic survival model (BDSM), a time-varying coefficient survival model from the Bayesian perspective, was proposed in the early 1990s but has not been widely used or discussed. In this paper, we describe the model structure of the BDSM and introduce two estimation approaches for BDSMs: the Markov Chain Monte Carlo (MCMC) approach and the linear Bayesian (LB) method. The MCMC approach estimates model parameters through sampling and is computationally intensive. With the newly developed geoadditive survival models and software BayesX, the BDSM is available for general applications. The LB approach is easier in terms of computations but it requires the prespecification of some unknown smoothing parameters. In a simulation study, we use the LB approach to show the effects of smoothing parameters on the performance of the BDSM and propose an ad hoc method for identifying appropriate values for those parameters. We also demonstrate the performance of the MCMC approach compared with the LB approach and a penalized partial likelihood method available in software R packages. A gastric cancer trial is utilized to illustrate the application of the BDSM. PMID:20014356
Bayesian analysis of botanical epidemics using stochastic compartmental models.
Gibson, G J; Kleczkowski, A; Gilligan, C A
2004-08-17
A stochastic model for an epidemic, incorporating susceptible, latent, and infectious states, is developed. The model represents primary and secondary infection rates and a time-varying host susceptibility with applications to a wide range of epidemiological systems. A Markov chain Monte Carlo algorithm is presented that allows the model to be fitted to experimental observations within a Bayesian framework. The approach allows the uncertainty in unobserved aspects of the process to be represented in the parameter posterior densities. The methods are applied to experimental observations of damping-off of radish (Raphanus sativus) caused by the fungal pathogen Rhizoctonia solani, in the presence and absence of the antagonistic fungus Trichoderma viride, a biological control agent that has previously been shown to affect the rate of primary infection by using a maximum-likelihood estimate for a simpler model with no allowance for a latent period. Using the Bayesian analysis, we are able to estimate the latent period from population data, even when there is uncertainty in discriminating infectious from latently infected individuals in data collection. We also show that the inference that T. viride can control primary, but not secondary, infection is robust to inclusion of the latent period in the model, although the absolute values of the parameters change. Some refinements and potential difficulties with the Bayesian approach in this context, when prior information on parameters is lacking, are discussed along with broader applications of the methods to a wide range of epidemiological systems.
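A susceptible-latent-infectious stochastic model of this general form can be simulated with Gillespie's algorithm. The rates below are illustrative (constant host susceptibility, no removal), a simplification of the paper's time-varying model:

```python
import numpy as np

def gillespie_sli(S0, L0, I0, rp, rs, gamma, t_end, seed=0):
    """Gillespie simulation of a susceptible-latent-infectious epidemic with
    primary (rp) and secondary (rs) infection rates and latent->infectious
    rate gamma. Returns the final (S, L, I) counts."""
    rng = np.random.default_rng(seed)
    S, L, I, t = S0, L0, I0, 0.0
    while t < t_end and S + L > 0:
        rate_inf = S * (rp + rs * I)    # susceptible becomes latently infected
        rate_lat = gamma * L            # latent becomes infectious
        total = rate_inf + rate_lat
        if total == 0:
            break
        t += rng.exponential(1.0 / total)          # time to next event
        if rng.uniform() < rate_inf / total:
            S -= 1; L += 1
        else:
            L -= 1; I += 1
    return S, L, I
```

Bayesian fitting would then compare observed counts to such simulated trajectories, with the latent compartment treated as partially unobserved.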
NASA Astrophysics Data System (ADS)
He, X.; Sonnenborg, T. O.; Jørgensen, F.; Jensen, K. H.
2013-09-01
Multiple-point geostatistical simulation (MPS) has recently become popular in stochastic hydrogeology, primarily because of its capability to derive multivariate distributions from the training image (TI). However, its application in three-dimensional simulations has been constrained by the difficulty of constructing 3-D TIs. The object-based TiGenerator may be a useful tool in this regard; yet the sensitivity of model predictions to the training image has not been documented. Another issue in MPS is the integration of multiple geophysical data. The best way to retrieve and incorporate information from high resolution geophysical data is still under discussion. This work shows that TI from TiGenerator delivers acceptable results when used for groundwater modeling, although the TI directly converted from high resolution geophysical data leads to better simulation. The model results also indicate that soft conditioning in MPS is a convenient and efficient way of integrating secondary data such as 3-D airborne electromagnetic data, but over-conditioning has to be avoided.
Assessment of substitution model adequacy using frequentist and Bayesian methods.
Ripplinger, Jennifer; Sullivan, Jack
2010-12-01
In order to have confidence in model-based phylogenetic methods, such as maximum likelihood (ML) and Bayesian analyses, one must use an appropriate model of molecular evolution identified using statistically rigorous criteria. Although model selection methods such as the likelihood ratio test and Akaike information criterion are widely used in the phylogenetic literature, model selection methods lack the ability to reject all models if they provide an inadequate fit to the data. There are two methods, however, that assess absolute model adequacy, the frequentist Goldman-Cox (GC) test and Bayesian posterior predictive simulations (PPSs), which are commonly used in conjunction with the multinomial log likelihood test statistic. In this study, we use empirical and simulated data to evaluate the adequacy of common substitution models using both frequentist and Bayesian methods and compare the results with those obtained with model selection methods. In addition, we investigate the relationship between model adequacy and performance in ML and Bayesian analyses in terms of topology, branch lengths, and bipartition support. We show that tests of model adequacy based on the multinomial likelihood often fail to reject simple substitution models, especially when the models incorporate among-site rate variation (ASRV), and normally fail to reject less complex models than those chosen by model selection methods. In addition, we find that PPSs often fail to reject simpler models than the GC test. Use of the simplest substitution models not rejected based on fit normally results in similar but divergent estimates of tree topology and branch lengths. In addition, use of the simplest adequate substitution models can affect estimates of bipartition support, although these differences are often small with the largest differences confined to poorly supported nodes. We also find that alternative assumptions about ASRV can affect tree topology, tree length, and bipartition support. Our
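The multinomial log-likelihood test statistic mentioned above has a simple closed form: the unconstrained multinomial log-likelihood of the observed site-pattern counts. A minimal sketch:

```python
import numpy as np
from collections import Counter

def multinomial_loglik(site_patterns):
    """Unconstrained multinomial log-likelihood of alignment site patterns,
    the test statistic commonly used with the GC test and posterior
    predictive simulations: sum_i n_i * log(n_i / N)."""
    counts = np.array(list(Counter(site_patterns).values()), float)
    n = counts.sum()
    return float(np.sum(counts * np.log(counts / n)))
```

In the GC test and in PPSs, this statistic computed on the observed alignment is compared against its distribution under data simulated from the fitted substitution model.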
NASA Astrophysics Data System (ADS)
Wang, Hui; Wellmann, Florian
2016-04-01
It is generally accepted that 3D geological models inferred from observed data will contain a certain amount of uncertainty. Uncertainty quantification and stochastic sampling methods are essential for gaining insight into the geological variability of subsurface structures. In the community of deterministic or traditional modelling techniques, classical geostatistical methods using boreholes (hard data sets) are still the most widely accepted, despite certain drawbacks. Modern geophysical measurements provide us with regional data sets in 2D or 3D spaces, either directly from sensors or indirectly from inverse problem solving using observed signals (soft data sets). We propose a stochastic modelling framework to extract subsurface heterogeneity from multiple and complementary types of data. In the presented work, subsurface heterogeneity is considered as the "hidden link" among multiple spatial data sets as well as inversion results. Hidden Markov random field models are employed to perform 3D segmentation, which is the representation of the "hidden link". Finite Gaussian mixture models are adopted to characterize the statistical parameters of the multiple data sets. The uncertainties are quantified via a Gibbs sampling process under the Bayesian inferential framework. The proposed modelling framework is validated using two numerical examples. The model behavior and convergence are also well examined. It is shown that the presented stochastic modelling framework is a promising tool for the 3D data fusion in the communities of geological modelling and geophysics.
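The label-sampling step inside such a Gibbs scheme can be sketched for a plain finite Gaussian mixture; the spatial hidden Markov random field prior, which couples neighboring labels, is omitted here for brevity:

```python
import numpy as np

def gibbs_labels(x, mu, sigma2, pi, rng):
    """One Gibbs sweep over the hidden labels of a finite Gaussian mixture:
    sample each z_i from its full conditional given the current component
    means mu, variances sigma2, and weights pi."""
    logp = (np.log(pi)[None, :]
            - 0.5 * np.log(2 * np.pi * sigma2)[None, :]
            - 0.5 * (x[:, None] - mu[None, :]) ** 2 / sigma2[None, :])
    p = np.exp(logp - logp.max(axis=1, keepdims=True))   # stabilized softmax
    p /= p.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(mu), p=row) for row in p])
```

In the full framework, a Potts-like spatial term would be added to `logp` so that neighboring voxels favor the same label, and parameter updates would alternate with these label sweeps.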
NASA Astrophysics Data System (ADS)
Ly, S.; Charles, C.; Degré, A.
2011-07-01
Spatial interpolation of precipitation data is of great importance for hydrological modelling. Geostatistical methods (kriging) are widely applied in spatial interpolation from point measurement to continuous surfaces. The first step in kriging computation is semi-variogram modelling, which usually relies on a single variogram model for all the data. The objective of this paper was to develop different algorithms of spatial interpolation for daily rainfall on 1 km2 regular grids in the catchment area and to compare the results of geostatistical and deterministic approaches. This study relied on 30-yr daily rainfall data from 70 raingages in the hilly landscape of the Ourthe and Ambleve catchments in Belgium (2908 km2). This area lies between 35 and 693 m in elevation and consists of river networks, which are tributaries of the Meuse River. For geostatistical algorithms, seven semi-variogram models (logarithmic, power, exponential, Gaussian, rational quadratic, spherical and penta-spherical) were fitted to the sample semi-variogram on a daily basis. These seven variogram models were also adopted to avoid negative interpolated rainfall. The elevation, extracted from a digital elevation model, was incorporated into multivariate geostatistics. Seven validation raingages and cross validation were used to compare the interpolation performance of these algorithms applied to different densities of raingages. We found that among the seven variogram models used, the Gaussian model was most frequently the best fit. Using seven variogram models can avoid negative daily rainfall in ordinary kriging. Negative kriging estimates were observed for convective more than stratiform rain. The performance of the different methods varied slightly according to the density of raingages, particularly between 8 and 70 raingages, but it was much different for interpolation using 4 raingages. Spatial interpolation with the geostatistical and Inverse Distance Weighting (IDW) algorithms
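Fitting a variogram model to the daily sample semi-variogram reduces to a small least-squares problem: since the sill enters linearly, a 1-D search over the range parameter suffices. A sketch for the spherical model (one of the seven listed), with hypothetical lag/semivariance inputs:

```python
import numpy as np

def spherical(h, sill, a):
    """Spherical variogram: rises to the sill at range a, flat beyond."""
    h = np.asarray(h, float)
    g = sill * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h < a, g, sill)

def fit_spherical(h, gamma_hat, ranges):
    """Least-squares fit of a spherical variogram to empirical semivariances
    gamma_hat at lags h: for each candidate range, the optimal sill is a
    closed-form linear solve, so a grid search over ranges suffices."""
    best = None
    for a in ranges:
        basis = spherical(h, 1.0, a)
        sill = float(basis @ gamma_hat / (basis @ basis))
        sse = float(np.sum((gamma_hat - sill * basis) ** 2))
        if best is None or sse < best[0]:
            best = (sse, sill, a)
    return best[1], best[2]   # fitted sill and range
```

Repeating such a fit per day, over several candidate model families, and keeping the best-fitting model is what the per-day model selection above amounts to.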
Bayesian Inference of High-Dimensional Dynamical Ocean Models
NASA Astrophysics Data System (ADS)
Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.
2015-12-01
This presentation addresses a holistic set of challenges in high-dimensional ocean Bayesian nonlinear estimation: i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); ii) assimilate data using Bayes' law with these pdfs; iii) predict the future data that optimally reduce uncertainties; and iv) rank the known and learn the new model formulations themselves. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.
Meyer, Swen; Blaschek, Michael; Duttmann, Rainer; Ludwig, Ralf
2016-02-01
According to current climate projections, Mediterranean countries are at high risk for an even more pronounced susceptibility to changes in the hydrological budget and extremes. These changes are expected to have severe direct impacts on the management of water resources, agricultural productivity and drinking water supply. Current projections of future hydrological change, based on regional climate model results and subsequent hydrological modeling schemes, are very uncertain and poorly validated. The Rio Mannu di San Sperate Basin, located in Sardinia, Italy, is one test site of the CLIMB project. The Water Simulation Model (WaSiM) was set up to model current and future hydrological conditions. The availability of measured meteorological and hydrological data is poor, as is common for many Mediterranean catchments. In this study we conducted a soil sampling campaign in the Rio Mannu catchment. We tested different deterministic and hybrid geostatistical interpolation methods on soil textures and tested the performance of the applied models. We calculated a new soil texture map based on the best prediction method. The soil model in WaSiM was set up with the improved new soil information. The simulation results were compared to standard soil parametrization. WaSiM was validated with spatial evapotranspiration rates using the triangle method (Jiang and Islam, 1999). WaSiM was driven with the meteorological forcing taken from 4 different ENSEMBLES climate projections for a reference (1971-2000) and a future (2041-2070) time series. The climate change impact was assessed based on differences between reference and future time series. The simulated results show a reduction of all hydrological quantities in the future in the spring season. Furthermore, simulation results reveal an earlier onset of dry conditions in the catchment. We show that a solid soil model setup based on short-term field measurements can improve long-term modeling results, which is especially important
Bayesian regression model for seasonal forecast of precipitation over Korea
NASA Astrophysics Data System (ADS)
Jo, Seongil; Lim, Yaeji; Lee, Jaeyong; Kang, Hyun-Suk; Oh, Hee-Seok
2012-08-01
In this paper, we apply three different Bayesian methods to the seasonal forecasting of the precipitation in a region around Korea (32.5°N-42.5°N, 122.5°E-132.5°E). We focus on the precipitation of summer season (June-July-August; JJA) for the period of 1979-2007 using the precipitation produced by the Global Data Assimilation and Prediction System (GDAPS) as predictors. Through cross-validation, we demonstrate improvement for seasonal forecast of precipitation in terms of root mean squared error (RMSE) and linear error in probability space score (LEPS). The proposed methods yield RMSE of 1.09 and LEPS of 0.31 between the predicted and observed precipitations, while the prediction using GDAPS output only produces RMSE of 1.20 and LEPS of 0.33 for CPC Merged Analyzed Precipitation (CMAP) data. For station-measured precipitation data, the RMSE and LEPS of the proposed Bayesian methods are 0.53 and 0.29, while GDAPS output is 0.66 and 0.33, respectively. The methods seem to capture the spatial pattern of the observed precipitation. The Bayesian paradigm incorporates the model uncertainty as an integral part of modeling in a natural way. We provide a probabilistic forecast integrating model uncertainty.
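A conjugate Bayesian linear regression of the general kind that underlies such forecasts, together with the RMSE score, can be sketched as follows (the prior and noise variances are illustrative, and LEPS is omitted as its definition is rank-based):

```python
import numpy as np

def bayes_linreg_predict(X, y, Xnew, tau2=10.0, sigma2=1.0):
    """Posterior predictive mean of Bayesian linear regression with a
    N(0, tau2 * I) prior on the coefficients and known noise variance sigma2."""
    d = X.shape[1]
    A = X.T @ X / sigma2 + np.eye(d) / tau2       # posterior precision
    m = np.linalg.solve(A, X.T @ y / sigma2)      # posterior mean coefficients
    return Xnew @ m

def rmse(pred, obs):
    """Root mean squared error between predictions and observations."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))
```

Here GDAPS output would play the role of the predictor columns of X, and the posterior over coefficients is what carries the model uncertainty into the forecast.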
Dissecting Magnetar Variability with Bayesian Hierarchical Models
NASA Astrophysics Data System (ADS)
Huppenkothen, Daniela; Brewer, Brendon J.; Hogg, David W.; Murray, Iain; Frean, Marcus; Elenbaas, Chris; Watts, Anna L.; Levin, Yuri; van der Horst, Alexander J.; Kouveliotou, Chryssa
2015-09-01
Neutron stars are a prime laboratory for testing physical processes under conditions of strong gravity, high density, and extreme magnetic fields. Among the zoo of neutron star phenomena, magnetars stand out for their bursting behavior, ranging from extremely bright, rare giant flares to numerous, less energetic recurrent bursts. The exact trigger and emission mechanisms for these bursts are not known; favored models involve either a crust fracture and subsequent energy release into the magnetosphere, or explosive reconnection of magnetic field lines. In the absence of a predictive model, understanding the physical processes responsible for magnetar burst variability is difficult. Here, we develop an empirical model that decomposes magnetar bursts into a superposition of small spike-like features with a simple functional form, where the number of model components is itself part of the inference problem. The cascades of spikes that we model might be formed by avalanches of reconnection, or crust rupture aftershocks. Using Markov Chain Monte Carlo sampling augmented with reversible jumps between models with different numbers of parameters, we characterize the posterior distributions of the model parameters and the number of components per burst. We relate these model parameters to physical quantities in the system, and show for the first time that the variability within a burst does not conform to predictions from ideas of self-organized criticality. We also examine how well the properties of the spikes fit the predictions of simplified cascade models for the different trigger mechanisms.
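The spike decomposition can be sketched with a hypothetical functional form, exponential rise and decay around a peak time; in the paper the number of components is itself inferred by reversible-jump MCMC, whereas here it is simply an input:

```python
import numpy as np

def spike(t, t0, amp, rise, decay):
    """One spike: exponential rise before the peak time t0, exponential decay
    after it (a hypothetical simple functional form for a burst component)."""
    return amp * np.where(t < t0, np.exp((t - t0) / rise), np.exp(-(t - t0) / decay))

def burst_model(t, spikes, background=0.0):
    """Superposition of a variable number of spikes on a constant background.
    'spikes' is a list of (t0, amp, rise, decay) tuples."""
    out = np.full_like(t, background, dtype=float)
    for t0, amp, rise, decay in spikes:
        out += spike(t, t0, amp, rise, decay)
    return out
```

In the inference, the length of the spike list varies across MCMC iterations, so the posterior includes a distribution over the number of components per burst.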
Utilizing Gaussian Markov random field properties of Bayesian animal models.
Steinsland, Ingelin; Jensen, Henrik
2010-09-01
In this article, we demonstrate how Gaussian Markov random field properties give large computational benefits and new opportunities for the Bayesian animal model. We make inference by computing the posteriors for important quantitative genetic variables. For the single-trait animal model, a nonsampling-based approximation is presented. For the multitrait model, we set up a robust and fast Markov chain Monte Carlo algorithm. The proposed methodology was used to analyze quantitative genetic properties of morphological traits of a wild house sparrow population. Results for single- and multitrait models were compared.
A Bayesian Nonparametric Meta-Analysis Model
ERIC Educational Resources Information Center
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G.
2015-01-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall…
Bayesian Local Contamination Models for Multivariate Outliers
Page, Garritt L.; Dunson, David B.
2013-01-01
In studies where data are generated from multiple locations or sources it is common for there to exist observations that are quite unlike the majority. Motivated by the application of establishing a reference value in an inter-laboratory setting when outlying labs are present, we propose a local contamination model that is able to accommodate unusual multivariate realizations in a flexible way. The proposed method models the process level of a hierarchical model using a mixture with a parametric component and a possibly nonparametric contamination. Much of the flexibility in the methodology is achieved by allowing varying random subsets of the elements in the lab-specific mean vectors to be allocated to the contamination component. Computational methods are developed and the methodology is compared to three other possible approaches using a simulation study. We apply the proposed method to a NIST/NOAA sponsored inter-laboratory study which motivated the methodological development. PMID:24363465
Bayesian sensitivity analysis of bifurcating nonlinear models
NASA Astrophysics Data System (ADS)
Becker, W.; Worden, K.; Rowson, J.
2013-01-01
Sensitivity analysis allows one to investigate how changes in input parameters to a system affect the output. When computational expense is a concern, metamodels such as Gaussian processes can offer considerable computational savings over Monte Carlo methods, albeit at the expense of introducing a data modelling problem. In particular, Gaussian processes assume a smooth, non-bifurcating response surface. This work highlights a recent extension to Gaussian processes which uses a decision tree to partition the input space into homogeneous regions, and then fits separate Gaussian processes to each region. In this way, bifurcations can be modelled at region boundaries and different regions can have different covariance properties. To test this method, both the treed and standard methods were applied to the bifurcating response of a Duffing oscillator and a bifurcating FE model of a heart valve. It was found that the treed Gaussian process provides a practical way of performing uncertainty and sensitivity analysis on large, potentially-bifurcating models, which cannot be dealt with by using a single GP, although how to manage bifurcation boundaries that are not parallel to coordinate axes remains an open problem.
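A minimal treed GP can be sketched in one dimension: split the input space at a threshold and fit an independent GP on each side. The split here is fixed by hand for illustration, whereas the treed method learns the partition from data:

```python
import numpy as np

def gp_fit_predict(X, y, Xs, ell=1.0, sf2=1.0, noise=1e-6):
    """Plain GP regression (squared-exponential kernel) on 1-D inputs."""
    def k(A, B):
        d2 = (A[:, None] - B[None, :]) ** 2
        return sf2 * np.exp(-0.5 * d2 / ell ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    return k(Xs, X) @ np.linalg.solve(K, y)

def treed_gp_predict(X, y, Xs, split):
    """Minimal 'treed GP': an independent GP on each side of the split, so a
    discontinuity (bifurcation) at the boundary is represented exactly."""
    pred = np.empty(len(Xs))
    for mask_tr, mask_te in ((X < split, Xs < split), (X >= split, Xs >= split)):
        if mask_te.any():
            pred[mask_te] = gp_fit_predict(X[mask_tr], y[mask_tr], Xs[mask_te])
    return pred
```

A single GP forced to fit the step below would smooth across the discontinuity; the per-region GPs reproduce it.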
Bayesian inference and model comparison for metallic fatigue data
NASA Astrophysics Data System (ADS)
Babuška, Ivo; Sawlan, Zaid; Scavino, Marco; Szabó, Barna; Tempone, Raúl
2016-06-01
In this work, we present a statistical treatment of stress-life (S-N) data drawn from a collection of records of fatigue experiments that were performed on 75S-T6 aluminum alloys. Our main objective is to predict the fatigue life of materials by providing a systematic approach to model calibration, model selection and model ranking with reference to S-N data. To this purpose, we consider fatigue-limit models and random fatigue-limit models that are specially designed to allow the treatment of the run-outs (right-censored data). We first fit the models to the data by maximum likelihood methods and estimate the quantiles of the life distribution of the alloy specimen. To assess the robustness of the estimation of the quantile functions, we obtain bootstrap confidence bands by stratified resampling with respect to the cycle ratio. We then compare and rank the models by classical measures of fit based on information criteria. We also consider a Bayesian approach that provides, under the prior distribution of the model parameters selected by the user, their simulation-based posterior distributions. We implement and apply Bayesian model comparison methods, such as Bayes factor ranking and predictive information criteria based on cross-validation techniques under various a priori scenarios.
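The classical information-criterion ranking used before the Bayesian comparison can be sketched as:

```python
import numpy as np

def rank_models(loglik, n_params, n_obs):
    """Rank fitted models by AIC and BIC computed from their maximized
    log-likelihoods; returns the model orderings (best first)."""
    loglik = np.asarray(loglik, float)
    k = np.asarray(n_params, float)
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n_obs) - 2 * loglik
    return np.argsort(aic), np.argsort(bic)
```

The Bayesian alternative in the paper replaces these point-estimate criteria with Bayes factors and cross-validated predictive criteria, which average over the posterior instead of conditioning on the maximum-likelihood fit.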
Accurate Model Selection of Relaxed Molecular Clocks in Bayesian Phylogenetics
Baele, Guy; Li, Wai Lok Sibon; Drummond, Alexei J.; Suchard, Marc A.; Lemey, Philippe
2013-01-01
Recent implementations of path sampling (PS) and stepping-stone sampling (SS) have been shown to outperform the harmonic mean estimator (HME) and a posterior simulation-based analog of Akaike’s information criterion through Markov chain Monte Carlo (AICM), in Bayesian model selection of demographic and molecular clock models. Almost simultaneously, a Bayesian model averaging approach was developed that avoids conditioning on a single model but averages over a set of relaxed clock models. This approach returns estimates of the posterior probability of each clock model through which one can estimate the Bayes factor in favor of the maximum a posteriori (MAP) clock model; however, this Bayes factor estimate may suffer when the posterior probability of the MAP model approaches 1. Here, we compare these two recent developments with the HME, stabilized/smoothed HME (sHME), and AICM, using both synthetic and empirical data. Our comparison shows reassuringly that MAP identification and its Bayes factor provide similar performance to PS and SS and that these approaches considerably outperform HME, sHME, and AICM in selecting the correct underlying clock model. We also illustrate the importance of using proper priors on a large set of empirical data sets. PMID:23090976
Bayesian Thurstonian models for ranking data using JAGS.
Johnson, Timothy R; Kuhn, Kristine M
2013-09-01
A Thurstonian model for ranking data assumes that observed rankings are consistent with those of a set of underlying continuous variables. This model is appealing since it renders ranking data amenable to familiar models for continuous response variables, namely linear regression models. To date, however, the use of Thurstonian models for ranking data has been very rare in practice. One reason for this may be that inferences based on these models require specialized technical methods. These methods have been developed to address computational challenges involved in these models but are not easy to implement without considerable technical expertise and are not widely available in software packages. To address this limitation, we show that Bayesian Thurstonian models for ranking data can be very easily implemented with the JAGS software package. We provide JAGS model files for Thurstonian ranking models for general use, discuss their implementation, and illustrate their use in analyses. PMID:23539504
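The generative side of a Thurstonian ranking model is compact: draw latent normal utilities and order them. A sketch (the inferential machinery, which JAGS handles via data augmentation, is the hard part and is omitted):

```python
import numpy as np

def sample_rankings(mu, n, sigma=1.0, seed=0):
    """Thurstonian ranking model: each observed ranking is the ordering of
    latent utilities z_j ~ N(mu_j, sigma^2). Each row lists item indices from
    most to least preferred."""
    rng = np.random.default_rng(seed)
    z = rng.normal(mu, sigma, size=(n, len(mu)))
    return np.argsort(-z, axis=1)
```

Inference inverts this: given observed rankings, one samples the latent utilities subject to the ordering constraints, which is exactly the truncated-normal augmentation a JAGS model encodes.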
Predicting coastal cliff erosion using a Bayesian probabilistic model
Hapke, C.; Plant, N.
2010-01-01
Regional coastal cliff retreat is difficult to model due to the episodic nature of failures and the along-shore variability of retreat events. There is a growing demand, however, for predictive models that can be used to forecast areas vulnerable to coastal erosion hazards. Increasingly, probabilistic models are being employed that require data sets of high temporal density to define the joint probability density function that relates forcing variables (e.g. wave conditions) and initial conditions (e.g. cliff geometry) to erosion events. In this study we use a multi-parameter Bayesian network to investigate correlations between key variables that control and influence variations in cliff retreat processes. The network uses Bayesian statistical methods to estimate event probabilities using existing observations. Within this framework, we forecast the spatial distribution of cliff retreat along two stretches of cliffed coast in Southern California. The input parameters are the height and slope of the cliff, a descriptor of material strength based on the dominant cliff-forming lithology, and the long-term cliff erosion rate that represents prior behavior. The model is forced using predicted wave impact hours. Results demonstrate that the Bayesian approach is well-suited to the forward modeling of coastal cliff retreat, with the correct outcomes forecast in 70-90% of the modeled transects. The model also performs well in identifying specific locations of high cliff erosion, thus providing a foundation for hazard mapping. This approach can be employed to predict cliff erosion at time-scales ranging from storm events to the impacts of sea-level rise at the century-scale. © 2010.
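Estimating event probabilities from existing observations, as a Bayesian network node does, can be sketched by counting over discretized states with a Dirichlet (Laplace) prior; the variable names here are hypothetical:

```python
import numpy as np

def fit_cpt(parents, child, n_child_states, alpha=1.0):
    """Conditional probability table P(child | parents) estimated from
    discrete observations by counting with a symmetric Dirichlet(alpha)
    prior, the way a Bayesian network node is trained from data.
    'parents' is a sequence of parent-state tuples, 'child' the outcomes."""
    table = {}
    for p, c in zip(map(tuple, parents), child):
        counts = table.setdefault(p, np.full(n_child_states, alpha))
        counts[c] += 1.0
    return {p: counts / counts.sum() for p, counts in table.items()}
```

For the cliff model, the parent states would be discretized cliff height, slope, lithology class and prior erosion rate, and the child would be the erosion-event class; prediction then reads the table for the observed parent configuration.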
Calibrating Subjective Probabilities Using Hierarchical Bayesian Models
NASA Astrophysics Data System (ADS)
Merkle, Edgar C.
A body of psychological research has examined the correspondence between a judge's subjective probability of an event's outcome and the event's actual outcome. The research generally shows that subjective probabilities are noisy and do not match the "true" probabilities. However, subjective probabilities are still useful for forecasting purposes if they bear some relationship to true probabilities. The purpose of the current research is to exploit relationships between subjective probabilities and outcomes to create improved, model-based probabilities for forecasting. Once the model has been trained in situations where the outcome is known, it can then be used in forecasting situations where the outcome is unknown. These concepts are demonstrated using experimental psychology data, and potential applications are discussed.
DPpackage: Bayesian Semi- and Nonparametric Modeling in R
Jara, Alejandro; Hanson, Timothy E.; Quintana, Fernando A.; Müller, Peter; Rosner, Gary L.
2011-01-01
Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian non- and semi-parametric models in R, DPpackage. Currently DPpackage includes models for marginal and conditional density estimation, ROC curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison, and for eliciting the precision parameter of the Dirichlet process prior. To maximize computational efficiency, the actual sampling for each model is carried out using compiled FORTRAN. PMID:21796263
Estimating anatomical trajectories with Bayesian mixed-effects modeling
Ziegler, G.; Penny, W.D.; Ridgway, G.R.; Ourselin, S.; Friston, K.J.
2015-01-01
We introduce a mass-univariate framework for the analysis of whole-brain structural trajectories using longitudinal Voxel-Based Morphometry data and Bayesian inference. Our approach to developmental and aging longitudinal studies characterizes heterogeneous structural growth/decline between and within groups. In particular, we propose a probabilistic generative model that parameterizes individual and ensemble average changes in brain structure using linear mixed-effects models of age and subject-specific covariates. Model inversion uses Expectation Maximization (EM), while voxelwise (empirical) priors on the size of individual differences are estimated from the data. Bayesian inference on individual and group trajectories is realized using Posterior Probability Maps (PPM). In addition to parameter inference, the framework affords comparisons of models with varying combinations of model order for fixed and random effects using model evidence. We validate the model in simulations and real MRI data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) project. We further demonstrate how subject specific characteristics contribute to individual differences in longitudinal volume changes in healthy subjects, Mild Cognitive Impairment (MCI), and Alzheimer's Disease (AD). PMID:26190405
Bayesian analysis of physiologically based toxicokinetic and toxicodynamic models.
Hack, C Eric
2006-04-17
Physiologically based toxicokinetic (PBTK) and toxicodynamic (TD) models of bromate in animals and humans would improve our ability to accurately estimate the toxic doses in humans based on available animal studies. These mathematical models are often highly parameterized and must be calibrated in order for the model predictions of internal dose to adequately fit the experimentally measured doses. Highly parameterized models are difficult to calibrate, and it is difficult to obtain accurate estimates of uncertainty or variability in model parameters with commonly used frequentist calibration methods, such as maximum likelihood estimation (MLE) or least-squares approaches. The Bayesian technique of Markov chain Monte Carlo (MCMC) analysis can be used to successfully calibrate these complex models. Prior knowledge about the biological system and associated model parameters is easily incorporated in this approach in the form of prior parameter distributions, and the distributions are refined or updated using experimental data to generate posterior distributions of parameter estimates. The goal of this paper is to give the non-mathematician a brief description of the Bayesian approach and Markov chain Monte Carlo analysis, how this technique is used in risk assessment, and the issues associated with this approach.
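The MCMC calibration described above can be illustrated with a random-walk Metropolis sampler for a single parameter. The prior, likelihood, and "measured dose" values below are toy numbers for illustration, not the bromate model's actual quantities.

```python
import math
import random

def metropolis(log_post, x0, steps=5000, scale=0.5, rng=random):
    """Random-walk Metropolis sampler: propose a Gaussian step and
    accept it with probability min(1, posterior ratio)."""
    samples, x = [], x0
    lp = log_post(x)
    for _ in range(steps):
        prop = x + rng.gauss(0.0, scale)
        lp_prop = log_post(prop)
        if math.log(rng.random()) < lp_prop - lp:  # accept/reject
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Toy posterior: broad Gaussian prior N(0, 10^2) times a Gaussian
# likelihood around simulated "measured internal doses"
data = [1.8, 2.1, 2.3, 1.9]
def log_post(theta):
    prior = -theta ** 2 / (2 * 10.0 ** 2)
    like = -sum((d - theta) ** 2 for d in data) / (2 * 0.5 ** 2)
    return prior + like

random.seed(1)
draws = metropolis(log_post, x0=0.0)[1000:]  # drop burn-in
print(sum(draws) / len(draws))               # should settle near 2.0
```

The chain's retained draws approximate the posterior distribution, from which point estimates and uncertainty intervals follow directly.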
Bayesian partial linear model for skewed longitudinal data.
Tang, Yuanyuan; Sinha, Debajyoti; Pati, Debdeep; Lipsitz, Stuart; Lipshultz, Steven
2015-07-01
Unlike the majority of current statistical models and methods, which focus on the mean response for highly skewed longitudinal data, we present a novel model for such data that accommodates a partially linear median regression function, a skewed error distribution, and within-subject association structures. We provide theoretical justifications for our methods, including asymptotic properties of the posterior and the associated semiparametric Bayesian estimators. We also provide simulation studies to investigate the finite-sample properties of our methods. Several advantages of our method compared with existing methods are demonstrated via analysis of a cardiotoxicity study of children of HIV-infected mothers.
Bayesian hierarchical modeling for detecting safety signals in clinical trials.
Xia, H Amy; Ma, Haijun; Carlin, Bradley P
2011-09-01
Detection of safety signals from clinical trial adverse event data is critical in drug development, but carries a challenging statistical multiplicity problem. Bayesian hierarchical mixture modeling is appealing for its ability to borrow strength across subgroups in the data, as well as to moderate extreme findings most likely due merely to chance. We implement such a model for subject incidence (Berry and Berry, 2004) using a binomial likelihood, and extend it to subject-year adjusted incidence rate estimation under a Poisson likelihood. We use simulation to choose a signal detection threshold, and illustrate some effective graphics for displaying the flagged signals.
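The "borrowing strength / moderating extreme findings" behavior can be illustrated with a far simpler conjugate beta-binomial shrinkage of incidence rates. The Beta prior parameters and event counts below are hypothetical, and this is a stand-in for the idea, not the Berry and Berry hierarchical mixture itself.

```python
def shrunk_rates(events, exposures, alpha=2.0, beta=38.0):
    """Posterior-mean incidence under a conjugate Beta(alpha, beta)
    prior: each raw rate is pulled toward the prior mean
    alpha / (alpha + beta), most strongly for small samples."""
    return [(e + alpha) / (n + alpha + beta)
            for e, n in zip(events, exposures)]

events = [0, 1, 9]        # adverse events per arm (hypothetical)
exposures = [20, 20, 20]  # subjects per arm (hypothetical)
raw = [e / n for e, n in zip(events, exposures)]
post = shrunk_rates(events, exposures)
print(raw)
print(post)
```

The extreme raw rate (9/20 = 0.45) is pulled sharply toward the prior mean, while the zero count is pulled up away from an implausible exact zero, which is precisely the moderation the abstract describes.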
NASA Astrophysics Data System (ADS)
Tadeu Pereira, Gener; Ribeiro de Oliveira, Ismênia; De Bortoli Teixeira, Daniel; Arantes Camargo, Livia; Rodrigo Panosso, Alan; Marques, José, Jr.
2015-04-01
Phosphorus is one of the limiting nutrients for sugarcane development in Brazilian soils. The spatial variability of this nutrient is considerable, governed by the properties that control its adsorption and desorption reactions. Spatial estimates to characterize this variability are based on geostatistical interpolation, so assessing the uncertainty of estimates associated with the spatial distribution of available P (Plabile) is decisive for optimizing the use of phosphate fertilizers. The purpose of this study was to evaluate the performance of sequential Gaussian simulation (sGs) and ordinary kriging (OK) in modeling the uncertainty of available P estimates. A sampling grid with 626 points was established in a 200-ha experimental sugarcane field in Tabapuã, São Paulo State, Brazil, with the soil sampled at the crossover points of a regular grid at 50-m intervals. Before the geostatistical modeling, 63 points (approximately 10% of the sampled points) were randomly selected to compose a validation data set, while the remaining 563 points were used to predict the variable at unsampled locations. The sGs generated 200 realizations, from which different measures of estimation and uncertainty were obtained. The standard deviation, calculated point by point across all simulated maps, provided a map of deviation used to assess local uncertainty. Visual analysis of the E-type and OK maps showed that the spatial patterns produced by both methods were similar; however, the characteristic smoothing effect of OK was evident, especially in regions with extreme values. The standardized variograms of selected sGs realizations showed both range and model similar to the variogram of the observed Plabile data. The OK variogram showed a structure distinct from that of the observed data, underestimating the variability over short distances and presenting parabolic behavior near
Brooker, Simon; Clements, Archie C.A.
2009-01-01
Multiple parasite infections are widespread in the developing world and understanding their geographical distribution is important for spatial targeting of differing intervention packages. We investigated the spatial epidemiology of mono- and co-infection with helminth parasites in East Africa and developed a geostatistical model to predict infection risk. The data used for the analysis were taken from standardised school surveys of Schistosoma mansoni and hookworm (Ancylostoma duodenale/Necator americanus) carried out between 1999 and 2005 in East Africa. Prevalence of mono- and co-infection was modelled using satellite-derived environmental and demographic variables as potential predictors. A Bayesian multinomial geostatistical model was developed for each infection category to produce maps of predicted co-infection risk. We show that heterogeneities in co-infection with S. mansoni and hookworm are influenced primarily by the distribution of S. mansoni, rather than the distribution of hookworm, and that temperature, elevation and distance to large water bodies are reliable predictors of the spatial large-scale distribution of co-infection. On the basis of these results, we developed a validated geostatistical model of the distribution of co-infection at a scale that is relevant for planning regional disease control efforts that simultaneously target multiple parasite species. PMID:19073189
A study of finite mixture model: Bayesian approach on financial time series data
NASA Astrophysics Data System (ADS)
Phoong, Seuk-Yen; Ismail, Mohd Tahir
2014-07-01
Recently, statisticians have emphasized fitting finite mixture models using Bayesian methods. A finite mixture model combines several distributions to model a statistical distribution, while the Bayesian method is a statistical approach used to fit the mixture model. The Bayesian method is widely used because its asymptotic properties provide remarkable results; it also exhibits consistency, meaning that the parameter estimates are close to the predictive distributions. In the present paper, the number of components for the mixture model is selected using the Bayesian Information Criterion, since misidentifying the number of components may lead to invalid results. The Bayesian method is then utilized to fit the k-component mixture model in order to explore the relationship between rubber prices and stock market prices for Malaysia, Thailand, the Philippines and Indonesia. The results showed a negative relationship between rubber prices and stock market prices for all selected countries.
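Component selection via the Bayesian Information Criterion amounts to penalizing each candidate model's maximized log-likelihood by its parameter count and sample size, then choosing the minimizer. The fitted log-likelihoods below are hypothetical numbers chosen for illustration, not the paper's results.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L).
    Lower values indicate a better parsimony/fit trade-off."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical maximized log-likelihoods for k = 1..4 component fits;
# a k-component univariate Gaussian mixture has 3k - 1 free parameters
n = 500
fits = {1: -1510.0, 2: -1405.0, 3: -1398.0, 4: -1396.0}
scores = {k: bic(ll, 3 * k - 1, n) for k, ll in fits.items()}
best_k = min(scores, key=scores.get)
print(best_k)  # prints 2
```

With these numbers the large likelihood gain from one to two components outweighs the penalty, while the marginal gains for three and four components do not.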
Predictive RANS simulations via Bayesian Model-Scenario Averaging
Edeling, W.N.; Cinnella, P.; Dwight, R.P.
2014-10-15
The turbulence closure model is the dominant source of error in most Reynolds-Averaged Navier–Stokes simulations, yet no reliable estimators for this error component currently exist. Here we develop a stochastic, a posteriori error estimate, calibrated to specific classes of flow. It is based on variability in model closure coefficients across multiple flow scenarios, for multiple closure models. The variability is estimated using Bayesian calibration against experimental data for each scenario, and Bayesian Model-Scenario Averaging (BMSA) is used to collate the resulting posteriors, to obtain a stochastic estimate of a Quantity of Interest (QoI) in an unmeasured (prediction) scenario. The scenario probabilities in BMSA are chosen using a sensor which automatically weights those scenarios in the calibration set which are similar to the prediction scenario. The methodology is applied to the class of turbulent boundary-layers subject to various pressure gradients. For all considered prediction scenarios the standard deviation of the stochastic estimate is consistent with the measurement ground truth. Furthermore, the mean of the estimate is more consistently accurate than the individual model predictions.
Quantum-Like Bayesian Networks for Modeling Decision Making
Moreira, Catarina; Wichert, Andreas
2016-01-01
In this work, we explore an alternative quantum structure to perform quantum probabilistic inferences that accommodates the paradoxical findings of the Sure Thing Principle. We propose a Quantum-Like Bayesian Network, which replaces classical probabilities with quantum probability amplitudes. However, since this approach suffers from exponential growth in the number of quantum parameters, we also propose a similarity heuristic that automatically fits quantum parameters through vector similarities. This makes the proposed model general and predictive, in contrast to current state-of-the-art models, which cannot be generalized to more complex decision scenarios and only provide an explanatory account of the observed paradoxes. In the end, the model we propose is a nonparametric method for estimating inference effects from a statistical point of view. It is a statistical model that is simpler than the quantum dynamic and quantum-like models previously proposed in the literature. We tested the proposed network with several empirical data sets from the literature, mainly from the Prisoner's Dilemma game and the Two Stage Gambling game. The results show that the proposed quantum Bayesian Network is a general method that can accommodate violations of the laws of classical probability theory and make accurate predictions regarding human decision-making in these scenarios. PMID:26858669
Assessing uncertainty in a stand growth model by Bayesian synthesis
Green, E.J.; MacFarlane, D.W.; Valentine, H.T.; Strawderman, W.E.
1999-11-01
The Bayesian synthesis method (BSYN) was used to bound the uncertainty in projections calculated with PIPESTEM, a mechanistic model of forest growth. The application furnished posterior distributions of (a) the values of the model's parameters, and (b) the values of three of the model's output variables (basal area per unit land area, average tree height, and tree density) at different points in time. Confidence or credible intervals for the output variables were obtained directly from the posterior distributions. The application also provides estimates of correlation among the parameters and output variables. BSYN, which originally was applied to a population dynamics model for bowhead whales, is generally applicable to deterministic models. Extension to two or more linked models is discussed. A simple worked example is included in an appendix.
Assessing global vegetation activity using spatio-temporal Bayesian modelling
NASA Astrophysics Data System (ADS)
Mulder, Vera L.; van Eck, Christel M.; Friedlingstein, Pierre; Regnier, Pierre A. G.
2016-04-01
This work demonstrates the potential of modelling vegetation activity using a hierarchical Bayesian spatio-temporal model. This approach allows changes in vegetation and climate to be modelled simultaneously in space and time. Changes in vegetation activity such as phenology are modelled as a dynamic process depending on climate variability in both space and time. Additionally, differences in observed vegetation status can be attributed to other abiotic ecosystem properties, e.g. soil and terrain properties. Although these properties do not change in time, they do change in space and may provide valuable information in addition to the climate dynamics. The spatio-temporal Bayesian models were calibrated at a regional scale because local trends in space and time can be better captured by the model. The regional subsets were defined according to the SREX segmentation defined by the IPCC. Each region is considered relatively homogeneous in terms of large-scale climate and biomes, while still capturing small-scale (grid-cell level) variability. Modelling within these regions is hence expected to be less uncertain, due to the absence of these large-scale patterns, than a global approach. This overall modelling approach allows the comparison of model behaviour across the different regions and may provide insights into the main dynamic processes driving the interaction between vegetation and climate within different regions. The data employed in this study encompass global datasets for soil properties (SoilGrids), terrain properties (Global Relief Model based on the SRTM DEM and ETOPO), monthly time series of satellite-derived vegetation indices (GIMMS NDVI3g) and climate variables (Princeton Meteorological Forcing Dataset). The findings demonstrate the potential of a spatio-temporal Bayesian modelling approach for assessing vegetation dynamics at a regional scale. The observed interrelationships of the employed data and the different spatial and temporal trends support
A Bayesian approach to biokinetic models of internally- deposited radionuclides
NASA Astrophysics Data System (ADS)
Amer, Mamun F.
Bayesian methods were developed and applied to estimate parameters of biokinetic models of internally deposited radionuclides for the first time. Marginal posterior densities for the parameters, given the available data, were obtained and graphed. These densities contain all the information available about the parameters and fully describe their uncertainties. Two different numerical integration methods were employed to approximate the multi-dimensional integrals needed to obtain these densities and to verify our results. One numerical method was based on Gaussian quadrature. The other method was a lattice rule that was developed by Conroy. The lattice rule method is applied here for the first time in conjunction with Bayesian analysis. Computer codes were developed in Mathematica's own programming language to perform the integrals. Several biokinetic models were studied. The first model was a single power function, At^(-b), that was used to describe 226Ra whole body retention data for long periods of time in many patients. The posterior odds criterion for model identification was applied to select, from among some competing models, the best model to represent 226Ra retention in man. The highest model posterior was attained by the single power function. Posterior densities for the model parameters were obtained for each patient. Also, predictive densities for retention, given the available retention values and some selected times, were obtained. These predictive densities characterize the uncertainties in the unobservable retention values taking into consideration the uncertainties of other parameters in the model. The second model was a single exponential function, αe^(-βt), that was used to represent one patient's whole body retention as well as total excretion of 137Cs. Missing observations (censored data) in the two responses were replaced by unknown parameters and were handled in the same way other model parameters are treated. By applying the Bayesian
Approximate Bayesian computation for forward modeling in cosmology
NASA Astrophysics Data System (ADS)
Akeret, Joël; Refregier, Alexandre; Amara, Adam; Seehars, Sebastian; Hasner, Caspar
2015-08-01
Bayesian inference is often used in cosmology and astrophysics to derive constraints on model parameters from observations. This approach relies on the ability to compute the likelihood of the data given a choice of model parameters. In many practical situations, the likelihood function may however be unavailable or intractable due to non-gaussian errors, non-linear measurements processes, or complex data formats such as catalogs and maps. In these cases, the simulation of mock data sets can often be made through forward modeling. We discuss how Approximate Bayesian Computation (ABC) can be used in these cases to derive an approximation to the posterior constraints using simulated data sets. This technique relies on the sampling of the parameter set, a distance metric to quantify the difference between the observation and the simulations and summary statistics to compress the information in the data. We first review the principles of ABC and discuss its implementation using a Population Monte-Carlo (PMC) algorithm and the Mahalanobis distance metric. We test the performance of the implementation using a Gaussian toy model. We then apply the ABC technique to the practical case of the calibration of image simulations for wide field cosmological surveys. We find that the ABC analysis is able to provide reliable parameter constraints for this problem and is therefore a promising technique for other applications in cosmology and astrophysics. Our implementation of the ABC PMC method is made available via a public code release.
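The rejection flavor of ABC can be sketched directly: draw parameters from the prior, simulate data, and keep only the draws whose summary statistic lies close to the observed one. This toy Gaussian example mirrors the paper's toy-model test in spirit only; the prior, tolerance, and sample sizes are illustrative choices, not the paper's PMC implementation.

```python
import random
import statistics

def abc_rejection(observed_stat, simulate, prior_draw,
                  n_draws=20000, tol=0.1, rng=random):
    """Rejection ABC: keep prior draws whose simulated summary
    statistic lies within `tol` of the observed one."""
    kept = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        if abs(simulate(theta, rng) - observed_stat) < tol:
            kept.append(theta)
    return kept

random.seed(2)
# Toy model: the summary statistic is the mean of 50 draws from
# N(theta, 1); the "observation" has a sample mean of 3.0
def simulate(theta, rng):
    return sum(rng.gauss(theta, 1.0) for _ in range(50)) / 50

posterior = abc_rejection(observed_stat=3.0,
                          simulate=simulate,
                          prior_draw=lambda rng: rng.uniform(0.0, 6.0))
print(statistics.mean(posterior))  # concentrates near 3.0
```

More refined schemes like the Population Monte Carlo variant discussed in the abstract reuse accepted draws to propose new ones under a shrinking tolerance, which greatly improves acceptance rates over plain rejection.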
Bayesian Gaussian Copula Factor Models for Mixed Data
Murray, Jared S.; Dunson, David B.; Carin, Lawrence; Lucas, Joseph E.
2013-01-01
Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models accommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa. PMID:23990691
Bayesian Models for fMRI Data Analysis
Zhang, Linlin; Guindani, Michele; Vannucci, Marina
2015-01-01
Functional magnetic resonance imaging (fMRI), a noninvasive neuroimaging method that provides an indirect measure of neuronal activity by detecting blood flow changes, has experienced an explosive growth in the past years. Statistical methods play a crucial role in understanding and analyzing fMRI data. Bayesian approaches, in particular, have shown great promise in applications. A remarkable feature of fully Bayesian approaches is that they allow a flexible modeling of spatial and temporal correlations in the data. This paper provides a review of the most relevant models developed in recent years. We divide methods according to the objective of the analysis. We start from spatio-temporal models for fMRI data that detect task-related activation patterns. We then address the very important problem of estimating brain connectivity. We also touch upon methods that focus on making predictions of an individual's brain activity or a clinical or behavioral response. We conclude with a discussion of recent integrative models that aim at combining fMRI data with other imaging modalities, such as EEG/MEG and DTI data, measured on the same subjects. We also briefly discuss the emerging field of imaging genetics. PMID:25750690
Bayesian Sensitivity Analysis of Statistical Models with Missing Data
Zhu, Hongtu; Ibrahim, Joseph G.; Tang, Niansheng
2013-01-01
Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the non-ignorable missing at random (NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures. PMID:24753718
ERIC Educational Resources Information Center
Wu, Haiyan
2013-01-01
General diagnostic models (GDMs) and Bayesian networks are mathematical frameworks that cover a wide variety of psychometric models. Both extend latent class models, and while GDMs also extend item response theory (IRT) models, Bayesian networks can be parameterized using discretized IRT. The purpose of this study is to examine similarities and…
Model Selection in Historical Research Using Approximate Bayesian Computation
Rubio-Campillo, Xavier
2016-01-01
Formal Models and History

Computational models are increasingly being used to study historical dynamics. This new trend, which could be named Model-Based History, makes use of recently published datasets and innovative quantitative methods to improve our understanding of past societies based on their written sources. The extensive use of formal models allows historians to re-evaluate hypotheses formulated decades ago that are still subject to debate due to the lack of an adequate quantitative framework. The initiative has the potential to transform the discipline if it solves the challenges posed by the study of historical dynamics. These difficulties stem from the complexities of modelling social interaction and the methodological issues raised by the evaluation of formal models against data with low sample size, high variance and strong fragmentation.

Case Study

This work examines an alternative approach to this evaluation based on a Bayesian-inspired model selection method. The validity of the classical Lanchester's laws of combat is examined against a dataset comprising over a thousand battles spanning 300 years. Four variations of the basic equations are discussed, including the three most common formulations (linear, squared, and logarithmic) and a new variant introducing fatigue. Approximate Bayesian Computation is then used to infer parameter values and to perform model selection via Bayes factors.

Impact

Results indicate decisive evidence favouring the new fatigue model. The interpretation of both parameter estimation and model selection provides new insights into the factors guiding the evolution of warfare. At a methodological level, the case study shows how model selection methods can be used to guide historical research through the comparison between existing hypotheses and empirical evidence. PMID:26730953
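The model-selection idea in this abstract can be sketched with a minimal ABC rejection scheme. The code below is illustrative only: it integrates toy square-law and linear-law Lanchester attrition equations, generates synthetic "observed" survivors (not the paper's battle dataset), and approximates a Bayes factor by the ratio of ABC acceptance rates. All priors, force sizes, and tolerances are assumed for the example.

```python
import math, random

def square_law(a, b, x0, y0, dt=0.01, steps=2000):
    # Lanchester square law: each side's attrition is proportional to the
    # size of the opposing force.
    x, y = x0, y0
    for _ in range(steps):
        x, y = x - dt * b * y, y - dt * a * x
        if x <= 0 or y <= 0:
            break
    return max(x, 0.0), max(y, 0.0)

def linear_law(a, b, x0, y0, dt=0.01, steps=2000):
    # Lanchester linear law: attrition proportional to the product of both forces.
    x, y = x0, y0
    for _ in range(steps):
        x, y = x - dt * b * x * y, y - dt * a * x * y
        if x <= 0 or y <= 0:
            break
    return max(x, 0.0), max(y, 0.0)

def abc_acceptance(model, observed, n_draws=3000, eps=10.0, seed=1):
    # Draw effectiveness coefficients from a vague uniform prior, simulate,
    # and accept a draw when simulated survivors are close to the data.
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_draws):
        a, b = rng.uniform(0.0, 0.1), rng.uniform(0.0, 0.1)
        sim = model(a, b, 100.0, 80.0)
        if math.hypot(sim[0] - observed[0], sim[1] - observed[1]) < eps:
            accepted += 1
    return accepted / n_draws

# "Observed" survivors, generated here from the square law for illustration
obs = square_law(0.03, 0.02, 100.0, 80.0)
p_sq = abc_acceptance(square_law, obs)
p_li = abc_acceptance(linear_law, obs)
bayes_factor = p_sq / max(p_li, 1e-12)  # ratio of acceptance rates ~ Bayes factor
```

With data actually generated by the square law, the acceptance rate for the square-law model should dominate, mirroring how the paper's Bayes factors discriminate between formulations.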
Uncovering Transcriptional Regulatory Networks by Sparse Bayesian Factor Model
NASA Astrophysics Data System (ADS)
Meng, Jia; Zhang, Jianqiu(Michelle); Qi, Yuan(Alan); Chen, Yidong; Huang, Yufei
2010-12-01
The problem of uncovering transcriptional regulation by transcription factors (TFs) based on microarray data is considered. A novel Bayesian sparse correlated rectified factor model (BSCRFM) is proposed that models the unknown TF protein-level activity, the correlated regulations between TFs, and the sparse nature of TF-regulated genes. The model admits prior knowledge from existing databases regarding TF-regulated target genes through a sparse prior, and a context-specific transcriptional regulatory network specific to the experimental condition of the microarray data can be obtained through a developed Gibbs sampling algorithm. The proposed model and the Gibbs sampling algorithm were evaluated on simulated systems, and the results demonstrated the validity and effectiveness of the proposed approach. The proposed model was then applied to breast cancer microarray data from patients with Estrogen Receptor positive (ER+) status and Estrogen Receptor negative (ER−) status, respectively.
Bayesian Inference for Generalized Linear Models for Spiking Neurons
Gerwinn, Sebastian; Macke, Jakob H.; Bethge, Matthias
2010-01-01
Generalized Linear Models (GLMs) are commonly used statistical methods for modelling the relationship between neural population activity and presented stimuli. When the dimension of the parameter space is large, strong regularization has to be used in order to fit GLMs to datasets of realistic size without overfitting. By imposing properly chosen priors over parameters, Bayesian inference provides an effective and principled approach for achieving regularization. Here we show how the posterior distribution over model parameters of GLMs can be approximated by a Gaussian using the Expectation Propagation algorithm. In this way, we obtain an estimate of the posterior mean and posterior covariance, allowing us to calculate Bayesian confidence intervals that characterize the uncertainty about the optimal solution. From the posterior we also obtain a different point estimate, namely the posterior mean as opposed to the commonly used maximum a posteriori estimate. We systematically compare the different inference techniques on simulated as well as on multi-electrode recordings of retinal ganglion cells, and explore the effects of the chosen prior and the performance measure used. We find that good performance can be achieved by choosing a Laplace prior together with the posterior mean estimate. PMID:20577627
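The distinction the abstract draws between the MAP estimate and the posterior mean can be seen in a toy one-parameter Poisson GLM with a Laplace prior. This is only a sketch: instead of Expectation Propagation it uses a dense grid approximation to the 1-D posterior, and the data values are invented for illustration.

```python
import math

# Toy data: spike counts ys observed at stimulus intensities xs (hypothetical)
xs = [0.1, 0.5, 1.0, 1.5, 2.0]
ys = [1, 1, 3, 4, 8]

def log_posterior(w, scale=1.0):
    # Poisson GLM log-likelihood with rate exp(w * x) (constant terms dropped),
    # plus a Laplace prior on w with the given scale.
    ll = sum(y * (w * x) - math.exp(w * x) for x, y in zip(xs, ys))
    return ll - abs(w) / scale

# Dense grid approximation to the one-dimensional posterior over w
grid = [i / 1000.0 for i in range(-2000, 4001)]   # w in [-2, 4]
logs = [log_posterior(w) for w in grid]
m = max(logs)
weights = [math.exp(lp - m) for lp in logs]        # stabilized exponentiation
z = sum(weights)
post = [wt / z for wt in weights]

w_map = grid[logs.index(m)]                        # maximum a posteriori
w_mean = sum(w * p for w, p in zip(grid, post))    # posterior mean
```

For a near-Gaussian posterior the two estimates are close, but with the non-smooth Laplace prior they generally differ, which is the choice the paper evaluates empirically.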
GY SAMPLING THEORY AND GEOSTATISTICS: ALTERNATE MODELS OF VARIABILITY IN CONTINUOUS MEDIA
In the sampling theory developed by Pierre Gy, sample variability is modeled as the sum of a set of seven discrete error components. The variogram used in geostatistics provides an alternate model in which several of Gy's error components are combined in a continuous mode...
A kinematic model for Bayesian tracking of cyclic human motion
NASA Astrophysics Data System (ADS)
Greif, Thomas; Lienhart, Rainer
2010-01-01
We introduce a two-dimensional kinematic model for cyclic motions of humans, which is suitable for use as a temporal prior in any Bayesian tracking framework. This human motion model is based solely on simple kinematic properties: the joint accelerations. Distributions of joint accelerations subject to the cycle progress are learned from training data. We present results obtained by applying the introduced model to the cyclic motion of backstroke swimming in a Kalman filter framework that represents the posterior distribution by a Gaussian. We experimentally evaluate the sensitivity of the motion model with respect to the frequency and noise level of assumed appearance-based pose measurements by simulating various fidelities of the pose measurements using ground truth data.
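The Kalman-filter setting described above can be sketched for a single joint angle. This is a strong simplification of the paper's method: the learned, cycle-dependent acceleration distributions are reduced here to constant process noise on the velocity component, and the cyclic "pose measurements" are synthetic sinusoidal ground truth plus noise.

```python
import math, random

def kalman_track(measurements, dt=1.0, accel_sd=0.1, meas_sd=0.5):
    # State [angle, angular velocity]; accelerations enter as process noise.
    x = [measurements[0], 0.0]                 # state mean
    P = [[1.0, 0.0], [0.0, 1.0]]               # state covariance
    q = accel_sd ** 2
    estimates = []
    for z in measurements:
        # Predict with constant-velocity dynamics (white-noise acceleration)
        x = [x[0] + dt * x[1], x[1]]
        P = [[P[0][0] + dt*(P[0][1] + P[1][0]) + dt*dt*P[1][1] + 0.25*dt**4*q,
              P[0][1] + dt*P[1][1] + 0.5*dt**3*q],
             [P[1][0] + dt*P[1][1] + 0.5*dt**3*q,
              P[1][1] + dt*dt*q]]
        # Update with the noisy angle measurement (H = [1, 0])
        s = P[0][0] + meas_sd ** 2
        k0, k1 = P[0][0] / s, P[1][0] / s
        resid = z - x[0]
        x = [x[0] + k0 * resid, x[1] + k1 * resid]
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        estimates.append(x[0])
    return estimates

random.seed(0)
truth = [math.sin(0.2 * t) for t in range(100)]           # cyclic motion
noisy = [v + random.gauss(0.0, 0.5) for v in truth]       # pose measurements
est = kalman_track(noisy)
err_raw = sum((n - t) ** 2 for n, t in zip(noisy, truth)) / len(truth)
err_kf = sum((e - t) ** 2 for e, t in zip(est, truth)) / len(truth)
```

Even this crude motion prior reduces the mean squared error relative to the raw measurements, which is the kind of sensitivity-to-noise behaviour the paper evaluates.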
Bayesian inference with an adaptive proposal density for GARCH models
NASA Astrophysics Data System (ADS)
Takaishi, Tetsuya
2010-04-01
We perform Bayesian inference of a GARCH model by the Metropolis-Hastings algorithm with an adaptive proposal density. The adaptive proposal density is assumed to be the Student's t-distribution, and the distribution parameters are evaluated using the data sampled during the simulation. We apply the method to the QGARCH model, one of the asymmetric GARCH models, and present empirical studies of the Nikkei 225, DAX and Hang Seng indices. We find that autocorrelation times from our method are very small; thus the method is very efficient for generating uncorrelated Monte Carlo data. The results from the QGARCH model show that all three indices exhibit the leverage effect, i.e. the volatility is high after negative observations.
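A stripped-down version of Metropolis-Hastings for a GARCH(1,1) model is sketched below. Note the hedges: the paper's adaptive Student's-t proposal is replaced by a plain Gaussian random walk, a flat prior on the valid parameter region is assumed, and the returns are simulated rather than taken from the Nikkei 225, DAX, or Hang Seng.

```python
import math, random

def garch_loglik(params, returns):
    # Gaussian GARCH(1,1) log-likelihood; invalid parameter regions rejected.
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return -math.inf
    h = omega / (1.0 - alpha - beta)       # start at unconditional variance
    ll = 0.0
    for r in returns:
        ll += -0.5 * (math.log(2 * math.pi * h) + r * r / h)
        h = omega + alpha * r * r + beta * h
    return ll

def metropolis(returns, n_iter=3000, step=0.02, seed=2):
    # Random-walk Metropolis-Hastings over (omega, alpha, beta).
    rng = random.Random(seed)
    theta = (0.1, 0.1, 0.7)
    ll = garch_loglik(theta, returns)
    chain = []
    for _ in range(n_iter):
        prop = tuple(t + rng.gauss(0.0, step) for t in theta)
        ll_prop = garch_loglik(prop, returns)
        if math.log(rng.random()) < ll_prop - ll:   # flat prior on valid region
            theta, ll = prop, ll_prop
        chain.append(theta)
    return chain

# Simulate returns from a known GARCH(1,1) process, then sample the posterior
sim_rng = random.Random(0)
omega, alpha, beta = 0.05, 0.1, 0.8
h, returns = omega / (1 - alpha - beta), []
for _ in range(500):
    r = sim_rng.gauss(0.0, math.sqrt(h))
    returns.append(r)
    h = omega + alpha * r * r + beta * h
chain = metropolis(returns)
alpha_mean = sum(t[1] for t in chain[1000:]) / len(chain[1000:])
```

The random-walk proposal mixes far more slowly than the paper's adaptive proposal; reducing autocorrelation time is precisely the motivation for tuning the proposal density to the sampled posterior.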
Bayesian Dose-Response Modeling in Sparse Data
NASA Astrophysics Data System (ADS)
Kim, Steven B.
This book discusses Bayesian dose-response modeling in small samples, applied to two different settings. The first setting is early phase clinical trials, and the second setting is toxicology studies in cancer risk assessment. In early phase clinical trials, experimental units are humans who are actual patients. Prior to a clinical trial, opinions from multiple subject area experts are generally more informative than the opinion of a single expert, but we may face a dilemma when they have disagreeing prior opinions. In this regard, we consider reconciling the disagreement and compare two different approaches for making a decision. In addition to combining multiple opinions, we also address balancing two levels of ethics in early phase clinical trials. The first level is individual-level ethics, which reflects the perspective of trial participants. The second level is population-level ethics, which reflects the perspective of future patients. We extensively compare two existing statistical methods which focus on each perspective and propose a new method which balances the two conflicting perspectives. In toxicology studies, experimental units are living animals. Here we focus on a potential non-monotonic dose-response relationship known as hormesis. Briefly, hormesis is a phenomenon characterized by a beneficial effect at low doses and a harmful effect at high doses. In cancer risk assessment, the estimation of a parameter known as a benchmark dose can be highly sensitive to a class of assumptions, monotonicity or hormesis. In this regard, we propose a robust approach which considers both monotonicity and hormesis as possibilities. In addition, we discuss statistical hypothesis testing for hormesis and consider various experimental designs for detecting hormesis based on Bayesian decision theory. Past experiments have not been optimally designed for testing for hormesis, and some Bayesian optimal designs may not be optimal under a
Parameter Estimation and Parameterization Uncertainty Using Bayesian Model Averaging
NASA Astrophysics Data System (ADS)
Tsai, F. T.; Li, X.
2007-12-01
This study proposes Bayesian model averaging (BMA) to address parameter estimation uncertainty arising from non-uniqueness in parameterization methods. BMA provides a means of incorporating multiple parameterization methods for prediction through the law of total probability, with which an ensemble average of the hydraulic conductivity distribution is obtained. Estimation uncertainty is described by the BMA variances, which contain variances within and between parameterization methods. BMA shows that considering more parameterization methods tends to increase estimation uncertainty, and that estimation uncertainty is always underestimated when a single parameterization method is used. Two major problems in applying BMA to hydraulic conductivity estimation using a groundwater inverse method are discussed in this study. The first problem is the use of posterior probabilities in BMA, which tends to single out one best method and discard other good methods. This problem arises from Occam's window, which only accepts models in a very narrow range. We propose a variance window to replace Occam's window to cope with this problem. The second problem is the use of the Kashyap information criterion (KIC), which makes BMA tend to prefer highly uncertain parameterization methods because it incorporates the Fisher information matrix. We found that the Bayesian information criterion (BIC) is a good approximation to KIC and is able to avoid controversial results. We applied BMA to hydraulic conductivity estimation in the 1,500-foot sand aquifer in East Baton Rouge Parish, Louisiana.
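The BMA machinery the abstract describes — BIC-based model weights and a variance that decomposes into within-model and between-model parts — reduces to a few lines. The BIC scores, predictions, and variances below are hypothetical numbers for illustration, not results from the Baton Rouge aquifer study.

```python
import math

def bma_weights(bic_values):
    # Posterior model probabilities from BIC: w_k ∝ exp(-BIC_k / 2),
    # assuming equal prior probability for each parameterization method.
    best = min(bic_values)
    raw = [math.exp(-(b - best) / 2.0) for b in bic_values]
    total = sum(raw)
    return [r / total for r in raw]

def bma_mean_and_variance(means, variances, weights):
    # Law of total probability: BMA mean, and total variance as the sum of
    # within-model variance and between-model variance.
    mean = sum(w * m for w, m in zip(weights, means))
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (m - mean) ** 2 for w, m in zip(weights, means))
    return mean, within + between

# Hypothetical BIC scores and K predictions from three parameterization methods
weights = bma_weights([102.3, 104.1, 110.7])
mean, var = bma_mean_and_variance([2.0, 2.4, 1.5], [0.10, 0.15, 0.20], weights)
```

The between-model term is exactly what a single-parameterization analysis drops, which is why the abstract says single-method uncertainty is always underestimated.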
Advanced REACH Tool: a Bayesian model for occupational exposure assessment.
McNally, Kevin; Warren, Nicholas; Fransman, Wouter; Entink, Rinke Klein; Schinkel, Jody; van Tongeren, Martie; Cherrie, John W; Kromhout, Hans; Schneider, Thomas; Tielemans, Erik
2014-06-01
This paper describes a Bayesian model for the assessment of inhalation exposures in an occupational setting; the methodology underpins a freely available web-based application for exposure assessment, the Advanced REACH Tool (ART). The ART is a higher tier exposure tool that combines disparate sources of information within a Bayesian statistical framework. The information is obtained from expert knowledge expressed in a calibrated mechanistic model of exposure assessment, data on inter- and intra-individual variability in exposures from the literature, and context-specific exposure measurements. The ART provides central estimates and credible intervals for different percentiles of the exposure distribution, for full-shift and long-term average exposures. The ART can produce exposure estimates in the absence of measurements, but the precision of the estimates improves as more data become available. The methodology presented in this paper is able to utilize partially analogous data, a novel approach designed to make efficient use of a sparsely populated measurement database although some additional research is still required before practical implementation. The methodology is demonstrated using two worked examples: an exposure to copper pyrithione in the spraying of antifouling paints and an exposure to ethyl acetate in shoe repair. PMID:24665110
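One core behaviour described above — estimates available with no measurements, credible intervals narrowing as data arrive — can be sketched with a conjugate normal update on log-transformed exposures. This is only a toy: the actual ART combines a calibrated mechanistic model, variance components, and partially analogous data; the prior and measurement values here are hypothetical.

```python
import math

def posterior_logexposure(prior_mean, prior_sd, log_obs, obs_sd=0.5):
    # Conjugate normal update for the mean of log exposure: the mechanistic
    # model supplies the prior; measurements (known obs_sd assumed) refine it.
    prior_prec = 1.0 / prior_sd ** 2
    obs_prec = 1.0 / obs_sd ** 2
    post_prec = prior_prec + obs_prec * len(log_obs)
    post_mean = (prior_prec * prior_mean + obs_prec * sum(log_obs)) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

# Prior from a mechanistic model (hypothetical), in ln(mg/m^3)
m0, s0 = math.log(0.5), 1.0
# No measurements: the prior is the exposure estimate
mean0, sd0 = posterior_logexposure(m0, s0, [])
# Three measurements narrow the credible interval around the estimate
mean3, sd3 = posterior_logexposure(
    m0, s0, [math.log(0.8), math.log(0.6), math.log(1.1)])
```

The posterior standard deviation shrinks monotonically with each added measurement, mirroring the ART's improving precision as the measurement database fills in.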
Optimal inference with suboptimal models: Addiction and active Bayesian inference
Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl
2015-01-01
When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321
Mapping soil organic carbon stocks by robust geostatistical and boosted regression models
NASA Astrophysics Data System (ADS)
Nussbaum, Madlene; Papritz, Andreas; Baltensweiler, Andri; Walthert, Lorenz
2013-04-01
Carbon (C) sequestration in forests offsets greenhouse gas emissions. Therefore, quantifying C stocks and fluxes in forest ecosystems is of interest for greenhouse gas reporting according to the Kyoto protocol. In Switzerland, the National Forest Inventory offers comprehensive data to quantify the aboveground forest biomass and its change in time. Estimating stocks of soil organic C (SOC) in forests is more difficult because the variables needed to quantify stocks vary strongly in space and precise quantification of some of them is very costly. Based on data from 1,033 plots we modeled SOC stocks of the organic layer and the mineral soil to depths of 30 cm and 100 cm for the Swiss forested area. For the statistical modeling a broad range of covariates were available: climate data (e.g. precipitation, temperature), two elevation models (resolutions 25 and 2 m) with respective terrain attributes, and spectral reflectance data representing vegetation. Furthermore, the main mapping units of an overview soil map and a coarse-scale geological map were used to coarsely represent the parent material of the soils. The selection of important covariates for SOC stocks modeling out of a large set was a major challenge for the statistical modeling. We used two approaches to deal with this problem: 1) A robust restricted maximum likelihood method to fit a linear regression model with spatially correlated errors. The large number of covariates was first reduced by LASSO (Least Absolute Shrinkage and Selection Operator) and then further narrowed down to a parsimonious set of important covariates by cross-validation of the robustly fitted model. To account for nonlinear dependencies of the response on the covariates, interaction terms of the latter were included in the model if this improved the fit. 2) A boosted structured regression model with componentwise linear least squares or componentwise smoothing splines as base procedures. The selection of important covariates was done by the
NASA Astrophysics Data System (ADS)
Mendes, B. S.; Draper, D.
2008-12-01
The issue of model uncertainty and model choice is central in any groundwater modeling effort [Neuman and Wierenga, 2003]; among the several approaches to the problem we favour using Bayesian statistics because it is a method that integrates in a natural way uncertainties (arising from any source) and experimental data. In this work, we experiment with several Bayesian approaches to model choice, focusing primarily on demonstrating the usefulness of the Reversible Jump Markov Chain Monte Carlo (RJMCMC) simulation method [Green, 1995]; this is an extension of the now-common MCMC methods. Standard MCMC techniques approximate posterior distributions for quantities of interest, often by creating a random walk in parameter space; RJMCMC allows the random walk to take place between parameter spaces with different dimensionalities. This fact allows us to explore state spaces that are associated with different deterministic models for experimental data. Our work is exploratory in nature; we restrict our study to comparing two simple transport models applied to a data set gathered to estimate the breakthrough curve for a tracer compound in groundwater. One model has a mean surface based on a simple advection-dispersion differential equation; the second model's mean surface is also governed by a differential equation, but in two dimensions. We focus on artificial data sets (in which truth is known) to see if model identification is done correctly, but we also address the issues of over- and under-parameterization, and we compare RJMCMC's performance with other traditional methods for model selection and propagation of model uncertainty, including Bayesian model averaging, BIC and DIC. References: Neuman and Wierenga (2003). A Comprehensive Strategy of Hydrogeologic Modeling and Uncertainty Analysis for Nuclear Facilities and Sites. NUREG/CR-6805, Division of Systems Analysis and Regulatory Effectiveness, Office of Nuclear Regulatory Research, U.S. Nuclear Regulatory Commission
Perceptual decision making: drift-diffusion model is equivalent to a Bayesian model
Bitzer, Sebastian; Park, Hame; Blankenburg, Felix; Kiebel, Stefan J.
2014-01-01
Behavioral data obtained with perceptual decision making experiments are typically analyzed with the drift-diffusion model. This parsimonious model accumulates noisy pieces of evidence toward a decision bound to explain the accuracy and reaction times of subjects. Recently, Bayesian models have been proposed to explain how the brain extracts information from noisy input as typically presented in perceptual decision making tasks. It has long been known that the drift-diffusion model is tightly linked with such functional Bayesian models but the precise relationship of the two mechanisms was never made explicit. Using a Bayesian model, we derived the equations which relate parameter values between these models. In practice we show that this equivalence is useful when fitting multi-subject data. We further show that the Bayesian model suggests different decision variables which all predict equal responses and discuss how these may be discriminated based on neural correlates of accumulated evidence. In addition, we discuss extensions to the Bayesian model which would be difficult to derive for the drift-diffusion model. We suggest that these and other extensions may be highly useful for deriving new experiments which test novel hypotheses. PMID:24616689
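The drift-diffusion model that this abstract relates to Bayesian evidence accumulation is easy to simulate directly. The sketch below generates accuracy and reaction-time summaries from the basic two-boundary process; drift rate, bound, and time step are arbitrary illustrative values, and the model-equivalence derivation in the paper is not reproduced here.

```python
import random

def simulate_ddm(drift, bound, dt=0.001, sd=1.0, max_t=5.0, rng=None):
    # One drift-diffusion trial: noisy evidence accumulates from 0 until it
    # crosses +bound (correct) or -bound (error), or time runs out.
    rng = rng or random
    x, t = 0.0, 0.0
    while abs(x) < bound and t < max_t:
        x += drift * dt + rng.gauss(0.0, sd * dt ** 0.5)
        t += dt
    return (x >= bound), t   # (correct choice, reaction time)

rng = random.Random(3)
trials = [simulate_ddm(1.0, 1.0, rng=rng) for _ in range(500)]
accuracy = sum(c for c, _ in trials) / len(trials)
mean_rt = sum(t for _, t in trials) / len(trials)
```

A Bayesian reading of the same process treats the accumulated evidence as a log posterior odds ratio, which is the equivalence the paper makes explicit.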
Modeling the Climatology of Tornado Occurrence with Bayesian Inference
NASA Astrophysics Data System (ADS)
Cheng, Vincent Y. S.
Our mechanistic understanding of tornadic environments has improved significantly with recent technological enhancements in the detection of tornadoes and advances in numerical weather prediction modeling. Nonetheless, despite decades of active research, prediction of tornado occurrence remains one of the most difficult problems in meteorological and climate science. In our efforts to develop predictive tools for tornado occurrence, there are a number of issues to overcome, such as the treatment of inconsistent tornado records, the consideration of suitable combinations of atmospheric predictors, and the selection of an appropriate resolution to accommodate the variability in time and space. In this dissertation, I address each of these topics by undertaking three empirical (statistical) modeling studies, where I examine the signature of different atmospheric factors influencing tornado occurrence, the sampling biases in tornado observations, and the optimal spatiotemporal resolution for studying tornado occurrence. In the first study, I develop a novel Bayesian statistical framework to assess the probability of tornado occurrence in Canada, in which the sampling bias of tornado observations and the linkage between lightning climatology and tornadogenesis are considered. The results produced reasonable probability estimates of tornado occurrence for the under-sampled areas in the model domain. The same study also delineated the geographical variability in the lightning-tornado relationship across Canada. In the second study, I present a novel modeling framework to examine the relative importance of several key atmospheric variables (e.g., convective available potential energy, 0-3 km storm-relative helicity, 0-6 km bulk wind difference, 0-tropopause vertical wind shear) on tornado activity in North America. I found that the variable quantifying the updraft strength is more important during the warm season, whereas the effects of wind
Lemouzy, P.
1997-08-01
In the field delineation phase, uncertainty in hydrocarbon reservoir descriptions is large. To quickly examine the impact of this uncertainty on production performance, it is necessary to evaluate a large number of descriptions in relation to possible production methods (well spacing, injection rate, etc.). The method of using coarse upscaled models was first proposed by Ballin. Unlike other methods (connectivity analysis, tracer simulations), it considers parameters such as PVT, well management, etc. After a detailed review of upscaling issues, applications to water-injection cases (either with balance or imbalance of production, with or without aquifer) and to depletion of an oil reservoir with aquifer coning are presented. The need for correct upscaling of the numerical well representation, which is much more important than the method of permeability upscaling far from wells, is pointed out. Methods are proposed to accurately represent fluid volumes in coarse models. Simple methods to upscale relative permeabilities, and methods to efficiently correct numerical dispersion, are proposed. Good results are obtained for water injection. The coarse upscaling method allows the performance of sensitivity analyses on model parameters at a much lower CPU cost than comprehensive simulations. Models representing extreme behaviors can be easily distinguished. For depletion of an oil reservoir showing aquifer coning, however, the method did not work properly. It is our opinion that further research is required for upscaling close to wells. We therefore recommend this method for practical use in the case of water injection.
Aggregated Residential Load Modeling Using Dynamic Bayesian Networks
Vlachopoulou, Maria; Chin, George; Fuller, Jason C.; Lu, Shuai
2014-09-28
Abstract—It is already obvious that the future power grid will have to address higher demand for power and energy, and to incorporate renewable resources with different energy generation patterns. Demand response (DR) schemes could successfully be used to manage and balance power supply and demand under operating conditions of the future power grid. To achieve that, more advanced tools for DR management of operations and planning are necessary, tools that can estimate the available capacity from DR resources. In this research, a Dynamic Bayesian Network (DBN) is derived, trained, and tested that can model the aggregated load of Heating, Ventilation, and Air Conditioning (HVAC) systems. DBNs can provide flexible and powerful tools for both operations and planning, due to their unique analytical capabilities. The DBN model's accuracy and flexibility of use are demonstrated by testing the model under different operational scenarios.
Performance and Prediction: Bayesian Modelling of Fallible Choice in Chess
NASA Astrophysics Data System (ADS)
Haworth, Guy; Regan, Ken; di Fatta, Giuseppe
Evaluating agents in decision-making applications requires assessing their skill and predicting their behaviour. Both are well developed in Poker-like situations, but less so in more complex game and model domains. This paper addresses both tasks by using Bayesian inference in a benchmark space of reference agents. The concepts are explained and demonstrated using the game of chess but the model applies generically to any domain with quantifiable options and fallible choice. Demonstration applications address questions frequently asked by the chess community regarding the stability of the rating scale, the comparison of players of different eras and/or leagues, and controversial incidents possibly involving fraud. The last include alleged under-performance, fabrication of tournament results, and clandestine use of computer advice during competition. Beyond the model world of games, the aim is to improve fallible human performance in complex, high-value tasks.
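The "Bayesian inference in a benchmark space of reference agents" idea can be sketched as a discrete Bayes update: each reference agent assigns probabilities to the available move qualities, and observing a player's choices updates a posterior over which agent best matches the player. All agents, move categories, and probabilities below are hypothetical, not the paper's engine-calibrated benchmarks.

```python
def posterior_over_agents(priors, likelihoods, observations):
    # likelihoods[agent][move] = probability that the reference agent picks a
    # move of that quality; observations are indices of the moves played.
    post = list(priors)
    for obs in observations:
        post = [p * like[obs] for p, like in zip(post, likelihoods)]
        total = sum(post)
        post = [p / total for p in post]   # renormalize after each move
    return post

# Hypothetical benchmark space: weak, medium, and strong reference agents;
# moves indexed 0 (engine-best), 1 (inaccuracy), 2 (blunder).
likelihoods = [
    [0.40, 0.40, 0.20],   # weak agent
    [0.60, 0.30, 0.10],   # medium agent
    [0.85, 0.13, 0.02],   # strong agent
]
observed_moves = [0, 0, 1, 0, 0, 0]
post = posterior_over_agents([1 / 3] * 3, likelihoods, observed_moves)
```

A player who repeatedly finds engine-best moves concentrates posterior mass on the strong reference agent, which is the statistical basis for the fraud-detection questions the paper addresses.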
Development of a Bayesian Belief Network Runway Incursion Model
NASA Technical Reports Server (NTRS)
Green, Lawrence L.
2014-01-01
In a previous paper, a statistical analysis of runway incursion (RI) events was conducted to ascertain their relevance to the top ten Technical Challenges (TC) of the National Aeronautics and Space Administration (NASA) Aviation Safety Program (AvSP). The study revealed connections to perhaps several of the AvSP top ten TC. That data also identified several primary causes and contributing factors for RI events that served as the basis for developing a system-level Bayesian Belief Network (BBN) model for RI events. The system-level BBN model will allow NASA to generically model the causes of RI events and to assess the effectiveness of technology products being developed under NASA funding. These products are intended to reduce the frequency of RI events in particular, and to improve runway safety in general. The development, structure and assessment of that BBN for RI events by a Subject Matter Expert panel are documented in this paper.
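A minimal illustration of how a BBN over causes and contributing factors supports such queries is given below. The network, causes, and conditional probability table are invented for the example; they are not NASA's RI model, and inference is done by brute-force enumeration rather than a BBN library.

```python
from itertools import product

# Hypothetical two-cause network for a runway incursion (RI) event.
p_pilot = 0.02    # P(pilot deviation)
p_comms = 0.01    # P(communications failure)
p_incursion = {   # CPT rows indexed by (pilot_error, comms_failure)
    (True, True): 0.30, (True, False): 0.10,
    (False, True): 0.05, (False, False): 0.001,
}

def joint(pe, cf, inc):
    # Joint probability of one full assignment of the three variables.
    p = (p_pilot if pe else 1 - p_pilot) * (p_comms if cf else 1 - p_comms)
    p_i = p_incursion[(pe, cf)]
    return p * (p_i if inc else 1 - p_i)

# Marginal probability of an incursion, by enumerating the parents
p_inc = sum(joint(pe, cf, True) for pe, cf in product([True, False], repeat=2))
# Diagnostic query: how likely was pilot error, given an incursion occurred?
p_pe_given_inc = sum(joint(True, cf, True) for cf in [True, False]) / p_inc
```

Assessing a mitigation technology then amounts to lowering an entry of the CPT (or a cause's prior) and recomputing the marginal, which is how a system-level BBN quantifies the effect of safety products.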
Advances in Bayesian Model Based Clustering Using Particle Learning
Merl, D M
2009-11-19
Recent work by Carvalho, Johannes, Lopes and Polson and by Carvalho, Lopes, Polson and Taddy introduced a sequential Monte Carlo (SMC) alternative to traditional iterative Monte Carlo strategies (e.g. MCMC and EM) for Bayesian inference for a large class of dynamic models. The basis of SMC techniques involves representing the underlying inference problem as one of state space estimation, thus giving way to inference via particle filtering. The key insight of Carvalho et al. was to construct the sequence of filtering distributions so as to make use of the posterior predictive distribution of the observable, a distribution usually only accessible in certain Bayesian settings. Access to this distribution allows a reversal of the usual propagate and resample steps characteristic of many SMC methods, thereby alleviating to a large extent many problems associated with particle degeneration. Furthermore, Carvalho et al. point out that for many conjugate models the posterior distribution of the static variables can be parametrized in terms of [recursively defined] sufficient statistics of the previously observed data. For models where such sufficient statistics exist, particle learning, as it is being called, is especially well suited for the analysis of streaming data due to the relative invariance of its algorithmic complexity with the number of data observations. Through a particle learning approach, a statistical model can be fit to data as the data are arriving, allowing at any instant during the observation process direct quantification of the uncertainty surrounding underlying model parameters. Here we describe the use of a particle learning approach for fitting a standard Bayesian semiparametric mixture model as described in Carvalho, Lopes, Polson and Taddy. In Section 2 we briefly review the previously presented particle learning algorithm for the case of a Dirichlet process mixture of multivariate normals. In Section 3 we describe several novel extensions to the original
NASA Astrophysics Data System (ADS)
Panagos, Panos; Ballabio, Cristiano; Borrelli, Pasquale; Meusburger, Katrin; Alewell, Christine
2015-04-01
Rainfall erosivity (R-factor) is among the 6 input factors in estimating soil erosion risk by using the empirical Revised Universal Soil Loss Equation (RUSLE). The R-factor is a driving force for soil erosion modelling and can potentially be used in flood risk assessments, landslide susceptibility, post-fire damage assessment, application of agricultural management practices, and climate change modelling. Rainfall erosivity is extremely difficult to model at large scale (national, European) due to the lack of high temporal resolution precipitation data covering long time series. In most cases, the R-factor is estimated based on empirical equations which take into account precipitation volume. The Rainfall Erosivity Database on the European Scale (REDES) is the output of an extensive collection of high resolution precipitation data in the 28 Member States of the European Union plus Switzerland, which took place during 2013-2014 in collaboration with national meteorological/environmental services. Due to the different temporal resolutions of the data (5, 10, 15, 30, 60 minutes), conversion equations have been applied in order to homogenise the database at a 30-minute interval. The 1,541 stations included in REDES have been interpolated using the Gaussian Process Regression (GPR) model, using as covariates climatic data (monthly precipitation, monthly temperature, wettest/driest month) from the WorldClim Database, a Digital Elevation Model, and latitude/longitude. GPR has been selected among other candidate models (GAM, Regression Kriging) due to its best performance both in cross-validation (R2=0.63) and in fitting the dataset (R2=0.72). The highest uncertainty has been noticed in north-western Scotland, northern Sweden and Finland due to the limited number of stations in REDES. Also, in highlands such as the Alpine arch and the Pyrenees, the diversity of environmental features resulted in relatively high uncertainty. The rainfall erosivity map of Europe available at 500m resolution plus the standard error
Geostatistical modeling of riparian forest microclimate and its implications for sampling
Eskelson, B.N.I.; Anderson, P.D.; Hagar, J.C.; Temesgen, H.
2011-01-01
Predictive models of microclimate under various site conditions in forested headwater stream - riparian areas are poorly developed, and sampling designs for characterizing underlying riparian microclimate gradients are sparse. We used riparian microclimate data collected at eight headwater streams in the Oregon Coast Range to compare ordinary kriging (OK), universal kriging (UK), and kriging with external drift (KED) for point prediction of mean maximum air temperature (Tair). Several topographic and forest structure characteristics were considered as site-specific parameters. Height above stream and distance to stream were the most important covariates in the KED models, which outperformed OK and UK in terms of root mean square error. Sample patterns were optimized based on the kriging variance and the weighted means of shortest distance criterion using the simulated annealing algorithm. The optimized sample patterns outperformed systematic sample patterns in terms of mean kriging variance mainly for small sample sizes. These findings suggest methods for increasing efficiency of microclimate monitoring in riparian areas.
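The ordinary kriging (OK) baseline that this study compares against UK and KED can be written out compactly. The sketch below uses an assumed exponential covariance model and invented Tair observations at invented plot coordinates; the paper's fitted variograms, covariates, and external-drift terms are not reproduced.

```python
import math

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting for the small kriging system
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def cov(h, sill=1.0, practical_range=100.0):
    # Assumed exponential covariance model (not the paper's fitted variogram)
    return sill * math.exp(-3.0 * h / practical_range)

def ordinary_krige(points, values, target):
    # Solve [C 1; 1' 0][w; mu] = [c0; 1] for the OK weights and multiplier.
    n = len(points)
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    A = [[cov(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])                  # unbiasedness constraint
    b = [cov(dist(p, target)) for p in points] + [1.0]
    sol = solve(A, b)
    weights, mu = sol[:n], sol[n]
    estimate = sum(w * v for w, v in zip(weights, values))
    # Kriging variance: C(0) - sum_i w_i C(x_i, x0) - mu
    variance = cov(0.0) - sum(w * c0 for w, c0 in zip(weights, b[:n])) - mu
    return estimate, variance

# Mean maximum air temperature (Tair, deg C) at plot coordinates (hypothetical)
pts = [(0.0, 0.0), (50.0, 10.0), (20.0, 60.0), (80.0, 80.0)]
vals = [21.5, 23.0, 20.1, 22.4]
est, kvar = ordinary_krige(pts, vals, (40.0, 40.0))
```

The kriging variance returned here is exactly the quantity the sampling-optimization step minimizes (via simulated annealing in the paper); KED extends the same system with drift covariates such as height above stream.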
Welhan, John A.; Farabaugh, Renee L.; Merrick, Melissa J.; Anderson, Steven R.
2007-01-01
The spatial distribution of sediment in the eastern Snake River Plain aquifer was evaluated and modeled to improve the parameterization of hydraulic conductivity (K) for a subregional-scale ground-water flow model being developed by the U.S. Geological Survey. The aquifer is hosted within a layered series of permeable basalts within which intercalated beds of fine-grained sediment constitute local confining units. These sediments have K values as much as six orders of magnitude lower than the most permeable basalt, and previous flow-model calibrations have shown that hydraulic conductivity is sensitive to the proportion of intercalated sediment. Stratigraphic data in the form of sediment thicknesses from 333 boreholes in and around the Idaho National Laboratory were evaluated as grouped subsets of lithologic units (composite units) corresponding to their relative time-stratigraphic position. The results indicate that median sediment abundances of the stratigraphic units below the water table are statistically invariant (stationary) in a spatial sense and provide evidence of stationarity across geologic time, as well. Based on these results, the borehole data were kriged as two-dimensional spatial data sets representing the sediment content of the layers that discretize the ground-water flow model in the uppermost 300 feet of the aquifer. Multiple indicator kriging (mIK) was used to model the geographic distribution of median sediment abundance within each layer by defining the local cumulative frequency distribution (CFD) of sediment via indicator variograms defined at multiple thresholds. The mIK approach is superior to ordinary kriging because it provides a statistically best estimate of sediment abundance (the local median) drawn from the distribution of local borehole data, independent of any assumption of normality. A methodology is proposed for delineating and constraining the assignment of hydraulic conductivity zones for parameter estimation, based on the
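The multiple-indicator idea (coding each observation as above or below a set of thresholds, estimating a local CDF, and reading off its median) can be sketched as follows. Inverse-distance weights stand in for the indicator-kriging weights, which would come from indicator variograms in the actual mIK workflow; the borehole values are hypothetical.

```python
def indicator(value, threshold):
    """Indicator coding: 1 if value is at or below the threshold."""
    return 1.0 if value <= threshold else 0.0

# hypothetical boreholes: (x position, sediment fraction)
data = [(0.0, 0.05), (1.0, 0.20), (2.0, 0.40), (4.0, 0.10)]
thresholds = [0.05, 0.10, 0.20, 0.40]

def idw_weights(x0, xs, p=2):
    """Inverse-distance weights as a stand-in for indicator-kriging weights."""
    raw = [1.0 / (abs(x0 - x) ** p + 1e-9) for x in xs]
    s = sum(raw)
    return [r / s for r in raw]

def local_cdf(x0):
    """Estimate the local cumulative frequency distribution at x0."""
    xs = [x for x, _ in data]
    vs = [v for _, v in data]
    w = idw_weights(x0, xs)
    cdf = [sum(wi * indicator(v, t) for wi, v in zip(w, vs)) for t in thresholds]
    for i in range(1, len(cdf)):          # order-relation correction:
        cdf[i] = max(cdf[i], cdf[i - 1])  # a CDF must be non-decreasing
    return cdf

def local_median(x0):
    """Read the median (the mIK best estimate) off the local CDF."""
    cdf = local_cdf(x0)
    for t, F in zip(thresholds, cdf):
        if F >= 0.5:
            return t
    return thresholds[-1]
```

Because the estimate is a quantile of the local distribution rather than a weighted mean, no normality assumption is needed, which is the advantage the abstract claims over ordinary kriging.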
Road network safety evaluation using Bayesian hierarchical joint model.
Wang, Jie; Huang, Helai
2016-05-01
Safety and efficiency are commonly regarded as two significant performance indicators of transportation systems. In practice, road network planning has focused on road capacity and transport efficiency whereas the safety level of a road network has received little attention in the planning stage. This study develops a Bayesian hierarchical joint model for road network safety evaluation to help planners take traffic safety into account when planning a road network. The proposed model establishes relationships between road network risk and micro-level variables related to road entities and traffic volume, as well as socioeconomic, trip generation and network density variables at macro level which are generally used for long term transportation plans. In addition, network spatial correlation between intersections and their connected road segments is also considered in the model. A road network is elaborately selected in order to compare the proposed hierarchical joint model with a previous joint model and a negative binomial model. According to the results of the model comparison, the hierarchical joint model outperforms the joint model and negative binomial model in terms of the goodness-of-fit and predictive performance, which indicates the reasonableness of considering the hierarchical data structure in crash prediction and analysis. Moreover, both random effects at the TAZ level and the spatial correlation between intersections and their adjacent segments are found to be significant, supporting the employment of the hierarchical joint model as an alternative in road-network-level safety modeling as well.
Integrated Bayesian network framework for modeling complex ecological issues.
Johnson, Sandra; Mengersen, Kerrie
2012-07-01
The management of environmental problems is multifaceted, requiring varied and sometimes conflicting objectives and perspectives to be considered. Bayesian network (BN) modeling facilitates the integration of information from diverse sources and is well suited to tackling the management challenges of complex environmental problems. However, combining several perspectives in one model can lead to large, unwieldy BNs that are difficult to maintain and understand. Conversely, an oversimplified model may lead to an unrealistic representation of the environmental problem. Environmental managers require the current research and available knowledge about an environmental problem of interest to be consolidated in a meaningful way, thereby enabling the assessment of potential impacts and different courses of action. Previous investigations of the environmental problem of interest may have already resulted in the construction of several disparate ecological models. On the other hand, the opportunity may exist to initiate this modeling. In the first instance, the challenge is to integrate existing models and to merge the information and perspectives from these models. In the second instance, the challenge is to include different aspects of the environmental problem incorporating both the scientific and management requirements. Although the paths leading to the combined model may differ for these 2 situations, the common objective is to design an integrated model that captures the available information and research, yet is simple to maintain, expand, and refine. BN modeling is typically an iterative process, and we describe a heuristic method, the iterative Bayesian network development cycle (IBNDC), for the development of integrated BN models that are suitable for both situations outlined above. The IBNDC approach facilitates object-oriented BN (OOBN) modeling, arguably viewed as the next logical step in adaptive management modeling, and that embraces iterative development
A Semiparametric Bayesian Model for Detecting Synchrony Among Multiple Neurons
Shahbaba, Babak; Zhou, Bo; Lan, Shiwei; Ombao, Hernando; Moorman, David; Behseta, Sam
2015-01-01
We propose a scalable semiparametric Bayesian model to capture dependencies among multiple neurons by detecting their co-firing (possibly with some lag time) patterns over time. After discretizing time so there is at most one spike at each interval, the resulting sequence of 1’s (spike) and 0’s (silence) for each neuron is modeled using the logistic function of a continuous latent variable with a Gaussian process prior. For multiple neurons, the corresponding marginal distributions are coupled to their joint probability distribution using a parametric copula model. The advantages of our approach are as follows: the nonparametric component (i.e., the Gaussian process model) provides a flexible framework for modeling the underlying firing rates; the parametric component (i.e., the copula model) allows us to make inference regarding both contemporaneous and lagged relationships among neurons; using the copula model, we construct multivariate probabilistic models by separating the modeling of univariate marginal distributions from the modeling of dependence structure among variables; our method is easy to implement using a computationally efficient sampling algorithm that can be easily extended to high dimensional problems. Using simulated data, we show that our approach could correctly capture temporal dependencies in firing rates and identify synchronous neurons. We also apply our model to spike train data obtained from prefrontal cortical areas. PMID:24922500
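The generative first stage of this model, a continuous latent process pushed through a logistic link to give per-bin spike probabilities, can be sketched as follows. A sine wave stands in for a Gaussian-process sample path, the bin count is invented, and the copula coupling across neurons is not shown.

```python
import math
import random

def logistic(u):
    """Logistic link mapping the latent value to a spike probability."""
    return 1.0 / (1.0 + math.exp(-u))

random.seed(42)
T = 20000                 # number of time bins (at most one spike per bin)
rates, spikes = [], []
for t in range(T):
    latent = math.sin(2 * math.pi * t / 1000.0)  # stand-in for a GP sample path
    p = logistic(latent)                         # per-bin firing probability
    rates.append(p)
    spikes.append(1 if random.random() < p else 0)

empirical = sum(spikes) / T   # observed spike rate
expected = sum(rates) / T     # model-implied rate
```

Inference reverses this simulation: given the 0/1 sequence, the latent path (and hence the firing rate) is recovered under the GP prior.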
Bayesian Analysis of Nonlinear Structural Equation Models with Nonignorable Missing Data
ERIC Educational Resources Information Center
Lee, Sik-Yum
2006-01-01
A Bayesian approach is developed for analyzing nonlinear structural equation models with nonignorable missing data. The nonignorable missingness mechanism is specified by a logistic regression model. A hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm is used to produce the joint Bayesian estimates of…
Dynamic Bayesian Network Modeling of Game Based Diagnostic Assessments. CRESST Report 837
ERIC Educational Resources Information Center
Levy, Roy
2014-01-01
Digital games offer an appealing environment for assessing student proficiencies, including skills and misconceptions in a diagnostic setting. This paper proposes a dynamic Bayesian network modeling approach for observations of student performance from an educational video game. A Bayesian approach to model construction, calibration, and use in…
Bayesian Framework for Water Quality Model Uncertainty Estimation and Risk Management
A formal Bayesian methodology is presented for integrated model calibration and risk-based water quality management using Bayesian Monte Carlo simulation and maximum likelihood estimation (BMCML). The primary focus is on lucid integration of model calibration with risk-based wat...
A Bayesian Measurement Error Model for Misaligned Radiographic Data
Lennox, Kristin P.; Glascoe, Lee G.
2013-09-06
An understanding of the inherent variability in micro-computed tomography (micro-CT) data is essential to tasks such as statistical process control and the validation of radiographic simulation tools. The data present unique challenges to variability analysis due to the relatively low resolution of radiographs, and also due to minor variations from run to run which can result in misalignment or magnification changes between repeated measurements of a sample. Positioning changes artificially inflate the variability of the data in ways that mask true physical phenomena. We present a novel Bayesian nonparametric regression model that incorporates both additive and multiplicative measurement error in addition to heteroscedasticity to address this problem. We also use this model to assess the effects of sample thickness and sample position on measurement variability for an aluminum specimen. Supplementary materials for this article are available online.
Acquisition of causal models for local distributions in Bayesian networks.
Xiang, Yang; Truong, Minh
2014-09-01
To specify a Bayesian network, a local distribution in the form of a conditional probability table, often of an effect conditioned on its n causes, needs to be acquired, one for each non-root node. Since the number of parameters to be assessed is generally exponential in n, improving the efficiency is an important concern in knowledge engineering. Non-impeding noisy-AND (NIN-AND) tree causal models reduce the number of parameters to being linear in n, while explicitly expressing both reinforcing and undermining interactions among causes. The key challenge in NIN-AND tree modeling is the acquisition of the NIN-AND tree structure. In this paper, we formulate a concise structure representation and an expressive causal interaction function of NIN-AND trees. Building on these representations, we propose two structural acquisition methods, which are applicable to both elicitation-based and machine learning-based acquisitions. Their accuracy is demonstrated through experimental evaluations.
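The parameter-count saving can be seen in the classic noisy-OR model, a purely reinforcing special case of the causal models that NIN-AND trees generalize: n per-cause parameters determine all 2^n entries of the conditional probability table. The probabilities below are hypothetical.

```python
from itertools import product

def noisy_or_cpt(probs):
    """Build the full CPT P(effect=1 | cause configuration) from the
    per-cause parameters p_i = P(effect | only cause i is active)."""
    n = len(probs)
    cpt = {}
    for config in product([0, 1], repeat=n):
        q = 1.0
        for active, p in zip(config, probs):
            if active:
                q *= (1.0 - p)        # each active cause independently fails
        cpt[config] = 1.0 - q         # effect occurs unless all causes fail
    return cpt

# 3 parameters determine all 2**3 = 8 table entries
cpt = noisy_or_cpt([0.8, 0.6, 0.5])
```

Adding a second cause can only raise the probability of the effect, which is the reinforcing interaction; NIN-AND trees combine such gates to also express undermining.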
NASA Astrophysics Data System (ADS)
Popova, Olga H.
Dental hygiene students must embody effective critical thinking skills in order to provide evidence-based comprehensive patient care. The problem addressed in this study was that it was not known if, and to what extent, concept mapping and reflective journaling activities embedded in a curriculum over a 4-week period impacted the critical thinking skills of 22 first- and second-year dental hygiene students attending a community college in the Midwest. The overarching research questions were: what is the effect of concept mapping, and what is the effect of reflective journaling, on the level of critical thinking skills of first- and second-year dental hygiene students? This quantitative study employed a quasi-experimental, pretest-posttest design. Analysis of Covariance (ANCOVA) assessed students' mean scores of critical thinking on the California Critical Thinking Skills Test (CCTST) pretest and posttest for the concept mapping and reflective journaling treatment groups. The results of the study found an increase in CCTST posttest scores with the use of both concept mapping and reflective journaling. However, the increase in scores was not found to be statistically significant. Hence, this study identified concept mapping using Ausubel's assimilation theory and reflective journaling incorporating Johns's revision of Carper's patterns of knowing as potential instructional strategies and theoretical models to enhance undergraduate students' critical thinking skills. More research is required in this area to draw further conclusions. Keywords: Critical thinking, critical thinking development, critical thinking skills, instructional strategies, concept mapping, reflective journaling, dental hygiene, college students.
Cooper, Richard J; Krueger, Tobias; Hiscock, Kevin M; Rawlins, Barry G
2014-01-01
Mixing models have become increasingly common tools for apportioning fluvial sediment load to various sediment sources across catchments using a wide variety of Bayesian and frequentist modeling approaches. In this study, we demonstrate how different model setups can impact upon resulting source apportionment estimates in a Bayesian framework via a one-factor-at-a-time (OFAT) sensitivity analysis. We formulate 13 versions of a mixing model, each with different error assumptions and model structural choices, and apply them to sediment geochemistry data from the River Blackwater, Norfolk, UK, to apportion suspended particulate matter (SPM) contributions from three sources (arable topsoils, road verges, and subsurface material) under base flow conditions between August 2012 and August 2013. Whilst all 13 models estimate subsurface sources to be the largest contributor of SPM (median ∼76%), comparison of apportionment estimates reveal varying degrees of sensitivity to changing priors, inclusion of covariance terms, incorporation of time-variant distributions, and methods of proportion characterization. We also demonstrate differences in apportionment results between a full and an empirical Bayesian setup, and between a Bayesian and a frequentist optimization approach. This OFAT sensitivity analysis reveals that mixing model structural choices and error assumptions can significantly impact upon sediment source apportionment results, with estimated median contributions in this study varying by up to 21% between model versions. Users of mixing models are therefore strongly advised to carefully consider and justify their choice of model structure prior to conducting sediment source apportionment investigations. Key points: (1) an OFAT sensitivity analysis of sediment fingerprinting mixing models is conducted; (2) Bayesian models display high sensitivity to error assumptions and structural choices; (3) source apportionment results differ between Bayesian and frequentist approaches.
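Stripped of its Bayesian error model, the core of any such mixing model is a constrained optimization over source proportions. The sketch below recovers three-source proportions from two hypothetical tracer signatures by grid search over the simplex; the real models add priors, covariance terms, and time-variant distributions on top of this structure.

```python
# hypothetical geochemical tracer signatures (two tracers per source)
sources = {
    "arable":     (10.0, 1.0),
    "road_verge": (2.0, 8.0),
    "subsurface": (5.0, 5.0),
}
# SPM sample composed as 0.5*arable + 0.3*road_verge + 0.2*subsurface
mixture = (6.6, 3.9)

def apportion(mix, sigs, step=0.01):
    """Grid search over proportions summing to one, minimizing squared misfit."""
    names = list(sigs)
    a, b, c = (sigs[n] for n in names)
    k = round(1.0 / step)
    best, best_err = None, float("inf")
    for i in range(k + 1):
        for j in range(k + 1 - i):
            p1, p2 = i * step, j * step
            p3 = 1.0 - p1 - p2          # proportions constrained to the simplex
            err = sum((p1 * a[t] + p2 * b[t] + p3 * c[t] - mix[t]) ** 2
                      for t in range(len(mix)))
            if err < best_err:
                best, best_err = (p1, p2, p3), err
    return dict(zip(names, best))

props = apportion(mixture, sources)
```

A Bayesian version replaces the point minimum with a posterior over the same simplex, which is exactly where the sensitivity to priors and error assumptions enters.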
Diagnosing Hybrid Systems: a Bayesian Model Selection Approach
NASA Technical Reports Server (NTRS)
McIlraith, Sheila A.
2005-01-01
In this paper we examine the problem of monitoring and diagnosing noisy complex dynamical systems that are modeled as hybrid systems: models of continuous behavior interleaved with discrete transitions. In particular, we examine continuous systems with embedded supervisory controllers that experience abrupt, partial or full failure of component devices. Building on our previous work in this area (MBCG99;MBCG00), our specific focus in this paper is on the mathematical formulation of the hybrid monitoring and diagnosis task as a Bayesian model tracking algorithm. The nonlinear dynamics of many hybrid systems present challenges to probabilistic tracking. Further, probabilistic tracking of a system for the purposes of diagnosis is problematic because the models of the system corresponding to failure modes are numerous and generally very unlikely. To focus tracking on these unlikely models and to reduce the number of potential models under consideration, we exploit logic-based techniques for qualitative model-based diagnosis to conjecture a limited initial set of consistent candidate models. In this paper we discuss alternative tracking techniques that are relevant to different classes of hybrid systems, focusing specifically on a method for tracking multiple models of nonlinear behavior simultaneously using factored sampling and conditional density propagation. To illustrate and motivate the approach described in this paper we examine the problem of monitoring and diagnosing NASA's Sprint AERCam, a small spherical robotic camera unit with 12 thrusters that enable both linear and rotational motion.
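Factored sampling with conditional density propagation is, in modern terms, a particle filter. The sketch below tracks a single hypothetical one-dimensional state rather than multiple candidate failure models, and all noise levels and observations are made-up values.

```python
import math
import random

random.seed(1)
TRUE_STATE = 5.0
N = 2000
# initialize particles from a broad prior over the state
particles = [random.uniform(0.0, 10.0) for _ in range(N)]

def likelihood(obs, x, sigma=0.5):
    """Unnormalized Gaussian observation likelihood."""
    return math.exp(-((obs - x) ** 2) / (2 * sigma ** 2))

for _ in range(10):                    # ten observation updates
    obs = TRUE_STATE                   # noise-free observation for the sketch
    # predict: propagate each particle through random-walk process noise
    particles = [x + random.gauss(0.0, 0.1) for x in particles]
    # weight: condition each particle on the observation
    weights = [likelihood(obs, x) for x in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    # resample: factored sampling proportional to the weights
    particles = random.choices(particles, weights=weights, k=N)

estimate = sum(particles) / N          # posterior mean estimate of the state
```

Tracking multiple candidate models, as the paper does, amounts to running such a filter over an augmented state that includes a discrete model index.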
NASA Astrophysics Data System (ADS)
Legleiter, Carl J.
2014-01-01
Fluvial geomorphology is fundamentally concerned with the association between form and process in rivers. Examining these interactions in complex, natural channels requires a means of quantifying the variability and organization of bed topography; this paper introduces a geostatistical framework for characterizing reach-scale spatial structure. Transformation to a channel-centered coordinate system allows topographic variations to be resolved into along- and across-stream components. Dimensionless variables, obtained by scaling distances by the mean channel width and de-trended elevations by the mean bankfull depth, account for channel size and allow spatial patterns to be compared over time or among sites. These patterns are effectively described by the variogram, a spatial statistic that expresses dissimilarity as a function of distance. Fitting a parametric model to the sample variogram provides a rich description of channel form. For example, multiple, nested structures can be combined to account for anisotropy, with varying degrees of spatial variability observed over different length scales along and across the channel. Integral metrics derived from the variogram model yield a more compact summary, and variogram maps provide a useful visualization. To guide interpretation of these metrics, I used a simple 'channel builder' to isolate the effects of specific aspects of river morphology on the variogram. This analysis indicated that geostatistical models were sensitive to changes in the size, shape, and orientation of channel features, but not to a pure translation of the morphology. The results also highlighted the importance of considering streamwise and transverse components jointly rather than in isolation.
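The sample variogram at the heart of this framework is simple to compute. The sketch below uses a hypothetical de-trended bar-pool transect; a periodic bedform shows up as high semivariance at the half-wavelength lag and low semivariance at the full wavelength.

```python
def empirical_variogram(xs, zs, lags, tol=0.5):
    """Sample semivariance gamma(h): mean of 0.5*(z_i - z_j)^2 over all
    point pairs whose separation falls within tol of lag h."""
    gamma = {}
    for h in lags:
        sq = [0.5 * (zs[i] - zs[j]) ** 2
              for i in range(len(xs)) for j in range(i + 1, len(xs))
              if abs(abs(xs[i] - xs[j]) - h) <= tol]
        gamma[h] = sum(sq) / len(sq) if sq else None
    return gamma

# hypothetical dimensionless, de-trended bed elevations along a transect
xs = [0, 1, 2, 3, 4, 5]
zs = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]   # alternating bar-pool pattern
g = empirical_variogram(xs, zs, [1, 2])
```

Fitting a parametric model (spherical, exponential, nested structures) to these sample values is the step that yields the integral metrics discussed in the paper.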
Bayesian modeling of censored partial linear models using scale-mixtures of normal distributions
NASA Astrophysics Data System (ADS)
Castro, Luis M.; Lachos, Victor H.; Ferreira, Guillermo P.; Arellano-Valle, Reinaldo B.
2012-10-01
Regression models where the dependent variable is censored (limited) are usually considered in statistical analysis. Particularly, the case of a truncation to the left of zero and a normality assumption for the error terms is studied in detail by [1] in the well-known Tobit model. In the present article, this typical censored regression model is extended by considering a partial linear model with errors belonging to the class of scale mixtures of normal distributions. We achieve a fully Bayesian inference by adopting a Metropolis algorithm within a Gibbs sampler. The likelihood function is utilized to compute not only some Bayesian model selection measures but also to develop Bayesian case-deletion influence diagnostics based on the q-divergence measures. We evaluate the performances of the proposed methods with simulated data. In addition, we present an application to identify which variables affect the income of housewives.
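The Tobit likelihood that this model extends combines a normal density for uncensored observations with a normal CDF mass at the censoring point. Below is a minimal sketch with hypothetical data; the paper replaces the normal errors with scale mixtures and the linear predictor with a partial linear form.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def tobit_loglik(beta0, beta1, sigma, data):
    """Log-likelihood of the Tobit model y = max(0, b0 + b1*x + e), e ~ N(0, s^2)."""
    ll = 0.0
    for x, y in data:
        mu = beta0 + beta1 * x
        if y > 0:   # uncensored observation: normal density term
            ll += math.log(norm_pdf((y - mu) / sigma) / sigma)
        else:       # censored at zero: probability mass P(y* <= 0)
            ll += math.log(norm_cdf(-mu / sigma))
    return ll

# hypothetical data; the first observation is censored at zero
data = [(0, 0.0), (1, 1.6), (2, 2.1), (3, 2.4), (4, 3.1)]
good = tobit_loglik(1.0, 0.55, 0.5, data)
bad = tobit_loglik(0.0, 0.0, 2.0, data)
```

In the Bayesian treatment this likelihood is sampled via Metropolis-within-Gibbs rather than maximized, and it also feeds the q-divergence case-deletion diagnostics.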
Peterson, Erin E; Urquhart, N Scott
2006-10-01
In the United States, probability-based water quality surveys are typically used to meet the requirements of Section 305(b) of the Clean Water Act. The survey design allows an inference to be generated concerning regional stream condition, but it cannot be used to identify water quality impaired stream segments. Therefore, a rapid and cost-efficient method is needed to locate potentially impaired stream segments throughout large areas. We fit a set of geostatistical models to 312 samples of dissolved organic carbon (DOC) collected in 1996 for the Maryland Biological Stream Survey using coarse-scale watershed characteristics. The models were developed using two distance measures, straight-line distance (SLD) and weighted asymmetric hydrologic distance (WAHD). We used the Corrected Spatial Akaike Information Criterion and the mean square prediction error to compare models. The SLD models predicted more variability in DOC than models based on WAHD for every autocovariance model except the spherical model. The SLD model based on the Mariah autocovariance model showed the best fit (R2 = 0.72). DOC demonstrated a positive relationship with the watershed attributes percent water, percent wetlands, and mean minimum temperature, but was negatively correlated to percent felsic rock type. We used universal kriging to generate predictions and prediction variances for 3083 stream segments throughout Maryland. The model predicted that 90.2% of stream kilometers had DOC values less than 5 mg/l, 6.7% were between 5 and 8 mg/l, and 3.1% of streams produced values greater than 8 mg/l. The geostatistical model generated more accurate DOC predictions than previous models, but did not fit the data equally well throughout the state. Consequently, it may be necessary to develop more than one geostatistical model to predict stream DOC throughout Maryland. Our methodology is an improvement over previous methods because additional field sampling is not necessary, inferences about regional
Bayesian network models for error detection in radiotherapy plans
NASA Astrophysics Data System (ADS)
Kalet, Alan M.; Gennari, John H.; Ford, Eric C.; Phillips, Mark H.
2015-04-01
The purpose of this study is to design and develop a probabilistic network for detecting errors in radiotherapy plans for use at the time of initial plan verification. Our group has initiated a multi-pronged approach to reduce these errors. We report on our development of Bayesian models of radiotherapy plans. Bayesian networks consist of joint probability distributions that define the probability of one event, given some set of other known information. Using the networks, we find the probability of obtaining certain radiotherapy parameters, given a set of initial clinical information. A low probability in a propagated network then corresponds to potential errors to be flagged for investigation. To build our networks we first interviewed medical physicists and other domain experts to identify the relevant radiotherapy concepts and their associated interdependencies and to construct a network topology. Next, to populate the network’s conditional probability tables, we used the Hugin Expert software to learn parameter distributions from a subset of de-identified data derived from a radiation oncology based clinical information database system. These data represent 4990 unique prescription cases over a 5 year period. Under test case scenarios with approximately 1.5% introduced error rates, network performance produced areas under the ROC curve of 0.88, 0.98, and 0.89 for the lung, brain and female breast cancer error detection networks, respectively. Comparison of the brain network to human experts performance (AUC of 0.90 ± 0.01) shows the Bayes network model performs better than domain experts under the same test conditions. Our results demonstrate the feasibility and effectiveness of comprehensive probabilistic models as part of decision support systems for improved detection of errors in initial radiotherapy plan verification procedures.
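The flagging logic can be sketched with a toy three-node chain network. All sites, doses, and probabilities below are invented for illustration; the actual networks learn their conditional probability tables from clinical data with the Hugin Expert software.

```python
# toy chain: treatment site -> prescribed dose -> fraction count
p_site = {"lung": 0.5, "brain": 0.3, "breast": 0.2}
p_dose_given_site = {
    ("lung", 60): 0.7, ("lung", 30): 0.3,
    ("brain", 60): 0.6, ("brain", 30): 0.4,
    ("breast", 60): 0.2, ("breast", 30): 0.8,
}
p_fx_given_dose = {(60, 30): 0.9, (60, 10): 0.1, (30, 10): 0.8, (30, 30): 0.2}

def plan_probability(site, dose, fx):
    """Joint probability of a plan configuration via the chain factorization."""
    return p_site[site] * p_dose_given_site[(site, dose)] * p_fx_given_dose[(dose, fx)]

THRESHOLD = 0.02   # hypothetical cutoff below which a plan is flagged

def flag(site, dose, fx):
    """Flag implausible parameter combinations for human review."""
    return plan_probability(site, dose, fx) < THRESHOLD
```

A real verification network has many more nodes and uses evidence propagation rather than a full joint product, but the principle is the same: low propagated probability marks a plan for investigation.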
Wang, Meng; Sampson, Paul D; Hu, Jianlin; Kleeman, Michael; Keller, Joshua P; Olives, Casey; Szpiro, Adam A; Vedal, Sverre; Kaufman, Joel D
2016-05-17
Assessments of long-term air pollution exposure in population studies have commonly employed land-use regression (LUR) or chemical transport modeling (CTM) techniques. Attempts to incorporate both approaches in one modeling framework are challenging. We present a novel geostatistical modeling framework, incorporating CTM predictions into a spatiotemporal LUR model with spatial smoothing to estimate spatiotemporal variability of ozone (O3) and particulate matter with diameter less than 2.5 μm (PM2.5) from 2000 to 2008 in the Los Angeles Basin. The observations include over 9 years' data from more than 20 routine monitoring sites and specific monitoring data at over 100 locations to provide more comprehensive spatial coverage of air pollutants. Our composite modeling approach outperforms separate CTM and LUR models in terms of root-mean-square error (RMSE) assessed by 10-fold cross-validation in both temporal and spatial dimensions, with larger improvement in the accuracy of predictions for O3 (RMSE [ppb] for CTM, 6.6; LUR, 4.6; composite, 3.6) than for PM2.5 (RMSE [μg/m(3)] CTM: 13.7, LUR: 3.2, composite: 3.1). Our study highlights the opportunity for future exposure assessment to make use of readily available spatiotemporal modeling methods and auxiliary gridded data that takes chemical reaction processes into account to improve the accuracy of predictions in a single spatiotemporal modeling framework.
A Bayesian modelling framework for tornado occurrences in North America.
Cheng, Vincent Y S; Arhonditsis, George B; Sills, David M L; Gough, William A; Auld, Heather
2015-01-01
Tornadoes represent one of nature's most hazardous phenomena that have been responsible for significant destruction and devastating fatalities. Here we present a Bayesian modelling approach for elucidating the spatiotemporal patterns of tornado activity in North America. Our analysis shows a significant increase in the Canadian Prairies and the Northern Great Plains during the summer, indicating a clear transition of tornado activity from the United States to Canada. The linkage between monthly-averaged atmospheric variables and likelihood of tornado events is characterized by distinct seasonality; the convective available potential energy is the predominant factor in the summer; vertical wind shear appears to have a strong signature primarily in the winter and secondarily in the summer; and storm relative environmental helicity is most influential in the spring. The present probabilistic mapping can be used to draw inference on the likelihood of tornado occurrence in any location in North America within a selected time period of the year.
Toward diagnostic model calibration and evaluation: Approximate Bayesian computation
NASA Astrophysics Data System (ADS)
Vrugt, Jasper A.; Sadegh, Mojtaba
2013-07-01
The ever-increasing pace of computational power, along with continued advances in measurement technologies and improvements in process understanding has stimulated the development of increasingly complex hydrologic models that simulate soil moisture flow, groundwater recharge, surface runoff, root water uptake, and river discharge at different spatial and temporal scales. Reconciling these high-order system models with perpetually larger volumes of field data is becoming more and more difficult, particularly because classical likelihood-based fitting methods lack the power to detect and pinpoint deficiencies in the model structure. Gupta et al. (2008) have recently proposed steps (amongst others) toward the development of a more robust and powerful method of model evaluation. Their diagnostic approach uses signature behaviors and patterns observed in the input-output data to illuminate to what degree a representation of the real world has been adequately achieved and how the model should be improved for the purpose of learning and scientific discovery. In this paper, we introduce approximate Bayesian computation (ABC) as a vehicle for diagnostic model evaluation. This statistical methodology relaxes the need for an explicit likelihood function in favor of one or multiple different summary statistics rooted in hydrologic theory that together have a clearer and more compelling diagnostic power than some average measure of the size of the error residuals. Two illustrative case studies are used to demonstrate that ABC is relatively easy to implement, and readily employs signature based indices to analyze and pinpoint which part of the model is malfunctioning and in need of further improvement.
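An ABC rejection sampler is only a few lines: draw parameters from the prior, simulate, and accept when a chosen summary statistic is close enough to the observed one. The toy problem below (inferring a mean, with the sample mean as the signature) uses invented numbers; hydrologic applications would substitute theory-rooted signature indices for the summary statistic.

```python
import random

random.seed(7)
TRUE_MEAN = 3.0
observed = [random.gauss(TRUE_MEAN, 1.0) for _ in range(30)]
obs_stat = sum(observed) / len(observed)   # summary statistic: sample mean

accepted = []
for _ in range(5000):
    theta = random.uniform(0.0, 6.0)                    # draw from the prior
    sim = [random.gauss(theta, 1.0) for _ in range(30)] # simulate the "model"
    sim_stat = sum(sim) / len(sim)
    if abs(sim_stat - obs_stat) < 0.25:                 # distance on the summary
        accepted.append(theta)                          # no likelihood evaluated

posterior_mean = sum(accepted) / len(accepted)
```

The accepted draws approximate the posterior without ever writing down a likelihood, which is what lets the summary statistics carry the diagnostic content.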
A unified Bayesian hierarchical model for MRI tissue classification.
Feng, Dai; Liang, Dong; Tierney, Luke
2014-04-15
Magnetic resonance imaging (MRI) tissue classification has been used extensively to study a number of neurological and psychiatric disorders, but varied noise characteristics and other artifacts make the classification a challenging task. Instead of splitting the procedure into separate steps, we extend a previous work to develop a unified Bayesian hierarchical model that simultaneously addresses the two major acquisition artifacts: the partial volume effect and intensity non-uniformity. We adopted a normal mixture model, with the means and variances depending on the tissue types of voxels, for the observed intensity values. We modeled the relationship among the components of the index vector of tissue types by a hidden Markov model, which captures the spatial similarity of voxels. Furthermore, we addressed the partial volume effect by constructing a higher-resolution image in which each voxel is divided into subvoxels. Finally, we achieved bias field correction by using a Gaussian Markov random field model with a band precision matrix designed in light of image filtering. Sparse matrix methods and parallel computations based on conditional independence are exploited to improve the speed of the Markov chain Monte Carlo simulation. The unified model provides more accurate tissue classification results for both simulated and real data sets. PMID:24738112
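Stripped of the hidden Markov spatial coupling, the subvoxel construction, and the bias field, the intensity model above is a normal mixture with tissue-specific means and variances; classifying a single voxel then reduces to Bayes' rule. The tissue names and parameter values below are hypothetical, chosen only to illustrate that mixture step.

```python
import math

# Toy two-tissue problem: intensities come from one of two normal
# components (e.g. "gray" vs "white" matter) with known class parameters.
params = {"gray": (30.0, 4.0), "white": (60.0, 5.0)}   # (mean, std)
prior = {"gray": 0.5, "white": 0.5}

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def classify(intensity):
    # Posterior class probabilities by Bayes' rule. No spatial term here;
    # the paper couples neighbouring voxels through a hidden Markov model.
    post = {k: prior[k] * normal_pdf(intensity, *params[k]) for k in params}
    z = sum(post.values())
    post = {k: v / z for k, v in post.items()}
    return max(post, key=post.get), post

label, post = classify(33.0)
```

A dim voxel (intensity 33) is assigned to the low-mean component; the full model would refine this with the spatial prior and subvoxel resolution.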
Bridging groundwater models and decision support with a Bayesian network
Fienen, Michael N.; Masterson, John P.; Plant, Nathaniel G.; Gutierrez, Benjamin T.; Thieler, E. Robert
2013-01-01
Resource managers need to make decisions to plan for future environmental conditions, particularly sea level rise, in the face of substantial uncertainty. Many interacting processes factor into the decisions they face. Advances in process models and the quantification of uncertainty have made models a valuable tool for this purpose. Long simulation runtimes and, often, numerical instability make linking process models impractical in many cases. A method for emulating the important connections between model input and forecasts, while propagating uncertainty, has the potential to provide a bridge between complicated numerical process models and the efficiency and stability needed for decision making. We explore this using a Bayesian network (BN) to emulate a groundwater flow model. We expand on previous approaches to validating a BN by calculating forecasting skill using cross validation of a groundwater model of Assateague Island in Virginia and Maryland, USA. This BN emulation was shown to capture the important groundwater-flow characteristics and uncertainty of the groundwater system because of its connection to island morphology and sea level. Forecast power metrics associated with the validation of multiple alternative BN designs guided the selection of an optimal level of BN complexity. Assateague Island is an ideal test case for exploring a forecasting tool based on current conditions because the unique hydrogeomorphological variability of the island includes a range of settings indicative of past, current, and future conditions. The resulting BN is a valuable tool for exploring the response of groundwater conditions to sea level rise in decision support.
Mixed-point geostatistical simulation: A combination of two- and multiple-point geostatistics
NASA Astrophysics Data System (ADS)
Cordua, Knud Skou; Hansen, Thomas Mejer; Gulbrandsen, Mats Lundh; Barnes, Christophe; Mosegaard, Klaus
2016-09-01
Multiple-point-based geostatistical methods are used to model complex geological structures. However, a training image containing the characteristic patterns of the Earth model has to be provided. If no training image is available, two-point (i.e., covariance-based) geostatistical methods are typically applied instead because these methods provide fewer constraints on the Earth model. This study is motivated by the case where 1-D vertical training images are available through borehole logs, whereas little or no information about horizontal dependencies exists. This problem is solved by developing theory that makes it possible to combine information from multiple- and two-point geostatistics for different directions, leading to a mixed-point geostatistical model. An example of combining information from the multiple-point-based single normal equation simulation algorithm and two-point-based sequential indicator simulation algorithm is provided. The mixed-point geostatistical model is used for conditional sequential simulation based on vertical training images from five borehole logs and a range parameter describing the horizontal dependencies.
A Bayesian Joint Model of Menstrual Cycle Length and Fecundity
Lum, Kirsten J.; Sundaram, Rajeshwari; Louis, Germaine M. Buck; Louis, Thomas A.
2015-01-01
Menstrual cycle length (MCL) has been shown to play an important role in couple fecundity, which is the biologic capacity for reproduction irrespective of pregnancy intentions. However, a comprehensive assessment of its role requires a fecundity model that accounts for male and female attributes and the couple's intercourse pattern relative to the ovulation day. To this end, we employ a Bayesian joint model for MCL and pregnancy. MCLs follow a scale-multiplied (accelerated) mixture model with Gaussian and Gumbel components; the pregnancy model includes MCL as a covariate and computes the cycle-specific probability of pregnancy in a menstrual cycle conditional on the pattern of intercourse and no previous fertilization. Day-specific fertilization probability is modeled using natural cubic splines. We analyze data from the Longitudinal Investigation of Fertility and the Environment Study (the LIFE Study), a couple-based prospective pregnancy study, and find a statistically significant quadratic relation between fecundity and menstrual cycle length, after adjustment for intercourse pattern and other attributes, including male semen quality, both partners' ages, and active smoking status (determined by baseline cotinine level 100 ng/mL). We compare results to those produced by a more basic model and show the advantages of a more comprehensive approach. PMID:26295923
Bayesian analysis of inflation: Parameter estimation for single field models
Mortonson, Michael J.; Peiris, Hiranya V.; Easther, Richard
2011-02-15
Future astrophysical data sets promise to strengthen constraints on models of inflation, and extracting these constraints requires methods and tools commensurate with the quality of the data. In this paper we describe ModeCode, a new, publicly available code that computes the primordial scalar and tensor power spectra for single-field inflationary models. ModeCode solves the inflationary mode equations numerically, avoiding the slow roll approximation. It is interfaced with CAMB and CosmoMC to compute cosmic microwave background angular power spectra and perform likelihood analysis and parameter estimation. ModeCode is easily extendable to additional models of inflation, and future updates will include Bayesian model comparison. Errors from ModeCode contribute negligibly to the error budget for analyses of data from Planck or other next-generation experiments. We constrain representative single-field models (phi^n with n = 2/3, 1, 2, and 4; natural inflation; and "hilltop" inflation) using current data, and provide forecasts for Planck. From current data, we obtain weak but nontrivial limits on the post-inflationary physics, which is a significant source of uncertainty in the predictions of inflationary models, while we find that Planck will dramatically improve these constraints. In particular, Planck will link the inflationary dynamics with the post-inflationary growth of the horizon, and thus begin to probe the "primordial dark ages" between TeV and grand unified theory scale energies.
A Bayesian hierarchical model for wind gust prediction
NASA Astrophysics Data System (ADS)
Friederichs, Petra; Oesting, Marco; Schlather, Martin
2014-05-01
A postprocessing method for ensemble wind gust forecasts given by a mesoscale limited-area numerical weather prediction (NWP) model is presented, which is based on extreme value theory. A process layer for the parameters of a generalized extreme value distribution (GEV) is introduced using a Bayesian hierarchical model (BHM). Incorporating the information of the COSMO-DE forecasts, the process parameters model the spatial response surfaces of the GEV parameters as Gaussian random fields. The spatial BHM provides area-wide forecasts of wind gusts in terms of a conditional GEV. It models the marginal distribution of the spatial gust process and provides not only forecasts of the conditional GEV at locations without observations, but also uncertainty information about the estimates. A disadvantage of the BHM is that it assumes conditionally independent observations. In order to incorporate the dependence between gusts at neighboring locations as well as the spatial random fields of observed and forecasted maximal wind gusts, we propose to model them jointly by a bivariate Brown-Resnick process.
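As a much-simplified illustration of the extreme-value building block above, the Gumbel distribution (the GEV sub-family with shape parameter zero) can be fitted to simulated gust maxima by the method of moments. The spatial hierarchical layer, the conditioning on NWP forecasts, and the parameter values below are not reproduced from the paper; this is only the marginal-distribution step.

```python
import math
import random

random.seed(7)

EULER_GAMMA = 0.57721566490153286  # Euler-Mascheroni constant

def simulate_gumbel(mu, beta, n):
    # Inverse-CDF sampling: x = mu - beta * ln(-ln(u)), u ~ Uniform(0, 1).
    return [mu - beta * math.log(-math.log(random.random())) for _ in range(n)]

def fit_gumbel_moments(xs):
    # Method of moments: scale from the variance, location from the mean.
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta

gusts = simulate_gumbel(mu=20.0, beta=3.0, n=5000)   # hypothetical station maxima
mu_hat, beta_hat = fit_gumbel_moments(gusts)
```

The BHM in the paper instead treats the GEV parameters themselves as spatial Gaussian random fields, so every station borrows strength from its neighbours.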
A Bayesian Approach to Person Fit Analysis in Item Response Theory Models. Research Report.
ERIC Educational Resources Information Center
Glas, Cees A. W.; Meijer, Rob R.
A Bayesian approach to the evaluation of person fit in item response theory (IRT) models is presented. In a posterior predictive check, the observed value on a discrepancy variable is positioned in its posterior distribution. In a Bayesian framework, a Markov Chain Monte Carlo procedure can be used to generate samples of the posterior distribution…
ERIC Educational Resources Information Center
West, Patti; Rutstein, Daisy Wise; Mislevy, Robert J.; Liu, Junhui; Choi, Younyoung; Levy, Roy; Crawford, Aaron; DiCerbo, Kristen E.; Chappel, Kristina; Behrens, John T.
2010-01-01
A major issue in the study of learning progressions (LPs) is linking student performance on assessment tasks to the progressions. This report describes the challenges faced in making this linkage using Bayesian networks to model LPs in the field of computer networking. The ideas are illustrated with exemplar Bayesian networks built on Cisco…
ERIC Educational Resources Information Center
Lee, Sik-Yum; Song, Xin-Yuan; Tang, Nian-Sheng
2007-01-01
The analysis of interaction among latent variables has received much attention. This article introduces a Bayesian approach to analyze a general structural equation model that accommodates the general nonlinear terms of latent variables and covariates. This approach produces a Bayesian estimate that has the same statistical optimal properties as a…
Integration of geologic interpretation into geostatistical simulation
Carle, S.F.
1997-06-01
Embedded Markov chain analysis has been used to quantify geologic interpretation of juxtapositional tendencies of geologic facies. Such interpretations can also be translated into continuous-lag Markov chain models of spatial variability for use in geostatistical simulation of facies architecture.
Bayesian Estimation and Uncertainty Quantification in Differential Equation Models
NASA Astrophysics Data System (ADS)
Bhaumik, Prithwish
In engineering, physics, biomedical sciences, pharmacokinetics and pharmacodynamics (PKPD), and many other fields the regression function is often specified as the solution of a system of ordinary differential equations (ODEs) given by df_theta(t)/dt = F(t, f_theta(t), theta), t ∈ [0, 1]; here F is a known, appropriately smooth vector-valued function. Our interest lies in estimating theta from the noisy data. A two-step approach to solve this problem consists of first fitting the data nonparametrically, and then estimating the parameter by minimizing the distance between the nonparametrically estimated derivative and the derivative suggested by the system of ODEs. In Chapter 2 we consider a Bayesian analog of the two-step approach by putting a finite random series prior on the regression function using a B-spline basis. We establish a Bernstein-von Mises theorem for the posterior distribution of the parameter of interest induced from that on the regression function, with the n^(-1/2) contraction rate. Although this approach is computationally fast, the Bayes estimator is not asymptotically efficient. This can be remedied by directly considering the distance between the function in the nonparametric model and a Runge-Kutta (RK4) approximate solution of the ODE while inducing the posterior distribution on the parameter, as done in Chapter 3. We also study in Chapter 3 the asymptotic properties of a direct Bayesian method based on the approximate likelihood obtained by the RK4 method. Chapters 4 and 5 contain extensions of these methods to higher-order ODEs and partial differential equations (PDEs), respectively. Directions for future work are mentioned in Chapter 6.
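A minimal, purely classical sketch of the RK4-based idea: solve the ODE numerically for candidate parameter values and pick the value whose solution best matches the noisy data. The decay ODE, the noise level, and the grid search (standing in for posterior sampling over theta) are illustrative assumptions, not the thesis's construction.

```python
import random

random.seed(42)

# Classical fourth-order Runge-Kutta integrator for dy/dt = F(t, y, theta).
def rk4(F, y0, theta, t0=0.0, t1=1.0, steps=100):
    h = (t1 - t0) / steps
    t, y = t0, y0
    path = [(t, y)]
    for _ in range(steps):
        k1 = F(t, y, theta)
        k2 = F(t + h / 2, y + h * k1 / 2, theta)
        k3 = F(t + h / 2, y + h * k2 / 2, theta)
        k4 = F(t + h, y + h * k3, theta)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
        path.append((t, y))
    return path

F = lambda t, y, theta: -theta * y          # toy ODE: exponential decay

# Noisy observations generated at the true value theta = 2.0.
truth = rk4(F, y0=1.0, theta=2.0)
data = [(t, y + random.gauss(0.0, 0.01)) for t, y in truth[::10]]

# Pick theta whose RK4 solution is closest to the data; a grid search
# stands in here for MCMC over the posterior of theta.
def sse(theta):
    sol = dict(rk4(F, y0=1.0, theta=theta))
    return sum((y - sol[t]) ** 2 for t, y in data)

theta_hat = min((0.5 + 0.01 * i for i in range(300)), key=sse)
```

The Bayesian versions in Chapters 2 and 3 replace the grid search with a prior on theta (or on the regression function) and characterize the full posterior rather than a point estimate.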
More Bayesian Transdimensional Inversion for Thermal History Modelling (Invited)
NASA Astrophysics Data System (ADS)
Gallagher, K.
2013-12-01
Since the publication of Dodson (1973), quantifying the relationship between geochronological ages and closure temperatures, an ongoing concern in thermochronology is the reconstruction of thermal histories consistent with the measured data. Extracting this thermal history information is best treated as an inverse problem, given the complex relationship between the observations and the thermal history. When solving the inverse problem (i.e., finding acceptable thermal histories), stochastic sampling methods have often been used, as these are relatively global when searching the model space. However, the issue remains how best to estimate those parts of the thermal history unconstrained by independent information, i.e., what is required to fit the data? To solve this general problem, we use a Bayesian transdimensional Markov chain Monte Carlo method, which has been integrated into user-friendly software, QTQt (Quantitative Thermochronology with Qt), running on both Macintosh and PC. The Bayesian approach allows us to consider a wide range of possible thermal histories as general prior information on time and temperature (and temperature offset for multiple samples in a vertical profile). We can also incorporate more focussed geological constraints in terms of more specific priors. In this framework, it is the data themselves (and their errors) that determine the complexity of the thermal history solutions. For example, more precise data will justify a more complex solution, while noisier data will support only simpler solutions. We can express complexity in terms of the number of time-temperature points defining the total thermal history. Another useful feature of this method is that we can easily deal with imprecise parameter values (e.g. kinetics, data errors) by drawing samples from a user-specified probability distribution, rather than using a single value. Finally, the method can be applied to either single samples, or multiple samples (from a borehole or
Nonlinear regression modeling of nutrient loads in streams: A Bayesian approach
Qian, S.S.; Reckhow, K.H.; Zhai, J.; McMahon, G.
2005-01-01
A Bayesian nonlinear regression modeling method is introduced and compared with the least squares method for modeling nutrient loads in stream networks. The objective of the study is to better model spatial correlation in river basin hydrology and land use for improving the model as a forecasting tool. The Bayesian modeling approach is introduced in three steps, each with a more complicated model and data error structure. The approach is illustrated using a data set from three large river basins in eastern North Carolina. Results indicate that the Bayesian model better accounts for model and data uncertainties than does the conventional least squares approach. Applications of the Bayesian models for ambient water quality standards compliance and TMDL assessment are discussed. Copyright 2005 by the American Geophysical Union.
Bayesian latent structure models with space-time-dependent covariates.
Cai, Bo; Lawson, Andrew B; Hossain, Md Monir; Choi, Jungsoon
2012-04-01
Spatial-temporal data require flexible regression models that can accommodate the dependence of responses on space- and time-dependent covariates. In this paper, we describe a semiparametric space-time model from a Bayesian perspective. Nonlinear time dependence of covariates and the interactions among the covariates are constructed by local linear and piecewise linear models, allowing for more flexible orientation and position of the covariate plane by using time-varying basis functions. Space-varying covariate linkage coefficients are also incorporated to allow for the variation of space structures across the geographical location. The formulation accommodates uncertainty in the number and locations of the piecewise basis functions to characterize the global effects, spatially structured and unstructured random effects in relation to covariates. The proposed approach relies on variable selection-type mixture priors for uncertainty in the number and locations of basis functions and in the space-varying linkage coefficients. A simulation example is presented to evaluate the performance of the proposed approach with the competing models. A real data example is used for illustration.
Bayesian approach for flexible modeling of semicompeting risks data.
Han, Baoguang; Yu, Menggang; Dignam, James J; Rathouz, Paul J
2014-12-20
Semicompeting risks data arise when two types of events, non-terminal and terminal, are observed. When the terminal event occurs first, it censors the non-terminal event, but not vice versa. To account for possible dependent censoring of the non-terminal event by the terminal event and to improve prediction of the terminal event using the non-terminal event information, it is crucial to model their association properly. Motivated by a breast cancer clinical trial data analysis, we extend the well-known illness-death models to allow flexible random effects to capture heterogeneous association structures in the data. Our extension also represents a generalization of the popular shared frailty models that usually assume that the non-terminal event does not affect the hazards of the terminal event beyond a frailty term. We propose a unified Bayesian modeling approach that can utilize existing software packages for both model fitting and individual-specific event prediction. The approach is demonstrated via both simulation studies and a breast cancer data set analysis. PMID:25274445
A Bayesian Model of Category-Specific Emotional Brain Responses
Wager, Tor D.; Kang, Jian; Johnson, Timothy D.; Nichols, Thomas E.; Satpute, Ajay B.; Barrett, Lisa Feldman
2015-01-01
Understanding emotion is critical for a science of healthy and disordered brain function, but the neurophysiological basis of emotional experience is still poorly understood. We analyzed human brain activity patterns from 148 studies of emotion categories (2159 total participants) using a novel hierarchical Bayesian model. The model allowed us to classify which of five categories—fear, anger, disgust, sadness, or happiness—is engaged by a study with 66% accuracy (43-86% across categories). Analyses of the activity patterns encoded in the model revealed that each emotion category is associated with unique, prototypical patterns of activity across multiple brain systems including the cortex, thalamus, amygdala, and other structures. The results indicate that emotion categories are not contained within any one region or system, but are represented as configurations across multiple brain networks. The model provides a precise summary of the prototypical patterns for each emotion category, and demonstrates that a sufficient characterization of emotion categories relies on (a) differential patterns of involvement in neocortical systems that differ between humans and other species, and (b) distinctive patterns of cortical-subcortical interactions. Thus, these findings are incompatible with several contemporary theories of emotion, including those that emphasize emotion-dedicated brain systems and those that propose emotion is localized primarily in subcortical activity. They are consistent with componential and constructionist views, which propose that emotions are differentiated by a combination of perceptual, mnemonic, prospective, and motivational elements. Such brain-based models of emotion provide a foundation for new translational and clinical approaches. PMID:25853490
Ensemble Bayesian model averaging using Markov chain Monte Carlo sampling
Vrugt, Jasper A; Diks, Cees G H; Clark, Martyn P
2008-01-01
Bayesian model averaging (BMA) has recently been proposed as a statistical method to calibrate forecast ensembles from numerical weather models. Successful implementation of BMA, however, requires accurate estimates of the weights and variances of the individual competing models in the ensemble. In their seminal paper (Raftery et al., Mon Weather Rev 133:1155-1174, 2005), the authors recommended the Expectation-Maximization (EM) algorithm for BMA model training, even though global convergence of this algorithm cannot be guaranteed. In this paper, we compare the performance of the EM algorithm and the recently developed Differential Evolution Adaptive Metropolis (DREAM) Markov Chain Monte Carlo (MCMC) algorithm for estimating the BMA weights and variances. Simulation experiments using 48-hour ensemble data of surface temperature and multi-model stream-flow forecasts show that both methods produce similar results, and that their performance is unaffected by the length of the training data set. However, MCMC simulation with DREAM is capable of efficiently handling a wide variety of BMA predictive distributions, and provides useful information about the uncertainty associated with the estimated BMA weights and variances.
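The EM training step being compared above can be sketched for a two-member ensemble with normal component PDFs and a common predictive variance. The synthetic forecasts and observations below are invented for illustration; DREAM-based MCMC would replace the M-step point updates with posterior sampling over the weights and variance.

```python
import math
import random

random.seed(3)

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Synthetic training set: observations track member 0 closely, so BMA
# training should place most weight on that member.
n = 500
f0 = [random.uniform(10.0, 20.0) for _ in range(n)]      # member-0 forecasts
f1 = [x + random.uniform(-5.0, 5.0) for x in f0]         # a worse member
obs = [x + random.gauss(0.0, 1.0) for x in f0]

def em_bma(obs, forecasts, iters=200):
    K = len(forecasts)
    w = [1.0 / K] * K          # initial weights
    sd = 2.0                   # initial common predictive std
    for _ in range(iters):
        # E-step: responsibility of member k for observation t.
        z = [[w[k] * normal_pdf(obs[t], forecasts[k][t], sd) for k in range(K)]
             for t in range(len(obs))]
        z = [[v / sum(row) for v in row] for row in z]
        # M-step: update the weights and the common predictive variance.
        w = [sum(row[k] for row in z) / len(obs) for k in range(K)]
        var = sum(z[t][k] * (obs[t] - forecasts[k][t]) ** 2
                  for t in range(len(obs)) for k in range(K)) / len(obs)
        sd = math.sqrt(var)
    return w, sd

weights, sd = em_bma(obs, [f0, f1])
```

As the abstract notes, EM converges to a local optimum of the likelihood; it reports only point estimates of the weights, whereas MCMC also quantifies their uncertainty.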
Bayesian calibration of the Community Land Model using surrogates
Ray, Jaideep; Hou, Zhangshuan; Huang, Maoyi; Swiler, Laura Painton
2014-02-01
We present results from the Bayesian calibration of hydrological parameters of the Community Land Model (CLM), which is often used in climate simulations and Earth system models. A statistical inverse problem is formulated for three hydrological parameters, conditional on observations of latent heat surface fluxes over 48 months. Our calibration method uses polynomial and Gaussian process surrogates of the CLM, and solves the parameter estimation problem using a Markov chain Monte Carlo sampler. Posterior probability densities for the parameters are developed for two sites with different soil and vegetation covers. Our method also allows us to examine the structural error in CLM under two error models. We find that surrogate models can be created for CLM in most cases. The posterior distributions are more predictive than the default parameter values in CLM. Climatologically averaging the observations does not modify the parameters' distributions significantly. The structural error model reveals a correlation time-scale which can be used to identify the physical process that could be contributing to it. While the calibrated CLM has a higher predictive skill, the calibration is under-dispersive.
Bayesian Calibration of the Community Land Model using Surrogates
Ray, Jaideep; Hou, Zhangshuan; Huang, Maoyi; Sargsyan, K.; Swiler, Laura P.
2015-01-01
We present results from the Bayesian calibration of hydrological parameters of the Community Land Model (CLM), which is often used in climate simulations and Earth system models. A statistical inverse problem is formulated for three hydrological parameters, conditioned on observations of latent heat surface fluxes over 48 months. Our calibration method uses polynomial and Gaussian process surrogates of the CLM, and solves the parameter estimation problem using a Markov chain Monte Carlo sampler. Posterior probability densities for the parameters are developed for two sites with different soil and vegetation covers. Our method also allows us to examine the structural error in CLM under two error models. We find that accurate surrogate models can be created for CLM in most cases. The posterior distributions lead to better prediction than the default parameter values in CLM. Climatologically averaging the observations does not modify the parameters’ distributions significantly. The structural error model reveals a correlation time-scale which can potentially be used to identify physical processes that could be contributing to it. While the calibrated CLM has a higher predictive skill, the calibration is under-dispersive.
Application Bayesian Model Averaging method for ensemble system for Poland
NASA Astrophysics Data System (ADS)
Guzikowski, Jakub; Czerwinska, Agnieszka
2014-05-01
The aim of the project is to evaluate methods for generating numerical ensemble weather predictions using meteorological data from the Weather Research & Forecasting (WRF) Model and calibrating these data by means of the Bayesian Model Averaging (WRF BMA) approach. We construct high-resolution short-range ensemble forecasts using meteorological data (temperature) generated by nine WRF models. The WRF models have 35 vertical levels and 2.5 km x 2.5 km horizontal resolution. The main emphasis is that the ensemble members use different parameterizations of the physical phenomena occurring in the boundary layer. To calibrate the ensemble forecast we use the Bayesian Model Averaging (BMA) approach. The BMA predictive probability density function (PDF) is a weighted average of the predictive PDFs associated with each individual ensemble member, with weights that reflect the member's relative skill. As a test we chose a case with a heat wave and convective weather conditions over Poland from 23 July to 1 August 2013. From 23 July to 29 July 2013 the temperature oscillated below or above 30 degrees Celsius at many meteorological stations, and new temperature records were set. During this time a growth in the number of patients hospitalized with cardiovascular problems was registered. On 29 July 2013 an advection of moist tropical air masses was recorded over Poland, causing a strong convective event with a mesoscale convective system (MCS). The MCS caused local flooding, damaged transport infrastructure, destroyed buildings and trees, and led to injuries and a direct threat to life. A comparison of the meteorological data from the ensemble system with the data recorded at 74 weather stations located in Poland is made. We prepare a set of model-observation pairs. Then, the data obtained from single ensemble members and the median from the WRF BMA system are evaluated on the basis of the deterministic statistical errors Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). To evaluation
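The two deterministic verification scores named in the abstract are standard and compact; a minimal implementation, with invented station temperatures, is:

```python
import math

# Deterministic verification scores used to compare ensemble members and
# the BMA median against station observations.
def rmse(forecast, observed):
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed))
                     / len(observed))

def mae(forecast, observed):
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)

obs = [30.1, 31.4, 29.8, 33.0]   # hypothetical station temperatures (deg C)
fc = [29.5, 32.0, 30.0, 31.8]    # hypothetical calibrated median forecast

scores = (rmse(fc, obs), mae(fc, obs))
```

RMSE penalizes large errors more heavily than MAE, which is why the two scores are usually reported together when comparing forecast systems.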
Quantifying Uncertainty in Velocity Models using Bayesian Methods
NASA Astrophysics Data System (ADS)
Hobbs, R.; Caiado, C.; Majdański, M.
2008-12-01
Quantifying uncertainty in models derived from observed data is a major issue. Public and political understanding of uncertainty is poor, and for industry inadequate assessment of risk costs money. In this talk we will examine the geological structure of the subsurface; however, our principal exploration tool, controlled-source seismology, gives its data in time. Inversion tools exist to map these data into a depth model, but a full exploration of the uncertainty of the model is rarely done because robust strategies do not exist for large non-linear complex systems. There are two principal sources of uncertainty: the first comes from the input data, which are noisy and bandlimited; the second, and more sinister, is from the model parameterisation and forward algorithms themselves, which approximate the physics to make the problem tractable. To address these issues we propose a Bayesian approach. One philosophy is to estimate the uncertainty in a possible model derived using standard inversion tools. During the inversion stage we can use our geological prejudice to derive an acceptable model. Then we use a local random walk based on the Metropolis-Hastings algorithm to explore the model space immediately around a possible solution. For models with a limited number of parameters we can use the forward modeling step from the inversion code. However, as the number of parameters increases and/or the cost of the forward modeling step becomes significant, we need to use fast emulators to act as proxies so a sufficient number of iterations can be performed on which to base our statistical measures of uncertainty. In this presentation we show examples of uncertainty estimation using both pre- and post-critical seismic data. In particular, we will demonstrate uncertainty introduced by the approximation of the physics by using a tomographic inversion of bandlimited data, and show that uncertainty increases as the central frequency of the data decreases. This is consistent with the
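The "local random walk around a possible solution" can be sketched on a toy one-parameter problem: a single "velocity" with a known best-fit value, where the negative log-posterior plays the role of the travel-time misfit. The quadratic misfit, step size, and parameter values below are illustrative assumptions; a real application would call the (possibly emulated) forward model inside the loop.

```python
import math
import random

random.seed(11)

# Toy misfit surface with a single best-fit "velocity" v* = 3.0 km/s
# and posterior standard deviation 0.2 (both hypothetical).
def neg_log_post(v):
    return 0.5 * ((v - 3.0) / 0.2) ** 2

def metropolis(v0, steps=20000, step_sd=0.1):
    v, samples = v0, []
    for _ in range(steps):
        prop = v + random.gauss(0.0, step_sd)   # local random-walk proposal
        # Metropolis-Hastings acceptance for a symmetric proposal.
        if math.log(random.random()) < neg_log_post(v) - neg_log_post(prop):
            v = prop
        samples.append(v)
    return samples

chain = metropolis(v0=3.0)          # start from the inversion result
burned = chain[5000:]               # discard burn-in
mean_v = sum(burned) / len(burned)
sd_v = math.sqrt(sum((x - mean_v) ** 2 for x in burned) / len(burned))
```

The spread of the chain (here sd_v) is the statistical measure of uncertainty the abstract refers to; with an expensive forward model, neg_log_post would be evaluated through an emulator.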
Using Bayesian Stable Isotope Mixing Models to Enhance Marine Ecosystem Models
The use of stable isotopes in food web studies has proven to be a valuable tool for ecologists. We investigated the use of Bayesian stable isotope mixing models as constraints for an ecosystem model of a temperate seagrass system on the Atlantic coast of France. δ13C and δ15N i...
Technology Transfer Automated Retrieval System (TEKTRAN)
In this paper, the Genetic Algorithms (GA) and Bayesian model averaging (BMA) were combined to simultaneously conduct calibration and uncertainty analysis for the Soil and Water Assessment Tool (SWAT). In this hybrid method, several SWAT models with different structures are first selected; next GA i...
Forecasting unconventional resource productivity - A spatial Bayesian model
NASA Astrophysics Data System (ADS)
Montgomery, J.; O'sullivan, F.
2015-12-01
Today's low prices mean that unconventional oil and gas development requires ever greater efficiency and better development decision-making. Inter- and intra-field variability in well productivity, a major contemporary driver of uncertainty regarding resource size and its economics, is driven by factors including geological conditions, well and completion design (which companies vary as they seek to optimize their performance), and uncertainty about the nature of fracture propagation. Geological conditions are often not well understood early on in development campaigns, but critical assessments and decisions must nevertheless be made regarding the value of drilling an area and the placement of wells. In these situations, location provides a reasonable proxy for geology and "rock quality." We propose a spatial Bayesian model for forecasting acreage quality, which improves decision-making by leveraging available production data and provides a framework for statistically studying the influence of different parameters on well productivity. Our approach consists of subdividing a field into sections and forming prior distributions for productivity in each section based on knowledge about the overall field. Production data from wells are used to update these estimates in a Bayesian fashion, improving model accuracy far more rapidly and with less sensitivity to outliers than a model that simply establishes an "average" productivity in each section. Additionally, forecasts using this model capture the importance of uncertainty, whether due to a lack of information or for areas that demonstrate greater geological risk. We demonstrate the forecasting utility of this method using public data and also provide examples of how information from this model can be combined with knowledge about a field's geology or changes in technology to better quantify development risk. This approach represents an important shift in the way that production data is used to guide
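The section-level updating scheme described above can be sketched with the simplest conjugate case: a normal prior on a section's mean productivity, normal well-to-well noise with known variance, and a closed-form posterior update as wells come in. The distributional choices, units, and numbers are illustrative assumptions, not the paper's model.

```python
import random

random.seed(5)

# Field-wide prior on mean well productivity for one section
# (hypothetical units, e.g. boe/day).
prior_mean, prior_var = 100.0, 400.0
obs_var = 900.0                      # assumed well-to-well production noise

def update(prior_mean, prior_var, wells):
    # Conjugate normal update for the section mean given observed wells.
    n = len(wells)
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + sum(wells) / obs_var)
    return post_mean, post_var

# A section that turns out better than the field-wide prior suggests.
section_wells = [random.gauss(140.0, 30.0) for _ in range(25)]
post_mean, post_var = update(prior_mean, prior_var, section_wells)
```

With few wells the posterior stays close to the field-wide prior (capturing geological risk); as data accumulate, the section estimate moves toward the local evidence and its variance shrinks, which is the behavior the abstract contrasts with a simple per-section average.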
Collective opinion formation model under Bayesian updating and confirmation bias
NASA Astrophysics Data System (ADS)
Nishi, Ryosuke; Masuda, Naoki
2013-06-01
We propose a collective opinion formation model with a so-called confirmation bias. The confirmation bias is a psychological effect whereby, in the context of opinion formation, an individual in favor of an opinion is prone to misperceive new incoming information as supporting the individual's current belief. Our model modifies a Bayesian decision-making model for single individuals [M. Rabin and J. L. Schrag, Q. J. Econ. 114, 37 (1999)] for the case of a well-mixed population of interacting individuals in the absence of external input. We numerically simulate the model to show that all the agents eventually agree on one of the two opinions only when the confirmation bias is weak. Otherwise, the stochastic population dynamics ends up creating a disagreement configuration (also called polarization), particularly for large system sizes. A strong confirmation bias allows various final disagreement configurations with different fractions of the individuals in favor of the opposite opinions.
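A minimal single-agent sketch of the misperception mechanism, under an assumed parameterization loosely following Rabin and Schrag (signal accuracy q, bias strength p; this is not the authors' exact population model):

```python
import random

# Sketch of the confirmation-bias signal process (assumed parameterization):
# q = probability a signal favors the true state; p = probability the agent
# misreads evidence that contradicts its current lean as confirming it.
def perceived_signals(n, q=0.7, p=0.5, true_state=1, seed=0):
    rng = random.Random(seed)
    count = 0            # running net count of perceived +1 signals
    history = []
    for _ in range(n):
        s = 1 if rng.random() < q else -1    # signal favors the true state
        s = s if true_state == 1 else -s
        lean = 1 if count >= 0 else -1       # agent's current opinion
        if s != lean and rng.random() < p:
            s = lean                         # contrary evidence misperceived
        count += s
        history.append(count)
    return history

h = perceived_signals(200)
```

With p near 1, an early accidental lean can lock in regardless of the truth, which is the single-agent analogue of the polarization the population model exhibits.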
A Bayesian Model for the Analysis of Transgenerational Epigenetic Variation
Varona, Luis; Munilla, Sebastián; Mouresan, Elena Flavia; González-Rodríguez, Aldemar; Moreno, Carlos; Altarriba, Juan
2015-01-01
Epigenetics has become one of the major areas of biological research. However, the degree of phenotypic variability that is explained by epigenetic processes remains unclear. From a quantitative genetics perspective, the estimation of variance components is achieved by means of the information provided by the resemblance between relatives. In a previous study, this resemblance was described as a function of the epigenetic variance component and a reset coefficient that indicates the rate of dissipation of epigenetic marks across generations. Given these assumptions, we propose a Bayesian mixed model methodology that allows the estimation of epigenetic variance from a genealogical and phenotypic database. The methodology is based on the development of a T matrix of epigenetic relationships that depends on the reset coefficient. In addition, we present a simple procedure for the calculation of the inverse of this matrix (T−1) and a Gibbs sampler algorithm that obtains posterior estimates of all the unknowns in the model. The new procedure was used with two simulated data sets and with a beef cattle database. In the simulated populations, the results of the analysis provided marginal posterior distributions that included the population parameters in the regions of highest posterior density. In the case of the beef cattle dataset, the posterior estimate of transgenerational epigenetic variability was very low and a model comparison test indicated that a model that did not include it was the most plausible. PMID:25617408
Improving default risk prediction using Bayesian model uncertainty techniques.
Kazemi, Reza; Mosleh, Ali
2012-11-01
Credit risk is the potential exposure of a creditor to an obligor's failure or refusal to repay a debt in principal or interest. This potential exposure is measured in terms of the probability of default. Many models have been developed to estimate credit risk, with rating agencies dating back to the 19th century; they provide their assessments of the probability of default and the transition probabilities of various firms in their annual reports. Regulatory capital requirements for credit risk outlined by the Basel Committee on Banking Supervision have made it essential for banks and financial institutions to develop sophisticated models in an attempt to measure credit risk with higher accuracy. The Bayesian framework proposed in this article uses techniques developed in the physical sciences and engineering for dealing with model uncertainty and expert accuracy to obtain improved estimates of credit risk and the associated uncertainties. The approach uses estimates from one or more rating agencies and incorporates their historical accuracy (past performance data) in estimating future default risk and transition probabilities. Several examples demonstrate that the proposed methodology can assess default probability with accuracy exceeding that of all the individual models. Moreover, the methodology accounts for potentially significant departures from "nominal predictions" due to "upsetting events" such as the 2008 global banking crisis. PMID:23163724
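One simple way to combine several agencies' estimates with a notion of historical accuracy is log-odds pooling; the sketch below is purely illustrative and is not the article's actual Bayesian machinery (the reliability weights and estimates are invented):

```python
import math

# Illustrative log-odds pooling sketch (NOT the article's methodology): each
# agency's default-probability estimate is weighted by an invented 0-1
# reliability score standing in for its historical accuracy.
def pooled_default_prob(estimates, reliabilities):
    logit = lambda p: math.log(p / (1.0 - p))
    z = sum(w * logit(p) for p, w in zip(estimates, reliabilities))
    z /= sum(reliabilities)           # reliability-weighted mean log-odds
    return 1.0 / (1.0 + math.exp(-z))

# Three hypothetical agencies; the most reliable one pulls the pooled
# estimate toward its own assessment.
pd = pooled_default_prob([0.02, 0.05, 0.03], [0.9, 0.5, 0.7])
```

Pooling identical estimates returns that estimate unchanged, which is a basic sanity property of this kind of combination rule.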
NASA Astrophysics Data System (ADS)
Iskandar, Ismed; Satria Gondokaryono, Yudi
2016-02-01
In reliability theory, the central problem is to determine the reliability of a complex system from the reliability of its components. A weakness of most reliability theories is that systems are described and explained simply as functioning or failed. In many real situations, failures may arise from many causes, depending upon the age and the environment of the system and its components. Another problem in reliability theory is estimating the parameters of the assumed failure models. The estimation may be based on data collected over censored or uncensored life tests. In many reliability problems, the failure data are simply quantitatively inadequate, especially in engineering design and maintenance systems. Bayesian analyses are more beneficial than classical ones in such cases. Bayesian estimation allows us to combine past knowledge or experience, in the form of an a priori distribution, with life test data to make inferences about the parameters of interest. In this paper, we investigate the application of Bayesian estimation to competing risk systems. The cases are limited to models with independent causes of failure, using the Weibull distribution as our model. A simulation is conducted for this distribution with the objectives of verifying the models and the estimators and investigating the performance of the estimators for varying sample sizes. The simulation data are analyzed using Bayesian and maximum likelihood analyses. The simulation results show that changing the true value of one parameter relative to another changes the standard deviation in the opposite direction. Given perfect information on the prior distribution, the Bayesian estimates are better than those of maximum likelihood. The sensitivity analyses show some sensitivity to shifts in the prior locations. They also show the robustness of the Bayesian analysis within the range
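A sketch of the simulate-then-estimate setup for independent Weibull competing risks: we observe only the earlier failure time and which cause produced it, then compute a coarse grid posterior (flat prior) for the scale of cause 1. Shapes are assumed known, and every parameter value is invented for the illustration:

```python
import math
import random

# Two independent Weibull failure causes; record (min failure time, cause).
def simulate(n, lam1=1.0, lam2=2.0, k1=1.5, k2=1.0, seed=1):
    rng = random.Random(seed)
    draw = lambda lam, k: lam * (-math.log(rng.random())) ** (1.0 / k)
    data = []
    for _ in range(n):
        t1, t2 = draw(lam1, k1), draw(lam2, k2)
        data.append((min(t1, t2), 1 if t1 < t2 else 2))
    return data

def grid_posterior(data, k1=1.5):
    """Grid posterior for the cause-1 scale: hazard term when cause 1 is
    observed, survival term always (the cause-2 factors cancel)."""
    grid = [0.5 + 0.1 * i for i in range(30)]
    def loglik(lam):
        ll = 0.0
        for t, c in data:
            ll -= (t / lam) ** k1                   # survival of cause 1
            if c == 1:                              # hazard of cause 1
                ll += math.log(k1 / lam) + (k1 - 1.0) * math.log(t / lam)
        return ll
    lls = [loglik(g) for g in grid]
    top = max(lls)
    w = [math.exp(x - top) for x in lls]
    s = sum(w)
    return grid, [x / s for x in w]

data = simulate(300)
grid, post = grid_posterior(data)
est = sum(g * p for g, p in zip(grid, post))        # posterior mean of lam1
```

With 300 simulated observations the posterior concentrates near the true scale of 1.0, which is the kind of check the abstract's simulation study performs at varying sample sizes.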
NASA Astrophysics Data System (ADS)
Namysłowska-Wilczyńska, Barbara
2013-03-01
The paper presents the first stage of research on a geostatistical hydrogeochemical 3D model dedicated to the horizontal and vertical spatial and time variation in the topographical, hydrological and quality parameters of underground water in the Kłodzko water intake area. The research covers the period 1977-2012. For this purpose various thematic databases, containing original data on coordinates X, Y (latitude and longitude) and Z (terrain elevation and time - years) and on regionalized variables, i.e., the underground water quality parameters in the Kłodzko water intake area determined for different analytical configurations (22 wells, 14 wells, 14 wells + 3 piezometers), were created. The data were subjected to spatial analyses using statistical methods. The input for the studies was the chemical determination of the quality parameters of underground water samples taken from the wells in the water intake area in different periods of time. Both archival data (acquired in the years 1977-1999, 1977-2011) and the latest data (collected in November 2011 and in January 2012) were analyzed. First, the underground water intake area with 22 wells was investigated. Then in order to assess the current quality of the underground water, 14 wells out of the 22 wells were selected for further chemical analyses and a collection siphon wall was included. Recently, three new piezometers were installed in the water intake area and so new water samples were taken, whereby the databases were supplemented with new chemical determinations. The variation in the topographical parameter (terrain elevation) and in the hydrogeological parameters: water abstraction level Z (with and without the land layout being taken into account) and the depth of occurrence of the water table, was examined. Subsequently, the variation in quality parameters was studied on the basis of data coming from 22 wells, then 14 wells and finally from 14 wells and 3 piezometers. The variation in: Fe, Mn, ammonium
Context-dependent decision-making: a simple Bayesian model.
Lloyd, Kevin; Leslie, David S
2013-05-01
Many phenomena in animal learning can be explained by a context-learning process whereby an animal learns about different patterns of relationship between environmental variables. Differentiating between such environmental regimes or 'contexts' allows an animal to rapidly adapt its behaviour when context changes occur. The current work views animals as making sequential inferences about current context identity in a world assumed to be relatively stable but also capable of rapid switches to previously observed or entirely new contexts. We describe a novel decision-making model in which contexts are assumed to follow a Chinese restaurant process with inertia and full Bayesian inference is approximated by a sequential-sampling scheme in which only a single hypothesis about current context is maintained. Actions are selected via Thompson sampling, allowing uncertainty in parameters to drive exploration in a straightforward manner. The model is tested on simple two-alternative choice problems with switching reinforcement schedules and the results compared with rat behavioural data from a number of T-maze studies. The model successfully replicates a number of important behavioural effects: spontaneous recovery, the effect of partial reinforcement on extinction and reversal, the overtraining reversal effect, and serial reversal-learning effects.
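Thompson sampling itself, stripped of the context-inference machinery, can be sketched on a two-alternative choice problem with Beta posteriors; the reward probabilities below are invented, and the Chinese-restaurant-process context model is omitted:

```python
import random

# Minimal Thompson sampling: two arms, Beta posteriors over reward
# probabilities; posterior uncertainty drives exploration directly.
def thompson_run(p_true, n_trials, seed=0):
    rng = random.Random(seed)
    a = [1.0, 1.0]
    b = [1.0, 1.0]                             # Beta(1, 1) priors per arm
    choices = []
    for _ in range(n_trials):
        draws = [rng.betavariate(a[i], b[i]) for i in (0, 1)]
        arm = 0 if draws[0] >= draws[1] else 1  # act on the posterior sample
        reward = 1 if rng.random() < p_true[arm] else 0
        a[arm] += reward
        b[arm] += 1 - reward
        choices.append(arm)
    return choices

choices = thompson_run([0.8, 0.2], 500)
```

On a switching reinforcement schedule like those in the T-maze studies, the counts would be reset or reweighted when a context change is inferred; here the schedule is static, so choices converge to the better arm.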
Using Bayesian Networks to Model Hierarchical Relationships in Epidemiological Studies
2011-01-01
OBJECTIVES To propose an alternative procedure, based on a Bayesian network (BN), for estimation and prediction, and to discuss its usefulness for taking into account the hierarchical relationships among covariates. METHODS The procedure is illustrated by modeling the risk of diarrhea infection for 2,740 children aged 0 to 59 months in Cameroon. We compare the procedure with a standard logistic regression and with a model based on multi-level logistic regression. RESULTS The standard logistic regression approach is inadequate, or at least incomplete, in that it does not attempt to account for potentially causal relationships between risk factors. The multi-level logistic regression does model the hierarchical structure, but does so in a piecewise manner; the resulting estimates and interpretations differ from those of the BN approach proposed here. An advantage of the BN approach is that it enables one to determine the probability that a risk factor (and/or the outcome) is in any specific state, given the states of the others. The currently available approaches can only predict the outcome (disease), given the states of the covariates. CONCLUSION A major advantage of BNs is that they can deal with more complex interrelationships between variables, whereas competing approaches deal at best only with hierarchical ones. We propose that BNs also be considered a worthwhile method for summarizing the data in epidemiological studies whose aim is understanding the determinants of diseases and quantifying their effects. PMID:21779534
Bayesian network model of crowd emotion and negative behavior
NASA Astrophysics Data System (ADS)
Ramli, Nurulhuda; Ghani, Noraida Abdul; Hatta, Zulkarnain Ahmad; Hashim, Intan Hashimah Mohd; Sulong, Jasni; Mahudin, Nor Diana Mohd; Rahman, Shukran Abd; Saad, Zarina Mat
2014-12-01
The effects of overcrowding have become a major concern for event organizers. One aspect of this concern is the idea that overcrowding can increase the occurrence of serious incidents during events. As one of the largest Muslim religious gatherings, attended by pilgrims from all over the world, Hajj has become extremely overcrowded, with many incidents being reported. The purpose of this study is to analyze the nature of human emotion and negative behavior resulting from overcrowding during Hajj events, using data gathered in the Malaysian Hajj Experience Survey in 2013. The sample comprised 147 Malaysian pilgrims (70 males and 77 females). Utilizing a probabilistic model called a Bayesian network, this paper models the dependence structure between different emotions and negative behaviors of pilgrims in the crowd. The model included five emotion variables (negative, negative comfortable, positive, positive comfortable, and positive spiritual) and two negative-behavior variables (aggressive and hazardous acts). The study demonstrated that the negative, negative comfortable, positive spiritual, and positive emotions have a direct influence on aggressive behavior, whereas the negative comfortable, positive spiritual, and positive emotions have a direct influence on hazardous acts. The sensitivity analysis showed that low levels of negative and negative comfortable emotions lead to lower levels of aggressive and hazardous behavior. The findings of the study can be further developed to identify the exact causes and risk factors of crowd-related incidents, helping to prevent crowd disasters during mass gathering events.
A Bayesian Semiparametric Model for Radiation Dose-Response Estimation.
Furukawa, Kyoji; Misumi, Munechika; Cologne, John B; Cullings, Harry M
2016-06-01
In evaluating the risk of exposure to health hazards, characterizing the dose-response relationship and estimating acceptable exposure levels are the primary goals. In analyses of health risks associated with exposure to ionizing radiation, while there is a clear agreement that moderate to high radiation doses cause harmful effects in humans, little has been known about the possible biological effects at low doses, for example, below 0.1 Gy, which is the dose range relevant to most radiation exposures of concern today. A conventional approach to radiation dose-response estimation based on simple parametric forms, such as the linear nonthreshold model, can be misleading in evaluating the risk and, in particular, its uncertainty at low doses. As an alternative approach, we consider a Bayesian semiparametric model that has a connected piece-wise-linear dose-response function with prior distributions having an autoregressive structure among the random slope coefficients defined over closely spaced dose categories. With a simulation study and application to analysis of cancer incidence data among Japanese atomic bomb survivors, we show that this approach can produce smooth and flexible dose-response estimation while reasonably handling the risk uncertainty at low doses and elsewhere. With relatively few assumptions and modeling options to be made by the analyst, the method can be particularly useful in assessing risks associated with low-dose radiation exposures. PMID:26581473
Ball, R D
2001-01-01
We describe an approximate method for the analysis of quantitative trait loci (QTL) based on model selection from multiple regression models with trait values regressed on marker genotypes, using a modification of the easily calculated Bayesian information criterion to estimate the posterior probability of models with various subsets of markers as variables. The BIC-delta criterion, with the parameter delta increasing the penalty for additional variables in a model, is further modified to incorporate prior information, and missing values are handled by multiple imputation. Marginal probabilities for model sizes are calculated, and the posterior probability of nonzero model size is interpreted as the posterior probability of existence of a QTL linked to one or more markers. The method is demonstrated on analysis of associations between wood density and markers on two linkage groups in Pinus radiata. Selection bias, which is the bias that results from using the same data to both select the variables in a model and estimate the coefficients, is shown to be a problem for commonly used non-Bayesian methods for QTL mapping, which do not average over alternative possible models that are consistent with the data. PMID:11729175
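The BIC-based posterior model probabilities can be sketched as follows, using plain BIC rather than the paper's BIC-delta modification; the candidate marker subsets and residual sums of squares are invented for illustration:

```python
import math

# Posterior model probabilities from BIC for candidate marker subsets:
# P(M | data) is approximated by exp(-BIC_M / 2), normalized over models.
def bic(rss, n, n_params):
    return n * math.log(rss / n) + n_params * math.log(n)

n = 100
rss_by_model = {              # marker subset -> residual sum of squares
    (): 250.0,                # null model: no markers
    ("m1",): 180.0,
    ("m2",): 240.0,
    ("m1", "m2"): 178.0,
}
bics = {m: bic(rss, n, len(m) + 1) for m, rss in rss_by_model.items()}
best = min(bics.values())
weights = {m: math.exp(-(b - best) / 2.0) for m, b in bics.items()}
total = sum(weights.values())
probs = {m: w / total for m, w in weights.items()}
p_qtl = sum(p for m, p in probs.items() if m)  # P(model size > 0 | data)
```

The probability of a nonzero model size plays the role of the posterior probability that a QTL is linked to one or more of the markers, and averaging over models rather than picking one avoids the selection bias the abstract describes.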
NASA Astrophysics Data System (ADS)
Bellin, A.; Firmani, G.; Fiori, A.
2005-12-01
We analyze, by means of a numerical model, flow toward a pumping well in a confined three-dimensional heterogeneous aquifer. In order to model hydraulic property variations and the associated uncertainty, the log-conductivity field Y = ln K, where K is the hydraulic conductivity, is modelled as a stationary Random Space Function (RSF), normally distributed with constant mean and variance σ_Y², and an exponential axisymmetric covariance function, which identifies the geostatistical model of variability. First, we analyze how the boundary condition at the pumping (extraction) well influences the flow field. Specifically, we show that a specific water discharge through the well's envelope proportional to the local hydraulic conductivity is the condition that best approximates the flow field obtained by imposing a constant head along the well. The latter is the condition that best represents the experimental setup typically employed in pumping tests. Another result of our analysis is that the difference between the drawdown at a fully penetrating monitoring well and the ergodic solution provided by Indelman et al. (1996), which coincides with Thiem's solution, reduces as the depth of the aquifer increases, becoming negligible as the depth grows larger than 60 vertical integral scales of the hydraulic log-conductivity. With these results in mind, we envision a simple-to-apply procedure for obtaining the parameters of the geostatistical model of spatial variability. The procedure is based on fitting the expression for the equivalent hydraulic conductivity proposed by Indelman et al. (1996) to the experimental values obtained by interpreting the measured drawdown at a few wells with Thiem's solution. If the vertical integral scale is known independently of the pumping test, the fitting procedure leads to a robust calculation of the parameters, although the horizontal integral scale is adversely affected by a wide confidence interval.
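The fitting step rests on Thiem's steady-state solution; a minimal sketch of inverting a drawdown observation for an equivalent conductivity follows (all numerical values are invented for illustration):

```python
import math

# Thiem's steady-state solution gives drawdown
#   s = Q * ln(R / r) / (2 * pi * K * B),
# so each monitoring-well drawdown can be inverted for an equivalent K.
def k_equivalent(Q, B, s, r, R):
    """Q: pumping rate [m^3/s], B: aquifer thickness [m], s: drawdown [m],
    r: distance from pumping well [m], R: radius of influence [m]."""
    return Q * math.log(R / r) / (2.0 * math.pi * B * s)

K_eq = k_equivalent(Q=0.01, B=10.0, s=0.35, r=15.0, R=200.0)
```

Repeating this at several wells yields the set of equivalent-conductivity values to which the Indelman et al. (1996) expression is then fitted.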
A Flexible Bayesian Model for Testing for Transmission Ratio Distortion
Casellas, Joaquim; Manunza, Arianna; Mercader, Anna; Quintanilla, Raquel; Amills, Marcel
2014-01-01
Current statistical approaches to investigate the nature and magnitude of transmission ratio distortion (TRD) are scarce and restricted to the most common experimental designs such as F2 populations and backcrosses. In this article, we describe a new Bayesian approach to check TRD within a given biallelic genetic marker in a diploid species, providing a highly flexible framework that can accommodate any kind of population structure. This model relies on the genotype of each offspring and thus integrates all available information from either the parents’ genotypes or population-specific allele frequencies and yields TRD estimates that can be corroborated by the calculation of a Bayes factor (BF). This approach has been evaluated on simulated data sets with appealing statistical performance. As a proof of concept, we have also tested TRD in a porcine population with five half-sib families and 352 offspring. All boars and piglets were genotyped with the Porcine SNP60 BeadChip, whereas genotypes from the sows were not available. The SNP-by-SNP screening of the pig genome revealed 84 SNPs with decisive evidences of TRD (BF > 100) after accounting for multiple testing. Many of these regions contained genes related to biological processes (e.g., nucleosome assembly and co-organization, DNA conformation and packaging, and DNA complex assembly) that are critically associated with embryonic viability. The implementation of this method, which overcomes many of the limitations of previous approaches, should contribute to fostering research on TRD in both model and nonmodel organisms. PMID:25271302
A Bayesian model of context-sensitive value attribution
Rigoli, Francesco; Friston, Karl J; Martinelli, Cristina; Selaković, Mirjana; Shergill, Sukhwinder S; Dolan, Raymond J
2016-01-01
Substantial evidence indicates that incentive value depends on an anticipation of rewards within a given context. However, the computations underlying this context sensitivity remain unknown. To address this question, we introduce a normative (Bayesian) account of how rewards map to incentive values. This assumes that the brain inverts a model of how rewards are generated. Key features of our account include (i) an influence of prior beliefs about the context in which rewards are delivered (weighted by their reliability in a Bayes-optimal fashion), (ii) the notion that incentive values correspond to precision-weighted prediction errors, and (iii) contextual information unfolding at different hierarchical levels. This formulation implies that incentive value is intrinsically context-dependent. We provide empirical support for this model by showing that incentive value is influenced by context variability and by hierarchically nested contexts. The perspective we introduce generates new empirical predictions that might help explain psychopathologies, such as addiction. DOI: http://dx.doi.org/10.7554/eLife.16127.001 PMID:27328323
Point source moment tensor inversion through a Bayesian hierarchical model
NASA Astrophysics Data System (ADS)
Mustać, Marija; Tkalčić, Hrvoje
2016-01-01
Characterization of seismic sources is an important aspect of seismology. Parameter uncertainties in such inversions are essential for estimating solution robustness, but are rarely available. We have developed a non-linear moment tensor inversion method in a probabilistic Bayesian framework that also accounts for noise in the data. The method is designed for point source inversion using waveform data of moderate-size earthquakes and explosions at regional distances. This probabilistic approach results in an ensemble of models, whose density is proportional to parameter probability distribution and quantifies parameter uncertainties. Furthermore, we invert for noise in the data, allowing it to determine the model complexity. We implement an empirical noise covariance matrix that accounts for interdependence of observational errors present in waveform data. After we demonstrate the feasibility of the approach on synthetic data, we apply it to a Long Valley Caldera, CA, earthquake with a well-documented anomalous (non-double-couple) radiation from previous studies. We confirm a statistically significant isotropic component in the source without a trade-off with the compensated linear vector dipoles component.
Bayesian mixture models for source separation in MEG
NASA Astrophysics Data System (ADS)
Calvetti, Daniela; Homa, Laura; Somersalo, Erkki
2011-11-01
This paper discusses the problem of imaging electromagnetic brain activity from measurements of the induced magnetic field outside the head. This imaging modality, magnetoencephalography (MEG), is known to be severely ill posed, and in order to obtain useful estimates for the activity map, complementary information needs to be used to regularize the problem. In this paper, a particular emphasis is on finding non-superficial focal sources that induce a magnetic field that may be confused with noise due to external sources and with distributed brain noise. The data are assumed to come from a mixture of a focal source and a spatially distributed, possibly virtual, source; hence, to differentiate between those two components, the problem is solved within a Bayesian framework, with a mixture model prior encoding the information that different sources may be concurrently active. The mixture model prior combines one density that favors strongly focal sources and another that favors spatially distributed sources, interpreted as clutter in the source estimation. Furthermore, to address the challenge of localizing deep focal sources, a novel depth sounding algorithm is suggested, and it is shown with simulated data that the method is able to distinguish between a signal arising from a deep focal source and a clutter signal.
Bayesian image reconstruction - The pixon and optimal image modeling
NASA Technical Reports Server (NTRS)
Pina, R. K.; Puetter, R. C.
1993-01-01
In this paper we describe the optimal image model, maximum residual likelihood method (OptMRL) for image reconstruction. OptMRL is a Bayesian image reconstruction technique for removing point-spread function blurring. OptMRL uses both a goodness-of-fit criterion (GOF) and an 'image prior', i.e., a function which quantifies the a priori probability of the image. Unlike standard maximum entropy methods, which typically reconstruct the image on the data pixel grid, OptMRL varies the image model in order to find the optimal functional basis with which to represent the image. We show how an optimal basis for image representation can be selected and in doing so, develop the concept of the 'pixon' which is a generalized image cell from which this basis is constructed. By allowing both the image and the image representation to be variable, the OptMRL method greatly increases the volume of solution space over which the image is optimized. Hence the likelihood of the final reconstructed image is greatly increased. For the goodness-of-fit criterion, OptMRL uses the maximum residual likelihood probability distribution introduced previously by Pina and Puetter (1992). This GOF probability distribution, which is based on the spatial autocorrelation of the residuals, has the advantage that it ensures spatially uncorrelated image reconstruction residuals.
Emulation Modeling with Bayesian Networks for Efficient Decision Support
NASA Astrophysics Data System (ADS)
Fienen, M. N.; Masterson, J.; Plant, N. G.; Gutierrez, B. T.; Thieler, E. R.
2012-12-01
Bayesian decision networks (BDN) have long been used to provide decision support in systems that require explicit consideration of uncertainty; applications range from ecology to medical diagnostics and terrorism threat assessments. Until recently, however, few studies have applied BDNs to the study of groundwater systems. BDNs are particularly useful for representing real-world system variability by synthesizing a range of hydrogeologic situations within a single simulation. Because BDN output is cast in terms of probability (an output desired by decision makers), they explicitly incorporate the uncertainty of a system. BDNs can thus serve as a more efficient alternative to other uncertainty characterization methods, such as computationally demanding Monte Carlo analyses and other methods restricted to linear model analyses. We present a unique application of a BDN to a groundwater modeling analysis of the hydrologic response of Assateague Island, Maryland, to sea-level rise. Using both input and output variables of the modeled groundwater response to different sea-level rise (SLR) scenarios, the BDN predicts the probability of changes in the depth to fresh water, which exerts an important influence on the physical and biological evolution of the island. Input variables include barrier-island width, maximum island elevation, and aquifer recharge. The variability of these inputs and their corresponding outputs is sampled along cross sections in a single model run to form an ensemble of input/output pairs. The BDN outputs, which are the posterior distributions of water table conditions for the sea-level rise scenarios, are evaluated through error analysis and cross-validation to assess both fit to training data and predictive power. The key benefit of using BDNs in groundwater modeling analyses is that they provide a method for distilling complex model results into predictions with associated uncertainty, which is useful to decision makers. Future efforts incorporate
Modeling hypoxia in the Chesapeake Bay: Ensemble estimation using a Bayesian hierarchical model
NASA Astrophysics Data System (ADS)
Stow, Craig A.; Scavia, Donald
2009-02-01
Quantifying parameter and prediction uncertainty in a rigorous framework can be an important component of model skill assessment. Generally, models with lower uncertainty will be more useful for prediction and inference than models with higher uncertainty. Ensemble estimation, an idea with deep roots in the Bayesian literature, can be useful to reduce model uncertainty. It is based on the idea that simultaneously estimating common or similar parameters among models can result in more precise estimates. We demonstrate this approach using the Streeter-Phelps dissolved oxygen sag model fit to 29 years of data from Chesapeake Bay. Chesapeake Bay has a long history of bottom water hypoxia and several models are being used to assist management decision-making in this system. The Bayesian framework is particularly useful in a decision context because it can combine both expert-judgment and rigorous parameter estimation to yield model forecasts and a probabilistic estimate of the forecast uncertainty.
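The Streeter-Phelps sag model the authors fit has a classical closed form; a sketch with invented rate constants and initial conditions, including the critical-deficit time, follows:

```python
import math

# Classical Streeter-Phelps oxygen-sag solution: dissolved-oxygen deficit
#   D(t) = kd*L0/(ka-kd) * (exp(-kd*t) - exp(-ka*t)) + D0*exp(-ka*t).
# All parameter values below are invented for illustration.
def deficit(t, L0, D0, kd, ka):
    return (kd * L0 / (ka - kd)) * (math.exp(-kd * t) - math.exp(-ka * t)) \
        + D0 * math.exp(-ka * t)

kd, ka = 0.3, 0.6     # deoxygenation and reaeration rates [1/day]
L0, D0 = 10.0, 1.0    # initial BOD and initial deficit [mg/L]

# Time of maximum (critical) deficit, from setting dD/dt = 0:
tc = math.log((ka / kd) * (1.0 - D0 * (ka - kd) / (kd * L0))) / (ka - kd)
Dc = deficit(tc, L0, D0, kd, ka)
```

In the paper's setting the parameters of this curve are estimated jointly across years in a Bayesian hierarchy, so the point here is only the deterministic kernel being fit.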
Elsheikh, Ahmed H.; Wheeler, Mary F.; Hoteit, Ibrahim
2014-02-01
A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling, and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and for estimating the Bayesian evidence for prior model selection, and it has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed; for this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM requires only forward model runs: the simulator is used as a black box, and no adjoint code is needed. The developed HNS algorithm is successfully applied to Bayesian calibration and prior model selection for several nonlinear subsurface flow problems.
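The nested sampling core, shrinking the prior mass while accumulating evidence from the worst live point, can be illustrated on a toy 1-D problem. Here simple rejection sampling stands in for the HMC constrained step the paper uses, and the Gaussian likelihood and all numbers are invented for illustration:

```python
import math
import random

def log_like(x, mu=0.5, sigma=0.05):
    """Unnormalised Gaussian log-likelihood on the unit interval."""
    return -(x - mu) ** 2 / (2 * sigma ** 2)

def nested_sampling(n_live=50, seed=1):
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]   # draws from a U(0,1) prior
    logL = [log_like(x) for x in live]
    Z, X_prev = 0.0, 1.0
    for i in range(1, 3000):
        worst = min(range(n_live), key=lambda k: logL[k])
        L_min = logL[worst]
        X = math.exp(-i / n_live)                  # expected prior mass left
        Z += math.exp(L_min) * (X_prev - X)        # accumulate evidence
        X_prev = X
        # Constrained replacement: rejection sampling from the prior
        # (the paper substitutes HMC here for expensive simulators).
        while True:
            x = rng.random()
            if log_like(x) > L_min:
                break
        live[worst], logL[worst] = x, log_like(x)
        if math.exp(max(logL)) * X < 1e-3 * Z:     # negligible mass remains
            break
    Z += X * sum(math.exp(l) for l in logL) / n_live  # live-point remainder
    return Z

evidence = nested_sampling()  # analytic value is about 0.125 for this toy case
```

The evidence estimate supports prior model selection: running the same loop under two competing priors or likelihoods and comparing the two Z values gives a Bayes factor.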
Comparing models for perfluorooctanoic acid pharmacokinetics using Bayesian analysis.
Wambaugh, John F; Barton, Hugh A; Setzer, R Woodrow
2008-12-01
Selecting the appropriate pharmacokinetic (PK) model given the available data is investigated for perfluorooctanoic acid (PFOA), which has been widely analyzed with an empirical, one-compartment model. This research examined the results of experiments [Kemper R. A., DuPont Haskell Laboratories, USEPA Administrative Record AR-226.1499 (2003)] that administered single oral or iv doses of PFOA to adult male and female rats. PFOA concentration was observed over time: in plasma for some animals and in fecal and urinary excretion for others. There were four rats per dose group, for a total of 36 males and 36 females. Assuming that the PK parameters for each individual within a gender were drawn from the same, biologically varying population, plasma and excretion data were jointly analyzed using a hierarchical framework to separate uncertainty due to measurement error from actual biological variability. Bayesian analysis using Markov chain Monte Carlo (MCMC) provides tools to perform such an analysis, as well as quantitative diagnostics to evaluate and discriminate between models. Starting from a one-compartment PK model with separate clearances to urine and feces, the model was incrementally expanded, using Bayesian measures to assess whether each expansion was supported by the data. PFOA excretion is sexually dimorphic in rats; male rats have bi-phasic elimination that is roughly 40 times slower than that of the females, which appear to have a single elimination phase. The male and female data were analyzed separately, keeping only the parameters describing the measurement process in common. For male rats, including excretion data initially decreased certainty in the one-compartment parameter estimates compared to an analysis using plasma data only. Allowing a third, unspecified clearance improved agreement and increased certainty when all the data were used; however, a significant amount of eliminated PFOA was estimated to be missing from the excretion data. Adding an additional
Bayesian Safety Risk Modeling of Human-Flightdeck Automation Interaction
NASA Technical Reports Server (NTRS)
Ancel, Ersin; Shih, Ann T.
2015-01-01
The use of automated systems in airliners has increased fuel efficiency, added capabilities, enhanced safety and reliability, and improved passenger comfort since their introduction in the late 1980s. However, the original automation benefits, including reduced flight crew workload, human error, and training requirements, were not achieved as originally expected. Instead, automation introduced new failure modes, redistributed (and sometimes increased) workload, brought in new cognitive and attention demands, and increased training requirements. Modern airliners have numerous flight modes, providing more flexibility (and inherently more complexity) to the flight crew. However, the price to pay for the increased flexibility is the need for increased mode awareness, as well as the need to supervise, understand, and predict automated system behavior. Also, over-reliance on automation is linked to manual flight skill degradation and complacency in commercial pilots. As a result, recent accidents involving human error are often caused by the interactions between humans and the automated systems (e.g., the breakdown in man-machine coordination), deteriorated manual flying skills, and/or loss of situational awareness due to heavy dependence on automated systems. This paper describes the development of the increased complexity and reliance on automation baseline model, named FLAP (FLightdeck Automation Problems). The model development process starts with a comprehensive literature review, followed by the construction of a framework composed of high-level causal factors leading to an automation-related flight anomaly. The framework was then converted into a Bayesian Belief Network (BBN) using the Hugin software v7.8. The effects of automation on the flight crew are incorporated into the model, including flight skill degradation and increased cognitive demand and training requirements, along with their interactions. Besides flight crew deficiencies, automation system
A Bayesian network model for biomarker-based dose response.
Hack, C Eric; Haber, Lynne T; Maier, Andrew; Shulte, Paul; Fowler, Bruce; Lotz, W Gregory; Savage, Russell E
2010-07-01
A Bayesian network model was developed to integrate diverse types of data to conduct an exposure-dose-response assessment for benzene-induced acute myeloid leukemia (AML). The network approach was used to evaluate and compare individual biomarkers and quantitatively link the biomarkers along the exposure-disease continuum. The network was used to perform the biomarker-based dose-response analysis, and various other approaches to the dose-response analysis were conducted for comparison. The network-derived benchmark concentration was approximately an order of magnitude lower than that from the usual exposure concentration versus response approach, which suggests that the presence of more information in the low-dose region (where changes in biomarkers are detectable but effects on AML mortality are not) helps inform the description of the AML response at lower exposures. This work provides a quantitative approach for linking changes in biomarkers of effect both to exposure information and to changes in disease response. Such linkage can provide a scientifically valid point of departure that incorporates precursor dose-response information without being dependent on the difficult issue of a definition of adversity for precursors.
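The exposure-biomarker-disease linkage described above can be sketched as a small discrete Bayesian network evaluated by enumeration. The conditional probability tables below are invented for illustration; they are not benzene/AML values:

```python
# Conditional probability tables (hypothetical values).
# P(biomarker elevated | exposure): 0 = low exposure, 1 = high exposure.
P_BIO_GIVEN_EXP = {0: 0.05, 1: 0.80}
# P(disease | biomarker): 0 = biomarker normal, 1 = biomarker elevated.
P_DIS_GIVEN_BIO = {0: 0.01, 1: 0.10}

def p_disease_given_exposure(exposure):
    """Marginalise the biomarker node out of the chain E -> B -> D."""
    p_b = P_BIO_GIVEN_EXP[exposure]
    return p_b * P_DIS_GIVEN_BIO[1] + (1 - p_b) * P_DIS_GIVEN_BIO[0]
```

Because the biomarker node is informative at exposures where the disease response itself is not yet detectable, conditioning on it shifts the low-dose risk estimate, which is the mechanism behind the lower network-derived benchmark concentration.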
Using Bayesian statistical methods to quantify uncertainty and variability in human PBPK model predictions for use in risk assessments requires prior distributions (priors), which characterize what is known or believed about parameters’ values before observing in vivo data. Expe...
Using Bayesian statistical methods to quantify uncertainty and variability in human physiologically-based pharmacokinetic (PBPK) model predictions for use in risk assessments requires prior distributions (priors), which characterize what is known or believed about parameters’ val...
Lee, Sik-Yum; Song, Xin-Yuan
2004-05-01
Missing data are very common in behavioural and psychological research. In this paper, we develop a Bayesian approach in the context of a general nonlinear structural equation model with missing continuous and ordinal categorical data. In the development, the missing data are treated as latent quantities, and provision for the incompleteness of the data is made by a hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm. We show by means of a simulation study that the Bayesian estimates are accurate. A Bayesian model comparison procedure based on the Bayes factor and path sampling is proposed. The required observations from the posterior distribution for computing the Bayes factor are simulated by the hybrid algorithm in Bayesian estimation. Our simulation results indicate that the correct model is selected more frequently when the incomplete records are used in the analysis than when they are ignored. The methodology is further illustrated with a real data set from a study concerned with an AIDS preventative intervention for Filipina sex workers.
Bayesian Model Averaging of Artificial Intelligence Models for Hydraulic Conductivity Estimation
NASA Astrophysics Data System (ADS)
Nadiri, A.; Chitsazan, N.; Tsai, F. T.; Asghari Moghaddam, A.
2012-12-01
This research presents a Bayesian artificial intelligence model averaging (BAIMA) method that incorporates multiple artificial intelligence (AI) models to estimate hydraulic conductivity and evaluate estimation uncertainties. Uncertainty in the AI model outputs stems from error in model input as well as from non-uniqueness in selecting different AI methods. Using a single AI model tends to bias the estimation and underestimate uncertainty. BAIMA employs the Bayesian model averaging (BMA) technique to address the issue of relying on a single AI model for estimation. BAIMA estimates hydraulic conductivity by averaging the outputs of AI models according to their model weights. In this study, the model weights were determined using the Bayesian information criterion (BIC), which follows the parsimony principle. BAIMA calculates the within-model variances to account for uncertainty propagation from input data to AI model output. Between-model variances are evaluated to account for uncertainty due to model non-uniqueness. We employed Takagi-Sugeno fuzzy logic (TS-FL), an artificial neural network (ANN), and neurofuzzy (NF) models to estimate hydraulic conductivity for the Tasuj plain aquifer, Iran. BAIMA combined the three AI models and produced a better fit than the individual models. While NF was expected to be the best AI model owing to its utilization of both TS-FL and ANN models, the NF model was nearly discarded by the parsimony principle. The TS-FL model and the ANN model showed equal importance, although their hydraulic conductivity estimates were quite different. This resulted in significant between-model variances, which are normally ignored when a single AI model is used.
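The BIC-based weighting and the within/between-model variance split described above can be sketched as follows (the BIC values, means, and variances are illustrative numbers, not results from the study):

```python
import math

def bma_weights(bics):
    """BIC-based BMA weights: w_k proportional to exp(-BIC_k / 2)."""
    b_min = min(bics)                      # subtract the minimum for stability
    w = [math.exp(-(b - b_min) / 2) for b in bics]
    s = sum(w)
    return [x / s for x in w]

def bma_mean_var(means, variances, weights):
    """Weighted mean plus within-model and between-model variance."""
    mean = sum(w * m for w, m in zip(weights, means))
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (m - mean) ** 2 for w, m in zip(weights, means))
    return mean, within + between

# Hypothetical per-model estimates of log-conductivity at one location:
weights = bma_weights([210.3, 212.7, 225.1])
combined_mean, combined_var = bma_mean_var(
    [-4.1, -3.8, -4.6], [0.30, 0.25, 0.40], weights)
```

The between-model term is exactly the contribution that vanishes if only a single model is used, which is why single-model analyses understate the total variance.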
Bayesian Proteoform Modeling Improves Protein Quantification of Global Proteomic Measurements
Webb-Robertson, Bobbie-Jo M.; Matzke, Melissa M.; Datta, Susmita; Payne, Samuel H.; Kang, Jiyun; Bramer, Lisa M.; Nicora, Carrie D.; Shukla, Anil K.; Metz, Thomas O.; Rodland, Karin D.; Smith, Richard D.; Tardiff, Mark F.; McDermott, Jason E.; Pounds, Joel G.; Waters, Katrina M.
2014-12-01
As the capability of mass spectrometry-based proteomics has matured, tens of thousands of peptides can be measured simultaneously, which has the benefit of offering a systems view of protein expression. However, a major challenge is that with an increase in throughput, protein quantification estimation from the native measured peptides has become a computational task. A limitation of existing computationally driven protein quantification methods is that most ignore protein variation, such as alternate splicing of the RNA transcript and post-translational modifications or other possible proteoforms, which will affect a significant fraction of the proteome. The consequence of this assumption is that statistical inference at the protein level, and consequently downstream analyses such as network and pathway modeling, have only limited power for biomarker discovery. Here, we describe a Bayesian model (BP-Quant) that uses statistically derived peptide signatures to identify peptides that are outside the dominant pattern, or the existence of multiple over-expressed patterns, to improve relative protein abundance estimates. It is a research-driven approach that utilizes the objectives of the experiment, defined in the context of a standard statistical hypothesis, to identify a set of peptides exhibiting similar statistical behavior relating to a protein. This approach infers that changes in relative protein abundance can be used as a surrogate for changes in function, without necessarily taking into account the effect of differential post-translational modifications, processing, or splicing in altering protein function. We verify the approach using a dilution study from mouse plasma samples and demonstrate that BP-Quant achieves similar accuracy as the current state-of-the-art methods at proteoform identification, with significantly better specificity. BP-Quant is available as MATLAB and R packages at https://github.com/PNNL-Comp-Mass-Spec/BP-Quant.
Bayesian parameter inference and model selection by population annealing in systems biology.
Murakami, Yohei
2014-01-01
Parameter inference and model selection are very important for mathematical modeling in systems biology, and Bayesian statistics can be used to conduct both. In particular, the framework named approximate Bayesian computation (ABC) is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions; in such cases, it is difficult to choose one specific parameter value with high credibility as the representative value of the distribution. To overcome these problems, we introduce one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we show that it can be used to compute Bayesian posterior distributions in the ABC framework. To deal with the unidentifiability of representative parameter values, we propose running the simulations with a parameter ensemble sampled from the posterior distribution, named the "posterior parameter ensemble". We show that population annealing is an efficient and convenient algorithm for generating the posterior parameter ensemble. We also show that simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference but also capture and predict data that were not used for parameter inference. Lastly, we introduce the marginal likelihood in the ABC framework for Bayesian model selection, and show that population annealing enables us to compute the marginal likelihood in this framework and to conduct model selection based on the Bayes factor.
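The core ABC idea of sampling a posterior parameter ensemble without evaluating a likelihood can be sketched with plain rejection sampling (a simple stand-in for the population-annealing scheme the paper proposes; the Bernoulli model and all numbers are invented for illustration):

```python
import random

def abc_rejection(obs_mean, n_obs=50, eps=0.04, n_accept=200, seed=0):
    """Accept theta ~ U(0,1) whenever data simulated from theta match the
    observed summary statistic (the sample mean) to within eps."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.random()                               # draw from the prior
        sim_mean = sum(rng.random() < theta
                       for _ in range(n_obs)) / n_obs      # simulate data
        if abs(sim_mean - obs_mean) <= eps:                # distance criterion
            accepted.append(theta)
    return accepted  # the "posterior parameter ensemble"

ensemble = abc_rejection(obs_mean=0.7)
posterior_mean = sum(ensemble) / len(ensemble)
```

Running forward simulations with every member of `ensemble`, rather than with one representative value, propagates the full posterior uncertainty into the predictions, which is the role the posterior parameter ensemble plays in the paper.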
A Bayesian model of stereopsis depth and motion direction discrimination.
Read, J C A
2002-02-01
The extraction of stereoscopic depth from retinal disparity, and motion direction from two-frame kinematograms, requires the solution of a correspondence problem. In previous psychophysical work [Read and Eagle (2000) Vision Res 40: 3345-3358], we compared the performance of the human stereopsis and motion systems with correlated and anti-correlated stimuli. We found that, although the two systems performed similarly for narrow-band stimuli, broadband anti-correlated kinematograms produced a strong perception of reversed motion, whereas the stereograms appeared merely rivalrous. I now model these psychophysical data with a computational model of the correspondence problem based on the known properties of visual cortical cells. Noisy retinal images are filtered through a set of Fourier channels tuned to different spatial frequencies and orientations. Within each channel, a Bayesian analysis incorporating a prior preference for small disparities is used to assess the probability of each possible match. Finally, information from the different channels is combined to arrive at a judgement of stimulus disparity. Each model system--stereopsis and motion--has two free parameters: the amount of noise they are subject to, and the strength of their preference for small disparities. By adjusting these parameters independently for each system, qualitative matches are produced to psychophysical data, for both correlated and anti-correlated stimuli, across a range of spatial frequency and orientation bandwidths. The motion model is found to require much higher noise levels and a weaker preference for small disparities. This makes the motion model more tolerant of poor-quality reverse-direction false matches encountered with anti-correlated stimuli, matching the strong perception of reversed motion that humans experience with these stimuli. In contrast, the lower noise level and tighter prior preference used with the stereopsis model means that it performs close to chance with
Ice Shelf Modeling: A Cross-Polar Bayesian Statistical Approach
NASA Astrophysics Data System (ADS)
Kirchner, N.; Furrer, R.; Jakobsson, M.; Zwally, H. J.
2010-12-01
Ice streams interlink glacial terrestrial and marine environments: embedded in a grounded inland ice sheet such as the Antarctic Ice Sheet or the paleo ice sheets covering extensive parts of the Eurasian and Amerasian Arctic, respectively, ice streams are major drainage agents facilitating the discharge of substantial portions of continental ice into the ocean. At their seaward side, ice streams can either extend onto the ocean as floating ice tongues (such as the Drygalski Ice Tongue, East Antarctica) or feed large ice shelves (as is the case for, e.g., the Siple Coast and the Ross Ice Shelf, West Antarctica). The flow behavior of ice streams has been recognized to be intimately linked with configurational changes in their attached ice shelves; in particular, ice shelf disintegration is associated with rapid ice stream retreat and increased mass discharge from the continental ice mass, contributing eventually to sea level rise. Investigations of ice stream retreat mechanisms are, however, incomplete if based on terrestrial records only: rather, the dynamics of ice shelves (and, eventually, the impact of the ocean on the latter) must be accounted for. However, since floating ice shelves leave hardly any traces behind when melting, uncertainty regarding the spatio-temporal distribution and evolution of ice shelves in times prior to instrumented and recorded observation is high, calling for a statistical modeling approach. Complementing ongoing large-scale numerical modeling efforts (Pollard & DeConto, 2009), we model the configuration of ice shelves by using a Bayesian Hierarchical Modeling (BHM) approach. We adopt a cross-polar perspective, accounting for the fact that currently, ice shelves exist mainly along the coastline of Antarctica (and are virtually non-existent in the Arctic), while Arctic Ocean ice shelves repeatedly impacted the Arctic Ocean basin during former glacial periods. Modeled Arctic Ocean ice shelf configurations are compared with geological spatial
Inherently irrational? A computational model of escalation of commitment as Bayesian Updating.
Gilroy, Shawn P; Hantula, Donald A
2016-06-01
Monte Carlo simulations were performed to analyze the degree to which two-, three- and four-step learning histories of losses and gains correlated with escalation and persistence in extended extinction (continuous loss) conditions. Simulated learning histories were randomly generated at varying lengths and compositions, and warranted probabilities were determined using Bayesian Updating methods. Bayesian Updating predicted instances where particular learning sequences were more likely to engender escalation and persistence under extinction conditions. All simulations revealed greater rates of escalation and persistence in the presence of heterogeneous (e.g., both wins and losses) lag sequences, with substantially increased rates of escalation when lags composed predominantly of losses were followed by wins. These methods were then applied to human investment choices in earlier experiments, and the Bayesian Updating models corresponded with the data obtained from these experiments. These findings suggest that Bayesian Updating can be utilized as a model for understanding how and when individual commitment may escalate and persist despite continued failures.
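Bayesian Updating of a warranted probability over a win/loss lag sequence can be sketched with a conjugate Beta-Bernoulli update (a simplified stand-in for the simulation procedure described; the function name and prior are assumptions):

```python
def warranted_win_probability(history, a=1.0, b=1.0):
    """Start from a Beta(a, b) prior over P(win); each observed 'W' or 'L'
    updates the posterior, whose mean is the warranted probability that
    the next investment wins."""
    for outcome in history:
        if outcome == 'W':
            a += 1.0
        else:
            b += 1.0
    return a / (a + b)
```

Under this update, a loss-dominated history ending in a win, such as "LLW", still warrants P(win) = 0.4, illustrating how intermittent wins in a heterogeneous sequence can sustain commitment under subsequent continuous losses.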
Hierarchical Bayesian Model Averaging for Chance Constrained Remediation Designs
NASA Astrophysics Data System (ADS)
Chitsazan, N.; Tsai, F. T.
2012-12-01
Groundwater remediation designs rely heavily on simulation models, which are subject to various sources of uncertainty in their predictions. To develop a robust remediation design, it is crucial to understand the effect of these uncertainty sources. In this research, we introduce a hierarchical Bayesian model averaging (HBMA) framework to segregate and prioritize sources of uncertainty in a multi-layer frame, where each layer targets a source of uncertainty. The HBMA framework provides insight into uncertainty priorities and propagation. In addition, HBMA allows evaluating model weights at different hierarchy levels and assessing the relative importance of models at each level. To account for uncertainty, we employ chance-constrained (CC) programming for stochastic remediation design. Chance-constrained programming has traditionally been used to account for parameter uncertainty; recently, many studies have suggested that model structure uncertainty is not negligible compared to parameter uncertainty. Using chance-constrained programming along with HBMA can provide a rigorous tool for groundwater remediation designs under uncertainty. In this research, HBMA-CC was applied to a remediation design in a synthetic aquifer. The design was to develop a scavenger well approach to mitigate saltwater intrusion toward production wells. HBMA was employed to assess uncertainties from model structure, parameter estimation and kriging interpolation. An improved harmony search optimization method was used to find the optimal location of the scavenger well. We evaluated prediction variances of chloride concentration at the production wells through the HBMA framework. The results showed that choosing the single best model may lead to a significant error in evaluating prediction variances, for two reasons. First, considering the single best model, variances that stem from uncertainty in the model structure will be ignored. Second, considering the best model with non
Bayesian estimation of regularization parameters for deformable surface models
Cunningham, G.S.; Lehovich, A.; Hanson, K.M.
1999-02-20
In this article the authors build on their past attempts to reconstruct a 3D, time-varying bolus of radiotracer from first-pass data obtained by the dynamic SPECT imager, FASTSPECT, built by the University of Arizona. The object imaged is a CardioWest total artificial heart. The bolus is entirely contained in one ventricle and its associated inlet and outlet tubes. The model for the radiotracer distribution at a given time is a closed surface parameterized by 482 vertices that are connected to make 960 triangles, with nonuniform intensity variations of radiotracer allowed inside the surface on a voxel-to-voxel basis. The total curvature of the surface is minimized through the use of a weighted prior in the Bayesian framework, as is the weighted norm of the gradient of the voxellated grid. MAP estimates for the vertices, interior intensity voxels and background count level are produced. The strength of the priors, or hyperparameters, is determined by maximizing the probability of the data given the hyperparameters, called the evidence. The evidence is calculated by first assuming that the posterior is approximately normal in the values of the vertices and voxels, and then by evaluating the integral of the multi-dimensional normal distribution. This integral (which requires evaluating the determinant of a covariance matrix) is computed by applying a recent algorithm from Bai et al. that calculates the needed determinant efficiently. The authors demonstrate that the radiotracer is highly inhomogeneous in early time frames, as suspected in earlier reconstruction attempts that assumed a uniform intensity of radiotracer within the closed surface, and that the optimal choice of hyperparameters is substantially different for different time frames.
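The evidence calculation described above, approximating the posterior by a Gaussian at its mode and integrating analytically (the Laplace approximation), can be illustrated in one dimension, where the approximation is exact for a conjugate Gaussian model. The model and numbers below are a toy stand-in for the high-dimensional surface/voxel problem:

```python
import math

def log_gauss(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def laplace_log_evidence(y, sigma2=1.0, tau2=4.0):
    """Laplace approximation: log Z ~= log joint at the mode
    + (d/2) log(2*pi) - (1/2) log|H|, with H the posterior precision."""
    n = len(y)
    precision = n / sigma2 + 1 / tau2              # Hessian of -log posterior
    theta_map = (sum(y) / sigma2) / precision      # MAP under a N(0, tau2) prior
    log_joint = sum(log_gauss(yi, theta_map, sigma2) for yi in y) \
        + log_gauss(theta_map, 0.0, tau2)
    return log_joint + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(precision)

def exact_log_evidence(y, sigma2=1.0, tau2=4.0):
    """Closed-form marginal likelihood of the same Gaussian-Gaussian model."""
    n = len(y)
    s, ssq = sum(y), sum(yi * yi for yi in y)
    precision = n / sigma2 + 1 / tau2
    return -0.5 * n * math.log(2 * math.pi * sigma2) \
        - 0.5 * math.log(tau2 * precision) \
        - 0.5 * (ssq / sigma2 - (s / sigma2) ** 2 / precision)
```

For non-Gaussian posteriors the two quantities differ, and the log-determinant term is where the expensive covariance-matrix determinant of the article enters.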
Bayesian Modeling in Institutional Research: An Example of Nonlinear Classification
ERIC Educational Resources Information Center
Xu, Yonghong Jade; Ishitani, Terry T.
2008-01-01
In recent years, rapid advancement has taken place in computing technology that allows institutional researchers to efficiently and effectively address data of increasing volume and structural complexity (Luan, 2002). In this chapter, the authors propose a new data analytical technique, Bayesian belief networks (BBN), to add to the toolbox for…
Model Criticism of Bayesian Networks with Latent Variables.
ERIC Educational Resources Information Center
Williamson, David M.; Mislevy, Robert J.; Almond, Russell G.
This study investigated statistical methods for identifying errors in Bayesian networks (BN) with latent variables, as found in intelligent cognitive assessments. BN, commonly used in artificial intelligence systems, are promising mechanisms for scoring constructed-response examinations. The success of an intelligent assessment or tutoring system…
A Comparison of Imputation Methods for Bayesian Factor Analysis Models
ERIC Educational Resources Information Center
Merkle, Edgar C.
2011-01-01
Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amenable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…
NASA Astrophysics Data System (ADS)
He, X. L.; Sonnenborg, T. O.; Jørgensen, F.; Jensen, K. H.
2014-08-01
Multiple-point geostatistical simulation (MPS) has recently become popular in stochastic hydrogeology, primarily because of its capability to derive multivariate distributions from a training image (TI). However, its application in three-dimensional (3-D) simulations has been constrained by the difficulty of constructing a 3-D TI. The object-based unconditional simulation program TiGenerator may be a useful tool in this regard; yet the applicability of such parametric training images has not been documented in detail. Another issue in MPS is the integration of multiple geophysical data. The proper way to retrieve and incorporate information from high-resolution geophysical data is still under discussion. In this study, MPS simulation was applied to different scenarios regarding the TI and soft conditioning. By comparing their output from simulations of groundwater flow and probabilistic capture zone, TI from both sources (directly converted from high-resolution geophysical data and generated by TiGenerator) yields comparable results, even for the probabilistic capture zones, which are highly sensitive to the geological architecture. This study also suggests that soft conditioning in MPS is a convenient and efficient way of integrating secondary data such as 3-D airborne electromagnetic data (SkyTEM), but over-conditioning has to be avoided.
Open Source Bayesian Models. 3. Composite Models for Prediction of Binned Responses
2016-01-01
Bayesian models constructed from structure-derived fingerprints have been a popular and useful method for drug discovery research when applied to bioactivity measurements that can be effectively classified as active or inactive. The results can be used to rank candidate structures according to their probability of activity, and this ranking benefits from the high degree of interpretability when structure-based fingerprints are used, making the results chemically intuitive. Besides selecting an activity threshold, building a Bayesian model is fast and requires few or no parameters or user intervention. The method also does not suffer from such acute overtraining problems as quantitative structure–activity relationships or quantitative structure–property relationships (QSAR/QSPR). This makes it an approach highly suitable for automated workflows that are independent of user expertise or prior knowledge of the training data. We now describe a new method for creating a composite group of Bayesian models to extend the method to work with multiple states, rather than just binary. Incoming activities are divided into bins, each covering a mutually exclusive range of activities. For each of these bins, a Bayesian model is created to model whether or not the compound belongs in the bin. Analyzing putative molecules using the composite model involves making a prediction for each bin and examining the relative likelihood for each assignment, for example, highest value wins. The method has been evaluated on a collection of hundreds of data sets extracted from ChEMBL v20 and validated data sets for ADME/Tox and bioactivity. PMID:26750305
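The composite scheme, one Bernoulli-style Bayesian model per activity bin with the highest relative likelihood winning, can be sketched as follows. The tiny fingerprints and bin labels are invented, and Laplace smoothing stands in for whatever correction the production implementation uses:

```python
import math

def fit_bin(in_bin, out_bin, features):
    """Laplace-smoothed log-likelihood-ratio weight per fingerprint bit
    for membership in one activity bin."""
    model = {}
    for f in features:
        p_in = (sum(f in fp for fp in in_bin) + 1) / (len(in_bin) + 2)
        p_out = (sum(f in fp for fp in out_bin) + 1) / (len(out_bin) + 2)
        model[f] = math.log(p_in / p_out)
    return model

def score(model, fp):
    return sum(w for f, w in model.items() if f in fp)

def predict_bin(bin_models, fp):
    """Composite model: score the fingerprint against every bin; highest wins."""
    return max(bin_models, key=lambda name: score(bin_models[name], fp))

features = {1, 2, 3, 4}
low = [{1, 2}, {2}]    # fingerprints of low-activity training compounds
high = [{3, 4}, {4}]   # fingerprints of high-activity training compounds
bin_models = {"low": fit_bin(low, high, features),
              "high": fit_bin(high, low, features)}
```

Comparing the per-bin scores, rather than thresholding a single active/inactive score, is what extends the binary Bayesian classifier to binned responses.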
A Bayesian hierarchical diffusion model decomposition of performance in Approach–Avoidance Tasks
Krypotos, Angelos-Miltiadis; Beckers, Tom; Kindt, Merel; Wagenmakers, Eric-Jan
2015-01-01
Common methods for analysing response time (RT) tasks, frequently used across different disciplines of psychology, suffer from a number of limitations, such as the failure to directly measure the underlying latent processes of interest and the inability to take into account the uncertainty associated with each individual's point estimate of performance. Here, we discuss a Bayesian hierarchical diffusion model and apply it to RT data. This model allows researchers to decompose performance into meaningful psychological processes and to account optimally for individual differences and commonalities, even with relatively sparse data. We highlight the advantages of the Bayesian hierarchical diffusion model decomposition by applying it to performance on Approach–Avoidance Tasks, widely used in the emotion and psychopathology literature. Model fits for two experimental data sets demonstrate that the model performs well. The Bayesian hierarchical diffusion model overcomes important limitations of current analysis procedures and provides deeper insight into the latent psychological processes of interest. PMID:25491372
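The diffusion process underlying the model can be illustrated with a bare Euler-Maruyama forward simulation of a Wiener process between two response boundaries. This is only the generative side of the model with arbitrary parameters; the paper fits it hierarchically rather than simulating it forward:

```python
import random

def simulate_ddm(drift, boundary=1.0, dt=0.001, noise=1.0,
                 n_trials=500, seed=0):
    """Euler-Maruyama simulation of a Wiener diffusion starting at 0 between
    boundaries at +/-boundary. Returns (proportion of upper-boundary
    responses, mean response time)."""
    rng = random.Random(seed)
    upper, rts = 0, []
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < boundary:
            x += drift * dt + noise * rng.gauss(0.0, dt ** 0.5)
            t += dt
        if x >= boundary:
            upper += 1
        rts.append(t)
    return upper / n_trials, sum(rts) / n_trials

# A positive drift (e.g. an approach tendency) should yield mostly
# upper-boundary responses with short RTs.
p_upper, mean_rt = simulate_ddm(drift=1.5)
```

In the fitted model, drift rate, boundary separation, and non-decision time are the latent quantities of interest, and the hierarchical prior shares information about them across participants.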
Parameterizing Bayesian network Representations of Social-Behavioral Models by Expert Elicitation
Walsh, Stephen J.; Dalton, Angela C.; Whitney, Paul D.; White, Amanda M.
2010-05-23
Bayesian networks provide a general framework with which to model many natural phenomena. The mathematical nature of Bayesian networks enables a plethora of model validation and calibration techniques: e.g., parameter estimation, goodness-of-fit tests, and diagnostic checking of the model assumptions. However, they are not free of shortcomings. Parameter estimation from relevant extant data is a common approach to calibrating the model parameters; in practice, it is not uncommon to find oneself lacking adequate data to reliably estimate all model parameters. In this paper we present the early development of a novel application of conjoint analysis as a method for eliciting and modeling expert opinions, and a methodology that uses the results to calibrate the parameters of a Bayesian network.
Placek, Ben; Knuth, Kevin H.; Angerhausen, Daniel
2014-11-10
EXONEST is an algorithm dedicated to detecting and characterizing the photometric signatures of exoplanets, which include reflection and thermal emission, Doppler boosting, and ellipsoidal variations. Using Bayesian inference, we can test between competing models that describe the data as well as estimate model parameters. We demonstrate this approach by testing circular versus eccentric planetary orbital models, as well as testing for the presence or absence of four photometric effects. In addition to using Bayesian model selection, a unique aspect of EXONEST is the potential capability to distinguish between reflective and thermal contributions to the light curve. A case study is presented using Kepler data recorded from the transiting planet KOI-13b. By considering only the nontransiting portions of the light curve, we demonstrate that it is possible to estimate the photometrically relevant model parameters of KOI-13b. Furthermore, Bayesian model testing confirms that the orbit of KOI-13b has a detectable eccentricity.
Number-Knower Levels in Young Children: Insights from Bayesian Modeling
ERIC Educational Resources Information Center
Lee, Michael D.; Sarnecka, Barbara W.
2011-01-01
Lee and Sarnecka (2010) developed a Bayesian model of young children's behavior on the Give-N test of number knowledge. This paper presents two new extensions of the model, and applies the model to new data. In the first extension, the model is used to evaluate competing theories about the conceptual knowledge underlying children's behavior. One,…
NASA Astrophysics Data System (ADS)
Scradeanu, D.; Pagnejer, M.
2012-04-01
The purpose of this work is to evaluate the uncertainty of the hydrodynamic model for a multilayered geological structure, a potential trap for carbon dioxide storage. The hydrodynamic model is based on a conceptual model of the multilayered hydrostructure with three components: 1) a spatial model; 2) a parametric model; and 3) an energy model. The data necessary to build the three components of the conceptual model are obtained from 240 boreholes explored by geophysical logging and seismic investigation, for the first two components, and from an experimental water injection test, for the last one. The hydrodynamic model is a finite-difference numerical model based on a 3D stratigraphic model with nine stratigraphic units (Badenian and Oligocene) and a 3D multiparameter model (porosity, permeability, hydraulic conductivity, storage coefficient, leakage, etc.). The uncertainty of the two 3D models was evaluated using multivariate geostatistical tools: a) the cross-semivariogram for structural analysis, especially the study of anisotropy, and b) cokriging to reduce estimation variances in the specific situation where there is cross-correlation between a variable and one or more undersampled variables. Important differences between univariate and bivariate anisotropy were identified. The minimized uncertainty of the parametric model (by cokriging) was transferred to the hydrodynamic model. The uncertainty distribution of the pressures generated by the water injection test was additionally filtered by the sensitivity of the numerical model. The resulting relative errors of the pressure distribution in the hydrodynamic model are 15-20%. The scientific research was performed in the framework of the European FP7 project "A multiple space and time scale approach for the quantification of deep saline formations for CO2 storage (MUSTANG)".
A Bayesian approach to model structural error and input variability in groundwater modeling
NASA Astrophysics Data System (ADS)
Xu, T.; Valocchi, A. J.; Lin, Y. F. F.; Liang, F.
2015-12-01
Effective water resource management typically relies on numerical models to analyze groundwater flow and solute transport processes. Model structural error (due to simplification and/or misrepresentation of the "true" environmental system) and input forcing variability (which commonly arises since some inputs are uncontrolled or estimated with high uncertainty) are ubiquitous in groundwater models. Calibration that overlooks errors in model structure and input data can lead to biased parameter estimates and compromised predictions. We present a fully Bayesian approach for a complete assessment of uncertainty for spatially distributed groundwater models. The approach explicitly recognizes stochastic input and uses data-driven error models based on nonparametric kernel methods to account for model structural error. We employ exploratory data analysis to assist in specifying informative priors for the error models to improve identifiability. The inference is facilitated by an efficient sampling algorithm based on DREAM-ZS and a parameter subspace multiple-try strategy to reduce the required number of forward simulations of the groundwater model. We demonstrate the Bayesian approach through a synthetic case study of surface water-groundwater interaction under changing pumping conditions. It is found that explicit treatment of errors in model structure and input data (groundwater pumping rate) has a substantial impact on the posterior distribution of groundwater model parameters. Using error models reduces predictive bias caused by parameter compensation. In addition, input variability increases parametric and predictive uncertainty. The Bayesian approach allows for a comparison among the contributions from various error sources, which could inform future model improvement and data collection efforts on how to best direct resources towards reducing predictive uncertainty.
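The general idea of calibrating a model jointly with an explicit structural-error term can be sketched in miniature. The random-walk Metropolis sampler and linear toy model below are illustrative stand-ins for the paper's DREAM-ZS algorithm, kernel-based error models, and groundwater simulator; all names and values are assumptions, not the authors' setup.

```python
import numpy as np

def metropolis(log_post, theta0, n_iter=20_000, step=0.1, rng=None):
    """Random-walk Metropolis sampler (a simple stand-in for DREAM-ZS)."""
    rng = rng or np.random.default_rng(1)
    theta = np.asarray(theta0, float)
    lp = log_post(theta)
    chain = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain

# Toy forward model f(k) = k * x with an additive bias b standing in for
# structural error; theta = (k, b). True values: k = 2.0, b = 0.5.
x = np.linspace(1.0, 5.0, 20)
data_rng = np.random.default_rng(0)
y_obs = 2.0 * x + 0.5 + 0.5 * data_rng.standard_normal(x.size)

def log_post(theta):
    k, b = theta
    resid = y_obs - (k * x + b)
    # Gaussian likelihood (sigma = 0.5) plus weak Gaussian priors.
    return -0.5 * np.sum(resid**2) / 0.25 - 0.5 * (k**2 + b**2) / 100.0

chain = metropolis(log_post, [1.0, 0.0])
k_hat, b_hat = chain[5000:].mean(axis=0)   # posterior means after burn-in
```

Ignoring the bias term b in such a setup typically pushes its effect into k instead, which is the parameter-compensation bias the abstract refers to.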
Boos, Moritz; Seer, Caroline; Lange, Florian; Kopp, Bruno
2016-01-01
Cognitive determinants of probabilistic inference were examined using hierarchical Bayesian modeling techniques. A classic urn-ball paradigm served as experimental strategy, involving a factorial two (prior probabilities) by two (likelihoods) design. Five computational models of cognitive processes were compared with the observed behavior. Parameter-free Bayesian posterior probabilities and parameter-free base rate neglect provided inadequate models of probabilistic inference. The introduction of distorted subjective probabilities yielded more robust and generalizable results. A general class of (inverted) S-shaped probability weighting functions had been proposed; however, the possibility of large differences in probability distortions not only across experimental conditions, but also across individuals, seems critical for the model's success. It also seems advantageous to consider individual differences in parameters of probability weighting as being sampled from weakly informative prior distributions of individual parameter values. Thus, the results from hierarchical Bayesian modeling converge with previous results in revealing that probability weighting parameters show considerable task dependency and individual differences. Methodologically, this work exemplifies the usefulness of hierarchical Bayesian modeling techniques for cognitive psychology. Theoretically, human probabilistic inference might be best described as the application of individualized strategic policies for Bayesian belief revision.
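The (inverted) S-shaped probability weighting functions discussed above can be illustrated with the one-parameter family of Tversky and Kahneman (1992); this is one common choice, sketched here for illustration rather than as the exact function used in the study.

```python
def weight(p, gamma):
    """Tversky-Kahneman (1992) probability weighting function.

    gamma < 1 yields the inverted-S shape: small probabilities are
    overweighted and large probabilities underweighted; gamma = 1
    recovers the identity (no distortion).
    """
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

# Example of the distortion: weight(0.1, 0.6) ≈ 0.188 (overweighted)
```

In a hierarchical treatment, each participant's gamma would be drawn from a weakly informative population-level prior, as the abstract suggests.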
On Numerical Aspects of Bayesian Model Selection in High and Ultrahigh-dimensional Settings
Johnson, Valen E.
2014-01-01
This article examines the convergence properties of a Bayesian model selection procedure based on a non-local prior density in ultrahigh-dimensional settings. The performance of the model selection procedure is also compared to popular penalized likelihood methods. Coupling diagnostics are used to bound the total variation distance between iterates in a Markov chain Monte Carlo (MCMC) algorithm and the posterior distribution on the model space. In several simulation scenarios in which the number of observations exceeds 100, rapid convergence and high accuracy of the Bayesian procedure are demonstrated. Conversely, the coupling diagnostics are successful in diagnosing lack of convergence in several scenarios for which the number of observations is less than 100. The accuracy of the Bayesian model selection procedure in identifying high probability models is shown to be comparable to commonly used penalized likelihood methods, including extensions of smoothly clipped absolute deviations (SCAD) and least absolute shrinkage and selection operator (LASSO) procedures. PMID:24683431
Bayesian shared frailty models for regional inference about wildlife survival
Heisey, D.M.
2012-01-01
One can joke that 'exciting statistics' is an oxymoron, but it is neither a joke nor an exaggeration to say that these are exciting times to be involved in statistical ecology. As Halstead et al.'s (2012) paper nicely exemplifies, recently developed Bayesian analyses can now be used to extract insights from data using techniques that would have been unavailable to the ecological researcher just a decade ago. Some object to this, implying that the subjective priors of the Bayesian approach are the pathway to perdition (e.g. Lele & Dennis, 2009). It is reasonable to ask whether these new approaches are really giving us anything that we could not obtain with traditional tried-and-true frequentist approaches. I believe the answer is a clear yes.
The establishment of Bayesian Coronary Artery Disease Prediction model.
Chu, Chi-Ming; Tscai, Hui-Jen; Chu, Nian-Feng; Pai, Lu; Wetter, Thomas; Sun, Cien-An; Lin, Jin-Ding; Yang, Tsan; Pai, Cien-Yu; Bludau, Hans-Bernd
2005-01-01
This poster demonstrates how we built the Bayesian Coronary Artery Disease prediction module for evidence-based medicine. The module may help young professionals understand the effect of various factors when deciding whether to refer patients for the invasive examination of angiography. Moreover, this non-invasive information technology can also serve as a screening tool in clinical or community-based epidemiology.
Schöniger, Anneli; Wöhling, Thomas; Samaniego, Luis; Nowak, Wolfgang
2014-01-01
Bayesian model selection or averaging objectively ranks a number of plausible, competing conceptual models based on Bayes' theorem. It implicitly performs an optimal trade-off between performance in fitting available data and minimum model complexity. The procedure requires determining Bayesian model evidence (BME), which is the likelihood of the observed data integrated over each model's parameter space. The computation of this integral is highly challenging because it is as high-dimensional as the number of model parameters. Three classes of techniques to compute BME are available, each with its own challenges and limitations: (1) Exact and fast analytical solutions are limited by strong assumptions. (2) Numerical evaluation quickly becomes unfeasible for expensive models. (3) Approximations known as information criteria (ICs) such as the AIC, BIC, or KIC (Akaike, Bayesian, or Kashyap information criterion, respectively) yield contradicting results with regard to model ranking. Our study features a theory-based intercomparison of these techniques. We further assess their accuracy in a simplistic synthetic example where for some scenarios an exact analytical solution exists. In more challenging scenarios, we use a brute-force Monte Carlo integration method as reference. We continue this analysis with a real-world application of hydrological model selection. This is a first-time benchmarking of the various methods for BME evaluation against true solutions. Results show that BME values from ICs are often heavily biased and that the choice of approximation method substantially influences the accuracy of model ranking. For reliable model selection, bias-free numerical methods should be preferred over ICs whenever computationally feasible. PMID:25745272
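The brute-force Monte Carlo integration used as a reference above estimates BME by averaging the likelihood over draws from the prior. A minimal sketch (all names illustrative), checked against a toy conjugate-Gaussian case where the evidence is known exactly:

```python
import numpy as np

def bme_monte_carlo(log_likelihood, prior_sampler, n_samples=100_000, rng=None):
    """Brute-force Monte Carlo estimate of Bayesian model evidence:
    BME = E_prior[ L(data | theta) ], approximated by averaging the
    likelihood over draws from the prior. Returns log(BME)."""
    rng = rng or np.random.default_rng(0)
    thetas = prior_sampler(n_samples, rng)
    logL = np.array([log_likelihood(t) for t in thetas])
    m = logL.max()  # log-sum-exp trick for numerical stability
    return m + np.log(np.mean(np.exp(logL - m)))

# Toy conjugate example: y ~ N(theta, 1) with theta ~ N(0, 1); the exact
# evidence for a single observation y is N(y; 0, 2).
y = 1.0
est = bme_monte_carlo(
    lambda t: -0.5 * np.log(2 * np.pi) - 0.5 * (y - t) ** 2,
    lambda n, rng: rng.normal(0.0, 1.0, size=n),
)
exact = -0.5 * np.log(2 * np.pi * 2) - y**2 / 4
```

For expensive models each `log_likelihood` call is a full forward simulation, which is why this reference method quickly becomes unfeasible, as the abstract notes.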
Karacan, C. Özgen; Olea, Ricardo A.; Goodman, Gerrit
2015-01-01
Determination of the size of the gas emission zone, the locations of gas sources within, and especially the amount of gas retained in those zones is one of the most important steps for designing a successful methane control strategy and an efficient ventilation system in longwall coal mining. The formation of the gas emission zone and the potential amount of gas-in-place (GIP) that might be available for migration into a mine are factors of local geology and rock properties that usually show spatial variability in continuity and may also show geometric anisotropy. Geostatistical methods are used here for modeling and prediction of gas amounts and for assessing their associated uncertainty in gas emission zones of longwall mines for methane control. This study used core data obtained from 276 vertical exploration boreholes drilled from the surface to the bottom of the Pittsburgh coal seam in a mining district in the Northern Appalachian basin. After identifying important coal and non-coal layers for the gas emission zone, univariate statistical and semivariogram analyses were conducted for data from different formations to define the distribution and continuity of various attributes. Sequential simulations performed stochastic assessment of these attributes, such as gas content, strata thickness, and strata displacement. These analyses were followed by calculations of gas-in-place and their uncertainties in the Pittsburgh seam caved zone and fractured zone of longwall mines in this mining district. Grid blanking was used to isolate the volume over the actual panels from the entire modeled district and to calculate gas amounts that were directly related to the emissions in longwall mines. Results indicated that gas-in-place in the Pittsburgh seam, in the caved zone and in the fractured zone, as well as displacements in major rock units, showed spatial correlations that could be modeled and estimated using geostatistical methods. This study showed that GIP volumes may
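The semivariogram analysis underlying studies like this one can be sketched with the classical (Matheron) estimator, which averages squared increments of observation pairs grouped into distance bins. The function below is an illustrative sketch, not the authors' workflow:

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Classical semivariogram estimator:
    gamma(h) = 1 / (2 N(h)) * sum over pairs at lag ~h of (z_i - z_j)^2."""
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    n = len(values)
    i, j = np.triu_indices(n, k=1)                 # all distinct pairs
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq = 0.5 * (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        mask = (d >= bin_edges[k]) & (d < bin_edges[k + 1])
        if mask.any():
            gamma[k] = sq[mask].mean()
    return gamma
```

Directional variants of the same estimator (restricting pairs by azimuth) are what reveal the geometric anisotropy the abstract mentions; a fitted model would then drive kriging or sequential simulation.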
Truth, models, model sets, AIC, and multimodel inference: a Bayesian perspective
Barker, Richard J.; Link, William A.
2015-01-01
Statistical inference begins with viewing data as realizations of stochastic processes. Mathematical models provide partial descriptions of these processes; inference is the process of using the data to obtain a more complete description of the stochastic processes. Wildlife and ecological scientists have become increasingly concerned with the conditional nature of model-based inference: what if the model is wrong? Over the last 2 decades, Akaike's Information Criterion (AIC) has been widely and increasingly used in wildlife statistics for 2 related purposes, first for model choice and second to quantify model uncertainty. We argue that for the second of these purposes, the Bayesian paradigm provides the natural framework for describing uncertainty associated with model choice and provides the most easily communicated basis for model weighting. Moreover, Bayesian arguments provide the sole justification for interpreting model weights (including AIC weights) as coherent (mathematically self consistent) model probabilities. This interpretation requires treating the model as an exact description of the data-generating mechanism. We discuss the implications of this assumption, and conclude that more emphasis is needed on model checking to provide confidence in the quality of inference.
Bayesian Comparison of Alternative Graded Response Models for Performance Assessment Applications
ERIC Educational Resources Information Center
Zhu, Xiaowen; Stone, Clement A.
2012-01-01
This study examined the relative effectiveness of Bayesian model comparison methods in selecting an appropriate graded response (GR) model for performance assessment applications. Three popular methods were considered: deviance information criterion (DIC), conditional predictive ordinate (CPO), and posterior predictive model checking (PPMC). Using…
Bayesian model selection for a finite element model of a large civil aircraft
Hemez, F. M.; Rutherford, A. C.
2004-01-01
Nine aircraft stiffness parameters have been varied and used as inputs to a finite element model of an aircraft to generate natural frequency and deflection features (Goge, 2003). This data set (147 input parameter configurations and associated outputs) is now used to generate a metamodel, or a fast running surrogate model, using Bayesian model selection methods. Once a forward relationship is defined, the metamodel may be used in an inverse sense. That is, knowing the measured output frequencies and deflections, what were the input stiffness parameters that caused them?
D. L. Kelly
2007-06-01
Markov chain Monte Carlo (MCMC) techniques represent an extremely flexible and powerful approach to Bayesian modeling. This work illustrates the application of such techniques to time-dependent reliability of components with repair. The WinBUGS package is used to illustrate, via examples, how Bayesian techniques can be used for parametric statistical modeling of time-dependent component reliability. Additionally, the crucial, but often overlooked subject of model validation is discussed, and summary statistics for judging the model’s ability to replicate the observed data are developed, based on the posterior predictive distribution for the parameters of interest.
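The posterior predictive checking advocated above for model validation can be sketched as follows: simulate replicated data sets from posterior draws and compare a summary statistic against the observed value. The exponential failure-time example and all names are illustrative, not the WinBUGS implementation:

```python
import numpy as np

def posterior_predictive_pvalue(posterior_draws, simulate, statistic, y_obs, rng=None):
    """Posterior predictive p-value: the fraction of replicated data sets
    whose test statistic is at least the observed one. Values near 0 or 1
    flag a model that fails to replicate the observed data."""
    rng = rng or np.random.default_rng(0)
    t_obs = statistic(y_obs)
    t_rep = np.array([statistic(simulate(theta, rng)) for theta in posterior_draws])
    return np.mean(t_rep >= t_obs)

# Toy example: exponential failure times with a conjugate gamma posterior
# for the failure rate (flat prior assumed).
rng = np.random.default_rng(2)
y_obs = rng.exponential(scale=2.0, size=30)              # observed failure times
lam_draws = rng.gamma(30, 1.0 / y_obs.sum(), size=500)   # posterior rate draws
p = posterior_predictive_pvalue(
    lam_draws,
    lambda lam, rng: rng.exponential(scale=1.0 / lam, size=30),
    np.mean,
    y_obs,
)
```

Since the toy model here is well specified, the p-value lands comfortably away from the extremes.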
Bayesian model selection applied to artificial neural networks used for water resources modeling
NASA Astrophysics Data System (ADS)
Kingston, Greer B.; Maier, Holger R.; Lambert, Martin F.
2008-04-01
Artificial neural networks (ANNs) have proven to be extremely valuable tools in the field of water resources engineering. However, one of the most difficult tasks in developing an ANN is determining the optimum level of complexity required to model a given problem, as there is no formal systematic model selection method. This paper presents a Bayesian model selection (BMS) method for ANNs that provides an objective approach for comparing models of varying complexity in order to select the most appropriate ANN structure. The approach uses Markov Chain Monte Carlo posterior simulations to estimate the evidence in favor of competing models and, in this study, three known methods for doing this are compared in terms of their suitability for being incorporated into the proposed BMS framework for ANNs. However, it is acknowledged that it can be particularly difficult to accurately estimate the evidence of ANN models. Therefore, the proposed BMS approach for ANNs incorporates a further check of the evidence results by inspecting the marginal posterior distributions of the hidden-to-output layer weights, which unambiguously indicate any redundancies in the hidden layer nodes. The fact that this check is available is one of the greatest advantages of the proposed approach over conventional model selection methods, which do not provide such a test and instead rely on the modeler's subjective choice of selection criterion. The advantages of a total Bayesian approach to ANN development, including training and model selection, are demonstrated on two synthetic and one real world water resources case study.
Tang, An-Min; Tang, Nian-Sheng
2015-02-28
We propose a semiparametric multivariate skew-normal joint model for multivariate longitudinal and multivariate survival data. One main feature of the posited model is that we relax the commonly used normality assumption for random effects and within-subject error by using a centered Dirichlet process prior to specify the random effects distribution and using a multivariate skew-normal distribution to specify the within-subject error distribution and model trajectory functions of longitudinal responses semiparametrically. A Bayesian approach is proposed to simultaneously obtain Bayesian estimates of unknown parameters, random effects and nonparametric functions by combining the Gibbs sampler and the Metropolis-Hastings algorithm. Particularly, a Bayesian local influence approach is developed to assess the effect of minor perturbations to within-subject measurement error and random effects. Several simulation studies and an example are presented to illustrate the proposed methodologies. PMID:25404574
Simplifying Probability Elicitation and Uncertainty Modeling in Bayesian Networks
Paulson, Patrick R; Carroll, Thomas E; Sivaraman, Chitra; Neorr, Peter A; Unwin, Stephen D; Hossain, Shamina S
2011-04-16
In this paper we contribute two methods that simplify the demands of knowledge elicitation for particular types of Bayesian networks. The first method simplifies the task of providing probabilities when the states that a random variable takes can be described by a new, fully ordered state set in which a state implies all the preceding states. The second method leverages the Dempster-Shafer theory of evidence to provide a way for experts to express the degree of ignorance they feel about the estimates being provided.
Model Reduction of a Transient Groundwater-Flow Model for Bayesian Inverse Problems
NASA Astrophysics Data System (ADS)
Boyce, S. E.; Yeh, W. W.
2011-12-01
A Bayesian inverse problem requires many repeated model simulations to characterize an unknown parameter's posterior probability distribution. It is computationally infeasible to solve a Bayesian inverse problem for a discretized groundwater flow model with high-dimensional parameter and state spaces. Model reduction has been shown to reduce the dimension of a groundwater model by several orders of magnitude and is well suited for Bayesian inverse problems. A projection-based model reduction approach is proposed to reduce the parameter and state dimensions of a groundwater model. Previous work has done this by using a greedy algorithm for the selection of parameter vectors that make up a basis and their corresponding steady-state solutions for a state basis. The proposed method extends this idea to transient models by sequentially assembling the parameter and state projection bases through the greedy algorithm. The method begins with the parameter basis being a single vector that is equal to one or an accepted series of values. A set of state vectors that are solutions to the groundwater model using this parameter vector at appropriate times is called the parameter snapshot set. The appropriate times for the parameter snapshot set are determined by maximizing the set's minimum singular value. This optimization is similar to those used in experimental design for maximizing information. The two bases are made orthonormal by a QR decomposition and applied to the full groundwater model to form a reduced model. The parameter basis is increased with a new parameter vector that maximizes the error between the full model and the reduced model at a set of observation times. The new parameter vector represents where the reduced model is least accurate in representing the original full model. The corresponding parameter snapshot set's appropriate times are found using a greedy algorithm. This sequentially chooses times that have maximum error between the full and
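The projection-based reduction described above can be illustrated in miniature on a generic linear system: collect state snapshots, orthonormalize them with a QR decomposition (as the abstract notes), and project the full operator onto the resulting basis. All names are illustrative and the system is a stand-in for a discretized groundwater model:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 8                          # full state dimension, snapshot count
A = np.eye(n) * 2.0 + 0.1 * rng.standard_normal((n, n)) / np.sqrt(n)

# Snapshot set: full-model solutions for a few sampled forcing vectors.
B = rng.standard_normal((n, m))
X = np.linalg.solve(A, B)              # columns are state snapshots

# Orthonormal state basis via QR decomposition.
Q, _ = np.linalg.qr(X)

# Galerkin projection: the reduced operator is m x m instead of n x n.
A_r = Q.T @ A @ Q

# Reduced solve for a forcing inside the span of the snapshots; here the
# reduced solution reproduces the full solution essentially exactly.
b = B @ rng.standard_normal(m)
x_full = np.linalg.solve(A, b)
x_red = Q @ np.linalg.solve(A_r, Q.T @ b)
err = np.linalg.norm(x_full - x_red) / np.linalg.norm(x_full)
```

The greedy step in the abstract corresponds to probing where `err` is largest for forcings and parameters outside the snapshot span, and enriching the basis there.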
Bayesian non-parametric inference for stochastic epidemic models using Gaussian Processes
Xu, Xiaoguang; Kypraios, Theodore; O'Neill, Philip D.
2016-01-01
This paper considers novel Bayesian non-parametric methods for stochastic epidemic models. Many standard modeling and data analysis methods use underlying assumptions (e.g. concerning the rate at which new cases of disease will occur) which are rarely challenged or tested in practice. To relax these assumptions, we develop a Bayesian non-parametric approach using Gaussian Processes, specifically to estimate the infection process. The methods are illustrated with both simulated and real data sets, the former illustrating that the methods can recover the true infection process quite well in practice, and the latter illustrating that the methods can be successfully applied in different settings. PMID:26993062
Locating the Optic Nerve in Retinal Images: Comparing Model-Based and Bayesian Decision Methods
Karnowski, Thomas Paul; Tobin Jr, Kenneth William; Muthusamy Govindasamy, Vijaya Priya; Chaum, Edward
2006-01-01
In this work we compare two methods for automatic optic nerve (ON) localization in retinal imagery. The first method uses a Bayesian decision theory discriminator based on four spatial features of the retina imagery. The second method uses a principal component-based reconstruction to model the ON. We report on an improvement to the model-based technique by incorporating linear discriminant analysis and Bayesian decision theory methods. We explore a method to combine both techniques to produce a composite technique with high accuracy and rapid throughput. Results are shown for a data set of 395 images with 2-fold validation testing.
Bayesian conditional-independence modeling of the AIDS epidemic in England and Wales
NASA Astrophysics Data System (ADS)
Gilks, Walter R.; De Angelis, Daniela; Day, Nicholas E.
We describe the use of conditional-independence modeling, Bayesian inference and Markov chain Monte Carlo, to model and project the HIV-AIDS epidemic in homosexual/bisexual males in England and Wales. Complexity in this analysis arises through selectively missing data, indirectly observed underlying processes, and measurement error. Our emphasis is on presentation and discussion of the concepts, not on the technicalities of this analysis, which can be found elsewhere [D. De Angelis, W.R. Gilks, N.E. Day, Bayesian projection of the acquired immune deficiency syndrome epidemic (with discussion), Applied Statistics, in press].
Reconstructing Constructivism: Causal Models, Bayesian Learning Mechanisms, and the Theory Theory
ERIC Educational Resources Information Center
Gopnik, Alison; Wellman, Henry M.
2012-01-01
We propose a new version of the "theory theory" grounded in the computational framework of probabilistic causal models and Bayesian learning. Probabilistic models allow a constructivist but rigorous and detailed approach to cognitive development. They also explain the learning of both more specific causal hypotheses and more abstract framework…
A Bayesian Multi-Level Factor Analytic Model of Consumer Price Sensitivities across Categories
ERIC Educational Resources Information Center
Duvvuri, Sri Devi; Gruca, Thomas S.
2010-01-01
Identifying price sensitive consumers is an important problem in marketing. We develop a Bayesian multi-level factor analytic model of the covariation among household-level price sensitivities across product categories that are substitutes. Based on a multivariate probit model of category incidence, this framework also allows the researcher to…
ERIC Educational Resources Information Center
Story, Roger E.
1996-01-01
Discussion of the use of Latent Semantic Indexing to determine relevancy in information retrieval focuses on statistical regression and Bayesian methods. Topics include keyword searching; a multiple regression model; how the regression model can aid search methods; and limitations of this approach, including complexity, linearity, and…
A Test of Bayesian Observer Models of Processing in the Eriksen Flanker Task
ERIC Educational Resources Information Center
White, Corey N.; Brown, Scott; Ratcliff, Roger
2012-01-01
Two Bayesian observer models were recently proposed to account for data from the Eriksen flanker task, in which flanking items interfere with processing of a central target. One model assumes that interference stems from a perceptual bias to process nearby items as if they are compatible, and the other assumes that the interference is due to…
Assessment of uncertainty in chemical models by Bayesian probabilities: Why, when, how?
Sahlin, Ullrika
2015-07-01
A prediction of a chemical property or activity is subject to uncertainty. Which types of uncertainty to consider, whether to account for them in a differentiated manner, and with which methods, depends on the practical context. In chemical modelling, general guidance on the assessment of uncertainty is hindered by the high variety of underlying modelling algorithms, high-dimensionality problems, the acknowledgement of both qualitative and quantitative dimensions of uncertainty, and the fact that statistics offers alternative principles for uncertainty quantification. Here, a view of the assessment of uncertainty in predictions is presented with the aim of overcoming these issues. The assessment sets out to quantify uncertainty representing error in predictions and is based on probability modelling of errors, where uncertainty is measured by Bayesian probabilities. Even though well motivated, the choice to use Bayesian probabilities is a challenge to statistics and chemical modelling. Fully Bayesian modelling, Bayesian meta-modelling and bootstrapping are discussed as possible approaches. Deciding how to assess uncertainty is an active choice, and should not be constrained by traditions or a lack of validated and reliable ways of doing it.
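Of the approaches mentioned, bootstrapping is the simplest to sketch. Below is a minimal, purely illustrative percentile bootstrap that attaches an uncertainty interval to a point prediction from past residuals; the function name, defaults, and resampling scheme are assumptions, not the paper's method.

```python
import random

def bootstrap_prediction_interval(errors, point_pred, n_boot=2000, alpha=0.10, seed=1):
    """Attach an empirical (1 - alpha) uncertainty interval to a point
    prediction by resampling past model errors (a basic percentile bootstrap)."""
    rng = random.Random(seed)
    # each draw perturbs the prediction by one resampled historical error
    draws = sorted(point_pred + rng.choice(errors) for _ in range(n_boot))
    lo = draws[int(alpha / 2 * n_boot)]
    hi = draws[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

With symmetric residuals the interval straddles the point prediction; asymmetric residuals shift it accordingly, which is the appeal of error-based over purely parametric intervals.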
NASA Astrophysics Data System (ADS)
Tsai, Frank T.-C.; Li, Xiaobao
2008-09-01
This study proposes a Bayesian model averaging (BMA) method to address parameter estimation uncertainty arising from nonuniqueness in parameterization methods. BMA is able to incorporate multiple parameterization methods for prediction through the law of total probability and to obtain an ensemble average of hydraulic conductivity estimates. Two major issues in applying BMA to hydraulic conductivity estimation are discussed. The first is the use of Occam's window, common in BMA applications, to approximate posterior model probabilities. Occam's window accepts models only within a very narrow range, tending to single out the best method and discard other good ones. We propose a variance window to replace Occam's window and cope with this problem. The second is the Kashyap information criterion (KIC) in the approximated posterior model probabilities, which tends to prefer highly uncertain parameterization methods through the Fisher information matrix. With sufficient observation data, the Bayesian information criterion (BIC) is a good approximation and avoids the controversial results obtained with KIC. This study adopts multiple generalized parameterization (GP) methods as the models within BMA to estimate spatially correlated hydraulic conductivity. Numerical examples illustrate the issues with KIC and Occam's window and show the advantages of using BIC and the variance window in BMA applications. Finally, we apply BMA to the hydraulic conductivity estimation of the "1500-foot" sand in East Baton Rouge Parish, Louisiana.
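The BIC-based approximation to posterior model probabilities, and the narrow acceptance behavior of Occam's window, can be sketched as follows. The function names, the flat model prior, and the cutoff value are illustrative assumptions, not the study's implementation (the proposed variance window is not reproduced here).

```python
import math

def bma_weights(bic_scores, priors=None):
    """Approximate posterior model probabilities from BIC scores:
    p(M_k | D) is proportional to p(M_k) * exp(-BIC_k / 2)."""
    n = len(bic_scores)
    priors = priors or [1.0 / n] * n  # flat prior over models (assumption)
    best = min(bic_scores)            # shift by the best score for stability
    w = [p * math.exp(-(b - best) / 2.0) for b, p in zip(bic_scores, priors)]
    total = sum(w)
    return [x / total for x in w]

def occams_window(weights, c=20.0):
    """Indices of models whose posterior probability is within a factor c
    of the best model's; everything else is discarded."""
    w_max = max(weights)
    return [i for i, w in enumerate(weights) if w_max / w <= c]
```

For BIC scores 100, 102, and 120, the weights are proportional to e^0, e^-1, and e^-10, so a conventional cutoff of 20 retains only the first two models — illustrating how sharply Occam's window prunes.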
Automated parameter estimation for biological models using Bayesian statistical model checking
2015-01-01
Background Probabilistic models have gained widespread acceptance in the systems biology community as a useful way to represent complex biological systems. Such models are developed using existing knowledge of the structure and dynamics of the system, experimental observations, and inferences drawn from statistical analysis of empirical data. A key bottleneck in building such models is that some system variables cannot be measured experimentally. These variables are incorporated into the model as numerical parameters. Determining values of these parameters that justify existing experiments and provide reliable predictions when model simulations are performed is a key research problem. Domain experts usually estimate the values of these parameters by fitting the model to experimental data. Model fitting is usually expressed as an optimization problem that requires minimizing a cost-function which measures some notion of distance between the model and the data. This optimization problem is often solved by combining local and global search methods that tend to perform well for the specific application domain. When some prior information about parameters is available, methods such as Bayesian inference are commonly used for parameter learning. Choosing the appropriate parameter search technique requires detailed domain knowledge and insight into the underlying system. Results Using an agent-based model of the dynamics of acute inflammation, we demonstrate a novel parameter estimation algorithm by discovering the amount and schedule of doses of bacterial lipopolysaccharide that guarantee a set of observed clinical outcomes with high probability. We synthesized values of twenty-eight unknown parameters such that the parameterized model instantiated with these parameter values satisfies four specifications describing the dynamic behavior of the model. Conclusions We have developed a new algorithmic technique for discovering parameters in complex stochastic models of
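The "combining local and global search" idea mentioned above can be illustrated in one dimension: a coarse grid scan to locate a promising bracket, then golden-section refinement inside it. This is a generic sketch under assumed names and defaults, not the paper's algorithm.

```python
def fit_parameter(cost, lo, hi, coarse=50, iters=40):
    """Minimize a 1-D cost function: global coarse grid search,
    then local golden-section refinement around the best grid point."""
    # global stage: evaluate the cost on a coarse grid
    step = (hi - lo) / coarse
    xs = [lo + i * step for i in range(coarse + 1)]
    x0 = min(xs, key=cost)
    a, b = max(lo, x0 - step), min(hi, x0 + step)
    # local stage: golden-section search inside the bracket
    phi = (5 ** 0.5 - 1) / 2  # ~0.618, the golden ratio conjugate
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if cost(c) < cost(d):
            b = d
        else:
            a = c
    return (a + b) / 2
```

In real calibration the cost is a distance between model output and data, and the parameter space is multi-dimensional, but the two-stage structure is the same.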
Geostatistical methods for hazard assessment and site characterization in mining
Riefenberg, J.
1996-12-01
Ground control hazards, coal quality, ore reserve estimation, and pollution modeling seem unrelated topics from most mining perspectives. However, geostatistical methods can be used to characterize each of these topics, and more. Exploratory drill core data, together with continued drilling and field measurements, can provide a wealth of information related to each of the above areas and are often severely underutilized. Recent studies have led to the development of the Multiple Parameter Mapping (MPM) technology, which utilizes geostatistics and other numerical modeling methods to generate a "hazard index" map, often from exploratory drill core data. This mapping has been presented for ground control hazards relating to roof quality, floor quality, numerically modeled stresses due to mining geometry, and geologic features. A review of the MPM method, future directions for the MPM, and a discussion of using these and other geostatistical methods to quantify coal quality, ore reserve estimation, and pollutant modeling are presented in this paper.
NASA Astrophysics Data System (ADS)
Pham, Hai V.; Tsai, Frank T.-C.
2015-09-01
The lack of hydrogeological data and knowledge often results in different propositions (or alternatives) to represent uncertain model components and creates many candidate groundwater models from the same data. Uncertainty of groundwater head prediction may become unnecessarily high. This study introduces an experimental design to identify propositions in each uncertain model component and decrease the prediction uncertainty by reducing conceptual model uncertainty. A discrimination criterion is developed based on posterior model probability that directly uses data to evaluate model importance. Bayesian model averaging (BMA) is used to predict future observation data. The experimental design aims to find the optimal number and location of future observations and the number of sampling rounds such that the desired discrimination criterion is met. Hierarchical Bayesian model averaging (HBMA) is adopted to assess whether highly probable propositions can be identified and the conceptual model uncertainty can be reduced by the experimental design. The experimental design is applied to a groundwater study in the Baton Rouge area, Louisiana. We design a new groundwater head observation network based on existing USGS observation wells. The sources of uncertainty that create multiple groundwater models are geological architecture, boundary conditions, and fault permeability architecture. All possible design solutions are enumerated using a multi-core supercomputer. Several design solutions are found to achieve an 80%-identifiable groundwater model in 5 years by using six or more existing USGS wells. The HBMA result shows that each highly probable proposition can be identified for each uncertain model component once the discrimination criterion is achieved. The variances of groundwater head predictions are significantly decreased by reducing the posterior model probabilities of unimportant propositions.
Bayesian model selection of template forward models for EEG source reconstruction.
Strobbe, Gregor; van Mierlo, Pieter; De Vos, Maarten; Mijović, Bogdan; Hallez, Hans; Van Huffel, Sabine; López, José David; Vandenberghe, Stefaan
2014-06-01
Several EEG source reconstruction techniques have been proposed to identify the generating neuronal sources of electrical activity measured on the scalp. The solution of these techniques depends directly on the accuracy of the forward model that is inverted. Recently, a parametric empirical Bayesian (PEB) framework for distributed source reconstruction in EEG/MEG was introduced and implemented in the Statistical Parametric Mapping (SPM) software. The framework allows us to compare different forward modeling approaches, using real data, instead of using more traditional simulated data from an assumed true forward model. In the absence of a subject specific MR image, a 3-layered boundary element method (BEM) template head model is currently used including a scalp, skull and brain compartment. In this study, we introduced volumetric template head models based on the finite difference method (FDM). We constructed a FDM head model equivalent to the BEM model and an extended FDM model including CSF. These models were compared within the context of three different types of source priors related to the type of inversion used in the PEB framework: independent and identically distributed (IID) sources, equivalent to classical minimum norm approaches, coherence (COH) priors similar to methods such as LORETA, and multiple sparse priors (MSP). The resulting models were compared based on ERP data of 20 subjects using Bayesian model selection for group studies. The reconstructed activity was also compared with the findings of previous studies using functional magnetic resonance imaging. We found very strong evidence in favor of the extended FDM head model with CSF and assuming MSP. These results suggest that the use of realistic volumetric forward models can improve PEB EEG source reconstruction.
Zhong, Buqing; Liang, Tao; Wang, Lingqing; Li, Kexin
2014-08-15
An extensive soil survey was conducted to study pollution sources and delineate contamination by heavy metals in one of the metalliferous industrial bases in the karst areas of southwest China. A total of 597 topsoil samples were collected and the concentrations of five heavy metals, namely Cd, As (metalloid), Pb, Hg and Cr, were analyzed. Stochastic models including a conditional inference tree (CIT) and a finite mixture distribution model (FMDM) were applied to identify the sources and partition the contributions from natural and anthropogenic sources for heavy metals in topsoils of the study area. The regression trees for Cd, As, Pb and Hg proved to depend mostly on indicators of anthropogenic activities such as industrial type and distance from the urban area, while the regression tree for Cr was found to be mainly influenced by geogenic characteristics. The FMDM analysis showed that the geometric means of the modeled background values for Cd, As, Pb, Hg and Cr were close to their background values previously reported for the study area, while the contamination of Cd and Hg was widespread in the study area, imposing potentially detrimental effects on organisms through the food chain. Finally, the probabilities of single and multiple heavy metals exceeding the threshold values derived from the FMDM were estimated using indicator kriging (IK) and multivariate indicator kriging (MVIK). The high probabilities of exceeding the thresholds were associated with metalliferous production and atmospheric deposition of heavy metals transported from the urban and industrial areas. Geostatistics coupled with stochastic models provide an effective way to delineate multiple heavy metal pollution and facilitate improved environmental management. PMID:24875258
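As a rough illustration of the mixture idea behind the FMDM, here is a two-component one-dimensional Gaussian mixture fitted by expectation-maximization. The actual study models concentration distributions with more care, so treat the function name, initialization, and defaults as assumptions.

```python
import math
import statistics

def em_two_gaussians(x, iters=200):
    """Fit a two-component 1-D Gaussian mixture by EM, a stand-in for a
    finite mixture model separating background from elevated concentrations."""
    x = sorted(x)
    n = len(x)
    # crude initialization: split the sorted data in half
    mu = [statistics.mean(x[:n // 2]), statistics.mean(x[n // 2:])]
    var = [statistics.pvariance(x)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        r = []
        for xi in x:
            p = [w[k] * math.exp(-(xi - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = p[0] + p[1]
            r.append([p[0] / s, p[1] / s])
        # M-step: re-estimate weights, means, and variances
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            w[k] = nk / n
            mu[k] = sum(ri[k] * xi for ri, xi in zip(r, x)) / nk
            var[k] = max(sum(ri[k] * (xi - mu[k]) ** 2
                             for ri, xi in zip(r, x)) / nk, 1e-9)
    return w, mu, var
```

The lower-mean component plays the role of the geogenic background; the upper component absorbs anthropogenically elevated samples, and exceedance probabilities can then be read off the fitted components.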
Using Bayesian Model Selection to Characterize Neonatal Eeg Recordings
NASA Astrophysics Data System (ADS)
Mitchell, Timothy J.
2009-12-01
The brains of premature infants must undergo significant maturation outside of the womb and are thus particularly susceptible to injury. Electroencephalographic (EEG) recordings are an important diagnostic tool in determining if a newborn's brain is functioning normally or if injury has occurred. However, interpreting the recordings is difficult and requires the skills of a trained electroencephalographer. Because these EEG specialists are rare, an automated interpretation of newborn EEG recordings would increase access to an important diagnostic tool for physicians. To automate this procedure, we employ Bayesian probability theory to compute the posterior probability for the EEG features of interest and use the results in a program designed to mimic EEG specialists. Specifically, we will be identifying waveforms of varying frequency and amplitude, as well as periods of flat recordings where brain activity is minimal.
Bayesian-MCMC-based parameter estimation of stealth aircraft RCS models
NASA Astrophysics Data System (ADS)
Xia, Wei; Dai, Xiao-Xia; Feng, Yuan
2015-12-01
When modeling a stealth aircraft with low RCS (Radar Cross Section), conventional parameter estimation methods may cause a deviation from the actual distribution, owing to the fact that the characteristic parameters are estimated via directly calculating the statistics of RCS. The Bayesian-Markov Chain Monte Carlo (Bayesian-MCMC) method is introduced herein to estimate the parameters so as to improve the fitting accuracies of fluctuation models. The parameter estimations of the lognormal and the Legendre polynomial models are reformulated in the Bayesian framework. The MCMC algorithm is then adopted to calculate the parameter estimates. Numerical results show that the distribution curves obtained by the proposed method exhibit improved consistence with the actual ones, compared with those fitted by the conventional method. The fitting accuracy could be improved by no less than 25% for both fluctuation models, which implies that the Bayesian-MCMC method might be a good candidate among the optimal parameter estimation methods for stealth aircraft RCS models. Project supported by the National Natural Science Foundation of China (Grant No. 61101173), the National Basic Research Program of China (Grant No. 613206), the National High Technology Research and Development Program of China (Grant No. 2012AA01A308), the State Scholarship Fund by the China Scholarship Council (CSC), and the Oversea Academic Training Funds, and University of Electronic Science and Technology of China (UESTC).
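The Bayesian-MCMC estimation step can be sketched with a random-walk Metropolis sampler for the lognormal fluctuation model's parameters. The flat prior, step size, and starting point are illustrative assumptions, not the paper's settings.

```python
import math
import random

def log_post(mu, sigma, data):
    """Unnormalized log-posterior for lognormal(mu, sigma) data with a flat
    (improper) prior restricted to sigma > 0 -- purely illustrative."""
    if sigma <= 0.0:
        return -math.inf
    ll = 0.0
    for x in data:
        ll += -math.log(x * sigma * math.sqrt(2.0 * math.pi)) \
              - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
    return ll

def metropolis(data, n=5000, step=0.1, seed=3):
    """Random-walk Metropolis over (mu, sigma); returns the full chain."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # arbitrary starting point (assumption)
    cur = log_post(mu, sigma, data)
    samples = []
    for _ in range(n):
        mu_p = mu + rng.gauss(0.0, step)
        sigma_p = sigma + rng.gauss(0.0, step)
        prop = log_post(mu_p, sigma_p, data)
        # accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, prop - cur)):
            mu, sigma, cur = mu_p, sigma_p, prop
        samples.append((mu, sigma))
    return samples
```

Posterior summaries (means, credible intervals) are then computed from the chain after discarding a burn-in portion, rather than from direct statistics of the RCS samples.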
A Bayesian approach to the semi-analytic model of galaxy formation
NASA Astrophysics Data System (ADS)
Lu, Yu
It is believed that a wide range of physical processes conspire to shape the observed galaxy population, but their detailed interactions remain uncertain. The semi-analytic model (SAM) of galaxy formation uses multi-dimensional parameterizations of the physical processes of galaxy formation and provides a tool to constrain these underlying physical interactions. Because of the high dimensionality and large uncertainties in the model, the parametric problem of galaxy formation can be profitably tackled with a Bayesian-inference based approach, which allows one to constrain theory with data in a statistically rigorous way. In this thesis, I present a newly developed method to build a SAM upon the framework of Bayesian inference. I show that, aided by advanced Markov chain Monte Carlo algorithms, the method has the power to efficiently combine information from diverse data sources, rigorously establish confidence bounds on model parameters, and provide powerful probability-based methods for hypothesis testing. Using various data sets (stellar mass function, conditional stellar mass function, K-band luminosity function, and cold gas mass functions) of galaxies in the local Universe, I carry out a series of Bayesian model inferences. The results show, first, that the SAM contains huge degeneracies among its parameters, indicating that some of the conclusions drawn previously with the conventional approach may not be truly valid and need to be revisited by the Bayesian approach. Second, some of the degeneracy of the model can be broken by adopting multiple data sets that constrain different aspects of the galaxy population. Third, the inferences reveal that the model struggles to simultaneously explain some important observational results, suggesting that some key physics governing the evolution of star formation and feedback may still be missing from the model. These analyses show clearly that the Bayesian-inference based SAM can be used to perform systematic and statistically
Likelihood-free Bayesian computation for structural model calibration: a feasibility study
NASA Astrophysics Data System (ADS)
Jin, Seung-Seop; Jung, Hyung-Jo
2016-04-01
Finite element (FE) model updating is often used to associate FE models with corresponding existing structures for condition assessment. FE model updating is an inverse problem and prone to being ill-posed and ill-conditioned when there are many errors and uncertainties in both an FE model and its corresponding measurements. In this case, it is important to quantify these uncertainties properly. Bayesian FE model updating is one of the well-known methods to quantify parameter uncertainty by updating our prior belief on the parameters with the available measurements. In Bayesian inference, the likelihood plays a central role in summarizing the overall residuals between model predictions and corresponding measurements. Therefore, the likelihood should be carefully chosen to reflect the characteristics of the residuals. It is generally known that very little or no information is available regarding the statistical characteristics of the residuals. In most cases, the likelihood is assumed to be an independent, identically distributed Gaussian distribution with zero mean and constant variance. However, this assumption may cause biased and over/underestimated parameter estimates, so that the uncertainty quantification and prediction are questionable. To alleviate the potential misuse of an inadequate likelihood, this study introduced approximate Bayesian computation (i.e., likelihood-free Bayesian inference), which relaxes the need for an explicit likelihood by analyzing behavioral similarities between model predictions and measurements. We performed FE model updating based on likelihood-free Markov chain Monte Carlo (MCMC) without using the likelihood. Based on the results of the numerical study, we observed that likelihood-free Bayesian computation can quantify the updating parameters correctly, and that its predictive capability for measurements not used in calibration is also secured.
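Approximate Bayesian computation in its simplest (rejection) form can be sketched as below; the study uses a likelihood-free MCMC variant, so this shows only the underlying idea, with all names and tolerances assumed.

```python
import random

def abc_rejection(observed, simulate, prior_draw, distance,
                  n_draws=20000, eps=0.3, seed=7):
    """Likelihood-free rejection sampler (approximate Bayesian computation):
    draw parameters from the prior, simulate, and keep draws whose simulated
    summary lies within eps of the observed one. No explicit likelihood."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        if distance(simulate(theta, rng), observed) <= eps:
            accepted.append(theta)
    return accepted
```

For instance, inferring a Gaussian mean from an observed sample mean: `simulate` returns the mean of simulated responses, `distance` is the absolute difference, and the accepted draws approximate the posterior. In FE updating, `simulate` would run the FE model and the summary would compare predicted and measured responses.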
Fully Bayesian mixture model for differential gene expression: simulations and model checks.
Lewin, Alex; Bochkina, Natalia; Richardson, Sylvia
2007-01-01
We present a Bayesian hierarchical model for detecting differentially expressed genes using a mixture prior on the parameters representing differential effects. We formulate an easily interpretable 3-component mixture to classify genes as over-expressed, under-expressed and non-differentially expressed, and model gene variances as exchangeable to allow for variability between genes. We show how the proportion of differentially expressed genes, and the mixture parameters, can be estimated in a fully Bayesian way, extending previous approaches where this proportion was fixed and empirically estimated. Good estimates of the false discovery rates are also obtained. Different parametric families for the mixture components can lead to quite different classifications of genes for a given data set. Using Affymetrix data from a knock out and wildtype mice experiment, we show how predictive model checks can be used to guide the choice between possible mixture priors. These checks show that extending the mixture model to allow extra variability around zero instead of the usual point mass null fits the data better. A software package for R is available.
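Once the mixture parameters are estimated, classifying a gene reduces to computing posterior component probabilities. The sketch below assumes Gaussian components with known parameters, unlike the fully Bayesian estimation in the paper; all names and values are illustrative.

```python
import math

def classify_gene(effect, weights, means, sds):
    """Posterior class probabilities for one gene's differential effect under
    a 3-component mixture (under-, non-, over-expressed), by Bayes' rule."""
    dens = [w * math.exp(-(effect - m) ** 2 / (2.0 * s * s))
            / (s * math.sqrt(2.0 * math.pi))
            for w, m, s in zip(weights, means, sds)]
    total = sum(dens)
    return [d / total for d in dens]
```

A gene with a large positive effect gets nearly all its posterior mass on the over-expressed component; genes near zero fall to the (tighter) null component.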
Höhna, Sebastian; Landis, Michael J; Heath, Tracy A; Boussau, Bastien; Lartillot, Nicolas; Moore, Brian R; Huelsenbeck, John P; Ronquist, Fredrik
2016-07-01
Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com. [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.] PMID:27235697
Bayesian state space models for dynamic genetic network construction across multiple tissues.
Liang, Yulan; Kelemen, Arpad
2016-08-01
Construction of gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research for complex diseases; estimating the dynamic changes of the temporal correlations and the non-stationarity is key in this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge for inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix as time-variant and include temporal correlation structures in the covariance matrix estimations in the multivariate Bayesian state space models. The unevenly spaced short time courses with unseen time points are treated as hidden state variables. Hierarchical Bayesian approaches with various prior and hyper-prior models with Markov chain Monte Carlo and Gibbs sampling algorithms are used to estimate the model parameters and the hidden state variables. We apply the proposed hierarchical Bayesian state space models to Affymetrix time course data sets from multiple tissues (liver, skeletal muscle, and kidney) following corticosteroid (CS) drug administration. Both simulation and real data analysis results show that the genomic changes over time and the gene-gene interactions in response to CS treatment can be well captured by the proposed models. The proposed dynamic hierarchical Bayesian state space modeling approaches could be expanded and applied to other large-scale genomic data, such as next-generation sequencing (NGS) data combined with real-time, time-varying electronic health record (EHR) data, for more comprehensive and robust systematic and network-based analysis in order to transform big biomedical data into predictions and diagnostics for precision medicine and personalized healthcare with better decision making and patient outcomes. PMID:27343475
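At the core of any linear-Gaussian state space model is the predict/update recursion. The scalar filter below is a deliberately minimal sketch (time-invariant matrices, no hierarchy, no missing data), far simpler than the models in the paper; all names and defaults are assumptions.

```python
def kalman_filter_1d(ys, a=1.0, c=1.0, q=0.01, r=0.25, m0=0.0, p0=1.0):
    """Scalar linear-Gaussian state space filter for
    x_t = a * x_{t-1} + noise(q),  y_t = c * x_t + noise(r).
    Returns the filtered state means and variances at each step."""
    m, p = m0, p0
    means, variances = [], []
    for y in ys:
        # predict: propagate the state estimate through the dynamics
        m_pred = a * m
        p_pred = a * a * p + q
        # update: blend prediction and observation via the Kalman gain
        k = p_pred * c / (c * c * p_pred + r)
        m = m_pred + k * (y - c * m_pred)
        p = (1.0 - k * c) * p_pred
        means.append(m)
        variances.append(p)
    return means, variances
```

The hierarchical models in the paper generalize this by letting the transition and observation matrices vary in time and by placing priors (and hyper-priors) on them, with MCMC replacing the closed-form update.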
Medical Inpatient Journey Modeling and Clustering: A Bayesian Hidden Markov Model Based Approach
Huang, Zhengxing; Dong, Wei; Wang, Fei; Duan, Huilong
2015-01-01
Modeling and clustering medical inpatient journeys is useful to healthcare organizations for a number of reasons, including reorganizing inpatient journeys in a way that is more convenient to understand and browse. In this study, we present a probabilistic model-based approach to model and cluster medical inpatient journeys. Specifically, we exploit a Bayesian hidden Markov model based approach to transform medical inpatient journeys into a probabilistic space, which can be seen as a richer representation of the inpatient journeys to be clustered. Then, using hierarchical clustering on the matrix of similarities, inpatient journeys can be clustered into different categories with respect to their clinical and temporal characteristics. We evaluated the proposed approach on a real clinical data set pertaining to the unstable angina treatment process. The experimental results reveal that our method can identify and model latent treatment topics underlying personalized inpatient journeys, and yield impressive clustering quality. PMID:26958200
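The probabilistic backbone of an HMM-based approach is the forward recursion for sequence likelihood, which is what makes a "probabilistic space" of journeys comparable. The sketch below handles discrete observations with assumed parameter names; it is not the paper's Bayesian treatment.

```python
def hmm_forward(obs, pi, A, B):
    """Forward algorithm: likelihood of a discrete observation sequence under
    an HMM with initial distribution pi, transition matrix A[i][j], and
    emission matrix B[state][symbol]."""
    n = len(pi)
    # initialize with the first observation
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # recurse: sum over all paths into each state, then emit
    for o in obs[1:]:
        alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(n))
                 for j in range(n)]
    return sum(alpha)
```

Evaluating each journey's likelihood under each fitted model yields the similarity matrix that the hierarchical clustering step then operates on.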
A General and Flexible Approach to Estimating the Social Relations Model Using Bayesian Methods
ERIC Educational Resources Information Center
Ludtke, Oliver; Robitzsch, Alexander; Kenny, David A.; Trautwein, Ulrich
2013-01-01
The social relations model (SRM) is a conceptual, methodological, and analytical approach that is widely used to examine dyadic behaviors and interpersonal perception within groups. This article introduces a general and flexible approach to estimating the parameters of the SRM that is based on Bayesian methods using Markov chain Monte Carlo…
Bayesian Analysis of Structural Equation Models with Nonlinear Covariates and Latent Variables
ERIC Educational Resources Information Center
Song, Xin-Yuan; Lee, Sik-Yum
2006-01-01
In this article, we formulate a nonlinear structural equation model (SEM) that can accommodate covariates in the measurement equation and nonlinear terms of covariates and exogenous latent variables in the structural equation. The covariates can come from continuous or discrete distributions. A Bayesian approach is developed to analyze the…
Bayesian Inference for Growth Mixture Models with Latent Class Dependent Missing Data
ERIC Educational Resources Information Center
Lu, Zhenqiu Laura; Zhang, Zhiyong; Lubke, Gitta
2011-01-01
"Growth mixture models" (GMMs) with nonignorable missing data have drawn increasing attention in research communities but have not been fully studied. The goal of this article is to propose and to evaluate a Bayesian method to estimate the GMMs with latent class dependent missing data. An extended GMM is first presented in which class…
ERIC Educational Resources Information Center
Tchumtchoua, Sylvie; Dey, Dipak K.
2012-01-01
This paper proposes a semiparametric Bayesian framework for the analysis of associations among multivariate longitudinal categorical variables in high-dimensional data settings. This type of data is frequent, especially in the social and behavioral sciences. A semiparametric hierarchical factor analysis model is developed in which the…
Hierarchical Bayesian Model (HBM) - Derived Estimates of Air Quality for 2007: Annual Report
This report describes EPA's Hierarchical Bayesian model generated (HBM) estimates of ozone (O_{3}) and fine particulate matter (PM_{2.5}, particles with aerodynamic diameter < 2.5 microns) concentrations throughout the continental United States during the 2007 calen...
Hierarchical Bayesian Model (HBM) - Derived Estimates of Air Quality for 2008: Annual Report
This report describes EPA's Hierarchical Bayesian model generated (HBM) estimates of ozone (O_{3}) and fine particulate matter (PM_{2.5}, particles with aerodynamic diameter < 2.5 microns) concentrations throughout the continental United States during the 2008 ca...
The Bayesian Evaluation of Categorization Models: Comment on Wills and Pothos (2012)
ERIC Educational Resources Information Center
Vanpaemel, Wolf; Lee, Michael D.
2012-01-01
Wills and Pothos (2012) reviewed approaches to evaluating formal models of categorization, raising a series of worthwhile issues, challenges, and goals. Unfortunately, in discussing these issues and proposing solutions, Wills and Pothos (2012) did not consider Bayesian methods in any detail. This means not only that their review excludes a major…
ERIC Educational Resources Information Center
Lee, Sik-Yum; Song, Xin-Yuan; Cai, Jing-Heng
2010-01-01
Analysis of ordered binary and unordered binary data has received considerable attention in social and psychological research. This article introduces a Bayesian approach, which has several nice features in practical applications, for analyzing nonlinear structural equation models with dichotomous data. We demonstrate how to use the software…
Lin, Lin; Chan, Cliburn; West, Mike
2016-01-01
We discuss the evaluation of subsets of variables for the discriminative evidence they provide in multivariate mixture modeling for classification. The novel development of Bayesian classification analysis presented is partly motivated by problems of design and selection of variables in biomolecular studies, particularly involving widely used assays of large-scale single-cell data generated using flow cytometry technology. For such studies and for mixture modeling generally, we define discriminative analysis that overlays fitted mixture models using a natural measure of concordance between mixture component densities, and define an effective and computationally feasible method for assessing and prioritizing subsets of variables according to their roles in discrimination of one or more mixture components. We relate the new discriminative information measures to Bayesian classification probabilities and error rates, and exemplify their use in Bayesian analysis of Dirichlet process mixture models fitted via Markov chain Monte Carlo methods as well as using a novel Bayesian expectation-maximization algorithm. We present a series of theoretical and simulated data examples to fix concepts and exhibit the utility of the approach, and compare with prior approaches. We demonstrate application in the context of automatic classification and discriminative variable selection in high-throughput systems biology using large flow cytometry datasets. PMID:26040910
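One natural "measure of concordance between mixture component densities" of the kind this abstract mentions is the closed-form Bhattacharyya coefficient between Gaussian components; the paper defines its own discriminative measures, so the sketch below is only an indicative stand-in.

```python
import numpy as np

def bhattacharyya_gauss(m1, S1, m2, S2):
    # Closed-form Bhattacharyya coefficient between two Gaussian
    # densities: 1 for identical components, approaching 0 as they separate.
    S = 0.5 * (S1 + S2)
    dm = m1 - m2
    dist = (dm @ np.linalg.solve(S, dm) / 8.0
            + 0.5 * np.log(np.linalg.det(S)
                           / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))
    return np.exp(-dist)

S = np.eye(2)
c_near = bhattacharyya_gauss(np.zeros(2), S, np.array([0.5, 0.0]), S)
c_far = bhattacharyya_gauss(np.zeros(2), S, np.array([4.0, 0.0]), S)
print(c_near, c_far)  # overlap shrinks as components separate
```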
Hierarchical Bayesian Model (HBM)-Derived Estimates of Air Quality for 2004 - Annual Report
This report describes EPA's Hierarchical Bayesian model-generated (HBM) estimates of O_{3} and PM_{2.5} concentrations throughout the continental United States during the 2004 calendar year. HBM estimates provide the spatial and temporal variance of O_{3} ...
Kharroubi, Samer A; Brennan, Alan; Strong, Mark
2011-01-01
Expected value of sample information (EVSI) involves simulating data collection, Bayesian updating, and reexamining decisions. Bayesian updating in incomplete data models typically requires Markov chain Monte Carlo (MCMC). This article describes a revision to a form of Bayesian Laplace approximation for EVSI computation to support decisions in incomplete data models. The authors develop the approximation, setting out the mathematics for the likelihood and log posterior density function, which are necessary for the method. They compare the accuracy of EVSI estimates in a case study cost-effectiveness model using first- and second-order versions of their approximation formula and traditional Monte Carlo. Computational efficiency gains depend on the complexity of the net benefit functions, the number of inner-level Monte Carlo samples used, and the requirement or otherwise for MCMC methods to produce the posterior distributions. This methodology provides a new and valuable approach for EVSI computation in health economic decision models and potential wider benefits in many fields requiring Bayesian approximation. PMID:21512189
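The nested structure of an EVSI computation (outer draws of the uncertain parameter, simulated study data, Bayesian updating, re-optimized decision) can be sketched in a toy conjugate-normal setting where no MCMC or Laplace approximation is needed; all numbers below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-option decision: incremental net benefit theta ~ N(mu0, tau0^2);
# a proposed study of size n yields xbar ~ N(theta, s^2/n).
mu0, tau0, s, n = 0.1, 1.0, 2.0, 30
prior_value = max(0.0, mu0)                 # decide now: adopt iff E[theta] > 0

M = 20000
theta = rng.normal(mu0, tau0, M)            # outer loop: draw true effect
xbar = rng.normal(theta, s / np.sqrt(n))    # simulate the study result
post_var = 1.0 / (1.0 / tau0**2 + n / s**2) # conjugate Bayesian update
post_mean = post_var * (mu0 / tau0**2 + xbar * n / s**2)
value_with_data = np.maximum(0.0, post_mean).mean()  # re-optimized decision
evsi = value_with_data - prior_value
print(evsi)
```

Jensen's inequality guarantees the EVSI is non-negative: seeing the study result can only improve the expected decision value.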
Bayesian Structural Equation Modeling: A More Flexible Representation of Substantive Theory
ERIC Educational Resources Information Center
Muthen, Bengt; Asparouhov, Tihomir
2012-01-01
This article proposes a new approach to factor analysis and structural equation modeling using Bayesian analysis. The new approach replaces parameter specifications of exact zeros with approximate zeros based on informative, small-variance priors. It is argued that this produces an analysis that better reflects substantive theories. The proposed…
Pretense, Counterfactuals, and Bayesian Causal Models: Why What Is Not Real Really Matters
ERIC Educational Resources Information Center
Weisberg, Deena S.; Gopnik, Alison
2013-01-01
Young children spend a large portion of their time pretending about non-real situations. Why? We answer this question by using the framework of Bayesian causal models to argue that pretending and counterfactual reasoning engage the same component cognitive abilities: disengaging with current reality, making inferences about an alternative…
Hierarchical Bayesian Model (HBM)-Derived Estimates of Air Quality for 2006 - Annual Report
This report describes EPA's Hierarchical Bayesian model-generated (HBM) estimates of O_{3} and PM_{2.5} concentrations throughout the continental United States during the 2006 calendar year. HBM estimates provide the spatial and temporal variance of O_{3} ...
ERIC Educational Resources Information Center
Wang, Qiu; Diemer, Matthew A.; Maier, Kimberly S.
2013-01-01
This study integrated Bayesian hierarchical modeling and receiver operating characteristic analysis (BROCA) to evaluate how interest strength (IS) and interest differentiation (ID) predicted low–socioeconomic status (SES) youth's interest-major congruence (IMC). Using large-scale Kuder Career Search online-assessment data, this study fit three…
Hierarchical Bayesian Model (HBM)-Derived Estimates of Air Quality for 2005 - Annual Report
This report describes EPA's Hierarchical Bayesian model-generated (HBM) estimates of O_{3} and PM_{2.5} concentrations throughout the continental United States during the 2005 calendar year. HBM estimates provide the spatial and temporal variance of O_{3} ...
Hierarchical Bayesian Model (HBM)-Derived Estimates of Air Quality for 2003 – Annual Report
This report describes EPA's Hierarchical Bayesian model-generated (HBM) estimates of O_{3} and PM_{2.5} concentrations throughout the continental United States during the 2003 calendar year. HBM estimates provide the spatial and temporal variance of O_{3} ...
Hierarchical Bayesian Model (HBM)-Derived Estimates of Air Quality for 2002 - Annual Report
This report describes EPA's Hierarchical Bayesian model-generated (HBM) estimates of O_{3} and PM_{2.5} concentrations throughout the continental United States during the 2002 calendar year. HBM estimates provide the spatial and temporal variance of O_{3} ...
Hierarchical Bayesian Model (HBM)-Derived Estimates of Air Quality for 2001 - Annual Report
This report describes EPA's Hierarchical Bayesian model-generated (HBM) estimates of O_{3} and PM_{2.5} concentrations throughout the continental United States during the 2001 calendar year. HBM estimates provide the spatial and temporal variance of O_{3}...
A Robust Bayesian Approach for Structural Equation Models with Missing Data
ERIC Educational Resources Information Center
Lee, Sik-Yum; Xia, Ye-Mao
2008-01-01
In this paper, normal/independent distributions, including but not limited to the multivariate t distribution, the multivariate contaminated distribution, and the multivariate slash distribution, are used to develop a robust Bayesian approach for analyzing structural equation models with complete or missing data. In the context of a nonlinear…
Xu, Chengcheng; Wang, Wei; Liu, Pan; Li, Zhibin
2015-12-01
This study aimed to develop a real-time crash risk model with limited data in China by using a Bayesian meta-analysis and Bayesian inference approach. A systematic review was first conducted using three different Bayesian meta-analyses: a fixed effect meta-analysis, a random effect meta-analysis, and a meta-regression. The meta-analyses provided a numerical summary of the effects of traffic variables on crash risk by quantitatively synthesizing results from previous studies. The random effect meta-analysis and the meta-regression produced more conservative estimates of the effects of traffic variables than the fixed effect meta-analysis. The meta-analysis results were then used as informative priors for developing crash risk models with limited data. The choice among the three meta-analyses significantly affects model fit and prediction accuracy: the model based on the meta-regression increases prediction accuracy by about 15% compared with a model developed directly from the limited data. Finally, Bayesian predictive density analysis was used to identify outliers in the limited data, which further improved prediction accuracy by 5.0%.
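The core idea of carrying a meta-analytic summary into a small-sample analysis as an informative prior reduces, in the simplest conjugate-normal case, to precision weighting; the effect sizes below are hypothetical.

```python
# Informative-prior update in the spirit of the paper: a meta-analysis
# summary (hypothetical pooled effect 0.08, SE 0.03) serves as the prior
# for a coefficient estimated from a small local sample.
mu_prior, se_prior = 0.08, 0.03
local_est, se_local = 0.20, 0.10            # noisy estimate, limited data

w_prior, w_local = 1 / se_prior**2, 1 / se_local**2   # precisions
post_mean = (w_prior * mu_prior + w_local * local_est) / (w_prior + w_local)
post_se = (w_prior + w_local) ** -0.5
print(post_mean, post_se)  # shrunk toward the meta-analytic prior
```

The posterior mean lands between the prior and the local estimate, weighted by precision, and the posterior SE is smaller than either input alone.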
NASA Astrophysics Data System (ADS)
Chen, X.; Hao, Z.; Devineni, N.; Lall, U.
2014-04-01
A Hierarchical Bayesian model is presented for one-season-ahead forecasts of summer rainfall and streamflow using exogenous climate variables for east central China. The model provides estimates of the posterior forecast probability distribution for 12 rainfall and 2 streamflow stations, considering parameter uncertainty and cross-site correlation. The model has a multi-level structure with regression coefficients modeled from a common multivariate normal distribution, resulting in partial pooling of information across multiple stations and better representation of parameter and posterior distribution uncertainty. The covariance structure of the residuals across stations is explicitly modeled. Model performance is tested under leave-10-out cross-validation. Frequentist and Bayesian performance metrics used include the receiver operating characteristic, reduction of error, coefficient of efficiency, rank probability skill scores, and coverage by posterior credible intervals. The ability of the model to reliably forecast season-ahead regional summer rainfall and streamflow offers potential for developing adaptive water risk management strategies.
An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit
Wong, Rowena Syn Yin; Ismail, Noor Azina
2016-01-01
Background and Objectives There are not many studies that attempt to model intensive care unit (ICU) risk of death in developing countries, especially in South East Asia. The aim of this study was to propose and describe the application of a Bayesian approach to modeling in-ICU deaths in a Malaysian ICU. Methods This was a prospective study in a mixed medical-surgical ICU in a multidisciplinary tertiary referral hospital in Malaysia. Data collection included variables that were defined in the Acute Physiology and Chronic Health Evaluation IV (APACHE IV) model. A Bayesian Markov chain Monte Carlo (MCMC) simulation approach was applied in the development of four multivariate logistic regression predictive models for the ICU, where the main outcome measure was in-ICU mortality risk. The performance of the models was assessed through overall model fit, discrimination, and calibration measures. Results from the Bayesian models were also compared against results obtained using the frequentist maximum likelihood method. Results The study involved 1,286 consecutive ICU admissions between January 1, 2009 and June 30, 2010, of which 1,111 met the inclusion criteria. Patients who were admitted to the ICU were generally younger, predominantly male, with a low co-morbidity load, and mostly under mechanical ventilation. The overall in-ICU mortality rate was 18.5% and the overall mean Acute Physiology Score (APS) was 68.5. All four models exhibited good discrimination, with area under the receiver operating characteristic curve (AUC) values of approximately 0.8. Calibration was acceptable (Hosmer-Lemeshow p-values > 0.05) for all models except model M3. Model M1 was identified as the model with the best overall performance in this study. Conclusion Four prediction models were proposed, and the best model was chosen based on its overall performance in this study. This study has also demonstrated the promising potential of the Bayesian MCMC approach as an alternative in the analysis and modeling of
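A minimal version of the Bayesian MCMC machinery behind such models is a random-walk Metropolis sampler for a logistic regression; the sketch below uses one synthetic predictor rather than the APACHE IV covariates.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data for a one-predictor logistic regression.
x = rng.normal(size=200)
true_b = np.array([-1.5, 1.0])               # intercept, slope
p = 1 / (1 + np.exp(-(true_b[0] + true_b[1] * x)))
y = rng.random(200) < p

def log_post(b):
    # Log-likelihood plus a vague N(0, 10 I) prior.
    eta = b[0] + b[1] * x
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    return loglik - 0.5 * b @ b / 10.0

# Random-walk Metropolis: 4000 iterations, first 1000 as burn-in.
b = np.zeros(2); lp = log_post(b); samples = []
for i in range(4000):
    prop = b + 0.2 * rng.normal(size=2)
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:
        b, lp = prop, lp_prop
    if i >= 1000:
        samples.append(b.copy())
post = np.array(samples)
print(post.mean(axis=0))  # posterior means should lie near the true coefficients
```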
Comparison of a Bayesian network with a logistic regression model to forecast IgA nephropathy.
Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre
2013-01-01
Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network with logistic regression for forecasting IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to the data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristic (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached 100% versus 67%, and specificity 73% versus 95%, using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression for estimating the probability that a patient suffers from IgAN, using simple clinical and biological data obtained during consultation. PMID:24328031
Equifinality of formal (DREAM) and informal (GLUE) Bayesian approaches in hydrologic modeling?
Vrugt, Jasper A; Robinson, Bruce A; Ter Braak, Cajo J F; Gupta, Hoshin V
2008-01-01
In recent years, a strong debate has emerged in the hydrologic literature regarding what constitutes an appropriate framework for uncertainty estimation. Particularly, there is strong disagreement whether an uncertainty framework should have its roots within a proper statistical (Bayesian) context, or whether such a framework should be based on a different philosophy and implement informal measures and weaker inference to summarize parameter and predictive distributions. In this paper, we compare a formal Bayesian approach using Markov Chain Monte Carlo (MCMC) with generalized likelihood uncertainty estimation (GLUE) for assessing uncertainty in conceptual watershed modeling. Our formal Bayesian approach is implemented using the recently developed differential evolution adaptive metropolis (DREAM) MCMC scheme with a likelihood function that explicitly considers model structural, input and parameter uncertainty. Our results demonstrate that DREAM and GLUE can generate very similar estimates of total streamflow uncertainty. This suggests that formal and informal Bayesian approaches have more common ground than the hydrologic literature and ongoing debate might suggest. The main advantage of formal approaches is, however, that they attempt to disentangle the effect of forcing, parameter and model structural error on total predictive uncertainty. This is key to improving hydrologic theory and to better understand and predict the flow of water through catchments.
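The formal-versus-informal contrast can be made concrete on a one-parameter toy model: a Gaussian likelihood (standing in for DREAM's formal posterior) against a GLUE-style behavioral weighting with a Nash-Sutcliffe threshold; all data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy one-parameter "watershed" model: q = k * rain, true k = 0.7.
rain = rng.gamma(2.0, 2.0, 60)
q_obs = 0.7 * rain + rng.normal(0, 0.5, 60)

k_grid = np.linspace(0.2, 1.2, 201)
sse = np.array([np.sum((q_obs - k * rain) ** 2) for k in k_grid])

# Formal route: Gaussian likelihood with known error sigma = 0.5.
formal = np.exp(-0.5 * sse / 0.5**2)
formal /= formal.sum()

# Informal route (GLUE): keep runs with Nash-Sutcliffe score > 0.5,
# weight them by their score, discard the rest.
ns = 1 - sse / np.sum((q_obs - q_obs.mean()) ** 2)
behavioral = np.where(ns > 0.5, ns, 0.0)
behavioral /= behavioral.sum()

k_formal = np.sum(k_grid * formal)
k_glue = np.sum(k_grid * behavioral)
print(k_formal, k_glue)  # both estimates center near the true k = 0.7
```

The formal posterior is much sharper, while the GLUE weights spread over the whole behavioral set, echoing the paper's point that the two routes can nonetheless agree on total predictive uncertainty.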
Bayesian Network Model with Application to Smart Power Semiconductor Lifetime Data.
Plankensteiner, Kathrin; Bluder, Olivia; Pilz, Jürgen
2015-09-01
In this article, Bayesian networks are used to model semiconductor lifetime data obtained from a cyclic stress test system. The data of interest are a mixture of log-normal distributions, representing two dominant physical failure mechanisms. Moreover, the data can be censored due to limited test resources. For a better understanding of the complex lifetime behavior, interactions between test settings, geometric designs, material properties, and physical parameters of the semiconductor device are modeled by a Bayesian network. Statistical toolboxes in MATLAB® have been extended and applied to find the best structure of the Bayesian network and to perform parameter learning. Due to censored observations, Markov chain Monte Carlo (MCMC) simulations are employed to determine the posterior distributions. For model selection, the automatic relevance determination (ARD) algorithm and goodness-of-fit criteria such as marginal likelihoods, Bayes factors, posterior predictive density distributions, and sum of squared errors of prediction (SSEP) are applied and evaluated. The results indicate that the application of Bayesian networks to semiconductor reliability provides useful information about the interactions between the significant covariates and serves as a reliable alternative to currently applied methods.
Choy, Samantha Low; O'Leary, Rebecca; Mengersen, Kerrie
2009-01-01
Bayesian statistical modeling has several benefits within an ecological context. In particular, when observed data are limited in sample size or representativeness, the Bayesian framework provides a mechanism to combine observed data with other "prior" information. Prior information may be obtained from earlier studies or, in their absence, from expert knowledge. This use of the Bayesian framework reflects the scientific "learning cycle," where prior or initial estimates are updated when new data become available. In this paper we outline a framework for the statistical design of expert elicitation processes for quantifying such expert knowledge, in a form suitable for input as prior information into Bayesian models. We identify six key elements: determining the purpose and motivation for using prior information; specifying the relevant expert knowledge available; formulating the statistical model; designing effective and efficient numerical encoding; managing uncertainty; and designing a practical elicitation protocol. We demonstrate that this framework applies to a variety of situations, with two examples from the ecological literature and three from our own experience. Analysis of these examples reveals several recurring important issues affecting the practical design of elicitation in ecological problems.
ERIC Educational Resources Information Center
Rindskopf, David
2012-01-01
Muthen and Asparouhov (2012) made a strong case for the advantages of Bayesian methodology in factor analysis and structural equation models. I show additional extensions and adaptations of their methods and show how non-Bayesians can take advantage of many (though not all) of these advantages by using interval restrictions on parameters. By…
Bayesian spatio-temporal modeling of particulate matter concentrations in Peninsular Malaysia
NASA Astrophysics Data System (ADS)
Manga, Edna; Awang, Norhashidah
2016-06-01
This article presents an application of a Bayesian spatio-temporal Gaussian process (GP) model to particulate matter concentrations from Peninsular Malaysia. We analyze daily PM10 concentration levels from 35 monitoring sites in June and July 2011. The spatio-temporal model, set in a Bayesian hierarchical framework, allows for the inclusion of informative covariates, meteorological variables, and spatio-temporal interactions. Posterior density estimates of the model parameters are obtained by Markov chain Monte Carlo methods. Preliminary data analysis indicates that information on PM10 levels at sites classified as industrial locations could explain part of the space-time variations, so we include the site-type indicator in our modeling efforts. Parameter estimates for the fitted GP model show significant spatio-temporal structure and a positive effect of the location-type explanatory variable. We also compute validation criteria for the out-of-sample sites, which show the adequacy of the model for predicting PM10 at unmonitored sites.
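A stripped-down, purely spatial analogue of such a model is ordinary GP (kriging) prediction at an unmonitored site under an exponential covariance; the coordinates, sill, and range below are invented, and the paper's temporal component and covariates are omitted.

```python
import numpy as np

rng = np.random.default_rng(4)

# 35 monitoring sites (matching the paper's count) plus one target site.
sites = rng.uniform(0, 100, (35, 2))
new = np.array([[50.0, 50.0]])               # unmonitored location

def exp_cov(a, b, sill=25.0, rng_len=30.0):
    # Exponential spatial covariance: sill * exp(-distance / range).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / rng_len)

# Simulate a correlated "PM10" field around a mean of 40.
field = rng.multivariate_normal(np.full(35, 40.0), exp_cov(sites, sites))

# Simple kriging predictor at the unmonitored site.
K = exp_cov(sites, sites) + 1e-6 * np.eye(35)    # nugget for stability
k_star = exp_cov(new, sites)
pred = 40.0 + k_star @ np.linalg.solve(K, field - 40.0)
print(pred)  # kriged estimate at the unmonitored location
```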
NASA Astrophysics Data System (ADS)
Tsai, Frank T.-C.; Elshall, Ahmed S.
2013-09-01
Analysts are often faced with competing propositions for each uncertain model component. How can we judge whether we have selected the correct proposition(s) for an uncertain model component out of numerous possibilities? We introduce the hierarchical Bayesian model averaging (HBMA) method as a multimodel framework for uncertainty analysis. HBMA allows for segregating, prioritizing, and evaluating different sources of uncertainty and their corresponding competing propositions through a hierarchy of BMA models that forms a BMA tree. We apply the HBMA to conduct uncertainty analysis on the reconstructed hydrostratigraphic architectures of the Baton Rouge aquifer-fault system, Louisiana. Due to uncertainty in model data, structure, and parameters, multiple possible hydrostratigraphic models are produced and calibrated as base models. The study considers four sources of uncertainty: with respect to data uncertainty, two calibration data sets; with respect to model structure, three different variogram models, two geological stationarity assumptions, and two fault conceptualizations. The base models are produced following a combinatorial design to allow for uncertainty segregation, so these four uncertain model components with their corresponding competing propositions result in 24 base models. The results show that the systematic dissection of the uncertain model components, along with their corresponding competing propositions, allows for detecting the robust model propositions and the major sources of uncertainty.
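Given (log) marginal likelihoods for the base models, BMA weights and their aggregation up one level of a BMA tree are a few lines of arithmetic; the model names and numbers below are hypothetical, not the paper's 24-model study.

```python
import numpy as np

# Hypothetical log marginal likelihoods for a tiny two-level BMA tree:
# variogram model nested within fault conceptualization.
log_ml = {("faultA", "exp"): -102.1, ("faultA", "gauss"): -104.8,
          ("faultB", "exp"): -101.5, ("faultB", "gauss"): -101.9}

# Base-model BMA weights (log-sum-exp normalization for stability).
lm = np.array(list(log_ml.values()))
w = np.exp(lm - lm.max()); w /= w.sum()
weights = dict(zip(log_ml, w))

# Move one level up the tree: integrate out the variogram choice to get
# the posterior probability of each fault conceptualization.
p_fault = {f: sum(v for (g, _), v in weights.items() if g == f)
           for f in ("faultA", "faultB")}
print(p_fault)
```

Summing the base-model weights within each branch is exactly how a BMA tree segregates one source of uncertainty from another.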
Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets
Clark, Alex M; Dole, Krishna; Coulon-Spektor, Anna; McNutt, Andrew; Grass, George; Freundlich, Joel S; Reynolds, Robert C; Ekins, Sean
2015-06-22
On the order of hundreds of absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) models have been described in the literature in the past decade which are more often than not inaccessible to anyone but their authors. Public accessibility is also an issue with computational models for bioactivity, and the ability to share such models still remains a major challenge limiting drug discovery. We describe the creation of a reference implementation of a Bayesian model-building software module, which we have released as an open source component that is now included in the Chemistry Development Kit (CDK) project, as well as implemented in the CDD Vault and in several mobile apps. We use this implementation to build an array of Bayesian models for ADME/Tox, in vitro and in vivo bioactivity, and other physicochemical properties. We show that these models possess cross-validation receiver operator curve values comparable to those generated previously in prior publications using alternative tools. We have now described how the implementation of Bayesian models with FCFP6 descriptors generated in the CDD Vault enables the rapid production of robust machine learning models from public data or the user’s own datasets. The current study sets the stage for generating models in proprietary software (such as CDD) and exporting these models in a format that could be run in open source software using CDK components. This work also demonstrates that we can enable biocomputation across distributed private or public datasets to enhance drug discovery. PMID:25994950
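Bayesian models over binary fingerprint bits of the kind described here are commonly Laplacian-corrected naive Bayes classifiers; the sketch below applies that scoring rule to synthetic fingerprints under that assumption (FCFP6 descriptor generation via CDK is not shown).

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic stand-in for folded fingerprints: 100 molecules, 64 bits.
X = rng.random((100, 64)) < 0.2
y = np.zeros(100, dtype=bool); y[:40] = True  # 40 "actives"
X[y] |= rng.random((40, 64)) < 0.15           # actives carry extra set bits

# Laplacian-corrected per-bit score: log of corrected P(active | bit)
# relative to the base rate of actives.
base = y.mean()
active_rate = (X[y].sum(0) + 1) / (X.sum(0) + 1 / base)
bit_score = np.log(active_rate / base)

# Additive per-molecule score, then a rank-based AUC over the data.
scores = X @ bit_score
auc = (scores[y][:, None] > scores[~y][None, :]).mean()
print(auc)  # actives should rank above inactives on average
```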
Lessons Learned from a Past Series of Bayesian Model Averaging studies for Soil/Plant Models
NASA Astrophysics Data System (ADS)
Nowak, Wolfgang; Wöhling, Thomas; Schöniger, Anneli
2015-04-01
In this study we evaluate the lessons learned about modeling soil/plant systems from analyzing evapotranspiration, soil moisture, and leaf area index data. The data were analyzed with advanced tools from the area of Bayesian model averaging, model ranking, and Bayesian model selection. We generated a large variety of model conceptualizations by sampling random parameter sets from the vegetation components of the CERES, SUCROS, GECROS, and SPASS models and a common model for soil water movement via Monte Carlo simulations. We used data from one vegetation period of winter wheat at a field site in Nellingen, Germany. The data set includes soil moisture, actual evapotranspiration (ETa) from an eddy covariance tower, and leaf area index (LAI). The focus of the data analysis was on how to perform model ranking and model selection. Further analysis steps included the predictive reliability of different soil/plant models calibrated on different subsets of the available data. Our main conclusion is that model selection between competing soil/plant models remains a large challenge, because (1) different data types and their combinations favor different models, since competing models are more or less good at simulating the coupling processes between the various compartments and their states; (2) singular events (such as the evolution of LAI during plant senescence) can dominate an entire time series, and long time series can be represented well by the few data values where the models disagree most; (3) the different data types differ in their discriminating power for model selection; (4) the level of noise present in ETa and LAI data, and the level of systematic model bias through simplifications of the complex system (e.g., assuming a few internally homogeneous soil layers), substantially reduce the confidence in model ranking and model selection; (5) none of the models withstands a hypothesis test against the available data; (6) even the assumed level of measurement
NASA Astrophysics Data System (ADS)
Gilet, Estelle; Diard, Julien; Palluel-Germain, Richard; Bessière, Pierre
2011-03-01
This paper is about modeling perception-action loops and, more precisely, the study of the influence of motor knowledge during perception tasks. We use the Bayesian Action-Perception (BAP) model, which deals with the sensorimotor loop involved in reading and writing cursive isolated letters and includes an internal movement-simulation loop. Using this probabilistic model, we simulate letter recognition both with and without internal motor simulation. Comparison of their performance yields an experimental prediction, which we set forth.
Bayesian model of dynamic image stabilization in the visual system.
Burak, Yoram; Rokni, Uri; Meister, Markus; Sompolinsky, Haim
2010-11-01
Humans can resolve the fine details of visual stimuli although the image projected on the retina is constantly drifting relative to the photoreceptor array. Here we demonstrate that the brain must take this drift into account when performing high acuity visual tasks. Further, we propose a decoding strategy for interpreting the spikes emitted by the retina, which takes into account the ambiguity caused by retinal noise and the unknown trajectory of the projected image on the retina. A main difficulty, addressed in our proposal, is the exponentially large number of possible stimuli, which renders the ideal Bayesian solution to the problem computationally intractable. In contrast, the strategy that we propose suggests a realistic implementation in the visual cortex. The implementation involves two populations of cells, one that tracks the position of the image and another that represents a stabilized estimate of the image itself. Spikes from the retina are dynamically routed to the two populations and are interpreted in a probabilistic manner. We consider the architecture of neural circuitry that could implement this strategy and its performance under measured statistics of human fixational eye motion. A salient prediction is that in high acuity tasks, fixed features within the visual scene are beneficial because they provide information about the drifting position of the image. Therefore, complete elimination of peripheral features in the visual scene should degrade performance on high acuity tasks involving very small stimuli.
Imprecise (fuzzy) information in geostatistics
Bardossy, A.; Bogardi, I.; Kelly, W.E.
1988-05-01
A methodology based on fuzzy set theory for the utilization of imprecise data in geostatistics is presented. A common problem preventing a broader use of geostatistics has been the insufficient amount of accurate measurement data. In certain cases, additional but uncertain (soft) information is available and can be encoded as subjective probabilities, and then the soft kriging method can be applied (Journel, 1986). In other cases, a fuzzy encoding of soft information may be more realistic and simplify the numerical calculations. Imprecise (fuzzy) spatial information on the possible variogram is integrated into a single variogram which is used in a fuzzy kriging procedure. The overall uncertainty of prediction is represented by the estimation variance and the calculated membership function for each kriged point. The methodology is applied to the permeability prediction of a soil liner for hazardous waste containment. The available number of hard measurement data (20) was not enough for a classical geostatistical analysis. An additional 20 soft data made it possible to prepare kriged contour maps using the fuzzy geostatistical procedure.
Phosphorus load estimation in the Saginaw River, MI using a Bayesian hierarchical/multilevel model.
Cha, YoonKyung; Stow, Craig A; Reckhow, Kenneth H; DeMarchi, Carlo; Johengen, Thomas H
2010-05-01
We propose the use of a Bayesian hierarchical/multilevel ratio approach to estimate the annual riverine phosphorus loads in the Saginaw River, Michigan, from 1968 to 2008. The ratio estimator is known to be an unbiased, precise approach for differing flow-concentration relationships and sampling schemes. A Bayesian model can explicitly address the uncertainty in prediction by using a posterior predictive distribution, while a Bayesian hierarchical technique can overcome the limitation of interpreting annual loads inferred from small sample sizes by borrowing strength from the underlying population shared by the years of interest. Thus, by combining the ratio estimator with the Bayesian hierarchical modeling framework, long-term load estimation can be addressed with explicit quantification of uncertainty. Our results indicate a slight decrease in total phosphorus load early in the series. The estimated ratio parameter, which can be interpreted as flow-weighted concentration, shows a clearer decrease, damping the noise that yearly flow variation adds to the load. Despite the reductions, Saginaw Bay is unlikely to meet its target phosphorus load of 440 tonnes/yr. Across the decades, the probabilities of Saginaw Bay not complying with the target load are estimated as 1.00, 0.50, 0.57 and 0.36 in 1977, 1987, 1997, and 2007, respectively. We show that the Bayesian hierarchical model results in reasonable goodness-of-fit to the observations whether or not individual loads are aggregated. This modeling approach can also substantially reduce uncertainties associated with small sample sizes, both in the estimated parameters and in the loads. PMID:20382406
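In its simplest, non-hierarchical form, the ratio estimator at the core of this approach is a flow-weighted mean concentration scaled by the total discharge volume. A minimal sketch with synthetic numbers (the values, units, and sampling scheme are illustrative, not Saginaw data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic year: daily flows (m^3/s) plus ~2 concentration samples per month
flow = rng.lognormal(mean=3.0, sigma=0.5, size=365)
sample_days = rng.choice(365, size=24, replace=False)
conc = 0.1 + 0.02 * rng.standard_normal(24) + 0.001 * flow[sample_days]  # mg/L

# Ratio estimator: flow-weighted mean concentration on the sampled days,
# scaled by the total annual discharge volume.
r_hat = np.sum(conc * flow[sample_days]) / np.sum(flow[sample_days])
total_volume_m3 = np.sum(flow) * 86400.0          # m^3/s times seconds per day
load_tonnes = r_hat * total_volume_m3 * 1e-6      # mg/L * m^3 = g; 1e-6 g -> tonnes

print(f"flow-weighted concentration: {r_hat:.3f} mg/L")
print(f"estimated annual load: {load_tonnes:.0f} tonnes")
```

The hierarchical step of the paper places a common prior across years so that sparsely sampled years borrow strength from the rest; the sketch above corresponds to a single year.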
Toni, Tina; Welch, David; Strelkowa, Natalja; Ipsen, Andreas; Stumpf, Michael P.H.
2008-01-01
Approximate Bayesian computation (ABC) methods can be used to evaluate posterior distributions without having to calculate likelihoods. In this paper, we discuss and apply an ABC method based on sequential Monte Carlo (SMC) to estimate parameters of dynamical models. We show that ABC SMC provides information about the inferability of parameters and model sensitivity to changes in parameters, and tends to perform better than other ABC approaches. The algorithm is applied to several well-known biological systems, for which parameters and their credible intervals are inferred. Moreover, we develop ABC SMC as a tool for model selection; given a range of different mathematical descriptions, ABC SMC is able to choose the best model using the standard Bayesian model selection apparatus. PMID:19205079
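The rejection variant of ABC, which ABC SMC refines with a sequence of intermediate distributions and shrinking tolerances, can be sketched in a few lines. The toy exponential model, the mean as summary statistic, and the tolerance are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Observed" data from an exponential model with (unknown) true rate 0.5
true_rate = 0.5
observed = rng.exponential(scale=1 / true_rate, size=100)
obs_summary = observed.mean()

# ABC rejection: draw the rate from its prior, simulate a data set of the
# same size, and accept draws whose summary is within epsilon of the data's.
epsilon = 0.1
accepted = []
for rate in rng.uniform(0.05, 2.0, size=20000):
    sim = rng.exponential(scale=1 / rate, size=100)
    if abs(sim.mean() - obs_summary) < epsilon:
        accepted.append(rate)

posterior = np.array(accepted)
print(f"accepted {posterior.size} draws, posterior mean rate = {posterior.mean():.2f}")
```

ABC SMC replaces the single tolerance with a decreasing schedule, reweighting and perturbing the accepted particles at each step, which greatly improves acceptance rates at small final tolerances.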
Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models
Daunizeau, J.; Friston, K.J.; Kiebel, S.J.
2009-01-01
In this paper, we describe a general variational Bayesian approach for approximate inference on nonlinear stochastic dynamic models. This scheme extends established approximate inference on hidden-states to cover: (i) nonlinear evolution and observation functions, (ii) unknown parameters and (precision) hyperparameters and (iii) model comparison and prediction under uncertainty. Model identification or inversion entails the estimation of the marginal likelihood or evidence of a model. This difficult integration problem can be finessed by optimising a free-energy bound on the evidence using results from variational calculus. This yields a deterministic update scheme that optimises an approximation to the posterior density on the unknown model variables. We derive such a variational Bayesian scheme in the context of nonlinear stochastic dynamic hierarchical models, for both model identification and time-series prediction. The computational complexity of the scheme is comparable to that of an extended Kalman filter, which is critical when inverting high dimensional models or long time-series. Using Monte-Carlo simulations, we assess the estimation efficiency of this variational Bayesian approach using three stochastic variants of chaotic dynamic systems. We also demonstrate the model comparison capabilities of the method, its self-consistency and its predictive power. PMID:19862351
Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes.
Pittman, Jennifer; Huang, Erich; Nevins, Joseph; Wang, Quanli; West, Mike
2004-10-01
Classification tree models are flexible analysis tools which have the ability to evaluate interactions among predictors as well as generate predictions for responses of interest. We describe Bayesian analysis of a specific class of tree models in which binary response data arise from a retrospective case-control design. We are also particularly interested in problems with potentially very many candidate predictors. This scenario is common in studies concerning gene expression data, which is a key motivating example context. Innovations here include the introduction of tree models that explicitly address and incorporate the retrospective design, and the use of nonparametric Bayesian models involving Dirichlet process priors on the distributions of predictor variables. The model specification influences the generation of trees through Bayes' factor based tests of association that determine significant binary partitions of nodes during a process of forward generation of trees. We describe this constructive process and discuss questions of generating and combining multiple trees via Bayesian model averaging for prediction. Additional discussion of parameter selection and sensitivity is given in the context of an example which concerns prediction of breast tumour status utilizing high-dimensional gene expression data; the example demonstrates the exploratory/explanatory uses of such models as well as their primary utility in prediction. Shortcomings of the approach and comparison with alternative tree modelling algorithms are also discussed, as are issues of modelling and computational extensions.
A Bayesian approach for inducing sparsity in generalized linear models with multi-category response
2015-01-01
Background: The dimension and complexity of high-throughput gene expression data create many challenges for downstream analysis. Several approaches exist to reduce the number of variables with respect to small sample sizes. In this study, we utilized the Generalized Double Pareto (GDP) prior to induce sparsity in a Bayesian Generalized Linear Model (GLM) setting. The approach was evaluated using a publicly available microarray dataset containing 99 samples corresponding to four different prostate cancer subtypes. Results: A hierarchical Sparse Bayesian GLM using the GDP prior (SBGG) was developed to take into account the progressive nature of the response variable. We obtained an average overall classification accuracy between 82.5% and 94%, which was higher than Support Vector Machine, Random Forest or a Sparse Bayesian GLM using double exponential priors. Additionally, SBGG outperforms the other three methods in correctly identifying pre-metastatic stages of cancer progression, which can prove extremely valuable for therapeutic and diagnostic purposes. Importantly, using the Geneset Cohesion Analysis Tool, we found that the top 100 genes produced by SBGG had an average functional cohesion p-value of 2.0E-4, compared to 0.007 to 0.131 for the other methods. Conclusions: Using the GDP prior in a Bayesian GLM applied to cancer progression data results in better subclass prediction. In particular, the method identifies pre-metastatic stages of prostate cancer with substantially better accuracy and produces more functionally relevant gene sets. PMID:26423345
Dynamic causal modelling of electrographic seizure activity using Bayesian belief updating.
Cooray, Gerald K; Sengupta, Biswa; Douglas, Pamela K; Friston, Karl
2016-01-15
Seizure activity in EEG recordings can persist for hours, with seizure dynamics changing rapidly over time and space. To characterise the spatiotemporal evolution of seizure activity, large data sets often need to be analysed. Dynamic causal modelling (DCM) can be used to estimate the synaptic drivers of cortical dynamics during a seizure; however, the requisite (Bayesian) inversion procedure is computationally expensive. In this note, we describe a straightforward procedure, within the DCM framework, that provides efficient inversion of seizure activity measured with non-invasive and invasive physiological recordings; namely, EEG/ECoG. We describe the theoretical background behind a Bayesian belief updating scheme for DCM. The scheme is tested on simulated and empirical seizure activity (recorded both invasively and non-invasively) and compared with standard Bayesian inversion. We show that the Bayesian belief updating scheme provides similar estimates of time-varying synaptic parameters, compared to standard schemes, indicating no significant qualitative change in accuracy. The difference in variance explained was small (less than 5%). The updating method was substantially more efficient, taking approximately 5-10 min compared to approximately 1-2 h. Moreover, the setup of the model under the updating scheme allows for a clear specification of how neuronal variables fluctuate over separable timescales. This method now allows us to investigate the effect of fast (neuronal) activity on slow fluctuations in (synaptic) parameters, paving a way forward to understanding how seizure activity is generated.
Modelling household finances: A Bayesian approach to a multivariate two-part model
Brown, Sarah; Ghosh, Pulak; Su, Li; Taylor, Karl
2016-01-01
We contribute to the empirical literature on household finances by introducing a Bayesian multivariate two-part model, which has been developed to further our understanding of household finances. Our flexible approach allows for the potential interdependence between the holding of assets and liabilities at the household level and also encompasses a two-part process to allow for differences in the influences on asset or liability holding and on the respective amounts held. Furthermore, the framework is dynamic in order to allow for persistence in household finances over time. Our findings endorse the joint modelling approach and provide evidence supporting the importance of dynamics. In addition, we find that certain independent variables exert different influences on the binary and continuous parts of the model thereby highlighting the flexibility of our framework and revealing a detailed picture of the nature of household finances. PMID:27212801
Curtis, Gary P.; Lu, Dan; Ye, Ming
2015-01-01
While Bayesian model averaging (BMA) has been widely used in groundwater modeling, it is infrequently applied to groundwater reactive transport modeling because of multiple sources of uncertainty in the coupled hydrogeochemical processes and because of the long execution time of each model run. To resolve these problems, this study analyzed different levels of uncertainty in a hierarchical way, and used the maximum likelihood version of BMA, i.e., MLBMA, to improve the computational efficiency. This study demonstrates the applicability of MLBMA to groundwater reactive transport modeling in a synthetic case in which twenty-seven reactive transport models were designed to predict the reactive transport of hexavalent uranium (U(VI)) based on observations at a former uranium mill site near Naturita, CO. These reactive transport models contain three uncertain model components, i.e., parameterization of hydraulic conductivity, configuration of model boundary, and surface complexation reactions that simulate U(VI) adsorption. These uncertain model components were aggregated into the alternative models by integrating a hierarchical structure into MLBMA. The modeling results of the individual models and MLBMA were analyzed to investigate their predictive performance. The predictive logscore results show that MLBMA generally outperforms the best model, suggesting that using MLBMA is a sound strategy to achieve more robust model predictions relative to a single model. MLBMA works best when the alternative models are structurally distinct and have diverse model predictions. When correlation in model structure exists, two strategies were used to improve predictive performance by retaining structurally distinct models or assigning smaller prior model probabilities to correlated models. Since the synthetic models were designed using data from the Naturita site, the results of this study are expected to provide guidance for real-world modeling. Limitations of applying MLBMA to the
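The core arithmetic of (ML)BMA is converting per-model evidence approximations (e.g., KIC or BIC values in the maximum likelihood variant) into posterior model probabilities that weight the individual predictions. A minimal sketch with hypothetical criterion values and predictions:

```python
import numpy as np

# Hypothetical information-criterion values for three alternative models
# (smaller is better); these numbers are illustrative only.
kic = np.array([210.4, 203.1, 215.8])
prior = np.array([1 / 3, 1 / 3, 1 / 3])   # equal prior model probabilities

# p(Mk | D) is proportional to exp(-0.5 * KIC_k) * p(Mk);
# subtracting the minimum first keeps the exponentials well scaled.
log_w = -0.5 * (kic - kic.min()) + np.log(prior)
weights = np.exp(log_w) / np.exp(log_w).sum()

# BMA prediction: probability-weighted average of per-model predictions
model_preds = np.array([4.2, 3.7, 5.0])
bma_pred = weights @ model_preds
print("posterior model probabilities:", weights.round(4))
print(f"BMA prediction: {bma_pred:.2f}")
```

The second strategy discussed above, down-weighting structurally correlated models, amounts to changing the `prior` vector before normalization.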
Geostatistical analysis of Palmerton soil survey data.
Starks, T H; Sparks, A R; Brown, K W
1987-11-01
This paper describes statistical and geostatistical analyses of data from a soil sampling survey. Soil sampling was performed in October and November of 1985 to obtain information on the level, extent, and spatial structure of metal pollution of the soil in and around the Palmerton, Pennsylvania, NPL Superfund site. Measurements of the concentrations of cadmium, copper, lead, and zinc in the soil samples were obtained. An appropriate variance-stabilizing transformation was determined, and variance components were estimated. Generalized covariance functions for log-transformed concentrations were estimated for each metal. Block kriging was then employed, using the estimated spatial-structure models, to obtain estimated metal concentration distributions over the central part of Palmerton.
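The spatial-structure step of such an analysis can be illustrated with an empirical semivariogram of log-transformed concentrations; the locations and values below are synthetic stand-ins for the survey data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic survey: 200 sampling locations with a smooth spatial trend in
# log-concentration plus local noise.
xy = rng.uniform(0, 1000, size=(200, 2))                   # coordinates (m)
logc = 0.002 * xy[:, 0] + 0.3 * rng.standard_normal(200)   # log concentration

# Empirical semivariogram: gamma(h) = 0.5 * mean of (z_i - z_j)^2
# over all pairs whose separation falls in the lag bin around h.
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
dz2 = (logc[:, None] - logc[None, :]) ** 2
bins = np.arange(0, 600, 100)
gammas = []
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (d > lo) & (d <= hi)        # d > 0 also excludes self-pairs
    gammas.append(0.5 * dz2[mask].mean())
    print(f"lag {int(lo):3d}-{int(hi):3d} m: gamma = {gammas[-1]:.3f}")
```

A parametric model (spherical, exponential, or a generalized covariance as in the paper) would then be fitted to these lag estimates before kriging.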
Non-stationary Bayesian estimation of parameters from a body cover model of the vocal folds.
Hadwin, Paul J; Galindo, Gabriel E; Daun, Kyle J; Zañartu, Matías; Erath, Byron D; Cataldo, Edson; Peterson, Sean D
2016-05-01
The evolution of reduced-order vocal fold models into clinically useful tools for subject-specific diagnosis and treatment hinges upon successfully and accurately representing an individual patient in the modeling framework. This, in turn, requires inference of model parameters from clinical measurements in order to tune a model to the given individual. Bayesian analysis is a powerful tool for estimating model parameter probabilities based upon a set of observed data. In this work, a Bayesian particle filter sampling technique capable of estimating time-varying model parameters, as occur in complex vocal gestures, is introduced. The technique is compared with time-invariant Bayesian estimation and least squares methods for determining both stationary and non-stationary parameters. The current technique accurately estimates the time-varying unknown model parameter and maintains tight credibility bounds. The credibility bounds are particularly relevant from a clinical perspective, as they provide insight into the confidence a clinician should have in the model predictions. PMID:27250162
Optimal speech motor control and token-to-token variability: a Bayesian modeling approach.
Patri, Jean-François; Diard, Julien; Perrier, Pascal
2015-12-01
The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way. PMID:26497359
NASA Astrophysics Data System (ADS)
Messier, K. P.; Serre, M. L.
2015-12-01
Radon (222Rn) is a naturally occurring, chemically inert, colorless, and odorless radioactive gas produced from the decay of uranium (238U), which is ubiquitous in rocks and soils worldwide. Exposure to 222Rn via inhalation is likely the second leading cause of lung cancer after cigarette smoking; exposure through untreated groundwater also contributes to both inhalation and ingestion routes. A land use regression (LUR) model for groundwater 222Rn with anisotropic geological and 238U-based explanatory variables is developed, which helps elucidate the factors contributing to elevated 222Rn across North Carolina. Geological and uranium-based variables are constructed in elliptical buffers surrounding each observation such that they capture the lateral geometric anisotropy present in groundwater 222Rn. Moreover, geological features are defined at three different spatial scales to allow the model to distinguish between large-area and small-area effects of geology on groundwater 222Rn. The LUR is also integrated into the Bayesian Maximum Entropy (BME) geostatistical framework to increase accuracy and produce a point-level LUR-BME model of groundwater 222Rn across North Carolina, including prediction uncertainty. The LUR-BME model achieves a leave-one-out cross-validation r2 of 0.46 (Pearson correlation coefficient = 0.68), effectively predicting within the spatial covariance range. Modeled 222Rn concentrations show variability among Intrusive Felsic geological formations, likely due to differences in average bedrock 238U defined on the basis of overlying stream-sediment 238U concentrations, which constitute a widely distributed, consistently analyzed point-source data set.
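The leave-one-out cross-validation used to score the LUR-BME model can be sketched for an ordinary linear model; the covariates below are synthetic stand-ins for the geological and uranium buffer variables:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic regression problem: n observations, 3 explanatory variables
n = 120
X = rng.standard_normal((n, 3))
beta_true = np.array([1.5, -0.8, 0.4])
y = X @ beta_true + 0.7 * rng.standard_normal(n)

# Leave-one-out cross-validation: refit OLS with each observation held out
# and predict it from the remaining n - 1 points.
preds = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    Xk = np.column_stack([np.ones(n - 1), X[keep]])
    beta, *_ = np.linalg.lstsq(Xk, y[keep], rcond=None)
    preds[i] = np.concatenate(([1.0], X[i])) @ beta

r = np.corrcoef(y, preds)[0, 1]
loo_r2 = 1 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"LOO Pearson r = {r:.2f}, LOO r^2 = {loo_r2:.2f}")
```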
NASA Astrophysics Data System (ADS)
Elshall, A. S.; Ye, M.; Niu, G. Y.; Barron-Gafford, G.
2015-12-01
Models in biogeoscience involve uncertainties in observation data, model inputs, model structure, model processes and modeling scenarios. To accommodate different sources of uncertainty, multimodel analyses such as model combination, model selection, model elimination or model discrimination are becoming more popular. To illustrate the theoretical and practical challenges of multimodel analysis, we use an example from microbial soil respiration modeling. Global soil respiration releases more than ten times more carbon dioxide to the atmosphere than all anthropogenic emissions. Thus, improving our understanding of microbial soil respiration is essential for improving climate change models. This study focuses on a poorly understood phenomenon: pulses of soil microbial respiration in response to episodic rainfall (the "Birch effect"). We hypothesize that the "Birch effect" is generated by three mechanisms. To test our hypothesis, we developed and assessed five evolving microbial-enzyme models against field measurements from a semiarid savanna characterized by pulsed precipitation. The five models evolve stepwise, such that the first includes none of the three mechanisms while the fifth includes all three. The basic component of Bayesian multimodel analysis is the estimation of the marginal likelihood to rank the candidate models based on their overall likelihood with respect to the observation data. The first part of the study uses this Bayesian scheme to discriminate between the five candidate models. The second part discusses theoretical and practical challenges, mainly the effect of likelihood function selection and marginal likelihood estimation methods on both model ranking and Bayesian model averaging. The study shows that making valid inference from scientific data is not a trivial task, since we are not only uncertain about the candidate scientific models, but also about
Prediction and assimilation of surf-zone processes using a Bayesian network: Part I: Forward models
Plant, Nathaniel G.; Holland, K. Todd
2011-01-01
Prediction of coastal processes, including waves, currents, and sediment transport, can be obtained from a variety of detailed geophysical-process models with many simulations showing significant skill. This capability supports a wide range of research and applied efforts that can benefit from accurate numerical predictions. However, the predictions are only as accurate as the data used to drive the models and, given the large temporal and spatial variability of the surf zone, inaccuracies in data are unavoidable such that useful predictions require corresponding estimates of uncertainty. We demonstrate how a Bayesian-network model can be used to provide accurate predictions of wave-height evolution in the surf zone given very sparse and/or inaccurate boundary-condition data. The approach is based on a formal treatment of a data-assimilation problem that takes advantage of significant reduction of the dimensionality of the model system. We demonstrate that predictions of a detailed geophysical model of the wave evolution are reproduced accurately using a Bayesian approach. In this surf-zone application, forward prediction skill was 83%, and uncertainties in the model inputs were accurately transferred to uncertainty in output variables. We also demonstrate that if modeling uncertainties were not conveyed to the Bayesian network (i.e., perfect data or model were assumed), then overly optimistic prediction uncertainties were computed. More consistent predictions and uncertainties were obtained by including model-parameter errors as a source of input uncertainty. Improved predictions (skill of 90%) were achieved because the Bayesian network simultaneously estimated optimal parameters while predicting wave heights.
Prediction and assimilation of surf-zone processes using a Bayesian network: Part II: Inverse models
Plant, Nathaniel G.; Holland, K. Todd
2011-01-01
A Bayesian network model has been developed to simulate a relatively simple problem of wave propagation in the surf zone (detailed in Part I). Here, we demonstrate that this Bayesian model can provide both inverse modeling and data-assimilation solutions for predicting offshore wave heights and depth estimates given limited wave-height and depth information from an onshore location. The inverse method is extended to allow data assimilation using observational inputs that are not compatible with deterministic solutions of the problem. These inputs include sand bar positions (instead of bathymetry) and estimates of the intensity of wave breaking (instead of wave-height observations). Our results indicate that wave breaking information is essential to reduce prediction errors. In many practical situations, this information could be provided from a shore-based observer or from remote-sensing systems. We show that various combinations of the assimilated inputs significantly reduce the uncertainty in the estimates of water depths and wave heights in the model domain. Application of the Bayesian network model to new field data demonstrated significant predictive skill (R2 = 0.7) for the inverse estimate of a month-long time series of offshore wave heights. The Bayesian inverse results include uncertainty estimates that were shown to be most accurate when given uncertainty in the inputs (e.g., depth and tuning parameters). Furthermore, the inverse modeling was extended to directly estimate tuning parameters associated with the underlying wave-process model. The inverse estimates of the model parameters not only showed an offshore wave height dependence consistent with results of previous studies but the uncertainty estimates of the tuning parameters also explain previously reported variations in the model parameters.
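For a single discretized link, the forward (Part I) and inverse (Part II) uses of a Bayesian network reduce to products with the conditional probability table. A toy sketch with illustrative, unfitted probabilities (three wave-height bins each for a hypothetical offshore and onshore node):

```python
import numpy as np

# Prior over offshore wave-height bins and CPT p(onshore bin | offshore bin);
# all numbers are illustrative, not fitted to surf-zone data.
p_off = np.array([0.5, 0.3, 0.2])
p_on_given_off = np.array([
    [0.80, 0.15, 0.05],
    [0.20, 0.60, 0.20],
    [0.05, 0.25, 0.70],
])

# Forward prediction (Part I): marginal distribution over onshore bins
p_on = p_off @ p_on_given_off

# Inverse estimate (Part II): posterior over offshore bins given that the
# largest onshore bin (index 2) was observed, via Bayes' rule
obs_bin = 2
posterior_off = p_off * p_on_given_off[:, obs_bin]
posterior_off /= posterior_off.sum()

print("forward p(onshore):", p_on.round(3))
print("inverse p(offshore | obs):", posterior_off.round(3))
```

Real applications chain many such nodes (bathymetry, breaking intensity, tuning parameters), but inference is still a sequence of these table operations.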
Construction of an Improved Bayesian Clutter Suppression Model for Gas Detection
Heasler, Patrick G.; Anderson, Kevin K.; Hylden, Jeffrey L.
2002-10-28
This technical report describes a nonlinear Bayesian regression model that can be used to estimate effluent concentrations from IR hyperspectral data. As the title implies, the model is constructed to account for background clutter more effectively than current estimators. Although the main objective is to account for background clutter, which is the dominant source of variability in IR data, the model could easily be extended to allow for uncertainties in the atmosphere. The term "clutter" refers to the variations that occur in the image spectra because emissivity and background temperature change from pixel to pixel. The Bayesian regression model utilizes a more complete description of background clutter, expressed as a prior distribution on background radiance, to obtain better estimates.
Some comments on misspecification of priors in Bayesian modelling of measurement error problems.
Richardson, S; Leblond, L
In this paper we discuss some aspects of misspecification of prior distributions in the context of Bayesian modelling of measurement error problems. A Bayesian approach to the treatment of common measurement error situations encountered in epidemiology has recently been proposed. Its implementation involves, first, the structural specification, through conditional independence relationships, of three submodels (a measurement model, an exposure model and a disease model) and, secondly, the choice of functional forms for the distributions involved in the submodels. We present some results indicating how the estimation of the regression parameters of interest, which is carried out using Gibbs sampling, can be influenced by a misspecification of the parametric shape of the prior distribution of exposure. PMID:9004392
NASA Technical Reports Server (NTRS)
He, Yuning
2015-01-01
The behavior of complex aerospace systems is governed by numerous parameters. For safety analysis it is important to understand how the system behaves with respect to these parameter values. In particular, understanding the boundaries between safe and unsafe regions is of major importance. In this paper, we describe a hierarchical Bayesian statistical modeling approach for the online detection and characterization of such boundaries. Our method for classification with active learning uses a particle filter-based model and a boundary-aware metric for best performance. From a library of candidate shapes incorporated with domain expert knowledge, the location and parameters of the boundaries are estimated using advanced Bayesian modeling techniques. The results of our boundary analysis are then provided in a form understandable by the domain expert. We illustrate our approach using a simulation model of a NASA neuro-adaptive flight control system, as well as a system for the detection of separation violations in the terminal airspace.
A Bayesian Multinomial Probit Model for the Analysis of Panel Choice Data.
Fong, Duncan K H; Kim, Sunghoon; Chen, Zhe; DeSarbo, Wayne S
2016-03-01
A new Bayesian multinomial probit model is proposed for the analysis of panel choice data. Using a parameter expansion technique, we are able to devise a Markov Chain Monte Carlo algorithm to compute our Bayesian estimates efficiently. We also show that the proposed procedure enables the estimation of individual level coefficients for the single-period multinomial probit model even when the available prior information is vague. We apply our new procedure to consumer purchase data and reanalyze a well-known scanner panel dataset that reveals new substantive insights. In addition, we delineate a number of advantageous features of our proposed procedure over several benchmark models. Finally, through a simulation analysis employing a fractional factorial design, we demonstrate that the results from our proposed model are quite robust with respect to differing factors across various conditions.
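The multinomial probit structure underlying such panel-choice models can be sketched as follows. Choice probabilities have no closed form under correlated utility errors, so this example approximates them by simulation; all attribute values, coefficients, and the error covariance are made-up illustrations, not estimates from the paper (which obtains Bayesian estimates via MCMC with parameter expansion):

```python
import numpy as np

rng = np.random.default_rng(2)

# Multinomial probit: utility U_j = x_j @ beta + e_j, e ~ N(0, Sigma);
# the decision maker chooses the alternative with the highest utility.
beta = np.array([0.8, -0.5])          # illustrative taste coefficients
X = rng.normal(size=(3, 2))           # attributes of 3 alternatives
Sigma = np.array([[1.0, 0.3, 0.0],    # correlated utility errors --
                  [0.3, 1.0, 0.2],    # the feature that distinguishes
                  [0.0, 0.2, 1.0]])   # probit from logit

# Approximate choice probabilities by simulating the error draws.
n = 20000
e = rng.multivariate_normal(np.zeros(3), Sigma, size=n)
U = X @ beta + e                      # (n, 3) simulated utilities
probs = np.bincount(U.argmax(axis=1), minlength=3) / n
```

This simulation view is also why Bayesian MCMC approaches are attractive here: data augmentation with the latent utilities sidesteps the intractable probit likelihood entirely.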
Models and simulation of 3D neuronal dendritic trees using Bayesian networks.
López-Cruz, Pedro L; Bielza, Concha; Larrañaga, Pedro; Benavides-Piccione, Ruth; DeFelipe, Javier
2011-12-01
Neuron morphology is crucial for neuronal connectivity and brain information processing. Computational models are important tools for studying dendritic morphology and its role in brain function. We applied a class of probabilistic graphical models called Bayesian networks to generate virtual dendrites from layer III pyramidal neurons from three different regions of the neocortex of the mouse. A set of 41 morphological variables were measured from the 3D reconstructions of real dendrites and their probability distributions used in a machine learning algorithm to induce the model from the data. A simulation algorithm is also proposed to obtain new dendrites by sampling values from Bayesian networks. The main advantage of this approach is that it takes into account and automatically locates the relationships between variables in the data instead of using predefined dependencies. Therefore, the methodology can be applied to any neuronal class while at the same time exploiting class-specific properties. Also, a Bayesian network was defined for each part of the dendrite, allowing the relationships to change in the different sections and to model heterogeneous developmental factors or spatial influences. Several univariate statistical tests and a novel multivariate test based on Kullback-Leibler divergence estimation confirmed that virtual dendrites were similar to real ones. The analyses of the models showed relationships that conform to current neuroanatomical knowledge and support model correctness. At the same time, studying the relationships in the models can help to identify new interactions between variables related to dendritic morphology.
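The generative use of Bayesian networks described above, learning the dependence structure and conditional distributions from measured variables and then sampling new instances by ancestral sampling, can be sketched with a toy two-variable discrete network (the real models used 41 continuous morphological variables; all names and numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for two discretized morphological variables measured on
# real dendrites: A = branch-order class, B = segment-length class.
data = rng.integers(0, 2, size=(500, 2))
# Impose a dependence so there is structure to learn: B usually tracks A.
data[:, 1] = np.where(rng.random(500) < 0.8, data[:, 0], 1 - data[:, 0])

# "Learn" the network A -> B by estimating P(A) and P(B | A) from counts.
p_a = np.bincount(data[:, 0], minlength=2) / len(data)
p_b_given_a = np.zeros((2, 2))
for a in (0, 1):
    rows = data[data[:, 0] == a, 1]
    p_b_given_a[a] = np.bincount(rows, minlength=2) / len(rows)

# Generate "virtual dendrites" by ancestral sampling: draw A from P(A),
# then B from P(B | A), exactly the simulation step the paper proposes.
def sample_virtual(n):
    a = rng.choice(2, size=n, p=p_a)
    b = np.array([rng.choice(2, p=p_b_given_a[ai]) for ai in a])
    return np.column_stack([a, b])

virtual = sample_virtual(1000)
```

Validation in this setting compares the distribution of `virtual` against `data`; the paper does this with univariate tests and a multivariate Kullback-Leibler-based test.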
A Genomic Bayesian Multi-trait and Multi-environment Model.
Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Toledo, Fernando H; Pérez-Hernández, Oscar; Eskridge, Kent M; Rutkoski, Jessica
2016-09-08
When information on multiple genotypes evaluated in multiple environments is recorded, a multi-environment single trait model for assessing genotype × environment interaction (G × E) is usually employed. Comprehensive models that simultaneously take into account the correlated traits and trait × genotype × environment interaction (T × G × E) are lacking. In this research, we propose a Bayesian model for analyzing multiple traits and multiple environments within a whole-genome prediction (WGP) framework. For this model, we used Half-t priors on each standard deviation term and uniform priors on each correlation of the covariance matrix. These priors were not informative and led to posterior inferences that were insensitive to the choice of hyper-parameters. We also developed a computationally efficient Markov Chain Monte Carlo (MCMC) scheme under the above priors, which allowed us to obtain all required full conditional distributions of the parameters, leading to an exact Gibbs sampling for the posterior distribution. We used two real data sets to implement and evaluate the proposed Bayesian method and found that when the correlation between traits was high (>0.5), the proposed model (with unstructured variance-covariance) improved prediction accuracy compared to the model with diagonal and standard variance-covariance structures. The R-software package Bayesian Multi-Trait and Multi-Environment (BMTME) offers optimized C++ routines to efficiently perform the analyses.
Bayesian Monte Carlo updating of Hudson River PCB model using water column PCB measurements
Zhang, S.; Toll, J.; Cothern, K.
1995-12-31
The authors have developed prior probability distributions for model parameters and terms describing physico-chemical processes in sediment and water column models of PCB fate in a segment of the lower Hudson River, and performed importance analyses to identify the key uncertainties affecting the models' predictive power. In this work, the authors employ field measurements of the mean total water column PCB concentration from nearby river segments to refine the prior probability distributions for the important parameters and terms in the water column PCB model, using Bayesian Monte Carlo analysis. The principal objectives of the current work are (1) to implement Bayesian Monte Carlo analysis, to demonstrate the technique and evaluate its potential benefits, and (2) to improve the parameterization of the water column PCB model on the basis of site-specific PCB concentration data. The Bayesian updating procedure resulted in improved estimates of PCB mass loading and re-suspension velocity terms, but posteriors for three other key parameters -- settling velocity and particulate PCB fractions in the water column and surface sediments -- were unaffected by the information extracted from the new field data. In addition, the authors found that some of the high posterior probability parameter vectors, though mathematically plausible, were physically implausible, as a consequence of the unrealistic (but common) Monte Carlo assumption that the model's parameters are independently distributed. The implications of this and other findings are discussed.
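Bayesian Monte Carlo updating of this kind weights an ensemble of prior parameter samples by the likelihood of the field observation under each sample's model prediction; posterior summaries are then weighted averages over the ensemble. A minimal sketch, where the stand-in model, parameter names, and all numbers are illustrative assumptions rather than values from the Hudson River study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prior samples for two uncertain PCB-model terms
# (names and distributions are illustrative, not from the study).
n = 5000
loading = rng.lognormal(mean=0.0, sigma=0.5, size=n)   # PCB mass loading
resusp = rng.lognormal(mean=-1.0, sigma=0.5, size=n)   # resuspension velocity

def model(loading, resusp):
    # Stand-in for the water-column PCB model: predicted mean concentration.
    return loading * (1.0 + 2.0 * resusp)

# Field measurement of mean water-column PCB concentration (made up).
obs, obs_sd = 1.5, 0.3

# Bayesian Monte Carlo step: weight each prior sample by the likelihood
# of the observation given that sample's model prediction.
pred = model(loading, resusp)
logw = -0.5 * ((obs - pred) / obs_sd) ** 2
w = np.exp(logw - logw.max())
w /= w.sum()

# Posterior summaries are likelihood-weighted averages over the prior
# ensemble; parameters the observation does not constrain keep their priors.
post_loading = np.sum(w * loading)
print(f"prior mean loading {loading.mean():.3f} -> posterior {post_loading:.3f}")
```

Note that the two parameters are drawn independently here, which is exactly the "common Monte Carlo assumption" the abstract flags: the data can then favor parameter vectors that are jointly implausible in the physical system.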