Sample records for predictive statistical models

  1. A comparison of large-scale climate signals and the North American Multi-Model Ensemble (NMME) for drought prediction in China

    NASA Astrophysics Data System (ADS)

    Xu, Lei; Chen, Nengcheng; Zhang, Xiang

    2018-02-01

    Drought is an extreme natural disaster that can lead to huge socioeconomic losses. Drought prediction months ahead is helpful for early drought warning and preparations. In this study, we developed a statistical model, two weighted dynamic models and a statistical-dynamic (hybrid) model for 1-6 month lead drought prediction in China. Specifically, the statistical component weights large-scale climate signals by support vector regression (SVR), the dynamic components consist of the ensemble mean (EM) and Bayesian model averaging (BMA) of the North American Multi-Model Ensemble (NMME) climate models, and the hybrid component combines the statistical and dynamic components by assigning weights based on their historical performance. The results indicate that the statistical and hybrid models produce better rainfall predictions than the NMME-EM and NMME-BMA models, which have good predictability only in southern China. In the 2011 China winter-spring drought event, the statistical model predicted the spatial extent and severity of drought nationwide well, although the severity was underestimated in the mid-lower reaches of Yangtze River (MLRYR) region. The NMME-EM and NMME-BMA models largely overestimated rainfall in northern and western China in the 2011 drought. In the 2013 China summer drought, the NMME-EM model forecasted the drought extent and severity in eastern China well, while the statistical and hybrid models falsely detected negative precipitation anomalies (NPA) in some areas. The value of model ensembles, such as multiple statistical approaches, multiple dynamic models or multiple hybrid models, for drought prediction is highlighted. These conclusions may be helpful for drought prediction and early drought warnings in China.
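
    The skill-weighted hybrid idea can be sketched in a few lines. The following is a minimal illustration, not the authors' code: an SVR statistical component, a synthetic stand-in for the NMME ensemble mean, and an inverse-RMSE weighting rule; the rule, data, and all parameters are assumptions.

    ```python
    # Hedged sketch of a statistical-dynamic hybrid forecast (all data synthetic).
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    n_train, n_signals = 120, 4                  # 10 years of monthly data, 4 climate indices
    X = rng.normal(size=(n_train, n_signals))    # large-scale climate signals (ENSO-like indices)
    y = X @ np.array([0.8, -0.5, 0.3, 0.1]) + rng.normal(scale=0.5, size=n_train)  # rainfall anomaly

    stat_model = SVR(kernel="rbf").fit(X[:-12], y[:-12])   # statistical component (SVR on signals)
    stat_pred = stat_model.predict(X[-12:])

    nmme_members = y[-12:] + rng.normal(scale=0.6, size=(5, 12))  # stand-in for 5 NMME hindcasts
    dyn_pred = nmme_members.mean(axis=0)                          # ensemble mean (EM)

    # Hybrid: weight components by inverse RMSE; in practice the weights would come
    # from an earlier hindcast period, not the window being verified.
    rmse = lambda p: np.sqrt(np.mean((p - y[-12:]) ** 2))
    w_stat, w_dyn = 1 / rmse(stat_pred), 1 / rmse(dyn_pred)
    hybrid = (w_stat * stat_pred + w_dyn * dyn_pred) / (w_stat + w_dyn)
    print("hybrid RMSE:", round(rmse(hybrid), 3))
    ```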

  2. A two-component rain model for the prediction of attenuation statistics

    NASA Technical Reports Server (NTRS)

    Crane, R. K.

    1982-01-01

    A two-component rain model has been developed for calculating attenuation statistics. In contrast to most other attenuation prediction models, the two-component model calculates the occurrence probability for volume cells or debris attenuation events. The model performed significantly better than the International Radio Consultative Committee model when used for predictions on earth-satellite paths. It is expected that the model will have applications in modeling the joint statistics required for space diversity system design, the statistics of interference due to rain scatter at attenuating frequencies, and the duration statistics for attenuation events.

  3. Population activity statistics dissect subthreshold and spiking variability in V1.

    PubMed

    Bányai, Mihály; Koman, Zsombor; Orbán, Gergő

    2017-07-01

    Response variability, as measured by fluctuating responses upon repeated performance of trials, is a major component of neural responses, and its characterization is key to interpret high dimensional population recordings. Response variability and covariability display predictable changes upon changes in stimulus and cognitive or behavioral state, providing an opportunity to test the predictive power of models of neural variability. Still, there is little agreement on which model to use as a building block for population-level analyses, and models of variability are often treated as a subject of choice. We investigate two competing models, the doubly stochastic Poisson (DSP) model assuming stochasticity at spike generation, and the rectified Gaussian (RG) model tracing variability back to membrane potential variance, to analyze stimulus-dependent modulation of both single-neuron and pairwise response statistics. Using a pair of model neurons, we demonstrate that the two models predict similar single-cell statistics. However, DSP and RG models have contradicting predictions on the joint statistics of spiking responses. To test the models against data, we build a population model to simulate stimulus change-related modulations in pairwise response statistics. We use single-unit data from the primary visual cortex (V1) of monkeys to show that while model predictions for variance are qualitatively similar to experimental data, only the RG model's predictions are compatible with joint statistics. These results suggest that models using Poisson-like variability might fail to capture important properties of response statistics. We argue that membrane potential-level modeling of stochasticity provides an efficient strategy to model correlations. NEW & NOTEWORTHY Neural variability and covariability are puzzling aspects of cortical computations. For efficient decoding and prediction, models of information encoding in neural populations hinge on an appropriate model of variability. Our work shows that stimulus-dependent changes in pairwise but not in single-cell statistics can differentiate between two widely used models of neuronal variability. Contrasting model predictions with neuronal data provides hints on the noise sources in spiking and provides constraints on statistical models of population activity. Copyright © 2017 the American Physiological Society.
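
    The contrast between the two model classes can be illustrated directly. A hedged sketch (my assumptions, not the paper's code): both models are driven by the same correlated Gaussian "membrane potential," but the DSP model adds Poisson spike generation while the RG model rectifies the potential, and their pairwise statistics differ.

    ```python
    # Compare spike-count statistics under DSP and RG models sharing one latent input.
    import numpy as np

    rng = np.random.default_rng(1)
    n_trials, rho = 20000, 0.4
    cov = np.array([[1.0, rho], [rho, 1.0]])
    u = rng.multivariate_normal(mean=[1.0, 1.0], cov=cov, size=n_trials)  # shared latent potential

    # DSP: latent rate from the Gaussian, then independent Poisson spike generation.
    counts_dsp = rng.poisson(np.exp(u))          # log-link keeps rates positive (an assumption)

    # RG: spike count is a (scaled) rectification of the membrane potential itself.
    counts_rg = np.maximum(u, 0.0) * 5.0

    for name, c in [("DSP", counts_dsp), ("RG", counts_rg)]:
        print(name,
              "Fano:", np.round(c.var(axis=0) / c.mean(axis=0), 2),
              "pairwise r:", np.round(np.corrcoef(c[:, 0], c[:, 1])[0, 1], 3))
    ```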

  4. Stochastic or statistic? Comparing flow duration curve models in ungauged basins and changing climates

    NASA Astrophysics Data System (ADS)

    Müller, M. F.; Thompson, S. E.

    2015-09-01

    The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by a strong wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are strongly favored over statistical models.
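
    The headline metric here is the Nash-Sutcliffe coefficient computed on predicted FDCs. A minimal sketch of that evaluation step follows, with synthetic flows; the Weibull plotting position is an assumed convention.

    ```python
    # Build empirical flow duration curves and score a prediction with NSE.
    import numpy as np

    def flow_duration_curve(q):
        """Return (exceedance probability, sorted flows) for daily flows q."""
        q_sorted = np.sort(q)[::-1]
        p_exceed = np.arange(1, q_sorted.size + 1) / (q_sorted.size + 1)  # Weibull positions
        return p_exceed, q_sorted

    def nash_sutcliffe(obs, sim):
        return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

    rng = np.random.default_rng(2)
    q_obs = rng.lognormal(mean=1.0, sigma=1.2, size=3650)          # synthetic daily flows
    q_sim = q_obs * rng.lognormal(mean=0.0, sigma=0.1, size=3650)  # imperfect model

    _, fdc_obs = flow_duration_curve(q_obs)
    _, fdc_sim = flow_duration_curve(q_sim)
    print("NSE of the predicted FDC:", round(nash_sutcliffe(fdc_obs, fdc_sim), 3))
    ```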

  5. Comparing statistical and process-based flow duration curve models in ungauged basins and changing rain regimes

    NASA Astrophysics Data System (ADS)

    Müller, M. F.; Thompson, S. E.

    2016-02-01

    The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by frequent wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are favored over statistical models.

  6. Comparison of the predictive validity of diagnosis-based risk adjusters for clinical outcomes.

    PubMed

    Petersen, Laura A; Pietz, Kenneth; Woodard, LeChauncy D; Byrne, Margaret

    2005-01-01

    Many possible methods of risk adjustment exist, but there is a dearth of comparative data on their performance. We compared the predictive validity of 2 widely used methods (Diagnostic Cost Groups [DCGs] and Adjusted Clinical Groups [ACGs]) for 2 clinical outcomes using a large national sample of patients. We studied all patients who used Veterans Health Administration (VA) medical services in fiscal year (FY) 2001 (n = 3,069,168) and assigned both a DCG and an ACG to each. We used logistic regression analyses to compare predictive ability for death or long-term care (LTC) hospitalization for age/gender models, DCG models, and ACG models. We also assessed the effect of adding age to the DCG and ACG models. Patients in the highest DCG categories, indicating higher severity of illness, were more likely to die or to require LTC hospitalization. Surprisingly, the age/gender model predicted death slightly more accurately than the ACG model (c-statistic of 0.710 versus 0.700, respectively). The addition of age to the ACG model improved the c-statistic to 0.768. The highest c-statistic for prediction of death was obtained with a DCG/age model (0.830). The lowest c-statistics were obtained for age/gender models for LTC hospitalization (c-statistic 0.593). The c-statistic for use of ACGs to predict LTC hospitalization was 0.783, and improved to 0.792 with the addition of age. The c-statistics for use of DCGs and DCG/age to predict LTC hospitalization were 0.885 and 0.890, respectively, indicating the best prediction. We found that risk adjusters based upon diagnoses predicted an increased likelihood of death or LTC hospitalization, exhibiting good predictive validity. In this comparative analysis using VA data, DCG models were generally superior to ACG models in predicting clinical outcomes, although ACG model performance was enhanced by the addition of age.
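
    The comparison hinges on fitting logistic models and reading off c-statistics. A minimal sketch with synthetic data; the severity score is a hypothetical stand-in for a DCG-like category, not the actual DCG grouper.

    ```python
    # Fit nested logistic models and compare c-statistics (ROC AUC).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(3)
    n = 50000
    age = rng.uniform(40, 90, n)
    severity = rng.integers(1, 10, n)             # stand-in for a DCG-like severity category
    logit = -9 + 0.07 * age + 0.35 * severity
    death = rng.random(n) < 1 / (1 + np.exp(-logit))

    for name, X in [("age only", age[:, None]),
                    ("severity only", severity[:, None]),
                    ("severity + age", np.column_stack([severity, age]))]:
        model = LogisticRegression(max_iter=1000).fit(X, death)
        auc = roc_auc_score(death, model.predict_proba(X)[:, 1])
        print(f"{name}: c-statistic = {auc:.3f}")
    ```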

  7. 78 FR 70303 - Announcement of Requirements and Registration for the Predict the Influenza Season Challenge

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-11-25

    ... public. Mathematical and statistical models can be useful in predicting the timing and impact of the... applying any mathematical, statistical, or other approach to predictive modeling. This challenge will... Services (HHS) region level(s) in the United States by developing mathematical and statistical models that...

  8. Comparing and combining process-based crop models and statistical models with some implications for climate change

    NASA Astrophysics Data System (ADS)

    Roberts, Michael J.; Braun, Noah O.; Sinclair, Thomas R.; Lobell, David B.; Schlenker, Wolfram

    2017-09-01

    We compare predictions of a simple process-based crop model (Soltani and Sinclair 2012), a simple statistical model (Schlenker and Roberts 2009), and a combination of both models to actual maize yields on a large, representative sample of farmer-managed fields in the Corn Belt region of the United States. After statistical post-model calibration, the process model (Simple Simulation Model, or SSM) predicts actual outcomes slightly better than the statistical model, but the combined model performs significantly better than either model. The SSM, statistical model and combined model all show similar relationships with precipitation, while the SSM better accounts for temporal patterns of precipitation, vapor pressure deficit and solar radiation. The statistical and combined models show a more negative impact associated with extreme heat for which the process model does not account. Due to the extreme heat effect, predicted impacts under uniform climate change scenarios are considerably more severe for the statistical and combined models than for the process-based model.
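
    One simple way to realize the "combined model" idea is to regress observed yields on the process-model prediction together with the statistical model's weather covariates. The sketch below is an assumption about the general approach, not the paper's specification; the covariates and coefficients are invented.

    ```python
    # Combine a process-model prediction with statistical weather covariates.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(4)
    n = 2000
    precip = rng.gamma(4.0, 60.0, n)               # season precipitation (mm)
    edd = rng.gamma(2.0, 15.0, n)                  # extreme degree days above a heat threshold
    yield_true = 180 + 0.05 * precip - 0.9 * edd + rng.normal(0, 8, n)
    ssm_pred = yield_true + 0.9 * edd + rng.normal(0, 6, n)   # process model misses extreme heat

    X_comb = np.column_stack([ssm_pred, precip, edd])
    combined = LinearRegression().fit(X_comb, yield_true)
    rmse = lambda p: np.sqrt(np.mean((p - yield_true) ** 2))
    print("process model RMSE:", round(rmse(ssm_pred), 2))
    print("combined model RMSE:", round(rmse(combined.predict(X_comb)), 2))
    ```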

  9. The Development of Statistical Models for Predicting Surgical Site Infections in Japan: Toward a Statistical Model-Based Standardized Infection Ratio.

    PubMed

    Fukuda, Haruhisa; Kuroki, Manabu

    2016-03-01

    This study aimed to develop and internally validate a surgical site infection (SSI) prediction model for Japan, using a retrospective observational cohort design. We analyzed surveillance data submitted to the Japan Nosocomial Infections Surveillance system for patients who had undergone target surgical procedures from January 1, 2010, through December 31, 2012. Logistic regression analyses were used to develop statistical models for predicting SSIs. An SSI prediction model was constructed for each of the procedure categories by statistically selecting the appropriate risk factors from among the collected surveillance data and determining their optimal categorization. Standard bootstrapping techniques were applied to assess potential overfitting. The C-index was used to compare the predictive performances of the new statistical models with those of models based on conventional risk index variables. The study sample comprised 349,987 cases from 428 participant hospitals throughout Japan, and the overall SSI incidence was 7.0%. The C-indices of the new statistical models were significantly higher than those of the conventional risk index models in 21 (67.7%) of the 31 procedure categories (P<.05). No significant overfitting was detected. Japan-specific SSI prediction models were shown to generally have higher accuracy than conventional risk index models. These new models may have applications in assessing hospital performance and identifying high-risk patients in specific procedure categories.
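
    The internal-validation step described above, standard bootstrapping to assess overfitting, can be sketched as optimism correction of the apparent C-index; the data below are synthetic stand-ins for the surveillance variables.

    ```python
    # Bootstrap optimism correction of an apparent C-index (ROC AUC).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(5)
    n, p = 3000, 6
    X = rng.normal(size=(n, p))                    # risk factors (e.g., duration, wound class)
    beta = np.array([0.6, -0.4, 0.3, 0.0, 0.0, 0.2])
    y = rng.random(n) < 1 / (1 + np.exp(-(-2.6 + X @ beta)))

    model = LogisticRegression(max_iter=1000).fit(X, y)
    apparent = roc_auc_score(y, model.predict_proba(X)[:, 1])

    optimism = []
    for _ in range(200):
        idx = rng.integers(0, n, n)                # bootstrap resample
        m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        auc_boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
        auc_orig = roc_auc_score(y, m.predict_proba(X)[:, 1])
        optimism.append(auc_boot - auc_orig)

    print("apparent C-index:", round(apparent, 3),
          "optimism-corrected:", round(apparent - float(np.mean(optimism)), 3))
    ```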

  10. Testing prediction methods: Earthquake clustering versus the Poisson model

    USGS Publications Warehouse

    Michael, A.J.

    1997-01-01

    Testing earthquake prediction methods requires statistical techniques that compare observed success to random chance. One technique is to produce simulated earthquake catalogs and measure the relative success of predicting real and simulated earthquakes. The accuracy of these tests depends on the validity of the statistical model used to simulate the earthquakes. This study tests the effect of clustering in the statistical earthquake model on the results. Three simulation models were used to produce significance levels for a VLF earthquake prediction method. As the degree of simulated clustering increases, the statistical significance drops. Hence, the use of a seismicity model with insufficient clustering can lead to overly optimistic results. A successful method must pass the statistical tests with a model that fully replicates the observed clustering. However, a method can be rejected based on tests with a model that contains insufficient clustering. U.S. copyright. Published in 1997 by the American Geophysical Union.
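
    The testing logic is easy to illustrate: score the prediction method against many simulated catalogs and see how often chance does as well. The toy sketch below (all parameters invented) shows why a clustered null widens the distribution of chance successes and weakens significance.

    ```python
    # Significance of prediction success against Poisson vs. clustered catalogs.
    import numpy as np

    rng = np.random.default_rng(6)
    n_days = 3650
    alarm = np.zeros(n_days, dtype=bool)
    alarm[rng.choice(n_days, 200, replace=False)] = True   # days a prediction is "on"

    def simulate_catalog(clustered):
        quakes = rng.random(n_days) < 0.01                 # background Poisson seismicity
        if clustered:                                      # crude one-step aftershock triggering
            for t in np.flatnonzero(quakes):
                if t + 1 < n_days and rng.random() < 0.5:
                    quakes[t + 1] = True
        return quakes

    def successes(quakes):
        return int(np.sum(quakes & alarm))

    observed = successes(simulate_catalog(clustered=True))   # stand-in for the real catalog
    for label in (False, True):
        null = [successes(simulate_catalog(label)) for _ in range(1000)]
        p = np.mean([s >= observed for s in null])
        print(f"clustered={label}: P(chance success >= observed) = {p:.3f}")
    ```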

  11. A simple rain attenuation model for earth-space radio links operating at 10-35 GHz

    NASA Technical Reports Server (NTRS)

    Stutzman, W. L.; Yon, K. M.

    1986-01-01

    The simple attenuation model has been improved from an earlier version and now includes the effect of wave polarization. The model is for the prediction of rain attenuation statistics on earth-space communication links operating in the 10-35 GHz band. Simple calculations produce attenuation values as a function of average rain rate. These, together with rain rate statistics (either measured or predicted), can be used to predict annual rain attenuation statistics. In this paper, model predictions are compared with measured data from a database of 62 experiments performed in the U.S., Europe, and Japan. Comparisons are also made to predictions from other models.

  12. Risk prediction model: Statistical and artificial neural network approach

    NASA Astrophysics Data System (ADS)

    Paiman, Nuur Azreen; Hariri, Azian; Masood, Ibrahim

    2017-04-01

    Prediction models have gained popularity and have been used in numerous areas of study to complement and support clinical reasoning and decision making. The adoption of such models assists physicians' decision making and individuals' behavior, and consequently improves individual outcomes and the cost-effectiveness of care. The objective of this paper is to review articles related to risk prediction models in order to understand suitable approaches to the development and validation of such models. A qualitative review of the aims, methods and significant main outcomes of nineteen published articles that developed risk prediction models in numerous fields was conducted. This paper also reviews how researchers develop and validate risk prediction models based on statistical and artificial neural network approaches. From the review, some methodological recommendations for developing and validating prediction models are highlighted. According to the studies reviewed, artificial neural network approaches to developing prediction models were more accurate than statistical approaches. However, only limited published literature currently discusses which approach is more accurate for risk prediction model development.

  13. Predicting lettuce canopy photosynthesis with statistical and neural network models

    NASA Technical Reports Server (NTRS)

    Frick, J.; Precetti, C.; Mitchell, C. A.

    1998-01-01

    An artificial neural network (NN) and a statistical regression model were developed to predict canopy photosynthetic rates (Pn) for 'Waldman's Green' leaf lettuce (Lactuca sativa L.). All data used to develop and test the models were collected for crop stands grown hydroponically and under controlled-environment conditions. In the NN and regression models, canopy Pn was predicted as a function of three independent variables: shoot-zone CO2 concentration (600 to 1500 µmol mol-1), photosynthetic photon flux (PPF) (600 to 1100 µmol m-2 s-1), and canopy age (10 to 20 days after planting). The models were used to determine the combinations of CO2 and PPF setpoints required each day to maintain maximum canopy Pn. The statistical model (a third-order polynomial) predicted Pn more accurately than the simple NN (a three-layer, fully connected net). Over an 11-day validation period, the average percent difference between predicted and actual Pn was 12.3% and 24.6% for the statistical and NN models, respectively. Both models lost considerable accuracy when used to make relatively long-range Pn predictions (≥ 6 days into the future).
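
    The statistical side of the comparison is a third-order polynomial fit in the three inputs. A minimal sketch on synthetic data; the response surface below is an assumed stand-in for the measured one.

    ```python
    # Third-order polynomial regression of canopy Pn on CO2, PPF, and age.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(7)
    n = 500
    co2 = rng.uniform(600, 1500, n)          # µmol mol-1
    ppf = rng.uniform(600, 1100, n)          # µmol m-2 s-1
    age = rng.uniform(10, 20, n)             # days after planting
    pn = (12 * (1 - np.exp(-co2 / 700)) * (1 - np.exp(-ppf / 500))
          * np.exp(-((age - 16) / 8) ** 2) + rng.normal(0, 0.3, n))

    # Rescale inputs so the cubic terms stay well conditioned.
    X = np.column_stack([co2 / 1000, ppf / 1000, age / 10])
    model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(X, pn)
    resid = pn - model.predict(X)
    print("mean |error| as % of mean Pn:", round(100 * np.abs(resid).mean() / pn.mean(), 1))
    ```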

  14. Future missions studies: Combining Schatten's solar activity prediction model with a chaotic prediction model

    NASA Technical Reports Server (NTRS)

    Ashrafi, S.

    1991-01-01

    K. Schatten (1991) recently developed a method for combining his prediction model with our chaotic model. The philosophy behind this combined model and his method of combination are explained. Because the Schatten solar prediction model (KS) uses a dynamo to mimic solar dynamics, accurate prediction is limited to long-term solar behavior (10 to 20 years). The chaotic prediction model (SA) uses the recently developed techniques of nonlinear dynamics to predict solar activity, and can be used to predict activity only up to the predictability horizon. In theory, chaotic prediction should be several orders of magnitude better than statistical prediction up to that horizon; beyond the horizon, chaotic predictions would theoretically be just as good as statistical predictions. Chaos theory therefore puts a fundamental limit on predictability.

  15. Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors

    DTIC Science & Technology

    2015-07-15

    Report snippet (extraction fragments): the report compares neural network and linear regression models in statistically predicting the mental and physical health status of breast cancer survivors; the snippet also cites work on the long-term effects on cancer survivors' quality of life of physical training versus physical training combined with cognitive-behavioral therapy.

  16. A Hierarchical Multivariate Bayesian Approach to Ensemble Model output Statistics in Atmospheric Prediction

    DTIC Science & Technology

    2017-09-01

    Dissertation snippet (extraction fragments): the dissertation explores the efficacy of statistical post-processing methods downstream of dynamical model components using a hierarchical multivariate Bayesian approach. Keywords: Bayesian hierarchical modeling, Markov chain Monte Carlo methods, Metropolis algorithm, machine learning, atmospheric prediction.

  17. Statistical learning and probabilistic prediction in music cognition: mechanisms of stylistic enculturation.

    PubMed

    Pearce, Marcus T

    2018-05-11

    Music perception depends on internal psychological models derived through exposure to a musical culture. It is hypothesized that this musical enculturation depends on two cognitive processes: (1) statistical learning, in which listeners acquire internal cognitive models of statistical regularities present in the music to which they are exposed; and (2) probabilistic prediction based on these learned models that enables listeners to organize and process their mental representations of music. To corroborate these hypotheses, I review research that uses a computational model of probabilistic prediction based on statistical learning (the information dynamics of music (IDyOM) model) to simulate data from empirical studies of human listeners. The results show that a broad range of psychological processes involved in music perception-expectation, emotion, memory, similarity, segmentation, and meter-can be understood in terms of a single, underlying process of probabilistic prediction using learned statistical models. Furthermore, IDyOM simulations of listeners from different musical cultures demonstrate that statistical learning can plausibly predict causal effects of differential cultural exposure to musical styles, providing a quantitative model of cultural distance. Understanding the neural basis of musical enculturation will benefit from close coordination between empirical neuroimaging and computational modeling of underlying mechanisms, as outlined here. © 2018 The Authors. Annals of the New York Academy of Sciences published by Wiley Periodicals, Inc. on behalf of New York Academy of Sciences.

  18. Geographic and temporal validity of prediction models: Different approaches were useful to examine model performance

    PubMed Central

    Austin, Peter C.; van Klaveren, David; Vergouwe, Yvonne; Nieboer, Daan; Lee, Douglas S.; Steyerberg, Ewout W.

    2017-01-01

    Objective: Validation of clinical prediction models traditionally refers to the assessment of model performance in new patients. We studied different approaches to geographic and temporal validation in the setting of multicenter data from two time periods. Study Design and Setting: We illustrated different analytic methods for validation using a sample of 14,857 patients hospitalized with heart failure at 90 hospitals in two distinct time periods. Bootstrap resampling was used to assess internal validity. Meta-analytic methods were used to assess geographic transportability. Each hospital was used once as a validation sample, with the remaining hospitals used for model derivation. Hospital-specific estimates of discrimination (c-statistic) and calibration (calibration intercepts and slopes) were pooled using random effects meta-analysis methods. I2 statistics and prediction interval width quantified geographic transportability. Temporal transportability was assessed using patients from the earlier period for model derivation and patients from the later period for model validation. Results: Estimates of reproducibility, pooled hospital-specific performance, and temporal transportability were on average very similar, with c-statistics of 0.75. Between-hospital variation was moderate according to I2 statistics and prediction intervals for c-statistics. Conclusion: This study illustrates how performance of prediction models can be assessed in settings with multicenter data at different time periods. PMID:27262237
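
    The meta-analytic pooling step can be sketched with the DerSimonian-Laird estimator, which is an assumption about the specific random-effects method used; the hospital-level c-statistics and variances below are synthetic.

    ```python
    # Random-effects pooling of hospital c-statistics with I2 and a prediction interval.
    import numpy as np

    rng = np.random.default_rng(8)
    k = 90                                          # hospitals
    c_hosp = rng.normal(0.75, 0.03, k)              # hospital-specific c-statistics
    var_hosp = rng.uniform(0.0004, 0.002, k)        # their sampling variances

    w = 1 / var_hosp                                # fixed-effect weights
    c_fixed = np.sum(w * c_hosp) / np.sum(w)
    Q = np.sum(w * (c_hosp - c_fixed) ** 2)         # Cochran's Q
    tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    I2 = max(0.0, (Q - (k - 1)) / Q)                # share of variance from heterogeneity

    w_re = 1 / (var_hosp + tau2)                    # random-effects weights
    c_pooled = np.sum(w_re * c_hosp) / np.sum(w_re)
    se = np.sqrt(1 / np.sum(w_re))
    half = 1.96 * np.sqrt(tau2 + se**2)             # approximate 95% prediction interval
    print(f"pooled c = {c_pooled:.3f}, I2 = {100*I2:.0f}%, "
          f"95% PI = ({c_pooled - half:.3f}, {c_pooled + half:.3f})")
    ```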

  19. The proposed 'concordance-statistic for benefit' provided a useful metric when modeling heterogeneous treatment effects.

    PubMed

    van Klaveren, David; Steyerberg, Ewout W; Serruys, Patrick W; Kent, David M

    2018-02-01

    Clinical prediction models that support treatment decisions are usually evaluated for their ability to predict the risk of an outcome rather than treatment benefit-the difference between outcome risk with vs. without therapy. We aimed to define performance metrics for a model's ability to predict treatment benefit. We analyzed data of the Synergy between Percutaneous Coronary Intervention with Taxus and Cardiac Surgery (SYNTAX) trial and of three recombinant tissue plasminogen activator trials. We assessed alternative prediction models with a conventional risk concordance-statistic (c-statistic) and a novel c-statistic for benefit. We defined observed treatment benefit by the outcomes in pairs of patients matched on predicted benefit but discordant for treatment assignment. The 'c-for-benefit' represents the probability that from two randomly chosen matched patient pairs with unequal observed benefit, the pair with greater observed benefit also has a higher predicted benefit. Compared to a model without treatment interactions, the SYNTAX score II had improved ability to discriminate treatment benefit (c-for-benefit 0.590 vs. 0.552), despite having similar risk discrimination (c-statistic 0.725 vs. 0.719). However, for the simplified stroke-thrombolytic predictive instrument (TPI) vs. the original stroke-TPI, the c-for-benefit (0.584 vs. 0.578) was similar. The proposed methodology has the potential to measure a model's ability to predict treatment benefit not captured with conventional performance metrics. Copyright © 2017 Elsevier Inc. All rights reserved.
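
    The c-for-benefit is concrete enough to sketch. In the illustration below, the 1:1 matching-by-rank rule and the tie handling are my assumptions about reasonable implementation choices; the paper defines the metric, not this exact code.

    ```python
    # Sketch of a c-for-benefit: concordance of predicted vs. observed treatment benefit.
    import numpy as np

    def c_for_benefit(pred_benefit, treated, bad_outcome):
        t, c = np.flatnonzero(treated), np.flatnonzero(~treated)
        m = min(t.size, c.size)
        t = t[np.argsort(pred_benefit[t])][:m]        # match 1:1 by rank of predicted benefit
        c = c[np.argsort(pred_benefit[c])][:m]
        pb = (pred_benefit[t] + pred_benefit[c]) / 2  # pair-level predicted benefit
        ob = bad_outcome[c].astype(int) - bad_outcome[t].astype(int)  # observed benefit: -1/0/1
        concordant = discordant = 0
        for i in range(m):
            for j in range(i + 1, m):
                if ob[i] == ob[j]:
                    continue                          # only pairs with unequal observed benefit
                hi = i if ob[i] > ob[j] else j        # pair with greater observed benefit
                lo = j if hi == i else i
                if pb[hi] > pb[lo]:
                    concordant += 1
                elif pb[hi] < pb[lo]:
                    discordant += 1                   # predicted-benefit ties are ignored here
        return concordant / (concordant + discordant)

    rng = np.random.default_rng(9)
    n = 600
    x = rng.normal(size=n)
    treated = rng.random(n) < 0.5
    p_bad = 0.35 - 0.15 * (x > 0) * treated           # true benefit only in half the patients
    bad_outcome = rng.random(n) < p_bad
    pred_benefit = 0.15 / (1 + np.exp(-3 * x))        # model-estimated benefit (continuous)
    print("c-for-benefit:", round(c_for_benefit(pred_benefit, treated, bad_outcome), 3))
    ```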

  20. Seasonal Drought Prediction: Advances, Challenges, and Future Prospects

    NASA Astrophysics Data System (ADS)

    Hao, Zengchao; Singh, Vijay P.; Xia, Youlong

    2018-03-01

    Drought prediction is of critical importance for early warning and drought management. This review provides a synthesis of drought prediction based on statistical, dynamical, and hybrid methods. Statistical drought prediction is achieved by modeling the relationship between drought indices of interest and a suite of potential predictors, including large-scale climate indices, local climate variables, and land initial conditions. Dynamical meteorological drought prediction relies on seasonal climate forecasts from general circulation models (GCMs), which can be employed to drive hydrological models for agricultural and hydrological drought prediction, with the predictability determined by both climate forcings and initial conditions. Challenges still exist in drought prediction at long lead times and under a changing environment resulting from natural and anthropogenic factors. Future research prospects to improve drought prediction include, but are not limited to, high-quality data assimilation, improved model development with key processes related to drought occurrence, optimal ensemble forecasts that select or weight ensembles, and hybrid drought prediction that merges statistical and dynamical forecasts.

  1. Watershed Regressions for Pesticides (WARP) models for predicting stream concentrations of multiple pesticides

    USGS Publications Warehouse

    Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.

    2013-01-01

    Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentrations; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38%. As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.

  2. Seismic activity prediction using computational intelligence techniques in northern Pakistan

    NASA Astrophysics Data System (ADS)

    Asim, Khawaja M.; Awais, Muhammad; Martínez-Álvarez, F.; Iqbal, Talat

    2017-10-01

    An earthquake prediction study is carried out for the region of northern Pakistan. The prediction methodology combines seismology and computational intelligence. Eight seismic parameters are computed from past earthquakes, and their predictive ability is evaluated in terms of information gain, which leads to the selection of six parameters for prediction. Multiple computationally intelligent models have been developed for earthquake prediction using the selected seismic parameters, including a feed-forward neural network, recurrent neural network, random forest, multilayer perceptron, radial basis neural network, and support vector machine. The performance of every prediction model is evaluated, and McNemar's statistical test is applied to assess the statistical significance of the computational methodologies. The feed-forward neural network shows statistically significant predictions, with an accuracy of 75% and a positive predictive value of 78%, in the context of northern Pakistan.
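
    McNemar's test compares two classifiers on their discordant paired predictions. A minimal sketch using the exact binomial form, with synthetic labels and accuracies:

    ```python
    # McNemar's exact test on paired correct/incorrect predictions of two models.
    import numpy as np
    from scipy.stats import binomtest

    rng = np.random.default_rng(10)
    y = rng.integers(0, 2, 400)                           # true labels (quake / no quake)
    pred_a = np.where(rng.random(400) < 0.75, y, 1 - y)   # ~75% accurate model
    pred_b = np.where(rng.random(400) < 0.65, y, 1 - y)   # ~65% accurate model

    a_only = np.sum((pred_a == y) & (pred_b != y))        # discordant: A right, B wrong
    b_only = np.sum((pred_a != y) & (pred_b == y))        # discordant: B right, A wrong
    p = binomtest(int(a_only), int(a_only + b_only), 0.5).pvalue
    print(f"A-only={a_only}, B-only={b_only}, McNemar exact p = {p:.4f}")
    ```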

  3. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. 1: Theoretical development and application to yearly predictions for selected cities in the United States

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1986-01-01

    A rain attenuation prediction model is described for use in calculating satellite communication link availability for any specific location in the world that is characterized by an extended record of rainfall. Such a formalism is necessary for the accurate assessment of availability predictions in the case of the small user-terminal concept of the Advanced Communication Technology Satellite (ACTS) Project. The model employs the theory of extreme value statistics to generate the necessary statistical rain-rate parameters from rain data in the form compiled by the National Weather Service. These location-dependent rain statistics are then applied to a rain attenuation model to obtain a yearly prediction of the occurrence of attenuation on any satellite link at that location. The predictions of this model are compared to those of the Crane Two-Component Rain Model and some empirical data and found to be very good. The model is then used to calculate rain attenuation statistics at 59 locations in the United States (including Alaska and Hawaii) for the 20 GHz downlinks and 30 GHz uplinks of the proposed ACTS system. The flexibility of this modeling formalism is such that it allows a complete and unified treatment of the temporal aspects of rain attenuation, leading to the design of an optimum stochastic power control algorithm whose purpose is to efficiently counter such rain fades on a satellite link.
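
    The extreme-value step can be illustrated with a common workflow: fit a Gumbel distribution to annual maximum rain rates and read off levels at given exceedance probabilities. This only illustrates the general idea; the report's actual formalism and parameters differ.

    ```python
    # Gumbel fit to annual-maximum rain rates (illustrative extreme-value workflow).
    import numpy as np
    from scipy.stats import gumbel_r

    rng = np.random.default_rng(11)
    annual_max_rr = rng.gumbel(loc=60.0, scale=15.0, size=30)   # mm/h, 30 years of records

    loc, scale = gumbel_r.fit(annual_max_rr)
    for p_exceed in (0.01, 0.001):               # fraction of years the level is exceeded
        level = gumbel_r.ppf(1 - p_exceed, loc, scale)
        print(f"rain rate exceeded with annual probability {p_exceed}: {level:.1f} mm/h")
    ```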

  4. Testing the Predictive Power of Coulomb Stress on Aftershock Sequences

    NASA Astrophysics Data System (ADS)

    Woessner, J.; Lombardi, A.; Werner, M. J.; Marzocchi, W.

    2009-12-01

    Empirical and statistical models of clustered seismicity are usually strongly stochastic and perceived to be uninformative in their forecasts, since only marginal distributions are used, such as the Omori-Utsu and Gutenberg-Richter laws. In contrast, so-called physics-based aftershock models, based on seismic rate changes calculated from Coulomb stress changes and rate-and-state friction, make more specific predictions: anisotropic stress shadows and multiplicative rate changes. We test the predictive power of models based on Coulomb stress changes against statistical models, including the popular Short Term Earthquake Probabilities and Epidemic-Type Aftershock Sequences models: We score and compare retrospective forecasts on the aftershock sequences of the 1992 Landers, USA, the 1997 Colfiorito, Italy, and the 2008 Selfoss, Iceland, earthquakes. To quantify predictability, we use likelihood-based metrics that test the consistency of the forecasts with the data, including modified and existing tests used in prospective forecast experiments within the Collaboratory for the Study of Earthquake Predictability (CSEP). Our results indicate that a statistical model performs best. Moreover, two Coulomb model classes seem unable to compete: Models based on deterministic Coulomb stress changes calculated from a given fault-slip model, and those based on fixed receiver faults. One model of Coulomb stress changes does perform well and sometimes outperforms the statistical models, but its predictive information is diluted, because of uncertainties included in the fault-slip model. Our results suggest that models based on Coulomb stress changes need to incorporate stochastic features that represent model and data uncertainty.

  5. A Unified Statistical Rain-Attenuation Model for Communication Link Fade Predictions and Optimal Stochastic Fade Control Design Using a Location-Dependent Rain-Statistic Database

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1990-01-01

    A static and dynamic rain-attenuation model is presented which describes the statistics of attenuation on an arbitrarily specified satellite link for any location for which there are long-term rainfall statistics. The model may be used in the design of the optimal stochastic control algorithms to mitigate the effects of attenuation and maintain link reliability. A rain-statistics database is compiled, which makes it possible to apply the model to any location in the continental U.S. with a resolution of 0.5 degrees in latitude and longitude. The model predictions are compared with experimental observations, showing good agreement.

  6. OPR-PPR, a Computer Program for Assessing Data Importance to Model Predictions Using Linear Statistics

    USGS Publications Warehouse

    Tonkin, Matthew J.; Tiedeman, Claire; Ely, D. Matthew; Hill, Mary C.

    2007-01-01

    The OPR-PPR program calculates the Observation-Prediction (OPR) and Parameter-Prediction (PPR) statistics that can be used to evaluate the relative importance of various kinds of data to simulated predictions. The data considered fall into three categories: (1) existing observations, (2) potential observations, and (3) potential information about parameters. The first two are addressed by the OPR statistic; the third is addressed by the PPR statistic. The statistics are based on linear theory and measure the leverage of the data, which depends on the location, the type, and possibly the time of the data being considered. For example, in a ground-water system the type of data might be a head measurement at a particular location and time. As a measure of leverage, the statistics do not take into account the value of the measurement. As linear measures, the OPR and PPR statistics require minimal computational effort once sensitivities have been calculated. Sensitivities need to be calculated for only one set of parameter values; commonly these are the values estimated through model calibration. OPR-PPR can calculate the OPR and PPR statistics for any mathematical model that produces the necessary OPR-PPR input files. In this report, OPR-PPR capabilities are presented in the context of using the ground-water model MODFLOW-2000 and the universal inverse program UCODE_2005. The method used to calculate the OPR and PPR statistics is based on the linear equation for prediction standard deviation. Using sensitivities and other information, OPR-PPR calculates (a) the percent increase in the prediction standard deviation that results when one or more existing observations are omitted from the calibration data set; (b) the percent decrease in the prediction standard deviation that results when one or more potential observations are added to the calibration data set; or (c) the percent decrease in the prediction standard deviation that results when potential information on one or more parameters is added.

  7. Statistical procedures for evaluating daily and monthly hydrologic model predictions

    USGS Publications Warehouse

    Coffey, M.E.; Workman, S.R.; Taraba, J.L.; Fogle, A.W.

    2004-01-01

    The overall study objective was to evaluate the applicability of different qualitative and quantitative methods for comparing daily and monthly SWAT computer model hydrologic streamflow predictions to observed data, and to recommend statistical methods for use in future model evaluations. Statistical methods were tested using daily streamflows and monthly equivalent runoff depths. The statistical techniques included linear regression, Nash-Sutcliffe efficiency, nonparametric tests, t-test, objective functions, autocorrelation, and cross-correlation. None of the methods specifically applied to the non-normal distribution and dependence between data points for the daily predicted and observed data. Of the tested methods, median objective functions, sign test, autocorrelation, and cross-correlation were most applicable for the daily data. The robust coefficient of determination (CD*) and robust modeling efficiency (EF*) objective functions were the preferred methods for daily model results due to the ease of comparing these values with a fixed ideal reference value of one. Predicted and observed monthly totals were more normally distributed, and there was less dependence between individual monthly totals than was observed for the corresponding predicted and observed daily values. More statistical methods were available for comparing SWAT model-predicted and observed monthly totals. The 1995 monthly SWAT model predictions and observed data had a regression R2 of 0.70, a Nash-Sutcliffe efficiency of 0.41, and the t-test failed to reject the equal data means hypothesis. The Nash-Sutcliffe coefficient and the R2 coefficient were the preferred methods for monthly results due to the ability to compare these coefficients to a set ideal value of one.
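
    A few of the tested statistics are easy to sketch. The example below, on synthetic AR(1) flows, shows the daily-data dependence problem via lag-1 autocorrelation, then scores monthly totals with R2, Nash-Sutcliffe efficiency, and a paired t-test:

    ```python
    # Daily dependence check, then monthly R2 / NSE / paired t-test (synthetic flows).
    import numpy as np
    from scipy.stats import pearsonr, ttest_rel

    rng = np.random.default_rng(12)
    days = 365
    obs = np.empty(days)
    obs[0] = 10.0
    for t in range(1, days):                     # AR(1): strong day-to-day dependence
        obs[t] = 0.9 * obs[t - 1] + rng.gamma(2.0, 0.5)
    sim = obs * rng.lognormal(0.0, 0.15, days)

    lag1 = np.corrcoef(obs[:-1], obs[1:])[0, 1]
    print("daily lag-1 autocorrelation:", round(lag1, 2))   # near 1 -> daily t-tests suspect

    monthly_obs = np.array([m.sum() for m in np.array_split(obs, 12)])
    monthly_sim = np.array([m.sum() for m in np.array_split(sim, 12)])
    r, _ = pearsonr(monthly_obs, monthly_sim)
    nse = 1 - np.sum((monthly_obs - monthly_sim) ** 2) / np.sum((monthly_obs - monthly_obs.mean()) ** 2)
    t_stat, p_val = ttest_rel(monthly_obs, monthly_sim)
    print(f"monthly R2 = {r**2:.2f}, NSE = {nse:.2f}, paired t-test p = {p_val:.2f}")
    ```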

  8. Statistical models for predicting pair dispersion and particle clustering in isotropic turbulence and their applications

    NASA Astrophysics Data System (ADS)

    Zaichik, Leonid I.; Alipchenkov, Vladimir M.

    2009-10-01

    The purpose of this paper is twofold: (i) to advance and extend the statistical two-point models of pair dispersion and particle clustering in isotropic turbulence that were previously proposed by Zaichik and Alipchenkov (2003 Phys. Fluids 15 1776-87; 2007 Phys. Fluids 19 113308) and (ii) to present some applications of these models. The models developed are based on a kinetic equation for the two-point probability density function of the relative velocity distribution of two particles. These models predict the pair relative velocity statistics and the preferential accumulation of heavy particles in stationary and decaying homogeneous isotropic turbulent flows. Moreover, the models are applied to predict the effect of particle clustering on turbulent collisions, sedimentation and intensity of microwave radiation as well as to calculate the mean filtered subgrid stress of the particulate phase. Model predictions are compared with direct numerical simulations and experimental measurements.

  9. Comparative evaluation of statistical and mechanistic models of Escherichia coli at beaches in southern Lake Michigan

    USGS Publications Warehouse

    Safaie, Ammar; Wendzel, Aaron; Ge, Zhongfu; Nevers, Meredith; Whitman, Richard L.; Corsi, Steven R.; Phanikumar, Mantha S.

    2016-01-01

    Statistical and mechanistic models are popular tools for predicting the levels of indicator bacteria at recreational beaches. Researchers tend to use one class of model or the other, and it is difficult to generalize statements about their relative performance due to differences in how the models are developed, tested, and used. We describe a cooperative modeling approach for freshwater beaches impacted by point sources in which insights derived from mechanistic modeling were used to further improve the statistical models and vice versa. The statistical models provided a basis for assessing the mechanistic models which were further improved using probability distributions to generate high-resolution time series data at the source, long-term “tracer” transport modeling based on observed electrical conductivity, better assimilation of meteorological data, and the use of unstructured-grids to better resolve nearshore features. This approach resulted in improved models of comparable performance for both classes including a parsimonious statistical model suitable for real-time predictions based on an easily measurable environmental variable (turbidity). The modeling approach outlined here can be used at other sites impacted by point sources and has the potential to improve water quality predictions resulting in more accurate estimates of beach closures.

  10. Canonical Statistical Model for Maximum Expected Immission of Wire Conductor in an Aperture Enclosure

    NASA Technical Reports Server (NTRS)

    Bremner, Paul G.; Vazquez, Gabriel; Christiano, Daniel J.; Trout, Dawn H.

    2016-01-01

    Prediction of the maximum expected electromagnetic pick-up of conductors inside a realistic shielding enclosure is an important canonical problem for system-level EMC design of space craft, launch vehicles, aircraft and automobiles. This paper introduces a simple statistical power balance model for prediction of the maximum expected current in a wire conductor inside an aperture enclosure. It calculates both the statistical mean and variance of the immission from the physical design parameters of the problem. Familiar probability density functions can then be used to predict the maximum expected immission for design purposes. The statistical power balance model requires minimal EMC design information and solves orders of magnitude faster than existing numerical models, making it ultimately viable for scaled-up, full system-level modeling. Both experimental test results and full wave simulation results are used to validate the foundational model.

  11. Addressing issues associated with evaluating prediction models for survival endpoints based on the concordance statistic.

    PubMed

    Wang, Ming; Long, Qi

    2016-09-01

    Prediction models for disease risk and prognosis play an important role in biomedical research, and evaluating their predictive accuracy in the presence of censored data is of substantial interest. The standard concordance (c) statistic has been extended to provide a summary measure of predictive accuracy for survival models. Motivated by a prostate cancer study, we address several issues associated with evaluating survival prediction models based on c-statistic with a focus on estimators using the technique of inverse probability of censoring weighting (IPCW). Compared to the existing work, we provide complete results on the asymptotic properties of the IPCW estimators under the assumption of coarsening at random (CAR), and propose a sensitivity analysis under the mechanism of noncoarsening at random (NCAR). In addition, we extend the IPCW approach as well as the sensitivity analysis to high-dimensional settings. The predictive accuracy of prediction models for cancer recurrence after prostatectomy is assessed by applying the proposed approaches. We find that the estimated predictive accuracy for the models in consideration is sensitive to NCAR assumption, and thus identify the best predictive model. Finally, we further evaluate the performance of the proposed methods in both settings of low-dimensional and high-dimensional data under CAR and NCAR through simulations. © 2016, The International Biometric Society.
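
    An IPCW c-statistic of the kind discussed here can be sketched from scratch: weight comparable pairs by the inverse squared Kaplan-Meier estimate of the censoring distribution. This is a Uno-type illustration under my own simplifications (tie and left-limit handling glossed over), not the paper's estimator code.

    ```python
    # From-scratch IPCW c-statistic for censored survival data (illustrative only).
    import numpy as np

    def km_survival(times, events):
        """Kaplan-Meier survival curve, returned as a step-function evaluator."""
        order = np.argsort(times)
        t_sorted, e_sorted = times[order], events[order]
        at_risk = times.size - np.arange(times.size)
        surv_steps = np.cumprod(1.0 - e_sorted / at_risk)
        def S(t):
            idx = np.searchsorted(t_sorted, t, side="right") - 1
            return 1.0 if idx < 0 else surv_steps[idx]
        return S

    def ipcw_cstat(time, event, risk, tau):
        G = km_survival(time, 1 - event)          # KM fit to the censoring distribution
        num = den = 0.0
        for i in range(time.size):
            if event[i] == 1 and time[i] < tau:   # usable index events before horizon tau
                w = 1.0 / max(G(time[i]), 1e-8) ** 2
                comparable = time > time[i]
                num += w * np.sum(comparable & (risk[i] > risk))
                den += w * np.sum(comparable)
        return num / den

    rng = np.random.default_rng(13)
    n = 1000
    x = rng.normal(size=n)
    t_event = rng.exponential(np.exp(-x))         # higher x -> shorter survival
    t_cens = rng.exponential(2.0, n)
    time = np.minimum(t_event, t_cens)
    event = (t_event <= t_cens).astype(int)
    print("IPCW c-statistic:", round(ipcw_cstat(time, event, risk=x, tau=2.0), 3))
    ```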

  12. Modelling Complexity: Making Sense of Leadership Issues in 14-19 Education

    ERIC Educational Resources Information Center

    Briggs, Ann R. J.

    2008-01-01

    Modelling of statistical data is a well-established analytical strategy. Statistical data can be modelled to represent, and thereby predict, the forces acting upon a structure or system. For the rapidly changing systems in the world of education, modelling enables the researcher to understand, to predict and to enable decisions to be based upon…

  13. Referenceless perceptual fog density prediction model

    NASA Astrophysics Data System (ADS)

    Choi, Lark Kwon; You, Jaehee; Bovik, Alan C.

    2014-02-01

    We propose a perceptual fog density prediction model based on natural scene statistics (NSS) and "fog aware" statistical features, which can predict the visibility in a foggy scene from a single image without reference to a corresponding fogless image, without side geographical camera information, without training on human-rated judgments, and without dependency on salient objects such as lane markings or traffic signs. The proposed fog density predictor only makes use of measurable deviations from statistical regularities observed in natural foggy and fog-free images. A fog aware collection of statistical features is derived from a corpus of foggy and fog-free images by using a space domain NSS model and observed characteristics of foggy images such as low contrast, faint color, and shifted intensity. The proposed model not only predicts perceptual fog density for the entire image but also provides a local fog density index for each patch. The predicted fog density of the model correlates well with the measured visibility in a foggy scene as measured by judgments taken in a human subjective study on a large foggy image database. As one application, the proposed model accurately evaluates the performance of defog algorithms designed to enhance the visibility of foggy images.

  14. Prognostic factors in patients with advanced cancer: use of the patient-generated subjective global assessment in survival prediction.

    PubMed

    Martin, Lisa; Watanabe, Sharon; Fainsinger, Robin; Lau, Francis; Ghosh, Sunita; Quan, Hue; Atkins, Marlis; Fassbender, Konrad; Downing, G Michael; Baracos, Vickie

    2010-10-01

    To determine whether elements of a standard nutritional screening assessment are independently prognostic of survival in patients with advanced cancer. A prospective nested cohort of patients with metastatic cancer was accrued from different units of a Regional Palliative Care Program. Patients completed a nutritional screen on admission. Data included age, sex, cancer site, height, weight history, dietary intake, 13 nutrition impact symptoms, and patient- and physician-reported performance status (PS). Univariate and multivariate survival analyses were conducted. Concordance statistics (c-statistics) were used to test the predictive accuracy of models based on training and validation sets; a c-statistic of 0.5 indicates the model predicts the outcome as well as chance, and perfect prediction has a c-statistic of 1.0. A training set of patients in palliative home care (n = 1,164) was used to identify prognostic variables. Primary disease site, PS, short-term weight change (either gain or loss), dietary intake, and dysphagia predicted survival in multivariate analysis (P < .05). A model including only disease site and PS showed high c-statistics between predicted and observed survival in the training set (0.90) and validation set (0.88; n = 603). The addition of weight change, dietary intake, and dysphagia did not further improve the c-statistic of the model. The c-statistic was also not altered by substituting physician-rated palliative PS for patient-reported PS. We demonstrate a high probability of concordance between predicted and observed survival for patients in distinct palliative care settings (home care, tertiary inpatient, ambulatory outpatient) based on patient-reported information.

  15. National Centers for Environmental Prediction

    Science.gov Websites

    Site navigation snippet (no abstract): Observational Data Processing; Data Assimilation; Monsoon Desk; Model Transition; Seminars; Hurricane Weather Research and Forecast System; GSI Gridpoint Statistical Interpolation analysis/forecast model; NOAA Center for Weather and Climate Prediction (NCWCP), 5830 University Research Court, College Park, MD 20740.

  16. Predicting network modules of cell cycle regulators using relative protein abundance statistics.

    PubMed

    Oguz, Cihan; Watson, Layne T; Baumann, William T; Tyson, John J

    2017-02-28

    Parameter estimation in systems biology is typically done by enforcing experimental observations through an objective function as the parameter space of a model is explored by numerical simulations. Past studies have shown that one usually finds a set of "feasible" parameter vectors that fit the available experimental data equally well, and that these alternative vectors can make different predictions under novel experimental conditions. In this study, we characterize the feasible region of a complex model of the budding yeast cell cycle under a large set of discrete experimental constraints in order to test whether the statistical features of relative protein abundance predictions are influenced by the topology of the cell cycle regulatory network. Using differential evolution, we generate an ensemble of feasible parameter vectors that reproduce the phenotypes (viable or inviable) of wild-type yeast cells and 110 mutant strains. We use this ensemble to predict the phenotypes of 129 mutant strains for which experimental data is not available. We identify 86 novel mutants that are predicted to be viable and then rank the cell cycle proteins in terms of their contributions to cumulative variability of relative protein abundance predictions. Proteins involved in "regulation of cell size" and "regulation of G1/S transition" contribute most to predictive variability, whereas proteins involved in "positive regulation of transcription involved in exit from mitosis," "mitotic spindle assembly checkpoint" and "negative regulation of cyclin-dependent protein kinase by cyclin degradation" contribute the least. These results suggest that the statistics of these predictions may be generating patterns specific to individual network modules (START, S/G2/M, and EXIT). To test this hypothesis, we develop random forest models for predicting the network modules of cell cycle regulators using relative abundance statistics as model inputs. Predictive performance is assessed by the areas under receiver operating characteristics curves (AUC). Our models generate an AUC range of 0.83-0.87 as opposed to randomized models with AUC values around 0.50. By using differential evolution and random forest modeling, we show that the model prediction statistics generate distinct network module-specific patterns within the cell cycle network.
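
    The final step described above, predicting module membership from abundance statistics with a random forest scored by AUC, can be sketched as follows; the features and labels below are synthetic stand-ins for the study's ensemble-derived summary statistics.

    ```python
    # Random forest predicting network-module membership, assessed by cross-validated AUC.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(14)
    n_proteins = 200
    # e.g., mean, variance, and skew of each protein's relative-abundance predictions
    features = rng.normal(size=(n_proteins, 3))
    in_module = (features[:, 1] + 0.5 * features[:, 0] + rng.normal(0, 0.8, n_proteins)) > 0

    rf = RandomForestClassifier(n_estimators=300, random_state=0)
    scores = cross_val_predict(rf, features, in_module, cv=5, method="predict_proba")[:, 1]
    print("module-membership AUC:", round(roc_auc_score(in_module, scores), 2))
    ```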

  17. Statistical Models for Predicting Automobile Driving Postures for Men and Women Including Effects of Age.

    PubMed

    Park, Jangwoon; Ebert, Sheila M; Reed, Matthew P; Hallman, Jason J

    2016-03-01

    Previously published statistical models of driving posture have been effective for vehicle design but have not taken into account the effects of age. The present study developed new statistical models for predicting driving posture. Driving postures of 90 U.S. drivers with a wide range of age and body size were measured in a laboratory mockup under nine package conditions. Posture-prediction models for female and male drivers were developed separately by employing a stepwise regression technique using age, body dimensions, vehicle package conditions, and two-way interactions, among other variables. Driving posture was significantly associated with age, and the effects of other variables depended on age. A set of posture-prediction models is presented for women and men, and the results are compared with a previously developed model. The present study is the first study of driver posture to include a large cohort of older drivers and the first to report a significant effect of age. The posture-prediction models can be used to position computational human models or crash-test dummies for vehicle design and assessment. © 2015, Human Factors and Ergonomics Society.

  18. United States Air Force Summer Research Program 1991. High School Apprenticeship Program (HSAP) Reports. Volume 11. Phillips Laboratory, Civil Engineering Laboratory

    DTIC Science & Technology

    1992-01-09

    Report snippet (extraction fragments from the volume's contents and abstracts): includes "Analysis of Model Output Statistics Thunderstorm Prediction Model" by Frank A. Lasley, Geophysics Laboratory (GEO), Hanscom AFB, which uses Model Output Statistics (MOS) thunderstorm prediction information and Service A weather observations; other fragments reference a liquid crystal polymers project by Tracy Reed and a materials experiment predicting that dogbone specimens would turn brown once they reached the approximate annealing temperature.

  19. Statistical Prediction of Sea Ice Concentration over Arctic

    NASA Astrophysics Data System (ADS)

    Kim, Jongho; Jeong, Jee-Hoon; Kim, Baek-Min

    2017-04-01

    In this study, a statistical method that predicts sea ice concentration (SIC) over the Arctic is developed. We first calculate the Season-reliant Empirical Orthogonal Functions (S-EOFs) of monthly Arctic SIC from Nimbus-7 SMMR and DMSP SSM/I-SSMIS passive microwave data, which contain the seasonal cycles (12 months long) of dominant SIC anomaly patterns. The current SIC state index is then determined by projecting observed SIC anomalies for the latest 12 months onto the S-EOFs. Assuming the current SIC anomalies follow the spatio-temporal evolution in the S-EOFs, we project the future (up to 12 months) SIC anomalies by multiplying each state index by the corresponding S-EOF and summing. Predictive skill is assessed by hindcast experiments initialized at all months for 1980-2010. Compared with NCEP CFS v2, the statistical model shows higher skill in predicting sea ice concentration and extent.
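
    The projection idea can be sketched with ordinary EOFs computed by SVD; the paper's season-reliant S-EOFs add a 12-month seasonal structure that this illustration does not reproduce, and the damped-persistence step is my assumption.

    ```python
    # Project anomalies onto leading EOFs, evolve the state index, reconstruct a forecast.
    import numpy as np

    rng = np.random.default_rng(15)
    n_months, n_grid = 420, 500
    sic_anom = rng.normal(size=(n_months, n_grid))       # stand-in for SIC anomaly maps

    # Leading EOFs of the historical record.
    U, s, Vt = np.linalg.svd(sic_anom - sic_anom.mean(axis=0), full_matrices=False)
    k = 5
    eofs = Vt[:k]                                        # spatial patterns
    pcs = U[:, :k] * s[:k]                               # their time series

    # Current state index: project the latest anomaly map onto the EOFs...
    state_index = eofs @ sic_anom[-1]
    # ...then evolve the index (here: damped persistence) and reconstruct the field.
    forecast = (0.8 * state_index) @ eofs
    print("forecast field shape:", forecast.shape)
    ```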

  20. Progress of statistical analysis in biomedical research through the historical review of the development of the Framingham score.

    PubMed

    Ignjatović, Aleksandra; Stojanović, Miodrag; Milošević, Zoran; Anđelković Apostolović, Marija

    2017-12-02

    Developing risk models in medicine is appealing but is also associated with many obstacles across the different aspects of predictive model development. Initially, the association of one or more biomarkers with a specific outcome was established through statistical significance, but novel and demanding questions required the development of new and more complex statistical techniques. The progress of statistical analysis in biomedical research is best observed through the history of the Framingham study and the development of the Framingham score. Evaluation of predictive models rests on a combination of several metrics: using logistic regression and Cox proportional hazards regression analysis, the calibration test and ROC curve analysis should be mandatory and eliminatory, and the central place should be taken by some newer statistical techniques. To obtain complete information about a new marker in a model, it has recently been recommended to use reclassification tables, calculating the net reclassification index and the integrated discrimination improvement. Decision curve analysis is a novel method for evaluating the clinical usefulness of a predictive model. It may be noted that the customizing and fine-tuning of the Framingham risk score drove the development of statistical analysis. A clinically applicable predictive model should be a trade-off among all of the abovementioned statistical metrics: between calibration and discrimination, accuracy and decision-making, costs and benefits, and quality and quantity of the patient's life.

  1. Predictive data modeling of human type II diabetes related statistics

    NASA Astrophysics Data System (ADS)

    Jaenisch, Kristina L.; Jaenisch, Holger M.; Handley, James W.; Albritton, Nathaniel G.

    2009-04-01

    During the course of routine Type II diabetes treatment of one of the authors, it was decided to derive predictive analytical Data Models of the daily sampled vital statistics, namely weight, blood pressure, and blood sugar, to determine whether the covariance among the observed variables could yield a descriptive equation-based model or, better still, a predictive analytical model that could forecast the expected future trend of the variables and possibly reduce the number of finger sticks required to monitor blood sugar levels. The personal history and the analysis with the resulting models are presented.

  2. Survey of statistical techniques used in validation studies of air pollution prediction models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bornstein, R D; Anderson, S F

    1979-03-01

    Statistical techniques used by meteorologists to validate predictions made by air pollution models are surveyed. Techniques are divided into the following three groups: graphical, tabular, and summary statistics. Some of the practical problems associated with verification are also discussed. Characteristics desired in any validation program are listed and a suggested combination of techniques that possesses many of these characteristics is presented.

  3. Predicting juvenile recidivism: new method, old problems.

    PubMed

    Benda, B B

    1987-01-01

    This prediction study compared the accuracy of three statistical procedures using two assessment methods. The criterion is return to a juvenile prison after first release, and the models tested are logit analysis, predictive attribute analysis, and a Burgess procedure. No significant differences in predictive accuracy are found among the three procedures.

  4. Vibration Response Models of a Stiffened Aluminum Plate Excited by a Shaker

    NASA Technical Reports Server (NTRS)

    Cabell, Randolph H.

    2008-01-01

    Numerical models of structural-acoustic interactions are of interest to aircraft designers and the space program. This paper describes a comparison between two energy finite element codes, a statistical energy analysis code, a structural finite element code, and the experimentally measured response of a stiffened aluminum plate excited by a shaker. Different methods for modeling the stiffeners and the power input from the shaker are discussed. The results show that the energy codes (energy finite element and statistical energy analysis) accurately predicted the measured mean square velocity of the plate. In addition, predictions from an energy finite element code had the best spatial correlation with measured velocities. However, predictions from a considerably simpler, single subsystem, statistical energy analysis model also correlated well with the spatial velocity distribution. The results highlight a need for further work to understand the relationship between modeling assumptions and the prediction results.

  5. Calculation of precise firing statistics in a neural network model

    NASA Astrophysics Data System (ADS)

    Cho, Myoung Won

    2017-08-01

    A precise prediction of neural firing dynamics is requisite to understanding the function of, and the learning process in, a biological neural network, which operates in dependence on exact spike timings. Fundamentally, the prediction of firing statistics is a delicate many-body problem, because the firing probability of a neuron at a given time is determined by a summation over all effects of past firing states. A neural network model based on the Feynman path integral formulation was recently introduced. In this paper, we present several methods to calculate firing statistics in the model. We apply the methods to some cases and compare the theoretical predictions with simulation results.

  6. Development of the AFRL Aircrew Performance and Protection Data Bank

    DTIC Science & Technology

    2007-12-01

    Growth model and statistical model of hypobaric chamber simulations. It offers a quick and readily accessible online DCS risk assessment tool for...are used for the DCS prediction instead of the original model. ADRAC is based on more than 20 years of hypobaric chamber studies using human...prediction based on the combined Bubble Growth model and statistical model of hypobaric chamber simulations was integrated into the Data Bank. It

  7. Numerical and Qualitative Contrasts of Two Statistical Models ...

    EPA Pesticide Factsheets

    Two statistical approaches, weighted regressions on time, discharge, and season (WRTDS) and generalized additive models (GAMs), have recently been used to evaluate water quality trends in estuaries. Both models have been used in similar contexts despite differences in statistical foundations and products. This study provided an empirical and qualitative comparison of both models using 29 years of data for two discrete time series of chlorophyll-a (chl-a) in the Patuxent River estuary. Empirical descriptions of each model were based on predictive performance against the observed data, ability to reproduce flow-normalized trends with simulated data, and comparisons of performance with validation datasets. Between-model differences were apparent but minor, and both models had comparable abilities to remove flow effects from simulated time series. Both models similarly predicted observations for missing data with different characteristics. Trends from each model revealed distinct mainstem influences of the Chesapeake Bay, with both models predicting a roughly 65% increase in chl-a over time in the lower estuary, whereas flow-normalized predictions for the upper estuary showed a more dynamic pattern, with a nearly 100% increase in chl-a in the last 10 years. Qualitative comparisons highlighted important differences in the statistical structure, available products, and characteristics of the data and desired analysis. This manuscript describes a quantitative comparison of these two recently developed statistical models.

  8. [Statistical prediction methods in violence risk assessment and its application].

    PubMed

    Liu, Yuan-Yuan; Hu, Jun-Mei; Yang, Min; Li, Xiao-Song

    2013-06-01

    How to improve violence risk assessment is an urgent global problem. As a necessary part of risk assessment, statistical methods have remarkable impacts and effects. In this study, prediction methods used in violence risk assessment are reviewed from a statistical point of view. The applications of logistic regression as an example of a multivariate statistical model, the decision tree model as an example of a data mining technique, and the neural network model as an example of artificial intelligence technology are reviewed. This study provides material intended to contribute to further research on violence risk assessment.

  9. Watershed regressions for pesticides (WARP) for predicting atrazine concentration in Corn Belt streams

    USGS Publications Warehouse

    Stone, Wesley W.; Gilliom, Robert J.

    2011-01-01

    The 95-percent prediction intervals are well within a factor of 10 above and below the predicted concentration statistic. WARP-CB model predictions were within a factor of 5 of the observed concentration statistic for over 90 percent of the model-development sites. The WARP-CB residuals and uncertainty are lower than those of the National WARP model for the same sites. The WARP-CB models provide improved predictions of the probability of exceeding a specified criterion or benchmark for Corn Belt streams draining watersheds with high atrazine use intensities; however, National WARP models should be used for Corn Belt streams where atrazine use intensities are less than 17 kg/km2 of watershed area.

  10. Statistical analysis of modeling error in structural dynamic systems

    NASA Technical Reports Server (NTRS)

    Hasselman, T. K.; Chrostowski, J. D.

    1990-01-01

    The paper presents a generic statistical model of the (total) modeling error for conventional space structures in their launch configuration. Modeling error is defined as the difference between analytical prediction and experimental measurement. It is represented by the differences between predicted and measured real eigenvalues and eigenvectors. Comparisons are made between pre-test and post-test models. Total modeling error is then subdivided into measurement error, experimental error and 'pure' modeling error, and comparisons made between measurement error and total modeling error. The generic statistical model presented in this paper is based on the first four global (primary structure) modes of four different structures belonging to the generic category of Conventional Space Structures (specifically excluding large truss-type space structures). As such, it may be used to evaluate the uncertainty of predicted mode shapes and frequencies, sinusoidal response, or the transient response of other structures belonging to the same generic category.

  11. A hybrid model for predicting carbon monoxide from vehicular exhausts in urban environments

    NASA Astrophysics Data System (ADS)

    Gokhale, Sharad; Khare, Mukesh

    Several deterministic air quality models evaluate and predict the frequently occurring pollutant concentrations well but, in general, are incapable of predicting the 'extreme' concentrations. In contrast, statistical distribution models overcome this limitation of the deterministic models and predict the 'extreme' concentrations. However, environmental damage is caused both by the extremes and by the sustained average concentration of pollutants. Hence, a model should predict not only the 'extreme' ranges but also the 'middle' ranges of pollutant concentrations, i.e. the entire range. Hybrid modelling is one technique that estimates/predicts the 'entire range' of the distribution of pollutant concentrations by combining deterministic models with suitable statistical distribution models (Jakeman et al., 1988). In the present paper, a hybrid model has been developed to predict carbon monoxide (CO) concentration distributions at a traffic intersection, the Income Tax Office (ITO) in Delhi, where the traffic is heterogeneous, consisting of light vehicles, heavy vehicles, three-wheelers (auto rickshaws) and two-wheelers (scooters, motorcycles, etc.), and the meteorology is 'tropical'. The model combines the general finite line source model (GFLSM) as its deterministic component and the log logistic distribution (LLD) model as its statistical component. The hybrid (GFLSM-LLD) model is then applied at the ITO intersection. The results show that the hybrid model predictions match the observed CO concentration data within the 5-99 percentile range. The model is further validated at a different street location, the Sirifort roadway. The validation results show that the model predicts CO concentrations fairly well (d = 0.91) in the 10-95 percentile range. A regulatory compliance analysis is also developed to estimate the probability that hourly CO concentrations exceed the National Ambient Air Quality Standards (NAAQS) of India.

  12. A review of statistical updating methods for clinical prediction models.

    PubMed

    Su, Ting-Li; Jaki, Thomas; Hickey, Graeme L; Buchan, Iain; Sperrin, Matthew

    2018-01-01

    A clinical prediction model is a tool for predicting healthcare outcomes, usually within a specific population and context. A common approach is to develop a new clinical prediction model for each population and context; however, this wastes potentially useful historical information. A better approach is to update or incorporate the existing clinical prediction models already developed for use in similar contexts or populations. In addition, clinical prediction models commonly become miscalibrated over time and need replacing or updating. In this article, we review a range of approaches for re-using and updating clinical prediction models; these fall into three main categories: simple coefficient updating, combining multiple previous clinical prediction models in a meta-model, and dynamic updating of models. We evaluated the performance (discrimination and calibration) of the different strategies using data on mortality following cardiac surgery in the United Kingdom. We found that no single strategy performed sufficiently well to be used to the exclusion of the others. In conclusion, useful tools exist for updating existing clinical prediction models to a new population or context, and these should be implemented, using a breadth of complementary statistical methods, rather than developing a new clinical prediction model from scratch.
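
    A hedged sketch of the simplest strategy above, coefficient updating by logistic recalibration, on simulated data; the variable names and effect sizes are illustrative, not the study's.

    ```python
    # Recalibrate an existing logistic clinical prediction model on new data by
    # refitting an intercept and a single slope on the old model's linear
    # predictor. Outcomes and predictors are simulated.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    lp_old = rng.normal(size=1000)    # linear predictor from the existing model
    y_new = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 0.8 * lp_old))))

    # Logistic recalibration: y ~ a + b * lp_old
    X = sm.add_constant(lp_old)
    fit = sm.GLM(y_new, X, family=sm.families.Binomial()).fit()
    print(fit.params)  # a corrects calibration-in-the-large, b corrects drift/overfitting
    ```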

  13. Lung Cancer Risk Prediction Model Incorporating Lung Function: Development and Validation in the UK Biobank Prospective Cohort Study.

    PubMed

    Muller, David C; Johansson, Mattias; Brennan, Paul

    2017-03-10

    Purpose Several lung cancer risk prediction models have been developed, but none to date have assessed the predictive ability of lung function in a population-based cohort. We sought to develop and internally validate a model incorporating lung function using data from the UK Biobank prospective cohort study. Methods This analysis included 502,321 participants without a previous diagnosis of lung cancer, predominantly between 40 and 70 years of age. We used flexible parametric survival models to estimate the 2-year probability of lung cancer, accounting for the competing risk of death. Models included predictors previously shown to be associated with lung cancer risk, including sex, variables related to smoking history and nicotine addiction, medical history, family history of lung cancer, and lung function (forced expiratory volume in 1 second [FEV1]). Results During accumulated follow-up of 1,469,518 person-years, there were 738 lung cancer diagnoses. A model incorporating all predictors had excellent discrimination (concordance (c)-statistic [95% CI] = 0.85 [0.82 to 0.87]). Internal validation suggested that the model will discriminate well when applied to new data (optimism-corrected c-statistic = 0.84). The full model, including FEV1, also had modestly superior discriminatory power compared with one designed solely on the basis of questionnaire variables (c-statistic = 0.84 [0.82 to 0.86]; optimism-corrected c-statistic = 0.83; p for FEV1 = 3.4 × 10^-13). The full model had better discrimination than standard lung cancer screening eligibility criteria (c-statistic = 0.66 [0.64 to 0.69]). Conclusion A risk prediction model that includes lung function has strong predictive ability, which could improve eligibility criteria for lung cancer screening programs.

  14. Machine Learning Predictions of a Multiresolution Climate Model Ensemble

    NASA Astrophysics Data System (ADS)

    Anderson, Gemma J.; Lucas, Donald D.

    2018-05-01

    Statistical models of high-resolution climate models are useful for many purposes, including sensitivity and uncertainty analyses, but building them can be computationally prohibitive. We generated a unique multiresolution perturbed parameter ensemble of a global climate model. We use a novel application of a machine learning technique known as random forests to train a statistical model on the ensemble to make high-resolution model predictions of two important quantities: global mean top-of-atmosphere energy flux and precipitation. The random forests leverage cheaper low-resolution simulations, greatly reducing the number of high-resolution simulations required to train the statistical model. We demonstrate that high-resolution predictions of these quantities can be obtained by training on an ensemble that includes only a small number of high-resolution simulations. We also find that global annually averaged precipitation is more sensitive to resolution changes than to any of the model parameters considered.
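
    A toy sketch of the idea (not the authors' code): treat resolution as an input feature so that inexpensive low-resolution runs help the forest predict the high-resolution response. All parameter names, counts, and the target are simulated assumptions.

    ```python
    # Train a random forest on a perturbed-parameter ensemble in which the model
    # resolution is itself an input feature, leveraging many cheap low-resolution
    # runs plus a few high-resolution ones. Data are simulated.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(3)
    n_lo, n_hi = 800, 40
    params_lo = rng.uniform(size=(n_lo, 6))
    params_hi = rng.uniform(size=(n_hi, 6))
    X = np.vstack([np.c_[params_lo, np.zeros(n_lo)],   # last column: 0 = low resolution
                   np.c_[params_hi, np.ones(n_hi)]])   #              1 = high resolution
    y = X[:, :6].sum(axis=1) + 0.3 * X[:, 6] + rng.normal(0, 0.1, size=len(X))  # toy target

    rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
    # Predict the high-resolution response at new parameter settings
    print(rf.predict(np.c_[rng.uniform(size=(3, 6)), np.ones(3)]))
    ```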

  15. Predicting survival of Escherichia coli O157:H7 in dry fermented sausage using artificial neural networks.

    PubMed

    Palanichamy, A; Jayas, D S; Holley, R A

    2008-01-01

    The Canadian Food Inspection Agency required the meat industry to ensure that Escherichia coli O157:H7 does not survive (i.e., experiences a ≥5 log CFU/g reduction) in dry fermented sausage (salami) during processing, after a series of foodborne illness outbreaks resulting from this pathogenic bacterium occurred. The industry needs an effective technique such as predictive modeling for estimating bacterial viability, because traditional microbiological enumeration is a time-consuming and laborious method. The accuracy and speed of artificial neural networks (ANNs), developed within predictive microbiology, make them an attractive alternative, especially for on-line processing in industry. Data were collected from a study of the interactive effects of different levels of pH, water activity, and allyl isothiocyanate concentration at various times during sausage manufacture on reducing numbers of E. coli O157:H7. The data were used to develop predictive models using a general regression neural network (GRNN), a form of ANN, and a statistical linear polynomial regression technique. Both models were compared for predictive error using various statistical indices. GRNN predictions for the training and test data sets had less serious errors than the statistical model predictions; the GRNN models were better for the training set and slightly better for the test set. Also, the GRNN accurately predicted the level of allyl isothiocyanate required to ensure a 5-log reduction when an appropriate production set was created by interpolation. Because they are simple to generate, fast, and accurate, ANN models may be of value for industrial use in dry fermented sausage manufacture to reduce the hazard associated with E. coli O157:H7 in fresh beef and permit production of consistently safe products from this raw material.
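
    Since a GRNN is essentially Nadaraya-Watson kernel regression, the following minimal NumPy sketch conveys the idea; the simulated inputs merely stand in for pH, water activity, and allyl isothiocyanate level, and the bandwidth is an arbitrary assumption.

    ```python
    # Minimal GRNN: a Gaussian-kernel-weighted average of training responses.
    import numpy as np

    def grnn_predict(X_train, y_train, X_query, sigma=0.5):
        # Squared distances between each query point and all training points
        d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * sigma ** 2))
        return (w @ y_train) / w.sum(axis=1)   # kernel-weighted average

    rng = np.random.default_rng(4)
    X = rng.uniform(size=(200, 3))                  # toy pH, water activity, AIT level
    y = 5 - 4 * X[:, 0] + rng.normal(0, 0.2, 200)   # toy log-reduction response
    print(grnn_predict(X, y, rng.uniform(size=(2, 3))))
    ```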

  16. A statistical forecast model using the time-scale decomposition technique to predict rainfall during flood period over the middle and lower reaches of the Yangtze River Valley

    NASA Astrophysics Data System (ADS)

    Hu, Yijia; Zhong, Zhong; Zhu, Yimin; Ha, Yao

    2018-04-01

    In this paper, a statistical forecast model using a time-scale decomposition method is established for seasonal prediction of the rainfall during the flood period (FPR) over the middle and lower reaches of the Yangtze River Valley (MLYRV). The method decomposes the rainfall over the MLYRV into three time-scale components: an interannual component with periods shorter than 8 years, an interdecadal component with periods from 8 to 30 years, and a longer-term component with periods greater than 30 years. Predictors are then selected for the three time-scale components of the FPR through correlation analysis. Finally, a statistical forecast model is established using multiple linear regression to predict each of the three time-scale components of the FPR. The results show that this forecast model can capture the interannual and interdecadal variation of the FPR. A hindcast of the FPR for the 14 years from 2001 to 2014 shows that the FPR is predicted successfully in 11 of the 14 years. The forecast model performs better than a model using the traditional scheme without time-scale decomposition. Therefore, the statistical forecast model using the time-scale decomposition technique has good skill and application value in the operational prediction of the FPR over the MLYRV.
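
    A hedged sketch of the decomposition step, using simple running-mean filters as stand-ins for whatever filtering the paper actually used; the rainfall series and window lengths are simulated assumptions.

    ```python
    # Split an annual series into <8-year, 8-30-year, and >30-year components
    # with running means, so each component can get its own regression model.
    import numpy as np

    def running_mean(x, w):
        return np.convolve(x, np.ones(w) / w, mode="same")

    rng = np.random.default_rng(5)
    fpr = rng.normal(size=100)                 # annual flood-period rainfall anomalies

    slow = running_mean(fpr, 31)               # >30-year component
    decadal = running_mean(fpr, 9) - slow      # 8-30-year component
    interannual = fpr - slow - decadal         # <8-year component

    # Each component would then get its own multiple linear regression on
    # component-specific predictors; the final forecast is the sum of the three.
    print(np.allclose(fpr, interannual + decadal + slow))
    ```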

  17. A Review of Statistical Failure Time Models with Application of a Discrete Hazard Based Model to 1Cr1Mo-0.25V Steel for Turbine Rotors and Shafts

    PubMed Central

    2017-01-01

    Producing predictions of the probabilistic risks of operating materials for given lengths of time at stated operating conditions requires the assimilation of existing deterministic creep life prediction models (which only predict the average failure time) with statistical models that capture the random component of creep. To date, these approaches have rarely been combined to achieve this objective. The first half of this paper therefore provides a summary review of some statistical models to help bridge the gap between these two approaches. The second half of the paper illustrates one possible assimilation using 1Cr1Mo-0.25V steel. The Wilshire equation for creep life prediction is integrated into a discrete hazard-based statistical model; the former is chosen because of its novelty and proven capability in accurately predicting average failure times, and the latter because of its flexibility in modelling the failure time distribution. Using this model it was found that, for example, if this material had been in operation for around 15 years at 823 K and 130 MPa, the chance of failure in the next year is around 35%. However, if this material had been in operation for around 25 years, the chance of failure in the next year rises dramatically to around 80%. PMID:29039773

  18. Web 2.0 Articles: Content Analysis and a Statistical Model to Predict Recognition of the Need for New Instructional Design Strategies

    ERIC Educational Resources Information Center

    Liu, Leping; Maddux, Cleborne D.

    2008-01-01

    This article presents a study of Web 2.0 articles intended to (a) analyze the content of what is written and (b) develop a statistical model to predict whether authors write about the need for new instructional design strategies and models. Eighty-eight technology articles were subjected to lexical analysis and a logistic regression model was…

  19. Bootstrap study of genome-enabled prediction reliabilities using haplotype blocks across Nordic Red cattle breeds.

    PubMed

    Cuyabano, B C D; Su, G; Rosa, G J M; Lund, M S; Gianola, D

    2015-10-01

    This study compared the accuracy of genome-enabled prediction models using individual single nucleotide polymorphisms (SNP) or haplotype blocks as covariates when using either a single breed or a combined population of Nordic Red cattle. The main objective was to compare predictions of breeding values of complex traits using a combined training population with haplotype blocks against predictions using a single breed as the training population and individual SNP as predictors. To compare the prediction reliabilities, bootstrap samples were taken from the test data set. With the bootstrapped samples of prediction reliabilities, we built and graphed confidence ellipses to allow comparisons. Finally, measures of statistical distance were used to calculate the gain in predictive ability. Our analyses are innovative in the context of assessing predictive models, allowing a better understanding of prediction reliabilities and providing a statistical basis for judging whether one prediction scenario is indeed more accurate than another. An ANOVA indicated that use of haplotype blocks produced significant gains mainly when Bayesian mixture models were used, but not when Bayesian BLUP was fitted to the data. Furthermore, when haplotype blocks were used to train prediction models in a combined Nordic Red cattle population, we obtained up to a statistically significant 5.5% average gain in prediction accuracy over predictions using individual SNP and training the model with a single breed. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
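
    A minimal sketch of the bootstrap comparison on simulated predictions: resample test individuals, recompute the reliability (here taken as the correlation of predicted and observed values) under two scenarios, and examine the distribution of the gain. All numbers are toy assumptions.

    ```python
    # Bootstrap the gain in prediction reliability between two scenarios.
    import numpy as np

    rng = np.random.default_rng(6)
    n = 500
    obs = rng.normal(size=n)
    pred_snp = obs * 0.60 + rng.normal(0, 0.8, n)   # SNP-based predictions (toy)
    pred_hap = obs * 0.65 + rng.normal(0, 0.8, n)   # haplotype-block predictions (toy)

    gains = []
    for _ in range(2000):
        idx = rng.integers(0, n, n)                 # bootstrap sample of test animals
        r_snp = np.corrcoef(obs[idx], pred_snp[idx])[0, 1]
        r_hap = np.corrcoef(obs[idx], pred_hap[idx])[0, 1]
        gains.append(r_hap - r_snp)
    print(np.percentile(gains, [2.5, 97.5]))        # interval for the reliability gain
    ```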

  20. Sex-Specific Prediction Models for Sleep Apnea From the Hispanic Community Health Study/Study of Latinos.

    PubMed

    Shah, Neomi; Hanna, David B; Teng, Yanping; Sotres-Alvarez, Daniela; Hall, Martica; Loredo, Jose S; Zee, Phyllis; Kim, Mimi; Yaggi, H Klar; Redline, Susan; Kaplan, Robert C

    2016-06-01

    We developed and validated the first-ever sleep apnea (SA) risk calculator in a large population-based cohort of Hispanic/Latino subjects. Cross-sectional data on adults from the Hispanic Community Health Study/Study of Latinos (2008-2011) were analyzed. Subjective and objective sleep measurements were obtained. Clinically significant SA was defined as an apnea-hypopnea index ≥ 15 events per hour. Using logistic regression, four prediction models were created: three sex-specific models (female-only, male-only, and a sex × covariate interaction model to allow differential predictor effects), and one overall model with sex included as a main effect only. Models underwent 10-fold cross-validation and were assessed using the C statistic. The outcome was SA, and a total of 17 candidate predictor variables were considered. A total of 12,158 participants had complete sleep data available; 7,363 (61%) were women. The population-weighted prevalence of SA (apnea-hypopnea index ≥ 15 events per hour) was 6.1% in female subjects and 13.5% in male subjects. The male-only (C statistic, 0.808) and female-only (C statistic, 0.836) prediction models had the same predictor variables (ie, age, BMI, self-reported snoring). The sex-interaction model (C statistic, 0.836) contained sex, age, age × sex, BMI, BMI × sex, and self-reported snoring. The final overall model (C statistic, 0.832) contained age, BMI, snoring, and sex. We developed two websites for our SA risk calculator: one in English (https://www.montefiore.org/sleepapneariskcalc.html) and another in Spanish (http://www.montefiore.org/sleepapneariskcalc-es.html). We created an internally validated, highly discriminating, well-calibrated, and parsimonious prediction model for SA. Contrary to the study hypothesis, the variables did not have different predictive magnitudes in male and female subjects. Copyright © 2016 American College of Chest Physicians. Published by Elsevier Inc. All rights reserved.

  1. Statistical inference, the bootstrap, and neural-network modeling with application to foreign exchange rates.

    PubMed

    White, H; Racine, J

    2001-01-01

    We propose tests for individual and joint irrelevance of network inputs. Such tests can be used to determine whether an input or group of inputs "belong" in a particular model, thus permitting valid statistical inference based on estimated feedforward neural-network models. The approaches employ well-known statistical resampling techniques. We conduct a small Monte Carlo experiment showing that our tests have reasonable level and power behavior, and we apply our methods to examine whether there are predictable regularities in foreign exchange rates. We find that exchange rates do appear to contain information that is exploitable for enhanced point prediction, but the nature of the predictive relations evolves through time.

  2. Using Patient Demographics and Statistical Modeling to Predict Knee Tibia Component Sizing in Total Knee Arthroplasty.

    PubMed

    Ren, Anna N; Neher, Robert E; Bell, Tyler; Grimm, James

    2018-06-01

    Preoperative planning is important to achieve successful implantation in primary total knee arthroplasty (TKA). However, traditional TKA templating techniques are not accurate enough to predict the component size within a narrow range. With the goal of developing a general predictive statistical model using patient demographic information, ordinal logistic regression was applied to build a proportional odds model to predict the tibia component size. The study retrospectively collected data from 1992 primary Persona Knee System TKA procedures. Of these, 199 procedures were randomly selected as testing data, and the rest of the data were randomly partitioned between model training data and model evaluation data at a ratio of 7:3. Different models were trained and evaluated on the training and validation data sets after data exploration. The final model had patient gender, age, weight, and height as independent variables and predicted the tibia size to within one size 96% of the time on the validation data, 94% of the time on the testing data, and 92% of the time on a prospective cadaver data set. The study results indicated that the statistical model built by ordinal logistic regression can increase the accuracy of tibia sizing information for Persona Knee preoperative templating. This research shows that statistical modeling may be used with radiographs to dramatically enhance templating accuracy, efficiency, and quality. In general, this methodology can be applied to other TKA products when the data are applicable. Copyright © 2018 Elsevier Inc. All rights reserved.
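
    A sketch of a proportional-odds (ordinal logistic) model along these lines, using statsmodels' OrderedModel on simulated demographics; the size grouping, coefficients, and distributions are arbitrary assumptions, not the study's fitted model.

    ```python
    # Proportional-odds model: ordinal tibia size predicted from demographics.
    import numpy as np
    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    rng = np.random.default_rng(7)
    n = 1000
    df = pd.DataFrame({
        "male":   rng.integers(0, 2, n),
        "age":    rng.uniform(50, 85, n),
        "weight": rng.normal(80, 15, n),    # kg
        "height": rng.normal(170, 10, n),   # cm
    })
    latent = 0.05 * df.height + 0.02 * df.weight + 0.5 * df.male
    size = pd.cut(latent, bins=7, labels=False)   # toy ordinal sizes 0..6

    fit = OrderedModel(size, df, distr="logit").fit(method="bfgs", disp=False)
    probs = fit.predict(df.iloc[:5])              # per-size probabilities
    print(np.asarray(probs).round(2))
    ```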

  3. Comparison of two statistical methods for probability prediction of monthly precipitation during summer over Huaihe River Basin in China, and applications in runoff prediction based on hydrological model

    NASA Astrophysics Data System (ADS)

    Liu, L.; Du, L.; Liao, Y.

    2017-12-01

    Based on the ensemble hindcast dataset of the CSM1.1m model from the National Climate Center (NCC) of the China Meteorological Administration (CMA), Bayesian merging models and a two-step statistical model are developed and employed to predict monthly grid/station precipitation over the Huaihe River basin in China during summer at lead times of 1 to 3 months. The hindcast datasets span the period 1991 to 2014. The skill of the two models is evaluated using the area under the ROC curve (AUC) in a leave-one-out cross-validation framework and is compared to the skill of CSM1.1m. CSM1.1m has its highest skill for summer precipitation when initialized in April and its lowest when initialized in May, and has its highest skill for precipitation in June but its lowest for precipitation in July. Compared with the raw outputs of the climate model, some schemes of the two approaches have higher skill for predictions initialized in March and May, but almost all schemes have lower skill for predictions initialized in April. Compared to the two-step approach, one sampling scheme of the Bayesian merging approach has higher skill for predictions initialized in March, but lower skill for those initialized in May. The results suggest that there is potential to apply the two statistical models for monthly summer precipitation forecasts initialized in March and May over the Huaihe River basin, while the CSM1.1m forecast is preferable when initialized in April. Finally, the summer runoff during 1991 to 2014 is simulated with a hydrological model using the climate hindcasts of CSM1.1m and the two statistical models.

  4. Grain-Size Based Additivity Models for Scaling Multi-rate Uranyl Surface Complexation in Subsurface Sediments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Xiaoying; Liu, Chongxuan; Hu, Bill X.

    This study statistically analyzed a grain-size based additivity model that has been proposed to scale reaction rates and parameters from laboratory to field. The additivity model assumed that reaction properties in a sediment, including surface area, reactive site concentration, reaction rate, and extent, can be predicted from the field-scale grain size distribution by linearly adding the reaction properties of individual grain size fractions. This study focused on the statistical analysis of the additivity model with respect to reaction rate constants, using multi-rate uranyl (U(VI)) surface complexation reactions in a contaminated sediment as an example. Experimental data of rate-limited U(VI) desorption in a stirred flow-cell reactor were used to estimate the statistical properties of multi-rate parameters for individual grain size fractions. The statistical properties of the rate constants for the individual grain size fractions were then used to analyze the statistical properties of the additivity model to predict rate-limited U(VI) desorption in the composite sediment, and to evaluate the relative importance of individual grain size fractions to the overall U(VI) desorption. The results indicated that the additivity model provided a good prediction of the U(VI) desorption in the composite sediment. However, the rate constants were not directly scalable using the additivity model, and U(VI) desorption in individual grain size fractions had to be simulated in order to apply the additivity model. An approximate additivity model for directly scaling rate constants was subsequently proposed and evaluated. The approximate model provided a good prediction of the experimental results within statistical uncertainty. This study also found that a gravel size fraction (2-8 mm), which is often ignored in modeling U(VI) sorption and desorption, is statistically significant to the U(VI) desorption in the sediment.

  5. Artificial neural network models for prediction of cardiovascular autonomic dysfunction in general Chinese population

    PubMed Central

    2013-01-01

    Background The present study aimed to develop an artificial neural network (ANN) based prediction model for cardiovascular autonomic (CA) dysfunction in the general population. Methods We analyzed a previous dataset based on a population sample consisting of 2,092 individuals aged 30-80 years. The prediction models were derived from an exploratory set using ANN analysis. The performance of these prediction models was evaluated in the validation set. Results Univariate analysis indicated that 14 risk factors showed statistically significant associations with CA dysfunction (P < 0.05). The mean area under the receiver-operating curve was 0.762 (95% CI 0.732-0.793) for the prediction model developed using ANN analysis. The mean sensitivity, specificity, and positive and negative predictive values of the prediction model were 0.751, 0.665, 0.330 and 0.924, respectively. All Hosmer-Lemeshow (HL) statistics were less than 15.0. Conclusion ANN is an effective tool for developing prediction models with high value for predicting CA dysfunction among the general population. PMID:23902963

  6. External validation of ADO, DOSE, COTE and CODEX at predicting death in primary care patients with COPD using standard and machine learning approaches.

    PubMed

    Morales, Daniel R; Flynn, Rob; Zhang, Jianguo; Trucco, Emmanuel; Quint, Jennifer K; Zutis, Kris

    2018-05-01

    Several models for predicting the risk of death in people with chronic obstructive pulmonary disease (COPD) exist but have not undergone large-scale validation in primary care. The objective of this study was to externally validate these models using statistical and machine learning approaches. We used a primary care COPD cohort identified using data from the UK Clinical Practice Research Datalink. Age-standardised mortality rates were calculated for the population by gender, and the discrimination of ADO (age, dyspnoea, airflow obstruction), COTE (COPD-specific comorbidity test), DOSE (dyspnoea, airflow obstruction, smoking, exacerbations) and CODEX (comorbidity, dyspnoea, airflow obstruction, exacerbations) at predicting death over 1-3 years was measured using logistic regression and a support vector machine (SVM) learning method. The age-standardised mortality rate was 32.8 (95%CI 32.5-33.1) and 25.2 (95%CI 25.4-25.7) per 1000 person years for men and women respectively. Complete data were available for 54,879 patients to predict 1-year mortality. ADO performed best (c-statistic of 0.730) compared with DOSE (c-statistic 0.645), COTE (c-statistic 0.655) and CODEX (c-statistic 0.649) at predicting 1-year mortality. Discrimination of ADO and DOSE at predicting 1-year mortality improved when combined with COTE comorbidities (c-statistic 0.780 for ADO + COTE; c-statistic 0.727 for DOSE + COTE). Discrimination did not change significantly over 1-3 years. Comparable results were observed using SVM. In primary care, ADO appears superior at predicting death in COPD. The performance of ADO and DOSE improved when combined with COTE comorbidities, suggesting better models may be generated with additional data facilitated using novel approaches. Copyright © 2018. Published by Elsevier Ltd.
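
    For reference, the discrimination measure reported above reduces to an ROC AUC; a minimal sketch on simulated scores (the score range and event probabilities are toy assumptions):

    ```python
    # Compute the c-statistic (ROC AUC) of a prognostic score, e.g. ADO,
    # at predicting 1-year death. Scores and outcomes are simulated.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(8)
    ado_score = rng.integers(0, 11, 5000)   # toy ADO scores 0..10
    died_1yr = rng.binomial(1, 1 / (1 + np.exp(3 - 0.4 * ado_score)))

    print("c-statistic:", roc_auc_score(died_1yr, ado_score))
    ```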

  7. A Stochastic Model of Space-Time Variability of Mesoscale Rainfall: Statistics of Spatial Averages

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Bell, Thomas L.

    2003-01-01

    A characteristic feature of rainfall statistics is that they depend on the space and time scales over which rain data are averaged. A previously developed spectral model of rain statistics, designed to capture this property, predicts power-law scaling behavior for the second-moment statistics of area-averaged rain rate as the averaging length scale L → 0. In the present work a more efficient method of estimating the model parameters is presented and used to fit the model to the statistics of area-averaged rain rate derived from gridded radar precipitation data from TOGA COARE. Statistical properties of the data and the model predictions are compared over a wide range of averaging scales. An extension of the spectral model scaling relations to describe the dependence of the average fraction of grid boxes within an area containing nonzero rain (the "rainy area fraction") on the grid scale L is also explored.

  8. A Model for Investigating Predictive Validity at Highly Selective Institutions.

    ERIC Educational Resources Information Center

    Gross, Alan L.; And Others

    A statistical model for investigating predictive validity at highly selective institutions is described. When the selection ratio is small, one must typically deal with a data set containing relatively large amounts of missing data on both criterion and predictor variables. Standard statistical approaches are based on the strong assumption that…

  9. Water quality management using statistical analysis and time-series prediction model

    NASA Astrophysics Data System (ADS)

    Parmar, Kulwinder Singh; Bhardwaj, Rashmi

    2014-12-01

    This paper deals with water quality management using statistical analysis and a time-series prediction model. The monthly variation of water quality standards has been used to compare the statistical mean, median, mode, standard deviation, kurtosis, skewness, and coefficient of variation at the Yamuna River. The model was validated using R-squared, root mean square error, mean absolute percentage error, maximum absolute percentage error, mean absolute error, maximum absolute error, normalized Bayesian information criterion, Ljung-Box analysis, predicted values and confidence limits. Using an autoregressive integrated moving average (ARIMA) model, future values of the water quality parameters have been estimated. It is observed that the predictive model is useful at the 95% confidence limits, and the curve is platykurtic for potential of hydrogen (pH), free ammonia, total Kjeldahl nitrogen, dissolved oxygen and water temperature (WT), and leptokurtic for chemical oxygen demand and biochemical oxygen demand. Also, it is observed that the predicted series is close to the original series, which provides a perfect fit. All parameters except pH and WT cross the prescribed limits of the World Health Organization/United States Environmental Protection Agency, and thus the water is not fit for drinking, agricultural or industrial use.
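
    A hedged sketch of the ARIMA step on a simulated monthly series; the order (1, 0, 1) and the dissolved-oxygen series are arbitrary choices, not the paper's fitted model.

    ```python
    # Fit an ARIMA model to a monthly water quality series and produce
    # 12-month forecasts with 95% confidence limits. Data are simulated.
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(9)
    do_mgL = 8 + np.sin(np.arange(120) * 2 * np.pi / 12) + rng.normal(0, 0.3, 120)

    fit = ARIMA(do_mgL, order=(1, 0, 1)).fit()
    fc = fit.get_forecast(steps=12)
    print(fc.predicted_mean.round(2))
    print(fc.conf_int(alpha=0.05).round(2))   # 95% confidence limits
    ```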

  10. Watershed regressions for pesticides (warp) models for predicting atrazine concentrations in Corn Belt streams

    USGS Publications Warehouse

    Stone, Wesley W.; Gilliom, Robert J.

    2012-01-01

    Watershed Regressions for Pesticides (WARP) models, previously developed for atrazine at the national scale, are improved for application to the United States (U.S.) Corn Belt region by developing region-specific models that include watershed characteristics that are influential in predicting atrazine concentration statistics within the Corn Belt. WARP models for the Corn Belt (WARP-CB) were developed for annual maximum moving-average (14-, 21-, 30-, 60-, and 90-day durations) and annual 95th-percentile atrazine concentrations in streams of the Corn Belt region. The WARP-CB models accounted for 53 to 62% of the variability in the various concentration statistics among the model-development sites. Model predictions were within a factor of 5 of the observed concentration statistic for over 90% of the model-development sites. The WARP-CB residuals and uncertainty are lower than those of the National WARP model for the same sites. Although atrazine-use intensity is the most important explanatory variable in the National WARP models, it is not a significant variable in the WARP-CB models. The WARP-CB models provide improved predictions for Corn Belt streams draining watersheds with atrazine-use intensities of 17 kg/km2 of watershed area or greater.

  11. Assessing the prediction accuracy of a cure model for censored survival data with long-term survivors: Application to breast cancer data.

    PubMed

    Asano, Junichi; Hirakawa, Akihiro

    2017-01-01

    The Cox proportional hazards cure model is a survival model incorporating a cure rate with the assumption that the population contains both uncured and cured individuals. It contains a logistic regression for the cure rate, and a Cox regression to estimate the hazard for uncured patients. A single predictive model for both the cure and hazard can be developed by using a cure model that simultaneously predicts the cure rate and hazards for uncured patients; however, model selection is a challenge because of the lack of a measure for quantifying the predictive accuracy of a cure model. Recently, we developed an area under the receiver operating characteristic curve (AUC) for determining the cure rate in a cure model (Asano et al., 2014), but the hazards measure for uncured patients was not resolved. In this article, we propose novel C-statistics that are weighted by the patients' cure status (i.e., cured, uncured, or censored cases) for the cure model. The operating characteristics of the proposed C-statistics and their confidence interval were examined by simulation analyses. We also illustrate methods for predictive model selection and for further interpretation of variables using the proposed AUCs and C-statistics via application to breast cancer data.

  12. Statistical validation of predictive TRANSP simulations of baseline discharges in preparation for extrapolation to JET D-T

    NASA Astrophysics Data System (ADS)

    Kim, Hyun-Tae; Romanelli, M.; Yuan, X.; Kaye, S.; Sips, A. C. C.; Frassinetti, L.; Buchanan, J.; JET Contributors

    2017-06-01

    This paper presents for the first time a statistical validation of predictive TRANSP simulations of plasma temperature using two transport models, GLF23 and TGLF, over a database of 80 baseline H-mode discharges in JET-ILW. While the accuracy of the predicted T_e with TRANSP-GLF23 is affected by plasma collisionality, the dependency of predictions on collisionality is less significant when using TRANSP-TGLF, indicating that the latter model has broader applicability across plasma regimes. TRANSP-TGLF also shows a good match of the predicted T_i with experimental measurements, allowing for a more accurate prediction of the neutron yields. The impact of input data and assumptions prescribed in the simulations is also investigated in this paper. The statistical validation and the assessment of the uncertainty level in predictive TRANSP simulations for JET-ILW-DD will constitute the basis for the extrapolation to JET-ILW-DT experiments.

  13. Enhancing seasonal climate prediction capacity for the Pacific countries

    NASA Astrophysics Data System (ADS)

    Kuleshov, Y.; Jones, D.; Hendon, H.; Charles, A.; Cottrill, A.; Lim, E.-P.; Langford, S.; de Wit, R.; Shelton, K.

    2012-04-01

    Seasonal and inter-annual climate variability is a major factor in determining the vulnerability of many Pacific Island Countries to climate change, and there is a need to improve weekly- to seasonal-range climate prediction capabilities beyond what is currently available from statistical models. Under the seasonal climate prediction project of the Australian Government's Pacific Adaptation Strategy Assistance Program (PASAP), we describe a comprehensive project to strengthen the climate prediction capacities of the National Meteorological Services in 14 Pacific Island Countries and East Timor. The intent is particularly to reduce the vulnerability of current services to a changing climate and to improve the overall level of information available to assist with managing climate variability. Statistical models cannot account for aspects of climate variability and change that are not represented in the historical record. In contrast, dynamical physics-based models implicitly include the effects of a changing climate, whatever its character or cause, and can predict outcomes not seen previously. The transition from a statistical to a dynamical prediction system provides more valuable and applicable climate information to a wide range of climate-sensitive sectors throughout the countries of the Pacific region. In this project, we have developed seasonal climate outlooks based upon the dynamical model POAMA (Predictive Ocean-Atmosphere Model for Australia) seasonal forecast system. At present, the meteorological services of the Pacific Island Countries largely employ statistical models for seasonal outlooks. Outcomes of the PASAP project enhanced the capabilities of the Pacific Island Countries in seasonal prediction, providing National Meteorological Services with an additional tool to analyse meteorological variables such as sea surface temperature, air temperature, pressure and rainfall using POAMA outputs, and to prepare more accurate seasonal climate outlooks.

  14. Multisite external validation of a risk prediction model for the diagnosis of blood stream infections in febrile pediatric oncology patients without severe neutropenia.

    PubMed

    Esbenshade, Adam J; Zhao, Zhiguo; Aftandilian, Catherine; Saab, Raya; Wattier, Rachel L; Beauchemin, Melissa; Miller, Tamara P; Wilkes, Jennifer J; Kelly, Michael J; Fernbach, Alison; Jeng, Michael; Schwartz, Cindy L; Dvorak, Christopher C; Shyr, Yu; Moons, Karl G M; Sulis, Maria-Luisa; Friedman, Debra L

    2017-10-01

    Pediatric oncology patients are at an increased risk of invasive bacterial infection due to immunosuppression. The risk of such infection in the absence of severe neutropenia (absolute neutrophil count ≥ 500/μL) is not well established, and a validated prediction model for blood stream infection (BSI) risk offers clinical usefulness. A 6-site retrospective external validation was conducted using a previously published risk prediction model for BSI in febrile pediatric oncology patients without severe neutropenia: the Esbenshade/Vanderbilt (EsVan) model. A reduced model (EsVan2) excluding 2 less clinically reliable variables also was created using the initial EsVan model derivation cohort, and was validated using all 5 external validation cohorts. One data set was used only in sensitivity analyses because some variables were missing. From the 5 primary data sets, there were a total of 1197 febrile episodes and 76 episodes of bacteremia. The overall C statistic for predicting bacteremia was 0.695, with a calibration slope of 0.50 for the original model and a calibration slope of 1.0 when recalibration was applied to the model. The model performed better in predicting high-risk bacteremia (gram-negative or Staphylococcus aureus infection) versus BSI alone, with a C statistic of 0.801 and a calibration slope of 0.65. The EsVan2 model outperformed the EsVan model across data sets, with a C statistic of 0.733 for predicting BSI and a C statistic of 0.841 for high-risk BSI. The results of this external validation demonstrated that the EsVan and EsVan2 models are able to predict BSI across multiple performance sites and, once validated and implemented prospectively, could assist decision making in clinical practice. Cancer 2017;123:3781-3790. © 2017 American Cancer Society.

  15. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. Part 2: Theoretical development of a dynamic model and application to rain fade durations and tolerable control delays for fade countermeasures

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1987-01-01

    A dynamic rain attenuation prediction model is developed for use in obtaining the temporal characteristics, on time scales of minutes or hours, of satellite communication link availability. Analogous to the associated static rain attenuation model, which yields yearly attenuation predictions, this dynamic model is applicable at any location in the world that is characterized by the static rain attenuation statistics peculiar to the geometry of the satellite link and the rain statistics of the location. Such statistics are calculated by employing the formalism of Part I of this report. In fact, the dynamic model presented here is an extension of the static model and reduces to the static model in the appropriate limit. By assuming that rain attenuation is dynamically described by a first-order stochastic differential equation in time, and that this random attenuation process is a Markov process, an expression for the associated transition probability is obtained by solving the related forward Kolmogorov equation. This transition probability is then used to obtain such temporal rain attenuation statistics as attenuation durations and allowable attenuation margins versus control system delay.
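
    As a toy illustration of the approach (not the report's actual model or parameters), the sketch below simulates a first-order stochastic differential equation, here an Ornstein-Uhlenbeck process in log-attenuation, and extracts fade-duration statistics above an arbitrary threshold.

    ```python
    # Simulate an Ornstein-Uhlenbeck log-attenuation process and measure the
    # durations of fades exceeding a threshold. All parameters are assumptions.
    import numpy as np

    rng = np.random.default_rng(10)
    beta, sigma, dt, n = 1 / 600.0, 0.05, 1.0, 200_000   # relaxation rate (1/s), noise, 1-s steps
    x = np.zeros(n)                                      # log-attenuation anomaly
    for t in range(1, n):
        x[t] = x[t - 1] - beta * x[t - 1] * dt + sigma * np.sqrt(dt) * rng.standard_normal()

    above = np.exp(x) > 1.5                              # fades above a toy 1.5 dB threshold
    edges = np.diff(np.concatenate([[0], above.astype(int), [0]]))
    starts, ends = np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)
    durations = ends - starts                            # fade durations in seconds
    print("fades:", len(durations), "mean duration (s):", durations.mean())
    ```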

  16. A two-component rain model for the prediction of attenuation and diversity improvement

    NASA Technical Reports Server (NTRS)

    Crane, R. K.

    1982-01-01

    A new model was developed to predict attenuation statistics for a single Earth-satellite or terrestrial propagation path. The model was extended to provide predictions of the joint occurrences of specified or higher attenuation values on two closely spaced Earth-satellite paths. The joint statistics provide the information required to obtain diversity gain or diversity advantage estimates. The new model is meteorologically based. It was tested against available Earth-satellite beacon observations and terrestrial path measurements. The model employs the rain climate region descriptions of the Global rain model. The rms deviation between the predicted and observed attenuation values for the terrestrial path data was 35 percent, a result consistent with the expectations of the Global model when the rain rate distribution for the path is not used in the calculation. Within the United States the rms deviation between measurement and prediction was 36 percent but worldwide it was 79 percent.

  17. Negative impacts of climate change on cereal yields: statistical evidence from France

    NASA Astrophysics Data System (ADS)

    Gammans, Matthew; Mérel, Pierre; Ortiz-Bobea, Ariel

    2017-05-01

    In several world regions, climate change is predicted to negatively affect crop productivity. The recent statistical yield literature emphasizes the importance of flexibly accounting for the distribution of growing-season temperature to better represent the effects of warming on crop yields. We estimate a flexible statistical yield model using a long panel from France to investigate the impacts of temperature and precipitation changes on wheat and barley yields. Winter varieties appear sensitive to extreme cold after planting. All yields respond negatively to an increase in spring-summer temperatures and are a decreasing function of precipitation around historical precipitation levels. Crop yields are predicted to be negatively affected by climate change under a wide range of climate models and emissions scenarios. Under warming scenario RCP8.5, and holding growing areas and technology constant, our model ensemble predicts a 21.0% decline in winter wheat yield, a 17.3% decline in winter barley yield, and a 33.6% decline in spring barley yield by the end of the century. Uncertainty from climate projections dominates uncertainty from the statistical model. Finally, our model predicts that continuing technology trends would counterbalance most of the effects of climate change.

  18. Predicting survival of de novo metastatic breast cancer in Asian women: systematic review and validation study.

    PubMed

    Miao, Hui; Hartman, Mikael; Bhoo-Pathy, Nirmala; Lee, Soo-Chin; Taib, Nur Aishah; Tan, Ern-Yu; Chan, Patrick; Moons, Karel G M; Wong, Hoong-Seam; Goh, Jeremy; Rahim, Siti Mastura; Yip, Cheng-Har; Verkooijen, Helena M

    2014-01-01

    In Asia, up to 25% of breast cancer patients present with distant metastases at diagnosis. Given the heterogeneous survival probabilities of de novo metastatic breast cancer, individual outcome prediction is challenging. The aim of the study is to identify existing prognostic models for patients with de novo metastatic breast cancer and validate them in Asia. We performed a systematic review to identify prediction models for metastatic breast cancer. Models were validated in 642 women with de novo metastatic breast cancer registered between 2000 and 2010 in the Singapore Malaysia Hospital Based Breast Cancer Registry. Survival curves for low, intermediate and high-risk groups according to each prognostic score were compared by log-rank test and discrimination of the models was assessed by concordance statistic (C-statistic). We identified 16 prediction models, seven of which were for patients with brain metastases only. Performance status, estrogen receptor status, metastatic site(s) and disease-free interval were the most common predictors. We were able to validate nine prediction models. The capacity of the models to discriminate between poor and good survivors varied from poor to fair with C-statistics ranging from 0.50 (95% CI, 0.48-0.53) to 0.63 (95% CI, 0.60-0.66). The discriminatory performance of existing prediction models for de novo metastatic breast cancer in Asia is modest. Development of an Asian-specific prediction model is needed to improve prognostication and guide decision making.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lewis, John R.; Brooks, Dusty Marie

    In pressurized water reactors, the prevention, detection, and repair of cracks within dissimilar metal welds is essential to ensure proper plant functionality and safety. Weld residual stresses, which are difficult to model and cannot be directly measured, contribute to the formation and growth of cracks due to primary water stress corrosion cracking. Additionally, the uncertainty in weld residual stress measurements and modeling predictions is not well understood, further complicating the prediction of crack evolution. The purpose of this document is to develop methodology to quantify the uncertainty associated with weld residual stress that can be applied to modeling predictions and experimental measurements. Ultimately, the results can be used to assess the current state of uncertainty and to build confidence in both modeling and experimental procedures. The methodology consists of statistically modeling the variation in the weld residual stress profiles using functional data analysis techniques. Uncertainty is quantified using statistical bounds (e.g. confidence and tolerance bounds) constructed with a semi-parametric bootstrap procedure. Such bounds describe the range in which quantities of interest, such as means, are expected to lie as evidenced by the data. The methodology is extended to provide direct comparisons between experimental measurements and modeling predictions by constructing statistical confidence bounds for the average difference between the two quantities. The statistical bounds on the average difference can be used to assess the level of agreement between measurements and predictions. The methodology is applied to experimental measurements of residual stress obtained using two strain relief measurement methods and predictions from seven finite element models developed by different organizations during a round robin study.
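
    A minimal sketch of the bootstrap idea under simplifying assumptions: pointwise percentile bounds on the mean profile from simulated stress profiles. The report's semi-parametric functional procedure is more involved; the shapes and magnitudes below are placeholders.

    ```python
    # Bootstrap pointwise bounds for the mean residual-stress profile.
    import numpy as np

    rng = np.random.default_rng(11)
    depth = np.linspace(0, 1, 50)                     # normalized through-wall depth
    profiles = (300 * np.cos(2 * np.pi * depth)       # toy mean stress shape (MPa)
                + rng.normal(0, 40, size=(12, 50)))   # 12 measured profiles + noise

    boot_means = np.array([
        profiles[rng.integers(0, 12, 12)].mean(axis=0) for _ in range(5000)
    ])
    lo, hi = np.percentile(boot_means, [2.5, 97.5], axis=0)
    print(lo[:3].round(1), hi[:3].round(1))           # bounds at the first few depths
    ```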

  20. Predicting adsorptive removal of chlorophenol from aqueous solution using artificial intelligence based modeling approaches.

    PubMed

    Singh, Kunwar P; Gupta, Shikha; Ojha, Priyanka; Rai, Premanjali

    2013-04-01

    The research aims to develop artificial intelligence (AI)-based models to predict the adsorptive removal of 2-chlorophenol (CP) in aqueous solution by coconut shell carbon (CSC) using four operational variables (pH of solution, adsorbate concentration, temperature, and contact time), and to investigate their effects on the adsorption process. Accordingly, based on a factorial design, 640 batch experiments were conducted. Nonlinearities in the experimental data were checked using Brock-Dechert-Scheinkman (BDS) statistics. Five nonlinear models were constructed to predict the adsorptive removal of CP in aqueous solution by CSC using the four variables as input. Performances of the constructed models were evaluated and compared using statistical criteria. BDS statistics revealed strong nonlinearity in the experimental data. The performance of all the models constructed here was satisfactory. The radial basis function network (RBFN) and multilayer perceptron network (MLPN) models performed better than the generalized regression neural network, support vector machine, and gene expression programming models. Sensitivity analysis revealed that contact time had the highest effect on adsorption, followed by solution pH, temperature, and CP concentration. The study concluded that all the models constructed here were capable of capturing the nonlinearity in the data. The better generalization and predictive performance of the RBFN and MLPN models suggests that these can be used to predict the adsorption of CP in aqueous solution by CSC.
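
    As a rough illustration of the kind of model the authors compare, the sketch below fits a small multilayer perceptron (a stand-in for the MLPN model named above) to synthetic batch-experiment data with the same four inputs. The data-generating function and all parameter choices are invented for illustration only.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(1)

    # Hypothetical batch-experiment table: pH, CP concentration (mg/L),
    # temperature (deg C), contact time (min) -> percent CP removed.
    X = np.column_stack([
        rng.uniform(2, 10, 640),     # solution pH
        rng.uniform(25, 200, 640),   # adsorbate concentration
        rng.uniform(20, 50, 640),    # temperature
        rng.uniform(10, 240, 640),   # contact time
    ])
    y = 40 + 0.15 * X[:, 3] - 2.0 * np.abs(X[:, 0] - 6) + rng.normal(0, 3, 640)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(10,),
                                       max_iter=5000, random_state=0))
    model.fit(X_tr, y_tr)
    print("test R^2:", r2_score(y_te, model.predict(X_te)))
    ```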

  1. Comparison of climate envelope models developed using expert-selected variables versus statistical selection

    USGS Publications Warehouse

    Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.

    2017-01-01

    Climate envelope models are widely used to describe the potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods. Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, as well as the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method did, and there was low overlap in the variable sets (<40%) between the two methods. Despite these differences in variable sets (expert versus statistical), models had high performance metrics (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS)). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. The difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable selection is a useful first step, especially when there is a need to model a large number of species or when expert knowledge of the species is limited. Expert input can then be used to refine models that seem unrealistic or for species that experts believe are particularly sensitive to change. It also emphasizes the importance of using multiple models to reduce uncertainty and improve map outputs for conservation planning. Where outputs overlap or show the same direction of change there is greater certainty in the predictions. Areas of disagreement can be used for learning by asking why the models do not agree, and may highlight areas where additional on-the-ground data collection could improve the models.
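
    The two performance metrics quoted above are straightforward to compute from model output. The sketch below shows one way to obtain AUC and TSS in Python; the labels and probabilities are made up, and the 0.5 threshold for TSS is an arbitrary choice for illustration, not one taken from the study.

    ```python
    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Hypothetical model outputs: predicted occurrence probabilities and
    # observed presence/absence for a species.
    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
    y_prob = np.array([0.9, 0.2, 0.7, 0.8, 0.4, 0.1, 0.6, 0.3, 0.95, 0.5])

    auc = roc_auc_score(y_true, y_prob)

    # True skill statistic at a 0.5 threshold: sensitivity + specificity - 1.
    y_hat = (y_prob >= 0.5).astype(int)
    tp = np.sum((y_hat == 1) & (y_true == 1))
    fn = np.sum((y_hat == 0) & (y_true == 1))
    tn = np.sum((y_hat == 0) & (y_true == 0))
    fp = np.sum((y_hat == 1) & (y_true == 0))
    tss = tp / (tp + fn) + tn / (tn + fp) - 1
    print(f"AUC={auc:.2f}, TSS={tss:.2f}")
    ```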

  2. A neighborhood statistics model for predicting stream pathogen indicator levels.

    PubMed

    Pandey, Pramod K; Pasternack, Gregory B; Majumder, Mahbubul; Soupir, Michelle L; Kaiser, Mark S

    2015-03-01

    Because elevated levels of water-borne Escherichia coli in streams are a leading cause of water quality impairments in the U.S., water-quality managers need tools for predicting aqueous E. coli levels. Presently, E. coli levels may be predicted using complex mechanistic models that have a high degree of unchecked uncertainty, or using simpler statistical models. To assess spatio-temporal patterns of in-stream E. coli levels, we measured E. coli, a pathogen indicator, at 16 sites (at four different times) within the Squaw Creek watershed, Iowa, and subsequently exploited a Markov random field model to develop a neighborhood statistics model for predicting in-stream E. coli levels. Two observed covariates, local water temperature (degrees Celsius) and mean cross-sectional depth (meters), were used as inputs to the model. Predictions of E. coli levels in the water column were compared with independent observational data collected from 16 in-stream locations. The results revealed that spatio-temporal averages of predicted and observed E. coli levels were extremely close. Approximately 66% of individual predicted E. coli concentrations were within a factor of 2 of the observed values. In only one event was the difference between prediction and observation beyond one order of magnitude. The mean of all predicted values at the 16 locations was approximately 1% higher than the mean of the observed values. The approach presented here will be useful for assessing in-stream contamination, such as pathogen/pathogen indicator levels, at the watershed scale.

  3. Developing Risk Prediction Models for Kidney Injury and Assessing Incremental Value for Novel Biomarkers

    PubMed Central

    Kerr, Kathleen F.; Meisner, Allison; Thiessen-Philbrook, Heather; Coca, Steven G.

    2014-01-01

    The field of nephrology is actively involved in developing biomarkers and improving models for predicting patients’ risks of AKI and CKD and their outcomes. However, some important aspects of evaluating biomarkers and risk models are not widely appreciated, and statistical methods are still evolving. This review describes some of the most important statistical concepts for this area of research and identifies common pitfalls. Particular attention is paid to metrics proposed within the last 5 years for quantifying the incremental predictive value of a new biomarker. PMID:24855282

  4. Predicting risk for portal vein thrombosis in acute pancreatitis patients: A comparison of radial basis function artificial neural network and logistic regression models.

    PubMed

    Fei, Yang; Hu, Jian; Gao, Kun; Tu, Jianfeng; Li, Wei-Qin; Wang, Wei

    2017-06-01

    To construct a radial basis function (RBF) artificial neural network (ANN) model to predict the incidence of acute pancreatitis (AP)-induced portal vein thrombosis (PVT). The analysis included 353 patients with AP admitted between January 2011 and December 2015. An RBF ANN model and a logistic regression model were each constructed based on eleven factors relevant to AP. Statistical indices were used to evaluate the predictive value of the two models. The sensitivity, specificity, positive predictive value, negative predictive value and accuracy of the RBF ANN model for PVT were 73.3%, 91.4%, 68.8%, 93.0% and 87.7%, respectively. There were significant differences between the RBF ANN and logistic regression models in these parameters (P<0.05). In addition, a comparison of the areas under the receiver operating characteristic curves of the two models showed a statistically significant difference (P<0.05). The RBF ANN model is better able to predict the occurrence of AP-induced PVT than the logistic regression model. D-dimer, AMY, Hct and PT were important predictive factors for AP-induced PVT. Copyright © 2017 Elsevier Inc. All rights reserved.
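
    The five diagnostic metrics reported above all derive from a 2x2 confusion matrix. The following sketch computes them for an arbitrary binary classifier; the outcome and prediction vectors are hypothetical, not the study's data.

    ```python
    import numpy as np

    def diagnostic_metrics(y_true, y_pred):
        """Sensitivity, specificity, PPV, NPV and accuracy from binary labels."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        return {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn),
            "accuracy": (tp + tn) / len(y_true),
        }

    # Hypothetical PVT outcomes and one model's predictions.
    y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]
    print(diagnostic_metrics(y_true, y_pred))
    ```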

  5. Assessment of the long-lead probabilistic prediction for the Asian summer monsoon precipitation (1983-2011) based on the APCC multimodel system and a statistical model

    NASA Astrophysics Data System (ADS)

    Sohn, Soo-Jin; Min, Young-Mi; Lee, June-Yi; Tam, Chi-Yung; Kang, In-Sik; Wang, Bin; Ahn, Joong-Bae; Yamagata, Toshio

    2012-02-01

    The performance of the probabilistic multimodel prediction (PMMP) system of the APEC Climate Center (APCC) in predicting the Asian summer monsoon (ASM) precipitation at a four-month lead (with February initial condition) was compared with that of a statistical model using hindcast data for 1983-2005 and real-time forecasts for 2006-2011. Particular attention was paid to probabilistic precipitation forecasts for the boreal summer after the mature phase of El Niño and Southern Oscillation (ENSO). Taking into account the fact that coupled models' skill for boreal spring and summer precipitation mainly comes from their ability to capture ENSO teleconnection, we developed the statistical model using linear regression with the preceding winter ENSO condition as the predictor. Our results reveal several advantages and disadvantages in both forecast systems. First, the PMMP appears to have higher skills for both above- and below-normal categories in the six-year real-time forecast period, whereas the cross-validated statistical model has higher skills during the 23-year hindcast period. This implies that the cross-validated statistical skill may be overestimated. Second, the PMMP is the better tool for capturing atypical ENSO (or non-canonical ENSO related) teleconnection, which has affected the ASM precipitation during the early 1990s and in the recent decade. Third, the statistical model is more sensitive to the ENSO phase and has an advantage in predicting the ASM precipitation after the mature phase of La Niña.

  6. Tracing the source of numerical climate model uncertainties in precipitation simulations using a feature-oriented statistical model

    NASA Astrophysics Data System (ADS)

    Xu, Y.; Jones, A. D.; Rhoades, A.

    2017-12-01

    Precipitation is a key component of hydrologic cycles, and changing precipitation regimes contribute to more intense and frequent drought and flood events around the world. Numerical climate modeling is a powerful tool for studying climatology and predicting future changes. Despite continuous improvement in numerical models, long-term precipitation prediction remains a challenge, especially at regional scales. To improve numerical simulations of precipitation, it is important to find out where the uncertainty in precipitation simulations comes from. There are two types of uncertainty in numerical model predictions. One is related to uncertainty in the input data, such as the model's boundary and initial conditions. These uncertainties would propagate to the final model outcomes even if the numerical model exactly replicated the true world. But a numerical model cannot exactly replicate the true world. The other type of model uncertainty is therefore related to errors in the model physics, such as the parameterization of sub-grid scale processes: given precise input conditions, how much error could be generated by the imprecise model? Here, we build two statistical models based on a neural network algorithm to predict the long-term variation of precipitation over California: one uses "true world" information derived from observations, and the other uses "modeled world" information using model inputs and outputs from the North America Coordinated Regional Downscaling Project (NA CORDEX). We derive multiple climate feature metrics as predictors for the statistical model to represent the impact of global climate on local hydrology, and include topography as a predictor to represent the local control. We first compare the predictors between the true world and the modeled world to determine the errors contained in the input data. By perturbing the predictors in the statistical model, we estimate how much uncertainty in the model's final outcomes is accounted for by each predictor. By comparing the statistical models derived from true world and modeled world information, we assess the errors lying in the physics of the numerical models. This work provides a unique insight for assessing the performance of numerical climate models and can be used to guide improvement of precipitation prediction.
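
    The perturbation step described above resembles permutation importance. As a hedged sketch (the predictor names and data are invented, and a generic scikit-learn network stands in for the authors' model), one can permute each predictor in turn and record how much the prediction error grows:

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(6)

    # Hypothetical predictors: climate feature metrics plus topography,
    # predicting a precipitation metric.
    X = rng.normal(size=(500, 4))
    y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, 500)

    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                         random_state=0).fit(X, y)
    base_mse = np.mean((model.predict(X) - y) ** 2)

    # Perturb one predictor at a time and see how much the error grows.
    for j, name in enumerate(["ENSO index", "jet position",
                              "moisture flux", "topography"]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        mse = np.mean((model.predict(Xp) - y) ** 2)
        print(f"{name}: MSE increase {mse - base_mse:.3f}")
    ```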

  7. What can 35 years and over 700,000 measurements tell us about noise exposure in the mining industry?

    PubMed

    Roberts, Benjamin; Sun, Kan; Neitzel, Richard L

    2017-01-01

    To analyse over 700,000 cross-sectional measurements from the Mine Safety and Health Administration (MSHA) and develop statistical models to predict noise exposure for a worker. Descriptive statistics were used to summarise the data. Two linear regression models were used to predict noise exposure based on the MSHA permissible exposure limit (PEL) and action level (AL), respectively. Twofold cross-validation was used to compare the exposure estimates from the models to the actual measurements. The mean difference and t-statistic were calculated for each job title to determine whether the model predictions were significantly different from the actual data. Measurements were acquired from MSHA through a Freedom of Information Act request. From 1979 to 2014, noise exposure has decreased. Measurements taken before the implementation of MSHA's revised noise regulation in 2000 were on average 4.5 dBA higher than those taken after the law was implemented. Both models produced exposure predictions that differed from the holdout data by less than 1 dBA. Overall noise levels in mines have been decreasing. However, this decrease has not been uniform across all mining sectors. The exposure predictions from the models will be useful for predicting hearing loss in workers in the mining industry.
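
    The twofold cross-validation described above can be sketched in a few lines. Everything below is hypothetical (synthetic measurements and a deliberately simple predictor set); it only illustrates the fit-on-one-half, score-on-the-other-half procedure.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(2)

    # Hypothetical records: year of measurement and coded mining sector,
    # predicting a dose-based noise exposure level in dBA.
    year = rng.integers(1979, 2015, size=1000)
    sector = rng.integers(0, 5, size=1000)
    dba = 95 - 0.2 * (year - 1979) + 1.5 * sector + rng.normal(0, 4, 1000)
    X = np.column_stack([year, sector])

    # Twofold cross-validation: fit on one half, score on the held-out half.
    kf = KFold(n_splits=2, shuffle=True, random_state=0)
    for train_idx, test_idx in kf.split(X):
        model = LinearRegression().fit(X[train_idx], dba[train_idx])
        diff = model.predict(X[test_idx]) - dba[test_idx]
        print(f"mean difference on holdout: {diff.mean():.2f} dBA")
    ```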

  8. The Role of Feature Selection and Statistical Weighting in Predicting In Vivo Toxicity Using In Vitro Assay and QSAR Data (SOT)

    EPA Science Inventory

    Our study assesses the value of both in vitro assay and quantitative structure activity relationship (QSAR) data in predicting in vivo toxicity using numerous statistical models and approaches to process the data. Our models are built on datasets of (i) 586 chemicals for which bo...

  9. A Weibull statistics-based lignocellulose saccharification model and a built-in parameter accurately predict lignocellulose hydrolysis performance.

    PubMed

    Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu

    2015-09-01

    Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to the Weibull distribution, we discovered that Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrate, enzyme or saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and of experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined as the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and by analysis of glucose production levels when the λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior, and the λ parameter can be used to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and of the λ value in saccharification performance assessment are discussed. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
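
    A cumulative Weibull curve of the form Y(t) = Y_inf * (1 - exp(-(t/λ)^n)) is one common way to write such a model. Assuming this form (the paper's exact parameterization may differ), fitting λ and n to a hydrolysis time course is straightforward; the data points below are invented.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def weibull_saccharification(t, y_inf, lam, n):
        """Cumulative Weibull form: yield approaching y_inf, scale lam (h)."""
        return y_inf * (1.0 - np.exp(-(t / lam) ** n))

    # Hypothetical time course: hours vs glucose yield (% of theoretical).
    t = np.array([2, 4, 8, 12, 24, 48, 72], dtype=float)
    y = np.array([12, 22, 37, 46, 62, 74, 78], dtype=float)

    (p_yinf, p_lam, p_n), _ = curve_fit(weibull_saccharification, t, y,
                                        p0=[80.0, 20.0, 1.0], maxfev=10000)
    print(f"y_inf={p_yinf:.1f}%, lambda={p_lam:.1f} h, n={p_n:.2f}")
    ```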

  10. Review of Nearshore Morphologic Prediction

    NASA Astrophysics Data System (ADS)

    Plant, N. G.; Dalyander, S.; Long, J.

    2014-12-01

    The evolution of the world's erodible coastlines will determine the balance between the benefits and costs associated with human and ecological utilization of shores, beaches, dunes, barrier islands, wetlands, and estuaries. Thus, we would like to predict coastal evolution to guide management and planning of human and ecological responses to coastal changes. After decades of research investment in data collection, theoretical and statistical analysis, and model development, we have a number of empirical, statistical, and deterministic models that can predict the evolution of the shoreline, beaches, dunes, and wetlands over time scales of hours to decades, and even predict the evolution of geologic strata over the course of millennia. Comparisons of predictions to data have demonstrated that these models can have meaningful predictive skill. But these comparisons also highlight the deficiencies in fundamental understanding, formulations, or data that are responsible for prediction errors and uncertainty. Here, we review a subset of predictive models of the nearshore to illustrate tradeoffs in complexity, predictive skill, and sensitivity to input data and parameterization errors. We identify where future improvement in prediction skill will result from improved theoretical understanding, data collection, and model-data assimilation.

  11. The value of model averaging and dynamical climate model predictions for improving statistical seasonal streamflow forecasts over Australia

    NASA Astrophysics Data System (ADS)

    Pokhrel, Prafulla; Wang, Q. J.; Robertson, David E.

    2013-10-01

    Seasonal streamflow forecasts are valuable for planning and allocation of water resources. In Australia, the Bureau of Meteorology employs a statistical method to forecast seasonal streamflows. The method uses predictors that are related to catchment wetness at the start of a forecast period and to climate during the forecast period. For the latter, a predictor is selected from a number of lagged climate indices as candidates to give the "best" model in terms of model performance in cross-validation. This study investigates two strategies for further improvement of seasonal streamflow forecasts. The first is to combine, through Bayesian model averaging, multiple candidate models with different lagged climate indices as predictors, to take advantage of the different predictive strengths of the multiple models. The second strategy is to introduce additional candidate models, using rainfall and sea surface temperature predictions from a global climate model as predictors, to take advantage of its direct simulation of various dynamic processes. The results show that combining forecasts from multiple statistical models generally yields more skillful forecasts than using only the best model and appears to moderate the worst forecast errors. The use of rainfall predictions from the dynamical climate model marginally improves the streamflow forecasts when viewed over all the study catchments and seasons, but the use of sea surface temperature predictions provides little additional benefit.
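
    A crude sketch of the combination step: weights for the candidate models can be derived from their cross-validation errors and then used to average their forecasts. The Gaussian-likelihood weighting below is a simplification of Bayesian model averaging, and all numbers are invented.

    ```python
    import numpy as np

    def bma_weights(cv_errors, sigma=1.0):
        """Crude BMA-style weights from each candidate model's
        cross-validation errors, via a Gaussian likelihood."""
        loglik = np.array([-0.5 * np.sum((e / sigma) ** 2)
                           for e in cv_errors])
        w = np.exp(loglik - loglik.max())   # shift for numerical stability
        return w / w.sum()

    # Hypothetical cross-validation errors for three candidate models,
    # each using a different lagged climate index as predictor.
    cv_errors = [np.array([0.3, -0.5, 0.2]),
                 np.array([0.9, -1.1, 0.8]),
                 np.array([0.4, 0.4, -0.3])]
    w = bma_weights(cv_errors)

    # Combined forecast = weighted average of the candidate forecasts.
    forecasts = np.array([120.0, 95.0, 115.0])   # hypothetical seasonal flows
    print("weights:", np.round(w, 3), " combined:", np.round(w @ forecasts, 1))
    ```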

  12. Survival Regression Modeling Strategies in CVD Prediction.

    PubMed

    Barkhordari, Mahnaz; Padyab, Mojgan; Sardarinia, Mahsa; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza

    2016-04-01

    A fundamental part of prevention is prediction. Potential predictors are the sine qua non of prediction models. However, whether incorporating novel predictors into prediction models translates directly to added predictive value remains an area of dispute. The difference between the predictive power of a predictive model with (enhanced model) and without (baseline model) a certain predictor is generally regarded as an indicator of the predictive value added by that predictor. Indices such as discrimination and calibration have long been used in this regard. Recently, the use of added predictive value has been suggested when comparing the predictive performances of models with and without novel biomarkers. User-friendly statistical software capable of implementing these novel statistical procedures is conspicuously lacking, and this shortcoming has restricted the adoption of such model assessment methods. We aimed to construct Stata commands to help researchers obtain the aforementioned statistical indices, specifically: (1) the Nam-D'Agostino χ2 goodness-of-fit test; and (2) cut point-free and cut point-based net reclassification improvement indices (NRI), relative and absolute integrated discrimination improvement indices (IDI), and survival-based regression analyses. We applied the commands to real data on women participating in the Tehran lipid and glucose study (TLGS) to examine whether information on family history of premature cardiovascular disease (CVD), waist circumference, and fasting plasma glucose can improve the predictive performance of Framingham's general CVD risk algorithm. The command for survival models is adpredsurv. Herein we have described the Stata package "adpredsurv" for calculation of the Nam-D'Agostino χ2 goodness-of-fit test as well as the cut point-free and cut point-based NRI, relative and absolute IDI, and survival-based regression analyses. We hope this work encourages the use of novel methods in examining the predictive capacity of the emerging plethora of novel biomarkers.
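
    The package itself is Stata, but the cut point-free (continuous) NRI it computes is simple to illustrate in Python: events should move up in predicted risk under the enhanced model and non-events should move down. The risks and outcomes below are hypothetical and this sketch ignores the survival-specific refinements of the package.

    ```python
    import numpy as np

    def continuous_nri(risk_base, risk_new, event):
        """Cut point-free NRI: net proportion of events whose predicted
        risk rises under the enhanced model, plus the net proportion of
        non-events whose risk falls."""
        up = risk_new > risk_base
        down = risk_new < risk_base
        event = np.asarray(event, dtype=bool)
        nri_events = up[event].mean() - down[event].mean()
        nri_nonevents = down[~event].mean() - up[~event].mean()
        return nri_events + nri_nonevents

    # Hypothetical risks from models without/with the new predictor.
    base = np.array([0.10, 0.20, 0.15, 0.30, 0.25, 0.40])
    new = np.array([0.15, 0.18, 0.22, 0.35, 0.20, 0.55])
    event = [0, 0, 0, 1, 0, 1]
    print(f"continuous NRI: {continuous_nri(base, new, event):.2f}")
    ```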

  13. Design of a testing strategy using non-animal based test methods: lessons learnt from the ACuteTox project.

    PubMed

    Kopp-Schneider, Annette; Prieto, Pilar; Kinsner-Ovaskainen, Agnieszka; Stanzel, Sven

    2013-06-01

    In the framework of toxicology, a testing strategy can be viewed as a series of steps taken to come to a final prediction about a characteristic of a compound under study. The testing strategy is performed either as a single-step procedure, usually called a test battery, in which all information collected on different endpoints is used simultaneously, or as a tiered approach in which a decision tree is followed. The design of a testing strategy involves statistical considerations, such as the development of a statistical prediction model. During the EU FP6 ACuteTox project, several prediction models were proposed on the basis of statistical classification algorithms, which we illustrate here. The final choice of testing strategies was not based on statistical considerations alone. However, without thorough statistical evaluation a testing strategy cannot be identified. We present a number of observations from the statistical viewpoint that relate to the development of testing strategies. The points we make were derived from problems we had to deal with during the evaluation of this large research project. A central issue during the development of a prediction model is the danger of overfitting. Procedures are presented to deal with this challenge. Copyright © 2012 Elsevier Ltd. All rights reserved.
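
    The overfitting danger mentioned above is conventionally exposed by comparing resubstitution performance with cross-validated performance. A minimal sketch, with an invented test battery of in vitro endpoints:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)

    # Hypothetical test battery: 12 in vitro endpoints for 60 compounds,
    # classifying each compound as toxic (1) or non-toxic (0).
    X = rng.normal(size=(60, 12))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 60) > 0).astype(int)

    clf = LogisticRegression(max_iter=1000)
    resub = clf.fit(X, y).score(X, y)               # optimistic estimate
    cv = cross_val_score(clf, X, y, cv=5).mean()    # honest estimate
    print(f"resubstitution accuracy: {resub:.2f}, cross-validated: {cv:.2f}")
    ```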

  14. Use of statistical and neural net approaches in predicting toxicity of chemicals.

    PubMed

    Basak, S C; Grunwald, G D; Gute, B D; Balasubramanian, K; Opitz, D

    2000-01-01

    Hierarchical quantitative structure-activity relationships (H-QSAR) have been developed as a new approach in constructing models for estimating physicochemical, biomedicinal, and toxicological properties of interest. This approach uses increasingly more complex molecular descriptors in a graduated approach to model building. In this study, statistical and neural network methods have been applied to the development of H-QSAR models for estimating the acute aquatic toxicity (LC50) of 69 benzene derivatives to Pimephales promelas (fathead minnow). Topostructural, topochemical, geometrical, and quantum chemical indices were used as the four levels of the hierarchical method. It is clear from both the statistical and neural network models that topostructural indices alone cannot adequately model this set of congeneric chemicals. Not surprisingly, topochemical indices greatly increase the predictive power of both statistical and neural network models. Quantum chemical indices also add significantly to the modeling of this set of acute aquatic toxicity data.

  15. Examination of Solar Cycle Statistical Model and New Prediction of Solar Cycle 23

    NASA Technical Reports Server (NTRS)

    Kim, Myung-Hee Y.; Wilson, John W.

    2000-01-01

    Sunspot numbers in the current solar cycle 23 were estimated by using a statistical model with the accumulating cycle sunspot data, based on the odd-even behavior of historical sunspot cycles 1 to 22. Since cycle 23 has progressed and the timing of the solar minimum has been accurately defined, the statistical model is validated by comparing the previous prediction with the newly measured sunspot numbers, and an improved short-range sunspot projection is made accordingly. The current cycle is expected to have a moderate level of activity. Errors of this model are shown to be self-correcting as cycle observations become available.

  16. Modeling Statistics of Fish Patchiness and Predicting Associated Influence on Statistics of Acoustic Echoes

    DTIC Science & Technology

    2012-09-30

    data collected by Paramo and Gerlotto. The data were consistent with the Anderson model in that both the data and model had a mode in the... 10.1098/rsfs.2012.0027 [published, refereed] Bhatia, S., T.K. Stanton, J. Paramo, and F. Gerlotto (submitted), "Modeling statistics of fish school

  17. An order statistics approach to the halo model for galaxies

    NASA Astrophysics Data System (ADS)

    Paul, Niladri; Paranjape, Aseem; Sheth, Ravi K.

    2017-04-01

    We use the halo model to explore the implications of assuming that galaxy luminosities in groups are randomly drawn from an underlying luminosity function. We show that even the simplest of such order statistics models - one in which this luminosity function p(L) is universal - naturally produces a number of features associated with previous analyses based on the 'central plus Poisson satellites' hypothesis. These include the monotonic relation of mean central luminosity with halo mass, the lognormal distribution around this mean, and the tight relation between the central and satellite mass scales. In stark contrast to observations of galaxy clustering, however, this model predicts no luminosity dependence of large-scale clustering. We then show that an extended version of this model, based on the order statistics of a halo mass dependent luminosity function p(L|m), is in much better agreement with the clustering data as well as satellite luminosities, but systematically underpredicts central luminosities. This brings into focus the idea that central galaxies constitute a distinct population that is affected by different physical processes than are the satellites. We model this physical difference as a statistical brightening of the central luminosities, over and above the order statistics prediction. The magnitude gap between the brightest and second brightest group galaxy is predicted as a by-product, and is also in good agreement with observations. We propose that this order statistics framework provides a useful language in which to compare the halo model for galaxies with more physically motivated galaxy formation models.
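
    The core sampling step of such an order statistics model is easy to sketch: draw the luminosities of a group's galaxies from an underlying p(L) and identify the central with the brightest draw. The gamma distribution below is only a crude stand-in for a Schechter-like luminosity function, and all parameters are invented.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    def sample_group_luminosities(n_gal, alpha=-1.2, l_star=1.0, size=1):
        """Draw n_gal luminosities per group from a Schechter-like p(L)
        (approximated here by a gamma distribution) and identify the
        central as the brightest draw."""
        L = rng.gamma(shape=alpha + 2.0, scale=l_star, size=(size, n_gal))
        central = L.max(axis=1)                # brightest galaxy = central
        satellites = np.sort(L, axis=1)[:, :-1]
        return central, satellites

    central, sats = sample_group_luminosities(n_gal=20, size=10000)
    print("mean central / mean satellite:", central.mean() / sats.mean())
    ```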

  18. Strategies for Reduced-Order Models in Uncertainty Quantification of Complex Turbulent Dynamical Systems

    NASA Astrophysics Data System (ADS)

    Qi, Di

    Turbulent dynamical systems are ubiquitous in science and engineering. Uncertainty quantification (UQ) in turbulent dynamical systems is a grand challenge where the goal is to obtain statistical estimates for key physical quantities. In the development of a proper UQ scheme for systems characterized by both a high-dimensional phase space and a large number of instabilities, significant model errors compared with the true natural signal are always unavoidable due to both the imperfect understanding of the underlying physical processes and the limited computational resources available. One central issue in contemporary research is the development of a systematic methodology for reduced order models that can recover the crucial features both with model fidelity in statistical equilibrium and with model sensitivity in response to perturbations. In the first part, we discuss a general mathematical framework to construct statistically accurate reduced-order models that have skill in capturing the statistical variability in the principal directions of a general class of complex systems with quadratic nonlinearity. A systematic hierarchy of simple statistical closure schemes, which are built through new global statistical energy conservation principles combined with statistical equilibrium fidelity, are designed and tested for UQ of these problems. Second, the capacity of imperfect low-order stochastic approximations to model extreme events in a passive scalar field advected by turbulent flows is investigated. The effects in complicated flow systems are considered including strong nonlinear and non-Gaussian interactions, and much simpler and cheaper imperfect models with model error are constructed to capture the crucial statistical features in the stationary tracer field. Several mathematical ideas are introduced to improve the prediction skill of the imperfect reduced-order models. Most importantly, empirical information theory and statistical linear response theory are applied in the training phase for calibrating model errors to achieve optimal imperfect model parameters; and total statistical energy dynamics are introduced to improve the model sensitivity in the prediction phase especially when strong external perturbations are exerted. The validity of reduced-order models for predicting statistical responses and intermittency is demonstrated on a series of instructive models with increasing complexity, including the stochastic triad model, the Lorenz '96 model, and models for barotropic and baroclinic turbulence. The skillful low-order modeling methods developed here should also be useful for other applications such as efficient algorithms for data assimilation.

  19. Predictive Model for the Design of Zwitterionic Polymer Brushes: A Statistical Design of Experiments Approach.

    PubMed

    Kumar, Ramya; Lahann, Joerg

    2016-07-06

    The performance of polymer interfaces in biology is governed by a wide spectrum of interfacial properties. With the ultimate goal of identifying design parameters for stem cell culture coatings, we developed a statistical model that describes the dependence of brush properties on surface-initiated polymerization (SIP) parameters. Employing a design of experiments (DOE) approach, we identified operating boundaries within which four gel architecture regimes can be realized, including a new regime of associated brushes in thin films. Our statistical model can accurately predict the brush thickness and the degree of intermolecular association of poly[{2-(methacryloyloxy) ethyl} dimethyl-(3-sulfopropyl) ammonium hydroxide] (PMEDSAH), a previously reported synthetic substrate for feeder-free and xeno-free culture of human embryonic stem cells. DOE-based multifunctional predictions offer a powerful quantitative framework for designing polymer interfaces. For example, model predictions can be used to decrease the critical thickness at which the wettability transition occurs by simply increasing the catalyst quantity from 1 to 3 mol %.
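
    The statistical backbone of a DOE model like this is typically a low-order polynomial response surface fitted to the designed runs. A hedged sketch with an invented two-factor design (the study's actual factors, levels, and responses are not reproduced here):

    ```python
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(7)

    # Hypothetical two-factor DOE for SIP: catalyst mol% and polymerization
    # time (h), with brush thickness (nm) as the response.
    catalyst = np.repeat([1.0, 2.0, 3.0], 6)
    time_h = np.tile([1, 2, 4, 8, 16, 24], 3).astype(float)
    thickness = (30 * catalyst + 4 * time_h - 0.08 * time_h ** 2
                 + rng.normal(0, 5, 18))

    X = np.column_stack([catalyst, time_h])
    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    model.fit(X, thickness)
    print("predicted thickness at 3 mol%, 12 h:",
          model.predict([[3.0, 12.0]]).round(1))
    ```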

  20. Prediction of the presence of insulin resistance using general health checkup data in Japanese employees with metabolic risk factors.

    PubMed

    Takahara, Mitsuyoshi; Katakami, Naoto; Kaneto, Hideaki; Noguchi, Midori; Shimomura, Iichiro

    2014-01-01

    The aim of the current study was to develop a predictive model of insulin resistance using general health checkup data in Japanese employees with one or more metabolic risk factors. We used a database of 846 Japanese employees with one or more metabolic risk factors who underwent general health checkup and a 75-g oral glucose tolerance test (OGTT). Logistic regression models were developed to predict existing insulin resistance evaluated using the Matsuda index. The predictive performance of these models was assessed using the C statistic. The C statistics of body mass index (BMI), waist circumference and their combined use were 0.743, 0.732 and 0.749, with no significant differences. The multivariate backward selection model, in which BMI, the levels of plasma glucose, high-density lipoprotein (HDL) cholesterol, log-transformed triglycerides and log-transformed alanine aminotransferase and hypertension under treatment remained, had a C statistic of 0.816, with a significant difference compared to the combined use of BMI and waist circumference (p<0.01). The C statistic was not significantly reduced when the levels of log-transformed triglycerides and log-transformed alanine aminotransferase and hypertension under treatment were simultaneously excluded from the multivariate model (p=0.14). On the other hand, further exclusion of any of the remaining three variables significantly reduced the C statistic (all p<0.01). When predicting the presence of insulin resistance using general health checkup data in Japanese employees with metabolic risk factors, it is important to take into consideration the BMI and fasting plasma glucose and HDL cholesterol levels.

  1. Use of model calibration to achieve high accuracy in analysis of computer networks

    DOEpatents

    Frogner, Bjorn; Guarro, Sergio; Scharf, Guy

    2004-05-11

    A system and method are provided for creating a network performance prediction model, and calibrating the prediction model, through application of network load statistical analyses. The method includes characterizing the measured load on the network, which may include background load data obtained over time, and may further include directed load data representative of a transaction-level event. Probabilistic representations of load data are derived to characterize the statistical persistence of the network performance variability and to determine delays throughout the network. The probabilistic representations are applied to the network performance prediction model to adapt the model for accurate prediction of network performance. Certain embodiments of the method and system may be used for analysis of the performance of a distributed application characterized as data packet streams.

  2. Spatial Statistical Network Models for Stream and River Temperature in the Chesapeake Bay Watershed, USA

    EPA Science Inventory

    Regional temperature models are needed for characterizing and mapping stream thermal regimes, establishing reference conditions, predicting future impacts and identifying critical thermal refugia. Spatial statistical models have been developed to improve regression modeling techn...

  3. A statistical model including age to predict passenger postures in the rear seats of automobiles.

    PubMed

    Park, Jangwoon; Ebert, Sheila M; Reed, Matthew P; Hallman, Jason J

    2016-06-01

    Few statistical models of rear seat passenger posture have been published, and none has taken into account the effects of occupant age. This study developed new statistical models for predicting passenger postures in the rear seats of automobiles. Postures of 89 adults with a wide range of age and body size were measured in a laboratory mock-up in seven seat configurations. Posture-prediction models for female and male passengers were separately developed by stepwise regression using age, body dimensions, seat configurations and two-way interactions as potential predictors. Passenger posture was significantly associated with age, and the effects of other two-way interaction variables depended on age. A set of posture-prediction models is presented for women and men, and the prediction results are compared with previously published models. This study is the first study of passenger posture to include a large cohort of older passengers and the first to report a significant effect of age for adults. The presented models can be used to position computational and physical human models for vehicle design and assessment. Practitioner Summary: The significant effects of age, body dimensions and seat configuration on rear seat passenger posture were identified. The models can be used to accurately position computational human models or crash test dummies for older passengers in known rear seat configurations.

  4. A New Scoring System to Predict the Risk for High-risk Adenoma and Comparison of Existing Risk Calculators.

    PubMed

    Murchie, Brent; Tandon, Kanwarpreet; Hakim, Seifeldin; Shah, Kinchit; O'Rourke, Colin; Castro, Fernando J

    2017-04-01

    Colorectal cancer (CRC) screening guidelines likely over-generalize CRC risk, 35% of Americans are not up to date with screening, and there is a growing incidence of CRC in younger patients. We developed a practical prediction model for high-risk colon adenomas in an average-risk population, including an expanded definition of high-risk polyps (≥3 nonadvanced adenomas), which identifies patients at higher-than-average risk. We also compared our results with previously created calculators. Patients aged 40 to 59 years undergoing first-time average-risk screening or diagnostic colonoscopies were evaluated. Risk calculators for advanced adenomas and high-risk adenomas were created based on age, body mass index, sex, race, and smoking history. Previously established calculators with similar risk factors were selected for comparison of the concordance statistic (c-statistic) and external validation. A total of 5063 patients were included. Advanced adenomas and high-risk adenomas were seen in 5.7% and 7.4% of the patient population, respectively. The c-statistic for our calculator was 0.639 for the prediction of advanced adenomas, and 0.650 for high-risk adenomas. When applied to our population, all previous models had lower c-statistic results, although one performed similarly. Our model compares favorably with previously established prediction models. Age and body mass index were used as continuous variables, likely improving the c-statistic. The model also reports absolute predictive probabilities of advanced and high-risk polyps, allowing for more individualized risk assessment of CRC.

  5. Can spatial statistical river temperature models be transferred between catchments?

    NASA Astrophysics Data System (ADS)

    Jackson, Faye L.; Fryer, Robert J.; Hannah, David M.; Malcolm, Iain A.

    2017-09-01

    There has been increasing use of spatial statistical models to understand and predict river temperature (Tw) from landscape covariates. However, it is not financially or logistically feasible to monitor all rivers and the transferability of such models has not been explored. This paper uses Tw data from four river catchments collected in August 2015 to assess how well spatial regression models predict the maximum 7-day rolling mean of daily maximum Tw (Twmax) within and between catchments. Models were fitted for each catchment separately using (1) landscape covariates only (LS models) and (2) landscape covariates and an air temperature (Ta) metric (LS_Ta models). All the LS models included upstream catchment area and three included a river network smoother (RNS) that accounted for unexplained spatial structure. The LS models transferred reasonably to other catchments, at least when predicting relative levels of Twmax. However, the predictions were biased when mean Twmax differed between catchments. The RNS was needed to characterise and predict finer-scale spatially correlated variation. Because the RNS was unique to each catchment and thus non-transferable, predictions were better within catchments than between catchments. A single model fitted to all catchments found no interactions between the landscape covariates and catchment, suggesting that the landscape relationships were transferable. The LS_Ta models transferred less well, with particularly poor performance when the relationship with the Ta metric was physically implausible or required extrapolation outside the range of the data. A single model fitted to all catchments found catchment-specific relationships between Twmax and the Ta metric, indicating that the Ta metric was not transferable. These findings improve our understanding of the transferability of spatial statistical river temperature models and provide a foundation for developing new approaches for predicting Tw at unmonitored locations across multiple catchments and larger spatial scales.

  6. Model identification using stochastic differential equation grey-box models in diabetes.

    PubMed

    Duun-Henriksen, Anne Katrine; Schmidt, Signe; Røge, Rikke Meldgaard; Møller, Jonas Bech; Nørgaard, Kirsten; Jørgensen, John Bagterp; Madsen, Henrik

    2013-03-01

    The acceptance of virtual preclinical testing of control algorithms is growing and thus also the need for robust and reliable models. Models based on ordinary differential equations (ODEs) can rarely be validated with standard statistical tools. Stochastic differential equations (SDEs) offer the possibility of building models that can be validated statistically and that are capable of predicting not only a realistic trajectory, but also the uncertainty of the prediction. In an SDE, the prediction error is split into two noise terms. This separation ensures that the errors are uncorrelated and provides the possibility to pinpoint model deficiencies. An identifiable model of the glucoregulatory system in a type 1 diabetes mellitus (T1DM) patient is used as the basis for development of a stochastic-differential-equation-based grey-box model (SDE-GB). The parameters are estimated on clinical data from four T1DM patients. The optimal SDE-GB is determined from likelihood-ratio tests. Finally, parameter tracking is used to track the variation in the "time to peak of meal response" parameter. We found that the transformation of the ODE model into an SDE-GB resulted in a significant improvement in the prediction and uncorrelated errors. Tracking of the "peak time of meal absorption" parameter showed that the absorption rate varied according to meal type. This study shows the potential of using SDE-GBs in diabetes modeling. Improved model predictions were obtained due to the separation of the prediction error. SDE-GBs offer a solid framework for using statistical tools for model validation and model development. © 2013 Diabetes Technology Society.
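
    The key SDE idea above, splitting the prediction error into a system (diffusion) term and a measurement term, can be sketched with a one-state Euler-Maruyama simulation. The model form and every parameter below are hypothetical; this is not the glucoregulatory model used in the study.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)

    # Minimal one-state sketch: glucose G relaxing to a baseline G_b, with
    # system (diffusion) noise sigma, observed with measurement noise s_obs:
    #   dG = -k * (G - G_b) dt + sigma dW;   y_i = G(t_i) + e_i
    k, G_b, sigma, s_obs = 0.05, 6.0, 0.15, 0.3   # hypothetical parameters
    dt, n_steps = 1.0, 300                        # minutes

    G = np.empty(n_steps)
    G[0] = 9.0                                    # mmol/L initial state
    for i in range(1, n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))         # Brownian increment
        G[i] = G[i-1] - k * (G[i-1] - G_b) * dt + sigma * dW

    y = G + rng.normal(0.0, s_obs, size=n_steps)  # noisy observations
    ```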

  7. Validation of statistical predictive models meant to select melanoma patients for sentinel lymph node biopsy.

    PubMed

    Sabel, Michael S; Rice, John D; Griffith, Kent A; Lowe, Lori; Wong, Sandra L; Chang, Alfred E; Johnson, Timothy M; Taylor, Jeremy M G

    2012-01-01

    To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid sentinel lymph node biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests, and support vector machines. We sought to validate recently published models meant to predict sentinel node status. We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon four published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false-negative rate (FNR). Logistic regression performed comparably with our data when considering NPV (89.4 versus 93.6%); however, the model's specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsy rates that were lower (87.7 versus 94.1 and 29.8 versus 14.3, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and ultimately clinical utility.

  8. The extension of total gain (TG) statistic in survival models: properties and applications.

    PubMed

    Choodari-Oskooei, Babak; Royston, Patrick; Parmar, Mahesh K B

    2015-07-01

    The results of multivariable regression models are usually summarized in the form of parameter estimates for the covariates, goodness-of-fit statistics, and the relevant p-values. These statistics do not inform us about whether covariate information will lead to any substantial improvement in prediction. Predictive ability measures can be used for this purpose since they provide important information about the practical significance of prognostic factors. R²-type indices are the most familiar forms of such measures in survival models, but they all have limitations and none is widely used. In this paper, we extend the total gain (TG) measure, proposed for a logistic regression model, to survival models and explore its properties using simulations and real data. TG is based on the binary regression quantile plot, otherwise known as the predictiveness curve. Standardised TG ranges from 0 (no explanatory power) to 1 ('perfect' explanatory power). The results of our simulations show that unlike many of the other R²-type predictive ability measures, TG is independent of random censoring. It increases as the effect of a covariate increases and can be applied to different types of survival models, including models with time-dependent covariate effects. We also apply TG to quantify the predictive ability of multivariable prognostic models developed in several disease areas. Overall, TG performs well in our simulation studies and can be recommended as a measure to quantify the predictive ability in survival models.
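
    Under one common formulation of the standardized TG (the mean absolute deviation of predicted risks from the prevalence, scaled by its maximum possible value 2π(1-π)), the measure can be sketched as follows. This illustrates the logistic-regression version only, not the paper's survival extension, and the risk vectors are invented.

    ```python
    import numpy as np

    def standardized_total_gain(risks):
        """Standardized TG under one common formulation: mean absolute
        deviation of predicted risks from the prevalence, scaled by its
        maximum possible value 2*pi*(1-pi)."""
        risks = np.asarray(risks, dtype=float)
        pi = risks.mean()           # prevalence implied by the model
        return np.abs(risks - pi).mean() / (2.0 * pi * (1.0 - pi))

    print(standardized_total_gain([0.05, 0.10, 0.20, 0.60, 0.80]))  # informative
    print(standardized_total_gain([0.34, 0.35, 0.36, 0.35, 0.35]))  # near zero
    ```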

  9. Comparison and validation of statistical methods for predicting power outage durations in the event of hurricanes.

    PubMed

    Nateghi, Roshanak; Guikema, Seth D; Quiring, Steven M

    2011-12-01

    This article compares statistical methods for modeling power outage durations during hurricanes and examines the predictive accuracy of these methods. Being able to make accurate predictions of power outage durations is valuable because the information can be used by utility companies to plan their restoration efforts more efficiently. This information can also help inform customers and public agencies of the expected outage times, enabling better collective response planning and coordination of restoration efforts for other critical infrastructures that depend on electricity. In the long run, outage duration estimates for future storm scenarios may help utilities and public agencies better allocate risk management resources to balance the disruption from hurricanes with the cost of hardening power systems. We compare the out-of-sample predictive accuracy of five distinct statistical models for estimating power outage duration times caused by Hurricane Ivan in 2004: two regression models (accelerated failure time (AFT) and Cox proportional hazards (Cox PH) models) and three data mining techniques (regression trees, Bayesian additive regression trees (BART), and multivariate adaptive regression splines). We then validate our models against two other hurricanes. Our results indicate that BART yields the best prediction accuracy and that it is possible to predict outage durations with reasonable accuracy. © 2011 Society for Risk Analysis.

  10. Predicting Survival of De Novo Metastatic Breast Cancer in Asian Women: Systematic Review and Validation Study

    PubMed Central

    Miao, Hui; Hartman, Mikael; Bhoo-Pathy, Nirmala; Lee, Soo-Chin; Taib, Nur Aishah; Tan, Ern-Yu; Chan, Patrick; Moons, Karel G. M.; Wong, Hoong-Seam; Goh, Jeremy; Rahim, Siti Mastura; Yip, Cheng-Har; Verkooijen, Helena M.

    2014-01-01

    Background In Asia, up to 25% of breast cancer patients present with distant metastases at diagnosis. Given the heterogeneous survival probabilities of de novo metastatic breast cancer, individual outcome prediction is challenging. The aim of the study is to identify existing prognostic models for patients with de novo metastatic breast cancer and validate them in Asia. Materials and Methods We performed a systematic review to identify prediction models for metastatic breast cancer. Models were validated in 642 women with de novo metastatic breast cancer registered between 2000 and 2010 in the Singapore Malaysia Hospital Based Breast Cancer Registry. Survival curves for low, intermediate and high-risk groups according to each prognostic score were compared by log-rank test and discrimination of the models was assessed by concordance statistic (C-statistic). Results We identified 16 prediction models, seven of which were for patients with brain metastases only. Performance status, estrogen receptor status, metastatic site(s) and disease-free interval were the most common predictors. We were able to validate nine prediction models. The capacity of the models to discriminate between poor and good survivors varied from poor to fair with C-statistics ranging from 0.50 (95% CI, 0.48–0.53) to 0.63 (95% CI, 0.60–0.66). Conclusion The discriminatory performance of existing prediction models for de novo metastatic breast cancer in Asia is modest. Development of an Asian-specific prediction model is needed to improve prognostication and guide decision making. PMID:24695692

  11. The Real World Significance of Performance Prediction

    ERIC Educational Resources Information Center

    Pardos, Zachary A.; Wang, Qing Yang; Trivedi, Shubhendu

    2012-01-01

    In recent years, the educational data mining and user modeling communities have been aggressively introducing models for predicting student performance on external measures such as standardized tests as well as within-tutor performance. While these models have brought statistically reliable improvement to performance prediction, the real world…

  12. Evaluating pictogram prediction in a location-aware augmentative and alternative communication system.

    PubMed

    Garcia, Luís Filipe; de Oliveira, Luís Caldas; de Matos, David Martins

    2016-01-01

    This study compared the performance of two statistical location-aware pictogram prediction mechanisms, with an all-purpose (All) pictogram prediction mechanism, having no location knowledge. The All approach had a unique language model under all locations. One of the location-aware alternatives, the location-specific (Spec) approach, made use of specific language models for pictogram prediction in each location of interest. The other location-aware approach resulted from combining the Spec and the All approaches, and was designated the mixed approach (Mix). In this approach, the language models acquired knowledge from all locations, but a higher relevance was assigned to the vocabulary from the associated location. Results from simulations showed that the Mix and Spec approaches could only outperform the baseline in a statistically significant way if pictogram users reuse more than 50% and 75% of their sentences, respectively. Under low sentence reuse conditions there were no statistically significant differences between the location-aware approaches and the All approach. Under these conditions, the Mix approach performed better than the Spec approach in a statistically significant way.

  13. Guidelines 13 and 14—Prediction uncertainty

    USGS Publications Warehouse

    Hill, Mary C.; Tiedeman, Claire

    2005-01-01

    An advantage of using optimization for model development and calibration is that optimization provides methods for evaluating and quantifying prediction uncertainty. Both deterministic and statistical methods can be used. Guideline 13 discusses using regression and post-audits, which we classify as deterministic methods. Guideline 14 discusses inferential statistics and Monte Carlo methods, which we classify as statistical methods.

  14. Benefits of statistical molecular design, covariance analysis, and reference models in QSAR: a case study on acetylcholinesterase

    NASA Astrophysics Data System (ADS)

    Andersson, C. David; Hillgren, J. Mikael; Lindgren, Cecilia; Qian, Weixing; Akfur, Christine; Berg, Lotta; Ekström, Fredrik; Linusson, Anna

    2015-03-01

    Scientific disciplines such as medicinal and environmental chemistry, pharmacology, and toxicology deal with questions related to the effects small organic compounds exert on biological targets and the compounds' physicochemical properties responsible for these effects. A common strategy in this endeavor is to establish structure-activity relationships (SARs). The aim of this work was to illustrate the benefits of performing a statistical molecular design (SMD) and a proper statistical analysis of the molecules' properties before SAR and quantitative structure-activity relationship (QSAR) analysis. Our SMD followed by synthesis yielded a set of inhibitors of the enzyme acetylcholinesterase (AChE) that had very few inherent dependencies between the substructures in the molecules. If such dependencies exist, they cause severe errors in SAR interpretation and in predictions by QSAR models, and leave a set of molecules less suitable for future decision-making. In our study, SAR and QSAR models could show which molecular substructures and physicochemical features were advantageous for AChE inhibition. Finally, the QSAR model was used to predict the inhibition of AChE by an external prediction set of molecules. The accuracy of these predictions was asserted by statistical significance tests and by comparisons to simple but relevant reference models.

  15. Does objective cluster analysis serve as a useful precursor to seasonal precipitation prediction at local scale? Application to western Ethiopia

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Moges, Semu; Block, Paul

    2018-01-01

    Prediction of seasonal precipitation can provide actionable information to guide management of various sectoral activities. For instance, it is often translated into hydrological forecasts for better water resources management. However, many studies assume homogeneity in precipitation across an entire study region, which may prove ineffective for operational and local-level decisions, particularly for locations with high spatial variability. This study proposes advancing local-level seasonal precipitation predictions by first conditioning on regional-level predictions, as defined through objective cluster analysis, for western Ethiopia. To our knowledge, this is the first study predicting seasonal precipitation at high resolution in this region, where lives and livelihoods are vulnerable to precipitation variability given the high reliance on rain-fed agriculture and limited water resources infrastructure. The combination of objective cluster analysis, spatially high-resolution prediction of seasonal precipitation, and a modeling structure spanning statistical and dynamical approaches makes clear advances in prediction skill and resolution compared with previous studies. The statistical model improves on the non-clustered case and on dynamical models for a number of specific clusters in northwestern Ethiopia, with clusters having regional average correlation and ranked probability skill score (RPSS) values of up to 0.5 and 33%, respectively. The general skill (after bias correction) of the two best-performing dynamical models over the entire study region is superior to that of the statistical models, although the dynamical models issue predictions at lower resolution and the raw predictions require bias correction to guarantee comparable skill.

  16. Developing risk prediction models for kidney injury and assessing incremental value for novel biomarkers.

    PubMed

    Kerr, Kathleen F; Meisner, Allison; Thiessen-Philbrook, Heather; Coca, Steven G; Parikh, Chirag R

    2014-08-07

    The field of nephrology is actively involved in developing biomarkers and improving models for predicting patients' risks of AKI and CKD and their outcomes. However, some important aspects of evaluating biomarkers and risk models are not widely appreciated, and statistical methods are still evolving. This review describes some of the most important statistical concepts for this area of research and identifies common pitfalls. Particular attention is paid to metrics proposed within the last 5 years for quantifying the incremental predictive value of a new biomarker. Copyright © 2014 by the American Society of Nephrology.
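
    One of the metrics this review covers, the improvement in c-statistic (AUC) when a biomarker is added to a baseline risk model, is easy to compute directly. The sketch below (synthetic data, hypothetical variable names) fits nested logistic models and compares their c-statistics; in practice, as the review emphasizes, the comparison should be made on data not used for fitting.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)
    n = 500
    clinical = rng.normal(size=(n, 2))            # baseline clinical predictors
    biomarker = rng.normal(size=(n, 1))           # candidate novel biomarker
    logit = clinical[:, 0] + 0.8 * biomarker[:, 0]
    y = rng.binomial(1, 1 / (1 + np.exp(-logit))) # simulated AKI outcome

    base = LogisticRegression().fit(clinical, y)
    full = LogisticRegression().fit(np.hstack([clinical, biomarker]), y)

    auc_base = roc_auc_score(y, base.predict_proba(clinical)[:, 1])
    auc_full = roc_auc_score(y, full.predict_proba(np.hstack([clinical, biomarker]))[:, 1])
    print(f"c-statistic: {auc_base:.3f} -> {auc_full:.3f} (delta {auc_full - auc_base:.3f})")
    ```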

  17. A Statistical Framework for Analyzing Cyber Threats

    DTIC Science & Technology

    defender cares most about the attacks against certain ports or services). The grey-box statistical framework formulates a new methodology of Cybersecurity ...the design of prediction models. Our research showed that the grey-box framework is effective in predicting cybersecurity situational awareness.

  18. Spontaneous cortical activity reveals hallmarks of an optimal internal model of the environment.

    PubMed

    Berkes, Pietro; Orbán, Gergo; Lengyel, Máté; Fiser, József

    2011-01-07

    The brain maintains internal models of its environment to interpret sensory inputs and to prepare actions. Although behavioral studies have demonstrated that these internal models are optimally adapted to the statistics of the environment, the neural underpinning of this adaptation is unknown. Using a Bayesian model of sensory cortical processing, we related stimulus-evoked and spontaneous neural activities to inferences and prior expectations in an internal model and predicted that they should match if the model is statistically optimal. To test this prediction, we analyzed visual cortical activity of awake ferrets during development. Similarity between spontaneous and evoked activities increased with age and was specific to responses evoked by natural scenes. This demonstrates the progressive adaptation of internal models to the statistics of natural stimuli at the neural level.

  19. Modelling the effect of structural QSAR parameters on skin penetration using genetic programming

    NASA Astrophysics Data System (ADS)

    Chung, K. K.; Do, D. Q.

    2010-09-01

    In order to model relationships between chemical structures and biological effects in quantitative structure-activity relationship (QSAR) data, an alternative artificial intelligence technique, genetic programming (GP), was investigated and compared to traditional statistical methods. GP, with the primary advantage of generating explicit mathematical equations, was employed to model QSAR data and to identify the most important molecular descriptors. The models produced by GP agreed with the statistical results, and the most predictive GP models were significantly better than the statistical models when compared using ANOVA. Artificial intelligence techniques have recently been applied widely to analyse QSAR data. With the capability of generating mathematical equations, GP can be considered an effective and efficient method for modelling QSAR data.
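
    As an illustration of the GP approach (not the authors' implementation), the third-party gplearn library performs symbolic regression that evolves explicit mathematical equations from descriptor data. The sketch below assumes gplearn is installed and uses made-up descriptor and activity data.

    ```python
    import numpy as np
    from gplearn.genetic import SymbolicRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 3))                          # mock molecular descriptors
    y = X[:, 0] ** 2 - 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)  # mock activity

    est = SymbolicRegressor(
        population_size=1000,
        generations=20,
        function_set=("add", "sub", "mul", "div"),
        parsimony_coefficient=0.01,   # penalize overly long equations
        random_state=0,
    )
    est.fit(X, y)
    print(est._program)   # the evolved equation, e.g. sub(mul(X0, X0), mul(0.5, X1))
    ```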

  20. Applying quantitative adiposity feature analysis models to predict benefit of bevacizumab-based chemotherapy in ovarian cancer patients

    NASA Astrophysics Data System (ADS)

    Wang, Yunzhi; Qiu, Yuchen; Thai, Theresa; More, Kathleen; Ding, Kai; Liu, Hong; Zheng, Bin

    2016-03-01

    How to rationally identify epithelial ovarian cancer (EOC) patients who will benefit from bevacizumab or other antiangiogenic therapies is a critical issue in EOC treatment. The motivation of this study is to quantitatively measure adiposity features from CT images and investigate the feasibility of predicting the potential benefit for EOC patients with or without bevacizumab-based chemotherapy using multivariate statistical models built on quantitative adiposity image features. A dataset involving CT images from 59 advanced EOC patients was included. Among them, 32 patients received maintenance bevacizumab after primary chemotherapy and the remaining 27 patients did not. We developed a computer-aided detection (CAD) scheme to automatically segment subcutaneous fat areas (SFA) and visceral fat areas (VFA) and then extracted 7 adiposity-related quantitative features. Three multivariate data analysis models (linear regression, logistic regression and Cox proportional hazards regression) were applied to investigate the potential association between the model-generated prediction results and the patients' progression-free survival (PFS) and overall survival (OS). The results show that, for all three statistical models, a statistically significant association was detected between the model-generated results and both clinical outcomes in the group of patients receiving maintenance bevacizumab (p<0.01), while there was no significant association with either PFS or OS in the group of patients not receiving maintenance bevacizumab. This study therefore demonstrated the feasibility of using statistical prediction models based on quantitative adiposity-related CT image features to generate a new clinical marker and predict the clinical outcome of EOC patients receiving maintenance bevacizumab-based chemotherapy.
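
    Of the three models the study applies, the Cox proportional hazards regression is the one linking image features to progression-free survival. A minimal sketch using the lifelines library with made-up feature names and simulated data follows.

    ```python
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(2)
    n = 59
    df = pd.DataFrame({
        "sfa": rng.normal(150, 40, n),         # subcutaneous fat area (mock units)
        "vfa": rng.normal(100, 30, n),         # visceral fat area
        "fat_ratio": rng.normal(0.7, 0.2, n),
        "pfs_months": rng.exponential(18, n),  # time to progression
        "progressed": rng.binomial(1, 0.7, n)  # event indicator
    })

    cph = CoxPHFitter()
    cph.fit(df, duration_col="pfs_months", event_col="progressed")
    cph.print_summary()   # hazard ratios and p-values per feature
    ```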

  1. Validation of Statistical Predictive Models Meant to Select Melanoma Patients for Sentinel Lymph Node Biopsy

    PubMed Central

    Sabel, Michael S.; Rice, John D.; Griffith, Kent A.; Lowe, Lori; Wong, Sandra L.; Chang, Alfred E.; Johnson, Timothy M.; Taylor, Jeremy M.G.

    2013-01-01

    Introduction Several statistical models based upon patient and tumor characteristics, including logistic regression, classification trees, random forests and support vector machines, have been proposed to identify melanoma patients at sufficiently low risk of nodal metastases who could avoid sentinel lymph node biopsy (SLNB). We sought to validate recently published models meant to predict sentinel node status. Methods We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon 4 published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false negative rate (FNR). Results Logistic regression performed comparably with our data when considering NPV (89.4% vs. 93.6%); however, the model's specificity was not high enough to significantly reduce the rate of biopsies (SLNB reduction rate of 2.9%). When applied to our data, the classification tree produced lower NPV and biopsy-reduction rates (87.7% vs. 94.1% and 29.8% vs. 14.3%, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Conclusions Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset or could not be validated. Differences in selection criteria and histopathologic interpretation likely account for the underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and, ultimately, clinical utility. PMID:21822550

  2. What can 35 years and over 700,000 measurements tell us about noise exposure in the mining industry?

    PubMed Central

    Roberts, Benjamin; Sun, Kan; Neitzel, Richard L.

    2017-01-01

    Objective To analyze over 700,000 cross-sectional measurements from the Mine Safety and Health Administration (MSHA) and develop statistical models to predict noise exposure for a worker. Design Descriptive statistics were used to summarize the data. Two linear regression models were used to predict noise exposure based on the MSHA permissible exposure limit (PEL) and action level (AL), respectively. Two-fold cross-validation was used to compare the exposure estimates from the models to actual measurements in the hold-out data. The mean difference and t-statistic were calculated for each job title to determine if the model exposure predictions were significantly different from the actual data. Study Sample Measurements were acquired from MSHA through a Freedom of Information Act request. Results From 1979 to 2014 the average noise measurement decreased. Measurements taken before the implementation of MSHA's revised noise regulation in 2000 were on average 4.5 dBA higher than after the law came into effect. Both models produced mean exposure predictions that differed by less than 1 dBA from the hold-out data. Conclusion Overall noise levels in mines have been decreasing. However, this decrease has not been uniform across all mining sectors. The exposure predictions from the models will be useful for predicting hearing loss in workers in the mining industry. PMID:27871188
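
    The validation strategy described (two-fold cross-validation plus a t-test on prediction errors) is straightforward to reproduce in outline. The sketch below uses synthetic data and hypothetical predictor columns, not the MSHA measurements.

    ```python
    import numpy as np
    from scipy import stats
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(3)
    n = 1000
    X = np.column_stack([
        rng.integers(1979, 2015, n),   # measurement year
        rng.integers(0, 5, n),         # coded mining sector
    ]).astype(float)
    y = 95 - 0.2 * (X[:, 0] - 1979) + rng.normal(0, 3, n)  # mock dBA exposures

    for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
        model = LinearRegression().fit(X[train], y[train])
        errors = model.predict(X[test]) - y[test]
        # One-sample t-test: is the mean prediction error distinguishable from 0?
        t, p = stats.ttest_1samp(errors, 0.0)
        print(f"mean diff = {errors.mean():+.2f} dBA, t = {t:.2f}, p = {p:.3f}")
    ```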

  3. Statistical distribution of mechanical properties for three graphite-epoxy material systems

    NASA Technical Reports Server (NTRS)

    Reese, C.; Sorem, J., Jr.

    1981-01-01

    Graphite-epoxy composites are playing an increasing role as viable alternative materials in structural applications, necessitating thorough investigation into the predictability and reproducibility of their material strength properties. This investigation was concerned with tension, compression, and short-beam shear coupon testing of large samples from three different material suppliers to determine their statistical strength behavior. Statistical results indicate that a two-parameter Weibull distribution model provides better overall characterization of material behavior for the graphite-epoxy systems tested than does the standard Normal distribution model employed for most design work. While either a Weibull or Normal distribution model provides adequate predictions for average strength values, the Weibull model provides better characterization in the lower tail region, where the predictions are of maximum design interest. Two sets of the same material were found to have essentially the same material properties, indicating that repeatability can be achieved.
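
    The comparison can be reproduced in outline with scipy: fit both distributions to strength data and compare their predictions for a lower-tail percentile, where the two models diverge most. The sketch below uses simulated strength data, not the paper's measurements.

    ```python
    import numpy as np
    from scipy import stats

    # Mock coupon strengths in MPa, drawn from a Weibull for illustration.
    strengths = stats.weibull_min.rvs(c=12, scale=600, size=200, random_state=4)

    # Two-parameter Weibull: fix the location at 0 and estimate shape and scale.
    shape, loc, scale = stats.weibull_min.fit(strengths, floc=0)
    mu, sigma = stats.norm.fit(strengths)

    # The lower tail (e.g. 1st percentile) is where design allowables live.
    p = 0.01
    print("Weibull 1st-percentile strength:", stats.weibull_min.ppf(p, shape, loc, scale))
    print("Normal  1st-percentile strength:", stats.norm.ppf(p, mu, sigma))
    ```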

  4. Assessing Discriminative Performance at External Validation of Clinical Prediction Models

    PubMed Central

    Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W.

    2016-01-01

    Introduction External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures for judging changes in the c-statistic from the development to the external validation setting. Methods We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework for judging transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated it in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. Results The permutation test indicated that the validation and development sets were homogeneous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. Conclusion The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation populations. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients. PMID:26881753

  5. Assessing Discriminative Performance at External Validation of Clinical Prediction Models.

    PubMed

    Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W

    2016-01-01

    External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures for judging changes in the c-statistic from the development to the external validation setting. We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework for judging transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated it in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. The permutation test indicated that the validation and development sets were homogeneous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation populations. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients.

  6. Discharge destination following lower limb fracture: development of a prediction model to assist with decision making.

    PubMed

    Kimmel, Lara A; Holland, Anne E; Edwards, Elton R; Cameron, Peter A; De Steiger, Richard; Page, Richard S; Gabbe, Belinda

    2012-06-01

    Accurate prediction of the likelihood of discharge to inpatient rehabilitation following lower limb fracture, made on admission to hospital, may assist patient discharge planning and decrease the burden on the hospital system caused by delays in decision making. The aim was to develop a prognostic model for discharge to inpatient rehabilitation. Isolated lower extremity fracture cases (excluding fractured neck of femur), captured by the Victorian Orthopaedic Trauma Outcomes Registry (VOTOR), were extracted for analysis. A training data set was created for model development and a validation data set for evaluation. A multivariable logistic regression model was developed based on patient and injury characteristics. Models were assessed using measures of discrimination (C-statistic) and calibration (Hosmer-Lemeshow (H-L) statistic). A total of 1429 patients met the inclusion criteria and were randomly split into training and test data sets. Increasing age, more proximal fracture type, compensation or private fund source for the admission, metropolitan location of residence, not working prior to injury and having a self-reported pre-injury disability were included in the final prediction model. The C-statistic for the model was 0.92 (95% confidence interval (CI) 0.88, 0.95) with an H-L statistic of χ² = 11.62, p = 0.17. For the test data set, the C-statistic was 0.86 (95% CI 0.83, 0.90) with an H-L statistic of χ² = 37.98, p < 0.001. A model to predict discharge to inpatient rehabilitation following lower limb fracture was developed with excellent discrimination, although calibration was reduced in the test data set. This model requires prospective testing but could form an integral part of decision making with regard to discharge disposition, facilitating timely and accurate referral to rehabilitation and optimising resource allocation. Copyright © 2011 Elsevier Ltd. All rights reserved.
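
    The two reported metrics, the c-statistic for discrimination and the Hosmer-Lemeshow statistic for calibration, can both be computed directly. The sketch below uses synthetic data and a common decile-based H-L implementation (the grouping scheme is an assumption).

    ```python
    import numpy as np
    from scipy import stats
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(5)
    n = 1429
    X = np.column_stack([rng.normal(65, 15, n), rng.binomial(1, 0.3, n)])  # e.g. age, fund source
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.05 * (X[:, 0] - 65) + X[:, 1] - 1))))

    model = LogisticRegression().fit(X, y)
    p_hat = model.predict_proba(X)[:, 1]
    print("c-statistic:", round(roc_auc_score(y, p_hat), 3))

    # Hosmer-Lemeshow: group by deciles of predicted risk, compare observed vs expected.
    deciles = np.quantile(p_hat, np.linspace(0, 1, 11))
    groups = np.digitize(p_hat, deciles[1:-1])   # 10 risk groups
    hl = 0.0
    for g in range(10):
        idx = groups == g
        n_g, obs, pbar = idx.sum(), y[idx].sum(), p_hat[idx].mean()
        hl += (obs - n_g * pbar) ** 2 / (n_g * pbar * (1 - pbar))
    print("H-L chi2:", round(hl, 2), "p:", round(1 - stats.chi2.cdf(hl, df=8), 3))
    ```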

  7. National Centers for Environmental Prediction

    Science.gov Websites

    NOAA Center for Weather and Climate Prediction (NCWCP), 5830 University Research Court

  8. National Centers for Environmental Prediction

    Science.gov Websites

    Environmental Modeling Center, NOAA Center for Weather and Climate Prediction (NCWCP), 5830 University Research Court

  9. Drivers and seasonal predictability of extreme wind speeds in the ECMWF System 4 and a statistical model

    NASA Astrophysics Data System (ADS)

    Walz, M. A.; Donat, M.; Leckebusch, G. C.

    2017-12-01

    As extreme wind speeds are responsible for large socio-economic losses in Europe, a skillful prediction would be of great benefit for disaster prevention as well as for the actuarial community. Here we evaluate patterns of large-scale atmospheric variability and the seasonal predictability of extreme wind speeds (e.g. >95th percentile) in the European domain in the dynamical seasonal forecast system ECMWF System 4, and compare it to the predictability based on a statistical prediction model. The dominant patterns of atmospheric variability show distinct differences between reanalysis and ECMWF System 4, with most patterns in System 4 extended downstream in comparison to ERA-Interim. The dissimilar manifestations of the patterns within the two models lead to substantially different drivers associated with the occurrence of extreme winds in the respective model. While ECMWF System 4 is shown to provide some predictive power over Scandinavia and the eastern Atlantic, only very few grid cells in the European domain show significant correlations for extreme wind speeds in System 4 compared to ERA-Interim. In contrast, a statistical model predicts extreme wind speeds during boreal winter in better agreement with the observations. Our results suggest that System 4 does not capture the potential predictability of extreme winds that exists in the real world, and therefore fails to provide reliable seasonal predictions for lead months 2-4. This is likely related to the unrealistic representation of large-scale patterns of atmospheric variability. Hence our study points to potential improvements in dynamical prediction skill through improved simulation of large-scale atmospheric dynamics.

  10. Statistical analysis for understanding and predicting battery degradations in real-life electric vehicle use

    NASA Astrophysics Data System (ADS)

    Barré, Anthony; Suard, Frédéric; Gérard, Mathias; Montaru, Maxime; Riu, Delphine

    2014-01-01

    This paper describes the statistical analysis of battery-ageing data parameters recorded during electric vehicle use. These data permit traditional battery ageing investigation based on the evolution of capacity fade and resistance rise. The measured variables are examined in order to explain the correlation between battery ageing and operating conditions during the experiments. Such a study enables the main ageing factors to be identified. Detailed statistical dependency explorations then reveal the factors responsible for battery ageing phenomena, and predictive battery ageing models are built from this approach. The results demonstrate and quantify a relationship between the measured variables and global observations of battery ageing, and also allow accurate battery ageing diagnosis through the predictive models.

  11. The construction and assessment of a statistical model for the prediction of protein assay data.

    PubMed

    Pittman, J; Sacks, J; Young, S Stanley

    2002-01-01

    The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparative methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion, a synopsis of the results of these experiments and their implications for the analysis of bioinformatics databases in general is presented.
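
    One ingredient named in the abstract, a fast approximation to the singular value decomposition, corresponds in spirit to modern randomized/truncated SVD routines. A minimal scikit-learn sketch follows, using a synthetic approximately low-rank assay matrix (not the paper's database).

    ```python
    import numpy as np
    from sklearn.utils.extmath import randomized_svd

    rng = np.random.default_rng(6)
    # Mock assay matrix: 2000 compounds x 300 assay features, approximately rank 10.
    A = rng.normal(size=(2000, 10)) @ rng.normal(size=(10, 300)) \
        + 0.1 * rng.normal(size=(2000, 300))

    # Fast approximate SVD keeping the top 10 components.
    U, s, Vt = randomized_svd(A, n_components=10, random_state=0)
    approx = (U * s) @ Vt
    rel_err = np.linalg.norm(A - approx) / np.linalg.norm(A)
    print(f"rank-10 relative reconstruction error: {rel_err:.3f}")
    ```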

  12. Analysis of model development strategies: predicting ventral hernia recurrence.

    PubMed

    Holihan, Julie L; Li, Linda T; Askenasy, Erik P; Greenberg, Jacob A; Keith, Jerrod N; Martindale, Robert G; Roth, J Scott; Liang, Mike K

    2016-11-01

    There have been many attempts to identify variables associated with ventral hernia recurrence; however, it is unclear which statistical modeling approach results in models with greatest internal and external validity. We aim to assess the predictive accuracy of models developed using five common variable selection strategies to determine variables associated with hernia recurrence. Two multicenter ventral hernia databases were used. Database 1 was randomly split into "development" and "internal validation" cohorts. Database 2 was designated "external validation". The dependent variable for model development was hernia recurrence. Five variable selection strategies were used: (1) "clinical"-variables considered clinically relevant, (2) "selective stepwise"-all variables with a P value <0.20 were assessed in a step-backward model, (3) "liberal stepwise"-all variables were included and step-backward regression was performed, (4) "restrictive internal resampling," and (5) "liberal internal resampling." Variables were included with P < 0.05 for the Restrictive model and P < 0.10 for the Liberal model. A time-to-event analysis using Cox regression was performed using these strategies. The predictive accuracy of the developed models was tested on the internal and external validation cohorts using Harrell's C-statistic where C > 0.70 was considered "reasonable". The recurrence rate was 32.9% (n = 173/526; median/range follow-up, 20/1-58 mo) for the development cohort, 36.0% (n = 95/264, median/range follow-up 20/1-61 mo) for the internal validation cohort, and 12.7% (n = 155/1224, median/range follow-up 9/1-50 mo) for the external validation cohort. Internal validation demonstrated reasonable predictive accuracy (C-statistics = 0.772, 0.760, 0.767, 0.757, 0.763), while on external validation, predictive accuracy dipped precipitously (C-statistic = 0.561, 0.557, 0.562, 0.553, 0.560). Predictive accuracy was equally adequate on internal validation among models; however, on external validation, all five models failed to demonstrate utility. Future studies should report multiple variable selection techniques and demonstrate predictive accuracy on external data sets for model validation. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. A Predictive Statistical Model of Navy Career Enlisted Retention Behavior Utilizing Economic Variables.

    DTIC Science & Technology

    1980-12-01

    …career retention rates, and to predict future career retention rates in the Navy. The statistical model utilizes economic variables as predictors… The model developed has a high correlation with Navy career retention rates. The problem of Navy career retention has not been adequately studied… findings indicate Navy policymakers must be cognizant of the relationships of economic factors to Navy career retention rates.

  14. Prediction of Patient-Controlled Analgesic Consumption: A Multimodel Regression Tree Approach.

    PubMed

    Hu, Yuh-Jyh; Ku, Tien-Hsiung; Yang, Yu-Hung; Shen, Jia-Ying

    2018-01-01

    Several factors contribute to individual variability in postoperative pain, therefore, individuals consume postoperative analgesics at different rates. Although many statistical studies have analyzed postoperative pain and analgesic consumption, most have identified only the correlation and have not subjected the statistical model to further tests in order to evaluate its predictive accuracy. In this study involving 3052 patients, a multistrategy computational approach was developed for analgesic consumption prediction. This approach uses data on patient-controlled analgesia demand behavior over time and combines clustering, classification, and regression to mitigate the limitations of current statistical models. Cross-validation results indicated that the proposed approach significantly outperforms various existing regression methods. Moreover, a comparison between the predictions by anesthesiologists and medical specialists and those of the computational approach for an independent test data set of 60 patients further evidenced the superiority of the computational approach in predicting analgesic consumption because it produced markedly lower root mean squared errors.

  15. Modeling Statistics of Fish Patchiness and Predicting Associated Influence on Statistics of Acoustic Echoes

    DTIC Science & Technology

    2013-09-30

    published 3-D multi-beam data. The Niwa and Anderson models were compared with 3-D multi-beam data collected by Paramo and Gerlotto. The data were… [submitted, refereed] Bhatia, S., T.K. Stanton, J. Paramo, and F. Gerlotto (under revision), "Modeling statistics of fish school dimensions using 3-D

  16. Modeling Statistics of Fish Patchiness and Predicting Associated Influence on Statistics of Acoustic Echoes

    DTIC Science & Technology

    2013-09-30

    data. The Niwa and Anderson models were compared with 3-D multi-beam data collected by Paramo and Gerlotto. The data were consistent with the… Bhatia, S., T.K. Stanton, J. Paramo, and F. Gerlotto (under revision), "Modeling statistics of fish school dimensions using 3-D data from a

  17. Comparison of statistical models for analyzing wheat yield time series.

    PubMed

    Michel, Lucie; Makowski, David

    2013-01-01

    The world's population is predicted to exceed nine billion by 2050 and there is increasing concern about the capability of agriculture to feed such a large population. Foresight studies on food security are frequently based on crop yield trends estimated from yield time series provided by national and regional statistical agencies. Various types of statistical models have been proposed for the analysis of yield time series, but the predictive performances of these models have not yet been evaluated in detail. In this study, we present eight statistical models for analyzing yield time series and compare their ability to predict wheat yield at the national and regional scales, using data provided by the Food and Agriculture Organization of the United Nations and by the French Ministry of Agriculture. The Holt-Winters and dynamic linear models performed equally well, giving the most accurate predictions of wheat yield. However, dynamic linear models have two advantages over Holt-Winters models: they can be used to reconstruct past yield trends retrospectively and to analyze uncertainty. The results obtained with dynamic linear models indicated a stagnation of wheat yields in many countries, but the estimated rate of increase of wheat yield remained above 0.06 t ha⁻¹ year⁻¹ in several countries in Europe, Asia, Africa and America, and the estimated values were highly uncertain for several major wheat producing countries. The rate of yield increase differed considerably between French regions, suggesting that efforts to identify the main causes of yield stagnation should focus on a subnational scale.
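
    Of the model families compared, Holt-Winters exponential smoothing is readily available in statsmodels. The sketch below fits a damped-trend variant (no seasonality, since yields are annual) to a synthetic national yield series; the data and the stagnation pattern are invented for illustration.

    ```python
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    rng = np.random.default_rng(7)
    years = pd.date_range("1961", periods=55, freq="YS")
    # Mock yield series in t/ha: an upward trend that flattens (stagnation), plus noise.
    trend = np.concatenate([np.linspace(2.0, 6.5, 40), np.full(15, 6.5)])
    yields = pd.Series(trend + rng.normal(0, 0.25, 55), index=years)

    # Additive damped trend, no seasonal component for annual data.
    fit = ExponentialSmoothing(yields, trend="add", damped_trend=True).fit()
    print(fit.forecast(5))   # 5-year-ahead yield forecast
    ```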

  18. A Statistical Weather-Driven Streamflow Model: Enabling future flow predictions in data-scarce headwater streams

    NASA Astrophysics Data System (ADS)

    Rosner, A.; Letcher, B. H.; Vogel, R. M.

    2014-12-01

    Predicting streamflow in headwaters and over a broad spatial scale poses unique challenges due to limited data availability. Flow observation gages for headwater streams are less common than for larger rivers, and gages with record lengths of ten years or more are even more scarce. Thus, there is a great need for estimating streamflows in ungaged or sparsely gaged headwaters. Further, there is often insufficient basin information to develop rainfall-runoff models that could be used to predict future flows under various climate scenarios. Headwaters in the northeastern U.S. are of particular concern to aquatic biologists, as these streams serve as essential habitat for native coldwater fish. In order to understand fish response to past or future environmental drivers, estimates of seasonal streamflow are needed. While flow data are limited, there is a wealth of data on historic weather conditions. Observed data have been modeled to interpolate a spatially continuous historic weather dataset (Maurer et al. 2002). We present a statistical model developed by pairing streamflow observations with precipitation and temperature information for the same and preceding time-steps. We demonstrate this model's use to predict flow metrics at the seasonal time-step. While not a physical model, this statistical model represents the weather drivers. Since this model can predict flows not directly tied to reference gages, we can generate flow estimates for historic as well as potential future conditions.

  19. Prediction of crime occurrence from multi-modal data using deep learning

    PubMed Central

    Kang, Hyeon-Woo

    2017-01-01

    In recent years, various studies have been conducted on the prediction of crime occurrences. This predictive capability is intended to assist in crime prevention by facilitating effective implementation of police patrols. Previous studies have used data from multiple domains such as demographics, economics, and education. Their prediction models treat data from different domains equally. These methods have problems in crime occurrence prediction, such as difficulty in discovering highly nonlinear relationships, redundancies, and dependencies between multiple datasets. In order to enhance crime prediction models, we consider environmental context information, such as broken windows theory and crime prevention through environmental design. In this paper, we propose a feature-level data fusion method with environmental context based on a deep neural network (DNN). Our dataset consists of data collected from various online databases of crime statistics, demographic and meteorological data, and images in Chicago, Illinois. Prior to generating training data, we select crime-related data by conducting statistical analyses. Finally, we train our DNN, which consists of the following four kinds of layers: spatial, temporal, environmental context, and joint feature representation layers. Coupled with crucial data extracted from various domains, our fusion DNN is a product of an efficient decision-making process that statistically analyzes data redundancy. Experimental performance results show that our DNN model is more accurate in predicting crime occurrence than other prediction models. PMID:28437486

  20. Prediction of crime occurrence from multi-modal data using deep learning.

    PubMed

    Kang, Hyeon-Woo; Kang, Hang-Bong

    2017-01-01

    In recent years, various studies have been conducted on the prediction of crime occurrences. This predictive capability is intended to assist in crime prevention by facilitating effective implementation of police patrols. Previous studies have used data from multiple domains such as demographics, economics, and education. Their prediction models treat data from different domains equally. These methods have problems in crime occurrence prediction, such as difficulty in discovering highly nonlinear relationships, redundancies, and dependencies between multiple datasets. In order to enhance crime prediction models, we consider environmental context information, such as broken windows theory and crime prevention through environmental design. In this paper, we propose a feature-level data fusion method with environmental context based on a deep neural network (DNN). Our dataset consists of data collected from various online databases of crime statistics, demographic and meteorological data, and images in Chicago, Illinois. Prior to generating training data, we select crime-related data by conducting statistical analyses. Finally, we train our DNN, which consists of the following four kinds of layers: spatial, temporal, environmental context, and joint feature representation layers. Coupled with crucial data extracted from various domains, our fusion DNN is a product of an efficient decision-making process that statistically analyzes data redundancy. Experimental performance results show that our DNN model is more accurate in predicting crime occurrence than other prediction models.
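
    The four layer types described (spatial, temporal, environmental context, and joint feature representation) suggest a multi-branch network with feature-level fusion. The PyTorch sketch below is one plausible reading of that architecture, with invented input dimensions; it is not the authors' implementation.

    ```python
    import torch
    import torch.nn as nn

    class FusionDNN(nn.Module):
        """Feature-level fusion: one branch per domain, concatenated into a joint layer."""
        def __init__(self, d_spatial=32, d_temporal=16, d_context=64, d_joint=128):
            super().__init__()
            self.spatial = nn.Sequential(nn.Linear(d_spatial, 64), nn.ReLU())
            self.temporal = nn.Sequential(nn.Linear(d_temporal, 64), nn.ReLU())
            self.context = nn.Sequential(nn.Linear(d_context, 64), nn.ReLU())
            self.joint = nn.Sequential(
                nn.Linear(3 * 64, d_joint), nn.ReLU(),
                nn.Linear(d_joint, 2),          # crime occurrence: yes / no
            )

        def forward(self, x_spatial, x_temporal, x_context):
            h = torch.cat([
                self.spatial(x_spatial),
                self.temporal(x_temporal),
                self.context(x_context),
            ], dim=-1)
            return self.joint(h)

    model = FusionDNN()
    logits = model(torch.randn(8, 32), torch.randn(8, 16), torch.randn(8, 64))
    print(logits.shape)   # torch.Size([8, 2])
    ```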

  1. Comparing multiple statistical methods for inverse prediction in nuclear forensics applications

    DOE PAGES

    Lewis, John R.; Zhang, Adah; Anderson-Cook, Christine Michaela

    2017-10-29

    Forensic science seeks to predict source characteristics using measured observables. Statistically, this objective can be thought of as an inverse problem where interest is in the unknown source characteristics or factors (X) of some underlying causal model producing the observables or responses (Y = g(X) + error). This paper reviews several statistical methods for use in inverse problems and demonstrates that comparing results from multiple methods can be used to assess predictive capability. Motivation for assessing inverse predictions comes from the desired application to historical and future experiments involving nuclear material production for forensics research, in which inverse predictions, along with an assessment of predictive capability, are desired.

  2. Comparing multiple statistical methods for inverse prediction in nuclear forensics applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lewis, John R.; Zhang, Adah; Anderson-Cook, Christine Michaela

    Forensic science seeks to predict source characteristics using measured observables. Statistically, this objective can be thought of as an inverse problem where interest is in the unknown source characteristics or factors (X) of some underlying causal model producing the observables or responses (Y = g(X) + error). This paper reviews several statistical methods for use in inverse problems and demonstrates that comparing results from multiple methods can be used to assess predictive capability. Motivation for assessing inverse predictions comes from the desired application to historical and future experiments involving nuclear material production for forensics research, in which inverse predictions, along with an assessment of predictive capability, are desired.
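
    A minimal version of the inverse-prediction setup: fit a forward model g on calibration data, then estimate the unknown X for a new observed Y by minimizing the squared discrepancy. The sketch below uses a one-factor synthetic example with an assumed square-root forward model, not the paper's data or methods.

    ```python
    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(8)
    # Calibration experiments: known source factor x, measured observable y.
    x_cal = np.linspace(0.0, 10.0, 30)
    y_cal = 2.0 + 1.5 * np.sqrt(x_cal) + rng.normal(0, 0.1, 30)

    # Forward model g(x) = b0 + b1*sqrt(x), fitted by least squares.
    B = np.column_stack([np.ones_like(x_cal), np.sqrt(x_cal)])
    b0, b1 = np.linalg.lstsq(B, y_cal, rcond=None)[0]

    def g(x):
        return b0 + b1 * np.sqrt(x)

    # Inverse prediction: which x best explains a newly observed y?
    y_new = 6.0
    res = minimize_scalar(lambda x: (g(x) - y_new) ** 2, bounds=(0, 10), method="bounded")
    print(f"estimated source factor: {res.x:.2f}")
    ```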

  3. Genetic Programming as Alternative for Predicting Development Effort of Individual Software Projects

    PubMed Central

    Chavoya, Arturo; Lopez-Martin, Cuauhtemoc; Andalon-Garcia, Irma R.; Meda-Campaña, M. E.

    2012-01-01

    Statistical and genetic programming techniques have been used to predict the software development effort of large software projects. In this paper, a genetic programming model was used for predicting the effort required in individually developed projects. Accuracy obtained from a genetic programming model was compared against one generated from the application of a statistical regression model. A sample of 219 projects developed by 71 practitioners was used for generating the two models, whereas another sample of 130 projects developed by 38 practitioners was used for validating them. The models used two kinds of lines of code as well as programming language experience as independent variables. Accuracy results from the model obtained with genetic programming suggest that it could be used to predict the software development effort of individual projects when these projects have been developed in a disciplined manner within a development-controlled environment. PMID:23226305

  4. The Prediction of Noise Due to Jet Turbulence Convecting Past Flight Vehicle Trailing Edges

    NASA Technical Reports Server (NTRS)

    Miller, Steven A. E.

    2014-01-01

    High intensity acoustic radiation occurs when turbulence convects past airframe trailing edges. A mathematical model is developed to predict this acoustic radiation. The model is dependent on the local flow and turbulent statistics above the trailing edge of the flight vehicle airframe. These quantities are dependent on the jet and flight vehicle Mach numbers and jet temperature. A term in the model approximates the turbulent statistics of single-stream heated jet flows and is developed based upon measurement. The developed model is valid for a wide range of jet Mach numbers, jet temperature ratios, and flight vehicle Mach numbers. The model predicts traditional trailing edge noise if the jet is not interacting with the airframe. Predictions of mean-flow quantities and the cross-spectrum of static pressure near the airframe trailing edge are compared with measurement. Finally, predictions of acoustic intensity are compared with measurement and the model is shown to accurately capture the phenomenon.

  5. Predicting Statistical Response and Extreme Events in Uncertainty Quantification through Reduced-Order Models

    NASA Astrophysics Data System (ADS)

    Qi, D.; Majda, A.

    2017-12-01

    A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty in statistical sensitivity and intermittency in the principal model directions with the largest variability in high-dimensional turbulent systems and turbulent transport models. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, where information theory and linear statistical response theory are combined in a systematic fashion to achieve optimal model performance. The reduced-order method derives from a self-consistent mathematical framework for general systems with quadratic nonlinearity, in which crucial high-order statistics are approximated by a systematic model calibration procedure. Model efficiency is improved through additional damping and noise corrections that replace the expensive energy-conserving nonlinear interactions. Model errors due to the imperfect nonlinear approximation are corrected by tuning the model parameters using linear response theory with an information metric in a training phase before prediction. A statistical energy principle is adopted to introduce a global scaling factor in characterizing the higher-order moments in a consistent way to improve model sensitivity. Stringent models of barotropic and baroclinic turbulence are used to demonstrate the feasibility of the reduced-order methods. Principal statistical responses in mean and variance can be captured by the reduced-order models with accuracy and efficiency. The reduced-order models are also used to capture the crucial passive tracer field advected by the baroclinic turbulent flow. It is demonstrated that crucial statistical quantities, such as the tracer spectrum and the fat tails in the tracer probability density functions at the most important large scales, can be captured efficiently and accurately using the reduced-order tracer model in various dynamical regimes of the flow field with distinct statistical structures.

  6. Improving the Validity of Activity of Daily Living Dependency Risk Assessment

    PubMed Central

    Clark, Daniel O.; Stump, Timothy E.; Tu, Wanzhu; Miller, Douglas K.

    2015-01-01

    Objectives Efforts to prevent activity of daily living (ADL) dependency may be improved through models that assess older adults’ dependency risk. We evaluated whether cognition and gait speed measures improve the predictive validity of interview-based models. Method Participants were 8,095 self-respondents in the 2006 Health and Retirement Survey who were aged 65 years or over and independent in five ADLs. Incident ADL dependency was determined from the 2008 interview. Models were developed using random 2/3rd cohorts and validated in the remaining 1/3rd. Results Compared to a c-statistic of 0.79 in the best interview model, the model including cognitive measures had c-statistics of 0.82 and 0.80 while the best fitting gait speed model had c-statistics of 0.83 and 0.79 in the development and validation cohorts, respectively. Conclusion Two relatively brief models, one that requires an in-person assessment and one that does not, had excellent validity for predicting incident ADL dependency but did not significantly improve the predictive validity of the best fitting interview-based models. PMID:24652867

  7. Statistical Methods for Rapid Aerothermal Analysis and Design Technology: Validation

    NASA Technical Reports Server (NTRS)

    DePriest, Douglas; Morgan, Carolyn

    2003-01-01

    The cost and safety goals for NASA's next generation of reusable launch vehicle (RLV) will require that rapid high-fidelity aerothermodynamic design tools be used early in the design cycle. To meet these requirements, it is desirable to identify adequate statistical models that quantify and improve the accuracy, extend the applicability, and enable combined analyses using existing prediction tools. The initial research work focused on establishing suitable candidate models for these purposes. The second phase focused on assessing the performance of these models in accurately predicting the heat rate for a given candidate data set. This validation work compared models and methods that may be useful in predicting the heat rate.

  8. Statistical Analysis of CFD Solutions from the Fourth AIAA Drag Prediction Workshop

    NASA Technical Reports Server (NTRS)

    Morrison, Joseph H.

    2010-01-01

    A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from the U.S., Europe, Asia, and Russia using a variety of grid systems and turbulence models for the June 2009 4th Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was a new subsonic transport model, the Common Research Model, designed using a modern approach for the wing and included a horizontal tail. The fourth workshop focused on the prediction of both absolute and incremental drag levels for wing-body and wing-body-horizontal tail configurations. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with earlier workshops using the statistical framework.

  9. Statistical and dynamical forecast of regional precipitation after mature phase of ENSO

    NASA Astrophysics Data System (ADS)

    Sohn, S.; Min, Y.; Lee, J.; Tam, C.; Ahn, J.

    2010-12-01

    While the seasonal predictability of general circulation models (GCMs) has improved, the modeled mid-latitude atmosphere still does not respond correctly to external forcing such as tropical sea surface temperature (SST), particularly over the East Asian and western North Pacific summer monsoon regions. In addition, the time-scale of the prediction scope is considerably limited, and model forecast skill is still very poor beyond two weeks. Although recent studies indicate that coupled-model-based multi-model ensemble (MME) forecasts perform better, long-lead forecasts exceeding 9 months still show a dramatic decrease in seasonal predictability. This study aims at diagnosing dynamical MME forecasts composed of state-of-the-art one-tier models, as well as comparing them with statistical model forecasts, focusing on East Asian summer precipitation predictions after the mature phase of ENSO. The lagged impact of El Nino, as a major climate contributor, on the summer monsoon in model environments is also evaluated in the sense of conditional probabilities. To evaluate the probability forecast skills, the reliability (attributes) diagram and the relative operating characteristics, following the recommendations of the World Meteorological Organization (WMO) Standardized Verification System for Long-Range Forecasts, are used in this study. The results should shed light on the prediction skill of the dynamical models, and also of the statistical model, in forecasting East Asian summer monsoon rainfall at long lead times.

  10. QSAR study of curcumine derivatives as HIV-1 integrase inhibitors.

    PubMed

    Gupta, Pawan; Sharma, Anju; Garg, Prabha; Roy, Nilanjan

    2013-03-01

    A QSAR study was performed on curcumine derivatives as HIV-1 integrase inhibitors using multiple linear regression. A statistically significant model was developed with a squared correlation coefficient (r²) of 0.891 and a cross-validated r² (r²cv) of 0.825. The developed model revealed that electronic properties, shape, size, geometry, substitution information and hydrophilicity were important atomic properties for determining the inhibitory activity of these molecules. The model was also tested successfully for external validation (r²pred = 0.849), as well as with Tropsha's test for model predictability. Furthermore, a domain analysis was carried out to evaluate the prediction reliability for external-set molecules. The model was statistically robust and had good predictive power, and can be utilized for screening of new molecules.
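
    The two headline statistics, r² on the training set and cross-validated r², can be computed as below. Descriptor and activity values are simulated, and leave-one-out cross-validation is one common scheme for the cross-validated statistic (often written q²).

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_predict

    rng = np.random.default_rng(9)
    X = rng.normal(size=(30, 4))                                    # mock descriptors, 30 analogues
    y = 1.2 * X[:, 0] - 0.7 * X[:, 2] + 0.2 * rng.normal(size=30)   # mock pIC50 values

    model = LinearRegression().fit(X, y)
    r2 = model.score(X, y)

    # Leave-one-out cross-validated r^2 (q^2).
    y_loo = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
    q2 = 1 - np.sum((y - y_loo) ** 2) / np.sum((y - y.mean()) ** 2)
    print(f"r2 = {r2:.3f}, q2 = {q2:.3f}")
    ```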

  11. Systematic review of prediction models for delirium in the older adult inpatient.

    PubMed

    Lindroth, Heidi; Bratzke, Lisa; Purvis, Suzanne; Brown, Roger; Coburn, Mark; Mrkobrada, Marko; Chan, Matthew T V; Davis, Daniel H J; Pandharipande, Pratik; Carlsson, Cynthia M; Sanders, Robert D

    2018-04-28

    To identify existing prognostic delirium prediction models and evaluate their validity and statistical methodology in the older adult (≥60 years) acute hospital population. Systematic review. PubMed, CINAHL, PsychINFO, SocINFO, Cochrane, Web of Science and Embase were searched from 1 January 1990 to 31 December 2016. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses and the CHARMS Statement guided protocol development. Inclusion criteria: age >60 years, inpatient, developed/validated a prognostic delirium prediction model. Exclusion criteria: alcohol-related delirium, sample size ≤50. The primary performance measures were calibration and discrimination statistics. Two authors independently conducted the search and extracted data. The synthesis of data was done by the first author; disagreement was resolved by the mentoring author. The initial search resulted in 7,502 studies. Following full-text review of 192 studies, 33 were excluded based on age criteria (<60 years) and 27 met the defined criteria. Twenty-three delirium prediction models were identified; 14 were externally validated and 3 were internally validated. The following populations were represented: 11 medical, 3 medical/surgical and 13 surgical. The assessment of delirium was often non-systematic, resulting in varied incidence. The externally validated models had areas under the receiver operating curve ranging from 0.52 to 0.94. Limitations in design, data collection methods and model metric reporting statistics were identified. Delirium prediction models for older adults show variable and typically inadequate predictive capabilities. Our review highlights the need for development of robust models to predict delirium in older inpatients, and we provide recommendations for the development of such models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  12. Improving the long-lead predictability of El Niño using a novel forecasting scheme based on a dynamic components model

    NASA Astrophysics Data System (ADS)

    Petrova, Desislava; Koopman, Siem Jan; Ballester, Joan; Rodó, Xavier

    2017-02-01

    El Niño (EN) is a dominant feature of climate variability on inter-annual time scales, driving changes in the climate throughout the globe and having widespread natural and socio-economic consequences. Its forecast is therefore an important task, and predictions are issued on a regular basis by a wide array of prediction schemes and climate centres around the world. This study explores a novel method for EN forecasting. The advantageous statistical technique of unobserved components time series modeling, also known as structural time series modeling, has not previously been applied to this problem. We have therefore developed such a model, in which the statistical analysis, including parameter estimation and forecasting, is based on state space methods and includes the celebrated Kalman filter. The distinguishing feature of this dynamic model is the decomposition of a time series into a range of stochastically time-varying components such as level (or trend), seasonal, cycles of different frequencies, irregular, and regression effects incorporated as explanatory covariates. These components are modeled separately and ultimately combined in a single forecasting scheme. Customary statistical models for EN prediction essentially use SST and wind stress in the equatorial Pacific. In addition to these, we introduce a new domain of regression variables accounting for the state of the subsurface ocean temperature in the western and central equatorial Pacific, motivated by our analysis, as well as by recent and classical research showing that subsurface processes and heat accumulation there are fundamental for the genesis of EN. An important feature of the scheme is that different regression predictors are used at different lead months, thus capturing the dynamical evolution of the system and rendering more efficient forecasts. The new model has been tested with the prediction of all warm events that occurred in the period 1996-2015. Retrospective forecasts of these events were made for long lead times of at least two and a half years. Hence, the present study demonstrates that the theoretical limit of ENSO prediction should be sought much further ahead than the commonly accepted "Spring Barrier". The high correspondence between the forecasts and observations indicates that the proposed model outperforms all current operational statistical models and behaves comparably to the best dynamical models used for EN prediction. Thus, the novel way in which the modeling scheme has been structured could also be used to improve other statistical and dynamical modeling systems.
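
    statsmodels implements this class of structural time series models with Kalman-filter-based estimation. The sketch below decomposes a synthetic SST-like series into trend and cycle components with an exogenous subsurface-temperature regressor; all data and variable names are invented for illustration, not taken from the paper.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(10)
    n = 240  # 20 years of monthly data
    t = np.arange(n)
    subsurface = np.sin(2 * np.pi * t / 48) + 0.3 * rng.normal(size=n)  # mock predictor
    sst = 0.8 * subsurface + 0.01 * t + 0.2 * rng.normal(size=n)        # mock SST anomaly

    model = sm.tsa.UnobservedComponents(
        sst,
        level="local linear trend",        # stochastic level and slope
        cycle=True, stochastic_cycle=True, # a stochastic cycle component
        exog=subsurface,                   # regression effect of subsurface heat content
    )
    fit = model.fit(disp=False)
    print(fit.summary().tables[1])         # estimated variances and regression coefficient

    # Forecasting requires future values of the exogenous regressor.
    forecast = fit.get_forecast(steps=12, exog=np.zeros((12, 1)))
    print(forecast.predicted_mean[:3])
    ```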

  13. Scale Dependence of Statistics of Spatially Averaged Rain Rate Seen in TOGA COARE Comparison with Predictions from a Stochastic Model

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Bell, T. L.; Lau, William K. M. (Technical Monitor)

    2002-01-01

    A characteristic feature of rainfall statistics is that they generally depend on the space and time scales over which rain data are averaged. As part of an earlier effort to determine the sampling error of satellite rain averages, a space-time model of rainfall statistics was developed to describe the statistics of gridded rain observed in GATE. The model allows one to compute the second-moment statistics of space- and time-averaged rain rate, which can be fitted to satellite or rain gauge data to determine the four model parameters appearing in the precipitation spectrum: an overall strength parameter, a characteristic length separating the long- and short-wavelength regimes, a characteristic relaxation time for the decay of the autocorrelation of the instantaneous local rain rate, and a 'fractal' power-law exponent. For area-averaged instantaneous rain rate, this exponent governs the power-law dependence of these statistics on the averaging length scale L predicted by the model in the limit of small L. In particular, the variance of rain rate averaged over an L × L area exhibits a power-law singularity as L → 0. In the present work the model is used to investigate how the statistics of area-averaged rain rate over the tropical western Pacific, measured with shipborne radar during TOGA COARE (Tropical Ocean Global Atmosphere Coupled Ocean-Atmosphere Response Experiment) and gridded on a 2 km grid, depend on the size of the spatial averaging scale. Good agreement is found between the data and predictions from the model over a wide range of averaging length scales.

  14. Prediction of biomechanical parameters of the proximal femur using statistical appearance models and support vector regression.

    PubMed

    Fritscher, Karl; Schuler, Benedikt; Link, Thomas; Eckstein, Felix; Suhm, Norbert; Hänni, Markus; Hengg, Clemens; Schubert, Rainer

    2008-01-01

    Fractures of the proximal femur are one of the principal causes of mortality among elderly persons. Traditional methods for the determination of femoral fracture risk measure bone mineral density (BMD). However, BMD alone is not sufficient to predict bone failure load for an individual patient, and additional parameters have to be determined for this purpose. In this work, an approach is presented that uses statistical models of appearance to identify relevant regions and parameters for the prediction of biomechanical properties of the proximal femur. Using support vector regression, the proposed model-based approach is capable of predicting two different biomechanical parameters accurately and fully automatically in two different testing scenarios.
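
    In outline, the regression stage maps appearance-model coefficients to a biomechanical target such as failure load. The scikit-learn sketch below uses simulated coefficients standing in for the first PCA modes of a statistical appearance model; the data and dimensions are illustrative only.

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR

    rng = np.random.default_rng(11)
    n = 100
    # Mock inputs: first 10 appearance-model coefficients per femur scan.
    coeffs = rng.normal(size=(n, 10))
    failure_load = 4000 + 600 * coeffs[:, 0] - 300 * coeffs[:, 1] \
        + 150 * rng.normal(size=n)   # mock failure load in N

    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=10.0))
    scores = cross_val_score(model, coeffs, failure_load, cv=5, scoring="r2")
    print("cross-validated R^2 per fold:", scores.round(2))
    ```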

  15. The prediction of epidemics through mathematical modeling.

    PubMed

    Schaus, Catherine

    2014-01-01

    Mathematical models may be used in an endeavor to predict the development of epidemics, and the SIR model is one such application. Still too approximate, these statistical approaches await more data in order to come closer to reality.
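
    For reference, the SIR model mentioned here consists of three coupled ODEs: dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI. A minimal numerical integration with scipy follows, with purely illustrative parameter values.

    ```python
    import numpy as np
    from scipy.integrate import solve_ivp

    beta, gamma, N = 0.3, 0.1, 1_000_000   # transmission rate, recovery rate, population

    def sir(t, y):
        S, I, R = y
        dS = -beta * S * I / N
        dI = beta * S * I / N - gamma * I
        dR = gamma * I
        return [dS, dI, dR]

    sol = solve_ivp(sir, t_span=(0, 365), y0=[N - 10, 10, 0],
                    t_eval=np.linspace(0, 365, 366))
    peak_day = int(sol.t[np.argmax(sol.y[1])])
    print(f"epidemic peak around day {peak_day}, with {sol.y[1].max():.0f} infectious")
    ```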

  16. An emission-weighted proximity model for air pollution exposure assessment.

    PubMed

    Zou, Bin; Wilson, J Gaines; Zhan, F Benjamin; Zeng, Yongnian

    2009-08-15

    Among the most common spatial models for estimating personal exposure are Traditional Proximity Models (TPMs). Though TPMs are straightforward to configure and interpret, they are prone to extensive errors in exposure estimates and do not provide prospective estimates. To resolve these inherent problems, we introduce a novel Emission-Weighted Proximity Model (EWPM), which takes into consideration the emissions from all sources potentially influencing the receptors. EWPM performance was evaluated by comparing the normalized exposure risk values of sulfur dioxide (SO2) calculated by EWPM with those calculated by TPM and with monitored observations over a one-year period in two large Texas counties. In order to investigate whether the limitations of TPM in predicting potential exposure risk without recorded incidence can be overcome, we also introduce a hybrid framework, a 'Geo-statistical EWPM': a synthesis of ordinary kriging geo-statistical interpolation and EWPM. The prediction results are presented as two potential exposure risk prediction maps. The performance of these two exposure maps in predicting individual SO2 exposure risk was validated with 10 virtual cases in prospective exposure scenarios. Risk values from EWPM agreed with the observed concentrations clearly better than those from TPM. Over the entire study area, the mean SO2 exposure risk from EWPM was higher relative to TPM (1.00 vs. 0.91). The mean bias of the exposure risk values of the 10 virtual cases between EWPM and 'Geo-statistical EWPM' was much smaller than that between TPM and 'Geo-statistical TPM' (5.12 vs. 24.63). EWPM appears to portray individual exposure more accurately than TPM. The 'Geo-statistical EWPM' effectively augments the role of the standard proximity model and makes it possible to predict individual risk in future exposure scenarios resulting in adverse health effects from environmental pollution.
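
    The core idea, weighting each source's contribution to a receptor by its emission rate and discounting by distance, can be sketched in a few lines. The inverse-distance weighting below is an assumed functional form chosen for illustration, not necessarily the paper's exact formulation, and all coordinates and emission rates are mock values.

    ```python
    import numpy as np

    rng = np.random.default_rng(12)
    sources = rng.uniform(0, 50, size=(20, 2))      # source coordinates, km
    emissions = rng.uniform(10, 1000, size=20)      # SO2 emission rates (mock units)
    receptors = rng.uniform(0, 50, size=(5, 2))     # receptor locations

    def ewpm_risk(receptor, sources, emissions, min_dist=0.5):
        """Exposure risk as an emission-weighted inverse-distance sum over all sources."""
        d = np.linalg.norm(sources - receptor, axis=1).clip(min=min_dist)
        return np.sum(emissions / d)

    risks = np.array([ewpm_risk(r, sources, emissions) for r in receptors])
    print(risks / risks.max())    # normalized exposure risk per receptor
    ```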

  17. VIIRS satellite and ground pm2.5 monitoring data

    EPA Pesticide Factsheets

    Contains all satellite, PM2.5, and meteorological data used in a statistical modeling effort to improve prediction of PM2.5. This dataset is associated with the following publication: Schliep, E., A. Gelfand, and D. Holland. Autoregressive Spatially-Varying Coefficient Models for Predicting Daily PM2.5 Using VIIRS Satellite AOT. Advances in Statistical Climatology, Meteorology and Oceanography. Copernicus Publications, Katlenburg-Lindau, Germany, 1(0): 59-74, (2015).

  18. Risk prediction models of breast cancer: a systematic review of model performances.

    PubMed

    Anothaisintawee, Thunyarat; Teerawattananon, Yot; Wiratkapun, Chollathip; Kasamesup, Vijj; Thakkinstian, Ammarin

    2012-05-01

    Risk prediction models for estimating an individual woman's risk of breast cancer have been developed in increasing numbers, but the performance of these models is questionable. We therefore conducted a study to systematically review previous risk prediction models. The results from this review help to identify the most reliable model and indicate the strengths and weaknesses of each model for guiding future model development. We searched MEDLINE (PubMed) from 1949 and EMBASE (Ovid) from 1974 until October 2010. Observational studies that constructed models using regression methods were selected. Information about model development and performance was extracted. Twenty-five out of 453 studies were eligible. Of these, 18 developed prediction models and 7 validated existing prediction models. Up to 13 variables were included in the models, and sample sizes for each study ranged from 550 to 2,404,636. Internal validation was performed for four models, while five models had external validation. The Gail model and the Rosner and Colditz model were the notable models subsequently modified by other scholars. Calibration performance of most models was fair to good (expected/observed ratio: 0.87-1.12), but discriminatory accuracy was poor to fair both in internal validation (concordance statistic: 0.53-0.66) and in external validation (concordance statistic: 0.56-0.63). Most models yielded relatively poor discrimination in both internal and external validation. This poor discriminatory accuracy of existing models might result from a lack of knowledge about risk factors, heterogeneous subtypes of breast cancer, and different distributions of risk factors across populations. In addition, the concordance statistic itself is insensitive as a measure of improvement in discrimination. Therefore, newer methods such as the net reclassification index should be considered when evaluating the performance improvement of a newly developed model.

  19. Cross-validation of Peak Oxygen Consumption Prediction Models From OMNI Perceived Exertion.

    PubMed

    Mays, R J; Goss, F L; Nagle, E F; Gallagher, M; Haile, L; Schafer, M A; Kim, K H; Robertson, R J

    2016-09-01

    This study cross-validated statistical models for prediction of peak oxygen consumption using ratings of perceived exertion from the Adult OMNI Cycle Scale of Perceived Exertion. 74 participants (men: n=36; women: n=38) completed a graded cycle exercise test. Ratings of perceived exertion for the overall body, legs, and chest/breathing were recorded at each test stage and entered into previously developed 3-stage peak oxygen consumption prediction models. There were no significant differences (p>0.05) between measured and predicted peak oxygen consumption from ratings of perceived exertion for the overall body, legs, and chest/breathing within men (mean±standard deviation: 3.16±0.52 vs. 2.92±0.33 vs. 2.90±0.29 vs. 2.90±0.26 L·min⁻¹) and women (2.17±0.29 vs. 2.02±0.22 vs. 2.03±0.19 vs. 2.01±0.19 L·min⁻¹). Previously developed statistical models for prediction of peak oxygen consumption based on subpeak OMNI ratings of perceived exertion responses were similar to measured peak oxygen consumption in a separate group of participants. These findings support the practical use of the original statistical models in standard health-fitness settings. © Georg Thieme Verlag KG Stuttgart · New York.

  20. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    NASA Astrophysics Data System (ADS)

    Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

    2017-03-01

    This paper presents statistical analyses of rock engineering properties and the measured penetration rate of a tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally, a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict TBM performance. The R-squared value (R2) of the fuzzy logic model is the highest at 0.714, compared with 0.667 for the runner-up, the multiple-variable nonlinear regression model.
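
    A minimal sketch of the regression comparison (the fuzzy logic model is not reproduced here): fit linear and quadratic models to synthetic stand-ins for the rock properties and compare R-squared values.

```python
# Sketch comparing a linear and a simple nonlinear regression of ROP on rock
# properties, reporting R^2 as in the paper; the data are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.uniform(size=(60, 5))      # UCS, BTS, BI, DPW, alpha (scaled, synthetic)
rop = 3.0 - 1.5 * X[:, 0] + 0.8 * X[:, 3] ** 2 + rng.normal(scale=0.2, size=60)

linear = LinearRegression().fit(X, rop)
nonlin = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, rop)
print("linear R^2:    %.3f" % linear.score(X, rop))
print("quadratic R^2: %.3f" % nonlin.score(X, rop))
```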

  1. Comparing statistical and machine learning classifiers: alternatives for predictive modeling in human factors research.

    PubMed

    Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann

    2003-01-01

    Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
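
    As a hedged illustration of the comparison, the sketch below pits decision tree induction against logistic regression on a synthetic pass/fail outcome; genetic programming is omitted, and the data are not the study's.

```python
# Sketch contrasting decision-tree induction with logistic regression on a
# binary pass/fail outcome, mirroring the study's comparison (synthetic data).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))                                  # curriculum scores (synthetic)
y = ((X[:, 0] > 0) & (X[:, 2] + X[:, 4] > -0.5)).astype(int)   # pass/fail with an interaction

for name, clf in [("logistic", LogisticRegression(max_iter=1000)),
                  ("tree", DecisionTreeClassifier(max_depth=3, random_state=0))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```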

  2. Statistical Analysis of CFD Solutions From the Fifth AIAA Drag Prediction Workshop

    NASA Technical Reports Server (NTRS)

    Morrison, Joseph H.

    2013-01-01

    A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from North America, Europe, Asia, and South America using a common grid sequence and multiple turbulence models for the June 2012 fifth Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was the Common Research Model subsonic transport wing-body previously used for the 4th Drag Prediction Workshop. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with previous workshops.

  3. Season-ahead water quality forecasts for the Schuylkill River, Pennsylvania

    NASA Astrophysics Data System (ADS)

    Block, P. J.; Leung, K.

    2013-12-01

    Anticipating and preparing for elevated water quality parameter levels in critical water sources, using weather forecasts, is not uncommon. In this study, we explore the feasibility of extending this prediction scale to a season-ahead for the Schuylkill River in Philadelphia, utilizing both statistical and dynamical prediction models, to characterize the season. This advance information has relevance for recreational activities, ecosystem health, and water treatment, as the Schuylkill provides 40% of Philadelphia's water supply. The statistical model associates large-scale climate drivers with streamflow and water quality parameter levels; numerous variables from NOAA's CFSv2 model are evaluated for the dynamical approach. A multi-model combination is also assessed. Results indicate moderately skillful prediction of average summertime total coliform and wintertime turbidity, using season-ahead oceanic and atmospheric variables, predominantly from the North Atlantic Ocean. Models predicting the number of elevated turbidity events across the wintertime season are also explored.

  4. Feature maps driven no-reference image quality prediction of authentically distorted images

    NASA Astrophysics Data System (ADS)

    Ghadiyaram, Deepti; Bovik, Alan C.

    2015-03-01

    Current blind image quality prediction models rely on benchmark databases comprised of singly and synthetically distorted images, thereby learning image features that are only adequate to predict human perceived visual quality on such inauthentic distortions. However, real world images often contain complex mixtures of multiple distortions. Rather than a) discounting the effect of these mixtures of distortions on an image's perceptual quality and considering only the dominant distortion or b) using features that are only proven to be efficient for singly distorted images, we deeply study the natural scene statistics of authentically distorted images, in different color spaces and transform domains. We propose a feature-maps-driven statistical approach which avoids any latent assumptions about the type of distortion(s) contained in an image, and focuses instead on modeling the remarkable consistencies in the scene statistics of real world images in the absence of distortions. We design a deep belief network that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities, which are later used to learn a regressor for quality prediction. We demonstrate the remarkable competence of our features for improving automatic perceptual quality prediction on a benchmark database and on the newly designed LIVE Authentic Image Quality Challenge Database and show that our approach of combining robust statistical features and the deep belief network dramatically outperforms the state-of-the-art.

  5. In silico environmental chemical science: properties and processes from statistical and computational modelling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tratnyek, Paul G.; Bylaska, Eric J.; Weber, Eric J.

    2017-01-01

    Quantitative structure–activity relationships (QSARs) have long been used in the environmental sciences. More recently, molecular modeling and chemoinformatic methods have become widespread. These methods have the potential to expand and accelerate advances in environmental chemistry because they complement observational and experimental data with “in silico” results and analysis. The opportunities and challenges that arise at the intersection between statistical and theoretical in silico methods are most apparent in the context of properties that determine the environmental fate and effects of chemical contaminants (degradation rate constants, partition coefficients, toxicities, etc.). The main example of this is the calibration of QSARs using descriptor variable data calculated from molecular modeling, which can make QSARs more useful for predicting property data that are unavailable, but also can make them more powerful tools for diagnosis of fate determining pathways and mechanisms. Emerging opportunities for “in silico environmental chemical science” are to move beyond the calculation of specific chemical properties using statistical models and toward more fully in silico models, prediction of transformation pathways and products, incorporation of environmental factors into model predictions, integration of databases and predictive models into more comprehensive and efficient tools for exposure assessment, and extending the applicability of all the above from chemicals to biologicals and materials.

  6. RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells.

    PubMed

    Kaspi, Omer; Yosipof, Abraham; Senderowitz, Hanoch

    2017-06-06

    An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a "one stop shop" algorithm for developing and validating QSAR models, performing outlier removal, descriptor selection, model development, and predictions for test set samples using an applicability domain. For "future" predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries, RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cell libraries, highlighting interesting dependencies of PV properties on MO compositions.
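
    A minimal sketch of RANSAC-style regression using scikit-learn's RANSACRegressor on synthetic descriptor data with injected outliers; this stands in for, and is not, the authors' implementation.

```python
# Sketch of RANSAC-based regression for outlier-tolerant QSAR-style modeling,
# using scikit-learn's RANSACRegressor on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

rng = np.random.default_rng(3)
X = rng.uniform(size=(100, 4))                       # descriptors (synthetic)
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.05, size=100)
y[:10] += 5.0                                        # inject gross outliers

ransac = RANSACRegressor(LinearRegression(), residual_threshold=0.5).fit(X, y)
print("inliers kept:", ransac.inlier_mask_.sum(), "of", len(y))
print("R^2 on inliers:", ransac.score(X[ransac.inlier_mask_], y[ransac.inlier_mask_]))
```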

  7. Comparing self-reported health status and diagnosis-based risk adjustment to predict 1- and 2 to 5-year mortality.

    PubMed

    Pietz, Kenneth; Petersen, Laura A

    2007-04-01

    To compare the ability of two diagnosis-based risk adjustment systems and health self-report to predict short- and long-term mortality. Data were obtained from the Department of Veterans Affairs (VA) administrative databases. The study population was 78,164 VA beneficiaries at eight medical centers during fiscal year (FY) 1998, 35,337 of whom completed a 36-Item Short Form Health Survey for veterans (SF-36V). We tested the ability of Diagnostic Cost Groups (DCGs), Adjusted Clinical Groups (ACGs), the SF-36V Physical Component Score (PCS) and Mental Component Score (MCS), and eight SF-36V scales to predict 1- and 2-5 year all-cause mortality. The additional predictive value of adding PCS and MCS to ACGs and DCGs was also evaluated. Logistic regression models were compared using Akaike's information criterion, the c-statistic, and the Hosmer-Lemeshow test. The c-statistics for the eight scales combined with age and gender were 0.766 for 1-year mortality and 0.771 for 2-5-year mortality. For DCGs with age and gender, the c-statistics for 1- and 2-5-year mortality were 0.778 and 0.771, respectively. Adding PCS and MCS to the DCG model increased the c-statistics to 0.798 for 1-year and 0.784 for 2-5-year mortality. The DCG model showed slightly better performance than the eight-scale model in predicting 1-year mortality, but the two models showed similar performance for 2-5-year mortality. Health self-report may add health risk information beyond age, gender, and diagnosis for predicting longer-term mortality.

  8. Comparing Self-Reported Health Status and Diagnosis-Based Risk Adjustment to Predict 1- and 2 to 5-Year Mortality

    PubMed Central

    Pietz, Kenneth; Petersen, Laura A

    2007-01-01

    Objectives To compare the ability of two diagnosis-based risk adjustment systems and health self-report to predict short- and long-term mortality. Data Sources/Study Setting Data were obtained from the Department of Veterans Affairs (VA) administrative databases. The study population was 78,164 VA beneficiaries at eight medical centers during fiscal year (FY) 1998, 35,337 of whom completed a 36-Item Short Form Health Survey for veterans (SF-36V). Study Design We tested the ability of Diagnostic Cost Groups (DCGs), Adjusted Clinical Groups (ACGs), the SF-36V Physical Component Score (PCS) and Mental Component Score (MCS), and eight SF-36V scales to predict 1- and 2–5 year all-cause mortality. The additional predictive value of adding PCS and MCS to ACGs and DCGs was also evaluated. Logistic regression models were compared using Akaike's information criterion, the c-statistic, and the Hosmer–Lemeshow test. Principal Findings The c-statistics for the eight scales combined with age and gender were 0.766 for 1-year mortality and 0.771 for 2–5-year mortality. For DCGs with age and gender, the c-statistics for 1- and 2–5-year mortality were 0.778 and 0.771, respectively. Adding PCS and MCS to the DCG model increased the c-statistics to 0.798 for 1-year and 0.784 for 2–5-year mortality. Conclusions The DCG model showed slightly better performance than the eight-scale model in predicting 1-year mortality, but the two models showed similar performance for 2–5-year mortality. Health self-report may add health risk information beyond age, gender, and diagnosis for predicting longer-term mortality. PMID:17362210

  9. Seasonal Atmospheric and Oceanic Predictions

    NASA Technical Reports Server (NTRS)

    Roads, John; Rienecker, Michele (Technical Monitor)

    2003-01-01

    Several projects associated with dynamical, statistical, single column, and ocean models are presented. The projects include: 1) Regional Climate Modeling; 2) Statistical Downscaling; 3) Evaluation of SCM and NSIPP AGCM Results at the ARM Program Sites; and 4) Ocean Forecasts.

  10. Estimating current and future streamflow characteristics at ungaged sites, central and eastern Montana, with application to evaluating effects of climate change on fish populations

    USGS Publications Warehouse

    Sando, Roy; Chase, Katherine J.

    2017-03-23

    A common statistical procedure for estimating streamflow statistics at ungaged locations is to develop a relational model between streamflow and drainage basin characteristics at gaged locations using least squares regression analysis; however, least squares regression methods are parametric and make constraining assumptions about the data distribution. The random forest regression method provides an alternative nonparametric method for estimating streamflow characteristics at ungaged sites and requires that the data meet fewer statistical conditions than least squares regression methods. Random forest regression analysis was used to develop predictive models for 89 streamflow characteristics using Precipitation-Runoff Modeling System simulated streamflow data and drainage basin characteristics at 179 sites in central and eastern Montana. The predictive models were developed from streamflow data simulated for current (baseline, water years 1982–99) conditions and three future periods (water years 2021–38, 2046–63, and 2071–88) under three different climate-change scenarios. These predictive models were then used to predict streamflow characteristics for baseline conditions and three future periods at 1,707 fish sampling sites in central and eastern Montana. The average root mean square error for all predictive models was about 50 percent. When streamflow predictions at 23 fish sampling sites were compared to nearby locations with simulated data, the mean relative percent difference was about 43 percent. When predictions were compared to streamflow data recorded at 21 U.S. Geological Survey streamflow-gaging stations outside of the calibration basins, the average mean absolute percent error was about 73 percent.
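
    A sketch of the nonparametric alternative described above: random forest regression of a streamflow characteristic on basin characteristics, evaluated by cross-validation. Variable names and data are synthetic, not the USGS predictor set.

```python
# Sketch of random-forest regression of a streamflow statistic on basin
# characteristics; everything here is an illustrative stand-in.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)
basins = rng.uniform(size=(179, 8))      # drainage area, precip, elevation, ... (synthetic)
q50 = 10 * basins[:, 0] * np.exp(basins[:, 1]) + rng.normal(scale=0.5, size=179)

rf = RandomForestRegressor(n_estimators=500, random_state=0)
pred = cross_val_predict(rf, basins, q50, cv=5)
rmse = np.sqrt(np.mean((pred - q50) ** 2))
print("cross-validated RMSE: %.2f" % rmse)
```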

  11. Prediction of rainfall anomalies during the dry to wet transition season over the Southern Amazonia using machine learning tools

    NASA Astrophysics Data System (ADS)

    Shan, X.; Zhang, K.; Zhuang, Y.; Fu, R.; Hong, Y.

    2017-12-01

    Seasonal prediction of rainfall during the dry-to-wet transition season in austral spring (September-November) over southern Amazonia is central to improving crop planting and fire mitigation in that region. Previous studies have identified the key large-scale atmospheric dynamic and thermodynamic pre-conditions during the dry season (June-August) that influence the rainfall anomalies during the dry-to-wet transition season over Southern Amazonia. Based on these key pre-conditions during the dry season, we have evaluated several statistical models and developed a Neural Network based statistical prediction system to predict rainfall during the dry-to-wet transition for Southern Amazonia (5-15°S, 50-70°W). Multivariate Empirical Orthogonal Function (EOF) analysis is applied to the following four JJA fields from the ECMWF Reanalysis (ERA-Interim) spanning 1979 to 2015: geopotential height at 200 hPa, surface relative humidity, convective inhibition (CIN), and convective available potential energy (CAPE). This filters out noise and highlights the most coherent spatial and temporal variations. The first 10 EOF modes are retained as inputs to the statistical models, accounting for at least 70% of the total variance in the predictor fields. We have tested several linear and non-linear statistical methods. While regularized Ridge Regression and Lasso Regression can generally capture the spatial pattern and magnitude of rainfall anomalies, we found that the Neural Network performs best, with an accuracy greater than 80%, as expected from the non-linear dependence of the rainfall on the large-scale atmospheric thermodynamic conditions and circulation. Further tests of various prediction skill metrics and hindcasts also suggest this Neural Network approach can significantly improve seasonal prediction skill relative to dynamic predictions and regression-based statistical predictions. Thus, this statistical prediction system shows potential to improve real-time seasonal rainfall predictions in the future.
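
    A compact sketch of the EOF-then-network pipeline, with PCA standing in for the multivariate EOF step and a small MLP for the neural network; the fields and rainfall target are random placeholders.

```python
# Sketch of the EOF-then-neural-network pipeline: PCA stands in for the
# multivariate EOF step, MLPRegressor for the network; data are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
predictors = rng.normal(size=(37, 400))   # stacked JJA fields, one row per year (synthetic)
rain_anom = rng.normal(size=37)           # SON rainfall anomaly (synthetic target)

model = make_pipeline(StandardScaler(),
                      PCA(n_components=10),     # retain ~10 leading modes
                      MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0))
model.fit(predictors, rain_anom)
print("in-sample R^2:", model.score(predictors, rain_anom))
```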

  12. Comparison of Statistical Models for Analyzing Wheat Yield Time Series

    PubMed Central

    Michel, Lucie; Makowski, David

    2013-01-01

    The world's population is predicted to exceed nine billion by 2050 and there is increasing concern about the capability of agriculture to feed such a large population. Foresight studies on food security are frequently based on crop yield trends estimated from yield time series provided by national and regional statistical agencies. Various types of statistical models have been proposed for the analysis of yield time series, but the predictive performances of these models have not yet been evaluated in detail. In this study, we present eight statistical models for analyzing yield time series and compare their ability to predict wheat yield at the national and regional scales, using data provided by the Food and Agriculture Organization of the United Nations and by the French Ministry of Agriculture. The Holt-Winters and dynamic linear models performed equally well, giving the most accurate predictions of wheat yield. However, dynamic linear models have two advantages over Holt-Winters models: they can be used to reconstruct past yield trends retrospectively and to analyze uncertainty. The results obtained with dynamic linear models indicated a stagnation of wheat yields in many countries, but the estimated rate of increase of wheat yield remained above 0.06 t ha⁻¹ year⁻¹ in several countries in Europe, Asia, Africa and America, and the estimated values were highly uncertain for several major wheat producing countries. The rate of yield increase differed considerably between French regions, suggesting that efforts to identify the main causes of yield stagnation should focus on a subnational scale. PMID:24205280
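
    For illustration, the sketch below fits a Holt-Winters (additive-trend exponential smoothing) model to a synthetic yield series with statsmodels and produces forecasts; the dynamic linear model is not reproduced.

```python
# Sketch of a Holt-Winters (exponential smoothing with trend) fit to a yield
# series using statsmodels; the series here is synthetic, not FAO data.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

years = np.arange(1961, 2011)
yield_t = 2.0 + 0.06 * (years - 1961) + np.random.default_rng(6).normal(scale=0.15, size=50)

fit = ExponentialSmoothing(yield_t, trend="add").fit()
print("5-year-ahead forecasts:", fit.forecast(5).round(2))
```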

  13. Performance of statistical models to predict mental health and substance abuse cost.

    PubMed

    Montez-Rath, Maria; Christiansen, Cindy L; Ettner, Susan L; Loveland, Susan; Rosen, Amy K

    2006-10-26

    Providers use risk-adjustment systems to help manage healthcare costs. Typically, ordinary least squares (OLS) models on either untransformed or log-transformed cost are used. We examine the predictive ability of several statistical models, demonstrate how model choice depends on the goal for the predictive model, and examine whether building models on samples of the data affects model choice. Our sample consisted of 525,620 Veterans Health Administration patients with mental health (MH) or substance abuse (SA) diagnoses who incurred costs during fiscal year 1999. We tested two models on a transformation of cost: a Log Normal model and a Square-root Normal model, and three generalized linear models on untransformed cost, defined by distributional assumption and link function: Normal with identity link (OLS); Gamma with log link; and Gamma with square-root link. Risk-adjusters included age, sex, and 12 MH/SA categories. To determine the best model on the entire dataset, predictive ability was evaluated using root mean square error (RMSE), mean absolute prediction error (MAPE), and predictive ratios of predicted to observed cost (PR) among deciles of predicted cost, by comparing point estimates and 95% bias-corrected bootstrap confidence intervals. To study the effect of analyzing a random sample of the population on model choice, we re-computed these statistics using random samples beginning with 5,000 patients and ending with the entire sample. The Square-root Normal model had the lowest estimates of the RMSE and MAPE, with bootstrap confidence intervals that were always lower than those for the other models. The Gamma with square-root link was best as measured by the PRs. The choice of best model could vary if smaller samples were used, and the Gamma with square-root link model had convergence problems with small samples. Models with square-root transformation or link fit the data best. This function (whether used as transformation or as a link) seems to help deal with the high comorbidity of this population by introducing a form of interaction. The Gamma distribution helps with the long tail of the distribution. However, the Normal distribution is suitable if the correct transformation of the outcome is used.
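
    A minimal sketch of the specification the study favored by predictive ratios, a Gamma GLM with a square-root link, using statsmodels on synthetic cost data.

```python
# Sketch of a Gamma GLM with a square-root (power 0.5) link in statsmodels;
# the covariates and cost outcome below are synthetic placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = sm.add_constant(rng.uniform(size=(500, 3)))       # age, sex, diagnosis flags (synthetic)
mu = (1.0 + X[:, 1] + 2.0 * X[:, 2]) ** 2             # mean cost under a sqrt link
y = rng.gamma(shape=2.0, scale=mu / 2.0)              # right-skewed cost outcome

model = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Power(power=0.5)))
res = model.fit()
print(res.params)
```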

  14. Statistical prediction with Kanerva's sparse distributed memory

    NASA Technical Reports Server (NTRS)

    Rogers, David

    1989-01-01

    A new viewpoint of the processing performed by Kanerva's sparse distributed memory (SDM) is presented. In conditions of near- or over-capacity, where the associative-memory behavior of the model breaks down, the processing performed by the model can be interpreted as that of a statistical predictor. Mathematical results are presented which serve as the framework for a new statistical viewpoint of sparse distributed memory and for which the standard formulation of SDM is a special case. This viewpoint suggests possible enhancements to the SDM model, including a procedure for improving the predictiveness of the system based on Holland's work with genetic algorithms, and a method for improving the capacity of SDM even when used as an associative memory.

  15. Adaptation of clinical prediction models for application in local settings.

    PubMed

    Kappen, Teus H; Vergouwe, Yvonne; van Klei, Wilton A; van Wolfswinkel, Leo; Kalkman, Cor J; Moons, Karel G M

    2012-01-01

    When planning to use a validated prediction model in new patients, adequate performance is not guaranteed. For example, changes in clinical practice over time or a different case mix than the original validation population may result in inaccurate risk predictions. To demonstrate how clinical information can direct updating a prediction model and development of a strategy for handling missing predictor values in clinical practice. A previously derived and validated prediction model for postoperative nausea and vomiting was updated using a data set of 1847 patients. The update consisted of 1) changing the definition of an existing predictor, 2) reestimating the regression coefficient of a predictor, and 3) adding a new predictor to the model. The updated model was then validated in a new series of 3822 patients. Furthermore, several imputation models were considered to handle real-time missing values, so that possible missing predictor values could be anticipated during actual model use. Differences in clinical practice between our local population and the original derivation population guided the update strategy of the prediction model. The predictive accuracy of the updated model was better (c statistic, 0.68; calibration slope, 1.0) than the original model (c statistic, 0.62; calibration slope, 0.57). Inclusion of logistical variables in the imputation models, besides observed patient characteristics, contributed to a strategy to deal with missing predictor values at the time of risk calculation. Extensive knowledge of local, clinical processes provides crucial information to guide the process of adapting a prediction model to new clinical practices.
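
    As a hedged illustration of the updating idea, the sketch below performs the simplest update, recalibration: refitting an intercept and slope on the original model's linear predictor in a new population. It does not reproduce the paper's three specific revisions.

```python
# Sketch of logistic recalibration: estimate a new intercept and calibration
# slope for an existing model's linear predictor in a new patient population.
# The data are simulated, not the paper's cohort.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
lp = rng.normal(size=1847)                            # original model's linear predictor
p_true = 1 / (1 + np.exp(-(0.4 + 0.7 * lp)))          # new setting: old model miscalibrated
y = rng.binomial(1, p_true)

recal = LogisticRegression().fit(lp.reshape(-1, 1), y)
print("new intercept %.2f, calibration slope %.2f" % (recal.intercept_[0], recal.coef_[0, 0]))
```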

  16. How well can we predict forage species occurrence and abundance?

    USDA-ARS?s Scientific Manuscript database

    As part of a larger effort focused on forage species production and management, we have been developing a statistical modeling approach to predict the probability of species occurrence and the abundance for Orchard Grass over the Northeast region of the United States using two selected statistical m...

  17. A New Approach of Juvenile Age Estimation using Measurements of the Ilium and Multivariate Adaptive Regression Splines (MARS) Models for Better Age Prediction.

    PubMed

    Corron, Louise; Marchal, François; Condemi, Silvana; Chaumoître, Kathia; Adalian, Pascal

    2017-01-01

    Juvenile age estimation methods used in forensic anthropology generally lack methodological consistency and/or statistical validity. Considering this, a standard approach using nonparametric Multivariate Adaptive Regression Splines (MARS) models was tested to predict age from iliac biometric variables of male and female juveniles from Marseilles, France, aged 0-12 years. Models using unidimensional (length and width) and bidimensional iliac data (module and surface) were constructed on a training sample of 176 individuals and validated on an independent test sample of 68 individuals. Results show that MARS prediction models using iliac width, module, and area give overall better and statistically valid age estimates. These models integrate punctual nonlinearities of the relationship between age and osteometric variables. By constructing valid prediction intervals whose size increases with age, MARS models take into account the normal increase of individual variability. MARS models can qualify as a practical and standardized approach for juvenile age estimation. © 2016 American Academy of Forensic Sciences.

  18. Prediction of drug transport processes using simple parameters and PLS statistics. The use of ACD/logP and ACD/ChemSketch descriptors.

    PubMed

    Osterberg, T; Norinder, U

    2001-01-01

    A method of modelling and predicting biopharmaceutical properties using simple theoretically computed molecular descriptors and multivariate statistics has been investigated for several data sets related to solubility, IAM chromatography, permeability across Caco-2 cell monolayers, human intestinal perfusion, brain-blood partitioning, and P-glycoprotein ATPase activity. The molecular descriptors (e.g. molar refractivity, molar volume, index of refraction, surface tension and density) and logP were computed with ACD/ChemSketch and ACD/logP, respectively. Good statistical models were derived that permit simple computational prediction of biopharmaceutical properties. All final models derived had R² values ranging from 0.73 to 0.95 and Q² values ranging from 0.69 to 0.86. The RMSEP values for the external test sets ranged from 0.24 to 0.85 (log scale).
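
    A brief sketch of PLS modelling with the fit (R²) and cross-validated (Q²) statistics the paper reports, computed on synthetic descriptor data with scikit-learn.

```python
# Sketch of PLS regression with R^2 (fit) and Q^2 (cross-validated) statistics;
# the descriptors and response are synthetic, not the ACD-derived data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(9)
X = rng.normal(size=(80, 6))              # logP, molar refractivity, ... (synthetic)
y = X @ np.array([1.0, 0.5, 0, 0, -0.3, 0]) + rng.normal(scale=0.3, size=80)

pls = PLSRegression(n_components=3).fit(X, y)
r2 = pls.score(X, y)
y_cv = cross_val_predict(PLSRegression(n_components=3), X, y, cv=7).ravel()
q2 = 1 - np.sum((y - y_cv) ** 2) / np.sum((y - y.mean()) ** 2)
print("R^2 = %.2f, Q^2 = %.2f" % (r2, q2))
```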

  19. Physics-based statistical learning approach to mesoscopic model selection.

    PubMed

    Taverniers, Søren; Haut, Terry S; Barros, Kipton; Alexander, Francis J; Lookman, Turab

    2015-11-01

    In materials science and many other research areas, models are frequently inferred without considering their generalization to unseen data. We apply statistical learning using cross-validation to obtain an optimally predictive coarse-grained description of a two-dimensional kinetic nearest-neighbor Ising model with Glauber dynamics (GD) based on the stochastic Ginzburg-Landau equation (sGLE). The latter is learned from GD "training" data using a log-likelihood analysis, and its predictive ability for various model complexities is tested on GD "test" data independent of the data used for training. Using two different error metrics, we perform a detailed analysis of the error between magnetization time trajectories simulated using the learned sGLE coarse-grained description and those obtained using the GD model. We show that both for equilibrium and out-of-equilibrium GD training trajectories, the standard phenomenological description using a quartic free energy does not always yield the most predictive coarse-grained model. Moreover, increasing the amount of training data can shift the optimal model complexity to higher values. Our results are promising in that they pave the way for the use of statistical learning as a general tool for materials modeling and discovery.

  20. Development of a Predictive Corrosion Model Using Locality-Specific Corrosion Indices

    DTIC Science & Technology

    2017-09-12

    Excerpt (table-of-contents and text fragments): sections cover statistical data analysis methods and an algorithm development method; the described components were compiled into an executable program that uses mathematical models of materials degradation and statistical calculations.

  1. Multimodel predictive system for carbon dioxide solubility in saline formation waters.

    PubMed

    Wang, Zan; Small, Mitchell J; Karamalidis, Athanasios K

    2013-02-05

    The prediction of carbon dioxide solubility in brine at conditions relevant to carbon sequestration (i.e., high temperature, pressure, and salt concentration (T-P-X)) is crucial when this technology is applied. Eleven mathematical models for predicting CO2 solubility in brine are compared and considered for inclusion in a multimodel predictive system. Model goodness of fit is evaluated over the temperature range 304-433 K, pressure range 74-500 bar, and salt concentration range 0-7 m (NaCl equivalent), using 173 published CO2 solubility measurements, particularly selected for those conditions. The performance of each model is assessed using various statistical methods, including the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Different models emerge as best fits for different subranges of the input conditions. A classification tree is generated using machine learning methods to predict the best-performing model under different T-P-X subranges, allowing development of a multimodel predictive system (MMoPS) that selects and applies the model expected to yield the most accurate CO2 solubility prediction. Statistical analysis of the MMoPS predictions, including a stratified 5-fold cross validation, shows that MMoPS outperforms each individual model and increases the overall accuracy of CO2 solubility prediction across the range of T-P-X conditions likely to be encountered in carbon sequestration applications.
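
    A toy sketch of the MMoPS selection step: a classification tree that maps (T, P, X) conditions to the identifier of the solubility model assumed to perform best there. The labels are synthetic; the real tree was trained on model-versus-data errors.

```python
# Sketch of classification-tree model selection over (T, P, X) conditions;
# the "best model" labels below are synthetic stand-ins.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(10)
TPX = np.column_stack([rng.uniform(304, 433, 173),    # temperature (K)
                       rng.uniform(74, 500, 173),     # pressure (bar)
                       rng.uniform(0, 7, 173)])       # salinity (m NaCl eq.)
best_model = (TPX[:, 2] > 3).astype(int) + (TPX[:, 0] > 390)  # synthetic best-model id

tree = DecisionTreeClassifier(max_depth=3).fit(TPX, best_model)
print("predicted best model id:", tree.predict([[350.0, 200.0, 1.0]]))
```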

  2. Evaluating observations in the context of predictions for the death valley regional groundwater system

    USGS Publications Warehouse

    Ely, D.M.; Hill, M.C.; Tiedeman, C.R.; O'Brien, G. M.

    2004-01-01

    When a model is calibrated by nonlinear regression, calculated diagnostic and inferential statistics provide a wealth of information about many aspects of the system. This work uses linear inferential statistics that are measures of prediction uncertainty to investigate the likely importance of continued monitoring of hydraulic head to the accuracy of model predictions. The measurements evaluated are hydraulic heads; the predictions of interest are subsurface transport from 15 locations. The advective component of transport is considered because it is the component most affected by the system dynamics represented by the regional-scale model being used. The problem is addressed using the capabilities of the U.S. Geological Survey computer program MODFLOW-2000, with its Advective Travel Observation (ADV) Package. Copyright ASCE 2004.

  3. On entropy, financial markets and minority games

    NASA Astrophysics Data System (ADS)

    Zapart, Christopher A.

    2009-04-01

    The paper builds upon an earlier statistical analysis of financial time series with Shannon information entropy, published in [L. Molgedey, W. Ebeling, Local order, entropy and predictability of financial time series, European Physical Journal B-Condensed Matter and Complex Systems 15/4 (2000) 733-737]. A novel generic procedure is proposed for making multistep-ahead predictions of time series by building a statistical model of entropy. The approach is first demonstrated on the chaotic Mackey-Glass time series and later applied to Japanese Yen/US dollar intraday currency data. The paper also reinterprets Minority Games [E. Moro, The minority game: An introductory guide, Advances in Condensed Matter and Statistical Physics (2004)] within the context of physical entropy, and uses models derived from minority game theory as a tool for measuring the entropy of a model in response to time series. This entropy conditional upon a model is subsequently used in place of information-theoretic entropy in the proposed multistep prediction algorithm.

  4. A High Precision Prediction Model Using Hybrid Grey Dynamic Model

    ERIC Educational Resources Information Center

    Li, Guo-Dong; Yamaguchi, Daisuke; Nagai, Masatake; Masuda, Shiro

    2008-01-01

    In this paper, we propose a new prediction analysis model which combines the first order one variable Grey differential equation Model (abbreviated as GM(1,1) model) from grey system theory and time series Autoregressive Integrated Moving Average (ARIMA) model from statistics theory. We abbreviate the combined GM(1,1) ARIMA model as ARGM(1,1)…

  5. Validity of Models for Predicting BRCA1 and BRCA2 Mutations

    PubMed Central

    Parmigiani, Giovanni; Chen, Sining; Iversen, Edwin S.; Friebel, Tara M.; Finkelstein, Dianne M.; Anton-Culver, Hoda; Ziogas, Argyrios; Weber, Barbara L.; Eisen, Andrea; Malone, Kathleen E.; Daling, Janet R.; Hsu, Li; Ostrander, Elaine A.; Peterson, Leif E.; Schildkraut, Joellen M.; Isaacs, Claudine; Corio, Camille; Leondaridis, Leoni; Tomlinson, Gail; Amos, Christopher I.; Strong, Louise C.; Berry, Donald A.; Weitzel, Jeffrey N.; Sand, Sharon; Dutson, Debra; Kerber, Rich; Peshkin, Beth N.; Euhus, David M.

    2008-01-01

    Background Deleterious mutations of the BRCA1 and BRCA2 genes confer susceptibility to breast and ovarian cancer. At least 7 models for estimating the probabilities of having a mutation are used widely in clinical and scientific activities; however, the merits and limitations of these models are not fully understood. Objective To systematically quantify the accuracy of the following publicly available models to predict mutation carrier status: BRCAPRO, family history assessment tool, Finnish, Myriad, National Cancer Institute, University of Pennsylvania, and Yale University. Design Cross-sectional validation study, using model predictions and BRCA1 or BRCA2 mutation status of patients different from those used to develop the models. Setting Multicenter study across Cancer Genetics Network participating centers. Patients 3 population-based samples of participants in research studies and 8 samples from genetic counseling clinics. Measurements Discrimination between individuals testing positive for a mutation in BRCA1 or BRCA2 from those testing negative, as measured by the c-statistic, and sensitivity and specificity of model predictions. Results The 7 models differ in their predictions. The better-performing models have a c-statistic around 80%. BRCAPRO has the largest c-statistic overall and in all but 2 patient subgroups, although the margin over other models is narrow in many strata. Outside of high-risk populations, all models have high false-negative and false-positive rates across a range of probability thresholds used to refer for mutation testing. Limitation Three recently published models were not included. Conclusions All models identify women who probably carry a deleterious mutation of BRCA1 or BRCA2 with adequate discrimination to support individualized genetic counseling, although discrimination varies across models and populations. PMID:17909205

  6. Using sensitivity analysis in model calibration efforts

    USGS Publications Warehouse

    Tiedeman, Claire; Hill, Mary C.

    2003-01-01

    In models of natural and engineered systems, sensitivity analysis can be used to assess relations among system state observations, model parameters, and model predictions. The model itself links these three entities, and model sensitivities can be used to quantify the links. Sensitivities are defined as the derivatives of simulated quantities (such as simulated equivalents of observations, or model predictions) with respect to model parameters. We present four measures calculated from model sensitivities that quantify the observation-parameter-prediction links and that are especially useful during the calibration and prediction phases of modeling. These four measures are composite scaled sensitivities (CSS), prediction scaled sensitivities (PSS), the value of improved information (VOII) statistic, and the observation prediction (OPR) statistic. These measures can be used to help guide initial calibration of models, collection of field data beneficial to model predictions, and recalibration of models updated with new field information. Once model sensitivities have been calculated, each of the four measures requires minimal computational effort. We apply the four measures to a three-layer MODFLOW-2000 (Harbaugh et al., 2000; Hill et al., 2000) model of the Death Valley regional ground-water flow system (DVRFS), located in southern Nevada and California. D’Agnese et al. (1997, 1999) developed and calibrated the model using nonlinear regression methods. Figure 1 shows some of the observations, parameters, and predictions for the DVRFS model. Observed quantities include hydraulic heads and spring flows. The 23 defined model parameters include hydraulic conductivities, vertical anisotropies, recharge rates, evapotranspiration rates, and pumpage. Predictions of interest for this regional-scale model are advective transport paths from potential contamination sites underlying the Nevada Test Site and Yucca Mountain.

  7. A multibody knee model with discrete cartilage prediction of tibio-femoral contact mechanics.

    PubMed

    Guess, Trent M; Liu, Hongzeng; Bhashyam, Sampath; Thiagarajan, Ganesh

    2013-01-01

    Combining musculoskeletal simulations with anatomical joint models capable of predicting cartilage contact mechanics would provide a valuable tool for studying the relationships between muscle force and cartilage loading. As a step towards producing multibody musculoskeletal models that include representation of cartilage tissue mechanics, this research developed a subject-specific multibody knee model that represented the tibia plateau cartilage as discrete rigid bodies that interacted with the femur through deformable contacts. Parameters for the compliant contact law were derived using three methods: (1) simplified Hertzian contact theory, (2) simplified elastic foundation contact theory and (3) parameter optimisation from a finite element (FE) solution. The contact parameters and contact friction were evaluated during a simulated walk in a virtual dynamic knee simulator, and the resulting kinematics were compared with measured in vitro kinematics. The effects on predicted contact pressures and cartilage-bone interface shear forces during the simulated walk were also evaluated. The compliant contact stiffness parameters had a statistically significant effect on predicted contact pressures as well as all tibio-femoral motions except flexion-extension. The contact friction was not statistically significant to contact pressures, but was statistically significant to medial-lateral translation and all rotations except flexion-extension. The magnitude of kinematic differences between model formulations was relatively small, but contact pressure predictions were sensitive to model formulation. The developed multibody knee model was computationally efficient and had a computation time 283 times faster than a FE simulation using the same geometries and boundary conditions.

  8. An image based method for crop yield prediction using remotely sensed and crop canopy data: the case of Paphos district, western Cyprus

    NASA Astrophysics Data System (ADS)

    Papadavid, G.; Hadjimitsis, D.

    2014-08-01

    Developments in remote sensing techniques have provided the opportunity to optimize yields in agriculture and, moreover, to predict the forthcoming yield. Yield prediction plays a vital role in agricultural policy and provides useful data to policy makers. In this context, crop and soil parameters, along with the NDVI index, which are valuable sources of information, were analyzed statistically to test (a) whether Durum wheat yield can be predicted and (b) what the actual time-window is for predicting yield in the district of Paphos, where Durum wheat is the basic cultivation and supports the rural economy of the area. 15 plots cultivated with Durum wheat by the Agricultural Research Institute of Cyprus for research purposes in the area of interest were observed for three years to derive the necessary data. Statistical and remote sensing techniques were then applied to derive and map a model that can predict the yield of Durum wheat in this area. Indeed, the semi-empirical model developed for this purpose, with a very high correlation coefficient (R2=0.886), has shown in practice that it can predict yields very well. Student's t-test revealed that predicted and observed yield values have no statistically significant difference. The developed model can and will be further elaborated with more parameters and applied to other crops in the near future.
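
    A minimal sketch of the statistical core, an NDVI-to-yield regression followed by a paired t-test of predicted against observed yields; all values are synthetic placeholders, not the Paphos data.

```python
# Sketch of an NDVI-based yield regression plus the paired t-test used to
# compare predicted and observed yields; values are synthetic placeholders.
import numpy as np
from scipy import stats

ndvi = np.array([0.41, 0.48, 0.52, 0.57, 0.61, 0.66, 0.70, 0.73])
yield_obs = np.array([1.9, 2.3, 2.6, 3.0, 3.2, 3.6, 3.9, 4.1])   # t/ha (synthetic)

slope, intercept, r, p, se = stats.linregress(ndvi, yield_obs)
yield_pred = intercept + slope * ndvi
print("R^2 = %.3f" % r**2)
t, p_paired = stats.ttest_rel(yield_obs, yield_pred)
print("paired t-test p = %.2f (no significant difference expected)" % p_paired)
```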

  9. Reevaluation of a walleye (Sander vitreus) bioenergetics model

    USGS Publications Warehouse

    Madenjian, Charles P.; Wang, Chunfang

    2013-01-01

    Walleye (Sander vitreus) is an important sport fish throughout much of North America, and walleye populations support valuable commercial fisheries in certain lakes as well. Using a corrected algorithm for balancing the energy budget, we reevaluated the performance of the Wisconsin bioenergetics model for walleye in the laboratory. Walleyes were fed rainbow smelt (Osmerus mordax) in four laboratory tanks each day during a 126-day experiment. Feeding rates ranged from 1.4 to 1.7 % of walleye body weight per day. Based on a statistical comparison of bioenergetics model predictions of monthly consumption with observed monthly consumption, we concluded that the bioenergetics model estimated food consumption by walleye without any significant bias. Similarly, based on a statistical comparison of bioenergetics model predictions of weight at the end of the monthly test period with observed weight, we concluded that the bioenergetics model predicted walleye growth without any detectable bias. In addition, the bioenergetics model predictions of cumulative consumption over the 126-day experiment differed from observed cumulative consumption by less than 10 %. Although additional laboratory and field testing will be needed to fully evaluate model performance, based on our laboratory results, the Wisconsin bioenergetics model for walleye appears to be providing unbiased predictions of food consumption.

  10. Bayesian inference based on dual generalized order statistics from the exponentiated Weibull model

    NASA Astrophysics Data System (ADS)

    Al Sobhi, Mashail M.

    2015-02-01

    Bayesian estimation for the two parameters and the reliability function of the exponentiated Weibull model are obtained based on dual generalized order statistics (DGOS). Also, Bayesian prediction bounds for future DGOS from exponentiated Weibull model are obtained. The symmetric and asymmetric loss functions are considered for Bayesian computations. The Markov chain Monte Carlo (MCMC) methods are used for computing the Bayes estimates and prediction bounds. The results have been specialized to the lower record values. Comparisons are made between Bayesian and maximum likelihood estimators via Monte Carlo simulation.

  11. Effects and detection of raw material variability on the performance of near-infrared calibration models for pharmaceutical products.

    PubMed

    Igne, Benoit; Shi, Zhenqi; Drennen, James K; Anderson, Carl A

    2014-02-01

    The impact of raw material variability on the prediction ability of a near-infrared calibration model was studied. Calibrations, developed from a quaternary mixture design comprising theophylline anhydrous, lactose monohydrate, microcrystalline cellulose, and soluble starch, were challenged by intentional variation of raw material properties. A design with two theophylline physical forms, three lactose particle sizes, and two starch manufacturers was created to test model robustness. Further challenges to the models were accomplished through environmental conditions. Along with full-spectrum partial least squares (PLS) modeling, variable selection by dynamic backward PLS and genetic algorithms was utilized in an effort to mitigate the effects of raw material variability. In addition to evaluating models based on their prediction statistics, prediction residuals were analyzed by analyses of variance and model diagnostics (Hotelling's T² and Q residuals). Full-spectrum models were significantly affected by lactose particle size. Models developed by selecting variables gave lower prediction errors and proved to be a good approach to limit the effect of changing raw material characteristics. Hotelling's T² and Q residuals provided valuable information that was not detectable when studying only prediction trends. Diagnostic statistics were demonstrated to be critical in the appropriate interpretation of the prediction of quality parameters. © 2013 Wiley Periodicals, Inc. and the American Pharmacists Association.

  12. Meteorological models for estimating phenology of corn

    NASA Technical Reports Server (NTRS)

    Daughtry, C. S. T.; Cochran, J. C.; Hollinger, S. E.

    1984-01-01

    Knowledge of when critical crop stages occur and how the environment affects them should provide useful information for crop management decisions and crop production models. Two sources of data were evaluated for predicting dates of silking and physiological maturity of corn (Zea mays L.). Initial evaluations were conducted using data of an adapted corn hybrid grown on a Typic Argiaquoll at the Purdue University Agronomy Farm. The second phase extended the analyses to large areas using data acquired by the Statistical Reporting Service of USDA for crop reporting districts (CRD) in Indiana and Iowa. Several thermal models were compared to calendar days for predicting dates of silking and physiological maturity. Mixed models which used a combination of thermal units to predict silking and days after silking to predict physiological maturity were also evaluated. At the Agronomy Farm the models were calibrated and tested on the same data. The thermal models were significantly less biased and more accurate than calendar days for predicting dates of silking. Differences among the thermal models were small. Significant improvements in both bias and accuracy were observed when the mixed models were used to predict dates of physiological maturity. The results indicate that statistical data for CRD can be used to evaluate models developed at agricultural experiment stations.
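
    For concreteness, the sketch below accumulates thermal units (growing degree days) to predict a silking date; the 10/30 degree C base and ceiling and the 700-GDD threshold are common illustrative defaults, not necessarily the models tested in the study.

```python
# Sketch of a thermal-unit (growing degree day) accumulator of the kind used
# to predict silking date; thresholds are illustrative assumptions.
import numpy as np

def gdd(tmax, tmin, base=10.0, ceiling=30.0):
    """Daily growing degree days (deg C) with temperature capping."""
    tmax = np.clip(tmax, base, ceiling)
    tmin = np.clip(tmin, base, ceiling)
    return (tmax + tmin) / 2.0 - base

rng = np.random.default_rng(11)
tmax = rng.uniform(20, 34, 120)          # daily maxima after planting (synthetic)
tmin = tmax - rng.uniform(6, 12, 120)
cum = np.cumsum(gdd(tmax, tmin))
silking_day = int(np.argmax(cum >= 700)) + 1   # ~700 GDD to silking (assumed threshold)
print("predicted silking on day", silking_day)
```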

  13. Grain-Size Based Additivity Models for Scaling Multi-rate Uranyl Surface Complexation in Subsurface Sediments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Xiaoying; Liu, Chongxuan; Hu, Bill X.

    The additivity model assumed that field-scale reaction properties in a sediment including surface area, reactive site concentration, and reaction rate can be predicted from field-scale grain-size distribution by linearly adding reaction properties estimated in laboratory for individual grain-size fractions. This study evaluated the additivity model in scaling mass transfer-limited, multi-rate uranyl (U(VI)) surface complexation reactions in a contaminated sediment. Experimental data of rate-limited U(VI) desorption in a stirred flow-cell reactor were used to estimate the statistical properties of the rate constants for individual grain-size fractions, which were then used to predict rate-limited U(VI) desorption in the composite sediment. The result indicated that the additivity model with respect to the rate of U(VI) desorption provided a good prediction of U(VI) desorption in the composite sediment. However, the rate constants were not directly scalable using the additivity model. An approximate additivity model for directly scaling rate constants was subsequently proposed and evaluated. The result found that the approximate model provided a good prediction of the experimental results within statistical uncertainty. This study also found that a gravel-size fraction (2 to 8 mm), which is often ignored in modeling U(VI) sorption and desorption, is statistically significant to the U(VI) desorption in the sediment.

  14. Modeling student success in engineering education

    NASA Astrophysics Data System (ADS)

    Jin, Qu

    In order for the United States to maintain its global competitiveness, the long-term success of our engineering students in specific courses, programs, and colleges is now, more than ever, an extremely high priority. Numerous studies have focused on factors that impact student success, namely academic performance, retention, and/or graduation. However, there are only a limited number of works that have systematically developed models to investigate important factors and to predict student success in engineering. Therefore, this research presents three separate but highly connected investigations to address this gap. The first investigation involves explaining and predicting engineering students' success in Calculus I courses using statistical models. The participants were more than 4000 first-year engineering students (cohort years 2004 - 2008) who enrolled in Calculus I courses during the first semester in a large Midwestern university. Predictions from the statistical models were proposed as a means of placing engineering students into calculus courses. Using model predictions rather than the traditional placement method improved success rates in Calculus IA by 12%. The results showed that these statistical models provided a more accurate calculus placement method than traditional placement methods and helped improve success rates in those courses. In the second investigation, multi-outcome and single-outcome neural network models were designed to understand and to predict first-year retention and first-year GPA of engineering students. The participants were more than 3000 first-year engineering students (cohort years 2004 - 2005) enrolled in a large Midwestern university. The independent variables include both high school academic performance factors and affective factors measured prior to entry. The prediction performances of the multi-outcome and single-outcome models were comparable. The ability to predict cumulative GPA at the end of an engineering student's first year of college was accurate to about half a grade point for both models. The predictors of retention and of cumulative GPA, while similar, differ in that high school academic metrics play a more important role in predicting cumulative GPA, whereas affective measures play a more important role in predicting retention. In the last investigation, multi-outcome neural network models were used to understand and to predict engineering students' retention, GPA, and graduation from entry to departure. The participants were more than 4000 engineering students (cohort years 2004 - 2006) enrolled in a large Midwestern university. Different patterns of important predictors were identified for GPA, retention, and graduation. Overall, this research explores the feasibility of using modeling to enhance a student's educational experience in engineering. Student success modeling was used to identify the most important cognitive and affective predictors for a student's first calculus course retention, GPA, and graduation. The results suggest that statistical modeling methods have great potential to assist decision making and help ensure student success in engineering education.

  15. Dynamical-statistical seasonal prediction for western North Pacific typhoons based on APCC multi-models

    NASA Astrophysics Data System (ADS)

    Kim, Ok-Yeon; Kim, Hye-Mi; Lee, Myong-In; Min, Young-Mi

    2017-01-01

    This study aims at predicting the seasonal number of typhoons (TY) over the western North Pacific with an Asia-Pacific Climate Center (APCC) multi-model ensemble (MME)-based dynamical-statistical hybrid model. The hybrid model uses the statistical relationship between the number of TY during the typhoon season (July-October) and the large-scale key predictors forecasted by the APCC MME for the same season. The cross-validation result from the MME hybrid model demonstrates high prediction skill, with a correlation of 0.67 between the hindcasts and observations for 1982-2008. Cross validation of the hybrid model with the individual models participating in the MME indicates that no single model consistently outperforms the others in predicting typhoon number. Although the forecast skill of the MME is not always the highest compared with that of each individual model, the MME presents a higher average correlation and a smaller variance of correlations. Given the large set of ensemble members from the multiple models, a relative operating characteristic score reveals skills of 82% (above-normal) and 78% (below-normal) for the probabilistic prediction of the number of TY. This implies an 82% (78%) probability that the forecasts can successfully discriminate above-normal (below-normal) years from other years. The forecast skill of the hybrid model for the past 7 years (2002-2008) is higher than that of the forecast from the Tropical Storm Risk consortium. Using the large set of ensemble members from multiple models, the APCC MME could provide useful deterministic and probabilistic seasonal typhoon forecasts to end-users, in particular the residents of tropical cyclone-prone areas in the Asia-Pacific region.
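
    The skill estimate quoted above comes from cross validation. The toy sketch below illustrates leave-one-season-out cross validation of a regression of typhoon counts on large-scale predictors; the predictors, coefficients, and data are synthetic stand-ins, not the APCC MME fields.

        import numpy as np

        rng = np.random.default_rng(0)
        n_years, n_pred = 27, 3                 # e.g., a 1982-2008 hindcast period
        X = rng.normal(size=(n_years, n_pred))  # stand-ins for MME key predictors
        y = 15 + X @ np.array([2.0, -1.5, 1.0]) + rng.normal(scale=2, size=n_years)

        hindcast = np.empty(n_years)
        for i in range(n_years):                # leave one season out, refit, predict
            train = np.delete(np.arange(n_years), i)
            A = np.column_stack([np.ones(train.size), X[train]])
            coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
            hindcast[i] = np.concatenate([[1.0], X[i]]) @ coef

        print("cross-validated correlation:", np.corrcoef(hindcast, y)[0, 1])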

  16. Does rational selection of training and test sets improve the outcome of QSAR modeling?

    PubMed

    Martin, Todd M; Harten, Paul; Young, Douglas M; Muratov, Eugene N; Golbraikh, Alexander; Zhu, Hao; Tropsha, Alexander

    2012-10-22

    Prior to using a quantitative structure-activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using rational division methods and using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models is comparable.
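
    Of the rational division methods named above, Kennard-Stone is the easiest to sketch: training points are chosen to cover descriptor space by repeatedly selecting the candidate farthest from the points already selected. The snippet below is a compact illustration on random descriptors, not the authors' implementation.

        import numpy as np

        def kennard_stone(X, n_train):
            d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
            selected = list(np.unravel_index(np.argmax(d), d.shape))    # start with farthest pair
            remaining = [i for i in range(len(X)) if i not in selected]
            while len(selected) < n_train:
                # pick the remaining point farthest from its nearest selected neighbor
                nearest = d[np.ix_(remaining, selected)].min(axis=1)
                selected.append(remaining.pop(int(np.argmax(nearest))))
            return selected, remaining  # training indices, test indices

        X = np.random.default_rng(1).normal(size=(50, 4))   # 50 compounds, 4 descriptors
        train_idx, test_idx = kennard_stone(X, n_train=40)  # an 80/20 split
        print(len(train_idx), "training,", len(test_idx), "test compounds")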

  17. Detecting the influence of rare stressors on rare species in Yosemite National Park using a novel stratified permutation test

    USGS Publications Warehouse

    Matchett, John R.; Stark, Philip B.; Ostoja, Steven M.; Knapp, Roland A.; McKenny, Heather C.; Brooks, Matthew L.; Langford, William T.; Joppa, Lucas N.; Berlow, Eric L.

    2015-01-01

    Statistical models often use observational data to predict phenomena; however, interpreting model terms to understand their influence can be problematic. This issue poses a challenge in species conservation where setting priorities requires estimating influences of potential stressors using observational data. We present a novel approach for inferring influence of a rare stressor on a rare species by blending predictive models with nonparametric permutation tests. We illustrate the approach with two case studies involving rare amphibians in Yosemite National Park, USA. The endangered frog, Rana sierrae, is known to be negatively impacted by non-native fish, while the threatened toad, Anaxyrus canorus, is potentially affected by packstock. Both stressors and amphibians are rare, occurring in ~10% of potential habitat patches. We first predict amphibian occupancy with a statistical model that includes all predictors but the stressor to stratify potential habitat by predicted suitability. A stratified permutation test then evaluates the association between stressor and amphibian, all else equal. Our approach confirms the known negative relationship between fish and R. sierrae, but finds no evidence of a negative relationship between current packstock use and A. canorus breeding. Our statistical approach has potential broad application for deriving understanding (not just prediction) from observational data.
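
    The logic of the test can be sketched briefly: fit an occupancy model without the stressor, bin sites by predicted suitability, and permute the stressor labels only within bins so the null distribution holds habitat quality fixed. The snippet below is a toy version with synthetic data; the suitability model, strata, and effect size are invented for illustration.

        import numpy as np

        rng = np.random.default_rng(2)
        n = 300
        suitability = rng.uniform(size=n)          # from an occupancy model w/o stressor
        stressor = rng.uniform(size=n) < 0.10      # rare stressor (~10% of patches)
        occupied = rng.uniform(size=n) < suitability * np.where(stressor, 0.4, 1.0)

        strata = np.digitize(suitability, np.quantile(suitability, [0.25, 0.5, 0.75]))

        def stat(labels):
            # difference in occupancy rate, stressed minus unstressed patches
            return occupied[labels].mean() - occupied[~labels].mean()

        obs = stat(stressor)
        null = []
        for _ in range(2000):
            perm = stressor.copy()
            for s in np.unique(strata):            # shuffle within each stratum only
                idx = np.flatnonzero(strata == s)
                perm[idx] = perm[rng.permutation(idx)]
            null.append(stat(perm))

        p = np.mean(np.array(null) <= obs)         # one-sided: stressor lowers occupancy
        print("observed difference:", round(obs, 3), " p =", p)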

  18. Detecting the influence of rare stressors on rare species in Yosemite National Park using a novel stratified permutation test

    PubMed Central

    Matchett, J. R.; Stark, Philip B.; Ostoja, Steven M.; Knapp, Roland A.; McKenny, Heather C.; Brooks, Matthew L.; Langford, William T.; Joppa, Lucas N.; Berlow, Eric L.

    2015-01-01

    Statistical models often use observational data to predict phenomena; however, interpreting model terms to understand their influence can be problematic. This issue poses a challenge in species conservation where setting priorities requires estimating influences of potential stressors using observational data. We present a novel approach for inferring influence of a rare stressor on a rare species by blending predictive models with nonparametric permutation tests. We illustrate the approach with two case studies involving rare amphibians in Yosemite National Park, USA. The endangered frog, Rana sierrae, is known to be negatively impacted by non-native fish, while the threatened toad, Anaxyrus canorus, is potentially affected by packstock. Both stressors and amphibians are rare, occurring in ~10% of potential habitat patches. We first predict amphibian occupancy with a statistical model that includes all predictors but the stressor to stratify potential habitat by predicted suitability. A stratified permutation test then evaluates the association between stressor and amphibian, all else equal. Our approach confirms the known negative relationship between fish and R. sierrae, but finds no evidence of a negative relationship between current packstock use and A. canorus breeding. Our statistical approach has potential broad application for deriving understanding (not just prediction) from observational data. PMID:26031755

  19. Prediction of Chemical Function: Model Development and Application

    EPA Science Inventory

    The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (...

  20. Accuracy of topographic index models at identifying ephemeral gully trajectories on agricultural fields

    NASA Astrophysics Data System (ADS)

    Sheshukov, Aleksey Y.; Sekaluvu, Lawrence; Hutchinson, Stacy L.

    2018-04-01

    Topographic index (TI) models have been widely used to predict trajectories and initiation points of ephemeral gullies (EGs) in agricultural landscapes. Prediction of EGs relies strongly on the selected value of the critical TI threshold, and the accuracy depends on topographic features, agricultural management, and datasets of observed EGs. This study statistically evaluated the predictions of TI models in two paired watersheds in Central Kansas that had different levels of structural disturbance due to implemented conservation practices. Four TI models with sole dependency on the topographic factors of slope, contributing area, and planform curvature were used in this study. The observed EGs were obtained by field reconnaissance and through the process of hydrological reconditioning of digital elevation models (DEMs). Kernel density estimation was used to evaluate the TI distribution within a 10-m buffer of the observed EG trajectories. EG occurrence within catchments was analyzed using the kappa statistic of the error-matrix approach, while the lengths of predicted EGs were compared with the observed dataset using the Nash-Sutcliffe efficiency (NSE) statistic. The TI frequency analysis produced a bi-modal distribution of topographic indexes, with the pixels within the EG trajectories having a higher peak. The graphs of kappa and NSE versus critical TI threshold showed similar profiles for all four TI models and both watersheds, with the maximum value representing the best agreement with the observed data. The Compound Topographic Index (CTI) model presented the best overall accuracy, with an NSE of 0.55 and a kappa of 0.32. The statistics for the disturbed watershed showed higher best critical TI threshold values than for the undisturbed watershed. Structural conservation practices implemented in the disturbed watershed reduced ephemeral channels in headwater catchments, thus producing less variability in catchments with EGs. The variation in critical thresholds for all TI models suggested that TI models tend to predict EG occurrence and length over a range of thresholds rather than at a single best value.
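
    The threshold selection step lends itself to a short sketch: scan candidate critical TI values, classify cells above the threshold as predicted EG cells, and score agreement with the mapped EGs. The version below uses synthetic TI values and a simple Cohen's kappa, not the study's watershed data or full error-matrix workflow.

        import numpy as np

        rng = np.random.default_rng(3)
        ti = rng.lognormal(mean=1.0, sigma=0.6, size=10_000)          # TI per grid cell
        eg_observed = ti + rng.normal(scale=2.0, size=ti.size) > 5.0  # mapped EG cells

        def cohen_kappa(pred, obs):
            po = np.mean(pred == obs)                      # observed agreement
            pe = (pred.mean() * obs.mean()
                  + (1 - pred.mean()) * (1 - obs.mean()))  # chance agreement
            return (po - pe) / (1 - pe)

        thresholds = np.linspace(1, 10, 50)
        kappas = [cohen_kappa(ti > t, eg_observed) for t in thresholds]
        best = thresholds[int(np.argmax(kappas))]
        print(f"best critical TI threshold: {best:.2f}, kappa: {max(kappas):.2f}")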

  1. The forecasting of menstruation based on a state-space modeling of basal body temperature time series.

    PubMed

    Fukaya, Keiichi; Kawamori, Ai; Osada, Yutaka; Kitazawa, Masumi; Ishiguro, Makio

    2017-09-20

    Women's basal body temperature (BBT) shows a periodic pattern that is associated with the menstrual cycle. Although this fact suggests that daily BBT time series can be useful for estimating the underlying phase state as well as for predicting the length of the current menstrual cycle, little attention has been paid to modeling BBT time series. In this study, we propose a state-space model that involves the menstrual phase as a latent state variable to explain the daily fluctuation of BBT and the menstrual cycle length. Conditional distributions of the phase are obtained by using sequential Bayesian filtering techniques. A predictive distribution of the next menstruation day can be derived based on this conditional distribution and the model, leading to a novel statistical framework that provides a sequentially updated prediction of the upcoming menstruation day. We applied this framework to a real data set of women's BBT and menstruation days and compared the prediction accuracy of the proposed method with that of previous methods, showing that the proposed method generally provides a better prediction. Because BBT can be obtained with relatively little cost and effort, the proposed method can be useful for women's health management. Potential extensions of this framework as the basis for modeling and predicting events associated with the menstrual cycle are discussed. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
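
    One common way to implement the sequential Bayesian filtering described above is a particle filter over the latent phase. The toy below advances a phase on [0, 2π) each day, reweights particles by a Gaussian BBT likelihood, and reads off the days remaining until the phase wraps; every parameter (cycle length, temperature shift, noise levels) is invented for illustration and is not from the paper's model.

        import numpy as np

        rng = np.random.default_rng(4)
        n_particles, cycle_len = 5000, 28.0
        omega = 2 * np.pi / cycle_len                    # mean daily phase advance

        def bbt_mean(phase):                             # BBT elevated mid-to-late cycle
            return 36.3 + 0.25 * (np.sin(phase - np.pi / 2) > 0)

        phase = rng.uniform(0, 2 * np.pi, n_particles)   # initial phase belief
        for bbt_obs in [36.3, 36.3, 36.6, 36.6, 36.6]:   # a few days of readings
            phase = (phase + omega + rng.normal(0, 0.1, n_particles)) % (2 * np.pi)
            w = np.exp(-0.5 * ((bbt_obs - bbt_mean(phase)) / 0.1) ** 2)
            phase = phase[rng.choice(n_particles, n_particles, p=w / w.sum())]

        days_left = ((2 * np.pi - phase) % (2 * np.pi)) / omega
        print("predicted days to next menstruation:", round(float(np.median(days_left)), 1))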

  2. Track-pattern-based seasonal prediction model for intense tropical cyclone activities over the North Atlantic and the western North Pacific basins

    NASA Astrophysics Data System (ADS)

    Choi, W.; Ho, C. H.

    2015-12-01

    Intense tropical cyclones (TCs), accompanied by heavy rainfall and destructive wind gusts, can cause enormous socio-economic damage in the regions near their landfall. This study aims to analyze intense TC activity in the North Atlantic (NA) and western North Pacific (WNP) basins and to develop a track-pattern-based seasonal prediction model for it. Considering that the number of TCs in the NA basin is much smaller than that in the WNP basin, different intensity criteria are used: category 1 and above for the NA and category 3 and above for the WNP, based on the Saffir-Simpson hurricane wind scale. By using a fuzzy clustering method, intense TC tracks in the NA and WNP basins are classified into two and three representative patterns, respectively. Each pattern shows empirical relationships with climate variability such as the sea surface temperature distribution associated with El Niño/La Niña or the Atlantic Meridional Mode, the Pacific decadal oscillation, upper- and low-level zonal winds, and the strength of the subtropical high. A hybrid statistical-dynamical method was used to develop the seasonal prediction model for each pattern based on statistical relationships between intense TC activity and seasonally averaged key predictors. The model performance is statistically assessed by cross validation for the training period (1982-2013), and the model has been applied to the 2014 and 2015 predictions. This study suggests the applicability of this model to operational prediction work and provides a starting point for intense TC prediction.

  3. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

    PubMed

    Rivas, Elena; Lang, Raymond; Eddy, Sean R

    2012-02-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.

  4. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more

    PubMed Central

    Rivas, Elena; Lang, Raymond; Eddy, Sean R.

    2012-01-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases. PMID:22194308

  5. Statistical Mining of Predictability of Seasonal Precipitation over the United States

    NASA Technical Reports Server (NTRS)

    Lau, William K. M.; Kim, Kyu-Myong; Shen, S. P.

    2001-01-01

    Results from a new ensemble canonical correlation (ECC) prediction model yield a remarkable (10-20%) increase in baseline prediction skill for seasonal precipitation over the US for all seasons, compared to traditional statistical predictions. While the tropical Pacific, i.e., El Nino, contributes the largest share of potential predictability in the southern tier states during boreal winter, the North Pacific and the North Atlantic are responsible for enhanced predictability in the northern Great Plains, Midwest, and southwest US during boreal summer. Most importantly, ECC significantly reduces the spring predictability barrier over the conterminous US, thereby raising the skill bar for dynamical predictions.

  6. Gaussian covariance graph models accounting for correlated marker effects in genome-wide prediction.

    PubMed

    Martínez, C A; Khare, K; Rahman, S; Elzo, M A

    2017-10-01

    Several statistical models used in genome-wide prediction assume uncorrelated marker allele substitution effects, but it is known that these effects may be correlated. In statistics, graphical models have been identified as a useful tool for covariance estimation in high-dimensional problems, and this is an area that has recently experienced great expansion. In Gaussian covariance graph models (GCovGM), the joint distribution of a set of random variables is assumed to be Gaussian, and the pattern of zeros of the covariance matrix is encoded in terms of an undirected graph G. In this study, methods adapting the theory of GCovGM to genome-wide prediction were developed (Bayes GCov, Bayes GCov-KR and Bayes GCov-H). In simulated data sets, improvements in the correlation between phenotypes and predicted breeding values and in the accuracy of predicted breeding values were found. Our models account for correlation of marker effects and permit accommodating general covariance structures, as opposed to models proposed in previous studies, which consider spatial correlation only. In addition, they allow incorporation of biological information into the prediction process through its use in constructing the graph G, and their extension to the multi-allelic loci case is straightforward. © 2017 Blackwell Verlag GmbH.

  7. High-resolution vertical profiles of groundwater electrical conductivity (EC) and chloride from direct-push EC logs

    NASA Astrophysics Data System (ADS)

    Bourke, Sarah A.; Hermann, Kristian J.; Hendry, M. Jim

    2017-11-01

    Elevated groundwater salinity associated with produced water, leaching from landfills or secondary salinity can degrade arable soils and potable water resources. Direct-push electrical conductivity (EC) profiling enables rapid, relatively inexpensive, high-resolution in-situ measurements of subsurface salinity, without requiring core collection or installation of groundwater wells. However, because the direct-push tool measures the bulk EC of both solid and liquid phases (ECa), incorporation of ECa data into regional or historical groundwater data sets requires the prediction of pore water EC (ECw) or chloride (Cl-) concentrations from measured ECa. Statistical linear regression and physically based models for predicting ECw and Cl- from ECa profiles were tested on a brine plume in central Saskatchewan, Canada. A linear relationship between ECa/ECw and porosity was more accurate for predicting ECw and Cl- concentrations than a power-law relationship (Archie's Law). Despite clay contents of up to 96%, the addition of terms to account for electrical conductance in the solid phase did not improve model predictions. In the absence of porosity data, statistical linear regression models adequately predicted ECw and Cl- concentrations from direct-push ECa profiles (ECw = 5.48 ECa + 0.78, R2 = 0.87; Cl- = 1,978 ECa - 1,398, R2 = 0.73). These statistical models can be used to predict ECw in the absence of lithologic data and will be particularly useful for initial site assessments. The more accurate linear physically based model can be used to predict ECw and Cl- as porosity data become available and the site-specific ECw-Cl- relationship is determined.
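
    Because the regression coefficients are reported in the abstract, applying them to a new log is a one-liner per quantity. The snippet below assumes the quoted site-specific relationships ECw = 5.48 ECa + 0.78 and Cl- = 1,978 ECa - 1,398; the example ECa readings are invented, and units follow the original study.

        import numpy as np

        ec_bulk = np.array([0.8, 1.1, 2.3, 3.0])    # example direct-push ECa readings

        ec_water = 5.48 * ec_bulk + 0.78            # predicted pore-water EC (R2 = 0.87)
        chloride = 1978 * ec_bulk - 1398            # predicted Cl- concentration (R2 = 0.73)

        for eca, ecw, cl in zip(ec_bulk, ec_water, chloride):
            print(f"ECa = {eca:4.1f} -> ECw = {ecw:5.2f}, Cl- = {cl:7.0f}")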

  8. A digital spatial predictive model of land-use change using economic and environmental inputs and a statistical tree classification approach: Thailand, 1970s--1990s

    NASA Astrophysics Data System (ADS)

    Felkner, John Sames

    The scale and extent of global land use change are massive, with potentially powerful effects on the global climate and global atmospheric composition (Turner & Meyer, 1994). Because of this tremendous change and impact, there is an urgent need for quantitative, empirical models of land use change, especially predictive models with an ability to capture the trajectories of change (Agarwal, Green, Grove, Evans, & Schweik, 2000; Lambin et al., 1999). For this research, a spatial statistical predictive model of land use change was created and run in two provinces of Thailand. The model utilized an extensive spatial database and used a classification tree approach for explanatory model creation and future land use prediction (Breiman, Friedman, Olshen, & Stone, 1984). Eight input variables were used, and the trees were run on a dependent variable of land use change measured from 1979 to 1989 using classified satellite imagery. The derived tree models were used to create probability-of-change surfaces, which were then used to create predicted land cover maps for 1999. These predicted 1999 maps were compared with actual 1999 land cover derived from 1999 Landsat 7 imagery. The primary research hypothesis was that an explanatory model using both economic and environmental input variables would better predict future land use change than would either a model using only economic variables or a model using only environmental variables. Thus, the eight input variables included four economic and four environmental variables. The results indicated a very slight superiority of the full models in predicting future agricultural change and future deforestation, but a slight superiority of the economic models in predicting future built change. However, the margins of superiority were too small to be statistically significant. The resulting tree structures were used, however, to derive a series of principles or "rules" governing land use change in both provinces. The model was able to predict future land use, given a series of assumptions, with 90 percent overall accuracy. The model can be used in other developing or developed country locations for future land use prediction, determination of future threatened areas, or derivation of "rules" or principles driving land use change.

  9. Evaluation of Fast-Time Wake Vortex Prediction Models

    NASA Technical Reports Server (NTRS)

    Proctor, Fred H.; Hamilton, David W.

    2009-01-01

    Current fast-time wake models are reviewed and three basic types are defined. Predictions from several of the fast-time models are compared. Previous statistical evaluations of the APA-Sarpkaya and D2P fast-time models are discussed. Root Mean Square errors between fast-time model predictions and Lidar wake measurements are examined for a 24 hr period at Denver International Airport. Shortcomings in current methodology for evaluating wake errors are also discussed.

  10. Derivation and validation of in-hospital mortality prediction models in ischaemic stroke patients using administrative data.

    PubMed

    Lee, Jason; Morishima, Toshitaka; Kunisawa, Susumu; Sasaki, Noriko; Otsubo, Tetsuya; Ikai, Hiroshi; Imanaka, Yuichi

    2013-01-01

    Stroke and other cerebrovascular diseases are a major cause of death and disability. Predicting in-hospital mortality in ischaemic stroke patients can help to identify high-risk patients and guide treatment approaches. Chart reviews provide important clinical information for mortality prediction, but are laborious and limit sample sizes. Administrative data allow for large-scale multi-institutional analyses but lack the necessary clinical information for outcome research. However, administrative claims data in Japan have recently begun to include patient consciousness and disability information, which may allow more accurate mortality prediction using administrative data alone. The aim of this study was to derive and validate models to predict in-hospital mortality in patients admitted for ischaemic stroke using administrative data. The sample consisted of 21,445 patients from 176 Japanese hospitals, who were randomly divided into derivation and validation subgroups. Multivariable logistic regression models were developed using 7- and 30-day and overall in-hospital mortality as dependent variables. Independent variables included patient age, sex, comorbidities upon admission, Japan Coma Scale (JCS) score, Barthel Index score, modified Rankin Scale (mRS) score, and admissions after hours and on weekends/public holidays. Models were developed in the derivation subgroup, and coefficients from these models were applied to the validation subgroup. Predictive ability was analysed using C-statistics; calibration was evaluated with Hosmer-Lemeshow χ² tests. All three models showed predictive abilities similar to or surpassing those of chart review-based models. The C-statistics were highest in the 7-day in-hospital mortality prediction model, at 0.906 and 0.901 in the derivation and validation subgroups, respectively. For the 30-day in-hospital mortality prediction models, the C-statistics for the derivation and validation subgroups were 0.893 and 0.872, respectively; in overall in-hospital mortality prediction these values were 0.883 and 0.876. In this study, we have derived and validated in-hospital mortality prediction models for three different time spans using a large population of ischaemic stroke patients in a multi-institutional analysis. The recent inclusion of JCS, Barthel Index, and mRS scores in Japanese administrative data has allowed the prediction of in-hospital mortality with accuracy comparable to that of chart review analyses. The models developed using administrative data had consistently high predictive abilities in both the derivation and validation subgroups. These results have implications for the role of administrative data in future mortality prediction analyses. Copyright © 2013 S. Karger AG, Basel.
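
    The derivation/validation split and the C-statistic are standard enough to sketch. The example below fits a logistic model on synthetic stand-ins for the clinical predictors and reports the C-statistic (ROC AUC) on both subgroups; none of the variables or coefficients correspond to the study's actual data.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(5)
        n = 20_000
        X = rng.normal(size=(n, 5))   # stand-ins for age, comorbidity, JCS, Barthel, mRS
        logit = -4.0 + X @ np.array([0.8, 0.5, 1.2, -0.9, 0.6])
        y = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))   # in-hospital death indicator

        X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.5, random_state=0)
        model = LogisticRegression().fit(X_dev, y_dev)

        # C-statistic (discrimination) in the derivation and validation subgroups
        print("derivation C:", round(roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1]), 3))
        print("validation C:", round(roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]), 3))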

  11. Improvement of PM concentration predictability using WRF-CMAQ-DLM coupled system and its applications

    NASA Astrophysics Data System (ADS)

    Lee, Soon Hwan; Kim, Ji Sun; Lee, Kang Yeol; Shon, Keon Tae

    2017-04-01

    Air quality in Korea is deteriorating due to increasing particulate matter (PM). At present, PM forecasts are issued based on the PM concentrations predicted by numerical air quality models. However, forecast accuracy is not as high as expected due to various uncertainties in the physical and chemical characteristics of PM. The purpose of this study was to develop a numerical-statistical ensemble model to improve the accuracy of PM10 concentration predictions. The numerical models used in this study are the three-dimensional atmospheric Weather Research and Forecasting (WRF) model and the Community Multiscale Air Quality (CMAQ) model. The target areas for the PM forecast are the Seoul, Busan, Daegu, and Daejeon metropolitan areas in Korea. The data used in the model development are observed PM concentrations and CMAQ predictions over a 3-month period (March 1 - May 31, 2014). A dynamic-statistical technique for reducing the systematic error of the CMAQ predictions was implemented as a dynamic linear model (DLM) based on Bayesian Kalman filtering. Applying the corrections generated by the dynamic linear model to the forecasting of PM concentrations improved accuracy, especially at high PM concentrations, where the damage is relatively large.
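
    The bias-tracking role of the DLM can be illustrated with a scalar Kalman filter that learns the slowly varying systematic error of a raw numerical forecast. The sketch below is not the study's DLM; the bias dynamics, noise variances, and data are all invented.

        import numpy as np

        rng = np.random.default_rng(6)
        days = 90
        truth = 50 + 20 * np.sin(np.arange(days) / 9) + rng.normal(0, 5, days)
        raw = truth + 15 + rng.normal(0, 8, days)     # numerical forecast with +15 bias

        bias, var = 0.0, 25.0                         # prior belief about the bias state
        q, r = 1.0, 64.0                              # process / observation noise variances
        corrected = np.empty(days)
        for t in range(days):
            var += q                                  # predict: bias persists with drift
            corrected[t] = raw[t] - bias              # forecast issued for day t
            k = var / (var + r)                       # update once the observation arrives
            bias += k * ((raw[t] - truth[t]) - bias)
            var *= 1 - k

        print(f"RMSE raw:       {np.sqrt(np.mean((raw - truth) ** 2)):.2f}")
        print(f"RMSE corrected: {np.sqrt(np.mean((corrected - truth) ** 2)):.2f}")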

  12. Statistical prediction of September Arctic Sea Ice minimum based on stable teleconnections with global climate and oceanic patterns

    NASA Astrophysics Data System (ADS)

    Ionita, M.; Grosfeld, K.; Scholz, P.; Lohmann, G.

    2016-12-01

    Sea ice in both polar regions is an important indicator of the expression of global climate change and its polar amplification. Consequently, there is broad interest in information on sea ice, its coverage, variability, and long-term change. Knowledge of sea ice requires high-quality data on ice extent, thickness, and dynamics. However, its predictability depends on various climate parameters and conditions. In order to provide insights into the potential development of a monthly/seasonal signal, we developed a robust statistical model based on ocean heat content, sea surface temperature and atmospheric variables to calculate an estimate of the September minimum sea ice extent for every year. Although previous statistical attempts at monthly/seasonal forecasts of the September sea ice minimum show relatively limited skill, here it is shown that more than 97% of the variance in September sea ice extent (r = 0.98) can be predicted three months in advance by using previous months' conditions via a multiple linear regression model based on global sea surface temperature (SST), mean sea level pressure (SLP), air temperature at 850 hPa (TT850), surface winds, and sea ice extent persistence. The statistical model is based on the identification of regions with stable teleconnections between the predictors (climatological parameters) and the predictand (here, sea ice extent). The results based on our statistical model contribute to the sea ice prediction network for the sea ice outlook report (https://www.arcus.org/sipn) and could provide a tool for identifying relevant regions and climate parameters that are important for sea ice development in the Arctic, and for detecting sensitive and critical regions in global coupled climate models with a focus on sea ice formation.
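
    The stable-teleconnection screening can be sketched as a stability filter: a grid cell qualifies as a predictor region only if its correlation with September sea ice extent keeps roughly the same sign and strength in independent sub-periods. The toy below plants one truly linked SST cell in random data and recovers it; the 0.5 cutoff and half-period split are illustrative choices, not the paper's exact criteria.

        import numpy as np

        rng = np.random.default_rng(7)
        years, ncells = 36, 500
        sst = rng.normal(size=(years, ncells))               # gridded SST anomalies
        ice = -0.8 * sst[:, 42] + rng.normal(0, 0.5, years)  # extent tied to cell 42

        halves = [slice(0, years // 2), slice(years // 2, years)]

        def corr(a, b):
            return np.corrcoef(a, b)[0, 1]

        stable = [
            j for j in range(ncells)
            if all(abs(corr(sst[h, j], ice[h])) > 0.5 for h in halves)
            and corr(sst[halves[0], j], ice[halves[0]])
                * corr(sst[halves[1], j], ice[halves[1]]) > 0   # same sign in both halves
        ]
        print("stable predictor cells:", stable)             # should recover cell 42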

  13. SPATIO-TEMPORAL MODELING OF AGRICULTURAL YIELD DATA WITH AN APPLICATION TO PRICING CROP INSURANCE CONTRACTS

    PubMed Central

    Ozaki, Vitor A.; Ghosh, Sujit K.; Goodwin, Barry K.; Shirota, Ricardo

    2009-01-01

    This article presents a statistical model of agricultural yield data based on a set of hierarchical Bayesian models that allows joint modeling of temporal and spatial autocorrelation. This method captures a comprehensive range of the various uncertainties involved in predicting crop insurance premium rates as opposed to the more traditional ad hoc, two-stage methods that are typically based on independent estimation and prediction. A panel data set of county-average yield data was analyzed for 290 counties in the State of Paraná (Brazil) for the period of 1990 through 2002. Posterior predictive criteria are used to evaluate different model specifications. This article provides substantial improvements in the statistical and actuarial methods often applied to the calculation of insurance premium rates. These improvements are especially relevant to situations where data are limited. PMID:19890450

  14. Predicting the stochastic guiding of kinesin-driven microtubules in microfabricated tracks: a statistical-mechanics-based modeling approach.

    PubMed

    Lin, Chih-Tin; Meyhofer, Edgar; Kurabayashi, Katsuo

    2010-01-01

    Directional control of microtubule shuttles via microfabricated tracks is key to the development of controlled nanoscale mass transport by kinesin motor molecules. Here we develop and test a model to quantitatively predict the stochastic behavior of microtubule guiding when they mechanically collide with the sidewalls of lithographically patterned tracks. By taking into account appropriate probability distributions of microscopic states of the microtubule system, the model allows us to theoretically analyze the roles of collision conditions and kinesin surface densities in determining how the motion of microtubule shuttles is controlled. In addition, we experimentally observe the statistics of microtubule collision events and compare our theoretical prediction with experimental data to validate our model. The model will direct the design of future hybrid nanotechnology devices that integrate nanoscale transport systems powered by kinesin-driven molecular shuttles.

  15. Artificial intelligence in predicting bladder cancer outcome: a comparison of neuro-fuzzy modeling and artificial neural networks.

    PubMed

    Catto, James W F; Linkens, Derek A; Abbod, Maysam F; Chen, Minyou; Burton, Julian L; Feeley, Kenneth M; Hamdy, Freddie C

    2003-09-15

    New techniques for the prediction of tumor behavior are needed, because statistical analysis has poor accuracy and is not applicable to the individual. Artificial intelligence (AI) may provide suitable methods. Whereas artificial neural networks (ANN), the best-studied form of AI, have been used successfully, their hidden networks remain an obstacle to their acceptance. Neuro-fuzzy modeling (NFM), another AI method, has a transparent functional layer and is without many of the drawbacks of ANN. We have compared the predictive accuracies of NFM, ANN, and traditional statistical methods for the behavior of bladder cancer. Experimental molecular biomarkers, including p53 and the mismatch repair proteins, and conventional clinicopathological data were studied in a cohort of 109 patients with bladder cancer. For all three methods, models were produced to predict the presence and timing of a tumor relapse. Both AI methods predicted relapse with an accuracy ranging from 88% to 95%. This was superior to statistical methods (71-77%; P < 0.0006). NFM appeared better than ANN at predicting the timing of relapse (P = 0.073). The use of AI can accurately predict cancer behavior. NFM has a similar or superior predictive accuracy to ANN. However, unlike the impenetrable "black box" of a neural network, the rules of NFM are transparent, enabling validation from clinical knowledge and the manipulation of input variables to allow exploratory predictions. This technique could be used widely in a variety of areas of medicine.

  16. Statistical Analysis of Complexity Generators for Cost Estimation

    NASA Technical Reports Server (NTRS)

    Rowell, Ginger Holmes

    1999-01-01

    Predicting the cost of cutting edge new technologies involved with spacecraft hardware can be quite complicated. A new feature of the NASA Air Force Cost Model (NAFCOM), called the Complexity Generator, is being developed to model the complexity factors that drive the cost of space hardware. This parametric approach is also designed to account for the differences in cost, based on factors that are unique to each system and subsystem. The cost driver categories included in this model are weight, inheritance from previous missions, technical complexity, and management factors. This paper explains the Complexity Generator framework, the statistical methods used to select the best model within this framework, and the procedures used to find the region of predictability and the prediction intervals for the cost of a mission.

  17. A statistical learning framework for groundwater nitrate models of the Central Valley, California, USA

    USGS Publications Warehouse

    Nolan, Bernard T.; Fienen, Michael N.; Lorenz, David L.

    2015-01-01

    We used a statistical learning framework to evaluate the ability of three machine-learning methods to predict nitrate concentration in shallow groundwater of the Central Valley, California: boosted regression trees (BRT), artificial neural networks (ANN), and Bayesian networks (BN). Machine learning methods can learn complex patterns in the data but because of overfitting may not generalize well to new data. The statistical learning framework involves cross-validation (CV) training and testing data and a separate hold-out data set for model evaluation, with the goal of optimizing predictive performance by controlling for model overfit. The order of prediction performance according to both CV testing R2 and that for the hold-out data set was BRT > BN > ANN. For each method we identified two models based on CV testing results: that with maximum testing R2 and a version with R2 within one standard error of the maximum (the 1SE model). The former yielded CV training R2 values of 0.94–1.0. Cross-validation testing R2 values indicate predictive performance, and these were 0.22–0.39 for the maximum R2 models and 0.19–0.36 for the 1SE models. Evaluation with hold-out data suggested that the 1SE BRT and ANN models predicted better for an independent data set compared with the maximum R2 versions, which is relevant to extrapolation by mapping. Scatterplots of predicted vs. observed hold-out data obtained for final models helped identify prediction bias, which was fairly pronounced for ANN and BN. Lastly, the models were compared with multiple linear regression (MLR) and a previous random forest regression (RFR) model. Whereas BRT results were comparable to RFR, MLR had low hold-out R2 (0.07) and explained less than half the variation in the training data. Spatial patterns of predictions by the final, 1SE BRT model agreed reasonably well with previously observed patterns of nitrate occurrence in groundwater of the Central Valley.
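
    The "1SE" rule used above is easy to demonstrate: among cross-validated candidates, keep both the model with maximum testing R2 and the simplest model whose mean R2 lies within one standard error of that maximum. The sketch below applies the rule to a ridge penalty path on synthetic data; it illustrates the selection rule only, not the study's BRT/ANN/BN pipeline.

        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import KFold

        rng = np.random.default_rng(8)
        X = rng.normal(size=(300, 20))
        y = X[:, :3] @ np.array([1.0, -1.0, 0.5]) + rng.normal(0, 1.0, 300)

        alphas = [0.01, 0.1, 1, 10, 100, 1000]   # larger alpha = simpler (more shrunken) model
        scores = np.empty((len(alphas), 5))
        for i, a in enumerate(alphas):
            folds = KFold(n_splits=5, shuffle=True, random_state=0).split(X)
            for k, (tr, te) in enumerate(folds):
                scores[i, k] = Ridge(alpha=a).fit(X[tr], y[tr]).score(X[te], y[te])

        mean, se = scores.mean(axis=1), scores.std(axis=1) / np.sqrt(5)
        best = int(np.argmax(mean))
        one_se = max(i for i in range(len(alphas)) if mean[i] >= mean[best] - se[best])
        print("max-R2 alpha:", alphas[best], "  1SE alpha:", alphas[one_se])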

  18. Watershed Regressions for Pesticides (WARP) for Predicting Annual Maximum and Annual Maximum Moving-Average Concentrations of Atrazine in Streams

    USGS Publications Warehouse

    Stone, Wesley W.; Gilliom, Robert J.; Crawford, Charles G.

    2008-01-01

    Regression models were developed for predicting annual maximum and selected annual maximum moving-average concentrations of atrazine in streams using the Watershed Regressions for Pesticides (WARP) methodology developed by the National Water-Quality Assessment Program (NAWQA) of the U.S. Geological Survey (USGS). The current effort builds on the original WARP models, which were based on the annual mean and selected percentiles of the annual frequency distribution of atrazine concentrations. Estimates of annual maximum and annual maximum moving-average concentrations for selected durations are needed to characterize the levels of atrazine and other pesticides for comparison to specific water-quality benchmarks for evaluation of potential concerns regarding human health or aquatic life. Separate regression models were derived for the annual maximum and annual maximum 21-day, 60-day, and 90-day moving-average concentrations. Development of the regression models used the same explanatory variables, transformations, model development data, model validation data, and regression methods as those used in the original development of WARP. The models accounted for 72 to 75 percent of the variability in the concentration statistics among the 112 sampling sites used for model development. Predicted concentration statistics from the four models were within a factor of 10 of the observed concentration statistics for most of the model development and validation sites. Overall, performance of the models for the development and validation sites supports the application of the WARP models for predicting annual maximum and selected annual maximum moving-average atrazine concentration in streams and provides a framework to interpret the predictions in terms of uncertainty. For streams with inadequate direct measurements of atrazine concentrations, the WARP model predictions for the annual maximum and the annual maximum moving-average atrazine concentrations can be used to characterize the probable levels of atrazine for comparison to specific water-quality benchmarks. Sites with a high probability of exceeding a benchmark for human health or aquatic life can be prioritized for monitoring.

  19. Prediction of N-nitrosodimethylamine (NDMA) formation as a disinfection by-product.

    PubMed

    Kim, Jongo; Clevenger, Thomas E

    2007-06-25

    This study investigated the possibility of applying a statistical model for the prediction of N-nitrosodimethylamine (NDMA) formation. NDMA formation was studied as a function of monochloramine concentration (0.001-5 mM) at fixed dimethylamine (DMA) concentrations of 0.01 mM or 0.05 mM. Excellent linear correlations were observed between the molar ratio of monochloramine to DMA and NDMA formation on a log scale at pH 7 and 8. When the developed prediction equation was applied to a previously reported study, a good result was obtained. The statistical model appears to adequately predict NDMA concentrations if other NDMA precursors are excluded. Using this predictive tool, a simple and approximate estimate of NDMA formation can be obtained in drinking water systems.

  20. Recent development of risk-prediction models for incident hypertension: An updated systematic review

    PubMed Central

    Xiao, Lei; Liu, Ya; Wang, Zuoguang; Li, Chuang; Jin, Yongxin; Zhao, Qiong

    2017-01-01

    Background Hypertension is a leading global health threat and a major cardiovascular disease. Since clinical interventions are effective in delaying the disease progression from prehypertension to hypertension, diagnostic prediction models to identify patient populations at high risk for hypertension are imperative. Methods Both PubMed and Embase databases were searched for eligible reports of either prediction models or risk scores of hypertension. The study data were collected, including risk factors, statistic methods, characteristics of study design and participants, performance measurement, etc. Results From the searched literature, 26 studies reporting 48 prediction models were selected. Among them, 20 reports studied the established models using traditional risk factors, such as body mass index (BMI), age, smoking, blood pressure (BP) level, parental history of hypertension, and biochemical factors, whereas 6 reports used genetic risk score (GRS) as the prediction factor. AUC ranged from 0.64 to 0.97, and C-statistic ranged from 60% to 90%. Conclusions The traditional models are still the predominant risk prediction models for hypertension, but recently, more models have begun to incorporate genetic factors as part of their model predictors. However, these genetic predictors need to be well selected. The current reported models have acceptable to good discrimination and calibration ability, but whether the models can be applied in clinical practice still needs more validation and adjustment. PMID:29084293

  1. Modeling the microstructurally dependent mechanical properties of poly(ester-urethane-urea)s.

    PubMed

    Warren, P Daniel; Sycks, Dalton G; McGrath, Dominic V; Vande Geest, Jonathan P

    2013-12-01

    Poly(ester-urethane-urea) (PEUU) is one of many synthetic biodegradable elastomers under scrutiny for biomedical and soft tissue applications. The goal of this study was to investigate the effect of the experimental parameters on mechanical properties of PEUUs following exposure to different degrading environments, similar to that of the human body, using linear regression, producing one predictive model. The model utilizes two independent variables of poly(caprolactone) (PCL) type and copolymer crystallinity to predict the dependent variable of maximum tangential modulus (MTM). Results indicate that comparisons between PCLs at different degradation states are statistically different (p < 0.0003), while the difference between experimental and predicted average MTM is statistically negligible (p < 0.02). The linear correlation between experimental and predicted MTM values is R2 = 0.75. Copyright © 2013 Wiley Periodicals, Inc., a Wiley Company.

  2. A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress

    PubMed Central

    2018-01-01

    The issue of financial distress prediction is an important and challenging research topic in the financial field. Currently, there are many methods for predicting firm bankruptcy and financial crisis, including artificial intelligence and traditional statistical methods, and past studies have shown that the prediction results of artificial intelligence methods are better than those of traditional statistical methods. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed a nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposes a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages, including the following: (i) the proposed model differs from previous models, which lack the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high-dimensional data; and (iii) the proposed model can generate the rules and mathematical formulas of financial distress for providing references to investors and decision makers. The results show that the proposed method is better than the listed classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies. PMID:29765399

  3. A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress.

    PubMed

    Cheng, Ching-Hsue; Chan, Chia-Pang; Yang, Jun-He

    2018-01-01

    The issue of financial distress prediction is an important and challenging research topic in the financial field. Currently, there are many methods for predicting firm bankruptcy and financial crisis, including artificial intelligence and traditional statistical methods, and past studies have shown that the prediction results of artificial intelligence methods are better than those of traditional statistical methods. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed a nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposes a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages, including the following: (i) the proposed model differs from previous models, which lack the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high-dimensional data; and (iii) the proposed model can generate the rules and mathematical formulas of financial distress for providing references to investors and decision makers. The results show that the proposed method is better than the listed classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies.

  4. Performance of Reclassification Statistics in Comparing Risk Prediction Models

    PubMed Central

    Paynter, Nina P.

    2012-01-01

    Concerns have been raised about the use of traditional measures of model fit in evaluating risk prediction models for clinical use, and reclassification tables have been suggested as an alternative means of assessing the clinical utility of a model. Several measures based on the table have been proposed, including the reclassification calibration (RC) statistic, the net reclassification improvement (NRI), and the integrated discrimination improvement (IDI), but the performance of these in practical settings has not been fully examined. We used simulations to estimate the type I error and power for these statistics in a number of scenarios, as well as the impact of the number and type of categories, when adding a new marker to an established or reference model. The type I error was found to be reasonable in most settings, and power was highest for the IDI, which was similar to the test of association. The relative power of the RC statistic, a test of calibration, and the NRI, a test of discrimination, varied depending on the model assumptions. These tools provide unique but complementary information. PMID:21294152
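
    Of the three measures, the NRI is the most direct to compute from a reclassification table: it credits a new model for moving events into higher risk categories and non-events into lower ones. The snippet below is a generic sketch with synthetic risks and arbitrary category cut points, not the simulation design of the paper.

        import numpy as np

        def nri(risk_old, risk_new, event, cuts=(0.05, 0.10, 0.20)):
            cat_old = np.digitize(risk_old, cuts)       # risk category under each model
            cat_new = np.digitize(risk_new, cuts)
            up, down = cat_new > cat_old, cat_new < cat_old
            ev = event.astype(bool)
            nri_events = up[ev].mean() - down[ev].mean()        # events should move up
            nri_nonevents = down[~ev].mean() - up[~ev].mean()   # non-events should move down
            return nri_events + nri_nonevents

        rng = np.random.default_rng(9)
        n = 5000
        risk_old = rng.uniform(0, 0.3, n)
        risk_new = np.clip(risk_old + rng.normal(0, 0.03, n), 0, 1)
        event = rng.uniform(size=n) < risk_new          # outcomes track the new model
        print("NRI:", round(nri(risk_old, risk_new, event), 3))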

  5. Statistical model selection for better prediction and discovering science mechanisms that affect reliability

    DOE PAGES

    Anderson-Cook, Christine M.; Morzinski, Jerome; Blecker, Kenneth D.

    2015-08-19

    Understanding the impact of production, environmental exposure and age characteristics on the reliability of a population is frequently based on underlying science and empirical assessment. When there is incomplete science to prescribe which inputs should be included in a model of reliability to predict future trends, statistical model/variable selection techniques can be leveraged on a stockpile or population of units to improve reliability predictions as well as to suggest new mechanisms affecting reliability to explore. We describe a five-step process for exploring relationships between available summaries of age, usage and environmental exposure and reliability. The process involves first identifying potential candidate inputs and second organizing the data for the analysis. Third, a variety of models with different combinations of the inputs are estimated, and fourth, flexible metrics are used to compare them. Fifth, plots of the predicted relationships are examined to distill leading model contenders into a prioritized list for subject matter experts to understand and compare. The complexity of the model, quality of prediction and cost of future data collection are all factors to be considered by the subject matter experts when selecting a final model.

  6. Predicting the aquatic toxicity mode of action using logistic regression and linear discriminant analysis.

    PubMed

    Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X

    2016-09-01

    The paper highlights the use of the logistic regression (LR) method in the construction of acceptable, statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. All essentials for a reliable model were considered carefully. The model predictors were selected by stepwise forward linear discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were examined through the influence plot, which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed.

  7. Jet Noise Diagnostics Supporting Statistical Noise Prediction Methods

    NASA Technical Reports Server (NTRS)

    Bridges, James E.

    2006-01-01

    The primary focus of my presentation is the development of the jet noise prediction code JeNo, with most examples coming from the experimental work that drove the theoretical development and validation. JeNo is a statistical jet noise prediction code based upon the Lilley acoustic analogy. Our approach uses time-averaged 2-D or 3-D mean and turbulent statistics of the flow as input. The output is source distributions and spectral directivity. NASA has been investing in the development of statistical jet noise prediction tools because these seem to fit the middle ground that allows enough flexibility and fidelity for jet noise source diagnostics while having reasonable computational requirements. These tools rely on Reynolds-averaged Navier-Stokes (RANS) computational fluid dynamics (CFD) solutions as input for computing far-field spectral directivity using an acoustic analogy. There are many ways acoustic analogies can be created, each with a series of assumptions and models, many often taken unknowingly. And the resulting prediction can be easily reverse-engineered by altering the models contained within. However, only an approach which is mathematically sound, with assumptions validated and modeled quantities checked against direct measurement, will give consistently correct answers. Many quantities are modeled in acoustic analogies precisely because they have been impossible to measure or calculate, making this requirement a difficult task. The NASA team has spent considerable effort identifying all the assumptions and models used to take the Navier-Stokes equations to the point of a statistical calculation via an acoustic analogy very similar to that proposed by Lilley. Assumptions have been identified and experiments have been developed to test these assumptions. In some cases this has resulted in assumptions being changed. Beginning with the CFD used as input to the acoustic analogy, models for turbulence closure used in RANS CFD codes have been explored and compared against measurements of mean and rms velocity statistics over a range of jet speeds and temperatures. Models for flow parameters used in the acoustic analogy, most notably the space-time correlations of velocity, have been compared against direct measurements and modified to better fit the observed data. These measurements have been extremely challenging for hot, high-speed jets, and represent a sizeable investment in instrumentation development. As an intermediate check that the analysis is predicting the physics intended, phased arrays have been employed to measure source distributions for a wide range of jet cases. And finally, careful far-field spectral directivity measurements have been taken for final validation of the prediction code. Examples of each of these experimental efforts will be presented. The main result of these efforts is a noise prediction code, named JeNo, which is in mid-development. JeNo is able to consistently predict spectral directivity, including aft angle directivity, for subsonic cold jets of most geometries. Current development of JeNo is focused on extending its capability to hot jets, requiring inclusion of a previously neglected second source associated with thermal fluctuations. A secondary result of the intensive experimentation is the archiving of various flow statistics applicable to other acoustic analogies and to the development of time-resolved prediction methods. These will be of lasting value as we look ahead at future challenges for the aeroacoustic experimentalist.

  8. Statistical Learning Theory for High Dimensional Prediction: Application to Criterion-Keyed Scale Development

    PubMed Central

    Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul

    2016-01-01

    Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different from maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms (Supervised Principal Components, Regularization, and Boosting) can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach (or perhaps because of them), SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
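
    As a rough illustration of EPE minimization via cross-validation, the sketch below fits an L1-penalized logistic regression to a synthetic item pool and keeps only the items with nonzero coefficients. The data, base rate, and penalty grid are all invented, and the paper's actual algorithms (Supervised Principal Components, Boosting) are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a large personality item pool (n people x p items).
rng = np.random.default_rng(1)
n, p = 500, 200
items = rng.integers(1, 6, size=(n, p)).astype(float)   # Likert 1-5 items
beta = np.zeros(p)
beta[:10] = 0.4                                          # 10 informative items
logit = (items - 3.0) @ beta - 1.0
died = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

# L1-penalized logistic regression; the penalty strength is chosen by
# cross-validation, i.e. by estimating expected prediction error rather
# than maximizing the within-sample likelihood.
X = StandardScaler().fit_transform(items)
model = LogisticRegressionCV(penalty="l1", solver="saga", Cs=10, cv=5,
                             scoring="neg_log_loss", max_iter=5000)
model.fit(X, died)
kept = np.flatnonzero(model.coef_)
print(f"{kept.size} of {p} items retained for the criterion-keyed scale")
```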

  9. Aqua/Aura Updated Inclination Adjust Maneuver Performance Prediction Model

    NASA Technical Reports Server (NTRS)

    Boone, Spencer

    2017-01-01

    This presentation will discuss the updated Inclination Adjust Maneuver (IAM) performance prediction model that was developed for Aqua and Aura following the 2017 IAM series. This updated model uses statistical regression methods to identify potential long-term trends in maneuver parameters, yielding improved predictions when re-planning past maneuvers. The presentation has been reviewed and approved by Eric Moyer, ESMO Deputy Project Manager.

  10. A Public-Private Partnership Develops and Externally Validates a 30-Day Hospital Readmission Risk Prediction Model

    PubMed Central

    Choudhry, Shahid A.; Li, Jing; Davis, Darcy; Erdmann, Cole; Sikka, Rishi; Sutariya, Bharat

    2013-01-01

    Introduction: Preventing the occurrence of hospital readmissions is needed to improve quality of care and foster population health across the care continuum. Hospitals are being held accountable for improving transitions of care to avert unnecessary readmissions. Advocate Health Care in Chicago and Cerner (ACC) collaborated to develop all-cause, 30-day hospital readmission risk prediction models to identify patients that need interventional resources. Ideally, prediction models should encompass several qualities: they should have high predictive ability; use reliable and clinically relevant data; use rigorous performance metrics to assess the models; be validated in populations where they are applied; and be scalable in heterogeneous populations. However, a systematic review of prediction models for hospital readmission risk determined that most performed poorly (average C-statistic of 0.66) and efforts to improve their performance are needed for widespread usage. Methods: The ACC team incorporated electronic health record data, utilized a mixed-method approach to evaluate risk factors, and externally validated their prediction models for generalizability. Inclusion and exclusion criteria were applied on the patient cohort and then split for derivation and internal validation. Stepwise logistic regression was performed to develop two predictive models: one for admission and one for discharge. The prediction models were assessed for discrimination ability, calibration, overall performance, and then externally validated. Results: The ACC Admission and Discharge Models demonstrated modest discrimination ability during derivation, internal and external validation post-recalibration (C-statistic of 0.76 and 0.78, respectively), and reasonable model fit during external validation for utility in heterogeneous populations. Conclusions: The ACC Admission and Discharge Models embody the design qualities of ideal prediction models. The ACC plans to continue its partnership to further improve and develop valuable clinical models. PMID:24224068
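
    The C-statistic reported above is the area under the ROC curve of the predicted risks. A minimal sketch of computing it for a derivation/validation split, on synthetic data rather than the ACC cohort:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for derivation/validation cohorts (not ACC data).
rng = np.random.default_rng(2)
n = 4000
X = rng.normal(size=(n, 8))                  # e.g. utilization, labs, vitals
beta = np.array([0.8, 0.5, 0.3, 0.0, 0.0, 0.4, 0.0, 0.2])
p = 1 / (1 + np.exp(-(X @ beta - 2.2)))
readmit = rng.random(n) < p                  # ~10% 30-day readmission rate

X_dev, X_val, y_dev, y_val = train_test_split(X, readmit, test_size=0.33,
                                              random_state=0)
model = LogisticRegression().fit(X_dev, y_dev)

# The c-statistic is the area under the ROC curve of the predicted risk.
c_dev = roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1])
c_val = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"derivation c = {c_dev:.3f}, validation c = {c_val:.3f}")
```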

  11. Statistical Approaches for Spatiotemporal Prediction of Low Flows

    NASA Astrophysics Data System (ADS)

    Fangmann, A.; Haberlandt, U.

    2017-12-01

    An adequate assessment of regional climate change impacts on streamflow requires the integration of various sources of information and modeling approaches. This study proposes simple statistical tools for inclusion into model ensembles, which are fast and straightforward in their application, yet able to yield accurate streamflow predictions in time and space. Target variables for all approaches are annual low flow indices derived from a data set of 51 records of average daily discharge for northwestern Germany. The models require input of climatic data in the form of meteorological drought indices, derived from observed daily climatic variables, averaged over the streamflow gauges' catchment areas. Four different modeling approaches are analyzed. All four are based on multiple linear regression models that estimate low flows as a function of a set of meteorological indices and/or physiographic and climatic catchment descriptors. For the first method, individual regression models are fitted at each station, predicting annual low flow values from a set of annual meteorological indices, which are subsequently regionalized using a set of catchment characteristics. The second method combines temporal and spatial prediction within a single panel data regression model, allowing estimation of annual low flow values from input of both annual meteorological indices and catchment descriptors. The third and fourth methods represent non-stationary low flow frequency analyses and require fitting of regional distribution functions. Method three involves spatiotemporal prediction of an index value; method four, estimation of L-moments that adapt the regional frequency distribution to the at-site conditions. The results show that method two outperforms the successive prediction in time and space of method one. Method three also shows a high performance in the near future period, but since it relies on a stationary distribution, its application for prediction of far future changes may be problematic. Spatiotemporal prediction of L-moments appeared highly uncertain for higher-order moments, resulting in unrealistic future low flow values. All in all, the results support the inclusion of simple statistical methods in climate change impact assessment.
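
    A minimal sketch of the panel-style regression of method two, pooling all stations and years into one model with an annual drought index and a static catchment descriptor; the gauges, indices, and coefficients below are synthetic stand-ins, not the German data set.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic panel: an annual low-flow index for several gauges, explained by
# an annual meteorological drought index plus a static catchment descriptor.
rng = np.random.default_rng(3)
gauges = [f"g{i}" for i in range(10)]
years = np.arange(1980, 2011)
rows = []
for g, area in zip(gauges, rng.uniform(50, 500, len(gauges))):
    spi = rng.normal(size=len(years))               # drought index by year
    q95 = 0.002 * area + 0.05 * spi + rng.normal(0, 0.02, len(years))
    rows += [{"gauge": g, "year": y, "area": area, "spi": s, "q95": q}
             for y, s, q in zip(years, spi, q95)]
panel = pd.DataFrame(rows)

# One pooled regression over all stations and years (cf. method two),
# instead of fitting per-station models and regionalizing afterwards.
fit = smf.ols("q95 ~ spi + area", data=panel).fit()
print(fit.params)
```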

  12. Use of predictive models and rapid methods to nowcast bacteria levels at coastal beaches

    USGS Publications Warehouse

    Francy, Donna S.

    2009-01-01

    The need for rapid assessments of recreational water quality to better protect public health is well accepted throughout the research and regulatory communities. Rapid analytical methods, such as quantitative polymerase chain reaction (qPCR) and immunomagnetic separation/adenosine triphosphate (ATP) analysis, are being tested but are not yet ready for widespread use. Another solution is the use of predictive models, wherein variable(s) that are easily and quickly measured are surrogates for concentrations of fecal-indicator bacteria. Rainfall-based alerts, the simplest type of model, have been used by several communities for a number of years. Deterministic models use mathematical representations of the processes that affect bacteria concentrations; this type of model is being used for beach-closure decisions at one location in the USA. Multivariable statistical models are being developed and tested in many areas of the USA; however, they are only used in three areas of the Great Lakes to aid in notifications of beach advisories or closings. These “operational” statistical models can result in more accurate assessments of recreational water quality than use of the previous day's Escherichia coli (E. coli) concentration as determined by traditional culture methods. The Ohio Nowcast, at Huntington Beach, Bay Village, Ohio, is described in this paper as an example of an operational statistical model. Because predictive modeling is a dynamic process, water-resource managers continue to collect additional data to improve the predictive ability of the nowcast and expand the nowcast to other Ohio beaches and a recreational river. Although predictive models have been shown to work well at some beaches and are becoming more widely accepted, implementation in many areas is limited by funding, lack of coordinated technical leadership, and lack of supporting epidemiological data.
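
    Operational nowcasts of this kind are typically multivariable regressions from quickly measured surrogates to log-transformed bacteria concentrations. A hedged sketch follows, with assumed predictors (turbidity, antecedent rainfall, wave height) and synthetic coefficients rather than the actual Ohio Nowcast equation; 235 CFU/100 mL is the single-sample E. coli beach action value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed surrogate predictors; the real nowcast's variables and
# coefficients are site-specific and not reproduced here.
rng = np.random.default_rng(4)
n = 300
turbidity = rng.lognormal(2.0, 0.6, n)        # NTU
rain48h = rng.exponential(8.0, n)             # mm in the prior 48 h
wave = rng.uniform(0.1, 1.5, n)               # m
log_ecoli = (1.0 + 0.5 * np.log10(turbidity) + 0.02 * rain48h
             + 0.3 * wave + rng.normal(0, 0.3, n))

X = np.column_stack([np.log10(turbidity), rain48h, wave])
model = LinearRegression().fit(X, log_ecoli)

# Nowcast today's beach: advisory if predicted E. coli exceeds the
# 235 CFU/100 mL single-sample action value.
today = np.array([[np.log10(35.0), 20.0, 0.8]])
pred = 10 ** model.predict(today)[0]
print(f"predicted E. coli: {pred:.0f} CFU/100 mL -> advisory: {pred > 235}")
```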

  13. Life beyond MSE and R2 — improving validation of predictive models with observations

    NASA Astrophysics Data System (ADS)

    Papritz, Andreas; Nussbaum, Madlene

    2017-04-01

    Machine learning and statistical predictive methods are evaluated by the closeness of predictions to observations of a test dataset. Common criteria for rating predictive methods are bias and mean square error (MSE), characterizing systematic and random prediction errors. Many studies also report R2-values, but their meaning is not always clear (correlation between observations and predictions or MSE skill score; Wilks, 2011). The same criteria are also used for choosing tuning parameters of predictive procedures by cross-validation and bagging (e.g. Hastie et al., 2009). For evident reasons, the atmospheric sciences have developed a rich toolbox for forecast verification. Specific criteria have been proposed for evaluating deterministic and probabilistic predictions of binary, multinomial, ordinal and continuous responses (see reviews by Wilks, 2011, Jolliffe and Stephenson, 2012 and Gneiting et al., 2007). It appears that these techniques are not very well known in the geosciences community interested in machine learning. In our presentation we review techniques that offer more insight into the proximity of data and predictions than bias, MSE and R2 alone. We mention only a few examples here: (i) Graphing observations vs. predictions is usually more appropriate than the reverse (Piñeiro et al., 2008). (ii) The decomposition of the Brier score (= MSE for probabilistic predictions of binary yes/no data) into reliability and resolution reveals the (conditional) bias and the capability of the predictions to discriminate yes/no observations. We illustrate the approaches by applications from digital soil mapping studies. Gneiting, T., Balabdaoui, F., and Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society Series B, 69, 243-268. Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York, second edition. Jolliffe, I. T. and Stephenson, D. B., editors (2012). Forecast Verification: A Practitioner's Guide in Atmospheric Science. Wiley-Blackwell, second edition. Piñeiro, G., Perelman, S., Guerschman, J., and Paruelo, J. (2008). How to evaluate models: Observed vs. predicted or predicted vs. observed? Ecological Modelling, 216, 316-322. Wilks, D. S. (2011). Statistical Methods in the Atmospheric Sciences. Academic Press, third edition.
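
    A sketch of the Brier score decomposition mentioned in (ii), using the standard binning approach (BS = reliability - resolution + uncertainty; the identity is exact when forecasts are constant within bins):

```python
import numpy as np

def brier_decomposition(p, y, bins=10):
    """Murphy decomposition of the Brier score for probabilistic forecasts
    p of binary outcomes y: BS ~= REL - RES + UNC (exact if forecasts are
    constant within each bin)."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, bins - 1)
    ybar = y.mean()
    rel = res = 0.0
    for k in range(bins):
        mask = idx == k
        if mask.any():
            w = mask.mean()                                   # bin weight
            rel += w * (p[mask].mean() - y[mask].mean()) ** 2  # reliability
            res += w * (y[mask].mean() - ybar) ** 2            # resolution
    unc = ybar * (1.0 - ybar)                                  # uncertainty
    return rel, res, unc

rng = np.random.default_rng(5)
truth_p = rng.uniform(0, 1, 5000)
y = rng.random(5000) < truth_p
p = np.clip(truth_p + rng.normal(0, 0.1, 5000), 0, 1)  # noisy forecasts
rel, res, unc = brier_decomposition(p, y)
bs = np.mean((p - y) ** 2)
print(f"BS={bs:.4f}  REL={rel:.4f}  RES={res:.4f}  UNC={unc:.4f}")
print(f"REL - RES + UNC = {rel - res + unc:.4f}")
```

    Low reliability with high resolution is the desirable combination: the forecasts are conditionally unbiased yet still discriminate the yes/no observations.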

  14. Variability-aware compact modeling and statistical circuit validation on SRAM test array

    NASA Astrophysics Data System (ADS)

    Qiao, Ying; Spanos, Costas J.

    2016-03-01

    Variability modeling at the compact transistor model level can enable statistically optimized designs in view of limitations imposed by the fabrication technology. In this work we propose a variability-aware compact model characterization methodology based on stepwise parameter selection. Transistor I-V measurements are obtained from a bit-transistor-accessible SRAM test array fabricated using a collaborating foundry's 28nm FDSOI technology. Our in-house customized Monte Carlo simulation bench can incorporate these statistical compact models, and the simulated SRAM writability performance closely matches the measured distributions. Our proposed statistical compact model parameter extraction methodology also has the potential to predict non-Gaussian behavior in statistical circuit performances through mixtures of Gaussian distributions.
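
    A minimal sketch of the mixture-of-Gaussians idea in the last sentence: fit a two-component mixture to a synthetic, bimodal performance metric and read off a tail (failure) probability. The metric and threshold are invented, not foundry data.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

# Synthetic, bimodal stand-in for a non-Gaussian circuit metric
# (e.g. an SRAM write margin); not measured foundry data.
rng = np.random.default_rng(6)
margin = np.concatenate([rng.normal(0.30, 0.02, 8000),
                         rng.normal(0.22, 0.04, 2000)])[:, None]

gmm = GaussianMixture(n_components=2, random_state=0).fit(margin)
means = gmm.means_.ravel()
sigmas = np.sqrt(gmm.covariances_.ravel())
print(f"fitted components: means={means.round(3)}, sigmas={sigmas.round(3)}")

# Tail probability below a failure threshold, from the fitted mixture.
thr = 0.15
p_fail = float(np.sum(gmm.weights_ * norm.cdf(thr, means, sigmas)))
print(f"P(margin < {thr}) = {p_fail:.2e}")
```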

  15. A model of strength

    USGS Publications Warehouse

    Johnson, Douglas H.; Cook, R.D.

    2013-01-01

    In her AAAS News & Notes piece "Can the Southwest manage its thirst?" (26 July, p. 362), K. Wren quotes Ajay Kalra, who advocates a particular method for predicting Colorado River streamflow "because it eschews complex physical climate models for a statistical data-driven modeling approach." A preference for data-driven models may be appropriate in this individual situation, but it is not so generally. Data-driven models often come with a warning against extrapolating beyond the range of the data used to develop the models. When the future is like the past, data-driven models can work well for prediction, but it is easy to over-model local or transient phenomena, often leading to predictive inaccuracy (1). Mechanistic models are built on established knowledge of the process that connects the response variables with the predictors, using information obtained outside of an extant data set. One may shy away from a mechanistic approach when the underlying process is judged to be too complicated, but good predictive models can be constructed with statistical components that account for ingredients missing in the mechanistic analysis. Models with sound mechanistic components are more generally applicable and robust than data-driven models.

  16. Physics-based statistical model and simulation method of RF propagation in urban environments

    DOEpatents

    Pao, Hsueh-Yuan; Dvorak, Steven L.

    2010-09-14

    A physics-based statistical model and simulation/modeling method and system of electromagnetic wave propagation (wireless communication) in urban environments. In particular, the model is a computationally efficient closed-form parametric model of RF propagation in an urban environment which is extracted from a physics-based statistical wireless channel simulation method and system. The simulation divides the complex urban environment into a network of interconnected urban canyon waveguides which can be analyzed individually; calculates spectral coefficients of modal fields in the waveguides excited by the propagation using a database of statistical impedance boundary conditions which incorporates the complexity of building walls in the propagation model; determines statistical parameters of the calculated modal fields; and determines a parametric propagation model based on the statistical parameters of the calculated modal fields from which predictions of communications capability may be made.

  17. Artificial neural network study on organ-targeting peptides

    NASA Astrophysics Data System (ADS)

    Jung, Eunkyoung; Kim, Junhyoung; Choi, Seung-Hoon; Kim, Minkyoung; Rhee, Hokyoung; Shin, Jae-Min; Choi, Kihang; Kang, Sang-Kee; Lee, Nam Kyung; Choi, Yun-Jaie; Jung, Dong Hyun

    2010-01-01

    We report a new approach to studying organ targeting of peptides on the basis of peptide sequence information. The positive control data sets consist of organ-targeting peptide sequences identified by the peroral phage-display technique for four organs, and the negative control data are prepared from random sequences. The capacity of our models to make appropriate predictions is validated by statistical indicators including sensitivity, specificity, enrichment curve, and the area under the receiver operating characteristic (ROC) curve (the ROC score). The VHSE descriptor produces statistically significant training models, and the models with simple neural network architectures show slightly greater predictive power than those with complex ones. The training and test set statistics indicate that our models could discriminate between organ-targeting and random sequences. We anticipate that our models will be applicable to the selection of organ-targeting peptides for generating peptide drugs or peptidomimetics.

  18. 10 CFR 431.445 - Determination of small electric motor efficiency.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... statistical analysis, computer simulation or modeling, or other analytic evaluation of performance data. (3... statistical analysis, computer simulation or modeling, and other analytic evaluation of performance data on.... (ii) If requested by the Department, the manufacturer shall conduct simulations to predict the...

  19. Evaluation of three statistical prediction models for forensic age prediction based on DNA methylation.

    PubMed

    Smeers, Inge; Decorte, Ronny; Van de Voorde, Wim; Bekaert, Bram

    2018-05-01

    DNA methylation is a promising biomarker for forensic age prediction. A challenge that has emerged in recent studies is the fact that prediction errors become larger with increasing age due to interindividual differences in epigenetic ageing rates. This phenomenon of non-constant variance or heteroscedasticity violates an assumption of the often used method of ordinary least squares (OLS) regression. The aim of this study was to evaluate alternative statistical methods that do take heteroscedasticity into account in order to provide more accurate, age-dependent prediction intervals. A weighted least squares (WLS) regression is proposed as well as a quantile regression model. Their performances were compared against an OLS regression model based on the same dataset. Both models provided age-dependent prediction intervals which account for the increasing variance with age, but WLS regression performed better in terms of success rate in the current dataset. However, quantile regression might be a preferred method when dealing with a variance that is not only non-constant, but also not normally distributed. Ultimately the choice of which model to use should depend on the observed characteristics of the data. Copyright © 2018 Elsevier B.V. All rights reserved.
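
    A sketch contrasting the three candidate models on synthetic heteroscedastic age data; the methylation predictor, the variance model used for the WLS weights, and the 5th/95th percentile interval are illustrative assumptions, not the paper's calibration.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic heteroscedastic data: prediction error grows with age,
# mimicking interindividual differences in epigenetic ageing rates.
rng = np.random.default_rng(7)
n = 400
meth = rng.uniform(0.2, 0.9, n)                     # a methylation fraction
age = 80 * meth + rng.normal(0, 1 + 10 * meth, n)   # noise grows with meth
X = sm.add_constant(meth)

ols = sm.OLS(age, X).fit()
# WLS: weight each observation by the inverse of its (modeled) variance.
w = 1.0 / (1 + 10 * meth) ** 2
wls = sm.WLS(age, X, weights=w).fit()
# Quantile regression: the 5th and 95th percentiles give an age-dependent
# 90% prediction interval without assuming normally distributed errors.
q05 = sm.QuantReg(age, X).fit(q=0.05)
q95 = sm.QuantReg(age, X).fit(q=0.95)

x_new = sm.add_constant(np.array([0.3, 0.6, 0.85]))
print("point (WLS):", wls.predict(x_new).round(1))
print("interval lo :", q05.predict(x_new).round(1))
print("interval hi :", q95.predict(x_new).round(1))
```

    The interval produced by the two quantile fits widens with the predictor, which is exactly the age-dependent behavior the abstract argues OLS intervals cannot capture.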

  20. Hunting Solomonoff's Swans: Exploring the Boundary Between Physics and Statistics in Hydrological Modeling

    NASA Astrophysics Data System (ADS)

    Nearing, G. S.

    2014-12-01

    Statistical models consistently outperform conceptual models in the short term; however, to account for a nonstationary future (or an unobserved past) scientists prefer to base predictions on unchanging and commutable properties of the universe, i.e., physics. The problem with physically-based hydrology models is, of course, that they aren't really based on physics: they are based on statistical approximations of physical interactions, and we almost uniformly lack an understanding of the entropy associated with these approximations. Thermodynamics is successful precisely because entropy statistics are computable for homogeneous (well-mixed) systems, and ergodic arguments explain the success of Newton's laws to describe systems that are fundamentally quantum in nature. Unfortunately, similar arguments do not hold for systems like watersheds that are heterogeneous at a wide range of scales. Ray Solomonoff formalized the situation in 1968 by showing that given infinite evidence, simultaneously minimizing model complexity and entropy in predictions always leads to the best possible model. The open question in hydrology is about what happens when we don't have infinite evidence: for example, when the future will not look like the past, or when one watershed does not behave like another. How do we isolate stationary and commutable components of watershed behavior? I propose that one possible answer to this dilemma lies in a formal combination of physics and statistics. In this talk I outline my recent analogue of Solomonoff's idea (Solomonoff's theorem was digital) that allows us to quantify the complexity/entropy tradeoff in a way that is intuitive to physical scientists. I show how to formally combine "physical" and statistical methods for model development in a way that allows us to derive the theoretically best possible model given any physics approximation(s) and available observations. Finally, I apply an analogue of Solomonoff's theorem to evaluate the tradeoff between model complexity and prediction power.

  1. Modeling Statistics of Fish Patchiness and Predicting Associated Influence on Statistics of Acoustic Echoes

    DTIC Science & Technology

    2014-12-01

    …moving relative to the water in which they are immersed, reflecting the true school movement dynamics. There has also been work to implement this… …were measured with multi-beam sonars and quantified in terms of important aspects of fish dynamics; and predictions were made of echo statistics of a…

  2. Statistical modelling of networked human-automation performance using working memory capacity.

    PubMed

    Ahmed, Nisar; de Visser, Ewart; Shaw, Tyler; Mohamed-Ameen, Amira; Campbell, Mark; Parasuraman, Raja

    2014-01-01

    This study examines the challenging problem of modelling the interaction between individual attentional limitations and decision-making performance in networked human-automation system tasks. Analysis of real experimental data from a task involving networked supervision of multiple unmanned aerial vehicles by human participants shows that both task load and network message quality affect performance, but that these effects are modulated by individual differences in working memory (WM) capacity. These insights were used to assess three statistical approaches for modelling and making predictions with real experimental networked supervisory performance data: classical linear regression, non-parametric Gaussian processes and probabilistic Bayesian networks. It is shown that each of these approaches can help designers of networked human-automated systems cope with various uncertainties in order to accommodate future users by linking expected operating conditions and performance from real experimental data to observable cognitive traits like WM capacity. Practitioner Summary: Working memory (WM) capacity helps account for inter-individual variability in operator performance in networked unmanned aerial vehicle supervisory tasks. This is useful for reliable performance prediction near experimental conditions via linear models; robust statistical prediction beyond experimental conditions via Gaussian process models; and probabilistic inference about unknown task conditions/WM capacities via Bayesian network models.
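
    A minimal sketch of the Gaussian-process option, predicting performance with an uncertainty band beyond the sampled conditions; the task-load/WM-capacity data are synthetic stand-ins for the study's experiments.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic stand-in: supervisory performance as a nonlinear function of
# task load and working-memory capacity (not the study's data).
rng = np.random.default_rng(8)
n = 120
task_load = rng.uniform(1, 10, n)
wm = rng.normal(0, 1, n)
perf = (1 / (1 + np.exp(-(0.8 * wm - 0.4 * (task_load - 5))))
        + rng.normal(0, 0.05, n))
X = np.column_stack([task_load, wm])

gp = GaussianProcessRegressor(kernel=RBF([2.0, 1.0]) + WhiteKernel(0.05),
                              normalize_y=True).fit(X, perf)
# Predictive mean and standard deviation beyond the training conditions;
# the GP's uncertainty widens where data are sparse.
mean, sd = gp.predict(np.array([[12.0, 0.5]]), return_std=True)
print(f"predicted performance {mean[0]:.2f} +/- {sd[0]:.2f}")
```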

  3. Statistical Analysis of CFD Solutions from the 6th AIAA CFD Drag Prediction Workshop

    NASA Technical Reports Server (NTRS)

    Derlaga, Joseph M.; Morrison, Joseph H.

    2017-01-01

    A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from North America, Europe, Asia, and South America using both common and custom grid sequences as well as multiple turbulence models for the June 2016 6th AIAA CFD Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was the Common Research Model subsonic transport wing-body previously used for both the 4th and 5th Drag Prediction Workshops. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with previous workshops.

  4. Functional status predicts acute care readmission in the traumatic spinal cord injury population.

    PubMed

    Huang, Donna; Slocum, Chloe; Silver, Julie K; Morgan, James W; Goldstein, Richard; Zafonte, Ross; Schneider, Jeffrey C

    2018-03-29

    Context/objective: Acute care readmission has been identified as an important marker of healthcare quality. Most previous models assessing risk prediction of readmission incorporate variables for medical comorbidity. We hypothesized that functional status is a more robust predictor of readmission in the spinal cord injury population than medical comorbidities. Design: Retrospective cross-sectional analysis. Setting: Inpatient rehabilitation facilities, Uniform Data System for Medical Rehabilitation data from 2002 to 2012. Participants: Traumatic spinal cord injury patients. Outcome measures: A logistic regression model for predicting acute care readmission based on demographic variables and functional status (Functional Model) was compared with models incorporating demographics, functional status, and medical comorbidities (Functional-Plus) or models including demographics and medical comorbidities (Demographic-Comorbidity). The primary outcomes were 3- and 30-day readmission, and the primary measure of model performance was the c-statistic. Results: There were a total of 68,395 patients with 1,469 (2.15%) readmitted at 3 days and 7,081 (10.35%) readmitted at 30 days. The c-statistics for the Functional Model were 0.703 and 0.654 for 3 and 30 days. The Functional Model outperformed Demographic-Comorbidity models at 3 days (c-statistic difference: 0.066-0.096) and outperformed two of the three Demographic-Comorbidity models at 30 days (c-statistic difference: 0.029-0.056). The Functional-Plus models exhibited negligible improvements (0.002-0.010) in model performance compared to the Functional models. Conclusion: Readmissions are used as a marker of hospital performance. Function-based readmission models in the spinal cord injury population outperform models incorporating medical comorbidities. Readmission risk models for this population would benefit from the inclusion of functional status.

  5. Building and verifying a severity prediction model of acute pancreatitis (AP) based on BISAP, MEWS and routine test indexes.

    PubMed

    Ye, Jiang-Feng; Zhao, Yu-Xin; Ju, Jian; Wang, Wei

    2017-10-01

    To discuss the value of the Bedside Index for Severity in Acute Pancreatitis (BISAP), the Modified Early Warning Score (MEWS), serum Ca2+ and red cell distribution width (RDW) for predicting the severity grade of acute pancreatitis, and to develop and verify a more accurate scoring system to predict the severity of AP. In 302 patients with AP, we calculated BISAP and MEWS scores and conducted regression analyses on the relationships of BISAP scoring, RDW, MEWS, and serum Ca2+ with the severity of AP using single-factor logistic regression. The variables with statistical significance in the single-factor logistic regression were used in a multi-factor logistic regression model; forward stepwise regression was used to screen variables and build a multi-factor prediction model. A receiver operating characteristic curve (ROC curve) was constructed, and the significance of the multi- and single-factor prediction models in predicting the severity of AP was evaluated using the area under the ROC curve (AUC). The internal validity of the model was verified through bootstrapping. Among 302 patients with AP, 209 had mild acute pancreatitis (MAP) and 93 had severe acute pancreatitis (SAP). According to the single-factor logistic regression analysis, we found that BISAP, MEWS and serum Ca2+ are prediction indexes of the severity of AP (P-value < 0.001), whereas RDW is not a prediction index of AP severity (P-value > 0.05). The multi-factor logistic regression analysis showed that BISAP and serum Ca2+ are independent prediction indexes of AP severity (P-value < 0.001), and MEWS is not an independent prediction index of AP severity (P-value > 0.05); BISAP is negatively related to serum Ca2+ (r = -0.330, P-value < 0.001). The constructed model is as follows: ln(p/(1-p)) = 7.306 + 1.151*BISAP - 4.516*serum Ca2+, where p is the predicted probability of SAP. The predictive ability of each model for SAP follows the order of the combined BISAP and serum Ca2+ prediction model > serum Ca2+ > BISAP. The difference in predictive ability between BISAP and serum Ca2+ alone is not statistically significant (P-value > 0.05); however, the newly built prediction model is significantly better than either BISAP or serum Ca2+ individually (P-value < 0.01). Verification of the internal validity of the models by bootstrapping was favorable. BISAP and serum Ca2+ have high predictive value for the severity of AP. However, the model built by combining BISAP and serum Ca2+ is remarkably superior to BISAP and serum Ca2+ individually. Furthermore, this model is simple, practical and appropriate for clinical use. Copyright © 2016. Published by Elsevier Masson SAS.
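
    For illustration, the reconstructed logistic model can be turned into a predicted probability directly; the logit form and the mmol/L calcium units are assumptions made in reconstructing the garbled equation, not details confirmed by the abstract.

```python
import math

def sap_probability(bisap: int, serum_ca: float) -> float:
    """Predicted probability of severe AP from the reported combined model,
    assuming the usual logit link and serum Ca2+ in mmol/L (both are
    reconstruction assumptions):
    ln(p/(1-p)) = 7.306 + 1.151*BISAP - 4.516*Ca."""
    logit = 7.306 + 1.151 * bisap - 4.516 * serum_ca
    return 1.0 / (1.0 + math.exp(-logit))

# High BISAP with low calcium vs. low BISAP with normal calcium.
print(f"{sap_probability(3, 1.8):.2f}")   # high predicted risk
print(f"{sap_probability(0, 2.3):.2f}")   # low predicted risk
```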

  6. External model validation of binary clinical risk prediction models in cardiovascular and thoracic surgery.

    PubMed

    Hickey, Graeme L; Blackstone, Eugene H

    2016-08-01

    Clinical risk-prediction models serve an important role in healthcare. They are used for clinical decision-making and measuring the performance of healthcare providers. To establish confidence in a model, external model validation is imperative. When designing such an external model validation study, thought must be given to patient selection, risk factor and outcome definitions, missing data, and the transparent reporting of the analysis. In addition, there are a number of statistical methods available for external model validation. Execution of a rigorous external validation study rests in proper study design, application of suitable statistical methods, and transparent reporting. Copyright © 2016 The American Association for Thoracic Surgery. Published by Elsevier Inc. All rights reserved.

  7. Quantifying predictability in a model with statistical features of the atmosphere

    PubMed Central

    Kleeman, Richard; Majda, Andrew J.; Timofeyev, Ilya

    2002-01-01

    The Galerkin truncated inviscid Burgers equation has recently been shown by the authors to be a simple model with many degrees of freedom, with many statistical properties similar to those occurring in dynamical systems relevant to the atmosphere. These properties include long time-correlated, large-scale modes of low frequency variability and short time-correlated “weather modes” at smaller scales. The correlation scaling in the model extends over several decades and may be explained by a simple theory. Here a thorough analysis of the nature of predictability in the idealized system is developed by using a theoretical framework developed by R.K. This analysis is based on a relative entropy functional that has been shown elsewhere by one of the authors to measure the utility of statistical predictions precisely. The analysis is facilitated by the fact that most relevant probability distributions are approximately Gaussian if the initial conditions are assumed to be so. Rather surprisingly this holds for both the equilibrium (climatological) and nonequilibrium (prediction) distributions. We find that in most cases the absolute difference in the first moments of these two distributions (the “signal” component) is the main determinant of predictive utility variations. Contrary to conventional belief in the ensemble prediction area, the dispersion of prediction ensembles is generally of secondary importance in accounting for variations in utility associated with different initial conditions. This conclusion has potentially important implications for practical weather prediction, where traditionally most attention has focused on dispersion and its variability. PMID:12429863
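
    For Gaussian climatological and prediction distributions, the relative entropy splits exactly into the "signal" (mean-shift) and "dispersion" (spread) terms discussed above. A one-dimensional sketch:

```python
import numpy as np

def relative_entropy_gaussian(mu_p, var_p, mu_c, var_c):
    """KL divergence of a Gaussian prediction N(mu_p, var_p) from a
    Gaussian climatology N(mu_c, var_c), split into the 'signal'
    (mean-shift) and 'dispersion' (spread) components."""
    signal = 0.5 * (mu_p - mu_c) ** 2 / var_c
    dispersion = 0.5 * (var_p / var_c - 1.0 - np.log(var_p / var_c))
    return signal, dispersion

# A forecast that shifts the mean by one climatological standard deviation
# while halving the variance:
sig, disp = relative_entropy_gaussian(mu_p=1.0, var_p=0.5,
                                      mu_c=0.0, var_c=1.0)
print(f"signal={sig:.3f}  dispersion={disp:.3f}  total={sig + disp:.3f}")
```

    In this example the signal term dominates the total predictive utility, mirroring the abstract's finding that ensemble dispersion is of secondary importance.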

  8. Comparisons of modeled height predictions to ocular height estimates

    Treesearch

    W.A. Bechtold; S.J. Zarnoch; W.G. Burkman

    1998-01-01

    Equations used by USDA Forest Service Forest Inventory and Analysis projects to predict individual tree heights on the basis of species and d.b.h. were improved by the addition of mean overstory height. However, ocular estimates of total height by field crews were more accurate than the statistically improved models, especially for hardwood species. Height predictions...

  9. Bayesian inference of physiologically meaningful parameters from body sway measurements.

    PubMed

    Tietäväinen, A; Gutmann, M U; Keski-Vakkuri, E; Corander, J; Hæggström, E

    2017-06-19

    The control of the human body sway by the central nervous system, muscles, and conscious brain is of interest since body sway carries information about the physiological status of a person. Several models have been proposed to describe body sway in an upright standing position; however, due to the statistical intractability of the more realistic models, no formal parameter inference has previously been conducted and the expressive power of such models for real human subjects remains unknown. Using the latest advances in Bayesian statistical inference for intractable models, we fitted a nonlinear control model to posturographic measurements, and we showed that it can accurately predict the sway characteristics of both simulated and real subjects. Our method provides a full statistical characterization of the uncertainty related to all model parameters as quantified by posterior probability density functions, which is useful for comparisons across subjects and test settings. The ability to infer intractable control models from sensor data opens new possibilities for monitoring and predicting body status in health applications.
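
    "Bayesian inference for intractable models" generally refers to likelihood-free methods such as approximate Bayesian computation (ABC). Below is a toy ABC-rejection sketch with an AR(1) surrogate standing in for the sway simulator; the paper's actual posturographic control model and summary statistics are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(9)

def simulate_sway(stiffness, n=300):
    """Toy surrogate for a body-sway simulator: an AR(1) process whose
    decay rate stands in for a control parameter (not the paper's model)."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = (1.0 - stiffness) * x[t - 1] + rng.normal()
    return x

def summary(x):
    return x.std()          # a single summary statistic: sway variability

s_obs = summary(simulate_sway(0.15))        # pretend this is a measurement

# ABC rejection: draw parameters from the prior, simulate, and keep draws
# whose simulated summary lands close to the observed one; the kept draws
# approximate the posterior without ever evaluating a likelihood.
draws = rng.uniform(0.05, 0.9, 2000)
kept = np.array([k for k in draws
                 if abs(summary(simulate_sway(k)) - s_obs) < 0.15])
print(f"accepted {kept.size} draws; posterior mean {kept.mean():.3f}, "
      f"90% interval ({np.quantile(kept, 0.05):.3f}, "
      f"{np.quantile(kept, 0.95):.3f})")
```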

  10. A method for evaluating the importance of system state observations to model predictions, with application to the Death Valley regional groundwater flow system

    USGS Publications Warehouse

    Tiedeman, Claire; Ely, D. Matthew; Hill, Mary C.; O'Brien, Grady M.

    2004-01-01

    We develop a new observation‐prediction (OPR) statistic for evaluating the importance of system state observations to model predictions. The OPR statistic measures the change in prediction uncertainty produced when an observation is added to or removed from an existing monitoring network, and it can be used to guide refinement and enhancement of the network. Prediction uncertainty is approximated using a first‐order second‐moment method. We apply the OPR statistic to a model of the Death Valley regional groundwater flow system (DVRFS) to evaluate the importance of existing and potential hydraulic head observations to predicted advective transport paths in the saturated zone underlying Yucca Mountain and underground testing areas on the Nevada Test Site. Important existing observations tend to be far from the predicted paths, and many unimportant observations are in areas of high observation density. These results can be used to select locations at which increased observation accuracy would be beneficial and locations that could be removed from the network. Important potential observations are mostly in areas of high hydraulic gradient far from the paths. Results for both existing and potential observations are related to the flow system dynamics and coarse parameter zonation in the DVRFS model. If system properties in different locations are as similar as the zonation assumes, then the OPR results illustrate a data collection opportunity whereby observations in distant, high‐gradient areas can provide information about properties in flatter‐gradient areas near the paths. If this similarity is suspect, then the analysis produces a different type of data collection opportunity involving testing of model assumptions critical to the OPR results.
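
    A generic linear-regression sketch of the OPR idea: compute the first-order second-moment prediction standard deviation with the full network, then recompute it with each observation removed in turn. The sensitivity matrices here are random placeholders; in the DVRFS application they come from the calibrated groundwater model.

```python
import numpy as np

def prediction_sd(X, z, var_obs):
    """First-order second-moment prediction standard deviation for a
    linear(ized) model: sd = sqrt(z^T (X^T W X)^{-1} z), W = diag(1/var)."""
    W = np.diag(1.0 / var_obs)
    cov_params = np.linalg.inv(X.T @ W @ X)
    return float(np.sqrt(z @ cov_params @ z))

rng = np.random.default_rng(10)
n_obs, n_par = 30, 4
X = rng.normal(size=(n_obs, n_par))   # sensitivities of obs to parameters
z = rng.normal(size=n_par)            # sensitivity of the prediction
var_obs = np.full(n_obs, 0.1)

base = prediction_sd(X, z, var_obs)
# OPR-style statistic: percent increase in prediction uncertainty when
# each observation is removed from the monitoring network in turn.
opr = [100 * (prediction_sd(np.delete(X, i, 0), z, np.delete(var_obs, i))
              - base) / base for i in range(n_obs)]
print("most important observation:", int(np.argmax(opr)),
      f"(+{max(opr):.1f}% prediction sd if removed)")
```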

  11. Heterogeneous Structure of Stem Cells Dynamics: Statistical Models and Quantitative Predictions

    PubMed Central

    Bogdan, Paul; Deasy, Bridget M.; Gharaibeh, Burhan; Roehrs, Timo; Marculescu, Radu

    2014-01-01

    Understanding stem cell (SC) population dynamics is essential for developing models that can be used in basic science and medicine, to aid in predicting cell fate. These models can be used as tools, e.g., in studying patho-physiological events at the cellular and tissue level, predicting (mal)functions along the developmental course, and personalized regenerative medicine. Using time-lapsed imaging and statistical tools, we show that the dynamics of SC populations involve a heterogeneous structure consisting of multiple sub-population behaviors. Using non-Gaussian statistical approaches, we identify the co-existence of fast- and slow-dividing subpopulations, and quiescent cells, in stem cells from three species. The mathematical analysis also shows that, instead of developing independently, SCs exhibit a time-dependent fractal behavior as they interact with each other through molecular and tactile signals. These findings suggest that more sophisticated models of SC dynamics should view SC populations as a collective and avoid the simplifying homogeneity assumption by accounting for the presence of more than one dividing sub-population, and their multi-fractal characteristics. PMID:24769917

  12. Climate Prediction - NOAA's National Weather Service

    Science.gov Websites

    Long-range climate forecasts across the U.S., with links to statistical model guidance (MOS and GFS-LAMP products) and related climate prediction web sites.

  13. National Centers for Environmental Prediction

    Science.gov Websites

    The Mesoscale Modeling Branch conducts a program of research and development in support of the prediction… This research and development includes mesoscale four-dimensional data assimilation of domestic…

  14. National Centers for Environmental Prediction

    Science.gov Websites

    Pages covering the Weather Research and Forecasting system, HMON (operational hurricane forecasting), and WAVEWATCH III modeling, hosted at the NOAA Center for Weather and Climate Prediction (NCWCP).

  15. Predicting perceptual quality of images in realistic scenario using deep filter banks

    NASA Astrophysics Data System (ADS)

    Zhang, Weixia; Yan, Jia; Hu, Shiyong; Ma, Yang; Deng, Dexiang

    2018-03-01

    Classical image perceptual quality assessment models usually resort to natural scene statistics methods, which are based on an assumption that certain reliable statistical regularities hold on undistorted images and will be corrupted by introduced distortions. However, these models usually fail to accurately predict the degradation severity of images in realistic scenarios, since complex, multiple, and interactive authentic distortions usually appear in them. We propose a quality prediction model based on a convolutional neural network. Quality-aware features extracted from filter banks of multiple convolutional layers are aggregated into the image representation. Furthermore, an easy-to-implement and effective feature selection strategy is used to further refine the image representation, and finally a linear support vector regression model is trained to map the image representation to subjective perceptual quality scores. The experimental results on benchmark databases present the effectiveness and generalizability of the proposed model.
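
    A hedged sketch of the final refine-then-regress stage: feature selection followed by linear support vector regression on stand-in "filter bank" features. Extracting real deep features would require a pretrained network, which is omitted here.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVR

# Stand-in for aggregated multi-layer CNN filter-bank features; real
# features would come from a pretrained network applied to images.
rng = np.random.default_rng(11)
n, d = 200, 512
features = rng.normal(size=(n, d))
mos = 3 + features[:, :20].sum(axis=1) * 0.1 + rng.normal(0, 0.3, n)

# Feature selection followed by linear support vector regression,
# mirroring the refine-then-regress structure described in the abstract.
model = make_pipeline(SelectKBest(f_regression, k=64),
                      LinearSVR(C=1.0, max_iter=10000))
model.fit(features, mos)
print("predicted quality scores:", model.predict(features[:3]).round(2))
```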

  16. Statistical physics of interacting neural networks

    NASA Astrophysics Data System (ADS)

    Kinzel, Wolfgang; Metzler, Richard; Kanter, Ido

    2001-12-01

    Recent results on the statistical physics of time series generation and prediction are presented. A neural network is trained on quasi-periodic and chaotic sequences, and the overlap with the sequence generator as well as the prediction errors are calculated numerically. For each network there exists a sequence for which it completely fails to make predictions. Two interacting networks show a transition to perfect synchronization. A pool of interacting networks shows good coordination in the minority game, a model of competition in a closed market. Finally, as a demonstration, a perceptron predicts bit sequences produced by human beings.

  17. Nonlinear wave chaos: statistics of second harmonic fields.

    PubMed

    Zhou, Min; Ott, Edward; Antonsen, Thomas M; Anlage, Steven M

    2017-10-01

    Concepts from the field of wave chaos have been shown to successfully predict the statistical properties of linear electromagnetic fields in electrically large enclosures. The Random Coupling Model (RCM) describes these properties by incorporating both universal features described by Random Matrix Theory and the system-specific features of particular system realizations. In an effort to extend this approach to the nonlinear domain, we add an active nonlinear frequency-doubling circuit to an otherwise linear wave chaotic system, and we measure the statistical properties of the resulting second harmonic fields. We develop an RCM-based model of this system as two linear chaotic cavities coupled by means of a nonlinear transfer function. The harmonic field strengths are predicted to be the product of two statistical quantities and the nonlinearity characteristics. Statistical results from measurement-based calculation, RCM-based simulation, and direct experimental measurements are compared and show good agreement over many decades of power.

  18. Comparison of classical statistical methods and artificial neural network in traffic noise prediction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nedic, Vladimir, E-mail: vnedic@kg.ac.rs; Despotovic, Danijela, E-mail: ddespotovic@kg.ac.rs; Cvetanovic, Slobodan, E-mail: slobodan.cvetanovic@eknfak.ni.ac.rs

    2014-11-15

    Traffic is the main source of noise in urban environments and significantly affects human mental and physical health and labor productivity. Therefore it is very important to model the noise produced by various vehicles. Techniques for traffic noise prediction are mainly based on regression analysis, which generally is not good enough to describe the trends of noise. In this paper the application of artificial neural networks (ANNs) for the prediction of traffic noise is presented. The structure of the traffic flow and the average speed of the traffic flow are chosen as input variables of the neural network. The output variable of the network is the equivalent noise level in the given time period, Leq. Based on these parameters, the network is modeled, trained and tested through a comparative analysis of the calculated values and measured levels of traffic noise, using an originally developed user-friendly software package. It is shown that artificial neural networks can be a useful tool for the prediction of noise with sufficient accuracy. In addition, the measured values were also used to calculate the equivalent noise level by means of classical methods, and a comparative analysis is given. The results clearly show that the ANN approach is superior in traffic noise level prediction to any other statistical method. - Highlights: • We proposed an ANN model for prediction of traffic noise. • We developed an originally designed user-friendly software package. • The results are compared with classical statistical methods. • The results show much better predictive capabilities of the ANN model.
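
    A minimal sketch of such an ANN, mapping traffic-flow structure and average speed to Leq; the network size and the synthetic data generator are assumptions, not the paper's trained model.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for traffic data: flows per vehicle class and average
# speed as inputs, equivalent noise level Leq [dB(A)] as output.
rng = np.random.default_rng(12)
n = 500
cars = rng.uniform(100, 2000, n)          # light vehicles/h
heavy = rng.uniform(0, 300, n)            # heavy vehicles/h
speed = rng.uniform(20, 90, n)            # km/h
leq = (35 + 10 * np.log10(cars + 8 * heavy) + 0.08 * speed
       + rng.normal(0, 1.0, n))
X = np.column_stack([cars, heavy, speed])

ann = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=5000,
                                 random_state=0))
ann.fit(X, leq)
print("predicted Leq:", ann.predict([[800, 50, 50]]).round(1), "dB(A)")
```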

  19. Tree injury and mortality in fires: developing process-based models

    Treesearch

    Bret W. Butler; Matthew B. Dickinson

    2010-01-01

    Wildland fire managers are often required to predict tree injury and mortality when planning a prescribed burn or when considering wildfire management options; and, currently, statistical models based on post-fire observations are the only tools available for this purpose. Implicit in the derivation of statistical models is the assumption that they are strictly...

  20. Comparison of statistical and theoretical habitat models for conservation planning: the benefit of ensemble prediction

    Treesearch

    D. Todd Jones-Farrand; Todd M. Fearer; Wayne E. Thogmartin; Frank R. Thompson; Mark D. Nelson; John M. Tirpak

    2011-01-01

    Selection of a modeling approach is an important step in the conservation planning process, but little guidance is available. We compared two statistical and three theoretical habitat modeling approaches representing those currently being used for avian conservation planning at landscape and regional scales: hierarchical spatial count (HSC), classification and...

  1. Forecasting runout of rock and debris avalanches

    USGS Publications Warehouse

    Iverson, Richard M.; Evans, S.G.; Mugnozza, G.S.; Strom, A.; Hermanns, R.L.

    2006-01-01

    Physically based mathematical models and statistically based empirical equations each may provide useful means of forecasting runout of rock and debris avalanches. This paper compares the foundations, strengths, and limitations of a physically based model and a statistically based forecasting method, both of which were developed to predict runout across three-dimensional topography. The chief advantage of the physically based model results from its ties to physical conservation laws and well-tested axioms of soil and rock mechanics, such as the Coulomb friction rule and effective-stress principle. The output of this model provides detailed information about the dynamics of avalanche runout, at the expense of high demands for accurate input data, numerical computation, and experimental testing. In comparison, the statistical method requires relatively modest computation and no input data except identification of prospective avalanche source areas and a range of postulated avalanche volumes. Like the physically based model, the statistical method yields maps of predicted runout, but it provides no information on runout dynamics. Although the two methods differ significantly in their structure and objectives, insights gained from one method can aid refinement of the other.

  2. A Stochastic Fractional Dynamics Model of Rainfall Statistics

    NASA Astrophysics Data System (ADS)

    Kundu, Prasun; Travis, James

    2013-04-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, that allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is designed to faithfully reflect the scale dependence and is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and time scales. The main restriction is the assumption that the statistics of the precipitation field are spatially homogeneous, isotropic, and stationary in time. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of the radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment. Some data sets, containing periods of non-stationary behavior that involve occasional anomalously correlated rain events, present a challenge for the model.

  3. Prediction of the dollar to the ruble rate. A system-theoretic approach

    NASA Astrophysics Data System (ADS)

    Borodachev, Sergey M.

    2017-07-01

    A simple state-space model of dollar rate formation is proposed, based on changes in oil prices and mechanisms of money transfer between the monetary and stock markets. Predictions from an input-output model and from the state-space model are compared. We conclude that, with proper use of the statistical data (via Kalman filtering), the state-space approach provides more adequate predictions of the dollar rate.
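
    A minimal scalar Kalman filter sketch of the state-space idea, with an invented random-walk "rate" rather than the paper's actual model structure:

```python
import numpy as np

def kalman_1d(y, a=1.0, c=1.0, q=1e-3, r=1e-2, x0=0.0, p0=1.0):
    """Minimal scalar Kalman filter for the state-space model
    x_t = a*x_{t-1} + w_t,  y_t = c*x_t + v_t  (illustrative only;
    the paper's dollar-rate model has richer structure)."""
    x, p, out = x0, p0, []
    for obs in y:
        x, p = a * x, a * p * a + q                     # predict
        k = p * c / (c * p * c + r)                     # Kalman gain
        x, p = x + k * (obs - c * x), (1 - k * c) * p   # update
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(13)
true_rate = np.cumsum(rng.normal(0, 0.03, 200)) + 60.0  # latent "rate"
observed = true_rate + rng.normal(0, 0.1, 200)          # noisy market data
filtered = kalman_1d(observed, x0=60.0, q=9e-4, r=1e-2)
print(f"RMSE raw {np.sqrt(np.mean((observed - true_rate) ** 2)):.3f}, "
      f"filtered {np.sqrt(np.mean((filtered - true_rate) ** 2)):.3f}")
```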

  4. A novel risk score model for prediction of contrast-induced nephropathy after emergent percutaneous coronary intervention.

    PubMed

    Lin, Kai-Yang; Zheng, Wei-Ping; Bei, Wei-Jie; Chen, Shi-Qun; Islam, Sheikh Mohammed Shariful; Liu, Yong; Xue, Lin; Tan, Ning; Chen, Ji-Yan

    2017-03-01

    Few studies have developed a simple risk model for predicting CIN, which carries a poor prognosis, after emergent PCI. The study aimed to develop and validate a novel tool for predicting the risk of contrast-induced nephropathy (CIN) in patients undergoing emergent percutaneous coronary intervention (PCI). 692 consecutive patients undergoing emergent PCI between January 2010 and December 2013 were randomly (2:1) assigned to a development dataset (n=461) and a validation dataset (n=231). Multivariate logistic regression was applied to identify independent predictors of CIN and to establish a CIN prediction model, whose prognostic accuracy was assessed using the c-statistic for discrimination and the Hosmer-Lemeshow test for calibration. The overall incidence of CIN was 55 (7.9%). A total of 11 variables were analyzed; age > 75 years, baseline serum creatinine (SCr) > 1.5 mg/dl, hypotension and the use of an intra-aortic balloon pump (IABP) were identified and entered into the risk score model (Chen). The incidence of CIN was 32 (6.9%) in the development dataset (low risk (score = 0): 1.0%; moderate risk (score 1-2): 13.4%; high risk (score ≥ 3): 90.0%). Compared to the classical Mehran's and ACEF CIN risk score models, the risk score (Chen) across the subgroups of the study population exhibited similar discrimination and predictive ability for CIN (c-statistic: 0.828, 0.776, 0.853, respectively) and for in-hospital mortality and 2- and 3-year mortality (c-statistic: 0.738, 0.750, 0.845, respectively) in the validation population. Our data showed that this simple risk model exhibited good discrimination and predictive ability for CIN, similar to Mehran's and ACEF scores, and even for long-term mortality after emergent PCI. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  5. Invasive Species Distribution Modeling (iSDM): Are absence data and dispersal constraints needed to predict actual distributions?

    Treesearch

    Tomáš Václavík; Ross K. Meentemeyer

    2009-01-01

    Species distribution models (SDMs) based on statistical relationships between occurrence data and underlying environmental conditions are increasingly used to predict spatial patterns of biological invasions and prioritize locations for early detection and control of invasion outbreaks. However, invasive species distribution models (iSDMs) face special challenges...

  6. Modelling for Prediction vs. Modelling for Understanding: Commentary on Musso et al. (2013)

    ERIC Educational Resources Information Center

    Edelsbrunner, Peter; Schneider, Michael

    2013-01-01

    Musso et al. (2013) predict students' academic achievement with high accuracy one year in advance from cognitive and demographic variables, using artificial neural networks (ANNs). They conclude that ANNs have high potential for theoretical and practical improvements in learning sciences. ANNs are powerful statistical modelling tools but they can…

  7. Integrated Wind Power Planning Tool

    NASA Astrophysics Data System (ADS)

    Rosgaard, Martin; Giebel, Gregor; Skov Nielsen, Torben; Hahmann, Andrea; Sørensen, Poul; Madsen, Henrik

    2013-04-01

    This poster presents the current state of the public service obligation (PSO) funded project PSO 10464, with the title "Integrated Wind Power Planning Tool". The goal is to integrate a mesoscale numerical weather prediction (NWP) model with purely statistical tools in order to assess wind power fluctuations, with focus on long-term power system planning for future wind farms as well as short-term forecasting for existing wind farms. Currently, wind power fluctuation models are either purely statistical or integrated with NWP models of limited resolution. Using the state-of-the-art mesoscale Weather Research & Forecasting (WRF) NWP model, the aim is to quantify the forecast error as a function of the time scale involved. This task constitutes a preparative study for later implementation of features accounting for NWP forecast errors in the DTU Wind Energy-maintained Corwind code, a long-term wind power planning tool. Within the framework of PSO 10464, research related to operational short-term wind power prediction will be carried out, including a comparison of forecast quality at different mesoscale NWP model resolutions and development of a statistical wind power prediction tool taking input from WRF. The short-term prediction part of the project is carried out in collaboration with ENFOR A/S, a Danish company that specialises in forecasting and optimisation for the energy sector. The integrated prediction model will allow for the description of the expected variability in wind power production in the coming hours to days, accounting for its spatio-temporal dependencies and depending on the prevailing weather conditions defined by the WRF output. The output from the integrated short-term prediction tool constitutes scenario forecasts for the coming period, which can then be fed into any type of system model or decision-making problem to be solved. The high resolution of the WRF results loaded into the integrated prediction model will ensure that a high-accuracy data basis is available for use in the decision-making process of the Danish transmission system operator. The need for high-accuracy predictions will only increase over the next decade as Denmark approaches the goal of 50% wind-power-based electricity in 2025, up from the current 20%.

  8. Improved Rubin-Bodner Model for the Prediction of Soft Tissue Deformations

    PubMed Central

    Zhang, Guangming; Xia, James J.; Liebschner, Michael; Zhang, Xiaoyan; Kim, Daeseung; Zhou, Xiaobo

    2016-01-01

    In craniomaxillofacial (CMF) surgery, a reliable way of simulating the soft tissue deformation resulting from skeletal reconstruction is vitally important for preventing the risks of facial distortion postoperatively. However, it is difficult to simulate the soft tissue behaviors affected by different types of CMF surgery. This study presents an integrated biomechanical and statistical learning model to improve the accuracy and reliability of predictions of soft facial tissue behavior. The Rubin-Bodner (RB) model is initially used to describe the biomechanical behavior of the soft facial tissue. Subsequently, a finite element model (FEM) computes the stress at each node of the soft facial tissue mesh resulting from bone displacement. Next, the Generalized Regression Neural Network (GRNN) method is implemented to obtain the relationship between the facial soft tissue deformation and the stress distribution corresponding to different CMF surgical types, and to improve the evaluation of the elastic parameters included in the RB model. Therefore, the soft facial tissue deformation can be predicted from biomechanical properties and the statistical model. Leave-one-out cross-validation is used on eleven patients. As a result, the average prediction error of our model (0.7035 mm) is lower than those resulting from other approaches. It also demonstrates that the more accurate biomechanical information the model has, the better prediction performance it could achieve. PMID:27717593

  9. An Interactive Tool For Semi-automated Statistical Prediction Using Earth Observations and Models

    NASA Astrophysics Data System (ADS)

    Zaitchik, B. F.; Berhane, F.; Tadesse, T.

    2015-12-01

    We developed a semi-automated statistical prediction tool applicable to concurrent analysis or seasonal prediction of any time series variable in any geographic location. The tool was developed using Shiny, JavaScript, HTML and CSS. A user can extract a predictand by drawing a polygon over a region of interest on the provided user interface (a global map). The user can select Climatic Research Unit (CRU) precipitation or Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) as the predictand, or upload their own predictand time series. Predictors can be extracted from sea surface temperature, sea level pressure, winds at different pressure levels, air temperature at various pressure levels, and geopotential height at different pressure levels. By default, reanalysis fields are applied as predictors, but the user can also upload their own predictors, including a wide range of compatible satellite-derived datasets. The package generates correlations of the selected variables with the predictand, and the user also has the option to generate composites of the variables based on the predictand. Next, the user can extract predictors by drawing polygons over the regions that show strong correlations (or composites). Then, the user can select some or all of the provided statistical prediction models, which include linear regression models (GLM, SGLM), tree-based models (bagging, random forest, boosting), artificial neural networks, and other non-linear models such as the generalized additive model (GAM) and multivariate adaptive regression splines (MARS). Finally, the user can download the analysis steps they used (the region they selected, the time period they specified, the predictand, predictors and preprocessing options they chose) and the model results in PDF or HTML format. Key words: Semi-automated prediction, Shiny, R, GLM, ANN, RF, GAM, MARS
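
    The core loop of such a tool (correlate a gridded field with the predictand, average over the high-correlation region, fit a regression) can be sketched as follows. The tool itself is built with R/Shiny; this Python sketch uses synthetic data with one planted teleconnection and is not the package's code.

        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(2)

        # Hypothetical fields: 40 years of a rainfall predictand and a
        # gridded SST predictor field (40 years x 10 x 20 grid), standing
        # in for CHIRPS/CRU and reanalysis data.
        rain = rng.normal(size=40)
        sst = rng.normal(size=(40, 10, 20))
        sst[:, 4, 7] += 0.8 * rain                  # planted teleconnection

        # Correlation map between predictand and every grid cell, as the
        # tool displays to guide the user's predictor-polygon selection.
        sst_flat = sst.reshape(40, -1)
        r = np.array([np.corrcoef(rain, col)[0, 1] for col in sst_flat.T])
        corr_map = r.reshape(10, 20)

        # Average the field over the high-correlation "polygon", fit a GLM.
        mask = np.abs(corr_map) > 0.5
        predictor = sst[:, mask].mean(axis=1)
        model = LinearRegression().fit(predictor[:, None], rain)
        print("in-sample R^2:", model.score(predictor[:, None], rain))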

  10. Skillful prediction of hot temperature extremes over the source region of ancient Silk Road.

    PubMed

    Zhang, Jingyong; Yang, Zhanmei; Wu, Lingyun

    2018-04-27

    The source region of the ancient Silk Road (SRASR) in China, a region of around 150 million people, faces a rapidly increasing risk of extreme heat in summer. In this study, we develop statistical models to predict summer hot temperature extremes over the SRASR based on a timescale decomposition approach. Results show that, after removing the linear trends, the inter-annual components of summer hot days and heatwaves over the SRASR are significantly related to those of spring soil temperature over Central Asia and sea surface temperature over the Northwest Atlantic, while their inter-decadal components are closely linked to those of the spring East Pacific/North Pacific pattern and the Atlantic Multidecadal Oscillation for 1979-2016. The physical processes involved are also discussed. Leave-one-out cross-validation for the detrended 1979-2016 time series indicates that the statistical models based on the identified spring predictors can predict 47% and 57% of the total variances of summer hot days and heatwaves averaged over the SRASR, respectively. When the linear trends are put back, the prediction skills increase substantially to 64% and 70%. Hindcast experiments for 2012-2016 show high skill in predicting the spatial patterns of hot temperature extremes over the SRASR. The statistical models proposed herein can be easily applied to operational seasonal forecasting.
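
    A minimal sketch of the timescale-decomposition approach: detrend, split into inter-decadal and inter-annual components, fit each component on its own spring predictor, recombine, and score by leave-one-out cross-validation. The predictors, smoothing window and data below are invented for illustration.

        import numpy as np
        from scipy.signal import detrend
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(3)
        years = np.arange(1979, 2017)
        hot_days = 0.2 * (years - 1979) + rng.normal(0, 2, years.size)

        # Remove the linear trend, then split the residual into
        # inter-decadal (9-yr running mean) and inter-annual components.
        resid = detrend(hot_days)
        decadal = np.convolve(resid, np.ones(9) / 9.0, mode="same")
        interannual = resid - decadal

        # Hypothetical spring predictors for each component (e.g. Central
        # Asia soil temperature; an AMO-like index).
        x_ia = interannual + rng.normal(0, 1, years.size)
        x_id = decadal + rng.normal(0, 0.3, years.size)

        # Leave-one-out cross-validation of the recombined prediction.
        preds = []
        for i in range(years.size):
            keep = np.arange(years.size) != i
            m_ia = LinearRegression().fit(x_ia[keep, None], interannual[keep])
            m_id = LinearRegression().fit(x_id[keep, None], decadal[keep])
            preds.append(m_ia.predict(x_ia[[i], None])[0]
                         + m_id.predict(x_id[[i], None])[0])
        explained = np.corrcoef(preds, resid)[0, 1] ** 2
        print(f"LOO explained variance (detrended): {explained:.2f}")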

  11. Statistical modelling predicts almost complete loss of major periglacial processes in Northern Europe by 2100.

    PubMed

    Aalto, Juha; Harrison, Stephan; Luoto, Miska

    2017-09-11

    The periglacial realm is a major part of the cryosphere, covering a quarter of Earth's land surface. Cryogenic land surface processes (LSPs) control landscape development, ecosystem functioning and climate through biogeochemical feedbacks, but their response to contemporary climate change is unclear. Here, by statistically modelling the current and future distributions of four major LSPs unique to periglacial regions at fine scale, we show that fundamental changes in the periglacial climate realm are inevitable with future climate change. Even under the most optimistic CO2 emissions scenario (Representative Concentration Pathway (RCP) 2.6) we predict a 72% reduction in the current periglacial climate realm by 2050 in our climatically sensitive northern Europe study area, with almost complete disappearance by 2100. These impacts are projected to be especially severe in high-latitude continental interiors. We further predict that by the end of the twenty-first century active periglacial LSPs will exist only at high elevations. These results forecast a future tipping point in the operation of cold-region LSPs and predict fundamental landscape-level modifications in ground conditions and related atmospheric feedbacks.

  12. Prediction of local concentration statistics in variably saturated soils: Influence of observation scale and comparison with field data

    NASA Astrophysics Data System (ADS)

    Graham, Wendy; Destouni, Georgia; Demmy, George; Foussereau, Xavier

    1998-07-01

    The methodology developed in Destouni and Graham [Destouni, G., Graham, W.D., 1997. The influence of observation method on local concentration statistics in the subsurface. Water Resour. Res. 33 (4) 663-676.] for predicting locally measured concentration statistics for solute transport in heterogeneous porous media under saturated flow conditions is applied to the prediction of conservative nonreactive solute transport in the vadose zone where observations are obtained by soil coring. Exact analytical solutions are developed for both the mean and variance of solute concentrations measured in discrete soil cores using a simplified physical model for vadose-zone flow and solute transport. Theoretical results show that while the ensemble mean concentration is relatively insensitive to the length-scale of the measurement, predictions of the concentration variance are significantly impacted by the sampling interval. Results also show that accounting for vertical heterogeneity in the soil profile results in significantly less spreading in the mean and variance of the measured solute breakthrough curves, indicating that it is important to account for vertical heterogeneity even for relatively small travel distances. Model predictions for both the mean and variance of locally measured solute concentration, based on independently estimated model parameters, agree well with data from a field tracer test conducted in Manatee County, Florida.

  13. On Theoretical Broadband Shock-Associated Noise Near-Field Cross-Spectra

    NASA Technical Reports Server (NTRS)

    Miller, Steven A. E.

    2015-01-01

    The cross-spectral acoustic analogy is used to predict auto-spectra and cross-spectra of broadband shock-associated noise in the near-field and far-field from a range of heated and unheated supersonic off-design jets. A single equivalent source model, containing flow-field statistics of the shock wave/shear layer interactions, is proposed for the near-field, mid-field, and far-field terms. Flow-field statistics are modeled based upon experimental observation and computational fluid dynamics solutions. An axisymmetric assumption is used to reduce the model to a closed-form equation involving a double summation over the equivalent source at each shock wave/shear layer interaction. Predictions are compared with a wide variety of measurements at numerous jet Mach numbers and temperature ratios from multiple facilities. Auto-spectral predictions of broadband shock-associated noise in the near-field and far-field capture trends observed in measurement and other prediction theories. Predictions of spatial coherence of broadband shock-associated noise accurately capture the peak coherent intensity, frequency, and spectral width.

  14. Statistics for the Relative Detectability of Chemicals in Weak Gaseous Plumes in LWIR Hyperspectral Imagery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Metoyer, Candace N.; Walsh, Stephen J.; Tardiff, Mark F.

    2008-10-30

    The detection and identification of weak gaseous plumes using thermal imaging data is complicated by many factors. These include variability due to atmosphere, ground and plume temperature, and background clutter. This paper presents an analysis of one formulation of the physics-based model that describes the at-sensor observed radiance. The motivating question for the analyses performed in this paper is as follows. Given a set of backgrounds, is there a way to predict the background over which the probability of detecting a given chemical will be the highest? Two statistics were developed to address this question. These statistics incorporate data from the long-wave infrared band to predict the background over which chemical detectability will be the highest. These statistics can be computed prior to data collection. As a preliminary exploration into the predictive ability of these statistics, analyses were performed on synthetic hyperspectral images. Each image contained one chemical (either carbon tetrachloride or ammonia) spread across six distinct background types. The statistics were used to generate predictions for the background ranks. Then, the predicted ranks were compared to the empirical ranks obtained from the analyses of the synthetic images. For the simplified images under consideration, the predicted and empirical ranks showed a promising amount of agreement. One statistic accurately predicted the best and worst background for detection in all of the images. Future work may include explorations of more complicated plume ingredients, background types, and noise structures.

  15. Newer classification and regression tree techniques: Bagging and Random Forests for ecological prediction

    Treesearch

    Anantha M. Prasad; Louis R. Iverson; Andy Liaw

    2006-01-01

    We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.
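
    For readers unfamiliar with these learners, the sketch below contrasts a single regression tree, bagged trees and a random forest on synthetic data with scikit-learn; it is a generic illustration, not the authors' vegetation-mapping setup.

        from sklearn.datasets import make_regression
        from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeRegressor

        # Synthetic stand-in for climate/soil predictors of abundance.
        X, y = make_regression(n_samples=300, n_features=8, noise=10.0,
                               random_state=0)

        models = {
            "single tree (RTA)": DecisionTreeRegressor(random_state=0),
            "bagged trees (BT)": BaggingRegressor(random_state=0),
            "random forest (RF)": RandomForestRegressor(random_state=0),
        }
        for name, model in models.items():
            score = cross_val_score(model, X, y, cv=5).mean()
            print(f"{name}: mean CV R^2 = {score:.2f}")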

  17. Complex networks as a unified framework for descriptive analysis and predictive modeling in climate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Steinhaeuser, Karsten J K; Chawla, Nitesh; Ganguly, Auroop R

    The analysis of climate data has relied heavily on hypothesis-driven statistical methods, while projections of future climate are based primarily on physics-based computational models. However, in recent years a wealth of new datasets has become available. Therefore, we take a more data-centric approach and propose a unified framework for studying climate, with an aim towards characterizing observed phenomena as well as discovering new knowledge in the climate domain. Specifically, we posit that complex networks are well-suited for both descriptive analysis and predictive modeling tasks. We show that the structural properties of climate networks have useful interpretation within the domain. Further, we extract clusters from these networks and demonstrate their predictive power as climate indices. Our experimental results establish that the network clusters are statistically significantly better predictors than clusters derived using a more traditional clustering approach. Using complex networks as data representation thus enables the unique opportunity for descriptive and predictive modeling to inform each other.

  18. Spatial statistical network models for stream and river temperature in New England, USA

    NASA Astrophysics Data System (ADS)

    Detenbeck, Naomi E.; Morrison, Alisa C.; Abele, Ralph W.; Kopp, Darin A.

    2016-08-01

    Watershed managers are challenged by the need for predictive temperature models with sufficient accuracy and geographic breadth for practical use. We described thermal regimes of New England rivers and streams based on a reduced set of metrics for the May-September growing season (July or August median temperature, diurnal rate of change, and magnitude and timing of growing season maximum) chosen through principal component analysis of 78 candidate metrics. We then developed and assessed spatial statistical models for each of these metrics, incorporating spatial autocorrelation based on both distance along the flow network and Euclidean distance between points. Calculation of spatial autocorrelation based on travel or retention time in place of network distance yielded tighter-fitting Torgegrams with less scatter but did not improve overall model prediction accuracy. We predicted monthly median July or August stream temperatures as a function of median air temperature, estimated urban heat island effect, shaded solar radiation, main channel slope, watershed storage (percent lake and wetland area), percent coarse-grained surficial deposits, and presence or maximum depth of a lake immediately upstream, with an overall root-mean-square prediction error of 1.4 and 1.5°C, respectively. Growing season maximum water temperature varied as a function of air temperature, local channel slope, shaded August solar radiation, imperviousness, and watershed storage. Predictive models for July or August daily range, maximum daily rate of change, and timing of growing season maximum were statistically significant but explained a much lower proportion of variance than the above models (5-14% of total).
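
    The spatial statistical machinery can be sketched as generalized least squares under an exponential covariance of inter-site distance; a full spatial stream network model would substitute flow-network distance (or the travel-time metric the authors tested) for the Euclidean distance used here. Sites, covariates and parameters are invented.

        import numpy as np

        rng = np.random.default_rng(4)

        # Hypothetical monitoring sites: coordinates (km), a covariate
        # (air temperature) and observed August median stream temperature.
        xy = rng.uniform(0, 100, size=(30, 2))
        air = rng.uniform(15, 25, size=30)
        temp = 0.8 * air + rng.normal(0, 1, 30)

        # Exponential covariance on Euclidean distance between sites.
        d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
        sigma2, phi, nugget = 1.0, 30.0, 0.2
        C = sigma2 * np.exp(-d / phi) + nugget * np.eye(30)

        # Generalized least squares for the air-temperature effect.
        X = np.column_stack([np.ones(30), air])
        Ci = np.linalg.inv(C)
        beta = np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ temp)
        print("GLS intercept and air-temperature slope:", beta)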

  19. Predicting future protection of respirator users: Statistical approaches and practical implications.

    PubMed

    Hu, Chengcheng; Harber, Philip; Su, Jing

    2016-01-01

    The purpose of this article is to describe a statistical approach for predicting a respirator user's fit factor in the future based upon results from initial tests. A statistical prediction model was developed based upon joint distribution of multiple fit factor measurements over time obtained from linear mixed effect models. The model accounts for within-subject correlation as well as short-term (within one day) and longer-term variability. As an example of applying this approach, model parameters were estimated from a research study in which volunteers were trained by three different modalities to use one of two types of respirators. They underwent two quantitative fit tests at the initial session and two on the same day approximately six months later. The fitted models demonstrated correlation and gave the estimated distribution of future fit test results conditional on past results for an individual worker. This approach can be applied to establishing a criterion value for passing an initial fit test to provide reasonable likelihood that a worker will be adequately protected in the future; and to optimizing the repeat fit factor test intervals individually for each user for cost-effective testing.
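
    The prediction step amounts to conditioning a multivariate normal model of log fit factors on the initial tests. The sketch below assumes illustrative variance components (between-subject, same-day, residual) and a pass level of 100; none of these numbers come from the study.

        import numpy as np
        from scipy.stats import norm

        # Assumed joint model: two same-day initial tests and one future
        # test of log fit factor are multivariate normal.
        mu = np.array([7.0, 7.0, 7.0])
        between, same_day, resid = 0.5, 0.1, 0.2
        cov = np.full((3, 3), between)    # shared subject effect
        cov[:2, :2] += same_day           # initial tests share a day effect
        cov += np.eye(3) * resid          # independent per-test noise

        # Condition the future test (index 2) on the two observed results.
        past = np.log([900.0, 1200.0])
        w = np.linalg.solve(cov[:2, :2], cov[:2, 2])
        cond_mean = mu[2] + w @ (past - mu[:2])
        cond_var = cov[2, 2] - w @ cov[:2, 2]

        # Probability the future fit factor exceeds a pass level of 100.
        p_pass = 1.0 - norm.cdf(np.log(100.0), cond_mean, np.sqrt(cond_var))
        print(f"P(future fit factor > 100) = {p_pass:.3f}")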

  1. Seeing number using texture: How summary statistics account for reductions in perceived numerosity in the visual periphery.

    PubMed

    Balas, Benjamin

    2016-11-01

    Peripheral visual perception is characterized by reduced information about appearance due to constraints on how image structure is represented. Visual crowding is a consequence of excessive integration in the visual periphery. The basic phenomenology of visual crowding and other tasks has been successfully accounted for by a summary-statistic model of pooling, suggesting that texture-like processing underlies how information is reduced in peripheral vision. I attempt to extend the scope of this model by examining a property of peripheral vision: reduced perceived numerosity in the periphery. I demonstrate that a summary-statistic model of peripheral appearance accounts for reduced numerosity in peripherally viewed arrays of randomly placed dots, but does not account for the observed effects of dot clustering within such arrays. The model thus offers a limited account of how numerosity is perceived in the visual periphery. I also demonstrate that the model predicts that numerosity estimation is sensitive to element shape, which represents a novel prediction regarding the phenomenology of peripheral numerosity perception. Finally, I discuss ways to extend the model to a broader range of behavior and the potential for using the model to make further predictions about how number is perceived in untested scenarios in peripheral vision.

  2. Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.

    PubMed

    Mørk, Søren; Holmes, Ian

    2012-03-01

    Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate hidden Markov model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system, in terms of statistical information criteria and gene-finding prediction accuracy, on two bacterial genomes. Neither of our implementations of the two most commonly used model structures performs best in terms of statistical information criteria or prediction performance, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.
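
    The kind of model structure being compared can be illustrated with a toy two-state HMM and Viterbi decoding. The real models are codon-level and written in PRISM, so the states, transition and emission probabilities below are purely illustrative.

        import numpy as np

        # Toy two-state HMM over nucleotides: "coding" regions are GC-rich,
        # "intergenic" regions are uniform.
        states = ["intergenic", "coding"]
        trans = np.log(np.array([[0.95, 0.05],
                                 [0.10, 0.90]]))
        emit = np.log(np.array([[0.25, 0.25, 0.25, 0.25],   # A C G T
                                [0.15, 0.35, 0.35, 0.15]]))
        idx = {b: i for i, b in enumerate("ACGT")}

        def viterbi(seq):
            obs = [idx[b] for b in seq]
            v = np.log([0.5, 0.5]) + emit[:, obs[0]]
            back = []
            for o in obs[1:]:
                scores = v[:, None] + trans     # from-state x to-state
                back.append(scores.argmax(axis=0))
                v = scores.max(axis=0) + emit[:, o]
            path = [int(v.argmax())]
            for b in reversed(back):            # trace best path backwards
                path.append(int(b[path[-1]]))
            return [states[s] for s in reversed(path)]

        print(viterbi("ATATATGCGCGCGCATAT"))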

  3. Financial Stylized Facts in the Word of Mouth Model

    NASA Astrophysics Data System (ADS)

    Misawa, Tadanobu; Watanabe, Kyoko; Shimokawa, Tetsuya

    Recently, we proposed an agent-based model, the word of mouth model, to analyze the influence of the information transmission process on price formation in financial markets. In particular, we focused on the short-term predictability of asset returns and offered an information-transmission explanation of why this predictability is observed much more clearly in small-sized stocks. This paper extends the previous study by demonstrating that the word of mouth model is also consistent with other important financial stylized facts, which strengthens the possibility that information transmission among investors plays a crucial role in price formation. Concretely, this paper addresses two well-known statistical features of returns: the leptokurtic distribution of returns and the autocorrelation of return volatility. These statistical facts receive special attention from researchers, among all financial stylized facts, because of their statistical robustness and practical importance, such as in applications to derivative pricing problems.

  4. Impact of statistical learning methods on the predictive power of multivariate normal tissue complication probability models.

    PubMed

    Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A; van't Veld, Aart A

    2012-03-15

    To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended. Copyright © 2012 Elsevier Inc. All rights reserved.
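
    A minimal sketch of the recommended LASSO approach, using scikit-learn's L1-penalized logistic regression on synthetic stand-in data; the penalty strength and the data are assumptions, and the study's own implementation may differ.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        # Synthetic stand-in for dose-volume and clinical predictors of a
        # binary complication outcome (xerostomia).
        X, y = make_classification(n_samples=200, n_features=30,
                                   n_informative=5, random_state=0)

        # The L1 penalty shrinks most coefficients exactly to zero, giving
        # a sparse, interpretable NTCP-style model.
        lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
        auc = cross_val_score(lasso, X, y, cv=5, scoring="roc_auc").mean()
        n_kept = np.sum(lasso.fit(X, y).coef_ != 0)
        print(f"CV AUC = {auc:.2f}, predictors retained = {n_kept}")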

  5. Statistical prediction of dynamic distortion of inlet flow using minimum dynamic measurement. An application to the Melick statistical method and inlet flow dynamic distortion prediction without RMS measurements

    NASA Technical Reports Server (NTRS)

    Schweikhard, W. G.; Chen, Y. S.

    1986-01-01

    The Melick method of inlet flow dynamic distortion prediction by statistical means is outlined. A hypothetical vortex model is used as the basis for the mathematical formulations. The main variables are identified by matching the theoretical total pressure rms ratio with the measured total pressure rms ratio. Data comparisons, using the HiMAT inlet test data set, indicate satisfactory prediction of the dynamic peak distortion for cases with boundary layer control device vortex generators. A method for dynamic probe selection was developed. Validity of the probe selection criteria is demonstrated by comparing the reduced-probe predictions with the 40-probe predictions. It is indicated that the number of dynamic probes can be reduced to as few as two and still retain good accuracy.

  6. A systematic review of breast cancer incidence risk prediction models with meta-analysis of their performance.

    PubMed

    Meads, Catherine; Ahmed, Ikhlaaq; Riley, Richard D

    2012-04-01

    A risk prediction model is a statistical tool for estimating the probability that a currently healthy individual with specific risk factors will develop a condition in the future such as breast cancer. Reliably accurate prediction models can inform future disease burdens, health policies and individual decisions. Breast cancer prediction models containing modifiable risk factors, such as alcohol consumption, BMI or weight, condom use, exogenous hormone use and physical activity, are of particular interest to women who might be considering how to reduce their risk of breast cancer and clinicians developing health policies to reduce population incidence rates. We performed a systematic review to identify and evaluate the performance of prediction models for breast cancer that contain modifiable factors. A protocol was developed and a sensitive search in databases including MEDLINE and EMBASE was conducted in June 2010. Extensive use was made of reference lists. Included were any articles proposing or validating a breast cancer prediction model in a general female population, with no language restrictions. Duplicate data extraction and quality assessment were conducted. Results were summarised qualitatively, and where possible meta-analysis of model performance statistics was undertaken. The systematic review found 17 breast cancer models, each containing a different but often overlapping set of modifiable and other risk factors, combined with an estimated baseline risk that was also often different. Quality of reporting was generally poor, with characteristics of included participants and fitted model results often missing. Only four models received independent validation in external data, most notably the 'Gail 2' model with 12 validations. None of the models demonstrated consistently outstanding ability to accurately discriminate between those who did and those who did not develop breast cancer. For example, random-effects meta-analyses of the performance of the 'Gail 2' model showed the average C statistic was 0.63 (95% CI 0.59-0.67), and the expected/observed ratio of events varied considerably across studies (95% prediction interval for E/O ratio when the model was applied in practice was 0.75-1.19). There is a need for models with better predictive performance but, given the large amount of work already conducted, further improvement of existing models based on conventional risk factors is perhaps unlikely. Research to identify new risk factors with large additional predictive ability is therefore needed, alongside clearer reporting and continual validation of new models as they develop.

  7. An Easy Tool to Predict Survival in Patients Receiving Radiation Therapy for Painful Bone Metastases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Westhoff, Paulien G., E-mail: p.g.westhoff@umcutrecht.nl; Graeff, Alexander de; Monninkhof, Evelyn M.

    2014-11-15

    Purpose: Patients with bone metastases have a widely varying survival. A reliable estimation of survival is needed for appropriate treatment strategies. Our goal was to assess the value of simple prognostic factors, namely, patient and tumor characteristics, Karnofsky performance status (KPS), and patient-reported scores of pain and quality of life, to predict survival in patients with painful bone metastases. Methods and Materials: In the Dutch Bone Metastasis Study, 1157 patients were treated with radiation therapy for painful bone metastases. At randomization, physicians determined the KPS; patients rated general health on a visual analogue scale (VAS-gh), valuation of life on a verbal rating scale (VRS-vl) and pain intensity. To assess the predictive value of the variables, we used multivariate Cox proportional hazard analyses and C-statistics for discriminative value. Of the final model, calibration was assessed. External validation was performed on a dataset of 934 patients who were treated with radiation therapy for vertebral metastases. Results: Patients had mainly breast (39%), prostate (23%), or lung cancer (25%). After a maximum of 142 weeks' follow-up, 74% of patients had died. The best predictive model included sex, primary tumor, visceral metastases, KPS, VAS-gh, and VRS-vl (C-statistic = 0.72, 95% CI = 0.70-0.74). A reduced model, with only KPS and primary tumor, showed comparable discriminative capacity (C-statistic = 0.71, 95% CI = 0.69-0.72). External validation showed a C-statistic of 0.72 (95% CI = 0.70-0.73). Calibration of the derivation and the validation dataset showed underestimation of survival. Conclusion: In predicting survival in patients with painful bone metastases, KPS combined with primary tumor was comparable to a more complex model. Considering the amount of variables in complex models and the additional burden on patients, the simple model is preferred for daily use. In addition, a risk table for survival is provided.

  8. Application of statistical classification methods for predicting the acceptability of well-water quality

    NASA Astrophysics Data System (ADS)

    Cameron, Enrico; Pilla, Giorgio; Stella, Fabio A.

    2018-06-01

    The application of statistical classification methods is investigated, in comparison also with spatial interpolation methods, for predicting the acceptability of well-water quality in a situation where an effective quantitative model of the hydrogeological system under consideration cannot be developed. In the example area in northern Italy, the aquifer is locally affected by saline water, and the concentration of chloride is the main indicator of both saltwater occurrence and groundwater quality. The goal is to predict whether the chloride concentration in a water well will exceed the allowable concentration, so that the water is unfit for the intended use. A statistical classification algorithm achieved the best predictive performance, and the results of the study show that statistical classification methods provide further tools for dealing with groundwater quality problems in hydrogeological systems that are too difficult to describe analytically or to simulate effectively.

  9. Statistical learning theory for high dimensional prediction: Application to criterion-keyed scale development.

    PubMed

    Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R

    2016-12-01

    Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms-supervised principal components, regularization, and boosting-can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach-or perhaps because of them-SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  10. PREDICTION OF VO2PEAK USING OMNI RATINGS OF PERCEIVED EXERTION FROM A SUBMAXIMAL CYCLE EXERCISE TEST

    PubMed Central

    Mays, Ryan J.; Goss, Fredric L.; Nagle-Stilley, Elizabeth F.; Gallagher, Michael; Schafer, Mark A.; Kim, Kevin H.; Robertson, Robert J.

    2015-01-01

    The primary aim of this study was to develop statistical models to predict peak oxygen consumption (VO2peak) using OMNI Ratings of Perceived Exertion measured during submaximal cycle ergometry. Men (mean ± standard error: 20.90 ± 0.42 yrs) and women (21.59 ± 0.49 yrs) participants (n = 81) completed a load-incremented maximal cycle ergometer exercise test. Simultaneous multiple linear regression was used to develop separate VO2peak statistical models using submaximal ratings of perceived exertion for the overall body, legs, and chest/breathing as predictor variables. VO2peak (L·min−1) predicted for men and women from ratings of perceived exertion for the overall body (3.02 ± 0.06; 2.03 ± 0.04), legs (3.02 ± 0.06; 2.04 ± 0.04) and chest/breathing (3.02 ± 0.05; 2.03 ± 0.03) was similar to measured VO2peak (3.02 ± 0.10; 2.03 ± 0.06, ps > .05). Statistical models based on submaximal OMNI Ratings of Perceived Exertion provide an easily administered and accurate method to predict VO2peak. PMID:25068750

  11. A Stochastic Fractional Dynamics Model of Space-time Variability of Rain

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Travis, James E.

    2013-01-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, which allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain, but strongly constrains the predicted statistical behavior as a function of the averaging length and time scales. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of the radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment.

  12. Primary Sclerosing Cholangitis Risk Estimate Tool (PREsTo) Predicts Outcomes in PSC: A Derivation & Validation Study Using Machine Learning.

    PubMed

    Eaton, John E; Vesterhus, Mette; McCauley, Bryan M; Atkinson, Elizabeth J; Schlicht, Erik M; Juran, Brian D; Gossard, Andrea A; LaRusso, Nicholas F; Gores, Gregory J; Karlsen, Tom H; Lazaridis, Konstantinos N

    2018-05-09

    Improved methods are needed to risk stratify and predict outcomes in patients with primary sclerosing cholangitis (PSC). Therefore, we sought to derive and validate a new prediction model and compare its performance to existing surrogate markers. The model was derived using 509 subjects from a multicenter North American cohort and validated in an international multicenter cohort (n=278). Gradient boosting, a machine-learning technique, was used to create the model. The endpoint was hepatic decompensation (ascites, variceal hemorrhage or encephalopathy). Subjects with advanced PSC or cholangiocarcinoma at baseline were excluded. The PSC risk estimate tool (PREsTo) consists of 9 variables: bilirubin, albumin, serum alkaline phosphatase (SAP) times the upper limit of normal (ULN), platelets, AST, hemoglobin, sodium, patient age and the number of years since PSC was diagnosed. Validation in an independent cohort confirms that PREsTo accurately predicts decompensation (C statistic 0.90, 95% confidence interval (CI) 0.84-0.95) and performs well compared to the MELD score (C statistic 0.72, 95% CI 0.57-0.84), the Mayo PSC risk score (C statistic 0.85, 95% CI 0.77-0.92) and SAP < 1.5x ULN (C statistic 0.65, 95% CI 0.55-0.73). PREsTo remained accurate among individuals with a bilirubin < 2.0 mg/dL (C statistic 0.90, 95% CI 0.82-0.96) and when the score was re-applied later in the disease course (C statistic 0.82, 95% CI 0.64-0.95). PREsTo accurately predicts hepatic decompensation in PSC and exceeds the performance of other widely available, noninvasive prognostic scoring systems. This article is protected by copyright. All rights reserved. © 2018 by the American Association for the Study of Liver Diseases.
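
    A schematic of the derive-and-validate workflow with gradient boosting, using synthetic stand-ins for the nine PREsTo inputs; for a binary endpoint the C statistic equals the area under the ROC curve. This is not the PREsTo code, and decompensation is treated here as a plain binary label rather than a time-to-event outcome.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        # Synthetic stand-ins for the nine predictors (bilirubin, albumin,
        # SAP x ULN, platelets, AST, hemoglobin, sodium, age, disease
        # duration) and a binary decompensation outcome.
        X, y = make_classification(n_samples=800, n_features=9,
                                   n_informative=6, weights=[0.85],
                                   random_state=0)
        X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.35,
                                                      random_state=0)

        # Derive on one cohort, validate on the held-out cohort.
        model = GradientBoostingClassifier(random_state=0).fit(X_dev, y_dev)
        c_stat = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        print(f"validation C statistic = {c_stat:.2f}")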

  13. Forecasting experiments of a dynamical-statistical model of the sea surface temperature anomaly field based on the improved self-memorization principle

    NASA Astrophysics Data System (ADS)

    Hong, Mei; Chen, Xi; Zhang, Ren; Wang, Dong; Shen, Shuanghe; Singh, Vijay P.

    2018-04-01

    With the objective of tackling the problem of inaccurate long-term El Niño-Southern Oscillation (ENSO) forecasts, this paper develops a new dynamical-statistical forecast model of the sea surface temperature anomaly (SSTA) field. To avoid single initial prediction values, a self-memorization principle is introduced to improve the dynamical reconstruction model, thus making the model more appropriate for describing such chaotic systems as ENSO events. The improved dynamical-statistical model of the SSTA field is used to predict SSTA in the equatorial eastern Pacific and during El Niño and La Niña events. The long-term step-by-step forecast results and cross-validated retroactive hindcast results of time series T1 and T2 are found to be satisfactory, with a Pearson correlation coefficient of approximately 0.80 and a mean absolute percentage error (MAPE) of less than 15 %. The corresponding forecast SSTA field is accurate in that not only is the forecast shape similar to the actual field but also the contour lines are essentially the same. This model can also be used to forecast the ENSO index. The temporal correlation coefficient is 0.8062, and the MAPE value of 19.55 % is small. The difference between forecast results in spring and those in autumn is not high, indicating that the improved model can overcome the spring predictability barrier to some extent. Compared with six mature models published previously, the present model has an advantage in prediction precision and length, and is a novel exploration of the ENSO forecast method.

  14. Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?

    PubMed

    Snell, Kym Ie; Ensor, Joie; Debray, Thomas Pa; Moons, Karel Gm; Riley, Richard D

    2017-01-01

    If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
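
    The recommended transformations are straightforward to apply. The sketch below pools hypothetical study-level C statistics on the logit scale with DerSimonian-Laird random-effects weights and back-transforms the pooled value; the delta-method standard error of logit(c) is se/(c(1-c)).

        import numpy as np

        # Illustrative C statistics and standard errors from several
        # validation studies (not the QRISK2 practices).
        c = np.array([0.72, 0.68, 0.75, 0.63, 0.70])
        se_c = np.array([0.02, 0.03, 0.02, 0.04, 0.03])

        # Meta-analyse on the logit scale, where between-study normality
        # tends to hold better.
        theta = np.log(c / (1 - c))
        se = se_c / (c * (1 - c))

        # DerSimonian-Laird random-effects pooling.
        w = 1 / se**2
        q = np.sum(w * (theta - np.sum(w * theta) / w.sum())**2)
        tau2 = max(0.0, (q - (len(c) - 1))
                        / (w.sum() - np.sum(w**2) / w.sum()))
        w_re = 1 / (se**2 + tau2)
        pooled = np.sum(w_re * theta) / w_re.sum()
        print("pooled C statistic:", 1 / (1 + np.exp(-pooled)))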

  20. A Stochastic Framework for Evaluating Seizure Prediction Algorithms Using Hidden Markov Models

    PubMed Central

    Wong, Stephen; Gardner, Andrew B.; Krieger, Abba M.; Litt, Brian

    2007-01-01

    Responsive, implantable stimulation devices to treat epilepsy are now in clinical trials. New evidence suggests that these devices may be more effective when they deliver therapy before seizure onset. Despite years of effort, prospective seizure prediction, which could improve device performance, remains elusive. In large part, this is explained by lack of agreement on a statistical framework for modeling seizure generation and a method for validating algorithm performance. We present a novel stochastic framework based on a three-state hidden Markov model (HMM) (representing interictal, preictal, and seizure states) with the feature that periods of increased seizure probability can transition back to the interictal state. This notion reflects clinical experience and may enhance interpretation of published seizure prediction studies. Our model accommodates clipped EEG segments and formalizes intuitive notions regarding statistical validation. We derive equations for type I and type II errors as a function of the number of seizures, duration of interictal data, and prediction horizon length, and we demonstrate the model's utility with a novel seizure detection algorithm that appeared to predict seizure onset. We propose this framework as a vital tool for designing and validating prediction algorithms and for facilitating collaborative research in this area. PMID:17021032
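
    Where the paper derives type I and type II error rates analytically, the same quantities can be approximated by Monte Carlo simulation of the three-state chain, as sketched below with invented transition probabilities and an idealized, imperfect alarm.

        import numpy as np

        rng = np.random.default_rng(5)

        # Three-state chain: 0 = interictal, 1 = preictal, 2 = seizure.
        # The nonzero 1 -> 0 entry encodes the framework's key feature: a
        # period of raised seizure probability may relax back to interictal.
        P = np.array([[0.990, 0.010, 0.000],
                      [0.050, 0.940, 0.010],
                      [1.000, 0.000, 0.000]])  # reset after a seizure

        horizon, n = 30, 100_000
        states = np.zeros(n, dtype=int)
        for t in range(1, n):
            states[t] = rng.choice(3, p=P[states[t - 1]])

        # Imperfect detector: flags a preictal timestep with probability
        # 0.3 (in practice the state is hidden and inferred from EEG).
        alarms = np.flatnonzero((states == 1) & (rng.random(n) < 0.3))
        seizures = np.flatnonzero(states == 2)

        # Empirical type II error: seizures without an alarm in the prior
        # horizon; type I error: alarms with no seizure in the next horizon.
        missed = [s for s in seizures
                  if not np.any((alarms >= s - horizon) & (alarms < s))]
        false = [a for a in alarms
                 if not np.any((seizures > a) & (seizures <= a + horizon))]
        print(f"seizures: {len(seizures)}, "
              f"missed: {len(missed) / max(1, len(seizures)):.3f}, "
              f"false alarms: {len(false) / max(1, len(alarms)):.3f}")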

  1. In vivo serial MRI-based models and statistical methods to quantify sensitivity and specificity of mechanical predictors for carotid plaque rupture: location and beyond.

    PubMed

    Wu, Zheyang; Yang, Chun; Tang, Dalin

    2011-06-01

    It has been hypothesized that mechanical risk factors may be used to predict future atherosclerotic plaque rupture. Truly predictive methods for plaque rupture and methods to identify the best predictor(s) from all the candidates are lacking in the literature. A novel combination of computational and statistical models based on serial magnetic resonance imaging (MRI) was introduced to quantify sensitivity and specificity of mechanical predictors to identify the best candidate for plaque rupture site prediction. Serial in vivo MRI data of carotid plaque from one patient was acquired with follow-up scan showing ulceration. 3D computational fluid-structure interaction (FSI) models using both baseline and follow-up data were constructed and plaque wall stress (PWS) and strain (PWSn) and flow maximum shear stress (FSS) were extracted from all 600 matched nodal points (100 points per matched slice, baseline matching follow-up) on the lumen surface for analysis. Each of the 600 points was marked "ulcer" or "nonulcer" using follow-up scan. Predictive statistical models for each of the seven combinations of PWS, PWSn, and FSS were trained using the follow-up data and applied to the baseline data to assess their sensitivity and specificity using the 600 data points for ulcer predictions. Sensitivity of prediction is defined as the proportion of the true positive outcomes that are predicted to be positive. Specificity of prediction is defined as the proportion of the true negative outcomes that are correctly predicted to be negative. Using probability 0.3 as a threshold to infer ulcer occurrence at the prediction stage, the combination of PWS and PWSn provided the best predictive accuracy with (sensitivity, specificity) = (0.97, 0.958). Sensitivity and specificity given by PWS, PWSn, and FSS individually were (0.788, 0.968), (0.515, 0.968), and (0.758, 0.928), respectively. The proposed computational-statistical process provides a novel method and a framework to assess the sensitivity and specificity of various risk indicators and offers the potential to identify the optimized predictor for plaque rupture using serial MRI with follow-up scan showing ulceration as the gold standard for method validation. While serial MRI data with actual rupture are hard to acquire, this single-case study suggests that combination of multiple predictors may provide potential improvement to existing plaque assessment schemes. With large-scale patient studies, this predictive modeling process may provide more solid ground for rupture predictor selection strategies and methods for image-based plaque vulnerability assessment.
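
    The sensitivity/specificity bookkeeping at the 0.3 threshold is simple to reproduce; the sketch below applies it to invented probabilities and labels standing in for the 600 matched nodal points.

        import numpy as np

        rng = np.random.default_rng(6)

        # Hypothetical stand-ins for the 600 matched lumen-surface points:
        # a predicted ulcer probability and the follow-up ulcer label.
        label = rng.random(600) < 0.1
        prob = np.clip(0.7 * label + rng.normal(0.15, 0.12, 600), 0, 1)

        # Threshold of 0.3 to infer ulcer occurrence, as in the study.
        pred = prob > 0.3
        sensitivity = (pred & label).sum() / label.sum()
        specificity = (~pred & ~label).sum() / (~label).sum()
        print(f"sensitivity = {sensitivity:.3f}, "
              f"specificity = {specificity:.3f}")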

  2. A statistical model for radar images of agricultural scenes

    NASA Technical Reports Server (NTRS)

    Frost, V. S.; Shanmugan, K. S.; Holtzman, J. C.; Stiles, J. A.

    1982-01-01

    The presently derived and validated statistical model for radar images containing many different homogeneous fields predicts the probability density functions of radar images of entire agricultural scenes, thereby allowing histograms of large scenes composed of a variety of crops to be described. Seasat-A SAR images of agricultural scenes are accurately predicted by the model on the basis of three assumptions: each field has the same SNR, all target classes cover approximately the same area, and the true reflectivity characterizing each individual target class is a uniformly distributed random variable. The model is expected to be useful in the design of data processing algorithms and for scene analysis using radar images.

  3. Examination of multi-model ensemble seasonal prediction methods using a simple climate system

    NASA Astrophysics Data System (ADS)

    Kang, In-Sik; Yoo, Jin Ho

    2006-02-01

    A simple climate model was designed as a proxy for the real climate system, and a number of prediction models were generated by slightly perturbing the physical parameters of the simple model. A set of long (240 years) historical hindcast predictions were performed with various prediction models, which are used to examine various issues of multi-model ensemble seasonal prediction, such as the best ways of blending multi-models and the selection of models. Based on these results, we suggest a feasible way of maximizing the benefit of using multi models in seasonal prediction. In particular, three types of multi-model ensemble prediction systems, i.e., the simple composite, superensemble, and the composite after statistically correcting individual predictions (corrected composite), are examined and compared to each other. The superensemble has more of an overfitting problem than the others, especially for the case of small training samples and/or weak external forcing, and the corrected composite produces the best prediction skill among the multi-model systems.
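
    The three ensemble schemes can be sketched directly: a plain multi-model mean, a regression-based superensemble trained on the hindcast period, and a composite of individually corrected models. Synthetic hindcasts stand in for the simple climate system, and the correction shown is mean-bias removal only, a simpler stand-in for the paper's statistical correction.

        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(7)

        # Hindcasts from three imperfect, biased models of a truth series,
        # split into a training period and a verification period.
        t = 240
        truth = rng.normal(size=t)
        hind = (truth[:, None]
                + rng.normal(0.0, [0.8, 1.0, 1.2], size=(t, 3))
                + np.array([0.3, -0.2, 0.5]))
        train, test = slice(0, 180), slice(180, None)

        # 1) Simple composite: plain multi-model mean.
        simple = hind[test].mean(axis=1)
        # 2) Superensemble: regression of truth on all models.
        superens = (LinearRegression().fit(hind[train], truth[train])
                    .predict(hind[test]))
        # 3) Corrected composite: bias-correct each model, then average.
        corrected = (hind[test] - hind[train].mean(axis=0)
                     + truth[train].mean()).mean(axis=1)

        for name, f in [("simple", simple), ("superensemble", superens),
                        ("corrected composite", corrected)]:
            print(name, "RMSE:", np.sqrt(np.mean((f - truth[test]) ** 2)))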

  4. Personalizing oncology treatments by predicting drug efficacy, side-effects, and improved therapy: mathematics, statistics, and their integration.

    PubMed

    Agur, Zvia; Elishmereni, Moran; Kheifetz, Yuri

    2014-01-01

    Despite its great promise, personalized oncology still faces many hurdles, and it is increasingly clear that targeted drugs and molecular biomarkers alone yield only modest clinical benefit. One reason is the complex relationships between biomarkers and the patient's response to drugs, obscuring the true weight of the biomarkers in the overall patient's response. This complexity can be disentangled by computational models that integrate the effects of personal biomarkers into a simulator of drug-patient dynamic interactions, for predicting the clinical outcomes. Several computational tools have been developed for personalized oncology, notably evidence-based tools for simulating pharmacokinetics, Bayesian-estimated tools for predicting survival, etc. We describe representative statistical and mathematical tools, and discuss their merits, shortcomings and preliminary clinical validation attesting to their potential. Yet, the individualization power of mathematical models alone, or statistical models alone, is limited. More accurate and versatile personalization tools can be constructed by a new application of the statistical/mathematical nonlinear mixed effects modeling (NLMEM) approach, which until recently has been used only in drug development. Using these advanced tools, clinical data from patient populations can be integrated with mechanistic models of disease and physiology, for generating personal mathematical models. Upon a more substantial validation in the clinic, this approach will hopefully be applied in personalized clinical trials, P-trials, hence aiding the establishment of personalized medicine within the main stream of clinical oncology. © 2014 Wiley Periodicals, Inc.

  5. High resolution tempo-spatial ozone prediction with SVM and LSTM

    NASA Astrophysics Data System (ADS)

    Gao, D.; Zhang, Y.; Qu, Z.; Sadighi, K.; Coffey, E.; LIU, Q.; Hannigan, M.; Henze, D. K.; Dick, R.; Shang, L.; Lv, Q.

    2017-12-01

    To investigate and predict the exposure to ozone and other pollutants in urban areas, we utilize data from various sources, including the EPA, NOAA and RIITS from the government of Los Angeles, and construct statistical models to predict ozone concentration in the Los Angeles area at finer spatial and temporal granularity. Our work incorporates cyber data such as traffic, road and population data as features for prediction. Two statistical models, the Support Vector Machine (SVM) and Long Short-Term Memory (LSTM, a deep learning method), are used for prediction. Our experiments show that a kernelized SVM gains better prediction performance when taking traffic counts, road density and population density as features, with a prediction RMSE of 7.99 ppb for all-time ozone and 6.92 ppb for peak-value ozone. With simulated NOx from a chemical transport model (CTM) as features, the SVM generates even better predictions, with an RMSE of 6.69 ppb. We also build an LSTM, which has shown great advantages in dealing with temporal sequences, to predict ozone concentration by treating it as a spatio-temporal sequence. Trained on ozone concentration measurements from the 13 EPA stations in the LA area, the model achieves 4.45 ppb RMSE. In addition, we build a variant of this model that adds spatial dynamics in the form of a transition matrix, revealing new knowledge on pollutant transition. The forgetting gate of the trained LSTM is consistent with the delay effect of ozone concentration, and the trained transition matrix shows spatial consistency with the common direction of winds in the LA area.
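
    A minimal version of the kernelized SVM regression described above, with synthetic stand-ins for the traffic, road-density and population features; the LSTM variant is omitted. Feature scaling matters for RBF kernels, hence the pipeline.

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVR

        rng = np.random.default_rng(8)

        # Hypothetical features per site-hour: traffic count, road density,
        # population density; target is ozone concentration (ppb).
        X = rng.normal(size=(2000, 3))
        ozone = 40 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 7, 2000)

        X_tr, X_te, y_tr, y_te = train_test_split(X, ozone, random_state=0)
        model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
        model.fit(X_tr, y_tr)
        rmse = np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2))
        print(f"kernelized SVR test RMSE = {rmse:.2f} ppb")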

  6. ANEMOS: Development of a next generation wind power forecasting system for the large-scale integration of onshore and offshore wind farms.

    NASA Astrophysics Data System (ADS)

    Kariniotakis, G.; Anemos Team

    2003-04-01

    Objectives: Accurate forecasting of wind energy production up to two days ahead is recognized as a major contribution to reliable large-scale wind power integration. Especially in a liberalized electricity market, prediction tools enhance the position of wind energy compared to other forms of dispatchable generation. ANEMOS is a new 3.5-year R&D project supported by the European Commission that assembles research organizations and end-users with substantial experience in the domain. The project aims to develop advanced forecasting models that will substantially outperform current methods. Emphasis is given to situations like complex terrain and extreme weather conditions, as well as to offshore prediction, for which no specific tools currently exist. The prediction models will be implemented in a software platform and installed for online operation at onshore and offshore wind farms by the end-users participating in the project. Approach: The paper presents the methodology of the project. Initially, the prediction requirements are identified according to the profiles of the end-users. The project develops prediction models based on both a physical and an alternative statistical approach. Research on physical models emphasizes techniques for use in complex terrain and the development of prediction tools based on CFD techniques, advanced model output statistics, or high-resolution meteorological information. Statistical models (e.g., based on artificial intelligence) are developed for downscaling, power curve representation, upscaling for prediction at regional or national level, etc. A benchmarking process is set up to evaluate the performance of the developed models and to compare them with existing ones using a number of case studies. The synergy between statistical and physical approaches is examined to identify promising areas for further improvement of forecasting accuracy. Appropriate physical and statistical prediction models are also developed for offshore wind farms, taking into account advances in marine meteorology (interaction between wind and waves, coastal effects). The benefits of using satellite radar images for modeling local weather patterns are investigated. A next-generation forecasting software platform, ANEMOS, will be developed to integrate the various models. The tool is enhanced by advanced Information and Communication Technology (ICT) functionality and can operate in stand-alone or remote mode, or be interfaced with standard Energy or Distribution Management Systems (EMS/DMS). Contribution: The project provides an advanced technology for wind resource forecasting applicable on a large scale: at single wind farm, regional or national level, and for both interconnected and island systems. A major milestone is the online operation of the developed software by the participating utilities for onshore and offshore wind farms and the demonstration of the economic benefits. The outcome of the ANEMOS project will support increased wind integration on two levels: operationally, through better management of wind farms, and in terms of installed capacity, because accurate prediction of the resource reduces the risk to wind farm developers, who are then more willing to undertake new wind farm installations, especially in a liberalized electricity market environment.

  7. Statistically Based Approach to Broadband Liner Design and Assessment

    NASA Technical Reports Server (NTRS)

    Jones, Michael G. (Inventor); Nark, Douglas M. (Inventor)

    2016-01-01

    A broadband liner design optimization includes utilizing in-duct attenuation predictions with a statistical fan source model to obtain optimum impedance spectra over a number of flow conditions for one or more liner locations in a bypass duct. The predicted optimum impedance information is then used with acoustic liner modeling tools to design liners having impedance spectra that most closely match the predicted optimum values. Design selection is based on an acceptance criterion that provides the ability to apply increasing weighting to specific frequencies and/or operating conditions. One or more broadband design approaches are utilized to produce a broadband liner that targets a full range of frequencies and operating conditions.

  8. Statistical mapping of count survey data

    USGS Publications Warehouse

    Royle, J. Andrew; Link, W.A.; Sauer, J.R.; Scott, J. Michael; Heglund, Patricia J.; Morrison, Michael L.; Haufler, Jonathan B.; Wall, William A.

    2002-01-01

    We apply a Poisson mixed model to the problem of mapping (or predicting) bird relative abundance from counts collected from the North American Breeding Bird Survey (BBS). The model expresses the logarithm of the Poisson mean as a sum of a fixed term (which may depend on habitat variables) and a random effect which accounts for remaining unexplained variation. The random effect is assumed to be spatially correlated, thus providing a more general model than the traditional Poisson regression approach. Consequently, the model is capable of improved prediction when data are autocorrelated. Moreover, formulation of the mapping problem in terms of a statistical model facilitates a wide variety of inference problems which are cumbersome or even impossible using standard methods of mapping. For example, assessment of prediction uncertainty, including the formal comparison of predictions at different locations, or through time, using the model-based prediction variance is straightforward under the Poisson model (not so with many nominally model-free methods). Also, ecologists may generally be interested in quantifying the response of a species to particular habitat covariates or other landscape attributes. Proper accounting for the uncertainty in these estimated effects is crucially dependent on specification of a meaningful statistical model. Finally, the model may be used to aid in sampling design, by modifying the existing sampling plan in a manner which minimizes some variance-based criterion. Model fitting under this model is carried out using a simulation technique known as Markov Chain Monte Carlo. Application of the model is illustrated using Mourning Dove (Zenaida macroura) counts from Pennsylvania BBS routes. We produce both a model-based map depicting relative abundance, and the corresponding map of prediction uncertainty. We briefly address the issue of spatial sampling design under this model. Finally, we close with some discussion of mapping in relation to habitat structure. Although our models were fit in the absence of habitat information, the resulting predictions show a strong inverse relation with a map of forest cover in the state, as expected. Consequently, the results suggest that the correlated random effect in the model is broadly representing ecological variation, and that BBS data may be generally useful for studying bird-habitat relationships, even in the presence of observer errors and other widely recognized deficiencies of the BBS.
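    To make the model structure concrete, the following is a minimal Python sketch of its generative form, with one hypothetical habitat covariate and an assumed exponential spatial covariance. All names and values are illustrative, not BBS estimates; a real analysis would fit the parameters by Markov chain Monte Carlo, as the authors do.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical survey-route locations and one habitat covariate.
n = 200
coords = rng.uniform(0, 100, size=(n, 2))     # route positions (km)
forest = rng.uniform(0, 1, size=n)            # illustrative covariate

# Spatially correlated random effect: zero-mean Gaussian process with
# exponential covariance C(d) = sigma2 * exp(-d / range_km).
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
sigma2, range_km = 0.5, 20.0
cov = sigma2 * np.exp(-d / range_km) + 1e-8 * np.eye(n)
eps = rng.multivariate_normal(np.zeros(n), cov)

# log(Poisson mean) = fixed habitat term + correlated random effect.
beta0, beta1 = 1.0, -1.5                      # e.g., counts fall with forest cover
counts = rng.poisson(np.exp(beta0 + beta1 * forest + eps))
print(counts[:10])
```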

  9. Statistical-learning strategies generate only modestly performing predictive models for urinary symptoms following external beam radiotherapy of the prostate: A comparison of conventional and machine-learning methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au; Ebert, Martin A.; Bulsara, Max

    Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal), with event rates between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance, using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1, with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC > 0.6, while all haematuria endpoints and longitudinal incontinence models produced AUROC < 0.6. Conclusions: Logistic regression and MARS were most likely to be the best-performing strategy for the prediction of urinary symptoms, with elastic-net and random forest producing competitive results. The predictive power of the models was modest and endpoint-dependent. New features, including spatial dose maps, may be necessary to achieve better models.
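    The comparison protocol can be sketched with scikit-learn, shown below for a subset of the listed learners under repeated stratified cross-validation scored by AUROC. The synthetic data, hyperparameters, and learner subset are assumptions for illustration; MARS and the minority-oversampling step would come from third-party packages (e.g., py-earth and imbalanced-learn) and are omitted here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold

# Stand-in for dose-surface and clinical features with an imbalanced endpoint.
X, y = make_classification(n_samples=750, n_features=30, weights=[0.9],
                           random_state=0)

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "elastic-net": make_pipeline(StandardScaler(),
                                 LogisticRegression(penalty="elasticnet",
                                                    solver="saga", l1_ratio=0.5,
                                                    max_iter=5000)),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(probability=True)),
}

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
for name, model in models.items():
    auc = cross_val_score(model, X, y, scoring="roc_auc", cv=cv)
    print(f"{name:13s} AUROC = {auc.mean():.3f} +/- {auc.std():.3f}")
```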

  10. Predictors of outcome after elective endovascular abdominal aortic aneurysm repair and external validation of a risk prediction model.

    PubMed

    Wisniowski, Brendan; Barnes, Mary; Jenkins, Jason; Boyne, Nicholas; Kruger, Allan; Walker, Philip J

    2011-09-01

    Endovascular abdominal aortic aneurysm (AAA) repair (EVAR) has been associated with lower operative mortality and morbidity than open surgery but comparable long-term mortality and higher delayed complication and reintervention rates. Attention has therefore been directed to identifying preoperative and operative variables that influence outcomes after EVAR. Risk-prediction models, such as the EVAR Risk Assessment (ERA) model, have also been developed to help surgeons plan EVAR procedures. The aims of this study were (1) to describe outcomes of elective EVAR at the Royal Brisbane and Women's Hospital (RBWH), (2) to identify preoperative and operative variables predictive of outcomes after EVAR, and (3) to externally validate the ERA model. All elective EVAR procedures at the RBWH before July 1, 2009, were reviewed. Descriptive analyses were performed to determine the outcomes. Univariate and multivariate analyses were performed to identify preoperative and operative variables predictive of outcomes after EVAR. Binomial logistic regression analyses were used to externally validate the ERA model. Before July 1, 2009, 197 patients (172 men), with a mean age of 72.8 years, underwent elective EVAR at the RBWH. Operative mortality was 1.0%. Survival was 81.1% at 3 years and 63.2% at 5 years. Multivariate analysis showed predictors of survival were age (P = .0126), American Society of Anesthesiologists (ASA) score (P = .0180), and chronic obstructive pulmonary disease (P = .0348) at 3 years and age (P = .0103), ASA score (P = .0006), renal failure (P = .0048), and serum creatinine (P = .0022) at 5 years. Aortic branch vessel score was predictive of initial (30-day) type II endoleak (P = .0015). AAA tortuosity was predictive of midterm type I endoleak (P = .0251). Female sex was associated with lower rates of initial clinical success (P = .0406). The ERA model fitted RBWH data well for early death (C statistic = .906), 3-year survival (C statistic = .735), 5-year survival (C statistic = .800), and initial type I endoleak (C statistic = .850). The outcomes of elective EVAR at the RBWH are broadly consistent with those of a nationwide Australian audit and recent randomized trials. Age and ASA score are independent predictors of midterm survival after elective EVAR. The ERA model predicts mortality-related outcomes and initial type I endoleak well for RBWH elective EVAR patients. Copyright © 2011 Society for Vascular Surgery. All rights reserved.

  11. A Comparison of the Performance of Advanced Statistical Techniques for the Refinement of Day-ahead and Longer NWP-based Wind Power Forecasts

    NASA Astrophysics Data System (ADS)

    Zack, J. W.

    2015-12-01

    Predictions from Numerical Weather Prediction (NWP) models are the foundation for wind power forecasts for day-ahead and longer forecast horizons. The NWP models directly produce three-dimensional wind forecasts on their respective computational grids. These can be interpolated to the location and time of interest. However, these direct predictions typically contain significant systematic errors ("biases"). This is due to a variety of factors, including the limited space-time resolution of the NWP models and shortcomings in the models' representation of physical processes. It has become common practice to attempt to improve the raw NWP forecasts by statistically adjusting them through a procedure that is widely known as Model Output Statistics (MOS). The challenge is to identify complex patterns of systematic errors and then use this knowledge to adjust the NWP predictions. The MOS-based improvements are the basis for much of the value added by commercial wind power forecast providers. There are an enormous number of statistical approaches that can be used to generate the MOS adjustments to the raw NWP forecasts. In order to obtain insight into the potential value of some of the newer and more sophisticated statistical techniques, often referred to as "machine learning methods", a MOS-method comparison experiment has been performed for wind power generation facilities in 6 wind resource areas of California. The underlying NWP models that provided the raw forecasts were the two primary operational models of the US National Weather Service: the GFS and NAM models. The focus was on 1- and 2-day ahead forecasts of the hourly wind-based generation. The statistical methods evaluated included: (1) screening multiple linear regression, which served as a baseline method, (2) artificial neural networks, (3) a decision-tree approach called random forests, (4) gradient boosted regression based upon a decision-tree algorithm, (5) support vector regression and (6) analog ensemble, which is a case-matching scheme. The presentation will provide (1) an overview of each method and the experimental design, (2) performance comparisons based on standard metrics such as bias, MAE and RMSE, (3) a summary of the performance characteristics of each approach and (4) a preview of further experiments to be conducted.
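    A minimal sketch of the MOS idea: learn a statistical correction from raw NWP outputs to observed generation, and compare a linear baseline against a boosted-tree learner. The synthetic error structure and variable names below are assumptions, not properties of the GFS/NAM data used in the study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)

# Toy data: a raw NWP wind-speed forecast plus wind direction, with a
# direction-dependent systematic error in the observed quantity.
n = 5000
nwp_speed = rng.gamma(2.0, 4.0, n)              # raw forecast wind speed
nwp_dir = rng.uniform(0, 360, n)                # forecast wind direction
obs = nwp_speed * (1.1 - 0.2 * np.sin(np.radians(nwp_dir))) + rng.normal(0, 1.5, n)

X = np.column_stack([nwp_speed, np.sin(np.radians(nwp_dir)),
                     np.cos(np.radians(nwp_dir))])
X_tr, X_te, y_tr, y_te = train_test_split(X, obs, random_state=0)

for name, model in [("raw forecast", None),
                    ("linear MOS", LinearRegression()),
                    ("boosted-tree MOS", GradientBoostingRegressor(random_state=0))]:
    pred = X_te[:, 0] if model is None else model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name:17s} MAE = {mean_absolute_error(y_te, pred):.2f}")
```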

  12. Calibrating MODIS aerosol optical depth for predicting daily PM2.5 concentrations via statistical downscaling

    PubMed Central

    Chang, Howard H.; Hu, Xuefei; Liu, Yang

    2014-01-01

    There has been a growing interest in the use of satellite-retrieved aerosol optical depth (AOD) to estimate ambient concentrations of PM2.5 (particulate matter <2.5 μm in aerodynamic diameter). With their broad spatial coverage, satellite data can increase the spatial–temporal availability of air quality data beyond ground monitoring measurements and potentially improve exposure assessment for population-based health studies. This paper describes a statistical downscaling approach that brings together (1) recent advances in PM2.5 land use regression models utilizing AOD and (2) statistical data fusion techniques for combining air quality data sets that have different spatial resolutions. Statistical downscaling assumes the associations between AOD and PM2.5 concentrations to be spatially and temporally dependent and offers two key advantages. First, it enables us to use gridded AOD data to predict PM2.5 concentrations at spatial point locations. Second, the unified hierarchical framework provides straightforward uncertainty quantification in the predicted PM2.5 concentrations. The proposed methodology is applied to a data set of daily AOD values in southeastern United States during the period 2003–2005. Via cross-validation experiments, our model had an out-of-sample prediction R2 of 0.78 and a root mean-squared error (RMSE) of 3.61 μg/m3 between observed and predicted daily PM2.5 concentrations. This corresponds to a 10% decrease in RMSE compared with the same land use regression model without AOD as a predictor. Prediction performances of spatial–temporal interpolations to locations and on days without monitoring PM2.5 measurements were also examined. PMID:24368510
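    A simplified sketch of the first stage of such a downscaler, assuming day-varying intercepts and slopes for the AOD-PM2.5 relationship and omitting the spatial random effects of the full hierarchical model; all data and coefficients below are synthetic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Toy data: gridded AOD matched to PM2.5 monitors over many days, with
# day-to-day variation in the AOD-PM2.5 relationship.
n_days, n_sites = 120, 25
day = np.repeat(np.arange(n_days), n_sites)
aod = rng.gamma(2.0, 0.2, n_days * n_sites)
alpha_t = rng.normal(8.0, 2.0, n_days)          # daily intercepts
beta_t = rng.normal(15.0, 3.0, n_days)          # daily slopes
pm25 = alpha_t[day] + beta_t[day] * aod + rng.normal(0, 2.0, day.size)

# Linear calibration with day-specific random intercepts and slopes
# (spatial terms omitted for brevity).
X = sm.add_constant(aod)
model = sm.MixedLM(pm25, X, groups=day, exog_re=X)
result = model.fit()
print(result.summary())
```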

  14. An injury mortality prediction based on the anatomic injury scale

    PubMed Central

    Wang, Muding; Wu, Dan; Qiu, Wusi; Wang, Weimi; Zeng, Yunji; Shen, Yi

    2017-01-01

    Abstract To determine whether the injury mortality prediction (IMP) statistically outperforms the trauma mortality prediction model (TMPM) as a predictor of mortality. The TMPM is currently the best trauma score method based on anatomic injury. Its mortality prediction is superior to that of the injury severity score (ISS) and the new injury severity score (NISS). However, despite its statistical significance, the predictive power of TMPM needs to be further improved. This retrospective cohort study is based on data from 1,148,359 injured patients in the National Trauma Data Bank hospitalized from 2010 to 2011. Sixty percent of the data was used to derive an empiric measure of severity for different Abbreviated Injury Scale predot codes by taking the weighted average death probabilities of trauma patients. Twenty percent of the data was used to develop the computational method of the IMP model. The remaining 20% of the data was used to evaluate the statistical performance of IMP, which was then compared with the TMPM and the single worst injury by examining the area under the receiver operating characteristic curve (ROC), the Hosmer–Lemeshow (HL) statistic, and the Akaike information criterion. IMP exhibits significantly better discrimination (ROC-IMP, 0.903 [0.899–0.907] and ROC-TMPM, 0.890 [0.886–0.895]) and calibration (HL-IMP, 9.9 [4.4–14.7] and HL-TMPM, 197 [143–248]) compared with TMPM. All models showed slight changes after the addition of age, gender, and mechanism of injury, but the extended IMP still dominated TMPM in every performance measure. The IMP offers an improvement in discrimination and calibration over the TMPM and can accurately predict mortality. Therefore, we consider it a new feasible scoring method in trauma research. PMID:28858124

  15. Studying Individual Differences in Predictability with Gamma Regression and Nonlinear Multilevel Models

    ERIC Educational Resources Information Center

    Culpepper, Steven Andrew

    2010-01-01

    Statistical prediction remains an important tool for decisions in a variety of disciplines. An equally important issue is identifying factors that contribute to more or less accurate predictions. The time series literature includes well developed methods for studying predictability and volatility over time. This article develops…

  16. Comparing Data Input Requirements of Statistical vs. Process-based Watershed Models Applied for Prediction of Fecal Indicator and Pathogen Levels in Recreational Beaches

    EPA Science Inventory

    Same day prediction of fecal indicator bacteria (FIB) concentrations and bather protection from the risk of exposure to pathogens are two important goals of implementing a modeling program at recreational beaches. Sampling efforts for modelling applications can be expensive and t...

  17. LES/PDF studies of joint statistics of mixture fraction and progress variable in piloted methane jet flames with inhomogeneous inlet flows

    NASA Astrophysics Data System (ADS)

    Zhang, Pei; Barlow, Robert; Masri, Assaad; Wang, Haifeng

    2016-11-01

    The mixture fraction and progress variable are often used as independent variables for describing turbulent premixed and non-premixed flames. There is a growing interest in using these two variables for describing partially premixed flames. The joint statistical distribution of the mixture fraction and progress variable is of great interest in developing models for partially premixed flames. In this work, we conduct predictive studies of the joint statistics of mixture fraction and progress variable in a series of piloted methane jet flames with inhomogeneous inlet flows. The employed models combine large eddy simulations with the Monte Carlo probability density function (PDF) method. The joint PDFs and marginal PDFs are examined in detail by comparing the model predictions and the measurements. Different presumed shapes of the joint PDFs are also evaluated.

  18. Random glucose is useful for individual prediction of type 2 diabetes: results of the Study of Health in Pomerania (SHIP).

    PubMed

    Kowall, Bernd; Rathmann, Wolfgang; Giani, Guido; Schipf, Sabine; Baumeister, Sebastian; Wallaschofski, Henri; Nauck, Matthias; Völzke, Henry

    2013-04-01

    Random glucose is widely used in routine clinical practice. We investigated whether this non-standardized glycemic measure is useful for individual diabetes prediction. The Study of Health in Pomerania (SHIP), a population-based cohort study in north-east Germany, included 3107 diabetes-free persons aged 31-81 years at baseline in 1997-2001. 2475 persons participated at 5-year follow-up and gave self-reports of incident diabetes. For the total sample and for subjects aged ≥50 years, statistical properties of prediction models with and without random glucose were compared. A basic model (including age, sex, diabetes of parents, hypertension and waist circumference) and a comprehensive model (additionally including various lifestyle variables and blood parameters, but not HbA1c) performed statistically significantly better after adding random glucose (e.g., the area under the receiver-operating curve (AROC) increased from 0.824 to 0.856 after adding random glucose to the comprehensive model in the total sample). Likewise, adding random glucose to prediction models which included HbA1c led to significant improvements of predictive ability (e.g., for subjects ≥50 years, AROC increased from 0.824 to 0.849 after adding random glucose to the comprehensive model+HbA1c). Random glucose is useful for individual diabetes prediction, and improves prediction models including HbA1c. Copyright © 2012 Primary Care Diabetes Europe. Published by Elsevier Ltd. All rights reserved.

  19. Refining calibration and predictions of a Bayesian statistical-dynamical model for long term avalanche forecasting using dendrochronological reconstructions

    NASA Astrophysics Data System (ADS)

    Eckert, Nicolas; Schläppy, Romain; Jomelli, Vincent; Naaim, Mohamed

    2013-04-01

    A crucial step for proposing relevant long-term mitigation measures in long-term avalanche forecasting is the accurate definition of high return period avalanches. Recently, "statistical-dynamical" approaches combining a numerical model with stochastic operators describing the variability of its inputs and outputs have emerged. Their main interest is to take into account the topographic dependency of snow avalanche runout distances, and to constrain the correlation structure between the model's variables by physical rules, so as to simulate the different marginal distributions of interest (pressure, flow depth, etc.) with reasonable realism. Bayesian methods have been shown to be well adapted to model inference, getting rid of identifiability problems thanks to prior information. An important problem which has virtually never been considered before is the validation of the predictions resulting from a statistical-dynamical approach (or from any other engineering method for computing extreme avalanches). In hydrology, independent "fossil" data such as flood deposits in caves are sometimes compared with design discharges corresponding to high return periods. Hence, the aim of this work is to implement a similar comparison between high return period avalanches obtained with a statistical-dynamical approach and independent validation data resulting from careful dendrogeomorphological reconstructions. To do so, an up-to-date statistical model based on the depth-averaged equations and the classical Voellmy friction law is used on a well-documented case study. First, parameter values resulting from another avalanche path are applied, and the dendrological validation sample shows that this approach fails to provide realistic predictions for the case study. This may be due to the strongly bounded behaviour of runouts in this case (the extreme of their distribution is identified as belonging to the Weibull attraction domain). Second, local calibration on the available avalanche chronicle is performed with various prior distributions resulting from expert knowledge and/or other paths. For all calibrations, very successful convergence is obtained, which confirms the robustness of the Metropolis-Hastings estimation algorithm used. This also demonstrates the value of the Bayesian framework for aggregating information by sequential assimilation in the frequently encountered case of limited data quantity. Comparison with the dendrological sample stresses the predominant role of the variance of the Coulombian friction coefficient distribution on predicted high magnitude runouts. The optimal fit is obtained for a strong prior reflecting the local bounded behaviour, and results in a 10-40 m difference for return periods ranging between 10 and 300 years. Implementing predictive simulations shows that this is largely within the range of magnitude of the uncertainties to be taken into account. On the other hand, the different priors tested for the turbulent friction coefficient influence predictive performance only slightly, but have a large influence on predicted velocity and flow depth distributions. All this may be of high interest for refining the calibration and predictive use of the statistical-dynamical model for any engineering application.
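    A minimal sketch of the Bayesian calibration step, with a toy forward model standing in for the depth-averaged avalanche model and a uniform prior on the Coulombian friction coefficient; the random-walk Metropolis-Hastings loop is the same idea as the estimation algorithm mentioned above, and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy forward model: runout distance decreases with Coulomb friction mu.
def runout(mu):
    return 2500.0 / (1.0 + 10.0 * mu)           # illustrative stand-in (m)

obs = np.array([1480.0, 1520.0, 1390.0, 1610.0])  # hypothetical runouts (m)
sigma = 80.0                                      # observation noise (m)

def log_post(mu):
    if not 0.05 < mu < 0.5:                       # uniform prior on (0.05, 0.5)
        return -np.inf
    return -0.5 * np.sum((obs - runout(mu)) ** 2) / sigma**2

# Random-walk Metropolis-Hastings.
mu, samples = 0.2, []
lp = log_post(mu)
for _ in range(20000):
    prop = mu + rng.normal(0, 0.01)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        mu, lp = prop, lp_prop
    samples.append(mu)

post = np.array(samples[5000:])                   # drop burn-in
print(f"posterior mean mu = {post.mean():.3f} +/- {post.std():.3f}")
```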

  20. Unified risk analysis of fatigue failure in ductile alloy components during all three stages of fatigue crack evolution process.

    PubMed

    Patankar, Ravindra

    2003-10-01

    Statistical fatigue life of a ductile alloy specimen is traditionally divided into three stages, namely, crack nucleation, small crack growth, and large crack growth. Crack nucleation and small crack growth show wide variation and hence a large spread on the cycles versus crack length graph. By comparison, large crack growth shows less variation. Therefore, different models are fitted to the different stages of the fatigue evolution process, thus treating different stages as different phenomena. With these independent models, it is impossible to predict one phenomenon based on the information available about the other. Experimentally, it is easier to carry out crack length measurements of large cracks than of nucleating and small cracks. Thus, it is easier to collect statistical data for large crack growth than to undertake the painstaking effort of collecting statistical data for crack nucleation and small crack growth. This article presents a fracture mechanics-based stochastic model of fatigue crack growth in ductile alloys that are commonly encountered in mechanical structures and machine components. The model has been validated by Ray (1998) for crack propagation against various statistical fatigue data. Based on the model, this article proposes a technique to predict statistical information on fatigue crack nucleation and small crack growth properties using the statistical properties of large crack growth under constant amplitude stress excitation, which can be obtained via experiments.
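    The flavor of such a stochastic crack-growth model can be sketched with a Paris-law Monte Carlo in which the growth coefficient varies between specimens. This generic sketch is not the Ray (1998) model itself, and all material values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Monte Carlo of Paris-law growth, da/dN = C * dK**m, with lognormal
# specimen-to-specimen scatter in C; dK = Y * dS * sqrt(pi * a).
m, dS, Y = 3.0, 100.0, 1.12        # exponent, stress range (MPa), geometry factor
C_med = 1e-11                      # median coefficient (illustrative units)
a0, a_f = 1e-3, 20e-3              # initial and final crack sizes (m)

lives = []
for _ in range(300):
    C = C_med * rng.lognormal(0.0, 0.4)   # specimen-to-specimen variability
    a, N, dN = a0, 0, 500                 # integrate in blocks of 500 cycles
    while a < a_f:
        dK = Y * dS * np.sqrt(np.pi * a)
        a += C * dK**m * dN
        N += dN
    lives.append(N)

lives = np.array(lives)
print(f"median life = {np.median(lives):.3g} cycles, "
      f"10th percentile = {np.percentile(lives, 10):.3g}")
```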

  1. Dynamic Modeling and Very Short-term Prediction of Wind Power Output Using Box-Cox Transformation

    NASA Astrophysics Data System (ADS)

    Urata, Kengo; Inoue, Masaki; Murayama, Dai; Adachi, Shuichi

    2016-09-01

    We propose a statistical modeling method for wind power output for very short-term prediction. The method yields a nonlinear model with a cascade structure composed of two parts. One is a linear dynamic part that is driven by Gaussian white noise and described by an autoregressive model. The other is a nonlinear static part that is driven by the output of the linear part. This nonlinear part is designed for output distribution matching: we shape the distribution of the model output to match that of the wind power output. The constructed model is utilized for one-step-ahead prediction of the wind power output. Furthermore, we study the relation between the prediction accuracy and the prediction horizon.
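    A minimal sketch of the cascade idea, using a rank-based Gaussianizing transform as a stand-in for the paper's Box-Cox static part and an AR(1) fitted by least squares as the linear dynamic part; the synthetic series and model orders are assumptions.

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(5)

# Synthetic bounded, non-Gaussian wind power series driven by a latent AR(1).
z = np.zeros(3000)
for t in range(1, z.size):
    z[t] = 0.97 * z[t - 1] + rng.normal(0, 0.1)
power = 1.0 / (1.0 + np.exp(-3.0 * (z - 0.5)))

# Static part: Gaussianize through the empirical CDF (a rank-based
# stand-in for a Box-Cox transformation).
u = (rankdata(power) - 0.5) / power.size
g = norm.ppf(u)

# Dynamic part: AR(1) coefficient fitted by least squares.
phi = np.dot(g[1:], g[:-1]) / np.dot(g[:-1], g[:-1])

# One-step-ahead prediction on the Gaussian scale, mapped back through
# the empirical quantiles of the observed power.
p_pred = np.quantile(power, norm.cdf(phi * g[-1]))
print(f"phi = {phi:.3f}, one-step-ahead prediction = {p_pred:.3f}")
```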

  2. A QSAR study of integrase strand transfer inhibitors based on a large set of pyrimidine, pyrimidone, and pyridopyrazine carboxamide derivatives

    NASA Astrophysics Data System (ADS)

    de Campos, Luana Janaína; de Melo, Eduardo Borges

    2017-08-01

    In the present study, 199 compounds derived from pyrimidine, pyrimidone and pyridopyrazine carboxamides with inhibitory activity against HIV-1 integrase were modeled. Subsequently, a multivariate QSAR study was conducted on 54 molecules, employing Ordered Predictors Selection (OPS) and Partial Least Squares (PLS) for variable selection and model construction, respectively. Topological, electrotopological, geometric, and molecular descriptors were used. The selected real model was robust and free from chance correlation; in addition, it demonstrated favorable internal and external statistical quality. Once statistically validated, the training model was used to predict the activity of a second data set (n = 145). The root mean square deviation (RMSD) between observed and predicted values was 0.698. Although this value falls outside the usual standards, only 15 (10.34%) of the samples exhibited residuals larger than 1 log unit, a result considered acceptable. The Williams and Euclidean applicability domains for the predictions showed that the predictions did not occur by extrapolation and that the model is representative of the chemical space of the test compounds.

  3. Highway runoff quality models for the protection of environmentally sensitive areas

    NASA Astrophysics Data System (ADS)

    Trenouth, William R.; Gharabaghi, Bahram

    2016-11-01

    This paper presents novel highway runoff quality models using artificial neural networks (ANN) that take into account site-specific highway traffic and seasonal storm event meteorological factors to predict the event mean concentration (EMC) statistics and mean daily unit area load (MDUAL) statistics of common highway pollutants, for the design of roadside ditch treatment systems (RDTS) to protect sensitive receiving environs. A dataset of 940 monitored highway runoff events from fourteen sites located in five countries (Canada, USA, Australia, New Zealand, and China) was compiled and used to develop ANN models for the prediction of seasonal EMC statistical distribution parameters for highway runoff suspended solids (TSS), as well as MDUAL statistics for four heavy metal species (Cu, Zn, Cr and Pb). TSS EMCs are needed to estimate the minimum removal efficiency the RDTS requires in order to improve highway runoff quality to meet applicable standards, and MDUALs are needed to calculate the minimum capacity the RDTS requires to ensure performance longevity.

  4. COMPARING MID-INFRARED GLOBULAR CLUSTER COLORS WITH POPULATION SYNTHESIS MODELS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barmby, P.; Jalilian, F. F.

    2012-04-15

    Several population synthesis models now predict integrated colors of simple stellar populations in the mid-infrared bands. To date, the models have not been extensively tested in this wavelength range. In a comparison of the predictions of several recent population synthesis models, the integrated colors are found to cover approximately the same range but to disagree in detail, for example, on the effects of metallicity. To test against observational data, globular clusters (GCs) are used as the closest objects to idealized groups of stars with a single age and single metallicity. Using recent mass estimates, we have compiled a sample of massive, old GCs in M31 which contain enough stars to guard against the stochastic effects of small-number statistics, and measured their integrated colors in the Spitzer/IRAC bands. Comparison of the cluster photometry in the IRAC bands with the model predictions shows that the models reproduce the cluster colors reasonably well, except for a small (not statistically significant) offset in [4.5] - [5.8]. In this color, models without circumstellar dust emission predict bluer values than are observed. Model predictions of colors formed from the V band and the IRAC 3.6 and 4.5 μm bands are redder than the observed data at high metallicities and we discuss several possible explanations. In agreement with model predictions, V - [3.6] and V - [4.5] colors are found to have metallicity sensitivity similar to or slightly better than V - Ks.

  5. Reverberant acoustic energy in auditoria that comprise systems of coupled rooms

    NASA Astrophysics Data System (ADS)

    Summers, Jason E.

    2003-11-01

    A frequency-dependent model for reverberant energy in coupled rooms is developed and compared with measurements for a 1:10 scale model and for Bass Hall, Ft. Worth, TX. At high frequencies, prior statistical-acoustics models are improved by geometrical-acoustics corrections for decay within sub-rooms and for energy transfer between sub-rooms. Comparisons of computational geometrical-acoustics predictions based on beam-axis tracing with scale model measurements indicate errors resulting from tail-correction assuming constant quadratic growth of reflection density. Using ray tracing in the late part corrects this error. For mid-frequencies, the models are modified to account for wave effects at coupling apertures by including power transmission coefficients. Similarly, statistical-acoustics models are improved through more accurate estimates of power transmission. Scale model measurements are in accord with the predicted behavior. The edge-diffraction model is adapted to study transmission through apertures. Multiple-order scattering is shown, theoretically and experimentally, to be inaccurate due to neglect of slope diffraction. At low frequencies, perturbation models qualitatively explain scale model measurements. Measurements confirm the relation of coupling strength to the unperturbed pressure distribution on coupling surfaces. Measurements in Bass Hall exhibit effects of the coupled stage house. High-frequency predictions of statistical-acoustics and geometrical-acoustics models and predictions for coupling apertures all agree with measurements.
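    The statistical-acoustics core of such models is a pair of coupled energy balances. Below is a minimal sketch for two rooms joined by an aperture; the volumes, absorption areas, and aperture size are illustrative assumptions, not Bass Hall values.

```python
import numpy as np

# Two-room statistical-acoustics energy balance (minimal sketch):
#   V_i dE_i/dt = -(c * A_i / 4) * E_i + (c * S / 4) * (E_j - E_i)
# with V volume, A absorption area, S coupling-aperture area.
c = 343.0                        # speed of sound (m/s)
V = np.array([5000.0, 15000.0])  # hall and stage-house volumes (m^3)
A = np.array([800.0, 150.0])     # absorption areas (m^2)
S = 100.0                        # aperture area (m^2)

E = np.array([1.0, 0.0])         # energy densities at source cutoff
dt = 1e-4
level = []
for _ in range(int(4.0 / dt)):   # 4 s of decay
    flow = c * S / 4.0 * (E[::-1] - E)        # exchange through the aperture
    E = E + dt * ((-(c * A / 4.0) * E + flow) / V)
    level.append(10.0 * np.log10(max(E[0], 1e-30)))

# A less absorbent stage house feeds energy back into the hall,
# producing the characteristic double-slope decay of the hall level (dB).
print(level[int(0.5 / dt)], level[int(3.0 / dt)])
```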

  6. Operational planning using Climatological Observations for Maritime Prediction and Analysis Support Service (COMPASS)

    NASA Astrophysics Data System (ADS)

    O'Connor, Alison; Kirtman, Benjamin; Harrison, Scott; Gorman, Joe

    2016-05-01

    The US Navy faces several limitations when planning operations with regard to forecasting environmental conditions. Currently, mission analysis and planning tools rely heavily on short-term (less than a week) forecasts or long-term statistical climate products. However, newly available weather forecast ensembles provide dynamical and statistical extended-range predictions that can be more accurate if the ensemble members are combined correctly. Charles River Analytics is designing the Climatological Observations for Maritime Prediction and Analysis Support Service (COMPASS), which performs data fusion over extended-range multi-model ensembles, such as the North American Multi-Model Ensemble (NMME), to produce a unified forecast for several weeks to several seasons in the future. We evaluated thirty years of forecasts, using machine learning to select predictions for an all-encompassing and superior forecast that can be used to inform the Navy's decision planning process.

  7. Multi-fidelity machine learning models for accurate bandgap predictions of solids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pilania, Ghanshyam; Gubernatis, James E.; Lookman, Turab

    Here, we present a multi-fidelity co-kriging statistical learning framework that combines variable-fidelity quantum mechanical calculations of bandgaps to generate a machine-learned model that enables low-cost accurate predictions of the bandgaps at the highest fidelity level. Additionally, the adopted Gaussian process regression formulation allows us to predict the underlying uncertainties as a measure of our confidence in the predictions. Using a set of 600 elpasolite compounds as an example dataset and using semi-local and hybrid exchange correlation functionals within density functional theory as two levels of fidelity, we demonstrate the excellent learning performance of the method against actual high-fidelity quantum mechanical calculations of the bandgaps. The presented statistical learning method is not restricted to bandgaps or electronic structure methods and extends the utility of high-throughput property predictions in a significant way.
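    Co-kriging itself needs a dedicated implementation, but the spirit of the two-fidelity scheme can be sketched with scikit-learn by feeding a low-fidelity Gaussian-process prediction into a high-fidelity Gaussian process as an extra input; the toy functions and sample sizes below are assumptions, not the elpasolite data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(6)

# Toy fidelities: a cheap low-fidelity property and a correlated,
# costly high-fidelity version of the same property.
def low_fi(x):  return 1.5 * np.sin(3 * x) + 2.0 + 0.30 * x
def high_fi(x): return 1.2 * np.sin(3 * x) + 2.2 + 0.25 * x

X_lo = rng.uniform(0, 3, (60, 1))      # many cheap calculations
X_hi = rng.uniform(0, 3, (8, 1))       # few expensive calculations

# Step 1: GP fitted to the low-fidelity data.
gp_lo = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp_lo.fit(X_lo, low_fi(X_lo).ravel())

# Step 2: GP on the high-fidelity data, with the low-fidelity prediction
# appended as an extra input feature (a simple surrogate for co-kriging).
feat = lambda X: np.column_stack([X, gp_lo.predict(X)])
gp_hi = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp_hi.fit(feat(X_hi), high_fi(X_hi).ravel())

X_test = np.linspace(0, 3, 5).reshape(-1, 1)
mean, std = gp_hi.predict(feat(X_test), return_std=True)
print(np.round(mean, 2), np.round(std, 2))   # predictions with uncertainties
```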

  9. Model variations in predicting incidence of Plasmodium falciparum malaria using 1998-2007 morbidity and meteorological data from south Ethiopia.

    PubMed

    Loha, Eskindir; Lindtjørn, Bernt

    2010-06-16

    Malaria transmission is complex and is believed to be associated with local climate changes. However, simple attempts to extrapolate malaria incidence rates from averaged regional meteorological conditions have proven unsuccessful. Therefore, the objective of this study was to determine if variations in specific meteorological factors are able to consistently predict P. falciparum malaria incidence at different locations in south Ethiopia. Retrospective data from 42 locations were collected, including P. falciparum malaria incidence for the period 1998-2007 and meteorological variables such as monthly rainfall (all locations), temperature (17 locations), and relative humidity (three locations). Thirty-five data sets qualified for the analysis. The Ljung-Box Q statistic was used for model diagnosis, and R squared or stationary R squared was taken as the goodness-of-fit measure. Time series modelling was carried out using transfer function (TF) models, and univariate auto-regressive integrated moving average (ARIMA) models when there was no significant meteorological predictor. Of the 35 models, five were discarded because of significant values of the Ljung-Box Q statistic. Past P. falciparum malaria incidence alone (17 locations), or coupled with meteorological variables (four locations), was able to predict P. falciparum malaria incidence within statistical significance. All seasonal ARIMA orders were from locations at altitudes above 1742 m. Monthly rainfall, minimum temperature and maximum temperature were able to predict incidence at four, five and two locations, respectively. In contrast, relative humidity was not able to predict P. falciparum malaria incidence. The R squared values for the models ranged from 16% to 97%, with the exception of one model which had a negative value. Models with seasonal ARIMA orders were found to perform better. However, the models for predicting P. falciparum malaria incidence varied from location to location, and among lagged effects, data transformation forms, and ARIMA and TF orders. This study describes P. falciparum malaria incidence models linked with meteorological data. Variability in the models was principally attributed to regional differences, and no single model was found that fits all locations. Past P. falciparum malaria incidence appeared to be a better predictor than meteorology. Future efforts in malaria modelling may benefit from the inclusion of non-meteorological factors.
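    A minimal sketch of one transfer-function-style model: an ARIMA(1,0,0) with lagged rainfall as an exogenous regressor, fitted via statsmodels. The series, lag, and orders below are synthetic assumptions; model adequacy would be checked with the Ljung-Box Q statistic as in the study.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)

# Toy monthly series: incidence driven by its own past and by rainfall
# lagged two months (names and lags are illustrative).
n = 120
rain = 50 + 40 * np.sin(2 * np.pi * np.arange(n) / 12) + rng.normal(0, 10, n)
inc = np.zeros(n)
for t in range(2, n):
    inc[t] = 5 + 0.6 * inc[t - 1] + 0.05 * rain[t - 2] + rng.normal(0, 2)

df = pd.DataFrame({"incidence": inc, "rain_lag2": np.roll(rain, 2)}).iloc[12:]

# ARIMA(1,0,0) with lagged rainfall as an exogenous regressor.
model = ARIMA(df["incidence"], exog=df[["rain_lag2"]], order=(1, 0, 0))
result = model.fit()
print(result.summary())
```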

  10. The evaluation of different forest structural indices to predict the stand aboveground biomass of even-aged Scotch pine (Pinus sylvestris L.) forests in Kunduz, Northern Turkey.

    PubMed

    Ercanli, İlker; Kahriman, Aydın

    2015-03-01

    We assessed the effect of stand structural diversity, including the Shannon, improved Shannon, Simpson, McIntosh, Margelef, and Berger-Parker indices, on stand aboveground biomass (AGB) and developed statistical prediction models for the stand AGB values, including stand structural diversity indices and some stand attributes. The AGB prediction model, including only stand attributes, accounted for 85 % of the total variance in AGB (R (2)) with an Akaike's information criterion (AIC) of 807.2407, Bayesian information criterion (BIC) of 809.5397, Schwarz Bayesian criterion (SBC) of 818.0426, and root mean square error (RMSE) of 38.529 Mg. After inclusion of the stand structural diversity into the model structure, considerable improvement was observed in statistical accuracy, including 97.5 % of the total variance in AGB, with an AIC of 614.1819, BIC of 617.1242, SBC of 633.0853, and RMSE of 15.8153 Mg. The predictive fitting results indicate that some indices describing the stand structural diversity can be employed as significant independent variables to predict the AGB production of the Scotch pine stand. Further, including the stand diversity indices in the AGB prediction model with the stand attributes provided important predictive contributions in estimating the total variance in AGB.

  11. Statistical Properties of Differences between Low and High Resolution CMAQ Runs with Matched Initial and Boundary Conditions

    EPA Science Inventory

    The difficulty in assessing errors in numerical models of air quality is a major obstacle to improving their ability to predict and retrospectively map air quality. In this paper, using simulation outputs from the Community Multi-scale Air Quality Model (CMAQ), the statistic...

  12. Can upstaging of ductal carcinoma in situ be predicted at biopsy by histologic and mammographic features?

    NASA Astrophysics Data System (ADS)

    Shi, Bibo; Grimm, Lars J.; Mazurowski, Maciej A.; Marks, Jeffrey R.; King, Lorraine M.; Maley, Carlo C.; Hwang, E. Shelley; Lo, Joseph Y.

    2017-03-01

    Reducing the overdiagnosis and overtreatment associated with ductal carcinoma in situ (DCIS) requires accurate prediction of the invasive potential at cancer screening. In this work, we investigated the utility of pre-operative histologic and mammographic features to predict upstaging of DCIS. The goal was to provide an intentionally conservative baseline performance using readily available data from radiologists and pathologists and only linear models. We conducted a retrospective analysis of 99 patients with DCIS, of whom 25 were upstaged to invasive cancer at the time of definitive surgery. Pre-operative factors, including both the histologic features extracted from stereotactic core needle biopsy (SCNB) reports and the mammographic features annotated by an expert breast radiologist, were investigated with statistical analysis. Furthermore, we built classification models based on those features in an attempt to predict the presence of an occult invasive component in DCIS, with generalization performance assessed by receiver operating characteristic (ROC) curve analysis. Histologic features, including nuclear grade and DCIS subtype, did not show statistically significant differences between cases with pure DCIS and cases with DCIS plus invasive disease. However, three mammographic features, i.e., the major axis length of the DCIS lesion, the BI-RADS level of suspicion, and the radiologist's assessment, did achieve statistical significance. Using those three statistically significant features as input, a linear discriminant model was able to distinguish patients with DCIS plus invasive disease from those with pure DCIS, with an AUC-ROC equal to 0.62. Overall, mammograms used for breast screening contain useful information that can be perceived by radiologists and help predict occult invasive components in DCIS.

  13. Mixture EMOS model for calibrating ensemble forecasts of wind speed.

    PubMed

    Baran, S; Lerch, S

    2016-03-01

    Ensemble model output statistics (EMOS) is a statistical tool for post-processing forecast ensembles of weather variables obtained from multiple runs of numerical weather prediction models in order to produce calibrated predictive probability density functions. The EMOS predictive probability density function is given by a parametric distribution with parameters depending on the ensemble forecasts. We propose an EMOS model for calibrating wind speed forecasts based on weighted mixtures of truncated normal (TN) and log-normal (LN) distributions, where model parameters and component weights are estimated by optimizing the values of proper scoring rules over a rolling training period. The new model is tested on wind speed forecasts of the 50-member European Centre for Medium-Range Weather Forecasts ensemble, the 11-member Aire Limitée Adaptation dynamique Développement International-Hungary Ensemble Prediction System ensemble of the Hungarian Meteorological Service, and the eight-member University of Washington mesoscale ensemble, and its predictive performance is compared with that of various benchmark EMOS models based on single parametric families and combinations thereof. The results indicate improved calibration of probabilistic forecasts and improved accuracy of point forecasts in comparison with the raw ensemble and climatological forecasts. The mixture EMOS model significantly outperforms the TN and LN EMOS methods; moreover, it provides better calibrated forecasts than the TN-LN combination model and offers increased flexibility while avoiding covariate selection problems. © 2016 The Authors. Environmetrics. Published by John Wiley & Sons Ltd.
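    A minimal sketch of a truncated-normal EMOS component, with location and scale linear in the ensemble mean and variance, fitted here by maximum likelihood rather than the proper-scoring-rule optimization used in the paper; all data are synthetic.

```python
import numpy as np
from scipy.stats import truncnorm
from scipy.optimize import minimize

rng = np.random.default_rng(8)

# Synthetic training data: ensemble mean/variance and verifying speeds.
n = 400
ens_mean = rng.gamma(4.0, 2.0, n)
ens_var = rng.gamma(2.0, 0.5, n)
mu0 = 1.1 * ens_mean - 0.5
sig0 = np.sqrt(1.0 + 0.5 * ens_var)
obs = truncnorm.rvs(-mu0 / sig0, np.inf, loc=mu0, scale=sig0, random_state=rng)

# TN EMOS: obs ~ N(a + b*ens_mean, c + d*ens_var), left-truncated at zero.
def neg_log_lik(p):
    a, b, c, d = p
    mu = a + b * ens_mean
    sig = np.sqrt(np.maximum(c + d * ens_var, 1e-6))
    return -np.sum(truncnorm.logpdf(obs, -mu / sig, np.inf, loc=mu, scale=sig))

fit = minimize(neg_log_lik, x0=[0.0, 1.0, 1.0, 0.5], method="Nelder-Mead")
print("fitted (a, b, c, d):", np.round(fit.x, 3))
```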

  14. Estimation of Aerosol Optical Depth at Different Wavelengths by Multiple Regression Method

    NASA Technical Reports Server (NTRS)

    Tan, Fuyi; Lim, Hwee San; Abdullah, Khiruddin; Holben, Brent

    2015-01-01

    This study aims to investigate and establish a suitable model that can help to estimate aerosol optical depth (AOD) in order to monitor aerosol variations, especially during non-retrieval times. The relationship between actual ground measurements (such as air pollution index, visibility, relative humidity, temperature, and pressure) and AOD obtained with a CIMEL sun photometer was determined through a series of statistical procedures to produce an AOD prediction model with reasonable accuracy. The AOD prediction model calibrated for each wavelength has its own set of coefficients. The model was validated using a set of statistical tests. The validated model was then employed to calculate AOD at different wavelengths. The results show that the proposed model successfully predicted AOD at each studied wavelength, ranging from 340 nm to 1020 nm. To illustrate the application of the model, the aerosol size determined using measured AOD data for Penang was compared with that determined using the model. This was done by examining the curvature in the ln[AOD]-ln[wavelength] plot. Consistent conclusions were obtained: Penang was found to be dominated by fine-mode aerosol in 2012 and 2013 using both measured and predicted AOD data. These results indicate that the proposed AOD prediction model, using routine measurements as input, is a promising tool for the regular monitoring of aerosol variation during non-retrieval times.

  15. Study on elevated-temperature flow behavior of Ni-Cr-Mo-B ultra-heavy-plate steel via experiment and modelling

    NASA Astrophysics Data System (ADS)

    Gao, Zhi-yu; Kang, Yu; Li, Yan-shuai; Meng, Chao; Pan, Tao

    2018-04-01

    The elevated-temperature flow behavior of a novel Ni-Cr-Mo-B ultra-heavy-plate steel was investigated by conducting hot compressive deformation tests on a Gleeble-3800 thermo-mechanical simulator over a temperature range of 1123 K to 1423 K, with strain rates from 0.01 s-1 to 10 s-1 and a height reduction of 70%. Based on the experimental results, a classic strain-compensated Arrhenius-type model, a new revised strain-compensated Arrhenius-type model and a classic modified Johnson-Cook constitutive model were developed for predicting the high-temperature deformation behavior of the steel. The predictability of these models was comparatively evaluated in terms of statistical parameters including the correlation coefficient (R), average absolute relative error (AARE), average root mean square error (RMSE), normalized mean bias error (NMBE) and relative error. The statistical results indicate that the new revised strain-compensated Arrhenius-type model predicts the elevated-temperature flow stress of the steel accurately over the entire range of process conditions. The predictions of the classic modified Johnson-Cook model, however, did not agree well with the experimental values; the classic strain-compensated Arrhenius-type model tracked the deformation behavior more accurately than the modified Johnson-Cook model, but less accurately than the new revised strain-compensated Arrhenius-type model. In addition, reasons for the differences in the predictability of these models were discussed in detail.
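    The Arrhenius-type (sinh) form at the core of these constitutive models can be sketched directly. In the strain-compensated variants, Q, A, alpha and n become polynomial functions of strain, whereas the constants below are fixed, illustrative values rather than the paper's fitted coefficients.

```python
import numpy as np

# Arrhenius-type (sinh) flow-stress model:
#   Z = strain_rate * exp(Q / (R * T))          (Zener-Hollomon parameter)
#   sigma = (1 / alpha) * asinh((Z / A) ** (1 / n))
R = 8.314                        # gas constant (J/mol/K)
Q = 400e3                        # apparent activation energy (J/mol), illustrative
A, alpha, n = 1e14, 0.012, 5.0   # illustrative material constants (alpha in 1/MPa)

def flow_stress(strain_rate, T):
    Z = strain_rate * np.exp(Q / (R * T))
    return np.arcsinh((Z / A) ** (1.0 / n)) / alpha   # stress in MPa

for T in (1123.0, 1273.0, 1423.0):
    print(T, [round(flow_stress(sr, T), 1) for sr in (0.01, 0.1, 1.0, 10.0)])
```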

  16. Predicting Rotator Cuff Tears Using Data Mining and Bayesian Likelihood Ratios

    PubMed Central

    Lu, Hsueh-Yi; Huang, Chen-Yuan; Su, Chwen-Tzeng; Lin, Chen-Chiang

    2014-01-01

    Objectives Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone. Methods In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into "tear" and "no tear" groups. Likelihood ratios and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models. Results Our proposed data mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve for predicting a rotator cuff tear were statistically better in the ANN and decision tree models compared to logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability of a patient having a rotator cuff tear using a pretest probability and a prediction result (tear or no tear). Conclusions Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears. PMID:24733553
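    The likelihood-ratio step behind Fagan's nomogram is a one-line odds calculation, sketched below with hypothetical sensitivity and specificity rather than the study's fitted values.

```python
# Post-test probability from a pretest probability and a likelihood
# ratio (the computation behind Fagan's nomogram); numbers illustrative.
def post_test_probability(pretest, lr):
    pre_odds = pretest / (1.0 - pretest)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)

sens, spec = 0.85, 0.80            # hypothetical model performance
lr_pos = sens / (1.0 - spec)       # LR for a predicted "tear"
lr_neg = (1.0 - sens) / spec       # LR for a predicted "no tear"

for pretest in (0.3, 0.5, 0.7):
    print(pretest,
          round(post_test_probability(pretest, lr_pos), 2),
          round(post_test_probability(pretest, lr_neg), 2))
```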

  17. Factorial analysis of trihalomethanes formation in drinking water.

    PubMed

    Chowdhury, Shakhawat; Champagne, Pascale; McLellan, P James

    2010-06-01

    Disinfection of drinking water reduces pathogenic infection, but may pose risks to human health through the formation of disinfection byproducts. The effects of different factors on the formation of trihalomethanes were investigated using a statistically designed experimental program, and a predictive model for trihalomethanes formation was developed. Synthetic water samples with different factor levels were produced, and trihalomethanes concentrations were measured. A replicated fractional factorial design with center points was performed, and significant factors were identified through statistical analysis. A second-order trihalomethanes formation model was developed from 92 experiments, and the statistical adequacy was assessed through appropriate diagnostics. This model was validated using additional data from the Drinking Water Surveillance Program database and was applied to the Smiths Falls water supply system in Ontario, Canada. The model predictions were correlated strongly to the measured trihalomethanes, with correlations of 0.95 and 0.91, respectively. The resulting model can assist in analyzing risk-cost tradeoffs in the design and operation of water supply systems.

  18. [Quantitative structure-gas chromatographic retention relationship of polycyclic aromatic sulfur heterocycles using molecular electronegativity-distance vector].

    PubMed

    Li, Zhenghua; Cheng, Fansheng; Xia, Zhining

    2011-01-01

    The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) were studied using the molecular electronegativity-distance vector (MEDV). Linear relationships between the gas chromatographic retention index and the MEDV were established by a multiple linear regression (MLR) model. The results of variable selection by stepwise multiple regression (SMR), together with the predictive ability of the optimized model appraised by leave-one-out cross-validation, showed that the optimized model, with a correlation coefficient (R) of 0.9947 and a cross-validated correlation coefficient (Rcv) of 0.9940, possessed the best statistical quality. Furthermore, when the 114 PASH compounds were divided into calibration and test sets in a 2:1 ratio, the statistical analysis showed that our models possess almost equal statistical quality, very similar regression coefficients, and good robustness. The quantitative structure-retention relationship (QSRR) model established here may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.

  19. Assessing the sensitivity and robustness of prediction models for apple firmness using spectral scattering technique

    USDA-ARS?s Scientific Manuscript database

    Spectral scattering is useful for nondestructive sensing of fruit firmness. Prediction models, however, are typically built using multivariate statistical methods such as partial least squares regression (PLSR), whose performance generally depends on the characteristics of the data. The aim of this ...

  1. Machine Learning Algorithms Outperform Conventional Regression Models in Predicting Development of Hepatocellular Carcinoma

    PubMed Central

    Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K

    2015-01-01

    Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95%CI 0.56-0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95%CI 0.60–0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (p<0.001) and integrated discrimination improvement (p=0.04). The HALT-C model had a c-statistic of 0.60 (95%CI 0.50-0.70) in the validation cohort and was outperformed by the machine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratifying patients with cirrhosis and can be used to accurately identify patients at high-risk for developing HCC. PMID:24169273

  3. Predicting trauma patient mortality: ICD [or ICD-10-AM] versus AIS based approaches.

    PubMed

    Willis, Cameron D; Gabbe, Belinda J; Jolley, Damien; Harrison, James E; Cameron, Peter A

    2010-11-01

    The International Classification of Diseases Injury Severity Score (ICISS) has been proposed as an International Classification of Diseases (ICD)-10-based alternative to mortality prediction tools that use Abbreviated Injury Scale (AIS) data, including the Trauma and Injury Severity Score (TRISS). To date, studies have not examined the performance of ICISS using Australian trauma registry data. This study aimed to compare the performance of ICISS with other mortality prediction tools in an Australian trauma registry. This was a retrospective review of prospectively collected data from the Victorian State Trauma Registry. A training dataset was created for model development and a validation dataset for evaluation. The multiplicative ICISS model was compared with a worst injury ICISS approach, Victorian TRISS (V-TRISS, using local coefficients), maximum AIS severity and a multivariable model including ICD-10-AM codes as predictors. Models were investigated for discrimination (C-statistic) and calibration (Hosmer-Lemeshow statistic). The multivariable approach had the highest level of discrimination (C-statistic 0.90) and calibration (H-L 7.65, P = 0.468). Worst injury ICISS, V-TRISS and maximum AIS had similar performance. The multiplicative ICISS produced the lowest level of discrimination (C-statistic 0.80) and poorest calibration (H-L 50.23, P < 0.001). The performance of ICISS may be affected by the data used to develop estimates, the ICD version employed, the methods for deriving estimates and the inclusion of covariates. In this analysis, a multivariable approach using ICD-10-AM codes was the best-performing method. A multivariable ICISS approach may therefore be a useful alternative to AIS-based methods and may have comparable predictive performance to locally derived TRISS models. © 2010 The Authors. ANZ Journal of Surgery © 2010 Royal Australasian College of Surgeons.
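
    For readers unfamiliar with ICISS, a toy illustration of the multiplicative form: predicted survival is the product of survival risk ratios (SRRs) over a patient's ICD injury codes, while the worst-injury variant uses only the single lowest SRR. The codes and SRR values below are hypothetical, not registry-derived:

    ```python
    SRR = {"S06.5": 0.82, "S27.0": 0.91, "S72.0": 0.97}   # hypothetical SRRs

    def iciss(codes):
        """Multiplicative ICISS: product of per-code survival risk ratios."""
        p = 1.0
        for code in codes:
            p *= SRR[code]
        return p

    def worst_injury_iciss(codes):
        """Worst-injury variant: only the single lowest SRR."""
        return min(SRR[code] for code in codes)

    print(iciss(["S06.5", "S27.0"]))               # 0.7462 -> ~75% survival
    print(worst_injury_iciss(["S06.5", "S27.0"]))  # 0.82
    ```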

  4. September Arctic Sea Ice minimum prediction - a new skillful statistical approach

    NASA Astrophysics Data System (ADS)

    Ionita-Scholz, Monica; Grosfeld, Klaus; Scholz, Patrick; Treffeisen, Renate; Lohmann, Gerrit

    2017-04-01

    Sea ice in both polar regions is an important indicator of global climate change and its polar amplification, and consequently there is broad interest in sea ice coverage, variability and long-term change. Knowledge of sea ice requires high-quality data on ice extent, thickness and dynamics; however, its predictability is complex and depends on various climatic and oceanic parameters and conditions. In order to provide insights into the potential development of a monthly/seasonal signal of sea ice evolution, we developed a robust statistical model based on ocean heat content, sea surface temperature and different atmospheric variables to estimate the September sea ice extent (SSIE) on a monthly time scale. Although previous statistical attempts at monthly/seasonal forecasts of SSIE have shown relatively limited skill, we show here that more than 92% of the variance of the September sea ice extent (r = 0.96) can be predicted at the end of May using the previous months' climatic and oceanic conditions. The skill of the model increases as the forecast lead time decreases: at the end of August, our predictions explain 99% of the variance of the SSIE. Our statistical model captures both the general trend and the interannual variability of the SSIE, and it properly forecasts years with extremely high or low SSIE (e.g., 1996, 2007, 2012, 2013). Besides its forecast skill for SSIE, the model could provide a valuable tool for identifying regions and climate parameters that are important for sea ice development in the Arctic, and for detecting sensitive and critical regions in global coupled climate models with a focus on sea ice formation.
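
    A minimal sketch of the kind of lagged statistical forecast described above, with synthetic placeholders for the May predictors and the September sea ice extent; the real model uses many more variables and a proper out-of-sample skill assessment:

    ```python
    # Hedged sketch: regress September sea-ice extent on earlier-month indices.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(2)
    n_years = 35
    ohc_may = rng.normal(size=n_years)        # ocean heat content, May
    sst_may = rng.normal(size=n_years)        # sea surface temperature, May
    ssie = -1.2 * ohc_may - 0.8 * sst_may + rng.normal(scale=0.3, size=n_years)

    X = np.column_stack([ohc_may, sst_may])
    model = LinearRegression().fit(X, ssie)
    print(f"explained variance (R^2) = {model.score(X, ssie):.2f}")
    ```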

  5. Modeling and forecasting the distribution of Vibrio vulnificus in Chesapeake Bay

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jacobs, John M.; Rhodes, M.; Brown, C. W.

    The aim is to construct statistical models to predict the presence, abundance and potential virulence of Vibrio vulnificus in surface waters. A variety of statistical techniques were used in concert to identify water quality parameters associated with V. vulnificus presence, abundance and virulence markers, in the interest of developing strong predictive models for use in regional oceanographic modeling systems. A suite of models is provided to represent the best model fit and alternatives using environmental variables, allowing them to be put to immediate use in current ecological forecasting efforts. Conclusions: Environmental parameters such as temperature, salinity and turbidity are capable of accurately predicting the abundance and distribution of V. vulnificus in Chesapeake Bay. Forcing these empirical models with output from ocean modeling systems allows for spatially explicit forecasts up to 48 h into the future. This study uses one of the largest data sets compiled to model Vibrio in an estuary, enhances our understanding of environmental correlates with abundance, distribution and presence of potentially virulent strains, and offers a method to forecast these pathogens that may be replicated in other regions.
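
    A hedged sketch of one plausible building block, a logistic model for V. vulnificus presence as a function of temperature, salinity and turbidity; the data, coefficients and units below are illustrative, not the study's fits:

    ```python
    # Hedged sketch: presence/absence model from water-quality covariates.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    n = 500
    temp = rng.uniform(5, 30, n)           # deg C
    sal = rng.uniform(0, 25, n)            # PSU
    turb = rng.uniform(0, 50, n)           # NTU
    logit = 0.3 * (temp - 20) - 0.1 * (sal - 12) + 0.02 * turb
    present = rng.random(n) < 1 / (1 + np.exp(-logit))

    X = np.column_stack([temp, sal, turb])
    clf = LogisticRegression().fit(X, present)
    # Probability of presence for a warm, mesohaline, moderately turbid sample
    print(clf.predict_proba([[28.0, 10.0, 20.0]])[0, 1])
    ```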

  6. Bayesian statistics in medicine: a 25 year review.

    PubMed

    Ashby, Deborah

    2006-11-15

    This review examines the state of Bayesian thinking as Statistics in Medicine was launched in 1982, reflecting particularly on its applicability and uses in medical research. It then looks at each subsequent five-year epoch, with a focus on papers appearing in Statistics in Medicine, putting these in the context of major developments in Bayesian thinking and computation with reference to important books, landmark meetings and seminal papers. It charts the growth of Bayesian statistics as it is applied to medicine and makes predictions for the future. From sparse beginnings, where Bayesian statistics was barely mentioned, Bayesian statistics has now permeated all the major areas of medical statistics, including clinical trials, epidemiology, meta-analyses and evidence synthesis, spatial modelling, longitudinal modelling, survival modelling, molecular genetics and decision-making in respect of new technologies.

  7. Automated Cognitive Health Assessment From Smart Home-Based Behavior Data.

    PubMed

    Dawadi, Prafulla Nath; Cook, Diane Joyce; Schmitter-Edgecombe, Maureen

    2016-07-01

    Smart home technologies offer potential benefits for assisting clinicians by automating health monitoring and well-being assessment. In this paper, we examine the actual benefits of smart home-based analysis by monitoring daily behavior in the home and predicting clinical scores of the residents. To accomplish this goal, we propose a clinical assessment using activity behavior (CAAB) approach to model a smart home resident's daily behavior and predict the corresponding clinical scores. CAAB uses statistical features that describe characteristics of a resident's daily activity performance to train machine learning algorithms that predict the clinical scores. We evaluate the performance of CAAB utilizing smart home sensor data collected from 18 smart homes over two years. We obtain a statistically significant correlation (r = 0.72) between CAAB-predicted and clinician-provided cognitive scores and a statistically significant correlation (r = 0.45) between CAAB-predicted and clinician-provided mobility scores. These prediction results suggest that it is feasible to predict clinical scores using smart home sensor data and learning-based data analysis.
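
    A minimal sketch of the CAAB idea under stated assumptions: summarize each resident's daily activity with simple statistical features, train a learner on synthetic data, and check the correlation between predicted and "clinician-provided" scores:

    ```python
    # Hedged sketch: activity features -> clinical-score prediction.
    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(4)
    daily_minutes = rng.gamma(5, 8, size=(40, 180))   # 40 residents x 180 days

    # Per-resident features: level, variability and trend of daily activity
    feats = np.column_stack([
        daily_minutes.mean(axis=1),
        daily_minutes.std(axis=1),
        np.polyfit(np.arange(180), daily_minutes.T, 1)[0],  # slope per resident
    ])
    score = 0.5 * feats[:, 0] - 0.3 * feats[:, 1] + rng.normal(scale=2, size=40)

    pred = cross_val_predict(RandomForestRegressor(random_state=0),
                             feats, score, cv=5)
    print(f"r = {pearsonr(score, pred)[0]:.2f}")
    ```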

  8. Principal Component Analysis in Construction of 3D Human Knee Joint Models Using a Statistical Shape Model Method

    PubMed Central

    Tsai, Tsung-Yuan; Li, Jing-Sheng; Wang, Shaobai; Li, Pingyue; Kwon, Young-Min; Li, Guoan

    2013-01-01

    The statistical shape model (SSM) method that uses 2D images of the knee joint to predict the 3D joint surface model has been reported in the literature. In this study, we constructed an SSM database using 152 human CT knee joint models, including the femur, tibia and patella, and analyzed the characteristics of each principal component of the SSM. The surface models of two in vivo knees were predicted using the SSM and their 2D bi-plane fluoroscopic images. The predicted models were compared to their CT joint models. The differences between the predicted 3D knee joint surfaces and the CT image-based surfaces were 0.30 ± 0.81 mm, 0.34 ± 0.79 mm and 0.36 ± 0.59 mm for the femur, tibia and patella, respectively (average ± standard deviation). The computational time for each bone of the knee joint was within 30 seconds using a personal computer. The analysis of this study indicated that the SSM method could be a useful tool to construct 3D surface models of the knee with sub-millimeter accuracy in real time. Thus, it may have a broad application in computer assisted knee surgeries that require 3D surface models of the knee. PMID:24156375
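
    A hedged sketch of the PCA machinery behind an SSM: learn a mean shape and principal modes from training shapes, then generate a new surface from a few mode weights. Random vectors stand in for registered knee surfaces:

    ```python
    # Hedged sketch: PCA-based statistical shape model on synthetic shapes.
    import numpy as np

    rng = np.random.default_rng(5)
    n_shapes, n_points = 152, 500
    shapes = rng.normal(size=(n_shapes, n_points * 3))   # x,y,z flattened

    mean = shapes.mean(axis=0)
    U, S, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
    modes = Vt[:10]                      # first 10 principal components

    def reconstruct(weights):
        """Shape instance = mean + sum_k w_k * mode_k."""
        return mean + weights @ modes

    new_shape = reconstruct(rng.normal(size=10))
    print(new_shape.reshape(-1, 3).shape)    # (500, 3) surface points
    ```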

  9. Machine learning methods reveal the temporal pattern of dengue incidence using meteorological factors in metropolitan Manila, Philippines.

    PubMed

    Carvajal, Thaddeus M; Viacrusis, Katherine M; Hernandez, Lara Fides T; Ho, Howell T; Amalin, Divina M; Watanabe, Kozo

    2018-04-17

    Several studies have applied ecological factors such as meteorological variables to develop models that accurately predict the temporal pattern of dengue incidence or occurrence. Among the many studies investigating this premise, the modeling approaches differ, and each typically uses only a single statistical technique, which raises the question of which technique is robust and reliable. Hence, our study aims to compare the accuracy of four modeling techniques in predicting the temporal pattern of dengue incidence in Metropolitan Manila as influenced by meteorological factors: (a) generalized additive modeling, (b) seasonal autoregressive integrated moving average with exogenous variables, (c) random forest and (d) gradient boosting. Dengue incidence and meteorological data (flood, precipitation, temperature, southern oscillation index, relative humidity, wind speed and direction) for Metropolitan Manila from January 1, 2009 to December 31, 2013 were obtained from the respective government agencies. Two types of datasets were used in the analysis: observed meteorological factors (MF) and their corresponding delayed or lagged effects (LG). These datasets were then subjected to the four modeling techniques, and the predictive accuracy and variable importance of each technique were calculated and evaluated. Among the statistical modeling techniques, random forest showed the best predictive accuracy, and the delayed or lagged effects of the meteorological variables proved to be the best dataset to use for this purpose. Thus, the random forest model with delayed meteorological effects (RF-LG) was deemed the best among all assessed models, with relative humidity as its top-most important meteorological factor. The study showed that the statistical modeling techniques indeed generate different predictive outcomes, and it further revealed the random forest model with delayed meteorological effects to be the best at predicting the temporal pattern of dengue incidence in Metropolitan Manila. It is also noteworthy that the study identified relative humidity, along with rainfall and temperature, as an important meteorological factor that can influence this temporal pattern.
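
    A minimal sketch of the winning configuration, a random forest on lagged (delayed) meteorological features, using synthetic weekly series; the lag lengths and variable names are illustrative assumptions:

    ```python
    # Hedged sketch: lagged meteorological features -> random forest.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(6)
    weeks = 260
    df = pd.DataFrame({
        "rain": rng.gamma(2, 10, weeks),
        "temp": 28 + rng.normal(scale=1.5, size=weeks),
        "rh": 75 + rng.normal(scale=5, size=weeks),
    })
    df["cases"] = (0.5 * df["rh"].shift(4) + 0.3 * df["rain"].shift(8)
                   + rng.normal(scale=5, size=weeks))

    for col in ["rain", "temp", "rh"]:           # lagged (delayed) effects
        for lag in (4, 8):
            df[f"{col}_lag{lag}"] = df[col].shift(lag)
    df = df.dropna()

    X = df.filter(like="_lag")
    rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, df["cases"])
    print(dict(zip(X.columns, rf.feature_importances_.round(2))))
    ```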

  10. Development of a prognostic model for predicting spontaneous singleton preterm birth.

    PubMed

    Schaaf, Jelle M; Ravelli, Anita C J; Mol, Ben Willem J; Abu-Hanna, Ameen

    2012-10-01

    To develop and validate a prognostic model for prediction of spontaneous preterm birth. Prospective cohort study using data of the nationwide perinatal registry in The Netherlands. We studied 1,524,058 singleton pregnancies between 1999 and 2007. We developed a multiple logistic regression model to estimate the risk of spontaneous preterm birth based on maternal and pregnancy characteristics. We used bootstrapping techniques to internally validate our model. Discrimination (AUC), accuracy (Brier score) and calibration (calibration graphs and Hosmer-Lemeshow C-statistic) were used to assess the model's predictive performance. Our primary outcome measure was spontaneous preterm birth at <37 completed weeks. Spontaneous preterm birth occurred in 57,796 (3.8%) pregnancies. The final model included 13 variables for predicting preterm birth. The predicted probabilities ranged from 0.01 to 0.71 (IQR 0.02-0.04). The model had an area under the receiver operator characteristic curve (AUC) of 0.63 (95% CI 0.63-0.63), the Brier score was 0.04 (95% CI 0.04-0.04) and the Hosmer-Lemeshow C-statistic was significant (p < 0.0001). The calibration graph showed overprediction at higher values of predicted probability. The positive predictive value was 26% (95% CI 20-33%) for the 0.4 probability cut-off point. The model's discrimination was fair and it had modest calibration. Previous preterm birth, drug abuse and vaginal bleeding in the first half of pregnancy were the most important predictors of spontaneous preterm birth. Although not applicable in clinical practice yet, this model is a next step towards early prediction of spontaneous preterm birth that enables caregivers to start preventive therapy in women at higher risk. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
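
    A hedged sketch of bootstrap internal validation as used for models like this one: refit on resamples, measure optimism, and correct the apparent AUC. The data are synthetic, with 13 predictors and a low event rate mirroring the abstract:

    ```python
    # Hedged sketch: optimism-corrected AUC via the bootstrap.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=2000, n_features=13, weights=[0.96],
                               random_state=7)       # ~4% event rate
    apparent = roc_auc_score(
        y, LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1])

    rng = np.random.default_rng(7)
    optimism = []
    for _ in range(100):
        idx = rng.integers(0, len(y), len(y))        # bootstrap resample
        m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
        test = roc_auc_score(y, m.predict_proba(X)[:, 1])
        optimism.append(boot - test)

    print(f"optimism-corrected AUC = {apparent - np.mean(optimism):.3f}")
    ```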

  11. Study of angular momentum variation due to entrance channel effect in heavy ion fusion reactions

    NASA Astrophysics Data System (ADS)

    Kumar, Ajay

    2014-05-01

    A systematic investigation of the properties of hot nuclei can be carried out by detecting the evaporated particles, whose emission reflects the behavior of the nucleus at various stages of the deexcitation cascade. When the nucleus is formed by the collision of a heavy nucleus with a light particle, the statistical model predicts the distribution of evaporated particles well, provided reasonable choices are made for the level densities and yrast lines. Comparison with more specific measurements could, of course, provide a more severe test of the model and enable one to identify deviations from the statistical model as the signature of effects not included in it. Some papers have claimed that experimental evaporation spectra from heavy-ion fusion reactions at higher excitation energies and angular momenta are no longer consistent with the predictions of the standard statistical model. To test this claim we employed two systems, a mass-symmetric channel (31P+45Sc) and a mass-asymmetric channel (12C+64Zn), leading to the same compound nucleus 76Kr* at an excitation energy of 75 MeV. Neutron energy spectra of the asymmetric system (12C+64Zn) at different angles are well described by statistical-model predictions using the normal value of the level density parameter, a = A/8 MeV^-1. In the case of the symmetric system (31P+45Sc), however, the statistical-model interpretation of the data requires changing this value to a = A/10 MeV^-1. The delayed evolution of the compound system in the symmetric 31P+45Sc case may lead to the formation of a temperature-equilibrated dinuclear complex, which may be responsible for neutron emission at higher temperature, while the protons and alpha particles are evaporated after neutron emission, once the system has sufficiently cooled. For charged-particle emission, the higher angular-momentum values do not contribute to the formation of the compound nucleus in the symmetric entrance channel.

  12. Modeling time-to-event (survival) data using classification tree analysis.

    PubMed

    Linden, Ariel; Yarnold, Paul R

    2017-12-01

    Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (i.e., easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.
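
    A minimal sketch of the Cox baseline that CTA is compared against, fitted here with the lifelines package on synthetic censored data; the column names are arbitrary, and the proportional-hazards assumption flagged in the abstract is exactly what the fit relies on:

    ```python
    # Hedged sketch: standard Cox regression on synthetic censored data.
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(8)
    n = 300
    age = rng.uniform(40, 80, n)
    treat = rng.integers(0, 2, n)
    true_t = rng.exponential(scale=np.exp(3 - 0.02 * age + 0.5 * treat))
    censor = rng.exponential(scale=20, size=n)
    df = pd.DataFrame({
        "T": np.minimum(true_t, censor),            # observed follow-up time
        "E": (true_t <= censor).astype(int),        # 1 = event, 0 = censored
        "age": age, "treat": treat,
    })

    cph = CoxPHFitter().fit(df, duration_col="T", event_col="E")
    cph.print_summary()   # hazard ratios; assumes proportional hazards
    ```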

  13. Statistical Signal Models and Algorithms for Image Analysis

    DTIC Science & Technology

    1984-10-25

    In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction

  14. Can air temperature be used to project influences of climate change on stream temperature?

    USGS Publications Warehouse

    Arismendi, Ivan; Safeeq, Mohammad; Dunham, Jason B.; Johnson, Sherri L.

    2014-01-01

    Worldwide, a lack of data on stream temperature has motivated the use of regression-based statistical models to predict stream temperatures from more widely available data on air temperatures. Such models have been widely applied to project responses of stream temperatures under climate change, but the performance of these models has not been fully evaluated. To address this knowledge gap, we examined the performance of two widely used linear and nonlinear regression models that predict stream temperatures based on air temperatures. We evaluated model performance and the temporal stability of model parameters in a suite of regulated and unregulated streams with 11-44 years of stream temperature data. Although such models may have validity when predicting stream temperatures within the span of time that corresponds to the data used to develop them, model predictions did not transfer well to other time periods. Validation of model predictions of the most recent stream temperatures, based on air temperature-stream temperature relationships from previous time periods, often showed poor performance when compared with observed stream temperatures. Overall, model predictions were less robust in regulated streams, and they frequently failed in detecting the coldest and warmest temperatures within all sites. In many cases, the magnitude of errors in these predictions falls within a range that equals or exceeds the magnitude of future projections of climate-related changes in stream temperatures reported for the region we studied (between 0.5 and 3.0 °C by 2080). The limited ability of regression-based statistical models to accurately project stream temperatures over time likely stems from the fact that the underlying processes at play, namely the heat budgets of air and water, are distinctive in each medium and vary among localities and through time.
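
    A hedged sketch, assuming the S-shaped (logistic) air-stream temperature relation commonly used in this literature, Ts = alpha / (1 + exp(gamma * (beta - Ta))); fitting it with scipy on synthetic data shows the mechanics, though the abstract's point is that such fits transfer poorly across time periods:

    ```python
    # Hedged sketch: nonlinear air-stream temperature regression.
    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(ta, alpha, beta, gamma):
        return alpha / (1.0 + np.exp(gamma * (beta - ta)))

    rng = np.random.default_rng(9)
    ta = rng.uniform(-5, 30, 200)                        # air temperature, deg C
    ts = logistic(ta, 22.0, 12.0, 0.18) + rng.normal(scale=0.8, size=200)

    params, _ = curve_fit(logistic, ta, ts, p0=[20, 10, 0.2])
    resid = ts - logistic(ta, *params)
    print(f"alpha, beta, gamma = {np.round(params, 2)}; RMSE = {resid.std():.2f}")
    ```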

  15. Vitamin D and ferritin correlation with chronic neck pain using standard statistics and a novel artificial neural network prediction model.

    PubMed

    Eloqayli, Haytham; Al-Yousef, Ali; Jaradat, Raid

    2018-02-15

    Despite the high prevalence of chronic neck pain, there is limited consensus about its primary etiology, risk factors, diagnostic criteria and therapeutic outcome. Here, we aimed to determine whether ferritin and vitamin D are modifiable risk factors associated with chronic neck pain, using standard statistics and an artificial neural network (ANN). Fifty-four patients with chronic neck pain treated between February 2016 and August 2016 in King Abdullah University Hospital and 54 age-matched controls undergoing outpatient or minor procedures were enrolled. Demographic parameters, height, weight and single measurements of serum vitamin D, vitamin B12, ferritin, calcium, phosphorus and zinc were obtained for patients and controls, and an ANN prediction model was developed. The statistical analysis revealed that patients with chronic neck pain have significantly lower serum vitamin D and ferritin (p < .05), and 90% of patients with chronic neck pain were female. A multilayer feed-forward neural network with backpropagation (MFFNN) prediction model was designed with vitamin D and ferritin as input variables and chronic neck pain as output. The model correctly classified 92 of 108 samples, an 85% classification accuracy. Although iron and vitamin D deficiency cannot be isolated as the sole risk factors for chronic neck pain, they should be considered two modifiable risk factors. The high prevalence of chronic neck pain, hypovitaminosis D and low ferritin among women is of concern. Bioinformatics prediction with artificial neural networks may be of future benefit in classification and prediction models for chronic neck pain. We hope this initial work will encourage a future larger cohort study addressing vitamin D and iron correction as modifiable factors, and the application of artificial intelligence models in clinical practice.
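
    A minimal sketch of a two-input feed-forward classifier in the spirit of the MFFNN above; the data, architecture and cutoffs are illustrative assumptions, not the study's network:

    ```python
    # Hedged sketch: small MLP classifying pain status from two biomarkers.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(10)
    n = 108
    vit_d = rng.normal(loc=22, scale=8, size=n)      # ng/mL
    ferritin = rng.normal(loc=55, scale=25, size=n)  # ng/mL
    pain = ((vit_d < 20) | (ferritin < 40)).astype(int)   # toy labels

    X = np.column_stack([vit_d, ferritin])
    clf = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                      random_state=0))
    acc = cross_val_score(clf, X, pain, cv=5).mean()
    print(f"cross-validated accuracy = {acc:.2f}")
    ```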

  16. Sentinel node status prediction by four statistical models: results from a large bi-institutional series (n = 1132).

    PubMed

    Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R

    2009-12-01

    To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative; in the absence of conclusive evidence of an SNB-associated survival benefit, these patients may be over-treated. Here, we tested the efficiency of four different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures while minimizing the error rate. After cross-validation, the logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPVs (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reductions (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients (approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which would ultimately lead to better quality of life for patients and optimization of resource allocation for the health care system.

  17. An analysis, sensitivity and prediction of winter fog events using FASP model over Indo-Gangetic plains, India

    NASA Astrophysics Data System (ADS)

    Srivastava, S. K., Sr.; Sharma, D. A.; Sachdeva, K.

    2017-12-01

    The Indo-Gangetic plains (IGP) of India experience severe fog conditions during the peak winter months of December and January every year. In this paper an attempt has been made to analyze the spatial and temporal variability of winter fog over the Indo-Gangetic plains. Further, an attempt has also been made to configure an efficient mesoscale numerical weather prediction model using different parameterization schemes and to develop a forecasting tool for the prediction of fog during winter months over the Indo-Gangetic plains. The study revealed that an alarming increasing trend in fog frequency prevails over many locations of the IGP. Hot-spot and cluster analyses were conducted to identify the zones most prone to fog, using GIS and inferential statistical tools respectively. Hot spots on average experience fog on 68.27% of days, followed by moderate and cold spots with 48.03% and 21.79%, respectively. The study proposes a new FASP (Fog Analysis, Sensitivity and Prediction) model for the overall analysis and prediction of fog at a particular location and period over the IGP. In the first phase of this model, long-term climatological fog data for a location are analyzed to determine their characteristics and prevailing trend using various advanced statistical techniques. In the second phase, a sensitivity test is conducted with different combinations of parameterization schemes to determine the most suitable combination for fog simulation over a particular location and period. In the third and final phase, an ARIMA model is first used to predict the number of fog days in the future; a numerical model is then used to predict the meteorological parameters favourable for fog, and finally a hybrid model is used for the fog forecast over the study location. The results of the FASP model are validated against actual ground-based fog data using statistical tools. A forecast fog-gram generated using the hybrid model during January 2017 shows highly encouraging results for fog occurrence/non-occurrence between 25 and 72 hours ahead. The model predicted fog occurrence/non-occurrence with more than 85% accuracy over most locations across the study area. The minimum visibility departure is within 500 m on 90% of occasions over the central IGP and within 1000 m on more than 80% of occasions over most locations across the Indo-Gangetic plains.
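
    A minimal sketch of the ARIMA step in the final phase, assuming a univariate yearly series of fog-day counts; the series and the (1,1,1) order are illustrative choices:

    ```python
    # Hedged sketch: ARIMA forecast of seasonal fog-day counts.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(11)
    years = pd.date_range("1980", periods=35, freq="YS")
    fog_days = pd.Series(20 + 0.4 * np.arange(35)           # upward trend
                         + rng.normal(scale=3, size=35), index=years)

    fit = ARIMA(fog_days, order=(1, 1, 1)).fit()
    print(fit.forecast(steps=3))   # predicted fog days, next three winters
    ```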

  18. Statistical shear lag model - unraveling the size effect in hierarchical composites.

    PubMed

    Wei, Xiaoding; Filleter, Tobin; Espinosa, Horacio D

    2015-05-01

    Numerous experimental and computational studies have established that the hierarchical structures encountered in natural materials, such as the brick-and-mortar structure observed in sea shells, are essential for achieving defect tolerance. Due to this hierarchy, the mechanical properties of natural materials have a different size dependence compared to that of typical engineered materials. This study aimed to explore size effects on the strength of bio-inspired staggered hierarchical composites and to define the influence of the geometry of constituents in their outstanding defect tolerance capability. A statistical shear lag model is derived by extending the classical shear lag model to account for the statistics of the constituents' strength. A general solution emerges from rigorous mathematical derivations, unifying the various empirical formulations for the fundamental link length used in previous statistical models. The model shows that the staggered arrangement of constituents grants composites a unique size effect on mechanical strength in contrast to homogenous continuous materials. The model is applied to hierarchical yarns consisting of double-walled carbon nanotube bundles to assess its predictive capabilities for novel synthetic materials. Interestingly, the model predicts that yarn gauge length does not significantly influence the yarn strength, in close agreement with experimental observations. Copyright © 2015 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.
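
    For context, a toy illustration of the classical weakest-link size effect that the statistical shear lag model generalizes: with Weibull-distributed link strengths, the strength of a chain drops as it gets longer, whereas the staggered architecture studied above largely suppresses this effect:

    ```python
    # Illustrative sketch: weakest-link size effect with Weibull strengths.
    import numpy as np

    rng = np.random.default_rng(16)
    shape, scale = 5.0, 1.0                   # Weibull modulus and scale

    for n_links in (1, 10, 100, 1000):
        links = scale * rng.weibull(shape, size=(20_000, n_links))
        strength = links.min(axis=1)          # chain fails at weakest link
        print(f"n = {n_links:5d}: mean strength = {strength.mean():.3f}")
    ```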

  19. Replicating annual North Atlantic hurricane activity 1878-2012 from environmental variables

    NASA Astrophysics Data System (ADS)

    Saunders, Mark A.; Klotzbach, Philip J.; Lea, Adam S. R.

    2017-06-01

    Statistical models can replicate annual North Atlantic hurricane activity from large-scale environmental field data for August and September, the months of peak hurricane activity. We assess how well the six environmental fields used most often in contemporary statistical modeling of seasonal hurricane activity replicate North Atlantic hurricane numbers and Accumulated Cyclone Energy (ACE) over the 135-year period from 1878 to 2012. We find that these fields replicate historical hurricane activity surprisingly well, showing that contemporary statistical models and their seasonal physical links have long-term robustness. We find that August-September zonal trade wind speed over the Caribbean Sea and the tropical North Atlantic is the environmental field which individually replicates long-term hurricane activity best, and that trade wind speed combined with the difference in sea surface temperature between the tropical Atlantic and the tropical mean is the best multi-predictor model. Comparing the performance of the best single-predictor and best multi-predictor models shows that they exhibit little difference in hindcast skill for predicting long-term ACE, but that the best multi-predictor model offers improved skill for predicting long-term hurricane numbers. We examine whether replicated real-time prediction skill over 1983-2012 increases as the model training period lengthens and find evidence that this happens slowly. We identify a dropout in hurricane replication centered on the 1940s and show that this is likely due to a decrease in data quality which affects all data sets, but Atlantic sea surface temperatures in particular. Finally, we offer insights on the implications of our findings for seasonal hurricane prediction.

  20. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. 3: A stochastic rain fade control algorithm for satellite link power via nonlinear Markov filtering theory

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1991-01-01

    The dynamic and composite nature of propagation impairments that are incurred on Earth-space communications links at frequencies in and above the 30/20 GHz Ka band (i.e., rain attenuation, cloud and/or clear air scintillation, etc.), combined with the need to counter such degradations after the small link margins have been exceeded, necessitates the use of dynamic statistical identification and prediction processing of the fading signal in order to optimally estimate and predict the levels of each of the deleterious attenuation components. Such requirements are being met in NASA's Advanced Communications Technology Satellite (ACTS) Project by the implementation of optimal processing schemes derived through the use of the Rain Attenuation Prediction Model and nonlinear Markov filtering theory.

  1. Discrimination and prediction of the origin of Chinese and Korean soybeans using Fourier transform infrared spectrometry (FT-IR) with multivariate statistical analysis

    PubMed Central

    Lee, Byeong-Ju; Zhou, Yaoyao; Lee, Jae Soung; Shin, Byeung Kon; Seo, Jeong-Ah; Lee, Doyup; Kim, Young-Suk

    2018-01-01

    The ability to determine the origin of soybeans is an important issue following the inclusion of this information in the labeling of agricultural food products becoming mandatory in South Korea in 2017. This study was carried out to construct a prediction model for discriminating Chinese and Korean soybeans using Fourier-transform infrared (FT-IR) spectroscopy and multivariate statistical analysis. The optimal prediction models for discriminating soybean samples were obtained by selecting appropriate scaling methods, normalization methods, variable influence on projection (VIP) cutoff values, and wave-number regions. The factors for constructing the optimal partial-least-squares regression (PLSR) prediction model were using second derivatives, vector normalization, unit variance scaling, and the 4000-400 cm^-1 region (excluding water vapor and carbon dioxide). The PLSR model for discriminating Chinese and Korean soybean samples had the best predictability when a VIP cutoff value was not applied. When Chinese soybean samples were identified, a PLSR model with the lowest root-mean-square error of prediction was obtained using a VIP cutoff value of 1.5. The optimal PLSR prediction model for discriminating Korean soybean samples was also obtained using a VIP cutoff value of 1.5. This is the first study that has combined FT-IR spectroscopy with normalization methods, VIP cutoff values, and selected wave-number regions for discriminating Chinese and Korean soybeans. PMID:29689113
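
    A hedged sketch of PLSR with variable-influence-on-projection (VIP) screening on synthetic spectra; the VIP formula is the standard one and the 1.5 cutoff follows the abstract, but everything else is illustrative:

    ```python
    # Hedged sketch: VIP scores for a fitted PLSR model.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(12)
    X = rng.normal(size=(90, 200))              # 90 samples x 200 wavenumbers
    y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=90)  # 5 true bands

    pls = PLSRegression(n_components=5).fit(X, y)
    W, T, Q = pls.x_weights_, pls.x_scores_, pls.y_loadings_
    ssy = np.sum(T ** 2, axis=0) * np.sum(Q ** 2, axis=0)     # per-component SSY
    wnorm = W / np.linalg.norm(W, axis=0)
    vip = np.sqrt(W.shape[0] * (wnorm ** 2 @ ssy) / ssy.sum())

    keep = vip > 1.5                     # VIP cutoff, as in the abstract
    print(f"{keep.sum()} of {len(vip)} variables retained")
    ```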

  2. Spatial prediction of landslide hazard using discriminant analysis and GIS

    Treesearch

    Peter V. Gorsevski; Paul Gessler; Randy B. Foltz

    2000-01-01

    Environmental attributes relevant for spatial prediction of landslides triggered by rain and snowmelt events were derived from digital elevation model (DEM). Those data in conjunction with statistics and geographic information system (GIS) provided a detailed basis for spatial prediction of landslide hazard. The spatial prediction of landslide hazard in this paper is...

  3. Modeling forest biomass and growth: Coupling long-term inventory and LiDAR data

    Treesearch

    Chad Babcock; Andrew O. Finley; Bruce D. Cook; Aaron Weiskittel; Christopher W. Woodall

    2016-01-01

    Combining spatially-explicit long-term forest inventory and remotely sensed information from Light Detection and Ranging (LiDAR) datasets through statistical models can be a powerful tool for predicting and mapping above-ground biomass (AGB) at a range of geographic scales. We present and examine a novel modeling approach to improve prediction of AGB and estimate AGB...

  4. Groundwater-level prediction using multiple linear regression and artificial neural network techniques: a comparative assessment

    NASA Astrophysics Data System (ADS)

    Sahoo, Sasmita; Jha, Madan K.

    2013-12-01

    The potential of multiple linear regression (MLR) and artificial neural network (ANN) techniques for predicting transient water levels over a groundwater basin was compared. MLR and ANN modeling was carried out at 17 sites in Japan, considering all significant inputs: rainfall, ambient temperature, river stage, 11 seasonal dummy variables, and influential lags of rainfall, ambient temperature, river stage and groundwater level. Seventeen site-specific ANN models were developed, using multi-layer feed-forward neural networks trained with the Levenberg-Marquardt backpropagation algorithm. The performance of the models was evaluated using statistical and graphical indicators. Comparison of the goodness-of-fit statistics of the MLR models with those of the ANN models indicated better agreement between the ANN-predicted and observed groundwater levels at all sites, compared to the MLR; this finding was supported by the graphical indicators and the residual analysis. Thus, it is concluded that the ANN technique is superior to the MLR technique in predicting the spatio-temporal distribution of groundwater levels in a basin. However, considering the practical advantages of the MLR technique, it is recommended as an alternative and cost-effective groundwater modeling tool.

  5. A probabilistic mechanical model for prediction of aggregates’ size distribution effect on concrete compressive strength

    NASA Astrophysics Data System (ADS)

    Miled, Karim; Limam, Oualid; Sab, Karam

    2012-06-01

    To predict the effect of aggregates' size distribution on concrete compressive strength, a probabilistic mechanical model is proposed. Within this model, a Voronoi tessellation of a set of non-overlapping and rigid spherical aggregates is used to describe the concrete microstructure. Moreover, aggregates' diameters are defined as statistical variables, and their size distribution function is identified with the experimental sieve curve. Then, an inter-aggregate failure criterion is proposed to describe the compressive-shear crushing of the hardened cement paste when concrete is subjected to uniaxial compression. Using a homogenization approach based on statistical homogenization and on geometrical simplifications, an analytical formula predicting the concrete compressive strength is obtained. This formula highlights the effects of cement paste strength and of aggregates' size distribution and volume fraction on the concrete compressive strength. According to the proposed model, increasing the concrete strength for the same cement paste and the same aggregates' volume fraction is achieved by decreasing both the aggregates' maximum size and the percentage of coarse aggregates. Finally, the validity of the model has been discussed through a comparison with experimental results (15 concrete compressive strengths ranging between 46 and 106 MPa) taken from the literature, showing good agreement with the model predictions.

  6. Identification of cognitive and non-cognitive predictive variables related to attrition in baccalaureate nursing education programs in Mississippi

    NASA Astrophysics Data System (ADS)

    Hayes, Catherine

    2005-07-01

    This study sought to identify a variable or variables predictive of attrition among baccalaureate nursing students. The study was quantitative in design; multivariate correlational statistics and discriminant analysis were used to identify a model for the prediction of attrition. The analysis then weighted variables according to their predictive value to determine the most parsimonious model with the greatest predictive power. Three public university nursing education programs in Mississippi offering a Bachelor's Degree in Nursing were selected for the study. The population consisted of students accepted and enrolled in these three programs in the years 2001 and 2002 and graduating in the years 2003 and 2004 (N = 195). The categorical dependent variable was attrition (including academic failure or withdrawal) from the program of nursing education. The ten independent variables selected for the study and considered to have possible predictive value were: Grade Point Average for Pre-requisite Course Work; ACT Composite Score, ACT Reading Subscore, and ACT Mathematics Subscore; Letter Grades in the Courses Anatomy & Physiology and Lab I, Algebra I, English I (101), Chemistry & Lab I, and Microbiology & Lab I; and Number of Institutions Attended (Universities, Colleges, Junior Colleges or Community Colleges). Descriptive analysis was performed, and the means of the ten independent variables were compared between students who attrited and those who were retained. The discriminant analysis created a matrix using the ten-variable model that correctly predicted attrition in the study's population in 77.6% of cases. Variables were then combined and recombined to produce the most efficient and parsimonious model for prediction. A six-variable model resulted, which weighted each variable according to predictive value: GPA for Prerequisite Coursework, ACT Composite, English I, Chemistry & Lab I, Microbiology & Lab I, and Number of Institutions Attended. The results of the study indicate that it is possible to predict attrition among students enrolled in baccalaureate nursing education programs and that additional investigation of the subject is warranted.
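
    A minimal sketch of discriminant analysis for attrition from admission variables, reporting classification accuracy; the synthetic variables and toy decision boundary are assumptions, not the cohort's data:

    ```python
    # Hedged sketch: linear discriminant analysis for attrition prediction.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(13)
    n = 195
    gpa = rng.uniform(2.0, 4.0, n)            # prerequisite GPA
    act = rng.uniform(15, 32, n)              # ACT composite
    n_inst = rng.integers(1, 5, n)            # institutions attended
    attrit = ((gpa < 2.8) & (act < 22)).astype(int)   # toy decision boundary

    X = np.column_stack([gpa, act, n_inst])
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, attrit, cv=5).mean()
    print(f"cross-validated classification accuracy = {acc:.2f}")
    ```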

  7. A statistical model of brittle fracture by transgranular cleavage

    NASA Astrophysics Data System (ADS)

    Lin, Tsann; Evans, A. G.; Ritchie, R. O.

    A model for brittle fracture by transgranular cleavage cracking is presented, based on the application of weakest-link statistics to the critical microstructural fracture mechanisms. The model permits prediction of the macroscopic fracture toughness, K_Ic, in single-phase microstructures containing a known distribution of particles, and defines the critical distance from the crack tip at which the initial cracking event is most probable. The model is developed for unstable fracture ahead of a sharp crack considering both linear elastic and nonlinear elastic ("elastic/plastic") crack tip stress fields. Predictions are evaluated by comparison with experimental results on the low temperature flow and fracture behavior of a low carbon mild steel with a simple ferrite/grain boundary carbide microstructure.

  8. A Critical Review for Developing Accurate and Dynamic Predictive Models Using Machine Learning Methods in Medicine and Health Care.

    PubMed

    Alanazi, Hamdan O; Abdullah, Abdul Hanan; Qureshi, Kashif Naseer

    2017-04-01

    Recently, artificial intelligence (AI) has been used widely in the medicine and health care sector. In machine learning, classification or prediction is a major field of AI. Today, the study of existing predictive models based on machine learning methods is extremely active. Doctors need accurate predictions of their patients' disease outcomes, and for accurate predictions, timing is another significant factor that influences treatment decisions. In this paper, existing predictive models in medicine and health care are critically reviewed. Furthermore, the most prominent machine learning methods are explained, and the confusion between statistical approaches and machine learning is clarified. A review of the related literature reveals that the predictions of existing predictive models differ even when the same dataset is used. Therefore, existing predictive models are essential, and current methods must be improved.

  9. A stochastic fractional dynamics model of space-time variability of rain

    NASA Astrophysics Data System (ADS)

    Kundu, Prasun K.; Travis, James E.

    2013-09-01

    Rain varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, which allows a concise description of the second-moment statistics of rain at any prescribed space-time averaging scale. The model is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain, but strongly constrains the predicted statistical behavior as a function of the averaging length and time scales. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida, and on the Kwajalein Atoll, Marshall Islands, in the tropical Pacific. We estimate the parameters by tuning them to fit the second-moment statistics of radar data at the smaller spatiotemporal scales. The model predictions are then found to fit the second-moment statistics of the gauge data reasonably well at these scales without any further adjustment.

  10. Leuconostoc mesenteroides growth in food products: prediction and sensitivity analysis by adaptive-network-based fuzzy inference systems.

    PubMed

    Wang, Hue-Yu; Wen, Ching-Feng; Chiu, Yu-Hsien; Lee, I-Nong; Kao, Hao-Yun; Lee, I-Chen; Ho, Wen-Hsien

    2013-01-01

    An adaptive-network-based fuzzy inference system (ANFIS) was compared with an artificial neural network (ANN) in terms of accuracy in predicting the combined effects of temperature (10.5 to 24.5°C), pH level (5.5 to 7.5), sodium chloride level (0.25% to 6.25%) and sodium nitrite level (0 to 200 ppm) on the growth rate of Leuconostoc mesenteroides under aerobic and anaerobic conditions. The ANFIS and ANN models were compared in terms of six statistical indices calculated by comparing their prediction results with actual data: mean absolute percentage error (MAPE), root mean square error (RMSE), standard error of prediction percentage (SEP), bias factor (Bf), accuracy factor (Af), and absolute fraction of variance (R^2). Graphical plots were also used for model comparison. The learning-based systems obtained encouraging prediction results. Sensitivity analyses of the four environmental factors showed that temperature and, to a lesser extent, NaCl had the most influence on accuracy in predicting the growth rate of Leuconostoc mesenteroides under aerobic and anaerobic conditions. The observed effectiveness of ANFIS for modeling microbial kinetic parameters confirms its potential use as a supplemental tool in predictive microbiology. Comparisons between growth rates predicted by ANFIS and actual experimental data also confirmed the high accuracy of the Gaussian membership function in ANFIS. Comparisons of the six statistical indices under both aerobic and anaerobic conditions also showed that the ANFIS model was better than all ANN models in predicting the four kinetic parameters. Therefore, the ANFIS model is a valuable tool for quickly predicting the growth rate of Leuconostoc mesenteroides under aerobic and anaerobic conditions.

  11. Leuconostoc Mesenteroides Growth in Food Products: Prediction and Sensitivity Analysis by Adaptive-Network-Based Fuzzy Inference Systems

    PubMed Central

    Wang, Hue-Yu; Wen, Ching-Feng; Chiu, Yu-Hsien; Lee, I-Nong; Kao, Hao-Yun; Lee, I-Chen; Ho, Wen-Hsien

    2013-01-01

    Background: An adaptive-network-based fuzzy inference system (ANFIS) was compared with an artificial neural network (ANN) in terms of accuracy in predicting the combined effects of temperature (10.5 to 24.5°C), pH level (5.5 to 7.5), sodium chloride level (0.25% to 6.25%) and sodium nitrite level (0 to 200 ppm) on the growth rate of Leuconostoc mesenteroides under aerobic and anaerobic conditions. Methods: The ANFIS and ANN models were compared in terms of six statistical indices calculated by comparing their prediction results with actual data: mean absolute percentage error (MAPE), root mean square error (RMSE), standard error of prediction percentage (SEP), bias factor (Bf), accuracy factor (Af), and absolute fraction of variance (R^2). Graphical plots were also used for model comparison. Conclusions: The learning-based systems obtained encouraging prediction results. Sensitivity analyses of the four environmental factors showed that temperature and, to a lesser extent, NaCl had the most influence on accuracy in predicting the growth rate of Leuconostoc mesenteroides under aerobic and anaerobic conditions. The observed effectiveness of ANFIS for modeling microbial kinetic parameters confirms its potential use as a supplemental tool in predictive microbiology. Comparisons between growth rates predicted by ANFIS and actual experimental data also confirmed the high accuracy of the Gaussian membership function in ANFIS. Comparisons of the six statistical indices under both aerobic and anaerobic conditions also showed that the ANFIS model was better than all ANN models in predicting the four kinetic parameters. Therefore, the ANFIS model is a valuable tool for quickly predicting the growth rate of Leuconostoc mesenteroides under aerobic and anaerobic conditions. PMID:23705023

  12. Prediction of Down-Gradient Impacts of DNAPL Source Depletion Using Tracer Techniques

    NASA Astrophysics Data System (ADS)

    Basu, N. B.; Fure, A. D.; Jawitz, J. W.

    2006-12-01

    Four simplified DNAPL source depletion models that have been discussed in the literature recently are evaluated for the prediction of long-term effects of source depletion under natural gradient flow. These models are simple in form (a power function equation is an example) but are shown here to serve as mathematical analogs to complex multiphase flow and transport simulators. One of the source depletion models, the equilibrium streamtube model, is shown to be relatively easily parameterized using non-reactive and reactive tracers. Non-reactive tracers are used to characterize the aquifer heterogeneity while reactive tracers are used to describe the mean DNAPL mass and its distribution. This information is then used in a Lagrangian framework to predict source remediation performance. In a Lagrangian approach the source zone is conceptualized as a collection of non-interacting streamtubes with hydrodynamic and DNAPL heterogeneity represented by the variation of the travel time and DNAPL saturation among the streamtubes. The travel time statistics are estimated from the non-reactive tracer data while the DNAPL distribution statistics are estimated from the reactive tracer data. The combined statistics are used to define an analytical solution for contaminant dissolution under natural gradient flow. The tracer prediction technique compared favorably with results from a multiphase flow and transport simulator UTCHEM in domains with different hydrodynamic heterogeneity (variance of the log conductivity field = 0.2, 1 and 3).

  13. Discriminative Random Field Models for Subsurface Contamination Uncertainty Quantification

    NASA Astrophysics Data System (ADS)

    Arshadi, M.; Abriola, L. M.; Miller, E. L.; De Paolis Kaluza, C.

    2017-12-01

    Application of flow and transport simulators for prediction of the release, entrapment, and persistence of dense non-aqueous phase liquids (DNAPLs) and associated contaminant plumes is a computationally intensive process that requires specification of a large number of material properties and hydrologic/chemical parameters. Given its computational burden, this direct simulation approach is particularly ill-suited for quantifying both the expected performance and uncertainty associated with candidate remediation strategies under real field conditions. Prediction uncertainties primarily arise from limited information about contaminant mass distributions, as well as the spatial distribution of subsurface hydrologic properties. Application of direct simulation to quantify uncertainty would, thus, typically require simulating multiphase flow and transport for a large number of permeability and release scenarios to collect statistics associated with remedial effectiveness, a computationally prohibitive process. The primary objective of this work is to develop and demonstrate a methodology that employs measured field data to produce equi-probable stochastic representations of a subsurface source zone that capture the spatial distribution and uncertainty associated with key features that control remediation performance (i.e., permeability and contamination mass). Here we employ probabilistic models known as discriminative random fields (DRFs) to synthesize stochastic realizations of initial mass distributions consistent with known, and typically limited, site characterization data. Using a limited number of full scale simulations as training data, a statistical model is developed for predicting the distribution of contaminant mass (e.g., DNAPL saturation and aqueous concentration) across a heterogeneous domain. Monte Carlo sampling methods are then employed, in conjunction with the trained statistical model, to generate realizations conditioned on measured borehole data. Performance of the statistical model is illustrated through comparisons of generated realizations with the 'true' numerical simulations. Finally, we demonstrate how these realizations can be used to determine statistically optimal locations for further interrogation of the subsurface.

  14. Acoustic Analogy and Alternative Theories for Jet Noise Prediction

    NASA Technical Reports Server (NTRS)

    Morris, Philip J.; Farassat, F.

    2002-01-01

    Several methods for the prediction of jet noise are described. All but one of the noise prediction schemes are based on Lighthill's or Lilley's acoustic analogy, whereas the other is the jet noise generation model recently proposed by Tam and Auriault. In all of the approaches, some assumptions must be made concerning the statistical properties of the turbulent sources. In each case the characteristic scales of the turbulence are obtained from a solution of the Reynolds-averaged Navier-Stokes equation using a k-epsilon turbulence model. It is shown that, for the same level of empiricism, Tam and Auriault's model yields better agreement with experimental noise measurements than the acoustic analogy. It is then shown that this result is not because of some fundamental flaw in the acoustic analogy approach, but instead is associated with the assumptions made in the approximation of the turbulent source statistics. If consistent assumptions are made, both the acoustic analogy and Tam and Auriault's model yield identical noise predictions. In conclusion, a proposal is presented for an acoustic analogy that provides a clearer identification of the equivalent source mechanisms, as is a discussion of noise prediction issues that remain to be resolved.

  15. The Acoustic Analogy and Alternative Theories for Jet Noise Prediction

    NASA Technical Reports Server (NTRS)

    Morris, Philip J.; Farassat, F.

    2002-01-01

    This paper describes several methods for the prediction of jet noise. All but one of the noise prediction schemes are based on Lighthill's or Lilley's acoustic analogy, while the other is the jet noise generation model recently proposed by Tam and Auriault. In all the approaches some assumptions must be made concerning the statistical properties of the turbulent sources. In each case the characteristic scales of the turbulence are obtained from a solution of the Reynolds-averaged Navier-Stokes equation using a k-epsilon turbulence model. It is shown that, for the same level of empiricism, Tam and Auriault's model yields better agreement with experimental noise measurements than the acoustic analogy. It is then shown that this result is not because of some fundamental flaw in the acoustic analogy approach, but is associated with the assumptions made in the approximation of the turbulent source statistics. If consistent assumptions are made, both the acoustic analogy and Tam and Auriault's model yield identical noise predictions. The paper concludes with a proposal for an acoustic analogy that provides a clearer identification of the equivalent source mechanisms and a discussion of noise prediction issues that remain to be resolved.

  16. Dynamic rain fade compensation techniques for the advanced communications technology satellite

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1992-01-01

    The dynamic and composite nature of propagation impairments that are incurred on earth-space communications links at frequencies in and above the 30/20 GHz Ka band necessitates the use of dynamic statistical identification and prediction processing of the fading signal in order to optimally estimate and predict the levels of each of the deleterious attenuation components. Such requirements are being met in NASA's Advanced Communications Technology Satellite (ACTS) project by the implementation of optimal processing schemes derived through the use of the ACTS Rain Attenuation Prediction Model and nonlinear Markov filtering theory. The ACTS Rain Attenuation Prediction Model discerns climatological variations on the order of 0.5 deg in latitude and longitude in the continental U.S. The time-dependent portion of the model gives precise availability predictions for the 'spot beam' links of ACTS. However, the structure of the dynamic portion of the model, which yields performance parameters such as fade duration probabilities, is isomorphic to the state-variable approach of stochastic control theory and is amenable to the design of statistical fade-processing schemes that can be made specific to the particular climatological location at which they are employed.

  17. Limitations of the Porter-Thomas distribution

    NASA Astrophysics Data System (ADS)

    Weidenmüller, Hans A.

    2017-12-01

    Data on the distribution of reduced partial neutron widths and on the distribution of total gamma decay widths disagree with the Porter-Thomas distribution (PTD) for reduced partial widths or with predictions of the statistical model. We recall why the disagreement is important: The PTD is a direct consequence of the orthogonal invariance of the Gaussian Orthogonal Ensemble (GOE) of random matrices. The disagreement is reviewed. Two possible causes for violation of orthogonal invariance of the GOE are discussed, and their consequences explored. The disagreement of the distribution of total gamma decay widths with theoretical predictions cannot be blamed on the statistical model.

  18. Confidence Intervals from Realizations of Simulated Nuclear Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Younes, W.; Ratkiewicz, A.; Ressler, J. J.

    2017-09-28

    Various statistical techniques are discussed that can be used to assign a level of confidence in the prediction of models that depend on input data with known uncertainties and correlations. The particular techniques reviewed in this paper are: 1) random realizations of the input data using Monte-Carlo methods, 2) the construction of confidence intervals to assess the reliability of model predictions, and 3) resampling techniques to impose statistical constraints on the input data based on additional information. These techniques are illustrated with a calculation of the keff value, based on the 235U(n,f) and 239Pu(n,f) cross sections.
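
    A minimal sketch of the first two techniques, assuming a toy scalar surrogate in place of a real transport calculation of k_eff; the function toy_keff, the nominal values, and the covariance below are hypothetical stand-ins, not the 235U/239Pu evaluations used in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical nominal inputs with known uncertainties and a 0.5
# correlation (stand-ins for the 235U/239Pu (n,f) cross sections).
mean = np.array([1.20, 1.80])
cov = np.array([[0.02**2, 0.5 * 0.02 * 0.03],
                [0.5 * 0.02 * 0.03, 0.03**2]])

def toy_keff(x):
    """Hypothetical surrogate for a k_eff calculation."""
    return 0.4 * x[..., 0] + 0.3 * x[..., 1]

# 1) Monte-Carlo realizations of the correlated input data
samples = rng.multivariate_normal(mean, cov, size=10_000)
keff = toy_keff(samples)

# 2) Percentile-based 95% confidence interval on the prediction
lo, hi = np.percentile(keff, [2.5, 97.5])
print(f"k_eff = {keff.mean():.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```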

  19. Modeling of the reactant conversion rate in a turbulent shear flow

    NASA Technical Reports Server (NTRS)

    Frankel, S. H.; Madnia, C. K.; Givi, P.

    1992-01-01

    Results are presented of direct numerical simulations (DNS) of spatially developing shear flows under the influence of infinitely fast chemical reactions of the type A + B yields Products. The simulation results are used to construct the compositional structure of the scalar field in a statistical manner. The results of this statistical analysis indicate that the use of a Beta density for the probability density function (PDF) of an appropriate Shvab-Zeldovich mixture fraction provides a very good estimate of the limiting bounds of the reactant conversion rate within the shear layer. This provides a strong justification for the implementation of this density in practical modeling of non-homogeneous turbulent reacting flows. However, the validity of the model cannot be generalized for predictions of higher order statistical quantities. A closed form analytical expression is presented for predicting the maximum rate of reactant conversion in non-homogeneous reacting turbulence.
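
    Since a Beta density is fully determined by its first two moments, the modeling step reduces to a method-of-moments fit to the mixture-fraction sample. A minimal sketch, with a synthetic sample standing in for the DNS scalar field:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stand-in sample of a Shvab-Zeldovich mixture fraction Z in [0, 1]
z = rng.beta(2.0, 5.0, size=5000)

# Method-of-moments fit of the Beta density from mean and variance
m, v = z.mean(), z.var()
c = m * (1.0 - m) / v - 1.0      # common factor
a, b = m * c, (1.0 - m) * c      # Beta shape parameters

grid = np.linspace(0.05, 0.95, 5)
print(f"fitted Beta shapes: a = {a:.2f}, b = {b:.2f}")
print("PDF on grid:", np.round(stats.beta.pdf(grid, a, b), 3))
```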

  1. Annual Research Briefs, 1987

    NASA Technical Reports Server (NTRS)

    Moin, Parviz; Reynolds, William C.

    1988-01-01

    Lagrangian techniques have found widespread application to the prediction and understanding of turbulent transport phenomena and have yielded satisfactory results for different cases of shear flow problems. However, it must be kept in mind that in most experiments what is really available are Eulerian statistics, and it is far from obvious how to extract from them the information relevant to the Lagrangian behavior of the flow; in consequence, Lagrangian models still include some hypotheses for which no adequate supporting evidence was until now available. Direct numerical simulation of turbulence offers a new way to obtain Lagrangian statistics and so verify the validity of the current predictive models and the accuracy of their results. After the pioneering work of Riley (Riley and Patterson, 1974) in the 1970s, some such results have just appeared in the literature (Lee et al.; Yeung and Pope). The present contribution follows in part similar lines, but focuses on two-particle statistics and comparison with existing models.

  2. Statistical modelling coupled with LC-MS analysis to predict human upper intestinal absorption of phytochemical mixtures.

    PubMed

    Selby-Pham, Sophie N B; Howell, Kate S; Dunshea, Frank R; Ludbey, Joel; Lutz, Adrian; Bennett, Louise

    2018-04-15

    A diet rich in phytochemicals confers benefits for health by reducing the risk of chronic diseases via regulation of oxidative stress and inflammation (OSI). For optimal protective bio-efficacy, the time required for phytochemicals and their metabolites to reach maximal plasma concentrations (Tmax) should be synchronised with the time of increased OSI. A statistical model has been reported to predict the Tmax of individual phytochemicals based on molecular mass and lipophilicity. We report the application of the model for predicting the absorption profile of an uncharacterised phytochemical mixture, herein referred to as the 'functional fingerprint'. First, chemical profiles of phytochemical extracts were acquired using liquid chromatography mass spectrometry (LC-MS), then the molecular features of the respective components were used to predict their plasma absorption maximum, based on molecular mass and lipophilicity. This method of 'functional fingerprinting' of plant extracts represents a novel tool for understanding and optimising the health efficacy of plant extracts. Copyright © 2017 Elsevier Ltd. All rights reserved.
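
    The prediction step amounts to applying a per-feature regression across an LC-MS feature table. A minimal sketch, assuming a linear form in molecular mass and lipophilicity; the coefficients and feature values are illustrative stand-ins, not the published model:

```python
# Hypothetical linear model: T_max = b0 + b1*mass + b2*logP (minutes).
B0, B1, B2 = 20.0, 0.05, 8.0   # illustrative coefficients

def predict_tmax(mass_da: float, log_p: float) -> float:
    """Predicted time to maximal plasma concentration, in minutes."""
    return B0 + B1 * mass_da + B2 * log_p

# LC-MS features of an uncharacterised extract: (molecular mass in Da, logP)
features = [(180.2, 1.5), (302.3, 2.1), (594.5, 0.3)]

# The sorted per-component predictions form the 'functional fingerprint'
profile = sorted(predict_tmax(m, p) for m, p in features)
print("predicted absorption profile (min):", [round(t, 1) for t in profile])
```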

  3. Development and validation of a risk calculator predicting exercise-induced ventricular arrhythmia in patients with cardiovascular disease.

    PubMed

    Hermes, Ilarraza-Lomelí; Marianna, García-Saldivia; Jessica, Rojano-Castillo; Carlos, Barrera-Ramírez; Rafael, Chávez-Domínguez; María Dolores, Rius-Suárez; Pedro, Iturralde

    2016-10-01

    Mortality due to cardiovascular disease is often associated with ventricular arrhythmias. Nowadays, patients with cardiovascular disease are more encouraged to take part in physical training programs. Nevertheless, high-intensity exercise is associated with a higher risk of sudden death, even in apparently healthy people. During exercise testing (ET), health care professionals provide patients, in a controlled scenario, with an intense physiological stimulus that could precipitate cardiac arrhythmia in high-risk individuals. There is still no clinical or statistical tool to predict this incidence. The aim of this study was to develop a statistical model to predict the incidence of exercise-induced potentially life-threatening ventricular arrhythmia (PLVA) during high-intensity exercise. 6415 patients underwent a symptom-limited ET with a Balke ramp protocol. A multivariate logistic regression model with PLVA as the primary outcome was fitted. The incidence of PLVA was 548 cases (8.5%). After a bivariate analysis, thirty-one clinical or ergometric variables were statistically associated with PLVA and were included in the regression model. In the multivariate model, 13 of these variables were found to be statistically significant. A regression model (G) with a χ2 of 283.987 and p < 0.001 was constructed. Significant variables included heart failure, antiarrhythmic drugs, myocardial lower-VD, age, and use of digoxin and nitrates, among others. This study allows clinicians to identify patients at risk of ventricular tachycardia or couplets during exercise, and to take preventive measures or provide appropriate supervision. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  4. Weighted Feature Significance: A Simple, Interpretable Model of Compound Toxicity Based on the Statistical Enrichment of Structural Features

    PubMed Central

    Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.

    2009-01-01

    In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high-throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
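
    The additive scheme can be sketched compactly: weight each structural feature by the significance of its enrichment among toxic compounds, then sum the weights of the features present in a new compound. The counts below and the use of a one-sided Fisher exact test are assumptions for illustration:

```python
import numpy as np
from scipy.stats import fisher_exact

# Hypothetical 2x2 counts per structural feature:
# (toxic with feature, toxic without, nontoxic with, nontoxic without)
features = {
    "nitroaromatic": (40, 160, 10, 490),
    "epoxide":       (25, 175, 15, 485),
    "phenol":        (30, 170, 60, 440),
}

# Weight = -log10(p) of the feature's enrichment in toxic compounds
weights = {}
for name, (a, b, c, d) in features.items():
    _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
    weights[name] = -np.log10(p)

def wfs_score(present):
    """Additive toxicity score over the features found in a compound."""
    return sum(weights[f] for f in present)

print({k: round(v, 2) for k, v in weights.items()})
print("score:", round(wfs_score(["nitroaromatic", "phenol"]), 2))
```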

  5. Development and external validation of a risk-prediction model to predict 5-year overall survival in advanced larynx cancer.

    PubMed

    Petersen, Japke F; Stuiver, Martijn M; Timmermans, Adriana J; Chen, Amy; Zhang, Hongzhen; O'Neill, James P; Deady, Sandra; Vander Poorten, Vincent; Meulemans, Jeroen; Wennerberg, Johan; Skroder, Carl; Day, Andrew T; Koch, Wayne; van den Brekel, Michiel W M

    2018-05-01

    TNM-classification inadequately estimates patient-specific overall survival (OS). We aimed to improve this by developing a risk-prediction model for patients with advanced larynx cancer. Cohort study. We developed a risk prediction model to estimate the 5-year OS rate based on a cohort of 3,442 patients with T3T4N0N+M0 larynx cancer. The model was internally validated using bootstrapping samples and externally validated on patient data from five external centers (n = 770). The main outcome was performance of the model as tested by discrimination, calibration, and the ability to distinguish risk groups based on tertiles from the derivation dataset. The model performance was compared to a model based on T and N classification only. We included age, gender, T and N classification, and subsite as prognostic variables in the standard model. After external validation, the standard model had a significantly better fit than a model based on T and N classification alone (C statistic, 0.59 vs. 0.55, P < .001). The model was able to distinguish well among three risk groups based on tertiles of the risk score. Adding treatment modality to the model did not decrease the predictive power. As a post hoc analysis, we tested the added value of comorbidity as scored by the American Society of Anesthesiologists score in a subsample, which increased the C statistic to 0.68. A risk prediction model for patients with advanced larynx cancer, consisting of readily available clinical variables, gives more accurate estimates of the 5-year survival rate than a model based on T and N classification alone. Level of evidence: 2c. Laryngoscope, 128:1140-1145, 2018. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.

  6. Comparison of RF spectrum prediction methods for dynamic spectrum access

    NASA Astrophysics Data System (ADS)

    Kovarskiy, Jacob A.; Martone, Anthony F.; Gallagher, Kyle A.; Sherbondy, Kelly D.; Narayanan, Ram M.

    2017-05-01

    Dynamic spectrum access (DSA) refers to the adaptive utilization of today's busy electromagnetic spectrum. Cognitive radio/radar technologies require DSA to intelligently transmit and receive information in changing environments. Predicting radio frequency (RF) activity reduces sensing time and energy consumption for identifying usable spectrum. Typical spectrum prediction methods involve modeling spectral statistics with Hidden Markov Models (HMM) or various neural network structures. HMMs describe the time-varying state probabilities of Markov processes as a dynamic Bayesian network. Neural networks model biological brain neuron connections to perform a wide range of complex and often non-linear computations. This work compares HMM, Multilayer Perceptron (MLP), and Recurrent Neural Network (RNN) algorithms and their ability to perform RF channel state prediction. Monte Carlo simulations on both measured and simulated spectrum data evaluate the performance of these algorithms. Modeling spectrum occupancy as an alternating renewal process allows simulated data to be generated from Poisson random variables, while energy detection determines the occupancy state of measured RF spectrum data for testing. The results suggest that neural networks achieve better prediction accuracy and prove more adaptable to changing spectral statistics than HMMs, given sufficient training data.
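
    For a single fully observed binary channel, the HMM baseline reduces to estimating a two-state Markov transition matrix and predicting the most likely next state. A minimal sketch with simulated occupancy data (the transition probabilities are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate binary channel occupancy (0 = idle, 1 = busy) from a
# two-state Markov chain approximating an alternating renewal process.
true_P = np.array([[0.9, 0.1],
                   [0.3, 0.7]])
states = [0]
for _ in range(5000):
    states.append(rng.choice(2, p=true_P[states[-1]]))
states = np.asarray(states)

# Estimate the transition matrix from observed state pairs
counts = np.zeros((2, 2))
for s, s_next in zip(states[:-1], states[1:]):
    counts[s, s_next] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)

# One-step channel-state prediction from the current state
print("estimated transition matrix:\n", P_hat.round(3))
print("predicted next state:", int(P_hat[states[-1]].argmax()))
```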

  7. Predicting carcinogenicity of diverse chemicals using probabilistic neural network modeling approaches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singh, Kunwar P.; Gupta, Shikha

    Robust global models capable of discriminating positive and non-positive carcinogens and predicting the carcinogenic potency of chemicals in rodents were developed. A dataset of 834 structurally diverse chemicals extracted from the Carcinogenic Potency Database (CPDB) was used, which contained 466 positive and 368 non-positive carcinogens. Twelve non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals and nonlinearity in the data were evaluated using the Tanimoto similarity index and Brock-Dechert-Scheinkman statistics. Probabilistic neural network (PNN) and generalized regression neural network (GRNN) models were constructed for classification and function optimization problems using the carcinogenicity end point in rat. Validation of the models was performed using internal and external procedures employing a wide series of statistical checks. The PNN constructed using five descriptors rendered a classification accuracy of 92.09% in the complete rat data. The PNN model rendered classification accuracies of 91.77%, 80.70% and 92.08% in mouse, hamster and pesticide data, respectively. The GRNN constructed with nine descriptors yielded a correlation coefficient of 0.896 between the measured and predicted carcinogenic potency, with a mean squared error (MSE) of 0.44 in the complete rat data. The rat carcinogenicity model (GRNN) applied to the mouse and hamster data yielded correlation coefficients and MSEs of 0.758, 0.71 and 0.760, 0.46, respectively. The results suggest wide applicability of the inter-species models in predicting the carcinogenic potency of chemicals. Both the PNN and GRNN (inter-species) models constructed here can be useful tools in predicting the carcinogenicity of new chemicals for regulatory purposes. Graphical abstract: Figure (a) shows classification accuracies (positive and non-positive carcinogens) in rat, mouse, hamster, and pesticide data yielded by the optimal PNN model. Figure (b) shows generalization and predictive abilities of the interspecies GRNN model to predict the carcinogenic potency of diverse chemicals. Highlights: Global robust models constructed for carcinogenicity prediction of diverse chemicals. Tanimoto/BDS test revealed structural diversity of chemicals and nonlinearity in data. PNN/GRNN successfully predicted carcinogenicity/carcinogenic potency of chemicals. Developed interspecies PNN/GRNN models for carcinogenicity prediction. Proposed models can be used as tools to predict the carcinogenicity of new chemicals.

  8. A Statistical Approach For Modeling Tropical Cyclones. Synthetic Hurricanes Generator Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pasqualini, Donatella

    This manuscript briefly describes a statistical approach to generate synthetic tropical cyclone tracks to be used in risk evaluations. The Synthetic Hurricane Generator (SynHurG) model allows modeling hurricane risk in the United States, supporting decision makers and implementations of adaptation strategies to extreme weather. In the literature there are mainly two approaches to model hurricane hazard for risk prediction: deterministic-statistical approaches, where the storm's key physical parameters are calculated using complex physical climate models and the tracks are usually determined statistically from historical data; and statistical approaches, where both variables and tracks are estimated stochastically using historical records. SynHurG falls in the second category, adopting a purely stochastic approach.

  9. Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable.

    PubMed

    Austin, Peter C; Steyerberg, Ewout W

    2012-06-20

    When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal, or uniform distribution in the combined sample of those with and without the condition. Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, or uniform in the entire sample of those with and without the condition. The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
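
    The equal-variance case is easy to check numerically: under binormality the logistic slope is beta = (mu1 - mu0)/sigma^2, so the stated form c = Phi(sigma * beta / sqrt(2)) equals Phi((mu1 - mu0)/(sigma * sqrt(2))). A minimal sketch comparing it with the empirical c-statistic (Mann-Whitney form), using arbitrary parameter values:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

sigma, delta = 1.5, 0.8          # common SD and difference in means
beta = delta / sigma**2          # log-odds ratio per unit of x

# Explanatory variable in those without (x0) and with (x1) the condition
x0 = rng.normal(0.0, sigma, 3000)
x1 = rng.normal(delta, sigma, 3000)

# Empirical c-statistic: P(x1 > x0), the Mann-Whitney statistic
emp_c = (x1[:, None] > x0[None, :]).mean()

# Closed form: standard normal CDF of (SD * log-odds ratio) / sqrt(2)
closed_c = norm.cdf(sigma * beta / np.sqrt(2.0))
print(f"empirical c = {emp_c:.3f}, closed form = {closed_c:.3f}")
```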

  10. GAPIT: genome association and prediction integrated tool.

    PubMed

    Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu

    2012-09-15

    Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy, and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10,000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. Availability: http://www.maizegenetics.net/GAPIT. Contact: zhiwu.zhang@cornell.edu. Supplementary data are available at Bioinformatics online.

  11. Multilevel Model Prediction

    ERIC Educational Resources Information Center

    Frees, Edward W.; Kim, Jee-Seon

    2006-01-01

    Multilevel models are proven tools in social research for modeling complex, hierarchical systems. In multilevel modeling, statistical inference is based largely on quantification of random variables. This paper distinguishes among three types of random variables in multilevel modeling--model disturbances, random coefficients, and future response…

  12. An adaptive two-stage analog/regression model for probabilistic prediction of small-scale precipitation in France

    NASA Astrophysics Data System (ADS)

    Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine

    2018-01-01

    Statistical downscaling models (SDMs) are often used to produce local weather scenarios from large-scale atmospheric information. SDMs include transfer functions which are based on a statistical link identified from observations between local weather and a set of large-scale predictors. As physical processes driving surface weather vary in time, the most relevant predictors and the regression link are likely to vary in time too. This is well known for precipitation, for instance, and the link is thus often estimated after some seasonal stratification of the data. In this study, we present a two-stage analog/regression model where the regression link is estimated from atmospheric analogs of the current prediction day. Atmospheric analogs are identified from fields of geopotential heights at 1000 and 500 hPa. For the regression stage, two generalized linear models are further used to model the probability of precipitation occurrence and the distribution of non-zero precipitation amounts, respectively. The two-stage model is evaluated for the probabilistic prediction of small-scale precipitation over France. It noticeably improves the skill of the prediction for both precipitation occurrence and amount. As the analog days vary from one prediction day to another, the atmospheric predictors selected in the regression stage and the values of the corresponding regression coefficients can vary from one prediction day to another. The model thus allows for day-to-day adaptive and tailored downscaling. It can also reveal specific predictors for peculiar and non-frequent weather configurations.
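
    A minimal sketch of the two-stage idea for the occurrence part, assuming Euclidean distance on flattened geopotential-height fields to select analogs and a logistic GLM refit for each prediction day; the data are synthetic stand-ins for the paper's setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Toy archive: flattened large-scale predictor fields and local
# precipitation occurrence for 2000 past days (synthetic).
fields = rng.standard_normal((2000, 50))
occurrence = (fields[:, 0] + 0.5 * rng.standard_normal(2000)) > 0.5

def predict_occurrence(today_field, k=150):
    """Stage 1: pick the k closest atmospheric analogs of the target day.
    Stage 2: fit a logistic GLM on those analogs only, then predict."""
    dist = np.linalg.norm(fields - today_field, axis=1)
    analogs = np.argsort(dist)[:k]
    glm = LogisticRegression().fit(fields[analogs], occurrence[analogs])
    return glm.predict_proba(today_field[None, :])[0, 1]

print("P(precipitation) =", round(predict_occurrence(rng.standard_normal(50)), 3))
```

    Because the GLM is refit on each day's analogs, the selected predictors and coefficients adapt from one prediction day to the next, which is the point of the two-stage design.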

  13. Response of Douglas-fir advance regeneration to overstory removal

    Treesearch

    J. Chris Maranto; Dennis E. Ferguson; David L. Adams

    2008-01-01

    A statistical model is presented that predicts periodic height growth for released Pseudotsuga menziesii var. glauca [Beissn.] Franco advance regeneration in central Idaho. Individual tree and site variables were used to construct a model that predicts 5-year height growth for years 6 through 10 after release. Habitat type and height growth prior to...

  14. Predicting fire spread in Arizona's oak chaparral

    Treesearch

    A. W. Lindenmuth; James R. Davis

    1973-01-01

    Five existing fire models, both experimental and theoretical, did not adequately predict rate-of-spread (ROS) when tested on single- and multiclump fires in oak chaparral in Arizona. A statistical model developed using essentially the same input variables but weighted differently accounted for 81 percent of the variation in ROS. A chemical coefficient that accounts for...

  15. Integrated Wind Power Planning Tool

    NASA Astrophysics Data System (ADS)

    Rosgaard, M. H.; Giebel, G.; Nielsen, T. S.; Hahmann, A.; Sørensen, P.; Madsen, H.

    2012-04-01

    This poster presents the current state of the public service obligation (PSO) funded project PSO 10464, with the working title "Integrated Wind Power Planning Tool". The project commenced October 1, 2011, and the goal is to integrate a numerical weather prediction (NWP) model with purely statistical tools in order to assess wind power fluctuations, with focus on long term power system planning for future wind farms as well as short term forecasting for existing wind farms. Currently, wind power fluctuation models are either purely statistical or integrated with NWP models of limited resolution. With regard to the latter, one such simulation tool has been developed at the Wind Energy Division, Risø DTU, intended for long term power system planning. As part of the PSO project the inferior NWP model used at present will be replaced by the state-of-the-art Weather Research & Forecasting (WRF) model. Furthermore, the integrated simulation tool will be improved so that it can simultaneously handle 10-50 times more turbines than the present ~300, and additional atmospheric parameters will be included in the model. The WRF data will also be input for a statistical short term prediction model to be developed in collaboration with ENFOR A/S, a Danish company that specialises in forecasting and optimisation for the energy sector. This integrated prediction model will allow for the description of the expected variability in wind power production in the coming hours to days, accounting for its spatio-temporal dependencies and depending on the prevailing weather conditions defined by the WRF output. The output from the integrated prediction tool constitutes scenario forecasts for the coming period, which can then be fed into any type of system model or decision making problem to be solved. The high resolution of the WRF results loaded into the integrated prediction model will ensure that a high-accuracy data basis is available for use in the decision making process of the Danish transmission system operator, and the need for high accuracy predictions will only increase over the next decade as Denmark approaches the goal of 50% wind power based electricity in 2020, from the current 20%.

  16. A person based formula for allocating commissioning funds to general practices in England: development of a statistical model.

    PubMed

    Dixon, Jennifer; Smith, Peter; Gravelle, Hugh; Martin, Steve; Bardsley, Martin; Rice, Nigel; Georghiou, Theo; Dusheiko, Mark; Billings, John; Lorenzo, Michael De; Sanderson, Colin

    2011-11-22

    To develop a formula for allocating resources for commissioning hospital care to all general practices in England based on the health needs of the people registered in each practice. Multivariate prospective statistical models were developed in which routinely collected electronic information from 2005-6 and 2006-7 on individuals and the areas in which they lived was used to predict their costs of hospital care in the next year, 2007-8. Data on individuals included all diagnoses recorded at any inpatient admission. Models were developed on a random sample of 5 million people and validated on a second random sample of 5 million people and a third sample of 5 million people drawn from a random sample of practices. All general practices in England as of 1 April 2007. All NHS inpatient admissions and outpatient attendances for individuals registered with a general practice on that date. All individuals registered with a general practice in England at 1 April 2007. The power of the statistical models to predict the costs of the individual patient or of each practice's registered population for 2007-8 was tested with a range of metrics (R2 reported here). Comparisons of predicted costs in 2007-8 with actual costs incurred in the same year were calculated by individual and by practice. Models including person-level information (age, sex, and recorded ICD-10 diagnostic codes) and a range of area-level information (such as socioeconomic deprivation and supply of health facilities) were most predictive of costs. After accounting for person-level variables, area-level variables added little explanatory power. The best models for resource allocation could predict upwards of 77% of the variation in costs at practice level, and about 12% at the person level. With these models, the predicted costs of about a third of practices would exceed or undershoot the actual costs by 10% or more. Smaller practices were more likely to be in these groups. A model was developed that performed well by international standards and could be used for allocations to practices for commissioning. The best formulas, however, could predict only about 12% of the variation in next year's costs of most inpatient and outpatient NHS care for each individual. Person-based diagnostic data significantly added to the predictive power of the models.

  17. HiRadProp: High-Frequency Modeling and Prediction of Tropospheric Radiopropagation Parameters from Ground-Based-Multi-Channel Radiometric Measurements between Ka and W Band

    DTIC Science & Technology

    2016-05-11

    Development of new physically based prediction models for all-weather path attenuation estimation at Ka, V and W band from multi-channel microwave radiometric data. Previous measurement campaigns have characterized the medium behavior at these frequency bands from both a physical and a statistical point of view (e.g., [5]-[7]). However, these campaigns are...

  18. Statistical characterization of the fatigue behavior of composite lamina

    NASA Technical Reports Server (NTRS)

    Yang, J. N.; Jones, D. L.

    1979-01-01

    A theoretical model was developed to predict statistically the effects of constant and variable amplitude fatigue loadings on the residual strength and fatigue life of composite lamina. The parameters in the model were established from the results of a series of static tensile tests and a fatigue scan, and a number of verification tests were performed. Abstracts for two other papers on the effect of load sequence on the statistical fatigue of composites are also presented.

  19. A Frequency Domain Approach to Pretest Analysis Model Correlation and Model Updating for the Mid-Frequency Range

    DTIC Science & Technology

    2009-02-01

    The band between the frequency range of modal analysis and the high-frequency region of statistical energy analysis is referred to as the mid-frequency range. The corresponding predictions are averaged; the averaging process is consistent with the averaging done in statistical energy analysis for stochastic systems. The FEM will always...

  20. Application of linear regression analysis in accuracy assessment of rolling force calculations

    NASA Astrophysics Data System (ADS)

    Poliak, E. I.; Shim, M. K.; Kim, G. S.; Choo, W. Y.

    1998-10-01

    Efficient operation of the computational models employed in process control systems requires periodic assessment of the accuracy of their predictions. Linear regression is proposed as a tool that allows systematic and random prediction errors to be separated from those related to measurement. A quantitative characteristic of the model's predictive ability is introduced in addition to standard statistical tests for model adequacy. Rolling force calculations are considered as an example application. However, the outlined approach can be used to assess the performance of any computational model.
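
    A minimal sketch of such a check on synthetic rolling-force data: regressing measured force on the model's predictions separates a systematic offset (intercept) and gain error (slope) from random scatter; the data and units below are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Hypothetical mill data: model predictions vs. measured rolling force
predicted = rng.uniform(8.0, 20.0, 200)                         # MN
measured = 1.03 * predicted - 0.4 + rng.normal(0.0, 0.5, 200)   # MN

# OLS of measured on predicted: an adequate model should give an
# intercept near 0 and a slope near 1.
fit = sm.OLS(measured, sm.add_constant(predicted)).fit()
intercept, slope = fit.params
print(fit.summary().tables[1])
print(f"systematic offset {intercept:+.2f} MN, gain error {slope - 1.0:+.1%}")
```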

  1. A cross-national analysis of how economic inequality predicts biodiversity loss.

    PubMed

    Holland, Tim G; Peterson, Garry D; Gonzalez, Andrew

    2009-10-01

    We used socioeconomic models that included economic inequality to predict biodiversity loss, measured as the proportion of threatened plant and vertebrate species, across 50 countries. Our main goal was to evaluate whether economic inequality, measured as the Gini index of income distribution, improved the explanatory power of our statistical models. We compared four models that included the following: only population density, economic footprint (i.e., the size of the economy relative to the country area), economic footprint and income inequality (Gini index), and an index of environmental governance. We also tested the environmental Kuznets curve hypothesis, but it was not supported by the data. Statistical comparisons of the models revealed that the model including both economic footprint and inequality was the best predictor of threatened species. It significantly outperformed population density alone and the environmental governance model according to the Akaike information criterion. Inequality was a significant predictor of biodiversity loss and significantly improved the fit of our models. These results confirm that socioeconomic inequality is an important factor to consider when predicting rates of anthropogenic biodiversity loss.

  2. Fatigue Life Prediction of Fiber-Reinforced Ceramic-Matrix Composites with Different Fiber Preforms at Room and Elevated Temperatures

    PubMed Central

    Li, Longbiao

    2016-01-01

    In this paper, the fatigue life of fiber-reinforced ceramic-matrix composites (CMCs) with different fiber preforms, i.e., unidirectional, cross-ply, 2D (two-dimensional), 2.5D and 3D CMCs, at room and elevated temperatures in air and oxidative environments has been predicted using the micromechanics approach. An effective coefficient of the fiber volume fraction along the loading direction (ECFL) was introduced to describe the fiber architecture of preforms. The statistical matrix multicracking model and fracture mechanics interface debonding criterion were used to determine the matrix crack spacing and interface debonded length. Under cyclic fatigue loading, the fiber broken fraction was determined by combining the interface wear model and fiber statistical failure model at room temperature, and the interface/fiber oxidation model, interface wear model and fiber statistical failure model at elevated temperatures, based on the assumption that the fiber strength follows a two-parameter Weibull distribution and the load carried by broken and intact fibers satisfies the Global Load Sharing (GLS) criterion. When the broken fiber fraction approaches the critical value, the composite fails by fatigue fracture. PMID:28773332
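
    The fiber-failure step under GLS can be written as a small fixed-point problem: intact fibers carry the load shed by broken ones, and the broken fraction follows the two-parameter Weibull law. A minimal sketch with hypothetical Weibull parameters:

```python
import numpy as np

def broken_fraction(stress, m=5.0, sigma_c=2.0, tol=1e-10):
    """Solve q = 1 - exp(-(T/sigma_c)**m) with T = stress/(1 - q) (GLS).

    stress: nominal fiber stress (GPa); m, sigma_c: Weibull modulus
    and characteristic fiber strength (hypothetical values).
    """
    q = 0.0
    for _ in range(200):
        T = stress / (1.0 - q)                  # load on intact fibers
        q_new = 1.0 - np.exp(-((T / sigma_c) ** m))
        if q_new > 0.999:                       # runaway failure: fracture
            return 1.0
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    return q

for s in (0.6, 0.9, 1.2):
    print(f"fiber stress {s} GPa -> broken fraction {broken_fraction(s):.3f}")
```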

  3. Comparing estimates of climate change impacts from process-based and statistical crop models

    NASA Astrophysics Data System (ADS)

    Lobell, David B.; Asseng, Senthold

    2017-01-01

    The potential impacts of climate change on crop productivity are of widespread interest to those concerned with addressing climate change and improving global food security. Two common approaches to assess these impacts are process-based simulation models, which attempt to represent key dynamic processes affecting crop yields, and statistical models, which estimate functional relationships between historical observations of weather and yields. Examples of both approaches are increasingly found in the scientific literature, although often published in different disciplinary journals. Here we compare published sensitivities to changes in temperature, precipitation, carbon dioxide (CO2), and ozone from each approach for the subset of crops, locations, and climate scenarios for which both have been applied. Despite a common perception that statistical models are more pessimistic, we find no systematic differences between the predicted sensitivities to warming from process-based and statistical models up to +2 °C, with limited evidence at higher levels of warming. For precipitation, there are many reasons why estimates could be expected to differ, but few estimates exist to develop robust comparisons, and precipitation changes are rarely the dominant factor for predicting impacts given the prominent role of temperature, CO2, and ozone changes. A common difference between process-based and statistical studies is that the former tend to include the effects of CO2 increases that accompany warming, whereas statistical models typically do not. Major needs moving forward include incorporating CO2 effects into statistical studies, improving both approaches’ treatment of ozone, and increasing the use of both methods within the same study. At the same time, those who fund or use crop model projections should understand that in the short-term, both approaches when done well are likely to provide similar estimates of warming impacts, with statistical models generally requiring fewer resources to produce robust estimates, especially when applied to crops beyond the major grains.

  4. The application of feature selection to the development of Gaussian process models for percutaneous absorption.

    PubMed

    Lam, Lun Tak; Sun, Yi; Davey, Neil; Adams, Rod; Prapopoulou, Maria; Brown, Marc B; Moss, Gary P

    2010-06-01

    The aim was to employ Gaussian processes to assess mathematically the nature of a skin permeability dataset and to employ these methods, particularly feature selection, to determine the key physicochemical descriptors which exert the most significant influence on percutaneous absorption, and to compare such models with established existing models. Gaussian processes, including automatic relevance determination (GPRARD) methods, were employed to develop models of percutaneous absorption that identified key physicochemical descriptors of percutaneous absorption. Using MATLAB software, the statistical performance of these models was compared with single linear networks (SLN) and quantitative structure-permeability relationships (QSPRs). Feature selection methods were used to examine in more detail the physicochemical parameters used in this study. A range of statistical measures to determine model quality was used. The inherently nonlinear nature of the skin data set was confirmed. The Gaussian process regression (GPR) methods yielded predictive models that offered statistically significant improvements over SLN and QSPR models with regard to predictivity (where the rank order was: GPR > SLN > QSPR). Feature selection analysis determined that the best GPR models were those that contained log P, melting point and the number of hydrogen bond donor groups as significant descriptors. Further statistical analysis also found that great synergy existed between certain parameters. It suggested that a number of the descriptors employed were effectively interchangeable, thus questioning the use of models where discrete variables are output, usually in the form of an equation. The use of a nonlinear GPR method produced models with significantly improved predictivity, compared with SLN or QSPR models. Feature selection methods were able to provide important mechanistic information. However, it was also shown that significant synergy existed between certain parameters, and as such it was possible to interchange certain descriptors (i.e. molecular weight and melting point) without incurring a loss of model quality. Such synergy suggested that a model constructed from discrete terms in an equation may not be the most appropriate way of representing mechanistic understandings of skin absorption.
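
    The ARD idea can be sketched with scikit-learn standing in for the original MATLAB setup: an anisotropic RBF kernel learns one length scale per descriptor, and descriptors whose learned length scales grow large are effectively irrelevant. The data and descriptor names below are synthetic stand-ins:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)

# Toy permeability data: log kp driven mainly by logP and melting point;
# hypothetical column order: [logP, MW, melting point, H-bond donors]
X = rng.standard_normal((120, 4))
y = 0.9 * X[:, 0] - 0.6 * X[:, 2] + 0.1 * rng.standard_normal(120)

# Anisotropic RBF = one length scale per descriptor (the ARD part)
kernel = RBF(length_scale=np.ones(4)) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

names = ["logP", "MW", "mp", "HBD"]
scales = gpr.kernel_.k1.length_scale
for n, s in sorted(zip(names, scales), key=lambda t: t[1]):
    print(f"{n:>4}: length scale {s:8.2f}")   # smallest = most relevant
```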

  5. A Bayesian approach for parameter estimation and prediction using a computationally intensive model

    DOE PAGES

    Higdon, Dave; McDonnell, Jordan D.; Schunck, Nicolas; ...

    2015-02-05

    Bayesian methods have been successful in quantifying uncertainty in physics-based problems in parameter estimation and prediction. In these cases, physical measurements y are modeled as the best fit of a physics-based model η(θ), where θ denotes the uncertain, best input setting. Hence the statistical model is of the form y = η(θ) + ε, where ε accounts for measurement, and possibly other, error sources. When nonlinearity is present in η(·), the resulting posterior distribution for the unknown parameters in the Bayesian formulation is typically complex and nonstandard, requiring computationally demanding approaches such as Markov chain Monte Carlo (MCMC) to produce multivariate draws from the posterior. Although generally applicable, MCMC requires thousands (or even millions) of evaluations of the physics model η(·). This requirement is problematic if the model takes hours or days to evaluate. To overcome this computational bottleneck, we present an approach adapted from Bayesian model calibration. This approach combines output from an ensemble of computational model runs with physical measurements, within a statistical formulation, to carry out inference. A key component of this approach is a statistical response surface, or emulator, estimated from the ensemble of model runs. We demonstrate this approach with a case study in estimating parameters for a density functional theory model, using experimental mass/binding energy measurements from a collection of atomic nuclei. Lastly, we also demonstrate how this approach produces uncertainties in predictions for recent mass measurements obtained at Argonne National Laboratory.

  6. Humidity-corrected Arrhenius equation: The reference condition approach.

    PubMed

    Naveršnik, Klemen; Jurečič, Rok

    2016-03-16

    Accelerated and stress stability data are often used to predict the shelf life of pharmaceuticals. Temperature, combined with humidity, accelerates chemical decomposition, and the Arrhenius equation is used to extrapolate accelerated stability results to long-term stability. Statistical estimation of the humidity-corrected Arrhenius equation is not straightforward due to its non-linearity. A two-stage nonlinear fitting approach is used in practice, followed by a prediction stage. We developed a single-stage statistical procedure, called the reference condition approach, which has better statistical properties (less collinearity, direct estimation of uncertainty, narrower prediction interval) and is significantly easier to use, compared with existing approaches. Our statistical model was populated with data from a 35-day stress stability study on a laboratory batch of vitamin tablets and required a mere 30 laboratory assay determinations. The stability prediction agreed well with the actual 24-month long-term stability of the product. The approach has high potential to assist product formulation, specification setting, and stability statements. Copyright © 2016 Elsevier B.V. All rights reserved.
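
    The paper's exact functional form is not reproduced here, so the sketch below assumes a common humidity-corrected Arrhenius law, ln k = ln k_ref - (Ea/R)(1/T - 1/T_ref) + B(RH - RH_ref), centered at a reference (long-term) condition so that the rate at that condition, with its standard error, is estimated directly in a single stage. All data values are hypothetical:

```python
import numpy as np
from scipy.optimize import curve_fit

R = 8.314                       # J/(mol K)
T_REF, RH_REF = 298.15, 60.0    # reference (long-term) storage condition

def log_rate(X, ln_k_ref, Ea, B):
    """Humidity-corrected Arrhenius, centered at the reference condition."""
    T, RH = X
    return ln_k_ref - (Ea / R) * (1.0 / T - 1.0 / T_REF) + B * (RH - RH_REF)

# Hypothetical stress-study degradation rates at (T in K, % RH)
T = np.array([313.15, 313.15, 323.15, 323.15, 333.15, 333.15])
RH = np.array([20.0, 75.0, 20.0, 75.0, 20.0, 75.0])
k = np.array([0.011, 0.035, 0.030, 0.090, 0.075, 0.240])    # %/day

popt, pcov = curve_fit(log_rate, (T, RH), np.log(k), p0=(-6.0, 8e4, 0.02))
ln_k_ref, Ea, B = popt
se = np.sqrt(np.diag(pcov))
print(f"long-term rate: {np.exp(ln_k_ref):.4f} %/day "
      f"(ln k_ref = {ln_k_ref:.2f} +/- {se[0]:.2f})")
```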

  7. Hyperparameterization of soil moisture statistical models for North America with Ensemble Learning Models (Elm)

    NASA Astrophysics Data System (ADS)

    Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.

    2017-12-01

    Hyperparameterization of statistical models, i.e. automated model scoring and selection via evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There, Elm uses the NSGA-2 multiobjective optimization algorithm to optimize statistical preprocessing of forcing data to improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used to automate the selection of soil moisture forecast statistical models for North America.

  8. Predicting stillbirth in a low resource setting.

    PubMed

    Kayode, Gbenga A; Grobbee, Diederick E; Amoakoh-Coleman, Mary; Adeleke, Ibrahim Taiwo; Ansah, Evelyn; de Groot, Joris A H; Klipstein-Grobusch, Kerstin

    2016-09-20

    Stillbirth is a major contributor to perinatal mortality and it is particularly common in low- and middle-income countries, where annually about three million stillbirths occur in the third trimester. This study aims to develop a prediction model for early detection of pregnancies at high risk of stillbirth. This retrospective cohort study examined 6,573 pregnant women who delivered at Federal Medical Centre Bida, a tertiary healthcare facility in Nigeria, from January 2010 to December 2013. Descriptive statistics were performed and missing data imputed. Multivariable logistic regression was applied to examine the associations between selected candidate predictors and stillbirth. Discrimination and calibration were used to assess the model's performance. The prediction model was validated internally and over-optimism was corrected. We developed a prediction model for stillbirth that comprised maternal comorbidity, place of residence, maternal occupation, parity, bleeding in pregnancy, and fetal presentation. As a secondary analysis, we extended the model by including fetal growth rate as a predictor, to examine how beneficial ultrasound parameters would be for the predictive performance of the model. After internal validation, the calibration and discriminative performance of both the basic and extended models were excellent (C-statistic basic model = 0.80 (95% CI 0.78-0.83); extended model = 0.82 (95% CI 0.80-0.83)). We developed a simple but informative prediction model for early detection of pregnancies with a high risk of stillbirth for early intervention in a low resource setting. Future research should focus on external validation of the performance of this promising model.

  9. Inflationary tensor fossils in large-scale structure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dimastrogiovanni, Emanuela; Fasiello, Matteo; Jeong, Donghui

    Inflation models make specific predictions for a tensor-scalar-scalar three-point correlation, or bispectrum, between one gravitational-wave (tensor) mode and two density-perturbation (scalar) modes. This tensor-scalar-scalar correlation leads to a local power quadrupole, an apparent departure from statistical isotropy in our Universe, as well as characteristic four-point correlations in the current mass distribution in the Universe. So far, the predictions for these observables have been worked out only for single-clock models in which certain consistency conditions between the tensor-scalar-scalar correlation and tensor and scalar power spectra are satisfied. Here we review the requirements on inflation models for these consistency conditions to be satisfied. We then consider several examples of inflation models, such as non-attractor and solid-inflation models, in which these conditions are put to the test. In solid inflation the simplest consistency conditions are already violated, whilst in the non-attractor model we find that, contrary to the standard scenario, the tensor-scalar-scalar correlator probes directly relevant model-dependent information. We work out the predictions for observables in these models. For non-attractor inflation we find an apparent local quadrupolar departure from statistical isotropy in large-scale structure but that this power quadrupole decreases very rapidly at smaller scales. The consistency of the CMB quadrupole with statistical isotropy then constrains the distance scale that corresponds to the transition from the non-attractor to attractor phase of inflation to be larger than the currently observable horizon. Solid inflation predicts clustering fossils signatures in the current galaxy distribution that may be large enough to be detectable with forthcoming, and possibly even current, galaxy surveys.

  10. Using open source computational tools for predicting human metabolic stability and additional absorption, distribution, metabolism, excretion, and toxicity properties.

    PubMed

    Gupta, Rishi R; Gifford, Eric M; Liston, Ted; Waller, Chris L; Hohman, Moses; Bunin, Barry A; Ekins, Sean

    2010-11-01

    Ligand-based computational models could be more readily shared between researchers and organizations if they were generated with open source molecular descriptors [e.g., the Chemistry Development Kit (CDK)] and modeling algorithms, because this would negate the requirement for proprietary commercial software. We initially evaluated open source descriptors and model building algorithms using a training set of approximately 50,000 molecules and a test set of approximately 25,000 molecules with human liver microsomal metabolic stability data. A C5.0 decision tree model demonstrated that CDK descriptors together with a set of Smiles Arbitrary Target Specification (SMARTS) keys had good statistics [κ = 0.43, sensitivity = 0.57, specificity = 0.91, and positive predicted value (PPV) = 0.64], equivalent to those of models built with commercial Molecular Operating Environment 2D (MOE2D) descriptors and the same set of SMARTS keys (κ = 0.43, sensitivity = 0.58, specificity = 0.91, and PPV = 0.63). Extending the dataset to ∼193,000 molecules and generating a continuous model using Cubist with a combination of CDK and SMARTS keys or MOE2D and SMARTS keys confirmed this observation. When the continuous predictions and actual values were binned to get a categorical score, we observed a similar κ statistic (0.42). The same combination of descriptor set and modeling method was applied to passive permeability and P-glycoprotein efflux data with similar model testing statistics. In summary, open source tools demonstrated predictive results comparable to those of commercial software with attendant cost savings. We discuss the advantages and disadvantages of open source descriptors and the opportunity for their use as a tool for organizations to share data precompetitively, avoiding repetition and assisting drug discovery.
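
    The reported figures of merit are straightforward to recompute from a confusion matrix; a small helper, with hypothetical counts loosely sized like the stability test set:

```python
def classification_stats(tp, fp, tn, fn):
    """Cohen's kappa, sensitivity, specificity and PPV from a 2x2 table."""
    n = tp + fp + tn + fn
    po = (tp + tn) / n                                           # observed agreement
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2  # agreement by chance
    return {
        "kappa": (po - pe) / (1.0 - pe),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }

# Hypothetical counts for a stable/unstable metabolic-stability classifier
stats = classification_stats(tp=580, fp=330, tn=3300, fn=440)
print({k: round(v, 2) for k, v in stats.items()})
```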

  11. Fusion of multiscale wavelet-based fractal analysis on retina image for stroke prediction.

    PubMed

    Che Azemin, M Z; Kumar, Dinesh K; Wong, T Y; Wang, J J; Kawasaki, R; Mitchell, P; Arjunan, Sridhar P

    2010-01-01

    In this paper, we present a novel method of analyzing retinal vasculature using the Fourier fractal dimension to extract the complexity of the retinal vasculature enhanced at different wavelet scales. Logistic regression was used as a fusion method to model the classifier for 5-year stroke prediction. The efficacy of this technique has been tested using standard pattern recognition performance evaluation, Receiver Operating Characteristic (ROC) analysis, and a medical prediction statistic, the odds ratio. A stroke prediction model was developed using the proposed system.

  12. Risk models for post-endoscopic retrograde cholangiopancreatography pancreatitis (PEP): smoking and chronic liver disease are predictors of protection against PEP.

    PubMed

    DiMagno, Matthew J; Spaete, Joshua P; Ballard, Darren D; Wamsteker, Erik-Jan; Saini, Sameer D

    2013-08-01

    We investigated which variables were independently associated with protection against or development of post-endoscopic retrograde cholangiopancreatography (ERCP) pancreatitis (PEP) and with the severity of PEP. Subsequently, we derived predictive risk models for PEP. In a case-control design, 6505 patients had 8264 ERCPs; 211 patients had PEP, and 22 patients had severe PEP. We randomly selected 348 non-PEP controls. We examined 7 established and 9 investigational variables. In univariate analysis, 7 variables predicted PEP: younger age, female sex, suspected sphincter of Oddi dysfunction (SOD), pancreatic sphincterotomy, moderate-difficult cannulation (MDC), pancreatic stent placement, and lower Charlson score. Protective variables were current smoking, former drinking, diabetes, and chronic liver disease (CLD, biliary/transplant complications). Multivariate analysis identified 7 independent variables for PEP, 3 protective (current smoking, CLD-biliary, CLD-transplant/hepatectomy complications) and 4 predictive (younger age, suspected SOD, pancreatic sphincterotomy, MDC). Pre- and post-ERCP risk models of 7 variables have a C-statistic of 0.74. Removing age (the seventh variable) did not significantly affect the predictive value (C-statistic of 0.73) and reduced model complexity. Severity of PEP did not associate with any variables by multivariate analysis. By using the newly identified protective variables with 3 predictive variables, we derived 2 risk models with a higher predictive value for PEP compared to prior studies.

  13. Clinical trials with velnacrine: (PROPP) the physician reference of predicted probabilities--a statistical model for the estimation of hepatotoxicity risk with velnacrine maleate.

    PubMed

    Hardiman, S; Miller, K; Murphy, M

    1993-01-01

    Safety observations during the clinical development of Mentane (velnacrine maleate) have included the occurrence of generally asymptomatic liver enzyme elevations confined to patients with Alzheimer's disease (AD). The clinical presentation of this reversible hepatocellular injury is analogous to that reported for tetrahydroaminoacridine (THA). Direct liver injury, possibly associated with the production of a toxic metabolite, would be consistent with reports of aberrant xenobiotic metabolism in Alzheimer's disease patients. Since a patient-related aberration in drug metabolism was suspected, a biostatistical strategy was developed with the objective of predicting hepatotoxicity in individual patients prior to exposure to velnacrine maleate. The method used logistic regression techniques with variable selection restricted to those items which could be routinely and inexpensively accessed at screening evaluation of potential candidates for treatment. The model was to be predictive (a marker for eventual hepatotoxicity) rather than causative, and the techniques employed "goodness of fit", percentage correct, and positive and negative predictive values. On the basis of demographic and baseline laboratory data from 942 patients, the PROPP statistic (the Physician Reference Of Predicted Probabilities) was developed. Main effect variables included age, gender, and nine hematological and serum chemistry variables. The sensitivity of the current model is approximately 49%, and its specificity approximately 88%. Using prior probability estimates, however, in which the patient's likelihood of liver toxicity is presumed to be at least 30%, the positive predictive value ranged between 64% and 77%. Although the clinical utility of this statistic will require refinements and additional prospective confirmation, its potential existence speaks to the possibility of markers for idiosyncratic drug metabolism in patients with Alzheimer's disease.

  14. Predicting the stability of nanodevices

    NASA Astrophysics Data System (ADS)

    Lin, Z. Z.; Yu, W. F.; Wang, Y.; Ning, X. J.

    2011-05-01

    A simple model based on the statistics of single atoms is developed to predict the stability or lifetime of nanodevices without empirical parameters. Under certain conditions, the model reproduces the Arrhenius law and the Meyer-Neldel compensation rule. Compared with classical molecular-dynamics simulations of the stability of a monatomic carbon chain at high temperature, the model proves to be much more accurate than transition state theory. Based on ab initio calculations of the static potential, the model yields corrected lifetimes for monatomic carbon and gold chains at higher temperatures, and predicts that the monatomic chains are very stable at room temperature.

  15. Deformable image registration as a tool to improve survival prediction after neoadjuvant chemotherapy for breast cancer: results from the ACRIN 6657/I-SPY-1 trial

    NASA Astrophysics Data System (ADS)

    Jahani, Nariman; Cohen, Eric; Hsieh, Meng-Kang; Weinstein, Susan P.; Pantalone, Lauren; Davatzikos, Christos; Kontos, Despina

    2018-02-01

    We examined the ability of DCE-MRI longitudinal features to give early prediction of recurrence-free survival (RFS) in women undergoing neoadjuvant chemotherapy for breast cancer, in a retrospective analysis of 106 women from the ISPY 1 cohort. These features were based on the voxel-wise changes seen in registered images taken before treatment and after the first round of chemotherapy. We computed the transformation field using a robust deformable image registration technique to match breast images from these two visits. Using the deformation field, parametric response maps (PRM) — a voxel-based feature analysis of longitudinal changes in images between visits — were computed for maps of four kinetic features (signal enhancement ratio, peak enhancement, and wash-in/wash-out slopes). A two-level discrete wavelet transform was applied to these PRMs to extract heterogeneity information about tumor change between visits. To estimate survival, a Cox proportional hazard model was applied, with the C statistic as the measure of success in predicting RFS. The best PRM feature (as determined by the C statistic in univariable analysis) was selected for each of the four kinetic features. The baseline model, incorporating functional tumor volume, age, race, and hormone response status, had a C statistic of 0.70 in predicting RFS. The model augmented with the four PRM features had a C statistic of 0.76. Thus, our results suggest that adding information on the texture of voxel-level changes in tumor kinetic response between registered images of the first and second visits could improve early RFS prediction in breast cancer after neoadjuvant chemotherapy.
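
    The survival-modeling step above pairs imaging features with a Cox proportional hazards model scored by the C statistic. Below is a minimal, hedged sketch of that step using the open-source lifelines package; the column names and toy values are hypothetical stand-ins, not the ACRIN 6657/I-SPY-1 variables.

    ```python
    import pandas as pd
    from lifelines import CoxPHFitter

    # Hypothetical cohort: one row per patient, with a baseline covariate,
    # a PRM-derived feature, follow-up time, and a recurrence event flag.
    df = pd.DataFrame({
        "functional_tumor_volume": [35.2, 12.1, 48.9, 8.4, 22.7, 31.0, 15.5, 40.2],
        "age":                     [46, 52, 39, 61, 44, 57, 50, 43],
        "prm_feature":             [0.31, 0.12, 0.44, 0.05, 0.27, 0.38, 0.10, 0.41],
        "rfs_months":              [18, 60, 9, 72, 41, 15, 66, 12],
        "recurrence":              [1, 0, 1, 0, 0, 1, 0, 1],
    })

    # A small ridge penalty stabilizes the fit on this tiny toy sample
    cph = CoxPHFitter(penalizer=0.1)
    cph.fit(df, duration_col="rfs_months", event_col="recurrence")

    # Harrell's C statistic: concordance between predicted risk and observed RFS
    print(f"C statistic: {cph.concordance_index_:.2f}")
    ```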

  16. Improving satellite-driven PM2.5 models with Moderate Resolution Imaging Spectroradiometer fire counts in the southeastern U.S.

    PubMed

    Hu, Xuefei; Waller, Lance A; Lyapustin, Alexei; Wang, Yujie; Liu, Yang

    2014-10-16

    Multiple studies have developed surface PM2.5 (particle size less than 2.5 µm in aerodynamic diameter) prediction models using satellite-derived aerosol optical depth as the primary predictor and meteorological and land use variables as secondary variables. To our knowledge, satellite-retrieved fire information has not been used for PM2.5 concentration prediction in statistical models. Fire data could be a useful predictor since fires are significant contributors of PM2.5. In this paper, we examined whether remotely sensed fire count data could improve PM2.5 prediction accuracy in the southeastern U.S. in a spatial statistical model setting. A sensitivity analysis showed that when the radius of the buffer zone centered at each PM2.5 monitoring site reached 75 km, fire count data generally have the greatest predictive power of PM2.5 across the models considered. Cross validation (CV) generated an R2 of 0.69, a mean prediction error of 2.75 µg/m3, and root-mean-square prediction errors (RMSPEs) of 4.29 µg/m3, indicating a good fit between the dependent and predictor variables. A comparison showed that the prediction accuracy was improved more substantially from the nonfire model to the fire model at sites with higher fire counts. With increasing fire counts, CV RMSPE decreased by values up to 1.5 µg/m3, exhibiting a maximum improvement of 13.4% in prediction accuracy. Fire count data were shown to have better performance in southern Georgia and in the spring season due to higher fire occurrence. Our findings indicate that fire count data provide a measurable improvement in PM2.5 concentration estimation, especially in areas and seasons prone to fire events.

  17. Predicting acute aquatic toxicity of structurally diverse chemicals in fish using artificial intelligence approaches.

    PubMed

    Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali

    2013-09-01

    The research aims to develop global modeling tools capable of categorizing structurally diverse chemicals into various toxicity classes according to the EEC and European Community directives, and to predict their acute toxicity in fathead minnow using a set of selected molecular descriptors. Accordingly, artificial intelligence-based classification and regression models, such as probabilistic neural networks (PNN), generalized regression neural networks (GRNN), multilayer perceptron neural networks (MLPN), radial basis function neural networks (RBFN), support vector machines (SVM), gene expression programming (GEP), and decision trees (DT), were constructed using the experimental toxicity data. Diversity and non-linearity in the chemicals' data were tested using the Tanimoto similarity index and Brock-Dechert-Scheinkman statistics. Predictive and generalization abilities of the various models constructed here were compared using several statistical parameters. PNN and GRNN models performed relatively better than MLPN, RBFN, SVM, GEP, and DT. In both two- and four-category classifications, PNN yielded considerably high classification accuracy in the training (95.85 percent and 90.07 percent) and validation data (91.30 percent and 86.96 percent), respectively. GRNN rendered a high correlation between the measured and model-predicted -log LC50 values for both the training (0.929) and validation (0.910) data, and low prediction errors (RMSE) of 0.52 and 0.49 for the two sets. The efficiency of the selected PNN and GRNN models in predicting the acute toxicity of new chemicals was adequately validated using external datasets of different fish species (fathead minnow, bluegill, trout, and guppy). The PNN and GRNN models showed good predictive and generalization abilities and can be used as tools for predicting toxicities of structurally diverse chemical compounds. Copyright © 2013 Elsevier Inc. All rights reserved.
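
    A GRNN is, at its core, a Gaussian-kernel-weighted average of training targets (Nadaraya-Watson regression). The sketch below illustrates only that idea in plain numpy on simulated descriptor data; the bandwidth sigma, the descriptor count, and the -log LC50 relationship are invented for illustration.

    ```python
    import numpy as np

    def grnn_predict(X_train, y_train, X_test, sigma=1.0):
        """GRNN prediction: Gaussian-kernel-weighted average of training targets."""
        preds = []
        for x in X_test:
            d2 = np.sum((X_train - x) ** 2, axis=1)        # squared distances
            w = np.exp(-d2 / (2.0 * sigma ** 2))           # Gaussian kernel weights
            preds.append(np.sum(w * y_train) / np.sum(w))  # weighted mean
        return np.array(preds)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))                   # 5 toy molecular descriptors
    beta = np.array([0.4, -0.2, 0.1, 0.3, -0.5])
    y = X @ beta + rng.normal(scale=0.1, size=100)  # simulated -log LC50 values
    print(grnn_predict(X[:80], y[:80], X[80:], sigma=1.0)[:3])
    ```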

  18. Improving satellite-driven PM2.5 models with Moderate Resolution Imaging Spectroradiometer fire counts in the southeastern U.S

    PubMed Central

    Hu, Xuefei; Waller, Lance A.; Lyapustin, Alexei; Wang, Yujie; Liu, Yang

    2017-01-01

    Multiple studies have developed surface PM2.5 (particle size less than 2.5 µm in aerodynamic diameter) prediction models using satellite-derived aerosol optical depth as the primary predictor and meteorological and land use variables as secondary variables. To our knowledge, satellite-retrieved fire information has not been used for PM2.5 concentration prediction in statistical models. Fire data could be a useful predictor since fires are significant contributors of PM2.5. In this paper, we examined whether remotely sensed fire count data could improve PM2.5 prediction accuracy in the southeastern U.S. in a spatial statistical model setting. A sensitivity analysis showed that when the radius of the buffer zone centered at each PM2.5 monitoring site reached 75 km, fire count data generally have the greatest predictive power of PM2.5 across the models considered. Cross validation (CV) generated an R2 of 0.69, a mean prediction error of 2.75 µg/m3, and root-mean-square prediction errors (RMSPEs) of 4.29 µg/m3, indicating a good fit between the dependent and predictor variables. A comparison showed that the prediction accuracy was improved more substantially from the nonfire model to the fire model at sites with higher fire counts. With increasing fire counts, CV RMSPE decreased by values up to 1.5 µg/m3, exhibiting a maximum improvement of 13.4% in prediction accuracy. Fire count data were shown to have better performance in southern Georgia and in the spring season due to higher fire occurrence. Our findings indicate that fire count data provide a measurable improvement in PM2.5 concentration estimation, especially in areas and seasons prone to fire events. PMID:28967648
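
    The buffer-zone predictor described above reduces to counting fire detections within a fixed radius of each monitor. A minimal sketch, assuming a haversine great-circle distance and made-up monitor and fire coordinates (the study's actual processing of MODIS fire products is more involved):

    ```python
    import numpy as np

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance in kilometres."""
        lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
        a = (np.sin((lat2 - lat1) / 2) ** 2
             + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371.0 * np.arcsin(np.sqrt(a))

    monitors = np.array([[33.75, -84.39], [32.08, -81.09]])                 # lat, lon
    fires = np.array([[33.90, -84.10], [33.20, -83.90], [32.10, -81.20]])   # detections

    # Fire count within the 75-km buffer of each monitoring site
    counts = [int(np.sum(haversine_km(m[0], m[1], fires[:, 0], fires[:, 1]) <= 75))
              for m in monitors]
    print(counts)
    ```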

  19. Quantifying geological uncertainty for flow and transport modeling in multi-modal heterogeneous formations

    NASA Astrophysics Data System (ADS)

    Feyen, Luc; Caers, Jef

    2006-06-01

    In this work, we address the problem of characterizing the heterogeneity and uncertainty of hydraulic properties for complex geological settings. We distinguish between two scales of heterogeneity, namely the hydrofacies structure and the intrafacies variability of the hydraulic properties. We employ multiple-point geostatistics to characterize the hydrofacies architecture. The multiple-point statistics are borrowed from a training image that is designed to reflect the prior geological conceptualization. The intrafacies variability of the hydraulic properties is represented using conventional two-point correlation methods, more precisely, spatial covariance models under a multi-Gaussian spatial law. We address the different levels and sources of uncertainty in characterizing the subsurface heterogeneity, and explore their effect on groundwater flow and transport predictions. Typically, uncertainty is assessed by way of many images, termed realizations, of a fixed statistical model. However, in many cases, sampling from a fixed stochastic model does not adequately represent the space of uncertainty. It neglects the uncertainty related to the selection of the stochastic model and the estimation of its input parameters. We acknowledge the uncertainty inherent in the definition of the prior conceptual model of aquifer architecture and in the estimation of global statistics, anisotropy, and correlation scales. Spatial bootstrap is used to assess the uncertainty of the unknown statistical parameters. As an illustrative example, we employ a synthetic field that represents a fluvial setting consisting of an interconnected network of channel sands embedded within finer-grained floodplain material. For this highly non-stationary setting we quantify the groundwater flow and transport model prediction uncertainty for various levels of hydrogeological uncertainty. Results indicate the importance of accurately describing the facies geometry, especially for transport predictions.

  20. Current Risk Adjustment and Comorbidity Index Underperformance in Predicting Post-Acute Utilization and Hospital Readmissions After Joint Replacements: Implications for Comprehensive Care for Joint Replacement Model.

    PubMed

    Kumar, Amit; Karmarkar, Amol; Downer, Brian; Vashist, Amit; Adhikari, Deepak; Al Snih, Soham; Ottenbacher, Kenneth

    2017-11-01

    To compare the performance of 3 comorbidity indices, the Charlson Comorbidity Index, the Elixhauser Comorbidity Index, and the Centers for Medicare & Medicaid Services (CMS) risk adjustment model, Hierarchical Condition Category (HCC), in predicting post-acute discharge settings and hospital readmission for patients after joint replacement. A retrospective study of Medicare beneficiaries with total knee replacement (TKR) or total hip replacement (THR) discharged from hospitals in 2009-2011 (n = 607,349) was performed. Study outcomes were post-acute discharge setting and unplanned 30-, 60-, and 90-day hospital readmissions. Logistic regression models were built to compare the performance of the 3 comorbidity indices using C statistics. The base model included patient demographics and hospital use. Subsequent models included 1 of the 3 comorbidity indices. Additional multivariable logistic regression models were built to identify individual comorbid conditions associated with high risk of hospital readmissions. The 30-, 60-, and 90-day unplanned hospital readmission rates were 5.3%, 7.2%, and 8.5%, respectively. Patients were most frequently discharged to home health (46.3%), followed by skilled nursing facility (40.9%) and inpatient rehabilitation facility (12.7%). The C statistics for the base model in predicting post-acute discharge setting and 30-, 60-, and 90-day readmission in TKR and THR were between 0.63 and 0.67. Adding the Charlson Comorbidity Index, the Elixhauser Comorbidity Index, or HCC increased the C statistic minimally from the base model for predicting both discharge settings and hospital readmission. The health conditions most frequently associated with hospital readmission were diabetes mellitus, pulmonary disease, arrhythmias, and heart disease. The comorbidity indices and CMS-HCC demonstrated weak discriminatory ability to predict post-acute discharge settings and hospital readmission following joint replacement. © 2017, American College of Rheumatology.

  1. Risk prediction score for death of traumatised and injured children

    PubMed Central

    2014-01-01

    Background Injury prediction scores facilitate the development of clinical management protocols to decrease mortality. However, most of the previously developed scores are limited in scope and are non-specific for use in children. We aimed to develop and validate a risk prediction model of death for injured and traumatised Thai children. Methods Our cross-sectional study included 43,516 injured children from 34 emergency services. A risk prediction model was derived using a logistic regression analysis that included 15 predictors. Model performance was assessed using the concordance statistic (C-statistic) and the observed-to-expected (O/E) ratio. Internal validation of the model was performed using a 200-repetition bootstrap analysis. Results Death occurred in 1.7% of the injured children (95% confidence interval [95% CI]: 1.57–1.82). Ten predictors (i.e., age, airway intervention, physical injury mechanism, three injured body regions, the Glasgow Coma Scale, and three vital signs) were significantly associated with death. The C-statistic and the O/E ratio were 0.938 (95% CI: 0.929–0.947) and 0.86 (95% CI: 0.70–1.02), respectively. The scoring scheme defined three risk strata with respective likelihood ratios of 1.26 (95% CI: 1.25–1.27), 2.45 (95% CI: 2.42–2.52), and 4.72 (95% CI: 4.57–4.88) for low, intermediate, and high risk of death. Internal validation showed good model performance (C-statistic = 0.938, 95% CI: 0.926–0.952) and a small calibration bias of 0.002 (95% CI: 0.0005–0.003). Conclusions We developed a simplified Thai pediatric injury death prediction score with satisfactory calibration and discriminative performance in emergency room settings. PMID:24575982

  2. A statistical model for predicting the inter-annual variability of birch pollen abundance in Northern and North-Eastern Europe.

    PubMed

    Ritenberga, Olga; Sofiev, Mikhail; Siljamo, Pilvi; Saarto, Annika; Dahl, Aslog; Ekebom, Agneta; Sauliene, Ingrida; Shalaboda, Valentina; Severova, Elena; Hoebeke, Lucie; Ramfjord, Hallvard

    2018-02-15

    The paper suggests a methodology for predicting the next-year seasonal pollen index (SPI, a sum of daily-mean pollen concentrations) over large regions and demonstrates its performance for birch in Northern and North-Eastern Europe. A statistical model is constructed using meteorological, geophysical and biological characteristics of the previous year. A cluster analysis of multi-annual data of the European Aeroallergen Network (EAN) revealed several large regions in Europe where the observed SPI exhibits similar patterns of multi-annual variability. We built the model for the northern cluster of stations, which covers Finland, Sweden, the Baltic States, part of Belarus and, probably, Russia and Norway, where the lack of data did not allow for conclusive analysis. The constructed model was capable of predicting the SPI with a correlation coefficient reaching up to 0.9 for some stations; the odds ratio is infinitely high for 50% of the sites inside the region, and the fraction of predictions falling within a factor of 2 of observations stays within 40-70%. In particular, the model successfully reproduced both the bi-annual cycle of the SPI and the years when this cycle breaks down. Copyright © 2017 Elsevier B.V. All rights reserved.
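
    Two of the verification measures quoted above — the correlation coefficient and the fraction of predictions within a factor of 2 of observations — are easy to state precisely in code. A minimal sketch with hypothetical station SPI values:

    ```python
    import numpy as np

    obs  = np.array([1200.0, 540.0, 2300.0, 760.0, 1500.0, 410.0])   # observed SPI
    pred = np.array([ 980.0, 610.0, 1750.0, 820.0, 2900.0, 450.0])   # predicted SPI

    r = np.corrcoef(obs, pred)[0, 1]                            # correlation coefficient
    within_f2 = np.mean((pred >= obs / 2) & (pred <= obs * 2))  # fraction within factor of 2
    print(f"r = {r:.2f}, within factor of 2 = {within_f2:.0%}")
    ```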

  3. Hyperspectral Imaging in Tandem with R Statistics and Image Processing for Detection and Visualization of pH in Japanese Big Sausages Under Different Storage Conditions.

    PubMed

    Feng, Chao-Hui; Makino, Yoshio; Yoshimura, Masatoshi; Thuyet, Dang Quoc; García-Martín, Juan Francisco

    2018-02-01

    The potential of hyperspectral imaging with wavelengths of 380 to 1000 nm was used to determine the pH of cooked sausages after different storage conditions (4 °C for 1 d; 35 °C for 1, 3, and 5 d). The mean spectra of the sausages were extracted from the hyperspectral images and a partial least squares regression (PLSR) model was developed to relate the spectral profiles to the pH of the cooked sausages. Eleven important wavelengths were selected based on the regression coefficient values. The PLSR model established using the optimal wavelengths showed good precision, with a prediction coefficient of determination (Rp2) of 0.909 and a root mean square error of prediction of 0.035. A prediction map illustrating pH indices in sausages was developed for the first time using R statistics. The overall results suggest that hyperspectral imaging combined with PLSR and R statistics is capable of quantifying and visualizing the pH evolution of sausages under different storage conditions. In this paper, hyperspectral imaging is used for the first time to detect pH in cooked sausages using R statistics, which provides a useful alternative for researchers who do not have access to Matlab. Eleven optimal wavelengths were successfully selected and used to simplify the PLSR model established on the full wavelength range. This simplified model achieved a high Rp2 (0.909) and a low root mean square error of prediction (0.035), which can be useful for the design of multispectral imaging systems. © 2017 Institute of Food Technologists®.
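
    A hedged sketch of the chemometric core of this workflow — a PLSR model fit on calibration spectra, then Rp2 and RMSEP computed on a prediction set — using scikit-learn on simulated spectra. The component count, wavelength grid, and pH relationship below are invented, not the paper's values.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.metrics import mean_squared_error, r2_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 311))           # 60 mean spectra, 380-1000 nm at 2 nm
    true_coef = np.zeros(311)
    true_coef[[40, 95, 180]] = [0.8, -0.5, 0.3]
    y = X @ true_coef + 6.2 + rng.normal(scale=0.03, size=60)   # simulated pH values

    pls = PLSRegression(n_components=5)
    pls.fit(X[:45], y[:45])                  # calibration set
    y_hat = pls.predict(X[45:]).ravel()      # prediction set

    print(f"Rp2 = {r2_score(y[45:], y_hat):.3f}")
    print(f"RMSEP = {np.sqrt(mean_squared_error(y[45:], y_hat)):.3f}")
    ```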

  4. Proposal for a New Predictive Model of Short-Term Mortality After Living Donor Liver Transplantation due to Acute Liver Failure.

    PubMed

    Chung, Hyun Sik; Lee, Yu Jung; Jo, Yun Sung

    2017-02-21

    BACKGROUND Acute liver failure (ALF) is known to be a rapidly progressive and fatal disease. Various models that could help estimate the post-transplant outcome for ALF have been developed; however, none has been proven to be a definitively accurate predictive model. We propose a new predictive model and investigated which model has the highest predictive accuracy for the short-term outcome in patients who underwent living donor liver transplantation (LDLT) due to ALF. MATERIAL AND METHODS Data from a total of 88 patients were collected retrospectively. King's College Hospital criteria (KCH), Child-Turcotte-Pugh (CTP) classification, and model for end-stage liver disease (MELD) score were calculated. Univariate analysis was performed, followed by multivariate statistical adjustment for preoperative variables of ALF prognosis. A new predictive model was developed, called the MELD conjugated serum phosphorus model (MELD-p). The diagnostic accuracy and cut-off value of each model in predicting 3-month post-transplant mortality were evaluated using the area under the receiver operating characteristic curve (AUC). The difference in AUC between MELD-p and the other models was analyzed. The diagnostic improvement of MELD-p was assessed using the net reclassification improvement (NRI) and integrated discrimination improvement (IDI). RESULTS The MELD-p and MELD scores had high predictive accuracy (AUC >0.9). KCH and serum phosphorus had acceptable predictive ability (AUC >0.7). The CTP classification failed to show discriminative accuracy in predicting 3-month post-transplant mortality. The difference in AUC between MELD-p and the other models was statistically significant for CTP and KCH. The cut-off value of MELD-p was 3.98 for predicting 3-month post-transplant mortality. The NRI was 9.9% and the IDI was 2.9%. CONCLUSIONS The MELD-p score can predict 3-month post-transplant mortality after LDLT due to ALF better than other scoring systems. The recommended cut-off value of MELD-p is 3.98.

  5. Linking Field and Satellite Observations to Reveal Differences in Single vs. Double-Cropped Soybean Yields in Central Brazil

    NASA Astrophysics Data System (ADS)

    Jeffries, G. R.; Cohn, A.

    2016-12-01

    Soy-corn double cropping (DC) has been widely adopted in Central Brazil alongside single-cropped (SC) soybean production. DC involves different cropping calendars and soy varieties, and may be associated with different crop yield patterns and volatility than SC. Study of the performance of the region's agriculture in a changing climate depends on tracking differences in the productivity of SC vs. DC, but has been limited by crop yield data that conflate the two systems. We predicted SC and DC yields across Central Brazil, drawing on field observations and remotely sensed data. We first modeled field yield estimates as a function of remotely sensed DC status, vegetation index (VI) metrics, and other management and biophysical factors. We then used the estimated statistical model to predict SC and DC soybean yields at each 500-m grid cell of Central Brazil for harvest years 2001-2015. The yield estimation model was constructed using 1) a repeated cross-sectional survey of soybean yields and management factors for years 2007-2015, 2) a custom agricultural land cover classification dataset which assimilates earlier datasets for the region, and 3) 500-m 8-day MODIS image composites used to calculate the wide dynamic range vegetation index (WDRVI) and derivative metrics such as area under the curve for WDRVI values in critical crop development periods. A statistical yield estimation model which primarily entails WDRVI metrics, DC status, and spatial fixed effects was developed on a subset of the yield dataset. Model validation was conducted by predicting previously withheld yield records and then assessing error and goodness-of-fit for predicted values with metrics including root mean squared error (RMSE), mean squared error (MSE), and R2. We found a statistical yield estimation model which incorporates WDRVI and DC status to be an effective way to estimate crop yields over the region. Statistical properties of the resulting gridded yield dataset may be valuable for understanding linkages between crop yields, farm management factors, and climate.

  6. Calibration transfer of a Raman spectroscopic quantification method for the assessment of liquid detergent compositions from at-line laboratory to in-line industrial scale.

    PubMed

    Brouckaert, D; Uyttersprot, J-S; Broeckx, W; De Beer, T

    2018-03-01

    Calibration transfer or standardisation aims at creating a uniform spectral response on different spectroscopic instruments or under varying conditions, without requiring a full recalibration for each situation. In the current study, this strategy is applied to construct at-line multivariate calibration models and consequently employ them in-line in a continuous industrial production line, using the same spectrometer. Firstly, quantitative multivariate models are constructed at-line at laboratory scale for predicting the concentration of two main ingredients in hard surface cleaners. By regressing the Raman spectra of a set of small-scale calibration samples against their reference concentration values, partial least squares (PLS) models are developed to quantify the surfactant levels in the liquid detergent compositions under investigation. After evaluating the models' performance with a set of independent validation samples, a univariate slope/bias correction is applied in view of transporting these at-line calibration models to an in-line manufacturing set-up. This standardisation technique allows a fast and easy transfer of the PLS regression models, by simply correcting the model predictions on the in-line set-up, without adjusting anything to the original multivariate calibration models. An extensive statistical analysis is performed in order to assess the predictive quality of the transferred regression models. Before and after transfer, the R2 and RMSEP of both models are compared to evaluate whether their magnitudes are similar. T-tests are then performed to investigate whether the slope and intercept of the transferred regression line are not statistically different from 1 and 0, respectively. Furthermore, it is verified that no significant bias is present. F-tests are executed as well, for assessing the linearity of the transfer regression line and for investigating the statistical coincidence of the transfer and validation regression lines. Finally, a paired t-test is performed to compare the original at-line model to the slope/bias corrected in-line model, using interval hypotheses. It is shown that the calibration models of Surfactant 1 and Surfactant 2 yield satisfactory in-line predictions after slope/bias correction. While Surfactant 1 passes seven out of eight statistical tests, the recommended validation parameters are 100% successful for Surfactant 2. It is hence concluded that the proposed strategy for transferring at-line calibration models to an in-line industrial environment via a univariate slope/bias correction of the predicted values offers a successful standardisation approach. Copyright © 2017 Elsevier B.V. All rights reserved.
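
    The transfer step itself is deliberately simple: regress reference values on the in-line predictions of the at-line model for a handful of transfer samples, then apply the fitted slope and bias to all subsequent predictions. A minimal sketch with hypothetical concentration values:

    ```python
    import numpy as np

    # In-line PLS predictions vs. reference concentrations for transfer samples
    y_pred_inline = np.array([4.8, 6.1, 7.0, 8.2, 9.1])
    y_reference   = np.array([5.0, 6.5, 7.6, 8.9, 10.0])

    slope, bias = np.polyfit(y_pred_inline, y_reference, 1)  # least-squares line

    def correct(y_pred):
        """Apply the univariate slope/bias correction to raw in-line predictions."""
        return slope * y_pred + bias

    print(correct(np.array([5.5, 7.8])))   # corrected surfactant-level predictions
    ```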

  7. Statistical models to predict type 2 diabetes remission after bariatric surgery.

    PubMed

    Ramos-Levi, Ana M; Matia, Pilar; Cabrerizo, Lucio; Barabash, Ana; Sanchez-Pernaute, Andres; Calle-Pascual, Alfonso L; Torres, Antonio J; Rubio, Miguel A

    2014-09-01

    Type 2 diabetes (T2D) remission may be achieved after bariatric surgery (BS), but rates vary according to patients' baseline characteristics. The present study evaluates the relevance of several preoperative factors and develops statistical models to predict T2D remission 1 year after BS. We retrospectively studied 141 patients (57.4% women), with a preoperative diagnosis of T2D, who underwent BS in a single center (2006-2011). Anthropometric and glucose metabolism parameters before surgery and at 1-year follow-up were recorded. Remission of T2D was defined according to consensus criteria: HbA1c <6%, fasting glucose (FG) <100 mg/dL, absence of pharmacologic treatment. The influence of several preoperative factors was explored and different statistical models to predict T2D remission were elaborated using logistic regression analysis. Three preoperative characteristics considered individually were identified as the most powerful predictors of T2D remission: C-peptide (R2 = 0.249; odds ratio [OR] 1.652, 95% confidence interval [CI] 1.181-2.309; P = 0.003), T2D duration (R2 = 0.197; OR 0.869, 95% CI 0.808-0.935; P < 0.001), and previous insulin therapy (R2 = 0.165; OR 4.670, 95% CI 2.257-9.665; P < 0.001). High C-peptide levels, a shorter duration of T2D, and the absence of insulin therapy favored remission. Different multivariate logistic regression models were designed. When considering sex, T2D duration, and insulin treatment, remission was correctly predicted in 72.4% of cases. The model that included age, FG and C-peptide levels resulted in 83.7% correct classifications. When sex, FG, C-peptide, insulin treatment, and percentage weight loss were considered, correct classification of T2D remission was achieved in 95.9% of cases. Preoperative characteristics determine T2D remission rates after BS to different extents. The use of statistical models may help clinicians reliably predict T2D remission rates after BS. © 2014 Ruijin Hospital, Shanghai Jiaotong University School of Medicine and Wiley Publishing Asia Pty Ltd.
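
    A hedged sketch of a multivariate logistic model of this kind, restricted to the three strongest preoperative predictors reported above (C-peptide, T2D duration, insulin therapy). The data and fitted coefficients are simulated, not the published odds ratios:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n = 141
    X = np.column_stack([
        rng.normal(4.0, 1.5, n),     # C-peptide (ng/mL)
        rng.exponential(6.0, n),     # T2D duration (years)
        rng.integers(0, 2, n),       # previous insulin therapy (0/1)
    ])
    # Toy outcome: remission more likely with high C-peptide, short duration, no insulin
    logit = 0.8 * (X[:, 0] - 4) - 0.2 * X[:, 1] - 1.5 * X[:, 2] + 1.0
    y = rng.random(n) < 1 / (1 + np.exp(-logit))

    model = LogisticRegression().fit(X, y)
    p = model.predict_proba([[5.2, 3.0, 0]])[0, 1]   # a hypothetical candidate
    print(f"Predicted probability of remission: {p:.0%}")
    ```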

  8. Validation of the measure automobile emissions model : a statistical analysis

    DOT National Transportation Integrated Search

    2000-09-01

    The Mobile Emissions Assessment System for Urban and Regional Evaluation (MEASURE) model provides an external validation capability for the hot stabilized option; the model is one of several new modal emissions models designed to predict hot stabilized e...

  9. Transfer Student Success: Educationally Purposeful Activities Predictive of Undergraduate GPA

    ERIC Educational Resources Information Center

    Fauria, Renee M.; Fuller, Matthew B.

    2015-01-01

    Researchers evaluated the effects of Educationally Purposeful Activities (EPAs) on transfer and nontransfer students' cumulative GPAs. Hierarchical, linear, and multiple regression models yielded seven statistically significant educationally purposeful items that influenced undergraduate student GPAs. Statistically significant positive EPAs for…

  10. Principal component analysis in construction of 3D human knee joint models using a statistical shape model method.

    PubMed

    Tsai, Tsung-Yuan; Li, Jing-Sheng; Wang, Shaobai; Li, Pingyue; Kwon, Young-Min; Li, Guoan

    2015-01-01

    The statistical shape model (SSM) method that uses 2D images of the knee joint to predict the three-dimensional (3D) joint surface model has been reported in the literature. In this study, we constructed an SSM database using 152 human computed tomography (CT) knee joint models, including the femur, tibia, and patella, and analysed the characteristics of each principal component of the SSM. The surface models of two in vivo knees were predicted using the SSM and their 2D bi-plane fluoroscopic images. The predicted models were compared to their CT joint models. The differences between the predicted 3D knee joint surfaces and the CT image-based surfaces were 0.30 ± 0.81 mm, 0.34 ± 0.79 mm and 0.36 ± 0.59 mm for the femur, tibia and patella, respectively (average ± standard deviation). The computational time for each bone of the knee joint was within 30 s using a personal computer. The analysis of this study indicated that the SSM method could be a useful tool to construct 3D surface models of the knee with sub-millimeter accuracy in real time. Thus, it may have a broad application in computer-assisted knee surgeries that require 3D surface models of the knee.
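
    The SSM construction reduces to principal component analysis over corresponding surface points. A minimal sketch, assuming registered, flattened point sets (the random arrays below are stand-ins, not actual CT-derived knee surfaces):

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(3)
    n_subjects, n_points = 152, 500
    # Each row: one subject's surface, flattened as (x1, y1, z1, x2, y2, z2, ...)
    shapes = rng.normal(size=(n_subjects, n_points * 3))

    pca = PCA(n_components=10)
    scores = pca.fit_transform(shapes)           # per-subject component scores
    print(pca.explained_variance_ratio_[:3])     # variance captured by leading modes

    # Reconstruct a surface from its first 10 component scores
    reconstructed = pca.inverse_transform(scores[:1])[0].reshape(n_points, 3)
    ```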

  11. Optimal population prediction of sandhill crane recruitment based on climate-mediated habitat limitations

    USGS Publications Warehouse

    Gerber, Brian D.; Kendall, William L.; Hooten, Mevin B.; Dubovsky, James A.; Drewien, Roderick C.

    2015-01-01

    Prediction is fundamental to scientific enquiry and application; however, ecologists tend to favour explanatory modelling. We discuss a predictive modelling framework to evaluate ecological hypotheses and to explore novel/unobserved environmental scenarios to assist conservation and management decision-makers. We apply this framework to develop an optimal predictive model for juvenile (<1 year old) sandhill crane Grus canadensis recruitment of the Rocky Mountain Population (RMP). We consider spatial climate predictors motivated by hypotheses of how drought across multiple time-scales and spring/summer weather affects recruitment. Our predictive modelling framework focuses on developing a single model that includes all relevant predictor variables, regardless of collinearity. This model is then optimized for prediction by controlling model complexity using a data-driven approach that marginalizes or removes irrelevant predictors from the model. Specifically, we highlight two approaches of statistical regularization, Bayesian least absolute shrinkage and selection operator (LASSO) and ridge regression. Our optimal predictive Bayesian LASSO and ridge regression models were similar and on average 37% superior in predictive accuracy to an explanatory modelling approach. Our predictive models confirmed a priori hypotheses that drought and cold summers negatively affect juvenile recruitment in the RMP. The effects of long-term drought can be alleviated by short-term wet spring–summer months; however, the alleviation of long-term drought has a much greater positive effect on juvenile recruitment. The number of freezing days and snowpack during the summer months can also negatively affect recruitment, while spring snowpack has a positive effect. Breeding habitat, mediated through climate, is a limiting factor on population growth of sandhill cranes in the RMP, which could become more limiting with a changing climate (i.e. increased drought). These effects are likely not unique to cranes. The alteration of hydrological patterns and water levels by drought may impact many migratory, wetland nesting birds in the Rocky Mountains and beyond. Generalizable predictive models (trained by out-of-sample fit and based on ecological hypotheses) are needed by conservation and management decision-makers. Statistical regularization improves predictions and provides a general framework for fitting models with a large number of predictors, even those with collinearity, to simultaneously identify an optimal predictive model while conducting rigorous Bayesian model selection. Our framework is important for understanding population dynamics under a changing climate and has direct applications for making harvest and habitat management decisions.
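
    A minimal sketch of the regularization step described above, using cross-validated LASSO and ridge fits from scikit-learn on simulated, deliberately collinear climate predictors. The study's Bayesian LASSO/ridge formulation is richer; this shows only the shrinkage idea:

    ```python
    import numpy as np
    from sklearn.linear_model import LassoCV, RidgeCV

    rng = np.random.default_rng(4)
    n, p = 40, 12                                  # 40 years, 12 climate covariates
    X = rng.normal(size=(n, p))
    X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)   # deliberately collinear pair
    y = 0.5 * X[:, 0] - 0.3 * X[:, 2] + rng.normal(scale=0.5, size=n)  # toy recruitment

    lasso = LassoCV(cv=5).fit(X, y)                           # L1: sparse solution
    ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X, y)  # L2: shrunk solution

    print("LASSO kept predictors:", np.flatnonzero(lasso.coef_))
    print("ridge shrunk coefs:", np.round(ridge.coef_, 2))
    ```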

  12. Flexible parametric survival models built on age-specific antimüllerian hormone percentiles are better predictors of menopause.

    PubMed

    Ramezani Tehrani, Fahimeh; Mansournia, Mohammad Ali; Solaymani-Dodaran, Masoud; Steyerberg, Ewout; Azizi, Fereidoun

    2016-06-01

    This study aimed to improve existing prediction models for age at menopause. We identified all reproductive-aged women with regular menstrual cycles who met our eligibility criteria (n = 1,015) in the Tehran Lipid and Glucose Study, an ongoing population-based cohort study initiated in 1998. Participants were examined every 3 years and their reproductive histories were recorded. Blood levels of antimüllerian hormone (AMH) were measured at the time of recruitment. Age at menopause was estimated based on serum concentrations of AMH using flexible parametric survival models. The optimum model was selected according to the Akaike Information Criterion and the realness of the range of predicted median menopause age. We followed study participants for a median of 9.8 years, during which 277 women reached menopause, and found that a spline-based proportional odds model including age-specific AMH percentiles as the covariate performed well in terms of statistical criteria and provided the most clinically relevant and realistic predictions. The range of predicted median age at menopause for this model was 47.1 to 55.9 years. For those who reached menopause, the median of the absolute mean difference between actual and predicted age at menopause was 1.9 years (interquartile range 2.9). The model that includes age-specific AMH percentiles as the covariate on a proportional-odds scale meets all the statistical criteria for the best model and provides the most clinically relevant and realistic predictions of age at menopause for reproductive-aged women.

  13. Vaginal birth after caesarean section prediction models: a UK comparative observational study.

    PubMed

    Mone, Fionnuala; Harrity, Conor; Mackie, Adam; Segurado, Ricardo; Toner, Brenda; McCormick, Timothy R; Currie, Aoife; McAuliffe, Fionnuala M

    2015-10-01

    Primarily, to assess the performance of three statistical models in predicting successful vaginal birth in patients attempting a trial of labour after one previous lower segment caesarean section (TOLAC). The statistically most reliable models were subsequently subjected to validation testing in a local antenatal population. A retrospective observational study was performed with study data collected from the Northern Ireland Maternity Service Database (NIMATs). The study population included all women who underwent a TOLAC (n=385) from 2010 to 2012 in a regional UK obstetric unit. Area under the curve (AUC) and correlation analyses were performed. Of the three prediction models evaluated, AUC calculations for the Smith et al., Grobman et al., and Troyer and Parisi models were 0.74, 0.72 and 0.65, respectively. Using the Smith et al. model, 52% of women had a low risk of caesarean section (CS) (predicted VBAC >72%) and 20% had a high risk of CS (predicted VBAC <60%), of whom 20% and 63%, respectively, had delivery by CS. The fit between observed and predicted outcome in this study cohort was greatest using the Smith et al. and Grobman et al. models (Chi-square test, p=0.228 and 0.904), validating both within the population. The Smith et al. and Grobman et al. models could potentially be utilized within the UK to provide women with an informed choice when deciding on mode of delivery after a previous CS. Crown Copyright © 2015. Published by Elsevier Ireland Ltd. All rights reserved.

  14. Modeling Statistics of Fish Patchiness and Predicting Associated Influence on Statistics of Acoustic Echoes

    DTIC Science & Technology

    2014-09-30

    were compared with 3-D multi-beam data collected by Paramo and Gerlotto. The data were consistent with the Anderson model in that both the data and...column of a random, oceanic waveguide," J. Acoust. Soc. Am., DOI 10.1121/1.4881925 [published, refereed] Stanton, T.K., Bhatia, S., J. Paramo, and F

  15. Using Predictive Uncertainty Analysis to Assess Hydrologic Model Performance for a Watershed in Oregon

    NASA Astrophysics Data System (ADS)

    Brannan, K. M.; Somor, A.

    2016-12-01

    A variety of statistics are used to assess watershed model performance, but these statistics do not directly answer the question: what is the uncertainty of my prediction? Understanding predictive uncertainty is important when using a watershed model to develop a Total Maximum Daily Load (TMDL). TMDLs are a key component of the US Clean Water Act and specify the amount of a pollutant that can enter a waterbody when the waterbody meets water quality criteria. TMDL developers use watershed models to estimate pollutant loads from nonpoint sources of pollution. We are developing a TMDL for bacteria impairments in a watershed in the Coastal Range of Oregon. We set up an HSPF model of the watershed and used the calibration software PEST to estimate HSPF hydrologic parameters and then to perform predictive uncertainty analysis of stream flow. We used Monte-Carlo simulation to run the model with 1,000 different parameter sets and assess predictive uncertainty. In order to reduce the chance of specious parameter sets, we accounted for the relationships among parameter values by using mathematically-based regularization techniques and an estimate of the parameter covariance when generating random parameter sets. We used a novel approach to select flow data for predictive uncertainty analysis. We set aside flow data that occurred on days when bacteria samples were collected. We did not use these flows in the estimation of the model parameters. We calculated a percent uncertainty for each flow observation based on 1,000 model runs. We also used several methods to visualize results, with an emphasis on making the data accessible to both technical and general audiences. We will use the predictive uncertainty estimates in the next phase of our work, simulating bacteria fate and transport in the watershed.
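
    The Monte Carlo step can be summarized in a few lines: draw correlated parameter sets from the estimated parameter covariance, run the model for each, and report the spread of simulated flows. In the sketch below, a trivial linear function stands in for an HSPF run, and the calibrated values and covariance matrix are hypothetical:

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    theta_hat = np.array([0.35, 1.8, 0.06])     # calibrated parameter values
    cov = np.array([[0.0025, 0.001, 0.0],       # parameter covariance (e.g., from PEST)
                    [0.001,  0.04,  0.0],
                    [0.0,    0.0,   0.0004]])

    params = rng.multivariate_normal(theta_hat, cov, size=1000)  # 1,000 parameter sets

    def simulate_flow(p):
        """Stand-in for a watershed-model run; returns flow for one sample day."""
        return p[0] * 100 + p[1] * 10 + p[2] * 500

    flows = np.array([simulate_flow(p) for p in params])
    lo, hi = np.percentile(flows, [2.5, 97.5])
    pct_uncertainty = 100 * (hi - lo) / np.median(flows)
    print(f"95% band: [{lo:.1f}, {hi:.1f}], percent uncertainty: {pct_uncertainty:.0f}%")
    ```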

  16. A Statistical Skull Geometry Model for Children 0-3 Years Old

    PubMed Central

    Li, Zhigang; Park, Byoung-Keon; Liu, Weiguo; Zhang, Jinhuan; Reed, Matthew P.; Rupp, Jonathan D.; Hoff, Carrie N.; Hu, Jingwen

    2015-01-01

    Head injury is the leading cause of fatality and long-term disability for children. Pediatric heads change rapidly in both size and shape during growth, especially for children under 3 years old (YO). To accurately assess the head injury risks for children, it is necessary to understand the geometry of the pediatric head and how morphologic features influence injury causation within the 0–3 YO population. In this study, head CT scans from fifty-six 0–3 YO children were used to develop a statistical model of pediatric skull geometry. Geometric features important for injury prediction, including skull size and shape, skull thickness and suture width, along with their variations among the sample population, were quantified through a series of image and statistical analyses. The size and shape of the pediatric skull change significantly with age and head circumference. The skull thickness and suture width vary with age, head circumference and location, which will have important effects on skull stiffness and injury prediction. The statistical geometry model developed in this study can provide a geometrical basis for future development of child anthropomorphic test devices and pediatric head finite element models. PMID:25992998

  17. A statistical skull geometry model for children 0-3 years old.

    PubMed

    Li, Zhigang; Park, Byoung-Keon; Liu, Weiguo; Zhang, Jinhuan; Reed, Matthew P; Rupp, Jonathan D; Hoff, Carrie N; Hu, Jingwen

    2015-01-01

    Head injury is the leading cause of fatality and long-term disability for children. Pediatric heads change rapidly in both size and shape during growth, especially for children under 3 years old (YO). To accurately assess the head injury risks for children, it is necessary to understand the geometry of the pediatric head and how morphologic features influence injury causation within the 0-3 YO population. In this study, head CT scans from fifty-six 0-3 YO children were used to develop a statistical model of pediatric skull geometry. Geometric features important for injury prediction, including skull size and shape, skull thickness and suture width, along with their variations among the sample population, were quantified through a series of image and statistical analyses. The size and shape of the pediatric skull change significantly with age and head circumference. The skull thickness and suture width vary with age, head circumference and location, which will have important effects on skull stiffness and injury prediction. The statistical geometry model developed in this study can provide a geometrical basis for future development of child anthropomorphic test devices and pediatric head finite element models.

  18. Statistical ecology comes of age.

    PubMed

    Gimenez, Olivier; Buckland, Stephen T; Morgan, Byron J T; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-12-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1-4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data.

  19. Statistical ecology comes of age

    PubMed Central

    Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-01-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151

  20. Which Type of Risk Information to Use for Whom? Moderating Role of Outcome-Relevant Involvement in the Effects of Statistical and Exemplified Risk Information on Risk Perceptions.

    PubMed

    So, Jiyeon; Jeong, Se-Hoon; Hwang, Yoori

    2017-04-01

    The extant empirical research examining the effectiveness of statistical and exemplar-based health information is largely inconsistent. Under the premise that the inconsistency may be due to an unacknowledged moderator (O'Keefe, 2002), this study examined a moderating role of outcome-relevant involvement (Johnson & Eagly, 1989) in the effects of statistical and exemplified risk information on risk perception. Consistent with predictions based on elaboration likelihood model (Petty & Cacioppo, 1984), findings from an experiment (N = 237) concerning alcohol consumption risks showed that statistical risk information predicted risk perceptions of individuals with high, rather than low, involvement, while exemplified risk information predicted risk perceptions of those with low, rather than high, involvement. Moreover, statistical risk information contributed to negative attitude toward drinking via increased risk perception only for highly involved individuals, while exemplified risk information influenced the attitude through the same mechanism only for individuals with low involvement. Theoretical and practical implications for health risk communication are discussed.

  1. Multinomial Logistic Regression Predicted Probability Map To Visualize The Influence Of Socio-Economic Factors On Breast Cancer Occurrence in Southern Karnataka

    NASA Astrophysics Data System (ADS)

    Madhu, B.; Ashok, N. C.; Balasubramanian, S.

    2014-11-01

    Multinomial logistic regression analysis was used to develop a statistical model that can predict the probability of breast cancer occurrence in Southern Karnataka, using breast cancer occurrence data from 2007-2011. Independent socio-economic variables describing breast cancer occurrence, such as age, education, occupation, parity, type of family, health insurance coverage, residential locality and socioeconomic status, were obtained for each case. The models were developed as follows: i) spatial visualization of the urban-rural distribution of breast cancer cases obtained from the Bharat Hospital and Institute of Oncology; ii) socio-economic risk factors describing the breast cancer occurrences were compiled for each case; these data were then analysed using multinomial logistic regression analysis in SPSS statistical software, relations between the occurrence of breast cancer across socio-economic status and the influence of other socio-economic variables were evaluated, and multinomial logistic regression models were constructed; iii) the model that best predicted the occurrence of breast cancer was identified. This multivariate logistic regression model was entered into a geographic information system, and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka were created. This study demonstrates that multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer occurrence in Southern Karnataka.
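
    A hedged sketch of the central step — multinomial logistic regression of an occurrence category on socio-economic covariates, yielding predicted probabilities that can then be mapped in a GIS. The variables, codings, and effect sizes below are illustrative only:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(6)
    n = 300
    X = np.column_stack([
        rng.normal(50, 10, n),     # age
        rng.integers(0, 3, n),     # education level (0/1/2)
        rng.integers(0, 2, n),     # urban residence (0/1)
    ])
    # Toy category labels generated with some dependence on the covariates
    logits = np.column_stack([0.02 * X[:, 0], 0.8 * X[:, 1], 0.9 * X[:, 2]])
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    y = np.array([rng.choice(3, p=p) for p in probs])

    # With more than 2 classes, the default lbfgs solver fits a multinomial model
    model = LogisticRegression(max_iter=1000).fit(X, y)
    print(np.round(model.predict_proba([[45, 2, 1]]), 2))  # per-category probabilities
    ```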

  2. Towards the feasibility of using ultrasound to determine mechanical properties of tissues in a bioreactor.

    PubMed

    Mansour, Joseph M; Gu, Di-Win Marine; Chung, Chen-Yuan; Heebner, Joseph; Althans, Jake; Abdalian, Sarah; Schluchter, Mark D; Liu, Yiying; Welter, Jean F

    2014-10-01

    Our ultimate goal is to non-destructively evaluate mechanical properties of tissue-engineered (TE) cartilage using ultrasound (US). We used agarose gels as surrogates for TE cartilage. Previously, we showed that mechanical properties measured using conventional methods were related to those measured using US, which suggested a way to non-destructively predict mechanical properties of samples with known volume fractions. In this study, we sought to determine whether the mechanical properties of samples with unknown volume fractions could be predicted by US. Aggregate moduli were calculated for hydrogels as a function of speed of sound (SOS), based on concentration and density using a poroelastic model. The data were used to train a statistical model, which we then used to predict volume fractions and mechanical properties of unknown samples. Young's and storage moduli were measured mechanically. The statistical model generally predicted the Young's moduli in compression to within <10% of their mechanically measured value. We defined positive linear correlations between the aggregate modulus predicted from US and both the storage and Young's moduli determined from mechanical tests. Mechanical properties of hydrogels with unknown volume fractions can be predicted successfully from US measurements. This method has the potential to predict mechanical properties of TE cartilage non-destructively in a bioreactor.

  3. Towards the feasibility of using ultrasound to determine mechanical properties of tissues in a bioreactor

    PubMed Central

    Mansour, Joseph M.; Gu, Di-Win Marine; Chung, Chen-Yuan; Heebner, Joseph; Althans, Jake; Abdalian, Sarah; Schluchter, Mark D.; Liu, Yiying; Welter, Jean F.

    2016-01-01

    Introduction Our ultimate goal is to non-destructively evaluate mechanical properties of tissue-engineered (TE) cartilage using ultrasound (US). We used agarose gels as surrogates for TE cartilage. Previously, we showed that mechanical properties measured using conventional methods were related to those measured using US, which suggested a way to non-destructively predict mechanical properties of samples with known volume fractions. In this study, we sought to determine whether the mechanical properties of samples with unknown volume fractions could be predicted by US. Methods Aggregate moduli were calculated for hydrogels as a function of speed of sound (SOS), based on concentration and density using a poroelastic model. The data were used to train a statistical model, which we then used to predict volume fractions and mechanical properties of unknown samples. Young's and storage moduli were measured mechanically. Results The statistical model generally predicted the Young's moduli in compression to within <10% of their mechanically measured value. We defined positive linear correlations between the aggregate modulus predicted from US and both the storage and Young's moduli determined from mechanical tests. Conclusions Mechanical properties of hydrogels with unknown volume fractions can be predicted successfully from US measurements. This method has the potential to predict mechanical properties of TE cartilage non-destructively in a bioreactor. PMID:25092421

  4. Perceptual quality prediction on authentically distorted images using a bag of features approach

    PubMed Central

    Ghadiyaram, Deepti; Bovik, Alan C.

    2017-01-01

    Current top-performing blind perceptual image quality prediction models are generally trained on legacy databases of human quality opinion scores on synthetically distorted images. Therefore, they learn image features that effectively predict human visual quality judgments of inauthentic and usually isolated (single) distortions. However, real-world images usually contain complex composite mixtures of multiple distortions. We study the perceptually relevant natural scene statistics of such authentically distorted images in different color spaces and transform domains. We propose a “bag of feature maps” approach that avoids assumptions about the type of distortion(s) contained in an image and instead focuses on capturing consistencies—or departures therefrom—of the statistics of real-world images. Using a large database of authentically distorted images, human opinions of them, and bags of features computed on them, we train a regressor to conduct image quality prediction. We demonstrate the competence of the features toward improving automatic perceptual quality prediction by testing a learned algorithm using them on a benchmark legacy database as well as on a newly introduced distortion-realistic resource called the LIVE In the Wild Image Quality Challenge Database. We extensively evaluate the perceptual quality prediction model and algorithm and show that it achieves quality-prediction power better than that of other leading models. PMID:28129417

  5. Principal component analysis and neurocomputing-based models for total ozone concentration over different urban regions of India

    NASA Astrophysics Data System (ADS)

    Chattopadhyay, Goutami; Chattopadhyay, Surajit; Chakraborthy, Parthasarathi

    2012-07-01

    The present study deals with daily total ozone concentration time series over four metro cities of India, namely Kolkata, Mumbai, Chennai, and New Delhi, in a multivariate environment. Using the Kaiser-Meyer-Olkin measure, it is established that the data set under consideration is suitable for principal component analysis. Subsequently, by introducing a rotated component matrix for the principal components, the predictors suitable for generating an artificial neural network (ANN) for daily total ozone prediction are identified; the multicollinearity is removed in this way. ANN models in the form of multilayer perceptrons trained through backpropagation learning are generated for all of the study zones, and the model outcomes are assessed statistically. Based on various statistics, including Pearson correlation coefficients, Willmott's indices, percentage errors of prediction, and mean absolute errors, it is observed that the proposed ANN model generates very good predictions for Mumbai and Kolkata. The results are supported by the linearly distributed coordinates in the scatterplots.
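
    A hedged sketch of the pipeline described above — PCA to remove multicollinearity among predictors, followed by a backpropagation-trained multilayer perceptron — on simulated data; the predictor set and the ozone relationship are invented:

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(8)
    X = rng.normal(size=(365, 6))                    # 6 correlated met predictors
    X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=365)  # induce multicollinearity
    y = 280 + 12 * X[:, 0] - 7 * X[:, 1] + rng.normal(scale=3, size=365)  # ozone (DU)

    # PCA decorrelates the inputs before the MLP is trained by backpropagation
    model = make_pipeline(
        PCA(n_components=4),
        MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
    )
    model.fit(X[:300], y[:300])
    pred = model.predict(X[300:])
    print("Pearson r:", np.corrcoef(y[300:], pred)[0, 1].round(2))
    ```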

  6. Predicting the potential distribution of invasive exotic species using GIS and information-theoretic approaches: A case of ragweed (Ambrosia artemisiifolia L.) distribution in China

    USGS Publications Warehouse

    Hao, Chen; LiJun, Chen; Albright, Thomas P.

    2007-01-01

    Invasive exotic species pose a growing threat to the economy, public health, and ecological integrity of nations worldwide. Explaining and predicting the spatial distribution of invasive exotic species is of great importance to prevention and early warning efforts. We are investigating the potential distribution of invasive exotic species, the environmental factors that influence these distributions, and the ability to predict them using statistical and information-theoretic approaches. For some species, detailed presence/absence occurrence data are available, allowing the use of a variety of standard statistical techniques. However, for most species, absence data are not available. Presented with the challenge of developing a model based on presence-only information, we developed an improved logistic regression approach using Information Theory and Frequency Statistics to produce a relative suitability map. This paper generated a variety of distributions of ragweed (Ambrosia artemisiifolia L.) from logistic regression models applied to herbarium specimen location data and a suite of GIS layers including climatic, topographic, and land cover information. Our logistic regression model was based on Akaike's Information Criterion (AIC) from a suite of ecologically reasonable predictor variables. Based on the results we provided a new Frequency Statistical method to compartmentalize habitat-suitability in the native range. Finally, we used the model and the compartmentalized criterion developed in native ranges to "project" a potential distribution onto the exotic ranges to build habitat-suitability maps. © Science in China Press 2007.

  7. Reliability formulation for the strength and fire endurance of glued-laminated beams

    Treesearch

    D. A. Bender

    A model was developed for predicting the statistical distribution of glued-laminated beam strength and stiffness under normal temperature conditions using available long span modulus of elasticity data, end joint tension test data, and tensile strength data for laminating-grade lumber. The beam strength model predictions compared favorably with test data for glued-...

  8. Kalman filter to update forest cover estimates

    Treesearch

    Raymond L. Czaplewski

    1990-01-01

    The Kalman filter is a statistical estimator that combines a time-series of independent estimates, using a prediction model that describes expected changes in the state of a system over time. An expensive inventory can be updated using model predictions that are adjusted with more recent, but less expensive and precise, monitoring data. The concepts of the Kalman...
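
    A minimal scalar sketch of that update cycle, with invented forest-cover numbers: the change model propagates the last (expensive) inventory estimate, and the cheaper monitoring estimate is blended in with a weight set by the two variances:

        def kalman_update(x_prev, P_prev, z, R, growth=1.0, Q=0.0):
            """One predict/update cycle combining a model-propagated estimate
            with a newer, noisier monitoring estimate z (variance R)."""
            # Prediction: expected change in forest cover, plus model noise Q.
            x_pred = growth * x_prev
            P_pred = growth**2 * P_prev + Q
            # Update: the Kalman gain weights prediction vs. data by their variances.
            K = P_pred / (P_pred + R)
            return x_pred + K * (z - x_pred), (1 - K) * P_pred

        # Baseline inventory: 1000 kha, variance 25; later monitoring
        # estimate: 960 kha, variance 100 (all values invented).
        print(kalman_update(1000.0, 25.0, z=960.0, R=100.0, growth=0.99, Q=4.0))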

  9. Forecasting Performance of Grey Prediction for Education Expenditure and School Enrollment

    ERIC Educational Resources Information Center

    Tang, Hui-Wen Vivian; Yin, Mu-Shang

    2012-01-01

    GM(1,1) and GM(1,1) rolling models derived from grey system theory were estimated using time-series data from projection studies by National Center for Education Statistics (NCES). An out-of-sample forecasting competition between the two grey prediction models and exponential smoothing used by NCES was conducted for education expenditure and…
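
    For reference, a compact GM(1,1) implementation under the usual grey-system formulation (the NCES series are not reproduced here; the input numbers below are invented):

        import numpy as np

        def gm11_forecast(x0, steps=1):
            """Grey GM(1,1): fit dx1/dt + a*x1 = b on the accumulated series
            x1 = cumsum(x0), then difference the fitted curve back to x0."""
            x0 = np.asarray(x0, dtype=float)
            x1 = np.cumsum(x0)
            z1 = 0.5 * (x1[1:] + x1[:-1])      # background (mean-generated) sequence
            B = np.column_stack([-z1, np.ones_like(z1)])
            a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
            k = np.arange(1, len(x0) + steps)
            x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
            x0_hat = np.diff(np.concatenate([[x0[0]], x1_hat]))
            return x0_hat[-steps:]

        # Invented expenditure-like series; a "rolling" variant would refit
        # on a sliding window of the most recent observations.
        print(gm11_forecast([560, 603, 647, 695, 742], steps=2))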

  10. Prediction of five-year all-cause mortality in Chinese patients with type 2 diabetes mellitus - A population-based retrospective cohort study.

    PubMed

    Wan, Eric Yuk Fai; Fong, Daniel Yee Tak; Fung, Colman Siu Cheung; Yu, Esther Yee Tak; Chin, Weng Yee; Chan, Anca Ka Chun; Lam, Cindy Lo Kuen

    2017-06-01

    This study aimed to develop and validate an all-cause mortality risk prediction model for Chinese primary care patients with type 2 diabetes mellitus (T2DM) in Hong Kong. A population-based retrospective cohort study was conducted on 132,462 Chinese patients who had received public primary care services during 2010. Each gender sample was randomly split on a 2:1 basis into derivation and validation cohorts and was followed up for a median period of 5 years. Gender-specific mortality risk prediction models showing the interaction effect between predictors and age were derived using Cox proportional hazards regression with a forward stepwise approach. Developed models were compared with pre-existing models by Harrell's C-statistic and calibration plots using the validation cohort. Common predictors of increased mortality risk in both genders included: age; smoking habit; diabetes duration; use of anti-hypertensive agents, insulin and lipid-lowering drugs; body mass index; hemoglobin A1c; systolic blood pressure (BP); total cholesterol to high-density lipoprotein-cholesterol ratio; urine albumin to creatinine ratio (urine ACR); and estimated glomerular filtration rate (eGFR). The prediction models showed better discrimination, with Harrell's C-statistics of 0.768 (males) and 0.782 (females), and better calibration from the plots than previously established models. Our newly developed gender-specific models provide a more accurate predicted 5-year mortality risk for Chinese diabetic patients than other established models. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Data-adaptive Harmonic Decomposition and Real-time Prediction of Arctic Sea Ice Extent

    NASA Astrophysics Data System (ADS)

    Kondrashov, Dmitri; Chekroun, Mickael; Ghil, Michael

    2017-04-01

    Decline in the Arctic sea ice extent (SIE) has profound socio-economic implications and is a focus of active scientific research. Of particular interest is prediction of SIE on subseasonal time scales, i.e., from early summer into fall, when sea ice coverage in the Arctic reaches its minimum. However, subseasonal forecasting of SIE is very challenging due to the high variability of the ocean and atmosphere over the Arctic in summer, as well as the shortness of observational records and the inadequacies of physics-based models in simulating sea-ice dynamics. The Sea Ice Outlook (SIO) by the Sea Ice Prediction Network (SIPN, http://www.arcus.org/sipn) is a collaborative effort to facilitate and improve subseasonal prediction of September SIE by physics-based and data-driven statistical models. Data-adaptive Harmonic Decomposition (DAH) and Multilayer Stuart-Landau Model (MSLM) techniques [Chekroun and Kondrashov, 2017] have been successfully applied to nonlinear stochastic modeling, as well as retrospective and real-time forecasting, of the Multisensor Analyzed Sea Ice Extent (MASIE) dataset in four key Arctic regions. In particular, DAH-MSLM predictions outperformed most statistical and physics-based models in real-time 2016 SIO submissions. The key success factors are associated with the ability of DAH to disentangle the complex regional dynamics of MASIE by data-adaptive harmonic spatio-temporal patterns that reduce the data-driven modeling effort to elemental MSLMs stacked per frequency, with a fixed and small number of model coefficients to estimate.

  12. Predicting September sea ice: Ensemble skill of the SEARCH Sea Ice Outlook 2008-2013

    NASA Astrophysics Data System (ADS)

    Stroeve, Julienne; Hamilton, Lawrence C.; Bitz, Cecilia M.; Blanchard-Wrigglesworth, Edward

    2014-04-01

    Since 2008, the Study of Environmental Arctic Change Sea Ice Outlook has solicited predictions of September sea-ice extent from the Arctic research community. Individuals and teams employ a variety of modeling, statistical, and heuristic approaches to make these predictions. Viewed as monthly ensembles, each with one or two dozen individual predictions, they display a bimodal pattern of success. In years when observed ice extent is near its trend, the median predictions tend to be accurate. In years when the observed extent is anomalous, the median and most individual predictions are less accurate. Statistical analysis suggests that year-to-year variability, rather than methods, dominates the variation in ensemble prediction success. Furthermore, ensemble predictions do not improve as the season evolves. We consider the role of initial ice, atmosphere, and ocean conditions, and of summer storms and weather, in contributing to the challenge of sea-ice prediction.

  13. Fast mean and variance computation of the diffuse sound transmission through finite-sized thick and layered wall and floor systems

    NASA Astrophysics Data System (ADS)

    Decraene, Carolina; Dijckmans, Arne; Reynders, Edwin P. B.

    2018-05-01

    A method is developed for computing the mean and variance of the diffuse field sound transmission loss of finite-sized layered wall and floor systems that consist of solid, fluid and/or poroelastic layers. This is achieved by coupling a transfer matrix model of the wall or floor to statistical energy analysis subsystem models of the adjacent room volumes. The modal behavior of the wall is approximately accounted for by projecting the wall displacement onto a set of sinusoidal lateral basis functions. This hybrid modal transfer matrix-statistical energy analysis method is validated on multiple wall systems: a thin steel plate, a polymethyl methacrylate panel, a thick brick wall, a sandwich panel, a double-leaf wall with poro-elastic material in the cavity, and a double-glazing unit. The predictions are compared with experimental data and with results obtained using alternative prediction methods, such as the transfer matrix method with spatial windowing, the hybrid wave based-transfer matrix method, and the hybrid finite element-statistical energy analysis method. These comparisons confirm the prediction accuracy of the proposed method and its computational efficiency relative to the conventional hybrid finite element-statistical energy analysis method.

  14. STATISTICAL METHODOLOGY FOR ESTIMATING PARAMETERS IN PBPK/PD MODELS

    EPA Science Inventory

    PBPK/PD models are large dynamic models that predict tissue concentration and biological effects of a toxicant. Before PBPK/PD models can be used in risk assessments in the arena of toxicological hypothesis testing, models allow the consequences of alternative mechanistic hypothes...

  15. Syndromic surveillance models using Web data: the case of scarlet fever in the UK.

    PubMed

    Samaras, Loukas; García-Barriocanal, Elena; Sicilia, Miguel-Angel

    2012-03-01

    Recent research has shown the potential of Web queries as a source for syndromic surveillance, and existing studies show that these queries can be used as a basis for estimating and predicting the development of a syndromic disease, such as influenza, using log-linear (logit) statistical models. Two alternative models are applied to the relationship between cases and Web queries in this paper. We examine the applicability of statistical methods to relate search engine queries to scarlet fever cases in the UK, taking advantage of tools to acquire the appropriate data from Google, and using an alternative statistical method based on gamma distributions. The results show that, using logit models, the Pearson correlation coefficient between Web queries and the data obtained from the official agencies must be over 0.90; otherwise, the prediction of the peak and the spread of the distributions shows significant deviations. In this paper, we describe the gamma distribution model and show that we can obtain better results in all cases using gamma transformations, especially in those with a smaller correlation coefficient.
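
    The abstract does not spell out the exact gamma transformation, so the following is only a hedged illustration of one common variant: fit a scaled gamma density to the weekly case curve and read the predicted peak from the fitted mode (all data below are synthetic):

        import numpy as np
        from scipy.optimize import curve_fit
        from scipy.stats import gamma

        def scaled_gamma(t, amp, shape, scale):
            return amp * gamma.pdf(t, a=shape, scale=scale)

        weeks = np.arange(1, 31)
        cases = scaled_gamma(weeks, 900, 6, 3) \
                + np.random.default_rng(2).normal(0, 5, 30)   # synthetic case counts

        (amp, shape, scale), _ = curve_fit(scaled_gamma, weeks, cases, p0=[800, 5, 4])
        print("predicted peak week:", (shape - 1) * scale)    # mode of the gamma density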

  16. QSAR models for anti-malarial activity of 4-aminoquinolines.

    PubMed

    Masand, Vijay H; Toropov, Andrey A; Toropova, Alla P; Mahajan, Devidas T

    2014-03-01

    In the present study, predictive quantitative structure-activity relationship (QSAR) models for the anti-malarial activity of 4-aminoquinolines have been developed. CORAL, which is freely available on the internet (http://www.insilico.eu/coral), was used as the QSAR analysis tool to establish statistically robust QSAR models of the anti-malarial activity of 4-aminoquinolines. Six random splits into a visible subsystem for training and an invisible subsystem for validation were examined. Statistical quality varies among these splits, but in all cases the statistical quality of prediction for anti-malarial activity was quite good. The optimal SMILES-based descriptor was used to derive the single-descriptor QSAR model for a data set of 112 aminoquinolines. All the splits had r² > 0.85 and r² > 0.78 for the subtraining and validation sets, respectively. The three-parameter multilinear regression (MLR) QSAR model has Q² = 0.83, R² = 0.84 and F = 190.39. The anti-malarial activity has a strong correlation with the presence/absence of nitrogen and oxygen at a topological distance of six.

  17. Model variations in predicting incidence of Plasmodium falciparum malaria using 1998-2007 morbidity and meteorological data from south Ethiopia

    PubMed Central

    2010-01-01

    Background Malaria transmission is complex and is believed to be associated with local climate changes. However, simple attempts to extrapolate malaria incidence rates from averaged regional meteorological conditions have proven unsuccessful. Therefore, the objective of this study was to determine if variations in specific meteorological factors are able to consistently predict P. falciparum malaria incidence at different locations in south Ethiopia. Methods Retrospective data from 42 locations were collected, including P. falciparum malaria incidence for the period 1998-2007 and meteorological variables such as monthly rainfall (all locations), temperature (17 locations), and relative humidity (three locations). Thirty-five data sets qualified for the analysis. The Ljung-Box Q statistic was used for model diagnosis, and R squared or stationary R squared was taken as the goodness-of-fit measure. Time series modelling was carried out using transfer function (TF) models, and univariate auto-regressive integrated moving average (ARIMA) models when there was no significant meteorological predictor variable. Results Of 35 models, five were discarded because of significant values of the Ljung-Box Q statistic. Past P. falciparum malaria incidence alone (17 locations) or coupled with meteorological variables (four locations) was able to predict P. falciparum malaria incidence within statistical significance. All seasonal ARIMA orders were from locations at altitudes above 1742 m. Monthly rainfall and minimum and maximum temperature were able to predict incidence at four, five and two locations, respectively. In contrast, relative humidity was not able to predict P. falciparum malaria incidence. The R squared values for the models ranged from 16% to 97%, with the exception of one model which had a negative value. Models with seasonal ARIMA orders were found to perform better. However, the models for predicting P. falciparum malaria incidence varied from location to location, and among lagged effects, data transformation forms, and ARIMA and TF orders. Conclusions This study describes P. falciparum malaria incidence models linked with meteorological data. Variability in the models was principally attributed to regional differences, and a single model was not found that fits all locations. Past P. falciparum malaria incidence appeared to be a superior predictor to meteorology. Future efforts in malaria modelling may benefit from the inclusion of non-meteorological factors. PMID:20553590

  18. Developing and Testing a Model to Predict Outcomes of Organizational Change

    PubMed Central

    Gustafson, David H; Sainfort, François; Eichler, Mary; Adams, Laura; Bisognano, Maureen; Steudel, Harold

    2003-01-01

    Objective To test the effectiveness of a Bayesian model employing subjective probability estimates for predicting success and failure of health care improvement projects. Data Sources Experts' subjective assessment data for model development and independent retrospective data on 221 healthcare improvement projects in the United States, Canada, and the Netherlands collected between 1996 and 2000 for validation. Methods A panel of theoretical and practical experts and literature in organizational change were used to identify factors predicting the outcome of improvement efforts. A Bayesian model was developed to estimate probability of successful change using subjective estimates of likelihood ratios and prior odds elicited from the panel of experts. A subsequent retrospective empirical analysis of change efforts in 198 health care organizations was performed to validate the model. Logistic regression and ROC analysis were used to evaluate the model's performance using three alternative definitions of success. Data Collection For the model development, experts' subjective assessments were elicited using an integrative group process. For the validation study, a staff person intimately involved in each improvement project responded to a written survey asking questions about model factors and project outcomes. Results Logistic regression chi-square statistics and areas under the ROC curve demonstrated a high level of model performance in predicting success. Chi-square statistics were significant at the 0.001 level and areas under the ROC curve were greater than 0.84. Conclusions A subjective Bayesian model was effective in predicting the outcome of actual improvement projects. Additional prospective evaluations as well as testing the impact of this model as an intervention are warranted. PMID:12785571
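
    A minimal sketch of the subjective-Bayesian calculation described above: posterior odds are the expert-elicited prior odds multiplied by the likelihood ratios of the observed factors (a naive-Bayes-style combination; all numbers below are invented):

        def success_probability(prior_odds, likelihood_ratios):
            """Posterior probability of project success from prior odds and
            per-factor likelihood ratios elicited from an expert panel."""
            posterior_odds = prior_odds
            for lr in likelihood_ratios:
                posterior_odds *= lr
            return posterior_odds / (1.0 + posterior_odds)

        # Prior odds 1:1; three factors favoring success, one against.
        print(success_probability(1.0, [2.5, 1.8, 1.3, 0.6]))   # ~0.78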

  19. Statistical Modeling of Fire Occurrence Using Data from the Tōhoku, Japan Earthquake and Tsunami.

    PubMed

    Anderson, Dana; Davidson, Rachel A; Himoto, Keisuke; Scawthorn, Charles

    2016-02-01

    In this article, we develop statistical models to predict the number and geographic distribution of fires caused by earthquake ground motion and tsunami inundation in Japan. Using new, uniquely large, and consistent data sets from the 2011 Tōhoku earthquake and tsunami, we fitted three types of models-generalized linear models (GLMs), generalized additive models (GAMs), and boosted regression trees (BRTs). This is the first time the latter two have been used in this application. A simple conceptual framework guided identification of candidate covariates. Models were then compared based on their out-of-sample predictive power, goodness of fit to the data, ease of implementation, and relative importance of the framework concepts. For the ground motion data set, we recommend a Poisson GAM; for the tsunami data set, a negative binomial (NB) GLM or NB GAM. The best models generate out-of-sample predictions of the total number of ignitions in the region within one or two. Prefecture-level prediction errors average approximately three. All models demonstrate predictive power far superior to four from the literature that were also tested. A nonlinear relationship is apparent between ignitions and ground motion, so for GLMs, which assume a linear response-covariate relationship, instrumental intensity was the preferred ground motion covariate because it captures part of that nonlinearity. Measures of commercial exposure were preferred over measures of residential exposure for both ground motion and tsunami ignition models. This may vary in other regions, but nevertheless highlights the value of testing alternative measures for each concept. Models with the best predictive power included two or three covariates. © 2015 Society for Risk Analysis.
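
    A minimal sketch of a negative binomial GLM for ignition counts, with invented covariates standing in for instrumental intensity and commercial exposure (the paper's data and candidate covariates are not reproduced):

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(3)
        n = 200
        intensity = rng.uniform(4, 9, n)          # stand-in instrumental intensity
        exposure = rng.lognormal(0, 1, n)         # stand-in commercial exposure
        mu = np.exp(-6 + 0.7 * intensity + 0.3 * np.log(exposure))
        ignitions = rng.poisson(mu * rng.gamma(2, 0.5, n))   # overdispersed counts

        X = sm.add_constant(np.column_stack([intensity, np.log(exposure)]))
        nb_glm = sm.GLM(ignitions, X, family=sm.families.NegativeBinomial()).fit()
        print(nb_glm.params)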

  20. Modern modeling techniques had limited external validity in predicting mortality from traumatic brain injury.

    PubMed

    van der Ploeg, Tjeerd; Nieboer, Daan; Steyerberg, Ewout W

    2016-10-01

    Prediction of medical outcomes may potentially benefit from using modern statistical modeling techniques. We aimed to externally validate modeling strategies for the prediction of 6-month mortality of patients suffering from traumatic brain injury (TBI), with predictor sets of increasing complexity. We analyzed individual patient data from 15 different studies including 11,026 TBI patients. We consecutively considered a core set of predictors (age, motor score, and pupillary reactivity), an extended set with computed tomography scan characteristics, and a further extension with two laboratory measurements (glucose and hemoglobin). With each of these sets, we predicted 6-month mortality using default settings with five statistical modeling techniques: logistic regression (LR), classification and regression trees, random forests (RFs), support vector machines (SVMs), and neural nets. For external validation, a model developed on one of the 15 data sets was applied to each of the 14 remaining sets. This process was repeated 15 times, for a total of 630 validations. The area under the receiver operating characteristic curve (AUC) was used to assess the discriminative ability of the models. For the most complex predictor set, the LR models performed best (median validated AUC, 0.757), followed by the RF and support vector machine models (median validated AUC, 0.735 and 0.732, respectively). With each predictor set, the classification and regression tree models showed poor performance (median validated AUC, <0.7). The variability in performance across the studies was smallest for the RF- and LR-based models (interquartile range for validated AUC values, 0.07 to 0.10). In the area of predicting mortality from TBI, nonlinear and nonadditive effects are not pronounced enough to make modern prediction methods beneficial. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Development of a funding, cost, and spending model for satellite projects

    NASA Technical Reports Server (NTRS)

    Johnson, Jesse P.

    1989-01-01

    The need for a predictive budget/funding model is obvious. The current models used by the Resource Analysis Office (RAO) predict the total costs of satellite projects. An effort was conducted to extend the modeling capabilities from total budget analysis to analysis of total budget and budget outlays over time. A statistically based, data-driven methodology was used to derive and develop the model. The budget data for the last 18 GSFC-sponsored satellite projects were analyzed and used to build a funding model describing the historical spending patterns. The raw data consisted of dollars spent in each specific year and their 1989-dollar equivalents. These data were converted to the standard format used by the RAO group and placed in a database. A simple statistical analysis was performed to calculate the gross statistics associated with project length and project cost, and the conditional statistics on project length and project cost. The modeling approach used is derived from the theory of embedded statistics, which states that properly analyzed data will produce the underlying generating function. The process of funding large-scale projects over extended periods of time is described by Life Cycle Cost Models (LCCMs), and the data were analyzed to find a model in the generic form of an LCCM. The model developed is based on a Weibull function whose parameters are found by both nonlinear optimization and nonlinear regression. In order to use this model, it is necessary to transform the problem from a dollar/time space to a percentage-of-total-budget/time space. This transformation is equivalent to moving to a probability space; by using the basic rules of probability, the validity of both the optimization and the regression steps is ensured. This statistically significant model is then integrated and inverted. The resulting output represents a project schedule which relates the amount of money spent to the percentage of project completion.
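
    A minimal sketch of the transformed fitting step: express outlays as the fraction of total budget spent versus the fraction of schedule elapsed, and fit a Weibull cumulative curve (the GSFC project data are not reproduced; the profile below is invented):

        import numpy as np
        from scipy.optimize import curve_fit

        def weibull_cdf(t, shape, scale):
            return 1.0 - np.exp(-(t / scale) ** shape)

        t = np.linspace(0.1, 1.0, 10)                       # schedule fraction
        spent = np.array([0.02, 0.07, 0.16, 0.30, 0.46,
                          0.62, 0.76, 0.87, 0.94, 0.98])    # invented outlay fractions

        (shape, scale), _ = curve_fit(weibull_cdf, t, spent, p0=[2.0, 0.5])
        # Invert the fitted curve: schedule fraction at which half the funds are spent.
        t50 = scale * (-np.log(0.5)) ** (1.0 / shape)
        print(f"shape={shape:.2f}, scale={scale:.2f}, 50% spend at t={t50:.2f}")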

  2. Filter Tuning Using the Chi-Squared Statistic

    NASA Technical Reports Server (NTRS)

    Lilly-Salkowski, Tyler B.

    2017-01-01

    This paper examines the use of the chi-squared statistic as a means of evaluating filter performance. The goal of the process is to characterize filter performance in the metric of covariance realism. The chi-squared statistic is calculated to determine the realism of a covariance, based on the prediction accuracy and the covariance values at a given point in time. Once calculated, it is the distribution of this statistic that provides insight into the accuracy of the covariance. The process of tuning an Extended Kalman Filter (EKF) for Aqua and Aura support is described, including examination of the measurement errors of available observation types and methods of dealing with potentially volatile atmospheric drag modeling. Predictive accuracy and the distribution of the chi-squared statistic, calculated from EKF solutions, are assessed.
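
    A minimal sketch of the covariance-realism statistic itself (the EKF and the Aqua/Aura data are not reproduced): for each epoch, the normalized error is formed from the prediction error and the filter covariance, and its sample distribution is compared against a chi-squared distribution with dof equal to the state dimension:

        import numpy as np
        from scipy.stats import chi2

        def chi_squared_stats(errors, covariances):
            """eps_k = e_k' C_k^{-1} e_k for each epoch; if the covariances are
            realistic, eps follows a chi-squared law with dof = state dimension."""
            return np.array([e @ np.linalg.solve(C, e)
                             for e, C in zip(errors, covariances)])

        rng = np.random.default_rng(4)
        C = np.diag([4.0, 1.0, 0.25])          # stand-in 3x3 position covariance
        errs = rng.multivariate_normal(np.zeros(3), C, size=500)
        eps = chi_squared_stats(errs, [C] * 500)
        print(eps.mean(), "vs expected", 3, "; 95th pct:", chi2.ppf(0.95, 3))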

  3. Statistical variation in progressive scrambling

    NASA Astrophysics Data System (ADS)

    Clark, Robert D.; Fox, Peter C.

    2004-07-01

    The two methods most often used to evaluate the robustness and predictivity of partial least squares (PLS) models are cross-validation and response randomization. Both methods may be overly optimistic for data sets that contain redundant observations, however. The kinds of perturbation analysis widely used for evaluating model stability in the context of ordinary least squares regression are only applicable when the descriptors are independent of each other and errors are independent and normally distributed; neither assumption holds for QSAR in general and for PLS in particular. Progressive scrambling is a novel, non-parametric approach to perturbing models in the response space in a way that does not disturb the underlying covariance structure of the data. Here, we introduce adjustments for two of the characteristic values produced by a progressive scrambling analysis - the deprecated predictivity ($Q_s^{*2}$) and the standard error of prediction ($\mathrm{SDEP}_s^*$) - that correct for the effect of the introduced perturbation. We also explore the statistical behavior of the adjusted values ($Q_0^{*2}$ and $\mathrm{SDEP}_0^*$) and of the sensitivity to perturbation ($\mathrm{d}q^2/\mathrm{d}r_{yy'}^2$). It is shown that the three statistics are all robust for stable PLS models, in terms of the stochastic component of their determination and of their variation due to sampling effects involved in training set selection.

  4. Estimation of trabecular bone parameters in children from multisequence MRI using texture-based regression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lekadir, Karim, E-mail: karim.lekadir@upf.edu; Hoogendoorn, Corné; Armitage, Paul

    Purpose: This paper presents a statistical approach for the prediction of trabecular bone parameters from low-resolution multisequence magnetic resonance imaging (MRI) in children, thus addressing the limitations of high-resolution modalities such as HR-pQCT, including the significant exposure of young patients to radiation and the limited applicability of such modalities to peripheral bones in vivo. Methods: A statistical predictive model is constructed from a database of MRI and HR-pQCT datasets to relate the low-resolution MRI appearance in the cancellous bone to the trabecular parameters extracted from the high-resolution images. The description of the MRI appearance is achieved between subjects by using a collection of feature descriptors, which describe the texture properties inside the cancellous bone and which are invariant to the geometry and size of the trabecular areas. The predictive model is built by fitting to the training data a nonlinear partial least squares regression between the input MRI features and the output trabecular parameters. Results: Detailed validation based on a sample of 96 datasets shows correlations >0.7 between the trabecular parameters predicted from low-resolution multisequence MRI based on the proposed statistical model and the values extracted from high-resolution HR-pQCT. Conclusions: The obtained results indicate the promise of the proposed predictive technique for the estimation of trabecular parameters in children from multisequence MRI, thus reducing the need for high-resolution radiation-based scans for a fragile population that is under development and growth.

  5. Comparison of in silico models for prediction of mutagenicity.

    PubMed

    Bakhtyari, Nazanin G; Raitano, Giuseppa; Benfenati, Emilio; Martin, Todd; Young, Douglas

    2013-01-01

    Using a dataset with more than 6000 compounds, the performance of eight quantitative structure-activity relationship (QSAR) models was evaluated: ACD/Tox Suite; Absorption, Distribution, Metabolism, Elimination, and Toxicity of chemical substances (ADMET) Predictor; Derek; Toxicity Estimation Software Tool (T.E.S.T.); TOxicity Prediction by Komputer Assisted Technology (TOPKAT); Toxtree; CAESAR; and SARpy (SAR in python). In general, the results showed a high level of performance. To obtain a realistic estimate of predictive ability, the results for chemicals inside and outside the training set of each model were considered. The effect of applicability domain tools (when available) on prediction accuracy was also evaluated. The predictive tools included QSAR models, knowledge-based systems, and a combination of both methods. Models based on statistical QSAR methods gave better results.

  6. Evaluating model accuracy for model-based reasoning

    NASA Technical Reports Server (NTRS)

    Chien, Steve; Roden, Joseph

    1992-01-01

    Described here is an approach to automatically assessing the accuracy of the various components of a model. In this approach, actual data from the operation of a target system are used to derive statistical measures that evaluate the prediction accuracy of various portions of the model. We describe how these statistical measures of model accuracy can be used in model-based reasoning for monitoring and design. We then describe the application of these techniques to the monitoring and design of the water recovery system of the Environmental Control and Life Support System (ECLSS) of Space Station Freedom.

  7. Predicting protein concentrations with ELISA microarray assays, monotonic splines and Monte Carlo simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daly, Don S.; Anderson, Kevin K.; White, Amanda M.

    Background: A microarray of enzyme-linked immunosorbent assays, or ELISA microarray, predicts simultaneously the concentrations of numerous proteins in a small sample. These predictions, however, are uncertain due to processing error and biological variability. Making sound biological inferences, as well as improving the ELISA microarray process, requires both concentration predictions and credible estimates of their errors. Methods: We present a statistical method based on monotonic spline statistical models, penalized constrained least squares fitting (PCLS) and Monte Carlo simulation (MC) to predict concentrations and estimate prediction errors in ELISA microarray. PCLS restrains the flexible spline to a fit of assay intensity that is a monotone function of protein concentration. With MC, both modeling and measurement errors are combined to estimate prediction error. The spline/PCLS/MC method is compared to a common method using simulated and real ELISA microarray data sets. Results: In contrast to the rigid logistic model, the flexible spline model gave credible fits in almost all test cases, including troublesome cases with left and/or right censoring, or other asymmetries. For the real data sets, 61% of the spline predictions were more accurate than their comparable logistic predictions, especially the spline predictions at the extremes of the prediction curve. The relative errors of 50% of comparable spline and logistic predictions differed by less than 20%. Monte Carlo simulation rendered acceptable asymmetric prediction intervals for both spline and logistic models, while propagation of error produced symmetric intervals that diverged unrealistically as the standard curves approached horizontal asymptotes. Conclusions: The spline/PCLS/MC method is a flexible, robust alternative to a logistic/NLS/propagation-of-error method for reliably predicting protein concentrations and estimating their errors. The spline method simplifies model selection and fitting, and reliably estimates believable prediction errors. For the 50% of the real data sets fit well by both methods, spline and logistic predictions are practically indistinguishable, varying in accuracy by less than 15%. The spline method may be useful when automated prediction across simultaneous assays of numerous proteins must be applied routinely with minimal user intervention.

  8. Human Thermal Model Evaluation Using the JSC Human Thermal Database

    NASA Technical Reports Server (NTRS)

    Bue, Grant; Makinen, Janice; Cognata, Thomas

    2012-01-01

    Human thermal modeling has considerable long-term utility to human space flight. Such models provide a tool to predict crew survivability in support of vehicle design and to evaluate crew response in untested space environments. It is to the benefit of any such model not only to collect relevant experimental data to correlate it against, but also to maintain an experimental standard or benchmark for future development in a readily and rapidly searchable, software-accessible format. The human thermal database project is intended to do just that: to collect relevant data from the literature and from experimentation, and to store the data in a database structure for immediate and future use as a benchmark against which to judge human thermal models, in identifying model strengths and weaknesses, to support model development and improve correlation, and to statistically quantify a model's predictive quality. The human thermal database developed at the Johnson Space Center (JSC) is intended to evaluate a set of widely used human thermal models. This set includes the Wissler human thermal model, a model that has been widely used to predict the human thermoregulatory response to a variety of cold and hot environments. These models are statistically compared to the current database, which contains experiments on human subjects, primarily in air, from a literature survey ranging between 1953 and 2004 and from a suited experiment recently performed by the authors, for a quantitative study of the relative strength and predictive quality of the models.

  9. A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study.

    PubMed

    AbdelRahman, Samir E; Zhang, Mingyuan; Bray, Bruce E; Kawamoto, Kensaku

    2014-05-27

    The aim of this study was to propose an analytical approach to develop high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and changing data over time. Our analytical approach involves three steps: pre-processing, systematic model development, and risk factor analysis. For pre-processing, variables that were absent in >50% of records were removed. Moreover, the dataset was divided into a validation dataset and derivation datasets, which were separated into three temporal subsets based on changes to the data over time. For systematic model development, using the different temporal datasets and the remaining explanatory variables, models were developed by combining the use of various (i) statistical analyses to explore the relationships between the validation and the derivation datasets; (ii) adjustment methods for handling missing values; (iii) classifiers; (iv) feature selection methods; and (v) discretization methods. We then selected the best derivation dataset and the models with the highest predictive performance. For risk factor analysis, factors in the highest-performing predictive models were analyzed and ranked using (i) statistical analyses of the best derivation dataset, (ii) feature rankers, and (iii) a newly developed algorithm to categorize risk factors as strong, regular, or weak. The analysis dataset consisted of 2,787 CHF hospitalizations at University of Utah Health Care from January 2003 to June 2013. In this study, we used the complete-case analysis and mean-based imputation adjustment methods; the wrapper subset feature selection method; and four ranking strategies based on the information gain, gain ratio, symmetrical uncertainty, and wrapper subset feature evaluators. The best-performing models resulted from the use of a complete-case analysis derivation dataset combined with the Class-Attribute Contingency Coefficient discretization method and a voting classifier which averaged the results of multinomial logistic regression and voting feature intervals classifiers. Of 42 final model risk factors, discharge disposition, discretized age, and indicators of anemia were the most significant. This model achieved a c-statistic of 86.8%. The proposed three-step analytical approach enhanced predictive model performance for CHF readmissions. It could potentially be leveraged to improve predictive model performance in other areas of clinical medicine.

  10. Reflexion on linear regression trip production modelling method for ensuring good model quality

    NASA Astrophysics Data System (ADS)

    Suprayitno, Hitapriya; Ratnasari, Vita

    2017-11-01

    Transport modelling is important. For certain cases, the conventional model still has to be used, and a good trip production model is then essential. A good model can only be obtained from a good sample. Two basic principles of good sampling are that the sample must be capable of representing the population characteristics and of producing an acceptable error at a given confidence level. These principles do not yet seem to be well understood and applied in trip production modelling. It is therefore necessary to investigate trip production modelling practice in Indonesia and to formulate a better modelling method for ensuring model quality. The results are as follows. Statistics provides a method, the confidence interval of the predicted value, for calculating the span of a prediction at a given confidence level for linear regression. Common modelling practice uses R² as the principal quality measure, while sampling practice varies and does not always conform to sampling principles. An experiment indicates that a small sample can already give an excellent R² value and that the sample composition can significantly change the model. Hence a good R² value does not, in fact, always mean good model quality. This leads to three basic ideas for ensuring good model quality: reformulating the quality measure, the calculation procedure, and the sampling method. A quality measure is defined as having both a good R² value and a good confidence interval of the predicted value. The calculation procedure must incorporate statistical calculation methods and the appropriate statistical tests. A good sampling method must use random, well-distributed, stratified sampling with a certain minimum number of samples. These three ideas need to be further developed and tested.
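
    A minimal sketch of the proposed quality measure on invented zonal data: statsmodels reports both R² and the confidence interval of the predicted value for a new observation:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(5)
        households = rng.uniform(50, 500, 60)                 # stand-in zone sizes
        trips = 1.4 * households + rng.normal(0, 40, 60)      # stand-in trip production

        ols = sm.OLS(trips, sm.add_constant(households)).fit()
        print("R2:", round(ols.rsquared, 3))

        # Confidence interval of the predicted value for a new zone of 300 households.
        x_new = np.array([[1.0, 300.0]])                      # [constant, households]
        frame = ols.get_prediction(x_new).summary_frame(alpha=0.05)
        print(frame[["mean", "obs_ci_lower", "obs_ci_upper"]])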

  11. Review of Factors, Methods, and Outcome Definition in Designing Opioid Abuse Predictive Models.

    PubMed

    Alzeer, Abdullah H; Jones, Josette; Bair, Matthew J

    2018-05-01

    Several opioid risk assessment tools are available to prescribers to evaluate opioid analgesic abuse among chronic pain patients. The objectives of this study were to 1) identify variables available in the literature to predict opioid abuse; 2) explore and compare the methods (population, database, and analysis) used to develop statistical models that predict opioid abuse; and 3) understand how outcomes were defined in each statistical model predicting opioid abuse. The OVID database was searched for this study. The search was limited to articles written in English and published from January 1990 to April 2016. This search generated 1,409 articles. Only seven studies, comprising nine models, met our inclusion-exclusion criteria. Across the nine models, we identified 75 distinct variables. Three studies used administrative claims data, and four studies used electronic health record data. The majority, four of seven articles (six of nine models), depended primarily on the presence or absence of opioid abuse or dependence (ICD-9 diagnosis code) to define opioid abuse. However, two articles used a predefined list of opioid-related aberrant behaviors. We identified variables used to predict opioid abuse from electronic health records and administrative data. Medication variables were the most recurrent variables in the articles reviewed (33 variables). Age and gender were the most consistent demographic variables in predicting opioid abuse. Overall, there was similarity in the sampling methods and inclusion/exclusion criteria (age, number of prescriptions, follow-up period, and data analysis methods). Future research utilizing unstructured data may increase the accuracy of opioid abuse models.

  12. Generalized Polynomial Chaos Based Uncertainty Quantification for Planning MRgLITT Procedures

    PubMed Central

    Fahrenholtz, S.; Stafford, R. J.; Maier, F.; Hazle, J. D.; Fuentes, D.

    2014-01-01

    Purpose A generalized polynomial chaos (gPC) method is used to incorporate constitutive parameter uncertainties within the Pennes representation of bioheat transfer phenomena. The stochastic temperature predictions of the mathematical model are critically evaluated against MR thermometry data for planning MR-guided Laser Induced Thermal Therapies (MRgLITT). Methods Pennes bioheat transfer model coupled with a diffusion theory approximation of laser tissue interaction was implemented as the underlying deterministic kernel. A probabilistic sensitivity study was used to identify parameters that provide the most variance in temperature output. Confidence intervals of the temperature predictions are compared to MR temperature imaging (MRTI) obtained during phantom and in vivo canine (n=4) MRgLITT experiments. The gPC predictions were quantitatively compared to MRTI data using probabilistic linear and temporal profiles as well as 2-D 60 °C isotherms. Results Within the range of physically meaningful constitutive values relevant to the ablative temperature regime of MRgLITT, the sensitivity study indicated that the optical parameters, particularly the anisotropy factor, created the most variance in the stochastic model's output temperature prediction. Further, within the statistical sense considered, a nonlinear model of the temperature and damage dependent perfusion, absorption, and scattering is captured within the confidence intervals of the linear gPC method. Multivariate stochastic model predictions using parameters with the dominant sensitivities show good agreement with experimental MRTI data. Conclusions Given parameter uncertainties and mathematical modeling approximations of the Pennes bioheat model, the statistical framework demonstrates conservative estimates of the therapeutic heating and has potential for use as a computational prediction tool for thermal therapy planning. PMID:23692295

  13. Prediction of Hydrologic Characteristics for Ungauged Catchments to Support Hydroecological Modeling

    NASA Astrophysics Data System (ADS)

    Bond, Nick R.; Kennard, Mark J.

    2017-11-01

    Hydrologic variability is a fundamental driver of ecological processes and species distribution patterns within river systems, yet the paucity of gauges in many catchments means that streamflow data are often unavailable for ecological survey sites. Filling this data gap is an important challenge in hydroecological research. To address this gap, we first test the ability to spatially extrapolate hydrologic metrics calculated from gauged streamflow data to ungauged sites as a function of stream distance and catchment area. Second, we examine the ability of statistical models to predict flow regime metrics based on climate and catchment physiographic variables. Our assessment focused on Australia's largest catchment, the Murray-Darling Basin (MDB). We found that hydrologic metrics were predictable only between sites within ~25 km of one another. Beyond this, correlations between sites declined quickly. We found fewer than 40% of fish survey sites from a recent basin-wide monitoring program (n = 777 sites) to fall within this 25 km range, thereby greatly limiting the ability to utilize gauge data for direct spatial transposition of hydrologic metrics to biological survey sites. In contrast, statistical model-based transposition proved effective in predicting ecologically relevant aspects of the flow regime (including metrics describing central tendency, high and low flows, intermittency, seasonality, and variability) across the entire gauge network (median R2 ~ 0.54, range 0.39-0.94). Modeled hydrologic metrics thus offer a useful alternative to empirical data when examining biological survey data from ungauged sites. More widespread use of these statistical tools and modeled metrics could expand our understanding of flow-ecology relationships.

  14. Modelling the hydraulic conductivity of porous media using physical-statistical model

    NASA Astrophysics Data System (ADS)

    Usowicz, B.; Usowicz, L. B.; Lipiec, J.

    2009-04-01

    Soils and other porous media can be represented by a pattern (net) of more or less cylindrical, interconnected channels. The capillary radius r can represent an elementary capillary formed between soil particles in one case, and a mean hydrodynamic radius in another. When we view a porous medium as a net of interconnected capillaries, we can apply a statistical approach to the description of liquid or gas flow. The soil phase is included in the porous medium, and its configuration is decisive for the pore distribution in this medium and hence conditions the course of its water retention curve. In this work, a method of estimating the hydraulic conductivity of porous media based on the physical-statistical model proposed by B. Usowicz is presented. The physical-statistical model considers the pore space as a capillary net. The net of capillary connections is represented by parallel connections of hydraulic resistors within a layer and serial connections between layers. The polynomial distribution is used in this model to determine the probability of occurrence of a given capillary configuration. The model was calibrated using the measured water retention curve and two values of hydraulic conductivity (saturated and unsaturated), and the model parameters were determined. The model was then used to predict hydraulic conductivity as a function of soil water content, K(theta), and was validated by comparing the measured and predicted K data for various soils and other porous media (e.g., sandstone). Agreement between measured and predicted data was good, as indicated by R2 values (>0.9), confirming that the random variables used for the calculations and the model parameters were chosen and selected correctly. The study was funded in part by the Polish Ministry of Science and Higher Education (Grant No. N305 046 31/1707).

  15. Predicting dire outcomes of patients with community acquired pneumonia.

    PubMed

    Cooper, Gregory F; Abraham, Vijoy; Aliferis, Constantin F; Aronis, John M; Buchanan, Bruce G; Caruana, Richard; Fine, Michael J; Janosky, Janine E; Livingston, Gary; Mitchell, Tom; Monti, Stefano; Spirtes, Peter

    2005-10-01

    Community-acquired pneumonia (CAP) is an important clinical condition with regard to patient mortality, patient morbidity, and healthcare resource utilization. The assessment of the likely clinical course of a CAP patient can significantly influence decision making about whether to treat the patient as an inpatient or as an outpatient. That decision can in turn influence resource utilization, as well as patient well-being. Predicting dire outcomes, such as mortality or severe clinical complications, is a particularly important component in assessing the clinical course of patients. We used a training set of 1601 CAP patient cases to construct 11 statistical and machine-learning models that predict dire outcomes. We evaluated the resulting models on 686 additional CAP patient cases. The primary goal was not to compare these learning algorithms as a study end point; rather, it was to develop the best model possible to predict dire outcomes. A special version of an artificial neural network (NN) model predicted dire outcomes best. Using the 686 test cases, we estimated the expected healthcare quality and cost impact of applying the NN model in practice. The particular quantitative results of this analysis are based on a number of assumptions that we make explicit; they will require further study and validation. Nonetheless, the general implication of the analysis seems robust, namely, that even small improvements in predictive performance for prevalent and costly diseases, such as CAP, are likely to result in significant improvements in the quality and efficiency of healthcare delivery. Therefore, seeking models with the highest possible level of predictive performance is important. Consequently, seeking ever better machine-learning and statistical modeling methods is of great practical significance.

  16. Deep Flare Net (DeFN) Model for Solar Flare Prediction

    NASA Astrophysics Data System (ADS)

    Nishizuka, N.; Sugiura, K.; Kubo, Y.; Den, M.; Ishii, M.

    2018-05-01

    We developed a solar flare prediction model using a deep neural network (DNN) named Deep Flare Net (DeFN). This model can calculate the probability of flares occurring in the following 24 hr in each active region, which is used to determine the most likely maximum classes of flares via a binary classification (e.g., ≥M class versus <M class).

  17. Identified state-space prediction model for aero-optical wavefronts

    NASA Astrophysics Data System (ADS)

    Faghihi, Azin; Tesch, Jonathan; Gibson, Steve

    2013-07-01

    A state-space disturbance model and associated prediction filter for aero-optical wavefronts are described. The model is computed by system identification from a sequence of wavefronts measured in an airborne laboratory. Estimates of the statistics and flow velocity of the wavefront data are shown and can be computed from the matrices in the state-space model without returning to the original data. Numerical results compare velocity values and power spectra computed from the identified state-space model with those computed from the aero-optical data.

  18. Identifying trends in climate: an application to the cenozoic

    NASA Astrophysics Data System (ADS)

    Richards, Gordon R.

    1998-05-01

    The recent literature on trending in climate has raised several issues: whether trends should be modeled as deterministic or stochastic, whether trends are nonlinear, and the relative merits of statistical models versus models based on physics. This article models trending since the late Cretaceous. This 68-million-year interval is selected because the reliability of tests for trending depends critically on the length of time spanned by the data. Two main hypotheses are tested: that the trend has been caused primarily by CO2 forcing, and that it reflects a variety of forcing factors which can be approximated by statistical methods. The CO2 data are obtained from model simulations. Several widely used statistical models are found to be inadequate. ARIMA methods parameterize too much of the short-term variation and do not identify low-frequency movements. Further, the unit root in the ARIMA process does not predict the long-term path of temperature. Spectral methods also have little ability to predict temperature at long horizons. Instead, the statistical trend is estimated using a nonlinear smoothing filter. Both of these paradigms make it possible to model climate as a cointegrated process, in which temperature can wander quite far from the trend path in the intermediate term, but converges back over longer horizons. Comparing the forecasting properties of the two trend models demonstrates that the optimal forecasting model includes CO2 forcing and a parametric representation of the nonlinear variability in climate.

  19. Statistical analysis and modeling of intermittent transport events in the tokamak scrape-off layer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Johan, E-mail: anderson.johan@gmail.com; Halpern, Federico D.; Ricci, Paolo

    The turbulence observed in the scrape-off layer of a tokamak is often characterized by intermittent events of a bursty nature, a feature which raises concerns about the prediction of heat loads on the physical boundaries of the device. It thus appears necessary to delve into the statistical properties of turbulent physical fields such as density, electrostatic potential, and temperature, focusing on the mathematical expression of the tails of the probability distribution functions. The method followed here is to generate statistical information from time traces of the plasma density stemming from Braginskii-type fluid simulations and check this against a first-principles theoretical model. The analysis of the numerical simulations indicates that the probability distribution function of the intermittent process contains strong exponential tails, as predicted by the analytical theory.

  20. Forecasting influenza in Hong Kong with Google search queries and statistical model fusion.

    PubMed

    Xu, Qinneng; Gel, Yulia R; Ramirez Ramirez, L Leticia; Nezafati, Kusha; Zhang, Qingpeng; Tsui, Kwok-Leung

    2017-01-01

    The objective of this study is to investigate the predictive utility of online social media and web search queries, particularly Google search data, to forecast new cases of influenza-like illness (ILI) in general outpatient clinics (GOPC) in Hong Kong. To mitigate the impact of sensitivity to self-excitement (i.e., fickle media interest) and other artifacts of online social media data, our approach fuses multiple offline and online data sources. Four individual models: generalized linear model (GLM), least absolute shrinkage and selection operator (LASSO), autoregressive integrated moving average (ARIMA), and deep learning (DL) with feedforward neural networks (FNN), are employed to forecast ILI-GOPC both one week and two weeks in advance. The covariates include Google search queries, meteorological data, and previously recorded offline ILI. To our knowledge, this is the first study that introduces deep learning methodology into surveillance of infectious diseases and investigates its predictive utility. Furthermore, to exploit the strengths of each individual forecasting model, we use statistical model fusion, via Bayesian model averaging (BMA), which allows a systematic integration of multiple forecast scenarios. For each model, an adaptive approach is used to capture the recent relationship between ILI and covariates. DL with FNN appears to deliver the most competitive predictive performance among the four individual models considered. Combining all four models in a comprehensive BMA framework further improves such predictive evaluation metrics as root mean squared error (RMSE) and mean absolute predictive error (MAPE). Nevertheless, DL with FNN remains the preferred method for predicting the locations of influenza peaks. The proposed approach can be viewed as a feasible alternative for forecasting ILI in Hong Kong or other countries where ILI has no constant seasonal trend and influenza data resources are limited. The proposed methodology is easily tractable and computationally efficient.
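
    A minimal sketch of the fusion step under a common BIC-based approximation to BMA posterior model probabilities (the four fitted models and the numbers below are stand-ins; the paper's actual BMA weighting may differ):

        import numpy as np

        def bma_combine(forecasts, bics):
            """Weight point forecasts by exp(-BIC/2), a standard approximation
            to posterior model probabilities, then average."""
            bics = np.asarray(bics, dtype=float)
            w = np.exp(-0.5 * (bics - bics.min()))   # shift by min for stability
            w /= w.sum()
            return w, float(np.dot(w, forecasts))

        # Invented one-week-ahead ILI forecasts from GLM, LASSO, ARIMA and DL,
        # with invented in-sample BICs.
        weights, fused = bma_combine([121.0, 118.5, 125.2, 116.8],
                                     [402.1, 399.7, 405.3, 397.2])
        print(weights.round(3), fused)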

  1. US EPA 2012 Air Quality Fused Surface for the Conterminous U.S. Map Service

    EPA Pesticide Factsheets

    This web service contains a polygon layer that depicts fused air quality predictions for 2012 for census tracts in the conterminous United States. Fused air quality predictions (for ozone and PM2.5) are modeled using a Bayesian space-time downscaling fusion model approach described in a series of three published journal papers: 1) Berrocal, V., Gelfand, A. E. and Holland, D. M. (2012). Space-time fusion under error in computer model output: an application to modeling air quality. Biometrics 68, 837-848; 2) Berrocal, V., Gelfand, A. E. and Holland, D. M. (2010). A bivariate space-time downscaler under space and time misalignment. The Annals of Applied Statistics 4, 1942-1975; and 3) Berrocal, V., Gelfand, A. E., and Holland, D. M. (2010). A spatio-temporal downscaler for output from numerical models. J. of Agricultural, Biological, and Environmental Statistics 15, 176-197. This approach is used to provide daily, predictive PM2.5 (daily average) and O3 (daily 8-hr maximum) surfaces for 2012; summer (O3) and annual (PM2.5) means were calculated and published. The downscaling fusion model uses both air quality monitoring data from the National Air Monitoring Stations/State and Local Air Monitoring Stations (NAMS/SLAMS) and numerical output from the Models-3/Community Multiscale Air Quality (CMAQ) model. Currently, predictions at the US census tract centroid locations within the 12 km CMAQ domain are archived. Predictions at the CMAQ grid cell centroids, or any desired set of locations co

  2. Predicting Readmission at Early Hospitalization Using Electronic Clinical Data: An Early Readmission Risk Score.

    PubMed

    Tabak, Ying P; Sun, Xiaowu; Nunez, Carlos M; Gupta, Vikas; Johannes, Richard S

    2017-03-01

    Identifying patients at high risk for readmission early during hospitalization may aid efforts to reduce readmissions. We sought to develop an early readmission risk predictive model using automated clinical data available at hospital admission. We developed an early readmission risk model using a derivation cohort and validated the model with a validation cohort. We used a published Acute Laboratory Risk of Mortality Score as an aggregated measure of clinical severity at admission and the number of hospital discharges in the previous 90 days as a measure of disease progression. We then evaluated an administrative data-enhanced model by adding principal and secondary diagnoses and other variables, and examined the change in the c-statistic when additional variables were added to the model. There were 1,195,640 adult discharges from 70 hospitals; 39.8% of patients were male, and the median age was 63 years (first and third quartiles: 43, 78). The 30-day readmission rate was 11.9% (n=142,211). The early readmission model yielded a graded relationship between readmission and both the Acute Laboratory Risk of Mortality Score and the number of previous discharges within 90 days. The model c-statistic was 0.697, with good calibration. When administrative variables were added to the model, the c-statistic increased to 0.722. Automated clinical data can generate a readmission risk score early in hospitalization with fair discrimination. It may have practical value in aiding early care transitions. Adding administrative data increases predictive accuracy. The administrative data-enhanced model may be used for hospital comparison and outcome research.

  3. Development and validation of a climate-based ensemble prediction model for West Nile Virus infection rates in Culex mosquitoes, Suffolk County, New York.

    PubMed

    Little, Eliza; Campbell, Scott R; Shaman, Jeffrey

    2016-08-09

    West Nile Virus (WNV) is an endemic public health concern in the United States that produces periodic seasonal epidemics. Underlying these outbreaks is the enzootic cycle of WNV between mosquito vectors and bird hosts. Identifying the key environmental conditions that facilitate and accelerate this cycle can be used to inform effective vector control. Here, we model and forecast WNV infection rates among mosquito vectors in Suffolk County, New York using readily available meteorological and hydrological conditions. We first validate a statistical model built with surveillance data between 2001 and 2009 (m09) and specify a set of new statistical models using surveillance data from 2001 to 2012 (m12). This ensemble of new models is then used to make predictions for 2013-2015, and multimodel inference is employed to provide a formal probabilistic interpretation across the disparate individual model predictions. The findings of the m09 and m12 models align, with the ensemble of m12 models indicating an association between warm, dry early spring (April) conditions and increased annual WNV infection rates in Culex mosquitoes. This study shows that real-time climate information can be used to predict WNV infection rates in Culex mosquitoes prior to its seasonal peak and before WNV spillover transmission risk to humans is greatest.
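
    One standard way to perform the multimodel inference mentioned above is to combine member predictions with Akaike weights. The sketch below illustrates that mechanic; the AIC values and per-model predictions are invented for illustration and are not from the study.

    ```python
    import numpy as np

    # Akaike weights: relative model support from AIC differences,
    # then a support-weighted ensemble prediction.
    aic = np.array([210.3, 212.1, 211.0, 215.4])   # hypothetical fitted-model AICs
    delta = aic - aic.min()
    w = np.exp(-0.5 * delta)
    w /= w.sum()                                    # weights sum to 1

    preds = np.array([4.1, 3.6, 3.9, 5.0])          # each model's predicted infection rate
    ensemble_pred = float(w @ preds)
    print(w.round(3), round(ensemble_pred, 2))
    ```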

  4. A statistical method for predicting seizure onset zones from human single-neuron recordings

    NASA Astrophysics Data System (ADS)

    Valdez, André B.; Hickman, Erin N.; Treiman, David M.; Smith, Kris A.; Steinmetz, Peter N.

    2013-02-01

    Objective. Clinicians often use depth-electrode recordings to localize human epileptogenic foci. To advance the diagnostic value of these recordings, we applied logistic regression models to single-neuron recordings from depth-electrode microwires to predict seizure onset zones (SOZs). Approach. We collected data from 17 epilepsy patients at the Barrow Neurological Institute and developed logistic regression models to calculate the odds of observing SOZs in the hippocampus, amygdala and ventromedial prefrontal cortex, based on statistics such as the burst interspike interval (ISI). Main results. Analysis of these models showed that, for a one-unit increase in burst ISI ratio, the left hippocampus was approximately 12 times more likely to contain an SOZ; and the right amygdala, 14.5 times more likely. Our models were most accurate for the hippocampus bilaterally (at 85% average sensitivity), and performance was comparable with current diagnostics such as electroencephalography. Significance. Logistic regression models can be combined with single-neuron recordings to predict likely SOZs in epilepsy patients being evaluated for resective surgery, providing an automated source of clinically useful information.
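
    An odds statement like "12 times more likely per unit increase" falls directly out of a logistic model: the odds ratio is exp(coefficient). The sketch below demonstrates this on synthetic data with an assumed predictor named burst_isi_ratio; nothing here reproduces the study's recordings.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    burst_isi_ratio = rng.normal(1.0, 0.4, 400)              # synthetic predictor
    p = 1 / (1 + np.exp(-(-2.0 + 2.5 * burst_isi_ratio)))    # true log-odds slope 2.5
    soz = rng.binomial(1, p)                                  # synthetic SOZ labels

    # Large C ~ effectively unpenalized logistic regression
    fit = LogisticRegression(C=1e6).fit(burst_isi_ratio.reshape(-1, 1), soz)
    odds_ratio = np.exp(fit.coef_[0][0])   # odds multiplier per one-unit increase
    print(round(odds_ratio, 2))            # ~exp(2.5) ~ 12
    ```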

  5. The stochastic predictability limits of GCM internal variability and the Stochastic Seasonal to Interannual Prediction System (StocSIPS)

    NASA Astrophysics Data System (ADS)

    Del Rio Amador, Lenin; Lovejoy, Shaun

    2017-04-01

    Over the past ten years, a key advance in our understanding of atmospheric variability is the discovery that between the weather and climate regimes lies an intermediate "macroweather" regime, spanning the range of scales from ≈10 days to ≈30 years. Macroweather statistics are characterized by two fundamental symmetries: scaling and the factorization of the joint space-time statistics. In the time domain, the scaling has low intermittency, with the additional property that successive fluctuations tend to cancel. In space, on the contrary, the scaling has high (multifractal) intermittency corresponding to the existence of different climate zones. These properties have fundamental implications for macroweather forecasting: a) the temporal scaling implies that the system has a long-range memory that can be exploited for forecasting; b) the low temporal intermittency implies that mathematically well-established (Gaussian) forecasting techniques can be used; and c) the statistical factorization property implies that although spatial correlations (including teleconnections) may be large, if long enough time series are available, they are not necessarily useful in improving forecasts. Theoretically, these conditions imply the existence of stochastic predictability limits; in our talk, we show that these limits apply to GCMs. Based on these statistical implications, we developed the Stochastic Seasonal to Interannual Prediction System (StocSIPS) for the prediction of temperature from regional to global scales and at horizons from one month to many years. One of the main components of StocSIPS is the separation and prediction of both the internal and externally forced variabilities. In order to test the theoretical assumptions and consequences for predictability and predictions, we use 41 different CMIP5 model outputs from preindustrial control runs that have fixed external forcings, so that their variability is purely internally generated. We first show that these statistical assumptions hold with relatively good accuracy, and we then perform hindcasts at global and regional scales from monthly to annual time resolutions using StocSIPS. We obtained excellent agreement between the hindcast Mean Square Skill Score (MSSS) and the theoretical stochastic limits. We also show the application of StocSIPS to the prediction of average global temperature and compare our results with those obtained using multi-model ensemble approaches. StocSIPS has numerous advantages, including a) higher MSSS for large time horizons, b) convergence to the real - not model - climate, c) much higher computational speed, d) no need for data assimilation, e) no ad hoc post-processing and f) no need for downscaling.
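
    The MSSS used above measures skill relative to a reference forecast, MSSS = 1 - MSE(forecast) / MSE(reference). A minimal sketch with a climatological-mean reference follows; the anomaly values are made up.

    ```python
    import numpy as np

    obs = np.array([0.12, -0.05, 0.30, 0.18, -0.10, 0.22])   # temperature anomalies (made up)
    fcst = np.array([0.10, 0.00, 0.25, 0.20, -0.02, 0.15])   # hindcast values (made up)

    mse_fcst = np.mean((fcst - obs) ** 2)
    mse_clim = np.mean((obs - obs.mean()) ** 2)   # climatological-mean reference
    msss = 1.0 - mse_fcst / mse_clim              # 1 = perfect, 0 = no better than climatology
    print(round(msss, 3))
    ```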

  6. Multiplicative Modeling of Children's Growth and Its Statistical Properties

    NASA Astrophysics Data System (ADS)

    Kuninaka, Hiroto; Matsushita, Mitsugu

    2014-03-01

    We develop a numerical growth model that can predict the statistical properties of the height distribution of Japanese children. Our previous studies have clarified that the height distribution of schoolchildren shows a transition from the lognormal distribution to the normal distribution during puberty. In this study, we demonstrate by simulation that the transition occurs owing to the variability of the onset of puberty.
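
    The lognormal half of the transition follows from multiplicative growth alone: if height is multiplied by random annual factors, log-height is a sum of random terms and becomes Gaussian. The sketch below simulates that pre-puberty mechanism with invented parameters; a fuller simulation of the paper's result would additionally randomize the onset of puberty, after which growth becomes effectively additive and the distribution turns normal.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n_children, n_years = 10_000, 10
    height = np.full(n_children, 50.0)                  # birth length in cm (assumed)
    for _ in range(n_years):
        height *= rng.normal(1.08, 0.02, n_children)    # random multiplicative annual growth

    log_h = np.log(height)
    # log-height is approximately Gaussian, i.e. height is lognormal
    print(round(log_h.mean(), 2), round(log_h.std(), 3))
    ```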

  7. Assessment of Current Jet Noise Prediction Capabilities

    NASA Technical Reports Server (NTRS)

    Hunter, Craig A.; Bridges, James E.; Khavaran, Abbas

    2008-01-01

    An assessment was made of the capability of jet noise prediction codes over a broad range of jet flows, with the objective of quantifying current capabilities and identifying areas requiring future research investment. Three separate codes in NASA's possession, representative of two classes of jet noise prediction codes, were evaluated: one empirical and two statistical. The empirical code is the Stone Jet Noise Module (ST2JET) contained within the ANOPP aircraft noise prediction code. It is well documented, and represents the state of the art in semi-empirical acoustic prediction codes, where virtual sources are attributed to various aspects of noise generation in each jet. These sources, in combination, predict the spectral directivity of a jet plume. A total of 258 jet noise cases were examined with the ST2JET code, each run requiring only fractions of a second to complete. Two statistical jet noise prediction codes were also evaluated, JeNo v1 and Jet3D. Fewer cases were run for the statistical prediction methods because they require substantially more resources, typically a Reynolds-averaged Navier-Stokes solution of the jet, volume integration of the statistical source models over the entire plume, and a numerical solution of the governing propagation equation within the jet. In the evaluation process, substantial effort went into vetting the experimental datasets used in the evaluations. In the end, none of the current codes can predict jet noise within experimental uncertainty. The empirical code came within 2 dB on a 1/3-octave spectral basis for a wide range of flows. The statistical code Jet3D was within experimental uncertainty at broadside angles for hot supersonic jets, but errors in peak frequency and amplitude put it outside experimental uncertainty at cooler, lower-speed conditions. Jet3D did not predict changes in directivity at the downstream angles. The statistical code JeNo v1 was within experimental uncertainty in predicting noise from cold subsonic jets at all angles, but did not predict changes with heating of the jet and did not account for directivity changes at supersonic conditions. The shortcomings addressed here give direction for future work relevant to the statistical prediction methods. A full report will be released as a chapter in a NASA publication assessing the state of the art in aircraft noise prediction.

  8. Inverse modeling with RZWQM2 to predict water quality

    USDA-ARS?s Scientific Manuscript database

    Agricultural systems models such as RZWQM2 are complex and have numerous parameters that are unknown and difficult to estimate. Inverse modeling provides an objective statistical basis for calibration that involves simultaneous adjustment of model parameters and yields parameter confidence intervals...
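
    As a generic illustration of the inverse-modeling idea in this record (RZWQM2 itself is not used here), the sketch below calibrates the parameters of a stand-in decay model against noisy observations and reports approximate 95% confidence intervals from the estimated parameter covariance.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def model(t, k, c0):
        return c0 * np.exp(-k * t)   # hypothetical simulator: a solute decay curve

    t = np.linspace(0, 10, 25)
    rng = np.random.default_rng(4)
    obs = model(t, 0.35, 12.0) + rng.normal(0, 0.3, t.size)   # synthetic observations

    popt, pcov = curve_fit(model, t, obs, p0=[0.1, 10.0])      # simultaneous adjustment
    se = np.sqrt(np.diag(pcov))                                # parameter standard errors
    for name, p, s in zip(["k", "c0"], popt, se):
        print(f"{name} = {p:.3f} +/- {1.96 * s:.3f}")          # approximate 95% intervals
    ```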

  9. Methods for evaluating the predictive accuracy of structural dynamic models

    NASA Technical Reports Server (NTRS)

    Hasselman, Timothy K.; Chrostowski, Jon D.

    1991-01-01

    Modeling uncertainty is defined in terms of the difference between predicted and measured eigenvalues and eigenvectors. Data compiled from 22 sets of analysis/test results were used to create statistical databases for large truss-type space structures and for both pretest and posttest models of conventional satellite-type space structures. Modeling uncertainty is propagated through the model to produce intervals of uncertainty on frequency response functions, in both amplitude and phase. This methodology was used successfully to evaluate the predictive accuracy of several structures, including the NASA CSI Evolutionary Structure tested at Langley Research Center. Test measurements for this structure were, for the most part, within the ± one-sigma intervals of predicted accuracy, demonstrating the validity of the methodology and computer code.
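
    One simple way to propagate modal uncertainty to frequency-response intervals, in the spirit of the method above, is Monte Carlo sampling: perturb the modal parameters within an assumed scatter and take percentiles of the resulting FRF amplitudes. The single-degree-of-freedom setup and all scatter values below are assumptions for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    f = np.linspace(5, 50, 400)                    # frequency axis, Hz
    fn_samples = rng.normal(20.0, 0.8, 500)        # natural frequency scatter (assumed)
    zeta_samples = rng.normal(0.02, 0.004, 500)    # damping ratio scatter (assumed)

    mags = []
    for fn, z in zip(fn_samples, np.clip(zeta_samples, 1e-4, None)):
        r = f / fn
        mags.append(1.0 / np.sqrt((1 - r**2) ** 2 + (2 * z * r) ** 2))  # SDOF FRF magnitude
    mags = np.array(mags)

    lo, hi = np.percentile(mags, [16, 84], axis=0)   # ~ ± one-sigma amplitude band
    print(lo[200].round(2), hi[200].round(2))
    ```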

  10. Can multivariate models based on MOAKS predict OA knee pain? Data from the Osteoarthritis Initiative

    NASA Astrophysics Data System (ADS)

    Luna-Gómez, Carlos D.; Zanella-Calzada, Laura A.; Galván-Tejada, Jorge I.; Galván-Tejada, Carlos E.; Celaya-Padilla, José M.

    2017-03-01

    Osteoarthritis is the most common rheumatic disease in the world, and knee pain is its most disabling symptom. Predicting pain is one of the targets of preventive medicine and can inform new therapies or treatments. Using magnetic resonance imaging and grading scales, a multivariate model based on genetic algorithms is presented. Such a predictive model can be useful for associating minor structural changes in the joint with future knee pain. Results suggest that multivariate models can be predictive of future chronic knee pain. All models (T0, T1, and T2) were statistically significant, with all p values < 0.05 and all AUCs > 0.60.

  11. Model Update of a Micro Air Vehicle (MAV) Flexible Wing Frame with Uncertainty Quantification

    NASA Technical Reports Server (NTRS)

    Reaves, Mercedes C.; Horta, Lucas G.; Waszak, Martin R.; Morgan, Benjamin G.

    2004-01-01

    This paper describes a procedure to update parameters in the finite element model of a Micro Air Vehicle (MAV) to improve displacement predictions under aerodynamic loads. Because of fabrication, material, and geometric uncertainties, a statistical approach combined with Multidisciplinary Design Optimization (MDO) is used to modify key model parameters. Static test data collected using photogrammetry are used to correlate with model predictions. Results show significant improvements in model predictions after parameters are updated; however, computed probability values indicate low confidence in the updated values and/or errors in the model structure. Lessons learned in the areas of wing design, test procedures, modeling approaches with geometric nonlinearities, and uncertainty quantification are all documented.

  12. Statistical prediction of space motion sickness

    NASA Technical Reports Server (NTRS)

    Reschke, Millard F.

    1990-01-01

    Studies designed to empirically examine the etiology of motion sickness to develop a foundation for enhancing its prediction are discussed. Topics addressed include early attempts to predict space motion sickness; a multiple-test database that uses provocative and vestibular function tests, and its subjects; the reliability of provocative tests of motion sickness susceptibility; prediction of space motion sickness using linear discriminant analysis; and prediction of space motion sickness susceptibility using the logistic model.

  13. Conservation Risks: When Will Rhinos be Extinct?

    PubMed

    Haas, Timothy C; Ferreira, Sam M

    2016-08-01

    We develop a risk intelligence system for biodiversity enterprises. Such enterprises depend on a supply of endangered species for their revenue. Many of these enterprises, however, cannot purchase a supply of this resource and are largely unable to secure the resource against theft in the form of poaching. Because replacements are not available once a species becomes extinct, insurance products are not available to reduce the risk exposure of these enterprises to an extinction event. For many species, the dynamics of anthropogenic impacts driven by economic as well as noneconomic values of associated wildlife products along with their ecological stressors can help meaningfully predict extinction risks. We develop an agent/individual-based economic-ecological model that captures these effects and apply it to the case of South African rhinos. Our model uses observed rhino dynamics and poaching statistics. It seeks to predict rhino extinction under the present scenario. This scenario has no legal horn trade, but allows live African rhino trade and legal hunting. Present rhino populations are small and threatened by a rising onslaught of poaching. This present scenario and associated dynamics predicts continued decline in rhino population size with accelerated extinction risks of rhinos by 2036. Our model supports the computation of extinction risks at any future time point. This capability can be used to evaluate the effectiveness of proposed conservation strategies at reducing a species' extinction risk. Models used to compute risk predictions, however, need to be statistically estimated. We point out that statistically fitting such models to observations will involve massive numbers of observations on consumer behavior and time-stamped location observations on thousands of animals. Finally, we propose Big Data algorithms to perform such estimates and to interpret the fitted model's output.

  14. Sound transmission loss of composite sandwich panels

    NASA Astrophysics Data System (ADS)

    Zhou, Ran

    Light composite sandwich panels are increasingly used in automobiles, ships, and aircraft because of the high strength-to-weight ratios they offer. However, the acoustical properties of these light and stiff structures can be less desirable than those of equivalent metal panels. These undesirable properties can lead to high interior noise levels. A number of researchers have studied the acoustical properties of honeycomb and foam sandwich panels. Not much work, however, has been carried out on foam-filled honeycomb sandwich panels. In this dissertation, governing equations for the forced vibration of asymmetric sandwich panels are developed. An analytical expression for modal densities of symmetric sandwich panels is derived from a sixth-order governing equation. A boundary element analysis model for the sound transmission loss of symmetric sandwich panels is proposed. Measurements of the modal density, total loss factor, radiation loss factor, and sound transmission loss of foam-filled honeycomb sandwich panels with different configurations and thicknesses are presented. Comparisons between the predicted sound transmission loss values obtained from wave impedance analysis, statistical energy analysis, boundary element analysis, and experimental values are presented. The wave impedance analysis model provides accurate predictions of sound transmission loss for the thin foam-filled honeycomb sandwich panels at frequencies above their first resonance frequencies. The predictions from the statistical energy analysis model are in better agreement with the experimental transmission loss values of the sandwich panels when the measured radiation loss factor values near coincidence are used instead of the theoretical values for single-layer panels. The proposed boundary element analysis model provides more accurate predictions of sound transmission loss for the thick foam-filled honeycomb sandwich panels than either the wave impedance analysis model or the statistical energy analysis model.

  15. Control Theory and Statistical Generalizations.

    ERIC Educational Resources Information Center

    Powers, William T.

    1990-01-01

    Contrasts modeling methods in control theory to the methods of statistical generalizations in empirical studies of human or animal behavior. Presents a computer simulation that predicts behavior based on variables (effort and rewards) determined by the invariable (desired reward). Argues that control theory methods better reflect relationships to…

  16. Epidemiology and Long-term Clinical and Biologic Risk Factors for Pneumonia in Community-Dwelling Older Americans

    PubMed Central

    Alvarez, Karina; Loehr, Laura; Folsom, Aaron R.; Newman, Anne B.; Weissfeld, Lisa A.; Wunderink, Richard G.; Kritchevsky, Stephen B.; Mukamal, Kenneth J.; London, Stephanie J.; Harris, Tamara B.; Bauer, Doug C.; Angus, Derek C.

    2013-01-01

    Background: Preventing pneumonia requires better understanding of incidence, mortality, and long-term clinical and biologic risk factors, particularly in younger individuals. Methods: This was a cohort study in three population-based cohorts of community-dwelling individuals. A derivation cohort (n = 16,260) was used to determine incidence and survival and develop a risk prediction model. The prediction model was validated in two cohorts (n = 8,495). The primary outcome was 10-year risk of pneumonia hospitalization. Results: The crude and age-adjusted incidences of pneumonia were 6.71 and 9.43 cases/1,000 person-years (10-year risk was 6.15%). The 30-day and 1-year mortality were 16.5% and 31.5%. Although age was the most important risk factor (range of crude incidence rates, 1.69-39.13 cases/1,000 person-years for each 5-year increment from 45-85 years), 38% of pneumonia cases occurred in adults < 65 years of age. The 30-day and 1-year mortality were 12.5% and 25.7% in those < 65 years of age. Although most comorbidities were associated with higher risk of pneumonia, reduced lung function was the most important risk factor (relative risk = 6.61 for severe reduction based on FEV1 by spirometry). A clinical risk prediction model based on age, smoking, and lung function predicted 10-year risk (area under curve [AUC] = 0.77 and Hosmer-Lemeshow [HL] C statistic = 0.12). Model discrimination and calibration were similar in the internal validation cohort (AUC = 0.77; HL C statistic, 0.65) but lower in the external validation cohort (AUC = 0.62; HL C statistic, 0.45). The model also calibrated well in blacks and younger adults. C-reactive protein and IL-6 were associated with higher pneumonia risk but did not improve model performance. Conclusions: Pneumonia hospitalization is common and associated with high mortality, even in younger healthy adults. Long-term risk of pneumonia can be predicted in community-dwelling adults with a simple clinical risk prediction model. PMID:23744106
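
    The Hosmer-Lemeshow-style calibration reported above compares predicted and observed event rates across risk strata. The sketch below performs a decile-based calibration check of that kind on synthetic predictions and outcomes; it is an illustration of the diagnostic, not the study's model.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)
    pred = rng.beta(2, 25, 20_000)      # synthetic predicted 10-year risks (~7% mean)
    y = rng.binomial(1, pred)           # outcomes drawn from those risks (perfect calibration)

    deciles = np.quantile(pred, np.linspace(0, 1, 11))
    idx = np.digitize(pred, deciles[1:-1])          # assign each subject to a risk decile
    for d in range(10):
        m = idx == d
        # mean predicted risk vs. observed event rate should track each other
        print(d + 1, round(pred[m].mean(), 3), round(y[m].mean(), 3))
    ```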

  17. Climate change and the eco-hydrology of fire: Will area burned increase in a warming western USA?

    Treesearch

    Donald McKenzie; Jeremy S. Littell

    2017-01-01

    Wildfire area is predicted to increase with global warming. Empirical statistical models and process-based simulations agree almost universally. The key relationship for this unanimity, observed at multiple spatial and temporal scales, is between drought and fire. Predictive models often focus on ecosystems in which this relationship appears to be particularly strong,...

  18. Can Statistical Modeling Increase Annual Fund Performance? An Experiment at the University of Maryland, College Park.

    ERIC Educational Resources Information Center

    Porter, Stephen R.

    Annual funds face pressures to contact all alumni to maximize participation, but these efforts are costly. This paper uses a logistic regression model to predict likely donors among alumni from the College of Arts & Humanities at the University of Maryland, College Park. Alumni were grouped according to their predicted probability of donating…

  19. Within tree variation of lignin, extractives, and microfibril angle coupled with the theoretical and near infrared modeling of microfibril angle

    Treesearch

    Brian K. Via; Chi L. So; Leslie H. Groom; Todd F. Shupe; Michael Stine; Jan Wikaira

    2007-01-01

    A theoretical model was built predicting the relationship between microfibril angle and lignin content at the Angstrom (Å) level. Both theoretical and statistical examination of experimental data supports a square root transformation of lignin to predict microfibril angle. The experimental material used came from 10 longleaf pine (Pinus palustris)...

  20. Simulation skill of APCC set of global climate models for Asian summer monsoon rainfall variability

    NASA Astrophysics Data System (ADS)

    Singh, U. K.; Singh, G. P.; Singh, Vikas

    2015-04-01

    The performance of 11 Asia-Pacific Economic Cooperation Climate Center (APCC) global climate models (both coupled and uncoupled) in simulating seasonal summer (June-August) monsoon rainfall variability over Asia (especially over India and East Asia) has been evaluated in detail using hind-cast data (3 months in advance) generated by APCC, which provides regional climate information products and services based on multi-model ensemble dynamical seasonal prediction systems. The skill of each global climate model over Asia was tested separately in detail for a period of 21 years (1983-2003), and the simulated Asian summer monsoon rainfall (ASMR) was verified using various statistical measures for the Indian and East Asian land masses separately. The analysis found a large variation in the spatial ASMR simulated by uncoupled models compared to coupled models (such as the Predictive Ocean Atmosphere Model for Australia, the National Centers for Environmental Prediction model, and the Japan Meteorological Agency model). The simulated ASMR from coupled models was closer to the Climate Prediction Center Merged Analysis of Precipitation (CMAP) than that from uncoupled models, although the amount of ASMR was underestimated by both. The analysis also found a high spread in simulated ASMR among the ensemble members, suggesting that model performance is highly dependent on initial conditions. The correlation analysis between sea surface temperature (SST) and ASMR shows that the coupled models are more strongly associated with ASMR than the uncoupled models, suggesting that air-sea interaction is well captured in the coupled models. The analysis of rainfall using various statistical measures suggests that the multi-model ensemble (MME) performed better than the individual models, and that evaluating the Indian and East Asian land masses separately is more informative than evaluating Asian monsoon rainfall as a whole. The results of various statistical measures, such as the skill of the multi-model ensemble, the large spread among the ensemble members of individual models, the strong teleconnection (correlation) with SST, the coefficient of variation, the inter-annual variability, and the analysis of Taylor diagrams, suggest a need to improve the coupled models rather than the uncoupled models for the development of a better dynamical seasonal forecast system.
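
    A standard explanation for the MME result above is that averaging members damps their uncorrelated errors. The sketch below demonstrates this with synthetic 21-year anomaly "hindcasts" from 11 models, each constructed as truth plus independent noise; all numbers are invented.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)
    truth = rng.normal(0, 1, 21)                     # synthetic observed rainfall anomalies
    models = truth + rng.normal(0, 1.2, (11, 21))    # 11 noisy member hindcasts

    def corr(a, b):
        return float(np.corrcoef(a, b)[0, 1])

    member_skill = np.mean([corr(m, truth) for m in models])
    mme_skill = corr(models.mean(axis=0), truth)     # ensemble mean typically scores higher
    print(round(member_skill, 2), round(mme_skill, 2))
    ```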
